Hi,

From what I have read on the Nutch user@ archives [1], it is possible to crawl an MS Sharepoint server, which includes setting up NTLM authentication for your crawler. It is becoming a pretty major problem now that the protocol-httpclient plugin is unstable; there are Jira issues open for this.
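For reference, here is a rough sketch of what the NTLM credentials file for protocol-httpclient might look like. It is based on my recollection of the HttpAuthenticationSchemes wiki page, so please check the element and attribute names against that page. The host, port, username and password below are placeholders, not values from your setup. It also assumes protocol-httpclient is listed in plugin.includes in nutch-site.xml in place of protocol-http.

```xml
<?xml version="1.0"?>
<!-- conf/httpclient-auth.xml: sketch only, all values are placeholders -->
<auth-configuration>
  <!-- Account the crawler uses to log in to the intranet server -->
  <credentials username="CRAWL_USER" password="CRAWL_PASSWORD">
    <!-- Restrict these credentials to one host/port and the NTLM scheme -->
    <authscope host="sharepoint.example.com" port="80" realm="" scheme="NTLM"/>
  </credentials>
</auth-configuration>
```

Given the stability problems mentioned above, treat this as a starting point to experiment with rather than a known-good configuration.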
Unfortunately, as ManifoldCF is still in incubation, it can only be expected that its documentation may not be complete yet. However, I advise you to try there as well: ask them about the Sharepoint configuration/documentation if you cannot get the Nutch protocol-httpclient plugin to work.

hth

[1] http://www.mail-archive.com/search?q=sharepoint&l=user%40nutch.apache.org

On Tue, Nov 22, 2011 at 5:27 AM, remi tassing <[email protected]> wrote:
> Hello guys,
>
> I read the wiki on "HttpAuthenticationSchemes"
> <http://wiki.apache.org/nutch/HttpAuthenticationSchemes>.
> I previously managed to make Nutch crawl local folders and websites (with
> SSL authentication). However, I'm trying to crawl some sites in a corporate
> intranet environment running under MS Sharepoint. I was unsuccessful so far
> and I believe it's because of authentication.
>
> - Is Nutch able to crawl Sharepoint? If yes, do you have a link/mail
>   tutorial on this?
>
> I was recently made aware of the ManifoldCF initiative and it seems to be a
> possible solution to my problem. But it's currently poorly documented (as
> far as the Sharepoint connector is concerned).
>
> - Do you have any recommendation in this regard?
>
> Thanks in advance for your help, I'll really appreciate it!
>
> --
> Remi Tassing

--
*Lewis*

