Hi Arkadi,

Are you saying that this has been solved and that you are successfully able to crawl the server?
Thanks

On Thu, Nov 24, 2011 at 12:48 AM, <[email protected]> wrote:

> Hi,
>
> I am crawling a SharePoint server, no major problems. I do have to use
> protocol-httpclient for this. Here is an extract from my
> httpclient-auth.xml file, if it helps:
>
> <auth-configuration>
>   <credentials username="myusername" password="mypassword">
>     <default realm="myrealm" />
>   </credentials>
> </auth-configuration>
>
> Regards,
>
> Arkadi
>
> > -----Original Message-----
> > From: Lewis John Mcgibbney [mailto:[email protected]]
> > Sent: Tuesday, 22 November 2011 9:43 PM
> > To: [email protected]
> > Subject: Re: Nutch and Sharepoint authentication
> >
> > Hi,
> >
> > From what I have read on the Nutch user@ archives [1] it is possible to
> > crawl a MS Sharepoint server, which includes setting up NTLM
> > authentication for your crawler. It is becoming a pretty major problem
> > now that the protocol-httpclient plugin is unstable; there are Jira
> > issues open for this.
> >
> > Unfortunately, as Manifold CF is in incubation status, it can only be
> > expected that they might not have completed all documentation yet.
> > However, I advise you to try there as well; ask them about the
> > Sharepoint configuration/documentation if it is not possible for you to
> > work with Nutch protocol-httpclient.
> >
> > hth
> >
> > [1] http://www.mail-archive.com/search?q=sharepoint&l=user%40nutch.apache.org
> >
> > On Tue, Nov 22, 2011 at 5:27 AM, remi tassing <[email protected]> wrote:
> >
> > > Hello guys,
> > >
> > > I read the wiki on "HttpAuthenticationSchemes
> > > <http://wiki.apache.org/nutch/HttpAuthenticationSchemes>".
> > > I previously managed to make Nutch crawl local folders and websites
> > > (with SSL authentication). However, I'm trying to crawl some sites in
> > > a corporate intranet environment running under MS Sharepoint. I was
> > > unsuccessful so far and I believe it's because of authentication.
> > >
> > > - Is Nutch able to crawl Sharepoint? If yes, do you have a link/mail
> > >   tutorial on this?
> > >
> > > I recently became aware of the ManifoldCF initiative and it seems to
> > > be an eventual solution to my problem. But it's currently poorly
> > > documented (as far as the Sharepoint connector is concerned).
> > >
> > > - Do you have any recommendation in this regard?
> > >
> > > Thanks in advance for your help, I'll really appreciate it!
> > >
> > > --
> > > Remi Tassing
> >
> > --
> > *Lewis*

--
*Lewis*

