Hi Lewis,

Yes, my configuration works with our SharePoint server. The authentication scheme is NTLM. Two versions of Nutch are working: a snapshot of Nutch 1.4 in my development environment and Nutch 1.2, which is used in production.
I have to admit that it took some tweaking to get authentication working.

Regards,
Arkadi

> -----Original Message-----
> From: Lewis John Mcgibbney [mailto:[email protected]]
> Sent: Thursday, 24 November 2011 10:29 PM
> To: [email protected]
> Subject: Re: Nutch and Sharepoint authentication
>
> Hi Arkadi,
>
> Are you saying that this has been solved and that you are successfully
> able to crawl the server?
>
> Thanks
>
> On Thu, Nov 24, 2011 at 12:48 AM, <[email protected]> wrote:
>
> > Hi,
> >
> > I am crawling a SharePoint server, no major problems. I do have to use
> > protocol-httpclient for this. Here is an extract from my
> > httpclient-auth.xml file, if it helps:
> >
> > <auth-configuration>
> >   <credentials username="myusername" password="mypassword">
> >     <default realm="myrealm" />
> >   </credentials>
> > </auth-configuration>
> >
> > Regards,
> >
> > Arkadi
> >
> > > -----Original Message-----
> > > From: Lewis John Mcgibbney [mailto:[email protected]]
> > > Sent: Tuesday, 22 November 2011 9:43 PM
> > > To: [email protected]
> > > Subject: Re: Nutch and Sharepoint authentication
> > >
> > > Hi,
> > >
> > > From what I have read on the Nutch user@ archives [1] it is possible
> > > to crawl a MS Sharepoint server, which includes setting up NTLM
> > > authentication for your crawler. It is becoming a pretty major
> > > problem now that the protocol-httpclient plugin is unstable; there
> > > are Jira issues open for this.
> > >
> > > Unfortunately, as Manifold CF is in incubation status, it can only
> > > be expected that they might not have completed all documentation
> > > yet. However, I advise you to try there as well: ask them about the
> > > Sharepoint configuration/documentation if it is not possible for you
> > > to work with Nutch protocol-httpclient.
> > >
> > > hth
> > >
> > > [1]
> > > http://www.mail-archive.com/search?q=sharepoint&l=user%40nutch.apache.org
> > >
> > > On Tue, Nov 22, 2011 at 5:27 AM, remi tassing <[email protected]>
> > > wrote:
> > >
> > > > Hello guys,
> > > >
> > > > I read the wiki on "HttpAuthenticationSchemes"
> > > > <http://wiki.apache.org/nutch/HttpAuthenticationSchemes>.
> > > > I previously managed to make Nutch crawl local folders and
> > > > websites (with SSL authentication). However, I'm trying to crawl
> > > > some sites in a corporate intranet environment running under MS
> > > > Sharepoint. I have been unsuccessful so far and I believe it's
> > > > because of authentication.
> > > >
> > > > - Is Nutch able to crawl Sharepoint? If yes, do you have a
> > > >   link/mail tutorial on this?
> > > >
> > > > I recently became aware of the ManifoldCF initiative and it seems
> > > > to be a possible solution to my problem. But it's currently poorly
> > > > documented (as far as the Sharepoint connector is concerned).
> > > >
> > > > - Do you have any recommendations in this regard?
> > > >
> > > > Thanks in advance for your help, I'll really appreciate it!
> > > >
> > > > --
> > > > Remi Tassing
> > > >
> > >
> > > --
> > > *Lewis*
>
> --
> *Lewis*
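
For reference, the setup discussed in this thread (protocol-httpclient reading credentials from httpclient-auth.xml) could be wired up with nutch-site.xml overrides along the following lines. This is only a minimal sketch and not Arkadi's actual configuration: the plugin list other than protocol-httpclient and the crawler host name are assumptions for illustration. The property names plugin.includes, http.auth.file and http.agent.host are taken from nutch-default.xml in Nutch 1.x.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Swap protocol-http for protocol-httpclient so that NTLM/Basic/Digest
       authentication is available. The rest of this plugin list is an
       assumption; keep whatever your own installation already uses. -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-httpclient|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>

  <!-- Credentials file read by protocol-httpclient, looked up in the conf
       directory. This value is also the default. -->
  <property>
    <name>http.auth.file</name>
    <value>httpclient-auth.xml</value>
  </property>

  <!-- Host name of the machine running the crawler, used by
       protocol-httpclient for NTLM. "crawler.example.com" is a
       placeholder, not a value from this thread. -->
  <property>
    <name>http.agent.host</name>
    <value>crawler.example.com</value>
  </property>
</configuration>
```

The httpclient-auth.xml extract quoted above would then sit next to this file in the Nutch conf directory.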

