Hi,

One available solution is to set up WebDAV and crawl the resources as files, e.g. via file:// URLs. It won't remove the need for authentication, though.
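
A rough sketch of the Nutch side of the file:// approach, assuming the SharePoint library is already mounted over WebDAV at a hypothetical local path such as /mnt/sharepoint (so the seed URL would be file:///mnt/sharepoint/). The plugin list below is an assumption modelled on a typical default; keep whatever your nutch-site.xml already lists and only make sure protocol-file is included:

```xml
<!-- nutch-site.xml (sketch): enable the file protocol plugin.
     Everything after protocol-file is an assumed default; adjust to your install. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-file|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

The stock regex-urlfilter.txt usually contains a rule that skips file: URLs, so that rule would probably need relaxing as well. Mounting the WebDAV share still requires valid credentials.
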
Alexander

On 24/11/2011, Lewis John Mcgibbney <[email protected]> wrote:
> Hi Arkadi,
>
> Are you saying that this has been solved and that you are successfully
> able to crawl the server?
>
> Thanks
>
> On Thu, Nov 24, 2011 at 12:48 AM, <[email protected]> wrote:
>
>> Hi,
>>
>> I am crawling a SharePoint server, no major problems. I do have to use
>> protocol-httpclient for this. Here is an extract from my
>> httpclient-auth.xml file, if it helps:
>>
>> <auth-configuration>
>>   <credentials username="myusername" password="mypassword">
>>     <default realm="myrealm" />
>>   </credentials>
>> </auth-configuration>
>>
>> Regards,
>>
>> Arkadi
>>
>> > -----Original Message-----
>> > From: Lewis John Mcgibbney [mailto:[email protected]]
>> > Sent: Tuesday, 22 November 2011 9:43 PM
>> > To: [email protected]
>> > Subject: Re: Nutch and Sharepoint authentication
>> >
>> > Hi,
>> >
>> > From what I have read on the Nutch user@ archives [1] it is possible to
>> > crawl a MS Sharepoint server, which includes setting up NTLM
>> > authentication for your crawler. It is becoming a pretty major problem
>> > now that the protocol-httpclient plugin is unstable; there are Jira
>> > issues open for this.
>> >
>> > Unfortunately, as ManifoldCF is in incubation status, it can only be
>> > expected that they might not have completed all documentation yet.
>> > However, I advise you to try there as well: ask them about the
>> > Sharepoint configuration/documentation if it is not possible for you to
>> > work with Nutch protocol-httpclient.
>> >
>> > hth
>> >
>> > [1]
>> > http://www.mail-archive.com/search?q=sharepoint&l=user%40nutch.apache.org
>> >
>> > On Tue, Nov 22, 2011 at 5:27 AM, remi tassing <[email protected]>
>> > wrote:
>> >
>> > > Hello guys,
>> > >
>> > > I read the wiki on "HttpAuthenticationSchemes"
>> > > <http://wiki.apache.org/nutch/HttpAuthenticationSchemes>.
>> > > I previously managed to make Nutch crawl local folders and websites
>> > > (with SSL authentication). However, I'm trying to crawl some sites in
>> > > a corporate intranet environment running under MS Sharepoint. I was
>> > > unsuccessful so far and I believe it's because of authentication.
>> > >
>> > >    - Is Nutch able to crawl Sharepoint? If yes, do you have a
>> > >      link/mail tutorial on this?
>> > >
>> > > I recently became aware of the ManifoldCF initiative and it seems to
>> > > be an eventual solution to my problem. But it's currently poorly
>> > > documented (as far as the Sharepoint connector is concerned).
>> > >
>> > >    - Do you have any recommendation in this regard?
>> > >
>> > > Thanks in advance for your help, I'll really appreciate it!
>> > >
>> > > --
>> > > Remi Tassing
>> >
>> > --
>> > *Lewis*
>
> --
> *Lewis*

--
Best Regards
Alexander Aristov

