Hello Alexander, I'm considering trying your suggestion.
I have one question though. After Webdav does the crawling and saves the files locally, does it keep the link intact?

Remi

On Fri, Nov 25, 2011 at 1:17 AM, Alexander Aristov <[email protected]> wrote:

> hi
>
> one available solution is to set up webdav and crawl resources as
> files, e.g. file://. but it won't exclude authentication.
>
>
> Alexander
>
> On 24/11/2011, Lewis John Mcgibbney <[email protected]> wrote:
> > Hi Arkadi,
> >
> > Are you saying that this has been solved and that you are successfully
> > able to crawl the server?
> >
> > Thanks
> >
> > On Thu, Nov 24, 2011 at 12:48 AM, <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> I am crawling a SharePoint server, no major problems. I do have to use
> >> protocol-httpclient for this. Here is an extract from my
> >> httpclient-auth.xml file, if it helps:
> >>
> >> <auth-configuration>
> >>   <credentials username="myusername" password="mypassword">
> >>     <default realm="myrealm" />
> >>   </credentials>
> >> </auth-configuration>
> >>
> >> Regards,
> >>
> >> Arkadi
> >>
> >> > -----Original Message-----
> >> > From: Lewis John Mcgibbney [mailto:[email protected]]
> >> > Sent: Tuesday, 22 November 2011 9:43 PM
> >> > To: [email protected]
> >> > Subject: Re: Nutch and Sharepoint authentication
> >> >
> >> > Hi,
> >> >
> >> > From what I have read on the Nutch user@ archives [1] it is possible
> >> > to crawl a MS Sharepoint server which includes setting up NTLM
> >> > authentication for your crawler. It is becoming a pretty major
> >> > problem now that the protocol-httpclient plugin is unstable, there
> >> > are Jira issues open for this.
> >> >
> >> > Unfortunately as Manifold CF is in incubation status, it can only be
> >> > expected that they might have not completed all documentation yet,
> >> > however I advise you to try there as well, ask them about the
> >> > Sharepoint configuration/documentation if it is not possible for you
> >> > to work with Nutch protocol-httpclient.
> >> >
> >> > hth
> >> >
> >> > [1]
> >> > http://www.mail-archive.com/search?q=sharepoint&l=user%40nutch.apache.org
> >> >
> >> > On Tue, Nov 22, 2011 at 5:27 AM, remi tassing <[email protected]>
> >> > wrote:
> >> >
> >> > > Hello guys,
> >> > >
> >> > > I read the wiki on
> >> > > "HttpAuthenticationSchemes<http://wiki.apache.org/nutch/HttpAuthenticationSchemes>".
> >> > > I previously managed to make Nutch crawl local folders and
> >> > > websites (with SSL authentication). However, I'm trying to crawl
> >> > > some sites in a corporate intranet environment running under MS
> >> > > Sharepoint. I was unsuccessful so far and I believe it's because
> >> > > of authentication.
> >> > >
> >> > >
> >> > >    - Is Nutch able to crawl Sharepoint? If yes, do you have a
> >> > >    link/mail tutorial on this?
> >> > >
> >> > >
> >> > > I recently became aware of the ManifoldCF initiative and it seems
> >> > > to be an eventual solution to my problem. But it's currently
> >> > > poorly documented (as far as Sharepoint connector is concerned).
> >> > >
> >> > >    - Do you have any recommendation in this regard?
> >> > >
> >> > >
> >> > > Thanks in advance for your help, I'll really appreciate it!
> >> > >
> >> > > --
> >> > > Remi Tassing
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > *Lewis*
> >> >
> >
> >
> >
> > --
> > *Lewis*
> >
>
>
> --
> Best Regards
> Alexander Aristov
>

--
Remi Tassing

