Hi,
I am crawling a SharePoint server without any major problems, though I do
have to use protocol-httpclient for this. Here is an extract from my
httpclient-auth.xml file, in case it helps:
<auth-configuration>
  <credentials username="myusername" password="mypassword">
    <default realm="myrealm" />
  </credentials>
</auth-configuration>
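For completeness, here is a sketch of the matching nutch-site.xml side. It assumes Nutch 1.x property names: protocol-httpclient replaces protocol-http in plugin.includes, and http.auth.file points at the auth file above. The rest of the plugin list is only illustrative, so keep whatever your own setup already uses:

<configuration>
  <!-- Swap protocol-http for protocol-httpclient; the other plugins listed are illustrative -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-httpclient|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
  <!-- Credentials file shown above; assumed to sit in conf/ on the classpath -->
  <property>
    <name>http.auth.file</name>
    <value>httpclient-auth.xml</value>
  </property>
</configuration>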
Regards,
Arkadi
> -----Original Message-----
> From: Lewis John Mcgibbney [mailto:[email protected]]
> Sent: Tuesday, 22 November 2011 9:43 PM
> To: [email protected]
> Subject: Re: Nutch and Sharepoint authentication
>
> Hi,
>
> From what I have read in the Nutch user@ archives [1], it is possible to
> crawl an MS SharePoint server, which includes setting up NTLM
> authentication for your crawler. However, this is becoming a pretty major
> problem now that the protocol-httpclient plugin is unstable; there are
> open Jira issues for this.
>
> Unfortunately, as ManifoldCF is still in incubation, it is only to be
> expected that they may not have completed all of their documentation yet.
> However, I advise you to try there as well and ask them about the
> SharePoint configuration/documentation if you cannot get Nutch
> protocol-httpclient to work.
>
> hth
>
> [1]
> http://www.mail-archive.com/search?q=sharepoint&l=user%40nutch.apache.org
>
> On Tue, Nov 22, 2011 at 5:27 AM, remi tassing <[email protected]> wrote:
>
> > Hello guys,
> >
> > I read the wiki page "HttpAuthenticationSchemes"
> > <http://wiki.apache.org/nutch/HttpAuthenticationSchemes>.
> > I previously managed to make Nutch crawl local folders and websites
> > (with SSL authentication). However, I'm now trying to crawl some sites in
> > a corporate intranet environment running MS SharePoint. I have been
> > unsuccessful so far, and I believe it's because of authentication.
> >
> >
> >    - Is Nutch able to crawl SharePoint? If so, do you have a link to a
> >    tutorial (or a mailing-list post) on this?
> >
> >
> > I recently became aware of the ManifoldCF initiative, and it seems to be
> > a possible solution to my problem. However, it is currently poorly
> > documented (as far as the SharePoint connector is concerned).
> >
> >    - Do you have any recommendations in this regard?
> >
> >
> > Thanks in advance for your help; I'd really appreciate it!
> >
> > --
> > Remi Tassing
> >
>
>
>
> --
> *Lewis*