Hi Lewis,

Yes, my configuration works with our SharePoint server, which uses NTLM
authentication. It works with two versions of Nutch: a Nutch 1.4 snapshot in
my development environment and Nutch 1.2 in production.
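For an NTLM server, a minimal sketch of a host-scoped httpclient-auth.xml might
look like the following. It assumes the authscope element and its host, port,
realm and scheme attributes as described in the comments of the
httpclient-auth.xml template shipped with Nutch (check the template for your
version). The host, port, user name, password and domain are placeholders. For
NTLM the realm is expected to be the Windows domain of the account.

```xml
<?xml version="1.0"?>
<auth-configuration>
  <!-- Placeholder account; for NTLM, realm should be the user's Windows domain -->
  <credentials username="myusername" password="mypassword">
    <!-- Hypothetical SharePoint host: these credentials are offered only to
         NTLM challenges coming from this host and port -->
    <authscope host="sharepoint.example.local" port="80"
               realm="MYDOMAIN" scheme="NTLM"/>
  </credentials>
</auth-configuration>
```

Scoping the credentials to one host, rather than using a default entry, avoids
offering them to every server that asks for authentication.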

I have to admit that it took some tweaking to get authentication working. 
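A rough sketch of the nutch-site.xml side, assuming a Nutch 1.x layout. The
property names (plugin.includes, http.auth.file, http.agent.host) come from
nutch-default.xml. The plugin list is an assumed example based on the 1.4
default with protocol-http swapped for protocol-httpclient. The host value is a
placeholder.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Use protocol-httpclient instead of protocol-http; the rest of this
       plugin list is an assumed example, adjust it to your own setup -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-httpclient|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
  <!-- Authentication file read by protocol-httpclient
       (httpclient-auth.xml is the default name) -->
  <property>
    <name>http.auth.file</name>
    <value>httpclient-auth.xml</value>
  </property>
  <!-- Placeholder: name of the machine running the crawler,
       used by protocol-httpclient -->
  <property>
    <name>http.agent.host</name>
    <value>crawler01</value>
  </property>
</configuration>
```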

Regards,

Arkadi

> -----Original Message-----
> From: Lewis John Mcgibbney [mailto:[email protected]]
> Sent: Thursday, 24 November 2011 10:29 PM
> To: [email protected]
> Subject: Re: Nutch and Sharepoint authentication
> 
> Hi Arkadi,
> 
> Are you saying that this has been solved and that you are successfully
> able to crawl the server?
> 
> Thanks
> 
> On Thu, Nov 24, 2011 at 12:48 AM, <[email protected]> wrote:
> 
> > Hi,
> >
> > I am crawling a SharePoint server with no major problems. I do have to
> > use protocol-httpclient for this. Here is an extract from my
> > httpclient-auth.xml file, if it helps:
> >
> > <auth-configuration>
> >  <credentials username="myusername" password="mypassword">
> >    <default realm="myrealm" />
> >  </credentials>
> > </auth-configuration>
> >
> > Regards,
> >
> > Arkadi
> >
> > > -----Original Message-----
> > > From: Lewis John Mcgibbney [mailto:[email protected]]
> > > Sent: Tuesday, 22 November 2011 9:43 PM
> > > To: [email protected]
> > > Subject: Re: Nutch and Sharepoint authentication
> > >
> > > Hi,
> > >
> > > From what I have read in the Nutch user@ archives [1], it is possible to
> > > crawl an MS SharePoint server, which includes setting up NTLM
> > > authentication for your crawler. This is becoming a fairly major problem
> > > now that the protocol-httpclient plugin is unstable; there are Jira issues
> > > open for this.
> > >
> > > Unfortunately, as ManifoldCF is in incubation, it is to be expected that
> > > its documentation is not yet complete. However, if you cannot get Nutch's
> > > protocol-httpclient to work, I advise you to try there as well and ask them
> > > about SharePoint configuration and documentation.
> > >
> > > hth
> > >
> > > [1]
> > > http://www.mail-archive.com/search?q=sharepoint&l=user%40nutch.apache.org
> > >
> > > On Tue, Nov 22, 2011 at 5:27 AM, remi tassing <[email protected]> wrote:
> > >
> > > > Hello guys,
> > > >
> > > > I read the wiki page "HttpAuthenticationSchemes"
> > > > <http://wiki.apache.org/nutch/HttpAuthenticationSchemes>.
> > > > I previously managed to make Nutch crawl local folders and websites
> > > > (with SSL authentication). However, I'm now trying to crawl some sites
> > > > in a corporate intranet environment running MS SharePoint. I have been
> > > > unsuccessful so far, and I believe authentication is the cause.
> > > >
> > > >
> > > >   - Is Nutch able to crawl SharePoint? If so, do you have a link to a
> > > >     tutorial or mailing-list thread on this?
> > > >
> > > >
> > > > I recently became aware of the ManifoldCF initiative, and it seems like
> > > > a possible solution to my problem. However, it is currently poorly
> > > > documented (at least as far as the SharePoint connector is concerned).
> > > >
> > > >   - Do you have any recommendations in this regard?
> > > >
> > > >
> > > > Thanks in advance for your help; I'd really appreciate it!
> > > >
> > > > --
> > > > Remi Tassing
> > > >
> > >
> > >
> > >
> > > --
> > > *Lewis*
> >
> 
> 
> 
> --
> *Lewis*
