Yes, thanks for the feedback, Arkadi.

I know this is possibly outside the scope of your work, but it would be
really great if you could add some of your experience to
http://wiki.apache.org/nutch/HttpAuthenticationSchemes

This area has been unclear to some users for some time. If you are happy
with your working implementation, your thoughts would be greatly
appreciated by the rest of the community.

Thank you, and glad to hear that things are working.
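
For anyone following this thread, here is a minimal sketch of the two
files involved. It assumes a Nutch 1.x install where protocol-httpclient
replaces protocol-http in the plugin.includes property, and where the
credentials live in the file named by the http.auth.file property (whose
default is httpclient-auth.xml). The host sharepoint.example.com, the
domain MYDOMAIN and the account are hypothetical placeholders, the rest of
the plugin list is only an example, and treating the NTLM realm as the
Windows domain is an assumption; please check the wiki page above for the
authoritative syntax.

```xml
<!-- conf/nutch-site.xml: use protocol-httpclient instead of protocol-http.
     The other plugins listed here are only an example; keep your own. -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <value>protocol-httpclient|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
  <property>
    <!-- Name of the auth file read by protocol-httpclient (the default value) -->
    <name>http.auth.file</name>
    <value>httpclient-auth.xml</value>
  </property>
</configuration>

<!-- conf/httpclient-auth.xml: hypothetical host, domain and account -->
<auth-configuration>
  <credentials username="myusername" password="mypassword">
    <!-- Fallback scope for hosts not matched by an authscope below -->
    <default realm="MYDOMAIN" />
    <!-- Assumption: for NTLM, the realm attribute carries the Windows domain -->
    <authscope host="sharepoint.example.com" port="80"
               realm="MYDOMAIN" scheme="NTLM" />
  </credentials>
</auth-configuration>
```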

On Fri, Nov 25, 2011 at 7:16 AM, <[email protected]> wrote:

> Hi Lewis,
>
> I am saying that my configuration works with our SharePoint server. The
> authentication scheme is NTLM. Two versions of Nutch are working: a
> snapshot of Nutch 1.4 in my development environment and Nutch 1.2, which
> is used in production.
>
> I have to admit that it took some tweaking to get authentication working.
>
> Regards,
>
> Arkadi
>
> > -----Original Message-----
> > From: Lewis John Mcgibbney [mailto:[email protected]]
> > Sent: Thursday, 24 November 2011 10:29 PM
> > To: [email protected]
> > Subject: Re: Nutch and Sharepoint authentication
> >
> > Hi Arkadi,
> >
> > Are you saying that this has been solved and that you are successfully
> > able to crawl the server?
> >
> > Thanks
> >
> > On Thu, Nov 24, 2011 at 12:48 AM, <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I am crawling a SharePoint server with no major problems. I do have to
> > > use protocol-httpclient for this. Here is an extract from my
> > > httpclient-auth.xml file, if it helps:
> > >
> > > <auth-configuration>
> > >  <credentials username="myusername" password="mypassword">
> > >    <default realm="myrealm" />
> > >  </credentials>
> > > </auth-configuration>
> > >
> > > Regards,
> > >
> > > Arkadi
> > >
> > > > -----Original Message-----
> > > > From: Lewis John Mcgibbney [mailto:[email protected]]
> > > > Sent: Tuesday, 22 November 2011 9:43 PM
> > > > To: [email protected]
> > > > Subject: Re: Nutch and Sharepoint authentication
> > > >
> > > > Hi,
> > > >
> > > > From what I have read on the Nutch user@ archives [1], it is possible
> > > > to crawl an MS SharePoint server, which includes setting up NTLM
> > > > authentication for your crawler. This is becoming a pretty major
> > > > problem now that the protocol-httpclient plugin is unstable; there are
> > > > Jira issues open for this.
> > > >
> > > > Unfortunately, as ManifoldCF is still in incubation, it is to be
> > > > expected that its documentation is not yet complete. However, I advise
> > > > you to try there as well and ask them about the SharePoint
> > > > configuration/documentation if you cannot get Nutch
> > > > protocol-httpclient to work.
> > > >
> > > > hth
> > > >
> > > > [1]
> > > > http://www.mail-archive.com/search?q=sharepoint&l=user%40nutch.apache.org
> > > >
> > > > On Tue, Nov 22, 2011 at 5:27 AM, remi tassing <[email protected]> wrote:
> > > >
> > > > > Hello guys,
> > > > >
> > > > > I read the wiki page "HttpAuthenticationSchemes"
> > > > > (http://wiki.apache.org/nutch/HttpAuthenticationSchemes).
> > > > > I previously managed to make Nutch crawl local folders and websites
> > > > > (with SSL authentication). However, I'm now trying to crawl some
> > > > > sites in a corporate intranet running MS SharePoint. I have been
> > > > > unsuccessful so far, and I believe it's because of authentication.
> > > > >
> > > > >
> > > > >   - Is Nutch able to crawl SharePoint? If so, do you have a link to a
> > > > >     tutorial or mailing-list thread on this?
> > > > >
> > > > >
> > > > > I recently became aware of the ManifoldCF initiative, and it seems
> > > > > like a possible solution to my problem. However, it is currently
> > > > > poorly documented (as far as the SharePoint connector is concerned).
> > > > >
> > > > >   - Do you have any recommendations in this regard?
> > > > >
> > > > >
> > > > > Thanks in advance for your help; I really appreciate it!
> > > > >
> > > > > --
> > > > > Remi Tassing
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > *Lewis*
> > >
> >
> >
> >
> > --
> > *Lewis*
>



-- 
*Lewis*
