Hi Arkadi,

Are you saying that this has been solved and that you are successfully able
to crawl the server?

Thanks

On Thu, Nov 24, 2011 at 12:48 AM, <[email protected]> wrote:

> Hi,
>
> I am crawling a SharePoint server, no major problems. I do have to use
> protocol-httpclient for this. Here is an extract from my
> httpclient-auth.xml file, if it helps:
>
> <auth-configuration>
>  <credentials username="myusername" password="mypassword">
>    <default realm="myrealm" />
>  </credentials>
> </auth-configuration>
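>
> A minimal nutch-site.xml sketch of switching Nutch over to
> protocol-httpclient, assuming a Nutch 1.x setup where protocol-http is
> replaced by protocol-httpclient in plugin.includes. The other plugins in
> the value below are only illustrative and mirror a typical default list,
> so keep whatever your own configuration already includes.
>
> ```xml
> <configuration>
>   <!-- Swap protocol-http for protocol-httpclient so that the credentials
>        in httpclient-auth.xml are used. The rest of this list is
>        illustrative only; adjust it to your own plugin set. -->
>   <property>
>     <name>plugin.includes</name>
>     <value>protocol-httpclient|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
>   </property>
> </configuration>
> ```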
>
> Regards,
>
> Arkadi
>
> > -----Original Message-----
> > From: Lewis John Mcgibbney [mailto:[email protected]]
> > Sent: Tuesday, 22 November 2011 9:43 PM
> > To: [email protected]
> > Subject: Re: Nutch and Sharepoint authentication
> >
> > Hi,
> >
> > From what I have read on the Nutch user@ archives [1], it is possible to
> > crawl an MS Sharepoint server, which includes setting up NTLM
> > authentication for your crawler. It is becoming a pretty major problem
> > now that the protocol-httpclient plugin is unstable; there are Jira
> > issues open for this.
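> >
> > As a rough sketch of what the NTLM part can look like in
> > httpclient-auth.xml, assuming the authscope element and its scheme
> > attribute work as described on the HttpAuthenticationSchemes wiki page
> > (please verify against your Nutch version), with the Windows domain
> > given as the realm. The host, port, domain and credentials below are
> > hypothetical placeholders.
> >
> > ```xml
> > <auth-configuration>
> >   <!-- Hypothetical values: for NTLM, the realm is assumed to be the
> >        Windows domain the account belongs to. -->
> >   <credentials username="myusername" password="mypassword">
> >     <authscope host="sharepoint.example.com" port="80"
> >                realm="MYDOMAIN" scheme="NTLM"/>
> >   </credentials>
> > </auth-configuration>
> > ```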
> >
> > Unfortunately, as ManifoldCF is in incubation status, it is to be
> > expected that they have not yet completed all of their documentation.
> > However, I advise you to try there as well and ask them about the
> > Sharepoint configuration/documentation if it is not possible for you to
> > work with Nutch protocol-httpclient.
> >
> > hth
> >
> > [1] http://www.mail-archive.com/search?q=sharepoint&l=user%40nutch.apache.org
> >
> > On Tue, Nov 22, 2011 at 5:27 AM, remi tassing <[email protected]>
> > wrote:
> >
> > > Hello guys,
> > >
> > > I read the wiki page on "HttpAuthenticationSchemes"
> > > <http://wiki.apache.org/nutch/HttpAuthenticationSchemes>.
> > > I previously managed to make Nutch crawl local folders and websites
> > > (with SSL authentication). However, I'm trying to crawl some sites in a
> > > corporate intranet environment running MS Sharepoint. I have been
> > > unsuccessful so far, and I believe it's because of authentication.
> > >
> > >
> > >   - Is Nutch able to crawl Sharepoint? If yes, do you have a link or
> > >     mailing-list tutorial on this?
> > >
> > >
> > > I recently became aware of the ManifoldCF initiative, and it seems to be
> > > a possible solution to my problem. However, it is currently poorly
> > > documented (as far as the Sharepoint connector is concerned).
> > >
> > >   - Do you have any recommendations in this regard?
> > >
> > >
> > > Thanks in advance for your help; I'd really appreciate it!
> > >
> > > --
> > > Remi Tassing
> > >
> >
> >
> >
> > --
> > *Lewis*
>



-- 
*Lewis*
