On Tue, Nov 22, 2011 at 10:57 AM, remi tassing <[email protected]> wrote:
> Hello guys,
>
> I read the wiki on
> "HttpAuthenticationSchemes<http://wiki.apache.org/nutch/HttpAuthenticationSchemes>".
> I previously managed to make Nutch crawl local folders and websites (with
> SSL authentication). However, I'm trying to crawl some sites in a corporate
> intranet environment running under MS SharePoint. I have been unsuccessful so far
> and I believe it's because of authentication.
>
>
>   - Is Nutch able to crawl SharePoint? If yes, do you have a link/mail
>   tutorial on this?
>
>
> I recently became aware of the ManifoldCF initiative, and it seems to be a
> possible solution to my problem. But it's currently poorly documented (as
> far as the SharePoint connector is concerned).
>
>   - Do you have any recommendations in this regard?
>
>
> Thanks in advance for your help, I'll really appreciate it!
>
> --
> Remi Tassing
>

Hi Remi,

I am sorry I was not able to reply to you earlier. I have been pretty
busy this week.

I have never tried crawling SharePoint with Nutch, so I am not sure
whether it works. My work on authentication assumes that a website is
properly configured to challenge the client or crawler with NTLM
authentication.
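
One way to check that assumption before crawling is to request a page
without credentials and look at the WWW-Authenticate headers on the 401
response. The following is only a rough Python sketch, not part of Nutch;
the function names are made up for illustration, and it assumes the
server sends one challenge per header line (it does not split several
comma-separated challenges within one header).

```python
import urllib.error
import urllib.request


def offered_schemes(header_values):
    """Return the lower-cased scheme name at the start of each
    WWW-Authenticate header value (assumes one challenge per value)."""
    schemes = []
    for value in header_values:
        parts = value.strip().split(None, 1)
        if parts:
            schemes.append(parts[0].lower())
    return schemes


def probe(url):
    """Request url without credentials and return (status, schemes).

    A 401 status with "ntlm" among the schemes suggests the server
    challenges clients with NTLM.
    """
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            # No challenge was issued; the page is served anonymously.
            return resp.status, []
    except urllib.error.HTTPError as err:
        values = err.headers.get_all("WWW-Authenticate") or []
        return err.code, offered_schemes(values)
```

Calling probe() with the address of the SharePoint site (for example a
placeholder such as "http://intranet.example.com/") and getting back a
401 with "ntlm" in the list would suggest that the server does issue an
NTLM challenge. Any other result would be worth including when asking
for help.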

If it doesn't work, I suggest that you follow the "Need Help?" section at
http://wiki.apache.org/nutch/HttpAuthenticationSchemes#Need_Help.3F
carefully and post the relevant information to [email protected]
(with me in CC if possible, since I am not actively monitoring the mailing
list), and we as a community might be able to help you out.

Once again, I am sorry I couldn't help you sooner. Good luck with
this experiment.

Regards,
Susam Pal
