We are looking at solutions for crawling documents in SharePoint Online
(Office 365) and indexing them into Elasticsearch. We already use Nutch
1.14 for crawling websites and would like to extend that setup to crawl
SharePoint as well.

Looking around on the wiki, it seems that adding a custom authentication
scheme by implementing the AuthScheme interface is one path available to
Nutch users.

I just wanted to see if anyone has recently crawled SharePoint content and
if there are any caveats or tips to keep in mind.

Thanks.
