We are looking for a way to crawl documents in SharePoint Online (Office 365) and index them into Elasticsearch. We already use Nutch 1.14 to crawl websites and would like to extend that setup to crawl SharePoint as well.
From the Wiki, it seems that Nutch users can add a custom authentication scheme by implementing the AuthScheme interface. Has anyone recently crawled SharePoint content with Nutch, and are there any caveats or tips to keep in mind? Thanks.

