Hi Sebastian,
Thanks for the update. With the default settings it is not crawling or indexing
Microsoft Office documents (ppt, Word, Excel, etc.).
For the *http.content.limit* property, we have already set the value to
unlimited *(-1)*.
Do we need to make any kind of changes in development (AEM 6.3 is
technol

Hi,
crawling and indexing Office documents should work out of the box without any
configuration changes: the plugin parse-tika is enabled by default in recent
Nutch versions. The only recommended change is to increase the content limit:
http.content.limit
65536
The length limit for downlo

Hi All,
We are trying to crawl and index documents of the ppt, msword, and excel MIME
types that are linked from a seed URL which is an .html page, i.e. a seed URL
that has *ppt,msword,ppt* files as attachments.
Example: http://abc.com/solr-tika.html
I have added the changes below to check pdf/ppt crawling. I have gone through th
3 matches