Hi Sebastian,
Thanks for the update. With the default settings, Nutch is not crawling/indexing
Microsoft Office documents (PPT, Word, Excel, etc.).
We have already set the *http.content.limit* property to unlimited (-1).
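For reference, here is a minimal sketch of how that override is typically written, assuming a Nutch 1.x installation where site-specific settings live in conf/nutch-site.xml (the file location is an assumption; only the property value comes from our setup):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumed location: conf/nutch-site.xml (Nutch 1.x).
       A negative value disables truncation of content fetched over http:// -->
  <property>
    <name>http.content.limit</name>
    <value>-1</value>
  </property>
</configuration>
```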
Do we need to make any changes on the development side (we are developing the
page in AEM 6.3) to support Office documents? Or are any Solr-side changes
needed?
Note: I passed the Solr URL correctly as part of the crawl script (it seems it
was missing from the ticket):

    bin/crawl -i -D solr.server.url=http://localhost:8983/solr/tikaparsecollection -s urls/ crawl/ -1

Solr collection name: tikaparsecollection
seed.txt: http://abc.com/solr-tika.html
Could you please advise how to get Nutch to crawl and index these kinds of
documents?
Thanks,
Amarnath Polu
--
Sent from: http://lucene.472066.n3.nabble.com/Nutch-User-f603147.html