date:20180911

Speakers needed for Apache DC Roadshow

2018-09-11 Thread Rich Bowen

We need your help to make the Apache Washington DC Roadshow on Dec 4th a 
success.


What do we need most? Speakers!

We're bringing a unique DC flavor to this event by mixing Open Source 
Software with talks about Apache projects as well as OSS CyberSecurity, 
OSS in Government, and OSS Career advice.


Please take a look at: http://www.apachecon.com/usroadshow18/

(Note: You are receiving this message because you are subscribed to one 
or more mailing lists at The Apache Software Foundation.)


Rich, for the ApacheCon Planners

--
rbo...@apache.org
http://apachecon.com
@ApacheCon

Re: crwal and index ppt,msword,excel(xls,.xlsx) in apache nutch 1.14

2018-09-11 Thread polu.amar

Hi Sebastian,

Thanks for the update. With the default settings, Nutch is not
crawling/indexing Microsoft Office documents (ppt, word, excel, etc.).

We have already set the *http.content.limit* property to
unlimited *(-1)*.
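
For reference, a setting like this is normally placed in conf/nutch-site.xml. The fragment below is only a minimal sketch of such an override, not a copy of the actual configuration in use; the description text is paraphrased.

```xml
<!-- Sketch of an override in conf/nutch-site.xml (assumed location). -->
<property>
  <name>http.content.limit</name>
  <!-- A negative value disables truncation of downloaded content. -->
  <value>-1</value>
  <description>Maximum length, in bytes, of content fetched over HTTP.</description>
</property>
```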

Do we need to make any changes on the development side (we are building
the page in AEM 6.3) for Office documents, or any changes on the Solr
side?

Note: I passed the Solr URL properly (it seems it was missing from the
ticket) as part of the crawl script:

bin/crawl -i -D solr.server.url=http://localhost:8983/solr/tikaparsecollection -s urls/ crawl/ -1

solr collection name: tikaparsecollection
seed.txt: http://abc.com/solr-tika.html  

Kindly assist us with how to handle this kind of case in Nutch crawling.


Thanks,
Amarnath Polu



--
Sent from: http://lucene.472066.n3.nabble.com/Nutch-User-f603147.html
