OK, I had somehow assumed that the plugins listed in 'plugin.includes' in conf/nutch-site.xml would be added to the default list in conf/nutch-default.xml, but it turns out the former completely overrides the latter. So I copied the missing plugins from nutch-default.xml into nutch-site.xml, and now it works fine.
Thanks for the quick reply!

JM

On 2013-09-09, at 18:16, Markus Jelsma <[email protected]> wrote:

Hi - those fields are added by default by IndexerMapReduce. Fields such as content, contentLength or anchors are added by index-basic, index-more and index-anchor respectively, if I'm correct. So it seems you have not enabled those indexing plugins. Can you check? Our clients get the correct data in ES using 1.7 and some indexing plugins, so it seems the code is working.

Cheers

-----Original message-----
> From: Jean-Michel Tremblay <[email protected]>
> Sent: Tuesday 10th September 2013 0:03
> To: [email protected]
> Subject: Nutch 1.7 and ElasticSearch: content not sent to ElasticSearch
>
> Hi,
>
> I'm using Nutch 1.7 and ElasticSearch (installed on the same CentOS machine).
>
> I could run a quick test using the "bin/nutch crawl …" command on
> nutch.apache.org (using domain-urlfilter). I can run the ElasticSearch
> indexer successfully, but all entries in ElasticSearch only have "segment",
> "digest", and "boost" fields in the "_source" object. I would expect to see
> "content" as well, right?
>
> I know some content was parsed when running "bin/nutch readseg -get <seg>
> <url>".
>
> I see that schema.xml is used as the mapping for Solr. I think ElasticSearch
> doesn't need any pre-defined mapping, right?
>
> Logs under $NUTCH_HOME/logs are not helping (no errors, 1 warning from
> NativeCodeLoader).
>
> Am I missing some config somewhere?
>
> (Yes, I'm new to Nutch and ElasticSearch.)
>
> JM
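To make the two points of this thread concrete, here is a minimal Python sketch. It assumes Nutch resolves properties the way Hadoop's Configuration does: nutch-default.xml is loaded first, then nutch-site.xml, and a property that appears again simply replaces the earlier value (last one wins; Hadoop's "final" flag and variable expansion are ignored here). It then checks which of the indexing plugins Markus names are enabled, assuming each plugin id must match the whole plugin.includes regex (approximated with Python's re.fullmatch; Java and Python regex dialects differ slightly). The XML strings and plugin.includes values are hypothetical illustrations, not the real Nutch 1.7 defaults, and "indexer-elastic" / "indexer-solr" are used only as assumed plugin ids.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down config files; the real nutch-default.xml and
# nutch-site.xml contain many more properties and different values.
DEFAULT_XML = """<configuration>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|parse-html|index-(basic|anchor)|indexer-solr</value>
  </property>
</configuration>"""

# Site file that (like in this thread) forgot the index-* plugins.
SITE_XML = """<configuration>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-domain|parse-html|indexer-elastic</value>
  </property>
</configuration>"""


def load_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config file."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value", default="")
        if name is not None:
            props[name.strip()] = value.strip()
    return props


def resolve(*xml_files):
    """Load files in order; a later file replaces a property's value outright."""
    merged = {}
    for xml_text in xml_files:
        merged.update(load_properties(xml_text))  # overwrite, never append
    return merged


def enabled_plugins(includes_regex, plugin_ids):
    """Return the plugin ids that match the *whole* plugin.includes pattern."""
    pattern = re.compile(includes_regex)
    return [pid for pid in plugin_ids if pattern.fullmatch(pid)]


if __name__ == "__main__":
    conf = resolve(DEFAULT_XML, SITE_XML)
    includes = conf["plugin.includes"]
    print("plugin.includes =", includes)
    wanted = ["index-basic", "index-more", "index-anchor"]
    found = enabled_plugins(includes, wanted)
    print("missing indexing plugins:", [p for p in wanted if p not in found])
```

With these sample files the site value wins entirely, so none of index-basic, index-more or index-anchor is enabled, which matches the symptom of only "segment", "digest" and "boost" reaching ElasticSearch. The fix described above corresponds to putting those ids back into the site value, e.g. a (hypothetical) value such as protocol-http|urlfilter-domain|parse-html|index-(basic|anchor|more)|indexer-elastic.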

