Hi Eric,

the ability to add binary content was implemented in Nutch 1.11, so you need to upgrade (an upgrade to 1.14 is recommended).
The command-line help of
  $NUTCH_HOME/bin/nutch index
indicates how to add a Solr field with the "binary" HTML content:
  Usage: Indexer ... [-addBinaryContent] [-base64]

Best,
Sebastian

On 03/24/2018 11:31 PM, Eric Valencia wrote:
> Hello guys,
>
> I was able to get nutch 1.4 in the most basic of basic setups - local and
> default options for the most part. While I am getting some results in Solr,
> it's not getting all the prices and variations from the pages.
>
> Previously, I learned nutch could get all this information and the export
> is in base64, and the field it comes in under is "binaryContent".
>
> So, I need to know how to get binaryContent or base64 results out of
> nutch. I tried to run bin/nutch and find it there but it's giving me the
> following list (which I don't see any way from these):
>
> readdb
> mergedb
> readlinkdb
> inject
> generate
> freegen
> fetch
> parse
> readseg
> mergesegs
> updatedb
> invertlinks
> mergelinkdb
> index
> dedup
> dump
> commoncrawldump
> solrindex
> solrdedup
> solrclean
> clean
> parsechecker
> indexchecker
> filterchecker
> normalizerchecker
> domainstats
> protocolstats
> crawlcomplete
> webgraph
> linkrank
> scoreupdater
> nodedumper
> plugin
> junit
> startserver
> webapp
> warc
> updatehostdb
> readhostdb
> sitemap
> CLASSNAME
>
>
> Please if any of you could let me know how it's done in 1.4 it would be
> highly appreciated.
>
> Thank you!!
>
> Eric
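
Once documents have been indexed with -addBinaryContent -base64, the raw page source still has to be decoded on the client side before prices or variations can be extracted from it. The following is only a minimal sketch of that step, not part of Nutch itself. It assumes the field is named "binaryContent" (as in the quoted message), that the value is standard base64, and that the page is UTF-8 encoded. The helper decode_binary_content and the sample document are hypothetical and stand in for a document returned by a Solr query.

```python
import base64


def decode_binary_content(doc, field="binaryContent", encoding="utf-8"):
    """Return the decoded page source from a Solr document dict.

    Returns None if the field is missing or empty. The field name and
    the UTF-8 default are assumptions, not something Nutch guarantees.
    """
    value = doc.get(field)
    if not value:
        return None
    # Solr returns multi-valued fields as lists; use the first value.
    if isinstance(value, list):
        value = value[0]
    raw = base64.b64decode(value)
    # Replace undecodable bytes instead of failing on a wrong charset guess.
    return raw.decode(encoding, errors="replace")


# Hypothetical document, shaped like one entry of a Solr JSON response.
sample_html = '<html><body><span class="price">$19.99</span></body></html>'
sample_doc = {
    "id": "http://example.com/item",
    "binaryContent": base64.b64encode(sample_html.encode("utf-8")).decode("ascii"),
}

decoded = decode_binary_content(sample_doc)
print(decoded)
```

The decoded string can then be passed to whatever HTML parser is used to pull out the prices. If the pages are not UTF-8, pass the correct charset as the encoding argument.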

