Hi Eric,

The ability to add binary content was implemented in Nutch 1.11,
so you need to upgrade (upgrading to 1.14 is recommended).

The command-line help of
  $NUTCH_HOME/bin/nutch index
indicates how to add a Solr field with the "binary" HTML content:
  Usage: Indexer ... [-addBinaryContent] [-base64]
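
For illustration, a call with both options might look like the sketch
below. The crawl directory layout (crawl/crawldb, crawl/linkdb,
crawl/segments) and the /opt/nutch fallback are assumptions, not taken
from your setup, and the positional arguments are written from memory
of the 1.x usage string, so please check them against the help output
of your version. The Solr URL is assumed to be configured already.

```shell
# Assumed locations: adjust both to your installation and crawl layout.
NUTCH_HOME="${NUTCH_HOME:-/opt/nutch}"
CRAWL_DIR="${CRAWL_DIR:-crawl}"

# Build the indexing command. -addBinaryContent adds the raw page content
# to the indexed document, -base64 encodes that content as Base64.
set -- "$NUTCH_HOME/bin/nutch" index \
  "$CRAWL_DIR/crawldb" -linkdb "$CRAWL_DIR/linkdb" -dir "$CRAWL_DIR/segments" \
  -addBinaryContent -base64

# Print the command for inspection, then run it only if Nutch is installed
# at the assumed location.
printf '%s\n' "$*"
if [ -x "$1" ]; then
  "$@"
fi
```

The Solr schema most likely also needs a field that can store the
content (the "binaryContent" field you mention), otherwise Solr may
reject the documents.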

Best,
Sebastian

On 03/24/2018 11:31 PM, Eric Valencia wrote:
> Hello guys,
> 
> I was able to get Nutch 1.4 running in the most basic of setups: local
> mode and default options for the most part. While I am getting some
> results in Solr, it's not getting all the prices and variations from the
> pages.
> 
> Previously, I learned nutch could get all this information and the export
> is in base64, and the field it comes in under is "binaryContent".
> 
> So, I need to know how to get binaryContent or base64 results out of
> Nutch. I ran bin/nutch to look for an option, but it only gives me the
> following list of commands, and I don't see a way to do it with any of
> these:
> 
> readdb
> mergedb
> readlinkdb
> inject
> generate
> freegen
> fetch
> parse
> readseg
> mergesegs
> updatedb
> invertlinks
> mergelinkdb
> index
> dedup
> dump
> commoncrawldump
> solrindex
> solrdedup
> solrclean
> clean
> parsechecker
> indexchecker
> filterchecker
> normalizerchecker
> domainstats
> protocolstats
> crawlcomplete
> webgraph
> linkrank
> scoreupdater
> nodedumper
> plugin
> junit
> startserver
> webapp
> warc
> updatehostdb
> readhostdb
> sitemap
> CLASSNAME
> 
> 
> Please if any of you could let me know how it's done in 1.4 it would be
> highly appreciated.
> 
> Thank you!!
> 
> Eric
> 
