update (or whatever the actual name of the command is) after parsing?
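
For reference, the step in question is most likely `bin/nutch updatedb`, which does not appear in the command list quoted below. A minimal sketch of one full cycle with that step added might look like the following. The `run` and `crawl_cycle` helper names, the `DRY_RUN` switch, the seed directory argument, and the use of `-all` in place of a batch id are assumptions for illustration; check the usage message each `bin/nutch` command prints for your version.

```shell
#!/bin/sh
# Sketch of one Nutch 2.x crawl cycle, with an updatedb step after parsing.
# Assumptions (not confirmed by the thread): the seed directory argument,
# -all instead of a batch id, and the exact argument forms of each command.

NUTCH="${NUTCH:-bin/nutch}"   # path to the nutch launcher script

# Print each command; execute it unless DRY_RUN is set to a non-empty value.
run() {
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@" || return 1
    fi
}

# crawl_cycle SEED_DIR SOLR_URL
# Runs the steps in order and stops at the first one that fails.
crawl_cycle() {
    seed_dir="$1"
    solr_url="$2"
    run "$NUTCH" inject "$seed_dir" &&
    run "$NUTCH" generate &&
    run "$NUTCH" fetch -all &&
    run "$NUTCH" parse -all &&
    run "$NUTCH" updatedb &&
    run "$NUTCH" solrindex "$solr_url" -all
}
```

After setting `DRY_RUN=1`, calling `crawl_cycle urls http://127.0.0.1:8983/solr/` only prints the six commands, which is a cheap way to check the order before running them for real.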

On 25 June 2012 22:35, <[email protected]> wrote:

> Hello,
>
> I have tested nutch-2.0 with hbase and mysql trying to index only one url
> with depth 1.
>
> I tried to fetch an HTML tag value and parse it into the metadata column of
> the webpage object by adding a parse-tag plugin. I saw there is no metadata
> member variable in the Parse class, so I used the putToMetadata function from
> the Webpage class, and it turned out that this function overwrites values for
> the same key, i.e., it keeps only the last tag value if there are multiple
> tags.
>
> Next
>
> bin/nutch solrindex http://127.0.0.1:8983/solr/ -all
> SolrIndexerJob: starting
> SolrIndexerJob: done.
>
> I did
> 1.bin/nutch inject
> 2.bin/nutch generate
> 3.bin/nutch fetch batchId
> 4.bin/nutch parse batchId
> 5.bin/nutch solrindex http://127.0.0.1:8983/solr/ -all
>
> There is no data added to solr index with the url I tried to index.
>
> Besides these, nutch-2.0 still keeps content in the content column of the
> webpage table even when I put this in the config:
>
>  <property>
>    <name>fetcher.store.content</name>
>    <value>false</value>
>    <description>If true, fetcher will store content.</description>
>  </property>
>
>
> Any ideas about what I am doing wrong or how to fix these issues are welcome.
>
> Thanks.
> Alex.


-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
