Hello,

I have tested nutch-2.0 with HBase and MySQL, trying to index a single URL with 
depth 1.

I tried to extract an HTML tag value and store it in the metadata column of the 
webpage object by adding a parse-tag plugin. I saw there is no metadata member 
variable in the Parse class, so I used the putToMetadata function from the 
WebPage class. It turned out that this function overwrites the value for the 
same key, i.e., it keeps only the last tag value if there are multiple tags.
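
One workaround I am considering is to join all tag values into a single value 
before storing them. Below is a minimal sketch of that idea, not tested inside 
Nutch. It uses a plain Map<String, ByteBuffer> as a stand-in for the webpage 
metadata (an assumption on my side), and the class name, method names and the 
tab separator are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class MultiValueMetadata {
    // Hypothetical separator; it must never occur inside a tag value.
    static final String SEP = "\t";

    // Append a value under key instead of replacing the previous one.
    // The map is only a stand-in for the webpage metadata, where a put
    // replaces whatever was stored for that key.
    static void appendToMetadata(Map<String, ByteBuffer> metadata,
                                 String key, String value) {
        ByteBuffer old = metadata.get(key);
        String joined = (old == null)
                ? value
                : StandardCharsets.UTF_8.decode(old.duplicate()) + SEP + value;
        metadata.put(key,
                ByteBuffer.wrap(joined.getBytes(StandardCharsets.UTF_8)));
    }

    // Split the stored value back into the individual tag values.
    static String[] readValues(Map<String, ByteBuffer> metadata, String key) {
        ByteBuffer buf = metadata.get(key);
        if (buf == null) {
            return new String[0];
        }
        return StandardCharsets.UTF_8.decode(buf.duplicate())
                .toString().split(SEP);
    }

    public static void main(String[] args) {
        Map<String, ByteBuffer> metadata = new HashMap<>();
        appendToMetadata(metadata, "tag", "first");
        appendToMetadata(metadata, "tag", "second");
        System.out.println(String.join(",", readValues(metadata, "tag")));
    }
}
```

In the plugin, the read and the write on the map would presumably be replaced 
by the corresponding calls on the webpage object, but I have not verified that.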
 
Next, I ran the Solr indexer:

bin/nutch solrindex http://127.0.0.1:8983/solr/ -all
SolrIndexerJob: starting
SolrIndexerJob: done.

The full sequence of commands I ran was:
1. bin/nutch inject
2. bin/nutch generate
3. bin/nutch fetch batchId
4. bin/nutch parse batchId
5. bin/nutch solrindex http://127.0.0.1:8983/solr/ -all

No data was added to the Solr index for the URL I tried to index.

Besides this, nutch-2.0 still stores the content in the content column of the 
webpage table even when I put this in the config:

  <property>
    <name>fetcher.store.content</name>
    <value>false</value>
    <description>If true, fetcher will store content.</description>
  </property>


Any ideas about what I am doing wrong or how to fix these issues are welcome.

Thanks.
Alex.



