Hi,

I have successfully set up nutch 2.x with hbase-0.90.6 and my jobs are
running fine. But there is one issue I need your help with.

Earlier I was using Cassandra with nutch 2.x, and the data from all my jobs
went to the 'webpage' keyspace. But with hbase-0.90.6 I can see that two
tables are created: one is 'webpage', which always has 0 rows, and the
other is 'crawlId_webpage', which has some data.

But when I run my solrIndexJob, no documents are added, and I think this
is due to the fact that there is no parsed text present in the
'crawlId_webpage' table for my crawled pages.

I can also verify this in my ParseFilter plugin: when I call
Utf8 text = page.getText();
the text is always null, which is why I think solrIndexJob is not adding
any documents to Solr.

So what should I do here? Why is there no text in the hbase table?
And why are two tables created, 'webpage' and 'crawlId_webpage'?

Thanks, guys, for your help and support.

Tony.
