Ok, I have crawled again. And on checking my HBase with "scan
'C23_webpage', {COLUMNS => ['p:c']}" I can see the parsed text, but in
my ParseFilter plugin page.getText() still returns null.
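
For reference, this is roughly the check inside my plugin, stripped down
to the relevant part (the class name is just a placeholder and the
overridden method signatures are written from memory, so they may differ
slightly from the actual 2.x ParseFilter interface):

import java.util.Collection;
import java.util.Collections;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.parse.HTMLMetaTags;
import org.apache.nutch.parse.Parse;
import org.apache.nutch.parse.ParseFilter;
import org.apache.nutch.storage.WebPage;
import org.w3c.dom.DocumentFragment;

public class TextCheckParseFilter implements ParseFilter {

  private Configuration conf;

  @Override
  public Parse filter(String url, WebPage page, Parse parse,
      HTMLMetaTags metaTags, DocumentFragment doc) {
    // This is the call that keeps coming back null for me, even though
    // the 'p:c' column in C23_webpage shows the parsed content.
    Utf8 text = page.getText();
    if (text == null) {
      System.err.println("page.getText() is null for " + url);
    }
    return parse;
  }

  @Override
  public Collection<WebPage.Field> getFields() {
    // This stripped-down example does not request any extra fields.
    return Collections.emptyList();
  }

  @Override
  public void setConf(Configuration conf) {
    this.conf = conf;
  }

  @Override
  public Configuration getConf() {
    return conf;
  }
}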

But the thing that puzzles me is why there is a blank 'webpage' table,
and why I get a new 'crawlId_webpage' table for every new crawlId.
And my SolrIndex job was working fine with Cassandra but it is not
working with HBase now. Is this due to the different table structure in
HBase, and how should I solve it?
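
My guess is that I need to point the indexing job at the same crawlId,
something like the command below (the Solr URL is just my local
placeholder, and I am not 100% sure this is the exact 2.x solrindex
usage):

bin/nutch solrindex http://localhost:8983/solr/ -all -crawlId C23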

Thanks,
Tony


On Mon, Jun 24, 2013 at 2:39 PM, Tony Mullins <[email protected]> wrote:

> Hi,
>
> I have successfully set up Nutch 2.x with HBase 0.90.6 and my jobs are
> running fine. But there is one issue for which I need your help.
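>
> For context, my storage configuration is the standard HBase setup
> (copied here from memory, so treat it as approximate):
>
> <!-- conf/nutch-site.xml -->
> <property>
>   <name>storage.data.store.class</name>
>   <value>org.apache.gora.hbase.store.HBaseStore</value>
> </property>
>
> # conf/gora.properties
> gora.datastore.default=org.apache.gora.hbase.store.HBaseStore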
>
> Earlier I was using Cassandra with Nutch 2.x, and the data from all my
> jobs used to go to the 'webpage' keyspace. But with HBase 0.90.6 I can
> see that two tables are created: one is 'webpage', which always has 0
> rows, and the other is 'crawlId_webpage', which has some data.
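>
> (By "0 rows" I mean what the hbase shell reports, e.g. via:
>
> count 'webpage'
> count 'crawlId_webpage'
>
> the first returns 0 and the second returns the crawled rows.)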
>
> But when I run my SolrIndexJob, no documents are added, and I think
> this is due to the fact that there is no parsed text present in the
> 'crawlId_webpage' table for my crawled pages.
>
> I can also verify this in my ParseFilter plugin: when I do Utf8 text =
> page.getText(); the text is always null, and that is why I think the
> SolrIndexJob is not inserting any documents into Solr.
>
> So what should I do here? Why don't I have any text in the HBase
> table? And why are there two tables created, 'webpage' &
> 'crawlId_webpage'?
>
> Thanks guys for the help & support.
>
> Tony.
>
