Re: Id based crawling with nutch2.x/hbase and multiple webpage tables

Lewis John Mcgibbney Wed, 26 Jun 2013 14:35:02 -0700

On Wed, Jun 26, 2013 at 4:30 AM, Tony Mullins <[email protected]> wrote:


>
> Is it possible to crawl with a crawlId but have HBase create only a 'webpage'
> table without the crawlId prefix, just like Cassandra does?
>

I can't understand this question, Tony.
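Tony's question concerns how the HBase table name relates to the crawlId. As an illustration only, assuming the Nutch 2.x convention of prefixing the `webpage` schema name with `<crawlId>_` when a crawlId is set, the naming rule can be sketched as below. `WebpageTableName` and `tableName` are hypothetical names for this sketch, not Nutch API.

```java
// Hypothetical sketch: derives a store/table name from an optional crawlId.
// Assumes the "<crawlId>_<schema>" prefix convention; not actual Nutch code.
public final class WebpageTableName {

    private WebpageTableName() {}

    /**
     * Returns the table name for the given crawlId and schema name.
     * A null or empty crawlId yields the bare schema name (e.g. "webpage").
     */
    public static String tableName(String crawlId, String schemaName) {
        if (crawlId == null || crawlId.isEmpty()) {
            return schemaName;
        }
        return crawlId + "_" + schemaName;
    }

    public static void main(String[] args) {
        System.out.println(tableName("", "webpage"));       // webpage
        System.out.println(tableName("crawl1", "webpage")); // crawl1_webpage
    }
}
```

Under this assumption, running with a crawlId would always yield a prefixed table in HBase, which is why a crawl with an id would not write to a plain `webpage` table.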


>
> My other problems, the DBUpdateJob exception on some random URLs and the
> repeated/mixed HTML of all URLs in seed.txt, are also resolved (they
> disappeared) with the HBase backend.
>

Good


> Am I supposed to get proper values here, or is this the expected output in
> the ParseFilter plugin?
>

What is the status of the URLs which have null or 0 values for the fields
you posted?



> PS. Now I am getting the correct HTML in ParseFilter with the HBase backend.
>

Good
