I am grateful for the help the community is giving me; I wouldn't be able to do this without it.

When I was using Cassandra, it created only a single 'webpage' table. Whether I ran my jobs without a crawlId (directly from Eclipse) or with one, it always used the same 'webpage' table. This is not the case with HBase, which creates a table named like 'crawlId_webpage'. So what I was asking is: is it possible to get Cassandra's behavior with HBase, i.e. to make HBase create only a single 'webpage' table even if I pass a crawlId to my bin/crawl script?

I think this log message is caused by the same issue:

"Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'C11_webpage' , assuming they are the same."

And what do you mean by the "status of URLs"? When I run my job for the first time (Inject -> generate -> fetch -> parse -> DBUpdate) and for 2 or 3 depth levels (generate -> fetch -> parse -> DBUpdate), I always get these values:

*status: 2 (status_fetched)*
fetchTime: 0
prevFetchTime: 0
fetchInterval: 0
retriesSinceFetch: 0
modifiedTime: 0
prevModifiedTime: 0
protocolStatus: (null)

Thanks again for your help.
Tony.

On Thu, Jun 27, 2013 at 2:33 AM, Lewis John Mcgibbney <[email protected]> wrote:

> On Wed, Jun 26, 2013 at 4:30 AM, Tony Mullins <[email protected]> wrote:
>
> > Is it possible to crawl with crawlId but HBase only crates 'webpage' table
> > without crawlId prefix , just like Cassandra does?
>
> I can't understand this question Tony.
>
> > And my other problems of DBUpdateJob's exception on some random urls and
> > repeating/mixed html of all urls present in seed.txt are also resolved
> > (disappeared) with HBase backend.
>
> Good
>
> > Am I suppose to get proper values here or these are the expected output in
> > ParseFilter plugin ?
>
> What is the status of the URLs which have the null or 0 values for the
> fields you posted?
>
> > PS. Now I am getting correct HTML in ParseFilter with HBase backend.
>
> Good

