I am grateful for the help the community is giving me; I would not be able
to do this without it.

When I was using Cassandra, it created only a single 'webpage' table. Whether
I ran my jobs without a crawlId (directly from Eclipse) or with a crawlId, it
always used the same 'webpage' table.
This is not the case with HBase, which creates a table named like
'crawlId_webpage'. So my question is: is it possible to get the same
behavior as Cassandra with HBase, i.e. to make HBase create only a single
'webpage' table even if I pass a crawlId to my bin/crawl script?

And I think this log message is caused by the same issue I mentioned above:
"Keyclass and nameclass match but mismatching table names  mappingfile
schema is 'webpage' vs actual schema 'C11_webpage' , assuming they are the
same."
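
To make the naming difference concrete, here is a small sketch of the behavior
I am observing. The helper name table_name is made up for illustration and is
not a Nutch or Gora API; it only mirrors the table names I see with HBase, not
how Nutch actually computes them.

```python
def table_name(base_schema: str, crawl_id: str = "") -> str:
    """Hypothetical helper: mimic the table naming observed with HBase.

    With a crawlId the table is '<crawlId>_<schema>'; without one it is
    just '<schema>'. This is an assumption based on observed names only.
    """
    return f"{crawl_id}_{base_schema}" if crawl_id else base_schema


print(table_name("webpage"))          # no crawlId given
print(table_name("webpage", "C11"))   # crawlId 'C11', as in the log above
```

With Cassandra, both cases ended up in the same 'webpage' table for me.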

And what do you mean by the "status of URLs"?
These are the logs when I run my job for the first time (Inject ->
generate -> fetch -> parse -> DBUpdate) and for 2 or 3 depth levels
(generate -> fetch -> parse -> DBUpdate).

I always get these values:
*status:    2 (status_fetched)*
fetchTime:    0
prevFetchTime:    0
fetchInterval:    0
retriesSinceFetch:    0
modifiedTime:    0
prevModifiedTime:    0
protocolStatus:    (null)


Thanks again for your help.
Tony.



On Thu, Jun 27, 2013 at 2:33 AM, Lewis John Mcgibbney <
[email protected]> wrote:

> On Wed, Jun 26, 2013 at 4:30 AM, Tony Mullins <[email protected]> wrote:
>
> >
> > Is it possible to crawl with crawlId but HBase only creates 'webpage' table
> > without crawlId prefix, just like Cassandra does?
> >
>
> I can't understand this question Tony.
>
>
> >
> > And my other problems of DBUpdateJob's exception on some random urls and
> > repeating/mixed html of all urls present in seed.txt are also resolved
> > (disappeared) with HBase backend.
> >
>
> Good
>
>
> > Am I supposed to get proper values here, or is this the expected output in
> > the ParseFilter plugin?
> >
>
> What is the status of the URLs which have the null or 0 values for the
> fields you posted?
>
>
>
> > PS. Now I am getting correct HTML in ParseFilter with HBase backend.
> >
>
> Good
>
