It's a problem with gora v0.2.1, which does not work with the current nutch 2.
I have also tested with the sql store, and that fails too.
Changing the dependency to gora v0.2 and rebuilding solves the problem.
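
For anyone reading the cassandra-cli output quoted below: the row keys, column names and values are printed as hex, which makes it hard to tell whether anything changed between iterations. Here is a minimal Python sketch for decoding them. The helper names are my own, and the reading of the short column names (fi = fetch interval, s = score, ts = fetch time, st = status, typ = content type) is an assumption based on how the values decode, not something confirmed against the mapping file. The URL helper only handles keys without a port.

```python
import struct
from datetime import datetime, timezone


def hex_to_text(h):
    """Decode a hex string into UTF-8 text (row keys, column names, strings)."""
    return bytes.fromhex(h).decode("utf-8")


def hex_to_int(h):
    """Interpret a hex string as a big-endian unsigned integer."""
    return int(h, 16)


def hex_to_float(h):
    """Interpret 4 bytes of hex as a big-endian IEEE 754 float."""
    return struct.unpack(">f", bytes.fromhex(h))[0]


def hex_to_utc(h):
    """Interpret a hex string as milliseconds since the epoch (assumed unit)."""
    return datetime.fromtimestamp(int(h, 16) / 1000, tz=timezone.utc)


def unreverse_url(key):
    """Turn a reversed-host key like 'lt.balsas.www:http/' back into a URL.

    Sketch only: assumes the form '<reversed host>:<scheme>/<path>' with no port.
    """
    host_rev, rest = key.split(":", 1)
    slash = rest.index("/")
    scheme, path = rest[:slash], rest[slash:]
    host = ".".join(reversed(host_rev.split(".")))
    return f"{scheme}://{host}{path}"


if __name__ == "__main__":
    key = hex_to_text("6c742e62616c7361732e7777773a687474702f")
    print(key)                          # lt.balsas.www:http/
    print(unreverse_url(key))           # http://www.balsas.lt/
    print(hex_to_text("6669"), hex_to_int("00278d00"))    # fi 2592000
    print(hex_to_text("73"), hex_to_float("3f800000"))    # s 1.0
    print(hex_to_text("7473"), hex_to_utc("00000139dd066f6b"))
    print(hex_to_text("747970"),
          hex_to_text("6170706c69636174696f6e2f7868746d6c2b786d6c"))
```

Decoded this way, the fi value is 2592000 seconds (30 days) and the ts value after fetch is pushed forward, which would be consistent with the pages being fetched once and then not being due again.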



On Wed, Sep 19, 2012 at 9:07 AM, Žygimantas Medelis <[email protected]> wrote:

>
> > Can you read your db and see if there are any pages pending a fetch?
>
> After inject
>
> [default@webpage] list f;
> Using default limit of 100
> -------------------
> RowKey: 6c742e62616c7361732e7777773a687474702f
> => (column=6669, value=00278d00, timestamp=1348032953800000)
> => (column=73, value=3f800000, timestamp=1348032953802000)
> => (column=7473, value=00000139dd066f6b, timestamp=1348032953798000)
> -------------------
> RowKey: 6c742e6c72797461732e7777773a687474702f
> => (column=6669, value=00278d00, timestamp=1348032953811000)
> => (column=73, value=3f800000, timestamp=1348032953814000)
> => (column=7473, value=00000139dd066f6b, timestamp=1348032953809000)
> -------------------
> RowKey: 6c742e31356d696e2e7777773a687474702f
> => (column=6669, value=00278d00, timestamp=1348032953787000)
> => (column=73, value=3f800000, timestamp=1348032953789000)
> => (column=7473, value=00000139dd066f6b, timestamp=1348032953785000)
> -------------------
> RowKey: 6c742e64656c66692e7777773a687474702f
> => (column=6669, value=00278d00, timestamp=1348032953749000)
> => (column=73, value=3f800000, timestamp=1348032953752000)
> => (column=7473, value=00000139dd066f6b, timestamp=1348032953656000)
>
> 4 Rows Returned.
>
> Then after fetch
>
>
> Very very long sequence of this .......
> d3e3c2f6469763e0a3c2f6469763e3c212d2d2064656c666920636f6e7461696e6572207772617070657220626567696e202d2d3e0a0a0a202020200a3c2f626f64793e0a3c2f68746d6c3e0a0a,
> timestamp=1347972430537000)
> => (column=6669, value=00278d00, timestamp=1347972384062000)
> => (column=707473, value=00000139d96a3c7a, timestamp=1347972430534000)
> => (column=73, value=3f800000, timestamp=1347972384065000)
> => (column=7374, value=00000002, timestamp=1347972430531000)
> => (column=7473, value=0000013b0e6877e8, timestamp=1347974904068000)
> => (column=747970, value=6170706c69636174696f6e2f7868746d6c2b786d6c,
> timestamp=1347972430640000)
> 4 Rows Returned.
> Elapsed time: 10255 msec(s).
>
> parse and list p return a similar, very long sequence of byte codes.
>
> updatedb apparently makes no changes.
>
> Then starting new generate, fetch, parse iteration
>
> list f
>
>
> ....02020200a3c2f626f64793e0a3c2f68746d6c3e0a0a,
> timestamp=1348033056939000)
> => (column=6669, value=00278d00, timestamp=1348032953749000)
> => (column=707473, value=00000139dd066f6b, timestamp=1348033056934000)
> => (column=73, value=3f800000, timestamp=1348032953752000)
> => (column=7374, value=00000002, timestamp=1348033056931000)
> => (column=7473, value=0000013a7786c6be, timestamp=1348033211184000)
> => (column=747970, value=6170706c69636174696f6e2f7868746d6c2b786d6c,
> timestamp=1348033056949000)
>
> 4 Rows Returned.
> Elapsed time: 11825 msec(s).
>
>
> Also, I have added these jars to the nutch lib; maybe the versions are not right?
>
> cassandra-all-1.1.2.jar
> cassandra-thrift-1.1.2.jar
> gora-core-0.2.1.jar
> gora-cassandra-0.2.1.jar
> hector-core-1.1-0.jar
> thrift-0.2.0.jar (not needed, I think; libthrift has everything necessary)
> libthrift-0.7.0.jar
>
> cassandra -v
> 1.0.11
> That's a bit strange, since I downloaded v1.1.5 (I also tried the one
> which installs via aptitude on ubuntu)
>
>
> On Tue, Sep 18, 2012 at 5:16 PM, Lewis John Mcgibbney <
> [email protected]> wrote:
>
>> Hi,
>>
>> On Tue, Sep 18, 2012 at 2:34 PM, Žygimantas Medelis <[email protected]>
>> wrote:
>>
>> > Commands I am issuing
>> >
>>
>> Can you read your db and see if there are any pages pending a fetch?
>>
>> >
>> > Also I was getting NullPointerException on inject before
>> > changing conf/gora-cassandra-mapping.xml
>> > from:  <class keyClass="java.lang.String"
>> > name="org.apache.nutch.storage.WebPage">
>> > to: <class keyClass="java.lang.String"
>> > name="org.apache.nutch.storage.WebPage" keyspace="webpage">
>>
>> I've now fixed this in the 2.x branch. Thank you for reporting
>>
>
>
