> Can you read your db and see if there are any pages pending a fetch?

After inject:

[default@webpage] list f;
Using default limit of 100
-------------------
RowKey: 6c742e62616c7361732e7777773a687474702f
=> (column=6669, value=00278d00, timestamp=1348032953800000)
=> (column=73, value=3f800000, timestamp=1348032953802000)
=> (column=7473, value=00000139dd066f6b, timestamp=1348032953798000)
-------------------
RowKey: 6c742e6c72797461732e7777773a687474702f
=> (column=6669, value=00278d00, timestamp=1348032953811000)
=> (column=73, value=3f800000, timestamp=1348032953814000)
=> (column=7473, value=00000139dd066f6b, timestamp=1348032953809000)
-------------------
RowKey: 6c742e31356d696e2e7777773a687474702f
=> (column=6669, value=00278d00, timestamp=1348032953787000)
=> (column=73, value=3f800000, timestamp=1348032953789000)
=> (column=7473, value=00000139dd066f6b, timestamp=1348032953785000)
-------------------
RowKey: 6c742e64656c66692e7777773a687474702f
=> (column=6669, value=00278d00, timestamp=1348032953749000)
=> (column=73, value=3f800000, timestamp=1348032953752000)
=> (column=7473, value=00000139dd066f6b, timestamp=1348032953656000)

4 Rows Returned.
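
The CLI prints row keys, column names and values as hex, which makes the listing hard to read. Below is a minimal Python sketch for decoding a few of the values above. It assumes the row keys and column names are UTF-8 text, and that the values are big-endian integers and floats; the helper names are made up for illustration and are not part of Nutch, Gora or Cassandra.

```python
import struct


def hex_to_text(hex_value):
    """Decode a hex string as UTF-8 text (row keys, column names)."""
    return bytes.fromhex(hex_value).decode("utf-8")


def hex_to_int(hex_value):
    """Decode a hex string as an unsigned big-endian integer."""
    return int(hex_value, 16)


def hex_to_float(hex_value):
    """Decode a 4-byte hex string as a big-endian IEEE 754 float."""
    return struct.unpack(">f", bytes.fromhex(hex_value))[0]


# Values copied from the first row of the listing above.
print(hex_to_text("6c742e62616c7361732e7777773a687474702f"))  # row key
print(hex_to_text("6669"), hex_to_int("00278d00"))            # column 6669
print(hex_to_text("73"), hex_to_float("3f800000"))            # column 73
print(hex_to_text("7473"), hex_to_int("00000139dd066f6b"))    # column 7473
```

The row key decodes to a reversed-host form of the injected URL, and the column names decode to short qualifiers such as "fi", "s" and "ts". If, as I assume, these correspond to the fetch interval, score and fetch time fields in gora-cassandra-mapping.xml, then 00278d00 is 2592000 seconds (30 days) and 3f800000 is a score of 1.0.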

Then after fetch:

A very long sequence like this .......
d3e3c2f6469763e0a3c2f6469763e3c212d2d2064656c666920636f6e7461696e6572207772617070657220626567696e202d2d3e0a0a0a202020200a3c2f626f64793e0a3c2f68746d6c3e0a0a,
timestamp=1347972430537000)
=> (column=6669, value=00278d00, timestamp=1347972384062000)
=> (column=707473, value=00000139d96a3c7a, timestamp=1347972430534000)
=> (column=73, value=3f800000, timestamp=1347972384065000)
=> (column=7374, value=00000002, timestamp=1347972430531000)
=> (column=7473, value=0000013b0e6877e8, timestamp=1347974904068000)
=> (column=747970, value=6170706c69636174696f6e2f7868746d6c2b786d6c,
timestamp=1347972430640000)
4 Rows Returned.
Elapsed time: 10255 msec(s).

After parse, list p returns a similarly long sequence of hex-encoded bytes.

updatedb apparently makes no changes.

Then I started a new generate, fetch, parse iteration:

list f


....02020200a3c2f626f64793e0a3c2f68746d6c3e0a0a, timestamp=1348033056939000)
=> (column=6669, value=00278d00, timestamp=1348032953749000)
=> (column=707473, value=00000139dd066f6b, timestamp=1348033056934000)
=> (column=73, value=3f800000, timestamp=1348032953752000)
=> (column=7374, value=00000002, timestamp=1348033056931000)
=> (column=7473, value=0000013a7786c6be, timestamp=1348033211184000)
=> (column=747970, value=6170706c69636174696f6e2f7868746d6c2b786d6c,
timestamp=1348033056949000)

4 Rows Returned.
Elapsed time: 11825 msec(s).
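
To check whether a row is still pending a fetch, the timestamp columns can be decoded the same way. This is only a sketch under assumptions: I take "pts" (707473) to be the previous fetch time, "ts" (7473) to be the next scheduled fetch time, and "st" (7374) to be the crawl status, all stored as big-endian numbers with times in epoch milliseconds. The function name is hypothetical.

```python
from datetime import datetime, timezone


def hex_millis_to_utc(hex_value):
    """Interpret a hex string as epoch milliseconds and return a UTC datetime."""
    millis = int(hex_value, 16)
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


# Values copied from the last listing above.
prev_fetch = hex_millis_to_utc("00000139dd066f6b")  # column 707473 ("pts")
next_fetch = hex_millis_to_utc("0000013a7786c6be")  # column 7473 ("ts")
status = int("00000002", 16)                        # column 7374 ("st")

print("previous fetch:", prev_fetch.isoformat())
print("next fetch:    ", next_fetch.isoformat())
print("whole days between them:", (next_fetch - prev_fetch).days)
print("status code:", status)
```

If those assumptions hold, the next fetch time sits about one fetch interval after the previous one, so this row would not be selected by generate again until then.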


Also, I have added these jars to the Nutch lib directory; maybe the versions are not right?

cassandra-all-1.1.2.jar
cassandra-thrift-1.1.2.jar
gora-core-0.2.1.jar
gora-cassandra-0.2.1.jar
hector-core-1.1-0.jar
thrift-0.2.0.jar (not needed, I think; libthrift has everything necessary)
libthrift-0.7.0.jar

cassandra -v
1.0.11
That's a bit strange, because I downloaded v1.1.5 (I also tried the one that
installs via aptitude on Ubuntu).


On Tue, Sep 18, 2012 at 5:16 PM, Lewis John Mcgibbney <
[email protected]> wrote:

> Hi,
>
> On Tue, Sep 18, 2012 at 2:34 PM, Žygimantas Medelis <[email protected]>
> wrote:
>
> > Commands I am issuing
> >
>
> Can you read your db and see if there are any pages pending a fetch?
>
> >
> > Also I was getting NullPointerException on inject before
> > changing conf/gora-cassandra-mapping.xml
> > from:  <class keyClass="java.lang.String"
> > name="org.apache.nutch.storage.WebPage">
> > to: <class keyClass="java.lang.String"
> > name="org.apache.nutch.storage.WebPage" keyspace="webpage">
>
> I've now fixed this in the 2.x branch. Thank you for reporting
>
