This is wrong behavior. As I said, this content is a concatenation of multiple pages instead of one page identified by a given key.
Regards,
Alexey Tigarev <[email protected]>
Jabber: [email protected]
Skype: t__gra

On Wed, Feb 20, 2013 at 2:08 AM, Lewis John Mcgibbney <[email protected]> wrote:
> On Tue, Feb 19, 2013 at 3:40 PM, t_gra <[email protected]> wrote:
>
>> I tried skipping pages with large content size, and it figured that
>> ALL my pages have content 125981292 bytes long (and probably the same
>> contents).
>
> And this is okay? I don;t really understand.
>
>> BTW, what number of version of gora-cassandra do I have to have with
>> Nutch 2.1 and Cassandra API version 19.33.0 ?
>
> If you are working with Nutch 2.1 release [0], you will be working with
> gora-cassandra 0.2 by default. Do not upgrade to gora-cassandra 0.2.1 as it
> is buggy.
> gora-cassandra 0.2 is fitted to work with Cassandra version 1.0.2 over
> hector client 1.0-1
>
> Lewis
>
> [0] http://svn.apache.org/repos/asf/nutch/tags/release-2.1/ivy/ivy.xml
> [1] http://svn.apache.org/repos/asf/gora/tags/gora-0.2/pom.xml

