It looks like this functionality isn't in the 0.7.3 version of
CassandraStorage. I tried adding the constructor that takes the limit to the
class, but I ran into Pig parsing errors, so I had to make the parameter a
string. How did you get around this for the version of CassandraStorage in
trunk? I'm running Pig 0.8.0.
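Here is a minimal sketch of the string-parameter approach. It assumes Pig hands
constructor arguments from the script to the load function as strings.
LimitedStorageSketch is a hypothetical stand-in, not the real CassandraStorage,
and it leaves out all of the LoadFunc plumbing:

```java
/**
 * Hypothetical sketch, NOT the real CassandraStorage class: it only shows
 * accepting the column limit as a String (as Pig supplies constructor
 * arguments) and parsing it into an int.
 */
class LimitedStorageSketch {
    // Default used when no argument is given (1024, the default mentioned in this thread).
    static final int DEFAULT_LIMIT = 1024;

    private final int limit;

    // No-arg constructor: fall back to the default limit.
    LimitedStorageSketch() {
        this(Integer.toString(DEFAULT_LIMIT));
    }

    // String constructor: parse the limit and reject non-positive values.
    LimitedStorageSketch(String limit) {
        int parsed = Integer.parseInt(limit.trim());
        if (parsed <= 0) {
            throw new IllegalArgumentException("limit must be positive: " + limit);
        }
        this.limit = parsed;
    }

    int getLimit() {
        return limit;
    }

    public static void main(String[] args) {
        System.out.println(new LimitedStorageSketch().getLimit());       // prints 1024
        System.out.println(new LimitedStorageSketch("4096").getLimit()); // prints 4096
    }
}
```

With a String constructor like this, the script passes the limit as a quoted
literal, e.g. USING CassandraStorage('4096').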

Also, when I bump the limit up very high (e.g. 1M columns), my Cassandra nodes
start eating up huge amounts of memory and max out my 16GB heap. I suspect
this is caused by the get_range_slices() call in ColumnFamilyRecordReader. Are
there plans to make this streaming/paged?
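To make the question concrete, here is a minimal sketch of what column paging
within a single row could look like. It is hypothetical and is not
ColumnFamilyRecordReader's code: the row is an in-memory TreeMap, and
fetchSlice() stands in for a server call that returns at most count columns
starting at a given column name, inclusive (similar in spirit to a Thrift
SliceRange):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

/**
 * Hypothetical sketch of paging through the columns of one wide row.
 * The row is simulated with a sorted map; fetchSlice() is an assumed
 * stand-in for a server call, not a real Cassandra API.
 */
class ColumnPagerSketch {
    private final NavigableMap<String, String> row;

    ColumnPagerSketch(NavigableMap<String, String> row) {
        this.row = row;
    }

    // Simulated server call: up to `count` column names >= start ("" means from the beginning).
    List<String> fetchSlice(String start, int count) {
        List<String> names = new ArrayList<>();
        for (String name : row.tailMap(start, true).keySet()) {
            if (names.size() == count) {
                break;
            }
            names.add(name);
        }
        return names;
    }

    // Counts every column while holding at most `pageSize` names in memory at a time.
    long countColumns(int pageSize) {
        if (pageSize < 2) {
            // With an inclusive start, a page of 1 would re-fetch the same column forever.
            throw new IllegalArgumentException("pageSize must be >= 2");
        }
        long total = 0;
        String start = "";
        boolean first = true;
        while (true) {
            List<String> page = fetchSlice(start, pageSize);
            // Every page after the first repeats the previous page's last column; don't count it twice.
            total += first ? page.size() : page.size() - 1;
            if (page.size() < pageSize) {
                break; // short page: the row is exhausted
            }
            start = page.get(page.size() - 1);
            first = false;
        }
        return total;
    }
}
```

Because the start column is inclusive, each page after the first repeats one
column, which is why the page size has to be at least 2. Memory use is bounded
by the page size rather than by the total number of columns in the row.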


-----Original Message-----
From: Jeremy Hanna [] 
Sent: Thursday, March 24, 2011 11:34 AM
Subject: Re: pig counting question

The limit defaults to 1024 but you can set it when you use CassandraStorage in 
pig, like so:
rows = LOAD 'cassandra://Keyspace/ColumnFamily' USING CassandraStorage(4096);
or whatever value you wish.

Give that a try and see if it gives you more of what you're looking for.

On Mar 24, 2011, at 1:16 PM, Jeffrey Wang wrote:

> Hey all,
> I'm trying to run a very simple Pig script against my Cassandra cluster (5 
> nodes, 0.7.3). I've gotten it all set up and working, but the script is 
> giving me some strange results. Here is my script:
> rows = LOAD 'cassandra://Keyspace/ColumnFamily' USING CassandraStorage();
> rowct = FOREACH rows GENERATE $0, COUNT($1);
> dump rowct;
> If I understand Pig correctly, this should output (row name, column count) 
> tuples, but I'm always seeing 1024 for the column count even though the rows 
> have a highly variable number of columns. Am I missing something? Thanks.
> -Jeffrey
