Sorry for sending this out again, but this issue is rather pressing and the
internet seems to be failing me...

Thank you guys!
Dan F.

---------- Forwarded message ----------
From: Dan Feldman <[email protected]>
Date: Fri, Mar 23, 2012 at 6:24 PM
Subject: Slice columns by TimeUUID while loading to Pig
To: [email protected]


Hi everyone,

I have a Cassandra super column family (SCF) in which each super column's
name is a TimeUUID, assigned dynamically at the time the super column is
inserted into the database.

Now, I'm trying to write a Pig script that would automatically calculate
the number of new super columns added to the database during a specified
period of time (let's say, in the last hour). For that, I thought it would
be nice to be able to do something along the lines of:

last_hour_data = LOAD
'cassandra://Keyspace/ColumnFamily&slice_start=Time(one hour
ago)&slice_end=Time(now)' USING CassandraStorage()...

However,
1) I'm not sure what that "Time(one hour ago)" and "Time(now)" syntax is
(so that it would translate those times into TimeUUIDs that cassandra
understands) and
2) The LOAD line above, which I took from the bottom of
http://svn.apache.org/repos/asf/cassandra/trunk/contrib/pig/README.txt,
produces an error: it treats 'CF&slice_start...' as one gigantic column
family name (which of course does not exist).
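To make point 1 concrete, here is a minimal sketch, in Python rather than Pig, of how a wall-clock time could be turned into a version 1 (time-based) UUID for use as a range bound. The helper name min_time_uuid and the choice of clock sequence and node bytes are assumptions made for illustration; this does not show how CassandraStorage expects slice bounds to be encoded in the LOAD URL, and whether these bytes form a strict lower bound depends on how Cassandra compares the non-timestamp part of a TimeUUID.

```python
import uuid
from datetime import datetime, timedelta, timezone

# 100-nanosecond intervals between 1582-10-15 (UUID epoch) and 1970-01-01.
UUID_EPOCH_OFFSET = 0x01B21DD213814000
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def min_time_uuid(dt):
    """Build a version 1 UUID for a timezone-aware datetime (hypothetical helper).

    The clock sequence and node are fixed to small values (an assumption),
    so only the timestamp part carries information.
    """
    delta = dt - UNIX_EPOCH
    # Integer arithmetic avoids float rounding at 100-ns resolution.
    ticks = ((delta.days * 86400 + delta.seconds) * 10_000_000
             + delta.microseconds * 10
             + UUID_EPOCH_OFFSET)
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi_version = ((ticks >> 48) & 0x0FFF) | 0x1000  # version 1
    # 0x80 sets the RFC 4122 variant bits; the rest is zeroed.
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0x00, 0))


now = datetime.now(timezone.utc)
slice_start = min_time_uuid(now - timedelta(hours=1))
slice_end = min_time_uuid(now)
print(slice_start, slice_end)
```

The two resulting UUIDs differ only in their timestamp fields, which is the property a "last hour" slice would rely on.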

Alternatively, I could try generating my specified range of columns in Pig
after loading the whole database. But looking at the data, the super column
names look like 'S.?,uF?    ?B#q'    or    '    ??VuI??-gFd?' instead of
"normal-looking" UUIDs like '275564bc4f52f81573b4cfe0ea615ae0', even when I
try to load the super column names as chararrays. I'm thinking it's because
the raw binary representation of a UUID differs from its string
representation, but is there a way to load it into Pig the "normal-looking"
way?
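As an illustration of the difference between the two representations, the sketch below (Python, not Pig) takes the 16 raw bytes of a UUID and produces both the familiar hex string and the timestamp embedded in a time-based UUID. The function name decode_time_uuid is hypothetical, and the sketch assumes the super column name really is the 16-byte binary form of a version 1 UUID; it does not address how to do the same conversion inside a Pig script.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUID timestamps count 100-ns intervals from this date.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def decode_time_uuid(raw):
    """Return (hex string, creation time) for 16 raw TimeUUID bytes.

    Hypothetical helper: raises ValueError if the bytes are not a
    version 1 UUID, since only that version embeds a timestamp this way.
    """
    u = uuid.UUID(bytes=bytes(raw))
    if u.version != 1:
        raise ValueError("not a time-based (version 1) UUID: %s" % u)
    # u.time is the 60-bit timestamp; drop to microsecond resolution.
    created = GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)
    return str(u), created


# Round trip with a freshly generated time-based UUID.
sample = uuid.uuid1()
text, created = decode_time_uuid(sample.bytes)
print(repr(sample.bytes))  # raw bytes: mostly unprintable
print(text, created)       # canonical string form and embedded time
```

Printing the raw bytes next to the string form shows why the names look like garbage when read as characters: the 16 bytes are the UUID's binary value, not its hex text.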


Thank you in advance for your time!
Dan F.
