Slice columns by TimeUUID while loading to Pig

Dan Feldman Fri, 23 Mar 2012 18:25:10 -0700

Hi everyone,

I have a Cassandra super column family (SCF) in which each super column's
name is a TimeUUID, assigned dynamically at the time the super column is
inserted into the database:


create column family CF
  with key_validation_class = UTF8Type
  and comparator = TimeUUIDType
  and subcomparator = UTF8Type
  and column_type = 'Super';

Now, I'm trying to write a Pig script that would automatically calculate
the number of new super columns added to the database during a specified
period of time (let's say, in the last hour). For that, I thought it would
be nice to be able to do something along the lines of:

last_hour_data = LOAD
'cassandra://Keyspace/ColumnFamily&slice_start=Time(one hour
ago)&slice_end=Time(now)' USING CassandraStorage()...

However,
1) I'm not sure what that "Time(one hour ago)" and "Time(now)" syntax is
(so that it would translate those times into TimeUUIDs that cassandra
understands) and
2) The LOAD line above, which I took from the bottom of
http://svn.apache.org/repos/asf/cassandra/trunk/contrib/pig/README.txt,
produces an error, treating 'CF&slice_start...' as one gigantic column
family name (which of course does not exist).
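
To make point 1 concrete, here is a minimal sketch in Python of what I mean
by translating "one hour ago" and "now" into TimeUUIDs. It builds version-1
UUIDs whose timestamp field encodes a given time. The helper name
timeuuid_for and the choice of zero for the clock sequence and node are
assumptions made for illustration, and I have not verified that
CassandraStorage accepts UUID strings like these as slice_start/slice_end
values.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Number of 100-ns intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timeuuid_for(dt, clock_seq=0, node=0):
    """Build a version-1 UUID whose timestamp encodes `dt` (hypothetical helper).

    clock_seq and node default to 0, which is an assumption: it gives a
    deterministic value suitable as a range bound, not a unique identifier.
    """
    delta = dt - UNIX_EPOCH
    # Integer arithmetic avoids float rounding at 100-ns resolution.
    ticks = ((delta.days * 86400 + delta.seconds) * 10_000_000
             + delta.microseconds * 10
             + UUID_EPOCH_OFFSET)
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi_version = ((ticks >> 48) & 0x0FFF) | 0x1000       # version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80   # RFC 4122 variant
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


now = datetime.now(timezone.utc)
slice_start = timeuuid_for(now - timedelta(hours=1))
slice_end = timeuuid_for(now)
print(slice_start, slice_end)
```

The two printed values are the bounds I would like to be able to pass in
the LOAD line, in whatever form the loader expects.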


Alternatively, I could try generating my specified range of columns in Pig
after loading the whole database. But looking at the data, the super column
names look like 'S.?,uF?    ?B#q'    or    '    ??VuI??-gFd?' instead of
"normal-looking" UUIDs like '275564bc4f52f81573b4cfe0ea615ae0', even when I
try to load the super column names as chararrays. I'm thinking it's because
the raw binary representation of a UUID differs from its string
representation, but is there a way to load it into Pig the "normal-looking"
way?
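
For illustration, this is the conversion I have in mind, sketched in plain
Python: the 16 raw bytes of a column name are interpreted as a UUID and
printed in hex. The helper name uuid_from_raw is hypothetical, and the
sample bytes are simply the hex string quoted above. In Pig this would
presumably have to live in a UDF applied to the name loaded as a bytearray
rather than a chararray, which is an assumption and is not shown here.

```python
import uuid


def uuid_from_raw(raw):
    """Interpret 16 raw bytes as a UUID (hypothetical helper)."""
    raw = bytes(raw)
    if len(raw) != 16:
        raise ValueError("expected 16 bytes, got %d" % len(raw))
    return uuid.UUID(bytes=raw)


# Sample input: the 16 bytes behind the hex string quoted in this message.
raw_name = bytes.fromhex("275564bc4f52f81573b4cfe0ea615ae0")
parsed = uuid_from_raw(raw_name)
print(parsed)      # canonical 8-4-4-4-12 form
print(parsed.hex)  # 32 hex digits without dashes
```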


Thank you in advance for your time!
Dan F.
