Sorry for sending this out again, but this issue is rather pressing and the internet seems to be failing me...
Thank you guys!
Dan F.

---------- Forwarded message ----------
From: Dan Feldman <[email protected]>
Date: Fri, Mar 23, 2012 at 6:24 PM
Subject: Slice columns by TimeUUID while loading to Pig
To: [email protected]

Hi everyone,

I have a Cassandra super column family (SCF) in which each super column's name is a TimeUUID, assigned dynamically at the time the super column is inserted into the database. I'm trying to write a Pig script that automatically counts the super columns added to the database during a specified period of time (let's say, the last hour). For that, I thought it would be nice to be able to do something along the lines of:

last_hour_data = LOAD 'cassandra://Keyspace/ColumnFamily&slice_start=Time(one hour ago)&slice_end=Time(now)' USING CassandraStorage()...

However:

1) I'm not sure what the actual syntax for "Time(one hour ago)" and "Time(now)" is (so that those times are translated into TimeUUIDs that Cassandra understands), and

2) The LOAD line above, which I took from the bottom of http://svn.apache.org/repos/asf/cassandra/trunk/contrib/pig/README.txt, produces an error because it treats 'CF&slice_start...' as one gigantic column family name (which of course does not exist).

Alternatively, I could try selecting my specified range of columns in Pig after loading the whole database. But looking at the data, the super column names look like 'S.?,uF? ?B#q' or ' ??VuI??-gFd?' instead of "normal-looking" UUIDs like '275564bc4f52f81573b4cfe0ea615ae0', even when I try to load the super column names as chararrays. I think that's because the stored binary representation of a UUID differs from its string representation, but is there a way to load it into Pig the "normal-looking" way?

Thank you in advance for your time!
Dan F.
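
For illustration only, here is a minimal sketch (in Python, not Pig) of the two conversions the question is about: rendering a raw 16-byte UUID as its "normal-looking" hex string, and building a version-1 (time-based) UUID from a timestamp. It uses only the Python standard library. The function names `raw_to_uuid_string` and `time_uuid_for` are hypothetical, and the choice of an all-zero clock sequence and node is an assumption; whether that yields the right slice bound depends on how Cassandra's comparator orders UUIDs with equal timestamps. How such values would be passed to CassandraStorage is not shown here.

```python
import uuid
from datetime import datetime, timedelta, timezone

# 100-nanosecond intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01), as defined by RFC 4122.
UUID_EPOCH_OFFSET = 0x01B21DD213814000

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def raw_to_uuid_string(raw: bytes) -> str:
    """Render a 16-byte binary UUID as the canonical hyphenated hex string."""
    return str(uuid.UUID(bytes=raw))


def time_uuid_for(moment: datetime, clock_seq: int = 0, node: int = 0) -> uuid.UUID:
    """Build a version-1 UUID whose timestamp field encodes `moment`.

    `moment` must be timezone-aware. The clock_seq and node defaults (all
    zeros) are an assumption made for this sketch.
    """
    # Integer arithmetic avoids float rounding at 100 ns resolution.
    unix_us = (moment - UNIX_EPOCH) // timedelta(microseconds=1)
    ts = unix_us * 10 + UUID_EPOCH_OFFSET  # 60-bit count of 100 ns ticks

    time_low = ts & 0xFFFFFFFF
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = ((ts >> 48) & 0x0FFF) | 0x1000   # version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant
    clock_seq_low = clock_seq & 0xFF

    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    start = time_uuid_for(now - timedelta(hours=1))
    end = time_uuid_for(now)
    print("slice start:", start)
    print("slice end:  ", end)
    # The same 16 bytes that look like garbage as a chararray:
    print(raw_to_uuid_string(end.bytes))
```

The raw bytes of a UUID are mostly non-printable, which would explain the garbage seen when they are read as a chararray; the hex form is just those 16 bytes written out as 32 hexadecimal digits.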
