Howdy,

I'm coming from Cassandra, and I'm trying to count all the columns in a column family. I believe that is similar to counting the number of tuples in a bag, in the lingo of the Pig manual. It was harder than I expected, but I think this works:

    rows = LOAD 'cassandra://MyKeyspace/MyColumnFamily' USING CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
    counts = FOREACH rows GENERATE COUNT(columns);
    counts_in_bag = GROUP counts ALL;
    sum_of_bag = FOREACH counts_in_bag GENERATE SUM($1);
    dump sum_of_bag;

My question is: am I right that it works? I started with 3 keys having a total of 5 columns and got (5). Then I added a new key with one column, plus another column on an existing key, and got (7). So it seems to be working. But is there a better way to write it?

Thanks!
will
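
P.S. One variant I have wondered about, written here only as an untested sketch: flatten every bag into one relation of (name, value) tuples, then group everything and count once. It assumes the same LOAD schema as above; the alias names cols, all_cols and total are just ones I made up for illustration. As far as I understand, COUNT skips tuples whose first field is null, so this relies on column names never being null.

```pig
-- Same load as above: one row per key, with a bag of (name, value) columns.
rows = LOAD 'cassandra://MyKeyspace/MyColumnFamily' USING CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});

-- Un-nest the bags so that each column becomes its own tuple.
cols = FOREACH rows GENERATE FLATTEN(columns);

-- Put every column tuple into a single group and count them.
all_cols = GROUP cols ALL;
total = FOREACH all_cols GENERATE COUNT(cols);

DUMP total;
```

I would expect this to give the same totals as the per-row COUNT plus SUM version, but I have not compared the two.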