The Dachis Group (where I just came from; I'm now at DataStax) uses Pig with 
Cassandra for a lot of things.  However, we weren't using the wide-row 
implementation yet, since wide-row support is new to 1.1.x and we were on 0.7, 
then 0.8, then 1.0.x.

Since wide-row support is new to 1.1's Hadoop integration, I think there are 
some rough edges, as you say.  But tickets with reproducible test cases for 
any problems are much appreciated, and they will get addressed.

On Oct 11, 2012, at 10:43 AM, William Oberman <ober...@civicscience.com> wrote:

> I'm wondering how many people are using Cassandra + Pig out there?  I 
> recently went through the effort of validating things at a much higher level 
> than I previously did (*), and found a few issues:
> https://issues.apache.org/jira/browse/CASSANDRA-4748
> https://issues.apache.org/jira/browse/CASSANDRA-4749
> https://issues.apache.org/jira/browse/CASSANDRA-4789
> 
> In general, it seems like the wide-row implementation still has rough edges.  
> I'm concerned that I'm not understanding why other people aren't using the 
> feature and thus finding these problems.  Is everyone else just setting a 
> high static limit, e.g. LOAD 'cassandra://KEYSPACE/CF?limit=X' where X >= 
> the maximum number of columns for any key?  Is everyone else using data 
> models that keep the column count per key below 1024?  Do newer versions of 
> Hadoop consume the Cassandra API in a way that works around these issues?  
> I'm using CDH3 (Hadoop 0.20.2) with Pig 0.8.1.
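> 
> For concreteness, here is a minimal sketch of the two load styles (keyspace 
> and column-family names are placeholders, and the widerows flag name is my 
> reading of the 1.1-era CassandraStorage URL parameters, so double-check it 
> against your version):
> 
>   REGISTER /path/to/cassandra-all.jar;  -- plus its dependencies
> 
>   -- static limit: fetch at most N columns per key
>   capped = LOAD 'cassandra://MyKeyspace/MyCF?limit=16384'
>            USING org.apache.cassandra.hadoop.pig.CassandraStorage();
> 
>   -- wide-row mode: stream all of a key's columns instead of capping them
>   wide = LOAD 'cassandra://MyKeyspace/MyCF?widerows=true'
>          USING org.apache.cassandra.hadoop.pig.CassandraStorage();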
> 
> (*) I took a random subsample of 50,000 keys of my production data (approx. 
> 1M total key/value pairs, some keys having only a single value and some 
> having thousands).  I then wrote both a Pig script and a simple procedural 
> version of the same logic, and compared the results.  Initially the results 
> differed, but after locally patching my code to fix the above three bugs 
> (really only two distinct issues), I now (finally) get the same results.
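> 
> As a sketch of the idea (not my actual script), a per-key column count that 
> can be diffed against the procedural pass might look like:
> 
>   rows   = LOAD 'cassandra://MyKeyspace/MyCF?widerows=true'
>            USING org.apache.cassandra.hadoop.pig.CassandraStorage();
>   keyed  = GROUP rows BY $0;       -- $0 is the row key
>   counts = FOREACH keyed GENERATE group, COUNT(rows);
>   STORE counts INTO 'pig_counts';  -- compare with the procedural output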
