On Fri, Feb 4, 2011 at 9:47 PM, Matt Kennedy <stinkym...@gmail.com> wrote:
> Found the culprit. There is a new feature in Pig 0.8 that will try to
> reduce the number of splits used to speed up the whole job. Since the
> ColumnFamilyInputFormat lists the input size as zero, this feature
> eliminates all of the splits except for one.
>
> The workaround is to disable this feature for jobs that use CassandraStorage
> by setting -Dpig.splitCombination=false in the pig_cassandra script.
>
> Hope somebody finds this useful, you wouldn't believe how many dead-ends I
> ran down trying to figure this out.

Ouch, thanks for tracking that down.

What should CFIF be returning differently? Do you mean the
InputSplit.getLength?

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
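
For anyone reading this in the archive, here is a rough Python sketch of why splits that all report a length of zero could collapse into one. It is not Pig's actual implementation: `combine_splits` and `max_combined_size` are made-up names, and the greedy "pack until a size threshold is reached" rule is only an assumption about how a size-based combiner might behave.

```python
def combine_splits(lengths, max_combined_size):
    """Greedily pack splits into groups until a group reaches max_combined_size.

    Hypothetical illustration only; not Pig's real split-combination code.
    Returns a list of groups, each a list of split indices.
    """
    groups = []
    current, current_size = [], 0
    for index, length in enumerate(lengths):
        current.append(index)
        current_size += length
        # Close the group only once it has accumulated enough bytes.
        if current_size >= max_combined_size:
            groups.append(current)
            current, current_size = [], 0
    if current:
        groups.append(current)  # leftover splits form a final group
    return groups


# Splits that report real sizes stay spread over several groups.
sized = combine_splits([64, 64, 64, 64], max_combined_size=128)

# Splits that all report length 0 never reach the threshold,
# so every one of them lands in a single group (one map task).
zero = combine_splits([0, 0, 0, 0], max_combined_size=128)
```

If the real combiner works anything like this, a non-zero size estimate from the split would presumably keep the splits apart, but that has not been checked against the Pig source here.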