Re: cassandra/hadoop BulkOutputFormat failures

Brian Jeltema Mon, 17 Sep 2012 03:54:29 -0700

As suggested, it was a version-skew problem. 

Thanks.


Brian

On Sep 14, 2012, at 11:34 PM, Jeremy Hanna wrote:

> A couple of guesses:
> - are you mixing versions of Cassandra?  Streaming differences between 
> versions might throw this error.  That is, are you bulk loading with one 
> version of Cassandra into a cluster that's a different version?
> - (shot in the dark) is your cluster overwhelmed for some reason?
> 
> If the temp dir hasn't been cleaned up yet, you are able to retry, fwiw.
> 
> Jeremy
> 
> On Sep 14, 2012, at 1:34 PM, Brian Jeltema <brian.jelt...@digitalenvoy.net> 
> wrote:
> 
>> I'm trying to do a bulk load from a Cassandra/Hadoop job using the 
>> BulkOutputFormat class.
>> It appears that the reducers are generating the SSTables, but are failing to 
>> load them into the cluster:
>> 
>> 12/09/14 14:08:13 INFO mapred.JobClient: Task Id : attempt_201208201337_0184_r_000004_0, Status : FAILED
>> java.io.IOException: Too many hosts failed: [/10.4.0.6, /10.4.0.5, /10.4.0.2, /10.4.0.1, /10.4.0.3, /10.4.0.4]
>>       at org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:242)
>>       at org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:207)
>>       at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:579)
>>       at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:650)
>>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
>>       at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>       at java.security.AccessController.doPrivileged(Native Method)
>>       at javax.security.auth.Subject.doAs(Subject.java:396)
>>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>>       at org.apache.hadoop.mapred.Child.main(Child.java:249)
>> 
>> A brief look at the BulkOutputFormat class shows that it depends on 
>> SSTableLoader. My Hadoop cluster
>> and my Cassandra cluster are co-located on the same set of machines. I 
>> haven't found any stated restrictions,
>> but does this technique only work if the Hadoop cluster is distinct from the 
>> Cassandra cluster? Any suggestions
>> on how to get past this problem?
>> 
>> Thanks in advance.
>> 
>> Brian
> 
>
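
For anyone who hits the same "Too many hosts failed" error, the version-skew diagnosis above could be checked with something like the sketch below. It is not part of Cassandra or Hadoop: the class VersionSkewCheck and its methods are hypothetical names, and the rule that the bulk-loading client and every node should share the same major.minor release is an assumption made for illustration (the thread does not say which versions were involved). The version strings are assumed to be collected by hand, for example from running nodetool version on each node and from the Cassandra jar shipped with the Hadoop job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not a Cassandra or Hadoop API.
final class VersionSkewCheck {

    // Returns the "major.minor" prefix of a version string such as "1.1.5".
    static String majorMinor(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected at least major.minor: " + version);
        }
        return parts[0] + "." + parts[1];
    }

    // Lists the hosts whose major.minor differs from the client's.
    // Assumption: a major.minor match is the compatibility rule being checked.
    static List<String> mismatchedHosts(String clientVersion, Map<String, String> nodeVersions) {
        String expected = majorMinor(clientVersion);
        List<String> mismatched = new ArrayList<>();
        for (Map.Entry<String, String> entry : nodeVersions.entrySet()) {
            if (!majorMinor(entry.getValue()).equals(expected)) {
                mismatched.add(entry.getKey());
            }
        }
        return mismatched;
    }
}
```

Calling VersionSkewCheck.mismatchedHosts with the client's version and a map of host address to reported version returns the hosts worth looking at first; an empty list means no skew under the assumed rule.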
