Is first() being computed locally on the driver program? Maybe it's too hard 
to compute with the memory, etc. available there. Take a look at the driver's 
log and see whether it has the message "Computing the requested partition 
locally". 
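
A minimal sketch of that log check, assuming the driver log has been saved as a
plain text file. The function name find_local_run_lines and the log path are
hypothetical; only the message string comes from the text above.

```python
from pathlib import Path

# Message to look for in the driver log (quoted in the reply above).
LOCAL_RUN_MESSAGE = "Computing the requested partition locally"


def find_local_run_lines(log_path, message=LOCAL_RUN_MESSAGE):
    """Return (line_number, text) pairs for log lines containing `message`.

    `log_path` is a hypothetical path to a saved copy of the driver log.
    """
    hits = []
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with Path(log_path).open(encoding="utf-8", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if message in line:
                hits.append((number, line.rstrip("\n")))
    return hits
```

Calling find_local_run_lines("driver.log") returns an empty list when the
message never appears, which would suggest first() was not run locally on the
driver.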

Matei

On Jul 22, 2014, at 12:04 PM, Nathan Kronenfeld <nkronenf...@oculusinfo.com> 
wrote:

> I was wondering if anyone could provide an explanation for the behavior I'm 
> seeing.
> 
> I have an RDD, call it foo, not too complex, with a DAG maybe 8 levels deep 
> and 2 shuffles, not empty, not even terribly big - small enough that some 
> partitions could be empty.
> 
> When I run foo.first, I get workers disconnecting, and applications die.
> When I run foo.mapPartitions.saveAsHadoopDataset, it works fine.
> 
> Anyone got an explanation for why that might be?
> 
>                     -Thanks, Nathan
> 
> 
> -- 
> Nathan Kronenfeld
> Senior Visualization Developer
> Oculus Info Inc
> 2 Berkeley Street, Suite 600,
> Toronto, Ontario M5A 4J5
> Phone:  +1-416-203-3003 x 238
> Email:  nkronenf...@oculusinfo.com
