Hi All,

I'm running a Hadoop streaming job over 100 GB of data on a 50-node cluster.
The job succeeds on small amounts of data, but when it runs over the full
100 GB I get a "memory error" and a "Broken pipe" error. Each node has
enough memory.

Is there a way to increase the memory available to the Python streaming tasks?
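
To make the question concrete, the job is launched roughly like the command
below (the jar path, input/output paths, script names, and memory values are
just placeholders). The -D properties are the kind of settings I'm wondering
about, but I don't know whether they raise the limit for the external Python
process or only for the task JVM:

    # placeholders throughout; adjust to the actual install and job
    # mapred.child.java.opts   -> heap size of the task JVM
    # mapred.child.ulimit      -> virtual memory limit (KB) for the task and its child processes
    # mapred.job.map.memory.mb -> memory the scheduler reserves per map task
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
        -D mapred.child.java.opts=-Xmx1024m \
        -D mapred.child.ulimit=2097152 \
        -D mapred.job.map.memory.mb=2048 \
        -input /user/srinivas/input \
        -output /user/srinivas/output \
        -mapper mapper.py \
        -reducer reducer.py \
        -file mapper.py \
        -file reducer.py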

Below are sample error logs:

cause:java.io.IOException: subprocess still running
R/W/S=32771708/10/0 in:34752=32771708/943 [rec/s] out:0=10/943 [rec/s]
minRecWrittenToEnableSkip_=9223372036854775807 LOGNAME=null
HOST=null
USER=root
HADOOP_USER=null
last Hadoop input: |null|
Broken pipe


Any help appreciated.

Thanks,
Srinivas
