Hi All,

I'm running a Hadoop streaming job over 100 GB of data on a 50-node cluster. The job succeeds on small amounts of data, but when I run it over the full 100 GB I get a "MemoryError" and a "Broken pipe" error, even though each node has plenty of memory. Is there a way to increase the memory available to the Python streaming tasks?
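For reference, this is roughly how I launch the job. The jar path, script names, input/output paths, and memory values below are placeholders, and I'm only guessing that these memory-related properties are the right knobs for streaming subprocesses, so please correct me if they don't apply:

    # Placeholder command; property names are assumed from the mapred-default
    # documentation and may not be the right ones for streaming tasks.
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
        -D mapred.child.java.opts=-Xmx1024m \
        -D mapred.child.ulimit=3145728 \
        -D mapred.job.map.memory.mb=2048 \
        -input /user/root/input \
        -output /user/root/output \
        -mapper mapper.py \
        -reducer reducer.py \
        -file mapper.py \
        -file reducer.py

In particular, I'm not sure whether mapred.child.java.opts (which I understand only sets the child JVM heap) has any effect on the Python subprocess, or whether mapred.child.ulimit is what actually limits its memory.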
Below is a sample from the error logs:

    cause: java.io.IOException: subprocess still running
    R/W/S=32771708/10/0 in:34752=32771708/943 [rec/s] out:0=10/943 [rec/s]
    minRecWrittenToEnableSkip_=9223372036854775807
    LOGNAME=null HOST=null USER=root HADOOP_USER=null
    last Hadoop input: |null|
    Broken pipe

Any help appreciated.

Thanks,
Srinivas
