Hi all,

I'm using Pig 0.14.0 on Tez 0.7.0 to run some basic Pig scripts, but I'm not seeing any performance gain from Tez: my Pig scripts take the same amount of time as they do with the mapred execution type.
The following parameters are set in mapred-site.xml and are being picked up by Tez. I am not able to override them, even when I set them in my tez-site.xml:

tez.runtime.shuffle.merge.percent=0.66
tez.runtime.shuffle.fetch.buffer.percent=0.70
tez.runtime.io.sort.mb=256
tez.runtime.shuffle.memory.limit.percent=0.25
tez.runtime.io.sort.factor=64
tez.runtime.shuffle.connect.timeout=180000
tez.runtime.internal.sorter.class=org.apache.hadoop.util.QuickSort
tez.runtime.merge.progress.records=10000
tez.runtime.compress=true
tez.runtime.sort.spill.percent=0.8
tez.runtime.shuffle.ssl.enable=false
tez.runtime.ifile.readahead=true
tez.runtime.shuffle.parallel.copies=10
tez.runtime.ifile.readahead.bytes=4194304
tez.runtime.task.input.post-merge.buffer.percent=0.0
tez.runtime.shuffle.read.timeout=180000
tez.runtime.compress.codec=org.apache.hadoop.io.compress.SnappyCodec

Please find attached the list of task counters. I can see that a lot of data is being spilled, but if I try to increase tez.runtime.io.sort.mb through mapred-site.xml, my script terminates with an OOM exception.

Can you please suggest which parameters I should change to improve the performance of Pig on Tez?

Regards,
Sandeep
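The spill and shuffle thresholds implied by the settings listed in this message can be worked out directly. The Python sketch below is a simplified model under stated assumptions: TASK_MEMORY_MB is a hypothetical task memory size (not from the message), and tez.runtime.shuffle.fetch.buffer.percent is treated as a plain fraction of that memory, whereas Tez actually divides a task's memory among its inputs and outputs, so real figures will differ. Both the sort buffer and the shuffle buffer live inside the task's JVM heap, so a larger tez.runtime.io.sort.mb only helps if the task has room for it.

```python
"""Hypothetical sketch: memory thresholds implied by the Tez runtime settings
listed above. TASK_MEMORY_MB is an assumed value, and the model is simplified
(Tez redistributes task memory across inputs and outputs)."""

# Values copied from the settings listed in the message.
SETTINGS = {
    "tez.runtime.io.sort.mb": 256,
    "tez.runtime.sort.spill.percent": 0.8,
    "tez.runtime.shuffle.fetch.buffer.percent": 0.70,
    "tez.runtime.shuffle.memory.limit.percent": 0.25,
    "tez.runtime.shuffle.merge.percent": 0.66,
}

# Assumption: memory available to one task's JVM, in MB (hypothetical).
TASK_MEMORY_MB = 1024


def sort_spill_threshold_mb(settings: dict) -> float:
    """Sort-buffer fill level (MB) at which the output sorter starts spilling."""
    return settings["tez.runtime.io.sort.mb"] * settings["tez.runtime.sort.spill.percent"]


def shuffle_thresholds_mb(settings: dict, task_memory_mb: float) -> dict:
    """In-memory shuffle limits (MB), treating fetch.buffer.percent as a
    fraction of the task memory (a simplification)."""
    buffer_mb = task_memory_mb * settings["tez.runtime.shuffle.fetch.buffer.percent"]
    return {
        # Total memory for holding fetched map outputs in memory.
        "shuffle_buffer": buffer_mb,
        # A single fetched segment larger than this is written to disk instead.
        "max_single_segment": buffer_mb * settings["tez.runtime.shuffle.memory.limit.percent"],
        # Buffer usage at which an in-memory merge (and spill) is triggered.
        "merge_trigger": buffer_mb * settings["tez.runtime.shuffle.merge.percent"],
    }


if __name__ == "__main__":
    print(f"sort spill starts at {sort_spill_threshold_mb(SETTINGS):.1f} MB")
    for name, mb in shuffle_thresholds_mb(SETTINGS, TASK_MEMORY_MB).items():
        print(f"{name}: {mb:.1f} MB")
```

Changing TASK_MEMORY_MB or the percentages shows how quickly the in-memory limits shrink relative to a large sort buffer.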
