Hi,

> Here's the Tez DAG swimlane. Haven't gotten vertex.py to work.. will
> send that too soon.

Pretty clear that the map-side is fine - splitting sort buffers isn't bothering this at all.

We want to over-partition Reducer 7 and possibly have all of them pick the total # of reducers dynamically:

set hive.exec.parallel=false; -- bad idea on Tez
set hive.tez.auto.reducer.parallelism=true; -- decide on total # of reducers dynamically
set hive.tez.min.partition.factor=0.1;
set hive.tez.max.partition.factor=10;
set tez.shuffle-vertex-manager.min-src-fraction=0.9; -- slow start min (reducer counts are picked at this point)
set tez.shuffle-vertex-manager.max-src-fraction=0.99;
set tez.runtime.report.partition.stats=true;

(experimental!! - I'm still testing this for machine failure tolerance)
set tez.runtime.pipelined-shuffle.enabled=true;

Cheers,
Gopal
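
P.S. A rough sketch of how the two partition factors above bound the reducer count, in case the arithmetic helps. This is an illustration only: the function name, the estimate of 100 reducers, and the truncate-then-clamp-to-1 rounding are assumptions, not Hive's actual code, and any cap from hive.exec.reducers.max is ignored.

```python
def reducer_bounds(estimated, min_factor=0.1, max_factor=10.0):
    """Return (lower, upper) reducer counts for a hypothetical estimate.

    Assumption: the planner's estimate is scaled by max_factor to
    over-partition the shuffle, and by min_factor to get the floor
    that the count can be shrunk to at runtime.
    """
    upper = max(1, int(estimated * max_factor))  # over-partitioned count
    lower = max(1, int(estimated * min_factor))  # smallest count allowed
    return lower, upper


# Hypothetical estimate of 100 reducers with the factors set above.
print(reducer_bounds(100))  # (10, 1000)
```

With min-src-fraction=0.9, the final count inside that range would only be picked once 90% of the source tasks have finished.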