@r7raul1984, would you mind filing a documentation jira for your question? The 
list that Rajesh provided might be good to formalize into a doc and/or wiki.  
Also, please take a look at https://issues.apache.org/jira/browse/TEZ-2294 to 
see the full list of parameters. If you see something off or not clear enough, 
please add your comments to the jira. 

@Rohini,

We recently changed the default value of tez.runtime.optimize.local.fetch to 
true in master. The feature was probably kept as false when it was first 
introduced because it had not been fully battle tested. 

As for tez.runtime.shuffle.keep-alive.enabled, I am assuming it depends on how 
many open connections a cluster's setup can sustain, and it needs to be tuned 
in combination with "tez.runtime.shuffle.keep-alive.max.connections". Good 
point on whether we should make this true by default. Will wait for 
@Rajesh/@Gopal/@Sid to chime in, and they can open a new jira if this is 
generally beneficial in most setups. 
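
To make the pairing concrete, here is a minimal sketch of the two keep-alive 
settings in tez-site.xml form. The connection count is a made-up placeholder, 
not a recommendation; it would need to be sized to what the cluster can 
actually sustain.

```xml
<configuration>
  <property>
    <name>tez.runtime.shuffle.keep-alive.enabled</name>
    <value>true</value>
  </property>
  <property>
    <!-- Placeholder value for illustration only; tune per cluster. -->
    <name>tez.runtime.shuffle.keep-alive.max.connections</name>
    <value>20</value>
  </property>
</configuration>
```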

thanks
— Hitesh 

On Apr 24, 2015, at 9:34 AM, Rohini Palaniswamy <[email protected]> wrote:

> Rajesh,
>    What are the problems with having tez.runtime.shuffle.keep-alive.enabled 
> and tez.runtime.optimize.local.fetch set to true always by default?
> 
> Regards,
> Rohini
> 
> On Fri, Apr 24, 2015 at 1:54 AM, Rajesh Balamohan 
> <[email protected]> wrote:
> Listing some details at a very high level:
> 
> - Set "tez.task.generate.counters.per.io=true" to get more details on the 
> task counters. Basically this starts printing the counters per edge, which 
> can be a lot more useful for debugging.
> 
> - In case you want to avoid container launches etc. when you analyze for the 
> first time, try hive.prewarm.enabled=true & hive.prewarm.numcontainers=<no of 
> containers you want in your session to be prewarmed>
> 
> - Container reuse is enabled by default in tez. 
> (tez.am.container.idle.release-timeout-min.millis and 
> tez.am.container.idle.release-timeout-max.millis control the amount of time 
> a container is held by the AM before releasing it)
> 
> - Set tez.runtime.io.sort.mb appropriately to avoid spills (you can check 
> task counters in the logs to find out the spills and adjust it accordingly)
> 
> - Set tez.runtime.sort.threads=2 to enable PipelinedSorter, which is a lot 
> more performant than DefaultSorter (this is the default in the master 
> branch, but if you are using earlier releases, you have to turn it on 
> explicitly with this setting).
> 
> - Set tez.runtime.compress=true and set tez.runtime.compress.codec 
> (SnappyCodec is preferred, but it is up to you to choose)
> 
> - Set tez.runtime.shuffle.keep-alive.enabled=true in case you have a 
> shuffle-heavy workload. This reduces the number of connections in shuffle.
> 
> - Adjust memory allocated to different inputs/outputs based on 
> tez.task.scale.memory.ratios (but this is more of an expert-level setting, 
> which you might want to touch after nailing down any memory pressure)
> 
> - Adjusting shuffle buffers is also possible, but I would advise it only 
> when you have nailed down an issue related to the shuffle/merge codepath.
> 
> - Set "tez.runtime.optimize.local.fetch=true" to bypass HTTP fetches (when 
> data is locally present)
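> 
> As a rough sketch (not a tuned configuration), a few of the settings above 
> could be written as a tez-site.xml fragment like the one below. The 
> tez.runtime.io.sort.mb value is only a placeholder and should be sized from 
> the spill counters of your own job; the codec class name is the standard 
> Hadoop Snappy codec. With Hive, the same keys can also be set per session 
> using "set key=value;".
> 
> ```xml
> <configuration>
>   <property>
>     <name>tez.task.generate.counters.per.io</name>
>     <value>true</value>
>   </property>
>   <property>
>     <!-- Placeholder size in MB; adjust after checking spill counters. -->
>     <name>tez.runtime.io.sort.mb</name>
>     <value>512</value>
>   </property>
>   <property>
>     <!-- Enables PipelinedSorter on releases where it is not the default. -->
>     <name>tez.runtime.sort.threads</name>
>     <value>2</value>
>   </property>
>   <property>
>     <name>tez.runtime.compress</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>tez.runtime.compress.codec</name>
>     <value>org.apache.hadoop.io.compress.SnappyCodec</value>
>   </property>
>   <property>
>     <name>tez.runtime.optimize.local.fetch</name>
>     <value>true</value>
>   </property>
> </configuration>
> ```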
> 
> 
> Feel free to refer to 
> https://github.com/t3rmin4t0r/tez-autobuild/blob/master/tez-site.xml for any 
> commonly used settings for benchmarks.
> 
> On Fri, Apr 24, 2015 at 1:52 PM, [email protected] <[email protected]> 
> wrote:
> I want to tune Tez task performance. The Tez tasks are created by Hive. 
> How do I tune Tez task performance? 
> Should I analyze performance using the Tez task counters in the Tez log? Any 
> suggestions?
> 
> [email protected]
> 
> 
> 
> -- 
> ~Rajesh.B
> 
