[jira] [Commented] (TEZ-4087) Shuffle: Fix shuffle cleanup to prevent thread leaks

2020-07-21 Thread Jonathan Turner Eagles (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17162299#comment-17162299
 ] 

Jonathan Turner Eagles commented on TEZ-4087:
-

[~rajesh.balamohan], [~ashutoshc]. This commit has caused findbugs warning 
filed in TEZ-4203. Can either of you put up a patch to address?

> Shuffle: Fix shuffle cleanup to prevent thread leaks
> 
>
> Key: TEZ-4087
> URL: https://issues.apache.org/jira/browse/TEZ-4087
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
> Attachments: TEZ-4087.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In certain cases, Shuffle's cleanupIgnoreErrors() is not called. This leaves 
> 4 threads (inmem, diskmerger, Referee, ShuffleAndMergeRunner) run forever.
> When these are run in long running processes (e.g LLAP in Hive), they reach 
> the thread limits over time.
> Note: Root cause why cleanupIgnoreErrors() is not invoked is not yet known. I 
> will share the details when i get more details on this. Creating this ticket 
> to add additional safety knobs to ensure that thread leaks do not happen.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TEZ-4133) key class implements writableComparable and configurable use default configuration

2020-07-21 Thread Jonathan Turner Eagles (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17162114#comment-17162114
 ] 

Jonathan Turner Eagles commented on TEZ-4133:
-

+1. Will commit this change to master and branch-0.9

> key class implements writableComparable and configurable use default 
> configuration
> --
>
> Key: TEZ-4133
> URL: https://issues.apache.org/jira/browse/TEZ-4133
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: wang qiang
>Assignee: wang qiang
>Priority: Major
> Attachments: TEZ-4133-1.patch, TEZ-4133-2.patch
>
>
> key class implments writableComparable and configurable ,setConf be invoked 
> with a default configuration,it should  like 
> mapreduce(org.apache.hadoop.mapred.JobConf#getOutputKeyComparator)
> {code:java}
> //org.apache.tez.runtime.library.common.ConfigUtils#getIntermediateOutputKeyComparator
> public static  RawComparator 
> getIntermediateOutputKeyComparator(Configuration conf) {
>   Class theClass = conf.getClass(
>   TezRuntimeConfiguration.TEZ_RUNTIME_KEY_COMPARATOR_CLASS, null,
>   RawComparator.class);
>   if (theClass != null)
> return ReflectionUtils.newInstance(theClass, conf);
> // aybe change to 
> // return 
> WritableComparator.get(getIntermediateOutputKeyClass(conf).asSubclass( 
> //WritableComparable.class),conf);
>   return 
> WritableComparator.get(getIntermediateOutputKeyClass(conf).asSubclass(
>   WritableComparable.class));
> }
> public static  RawComparator 
> getIntermediateInputKeyComparator(Configuration conf) {
>   Class theClass = conf.getClass(
>   TezRuntimeConfiguration.TEZ_RUNTIME_KEY_COMPARATOR_CLASS, null,
>   RawComparator.class);
>   if (theClass != null)
> return ReflectionUtils.newInstance(theClass, conf);
> // maybe change to 
> // return 
> WritableComparator.get(getIntermediateInputKeyClass(conf).asSubclass( 
> //WritableComparable.class),conf);
>   return WritableComparator.get(getIntermediateInputKeyClass(conf).asSubclass(
>   WritableComparable.class));
> }
>  {code}
>  
>   
>   
>   
>   
>   
>   
>   
>   
>   
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TEZ-4196) Add a possibility to provide configuration when creating TezClient

2020-07-21 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161837#comment-17161837
 ] 

Peter Vary commented on TEZ-4196:
-

[~abstractdog]: TEZ-4175 seems like a perfect solution for this specific 
problem - I am not sure about the other consequences. Any timeline for that 
patch to get committed?

Thanks,

Peter

> Add a possibility to provide configuration when creating TezClient
> --
>
> Key: TEZ-4196
> URL: https://issues.apache.org/jira/browse/TEZ-4196
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Peter Vary
>Priority: Major
> Attachments: TEZ-4196.01.patch, TEZ-4196.02.patch
>
>
> Currently Tez/YarnConfiguration creation is not too flexible.
> It would be good to have a possibility to provide YarnConfiguration from 
> outside, and also making sure that when a TezConfiguration is created it 
> should not parse the files again.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TEZ-4175) Consider removing YarnConfiguration where it's possible

2020-07-21 Thread Jira


[ 
https://issues.apache.org/jira/browse/TEZ-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161824#comment-17161824
 ] 

László Bodor commented on TEZ-4175:
---

could you please take a look at this patch [~jeagles], [~gopalv]?

1. eliminates the doubled tez + yarn config from AMConfiguration and relies on 
single tezconf
2. eliminates the yarn conf from classes extending FrameworkClient (the only 
relevant is TezYarnClient which can use tezconf with this patch)
{code}
  public void init(TezConfiguration tezConf) {
yarnClient.init(tezConf);
  } 
{code}

I don't know the history of doubled tez/yarn confs, but is seems like this 
patch could prevent creating unnecessary conf objects

> Consider removing YarnConfiguration where it's possible
> ---
>
> Key: TEZ-4175
> URL: https://issues.apache.org/jira/browse/TEZ-4175
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Attachments: TEZ-4175.01.patch, TEZ-4175.02.patch, TEZ-4175.03.patch, 
> TEZ-4175.03.patch
>
>
> A comment in DAGAppmaster made me think that we don't need to rely on 
> [YarnConfiguration|https://github.com/apache/hadoop/blob/branch-3.1.3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java]
>  in all cases, what if it can be replace with base Configuration...
> {code}
>   // TODO Does this really need to be a YarnConfiguration ?
>   Configuration conf = new Configuration(new YarnConfiguration());
> {code}
> In hadoop 3.1.3 source, I cannot see that it adds e.g. yarn-site as a 
> resource:
> {code}
>   public YarnConfiguration() {
> super();
>   }
>   
>   public YarnConfiguration(Configuration conf) {
> super(conf);
> if (! (conf instanceof YarnConfiguration)) {
>   this.reloadConfiguration();
> }
>   }
> {code}
> in current codebase:
> {code}
> grep -iRH "new YarnConfiguration" --include="*.java"
> tez-plugins/tez-history-parser/src/main/java/org/apache/tez/history/ATSImportTool.java:
> YarnConfiguration yarnConf = new YarnConfiguration(conf);
> tez-plugins/tez-aux-services/src/main/java/org/apache/tez/auxservices/ShuffleHandler.java:
> super.serviceInit(new YarnConfiguration(conf));
> tez-api/src/test/java/org/apache/tez/dag/api/client/rpc/TestDAGClient.java:   
>  YarnConfiguration yarnConf = new YarnConfiguration(tezConf);
> tez-api/src/test/java/org/apache/tez/dag/api/client/rpc/TestDAGClient.java:   
>  YarnConfiguration yarnConf = new YarnConfiguration(tezConf);
> tez-api/src/test/java/org/apache/tez/dag/api/client/rpc/TestDAGClient.java:   
>  YarnConfiguration yarnConf = new YarnConfiguration(tezConf);
> tez-api/src/test/java/org/apache/tez/client/TestTezClient.java:
> tezClient.init(new TezConfiguration(false), new YarnConfiguration());
> tez-api/src/main/java/org/apache/tez/client/TezClient.java:
> amConfig.setYarnConfiguration(new 
> YarnConfiguration(amConfig.getTezConfiguration()));
> tez-api/src/main/java/org/apache/tez/client/TezClient.java:
> amConfig.setYarnConfiguration(new 
> YarnConfiguration(amConfig.getTezConfiguration()));
> tez-api/src/main/java/org/apache/tez/client/TezClient.java:return 
> getDAGClient(appId, tezConf, new YarnConfiguration(tezConf), frameworkClient, 
> ugi);
> tez-tests/src/test/java/org/apache/tez/test/FaultToleranceTestRunner.java:
>   tezConf = new TezConfiguration(new YarnConfiguration());
> tez-tests/src/test/java/org/apache/tez/test/FaultToleranceTestRunner.java:
>tezConf = new TezConfiguration(new YarnConfiguration(this.conf));
> tez-mapreduce/src/test/java/org/apache/tez/mapreduce/hadoop/TestMRInputHelpers.java:
> Configuration testConf = new YarnConfiguration(
> tez-mapreduce/src/main/java/org/apache/tez/mapreduce/client/YARNRunner.java:  
>  this(conf, new ResourceMgrDelegate(new YarnConfiguration(conf)));
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration conf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration conf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration tezConf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration tezConf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration tezConf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration tezConf = new Configuration(new 

[jira] [Commented] (TEZ-4196) Add a possibility to provide configuration when creating TezClient

2020-07-21 Thread Jira


[ 
https://issues.apache.org/jira/browse/TEZ-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161819#comment-17161819
 ] 

László Bodor commented on TEZ-4196:
---

[~pvary]: thanks for the patch, I can understand the purpose of this fix

according to your internal conversation [~rbalamohan], this indeed opens an API 
where an incorrect YarnConfiguration can be passed by the user...however, the 
getClient API (introduced in TEZ-3892) already assumes the user is aware what's 
going on with already running AM's (I know this is not a 100% valid explanation)

before merging this particular patch, please let's consider taking a look at 
TEZ-4175, as it also eliminates the overhead which is supposed to be eliminated 
by this patch (right?):
{code}
amConfig.setYarnConfiguration(new 
YarnConfiguration(amConfig.getTezConfiguration()));
{code}

> Add a possibility to provide configuration when creating TezClient
> --
>
> Key: TEZ-4196
> URL: https://issues.apache.org/jira/browse/TEZ-4196
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Peter Vary
>Priority: Major
> Attachments: TEZ-4196.01.patch, TEZ-4196.02.patch
>
>
> Currently Tez/YarnConfiguration creation is not too flexible.
> It would be good to have a possibility to provide YarnConfiguration from 
> outside, and also making sure that when a TezConfiguration is created it 
> should not parse the files again.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)