[jira] [Commented] (TEZ-4175) Consider removing YarnConfiguration where it's possible

2020-08-13 Thread Mustafa Iman (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177362#comment-17177362
 ] 

Mustafa Iman commented on TEZ-4175:
---

Thanks [~abstractdog]

TEZ-4175.05.patch looks good to me

> Consider removing YarnConfiguration where it's possible
> ---
>
> Key: TEZ-4175
> URL: https://issues.apache.org/jira/browse/TEZ-4175
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Attachments: TEZ-4175.01.patch, TEZ-4175.02.patch, TEZ-4175.03.patch, 
> TEZ-4175.03.patch, TEZ-4175.04.patch, TEZ-4175.05.patch
>
>
> A comment in DAGAppmaster made me think that we don't need to rely on 
> [YarnConfiguration|https://github.com/apache/hadoop/blob/branch-3.1.3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java]
>  in all cases, what if it can be replace with base Configuration...
> {code}
>   // TODO Does this really need to be a YarnConfiguration ?
>   Configuration conf = new Configuration(new YarnConfiguration());
> {code}
> In hadoop 3.1.3 source, I cannot see that it adds e.g. yarn-site as a 
> resource:
> {code}
>   public YarnConfiguration() {
> super();
>   }
>   
>   public YarnConfiguration(Configuration conf) {
> super(conf);
> if (! (conf instanceof YarnConfiguration)) {
>   this.reloadConfiguration();
> }
>   }
> {code}
> in current codebase:
> {code}
> grep -iRH "new YarnConfiguration" --include="*.java"
> tez-plugins/tez-history-parser/src/main/java/org/apache/tez/history/ATSImportTool.java:
> YarnConfiguration yarnConf = new YarnConfiguration(conf);
> tez-plugins/tez-aux-services/src/main/java/org/apache/tez/auxservices/ShuffleHandler.java:
> super.serviceInit(new YarnConfiguration(conf));
> tez-api/src/test/java/org/apache/tez/dag/api/client/rpc/TestDAGClient.java:   
>  YarnConfiguration yarnConf = new YarnConfiguration(tezConf);
> tez-api/src/test/java/org/apache/tez/dag/api/client/rpc/TestDAGClient.java:   
>  YarnConfiguration yarnConf = new YarnConfiguration(tezConf);
> tez-api/src/test/java/org/apache/tez/dag/api/client/rpc/TestDAGClient.java:   
>  YarnConfiguration yarnConf = new YarnConfiguration(tezConf);
> tez-api/src/test/java/org/apache/tez/client/TestTezClient.java:
> tezClient.init(new TezConfiguration(false), new YarnConfiguration());
> tez-api/src/main/java/org/apache/tez/client/TezClient.java:
> amConfig.setYarnConfiguration(new 
> YarnConfiguration(amConfig.getTezConfiguration()));
> tez-api/src/main/java/org/apache/tez/client/TezClient.java:
> amConfig.setYarnConfiguration(new 
> YarnConfiguration(amConfig.getTezConfiguration()));
> tez-api/src/main/java/org/apache/tez/client/TezClient.java:return 
> getDAGClient(appId, tezConf, new YarnConfiguration(tezConf), frameworkClient, 
> ugi);
> tez-tests/src/test/java/org/apache/tez/test/FaultToleranceTestRunner.java:
>   tezConf = new TezConfiguration(new YarnConfiguration());
> tez-tests/src/test/java/org/apache/tez/test/FaultToleranceTestRunner.java:
>tezConf = new TezConfiguration(new YarnConfiguration(this.conf));
> tez-mapreduce/src/test/java/org/apache/tez/mapreduce/hadoop/TestMRInputHelpers.java:
> Configuration testConf = new YarnConfiguration(
> tez-mapreduce/src/main/java/org/apache/tez/mapreduce/client/YARNRunner.java:  
>  this(conf, new ResourceMgrDelegate(new YarnConfiguration(conf)));
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration conf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration conf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration tezConf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration tezConf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration tezConf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration tezConf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration tezConf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration tezConf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration tezConf = new Configuration(new 

[jira] [Updated] (TEZ-4223) Adding new jars or resources after the first DAG runs does not work.

2020-08-13 Thread Harish JP (Jira)


 [ 
https://issues.apache.org/jira/browse/TEZ-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish JP updated TEZ-4223:
---
Fix Version/s: 0.9.3
   0.10.1

> Adding new jars or resources after the first DAG runs does not work.
> 
>
> Key: TEZ-4223
> URL: https://issues.apache.org/jira/browse/TEZ-4223
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Harish JP
>Assignee: Harish JP
>Priority: Major
> Fix For: 0.10.1, 0.9.3
>
> Attachments: TEZ-4223.02.patch, TEZ-4223.03.patch, TEZ-4223.04.patch
>
>
> If we executed DAG which needs additional jars after the first DAG is run, we 
> get ClassNotFoundException.
>  
>  
> {noformat}
> 2020-08-03 13:57:14,776 [INFO] [Dispatcher thread {Central}] |impl.DAGImpl|: 
> Added additional resources : 
> [[file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/commons-pool-1.5.4.jar,
>  
> file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/postgresql-42.2.8.jar,
>  
> file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/hive-jdbc-handler-3.1.3000.7.2.2.0-73.jar,
>  
> file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/mssql-jdbc-6.2.1.jre7.jar,
>  
> file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/commons-dbcp-1.4.jar]]
>  to classpath
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.apache.hive.storage.jdbc.JdbcInputFormat
> Serialization trace:
> inputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
> aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:156)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
> ...
> ...
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hive.storage.jdbc.JdbcInputFormat
> at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:583)
> at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
> at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
> at java.base/java.lang.Class.forName0(Native Method)
> at java.base/java.lang.Class.forName(Class.java:398)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
> ... 46 more{noformat}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TEZ-4175) Consider removing YarnConfiguration where it's possible

2020-08-13 Thread TezQA (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176986#comment-17176986
 ] 

TezQA commented on TEZ-4175:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
17s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
31s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
42s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  0m 
37s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
35s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 15s{color} | {color:orange} tez-api: The patch generated 1 new + 126 
unchanged - 3 fixed = 127 total (was 129) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 11s{color} | {color:orange} tez-mapreduce: The patch generated 1 new + 16 
unchanged - 1 fixed = 17 total (was 17) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} tez-dag: The patch generated 0 new + 167 unchanged - 
2 fixed = 167 total (was 169) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} tez-tests: The patch generated 0 new + 8 unchanged - 
2 fixed = 8 total (was 10) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
5s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
50s{color} | {color:green} tez-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
4s{color} | {color:green} tez-mapreduce in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m  
6s{color} | {color:green} tez-dag in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 32m 
40s{color} | {color:green} tez-tests in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 64m 10s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-TEZ-Build/522/artifact/out/Dockerfile |
| JIRA Issue | TEZ-4175 |
| JIRA Patch URL | 

[jira] [Commented] (TEZ-4213) Bound appContext executor capacity using a configurable property

2020-08-13 Thread TezQA (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176954#comment-17176954
 ] 

TezQA commented on TEZ-4213:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  3m 
39s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
43s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
29s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
15s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
20s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
59s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 25s{color} 
| {color:red} tez-dag generated 3 new + 5 unchanged - 3 fixed = 8 total (was 8) 
{color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 20s{color} | {color:orange} tez-dag: The patch generated 2 new + 88 
unchanged - 2 fixed = 90 total (was 90) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
24s{color} | {color:red} tez-dag generated 1 new + 0 unchanged - 0 fixed = 1 
total (was 0) {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
47s{color} | {color:green} tez-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
10s{color} | {color:green} tez-dag in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 31m 31s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-TEZ-Build/523/artifact/out/Dockerfile |
| JIRA Issue | TEZ-4213 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13009770/TEZ-4213.08.patch |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs 
checkstyle compile |
| uname | Linux 086f48c23923 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / 8f7209fcb |
| Default Java | Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| javac | 
https://builds.apache.org/job/PreCommit-TEZ-Build/523/artifact/out/diff-compile-javac-tez-dag.txt
 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-TEZ-Build/523/artifact/out/diff-checkstyle-tez-dag.txt
 |
| javadoc | 
https://builds.apache.org/job/PreCommit-TEZ-Build/523/artifact/out/diff-javadoc-javadoc-tez-dag.txt
 |
|  Test Results | 

[jira] [Updated] (TEZ-4213) Bound appContext executor capacity using a configurable property

2020-08-13 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/TEZ-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated TEZ-4213:

Attachment: TEZ-4213.08.patch

> Bound appContext executor capacity using a configurable property
> 
>
> Key: TEZ-4213
> URL: https://issues.apache.org/jira/browse/TEZ-4213
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
> Attachments: TEZ-4213.01.patch, TEZ-4213.02.patch, TEZ-4213.03.patch, 
> TEZ-4213.04.patch, TEZ-4213.05.patch, TEZ-4213.06.patch, TEZ-4213.07.patch, 
> TEZ-4213.08.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> After TEZ-4170 was merged, appContext executor pool is also used by the 
> RootInputInitializerManager to speed up SplitGeneration.
> However, this executor pool currently has not capacity limit 
> https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java#L624
> The problem the occurs when generating splits for larger inputs (thousands or 
> more) is that it can could result to
> {color:red}java.lang.OutOfMemoryError{color}
> that is also reproducible with a test case.
> https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/app/dag/RootInputInitializerManager.java#L130
> To avoid such errors, I propose to limit the capacity of this pool to a 
> configurable value that can be for example the number of physical cores by 
> default.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (TEZ-4207) Provide approximate number of input records to be processed in UnorderedKVInput

2020-08-13 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/TEZ-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan reassigned TEZ-4207:
-

Fix Version/s: 0.10.1
 Assignee: Rajesh Balamohan
   Resolution: Fixed

> Provide approximate number of input records to be processed in 
> UnorderedKVInput
> ---
>
> Key: TEZ-4207
> URL: https://issues.apache.org/jira/browse/TEZ-4207
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
> Fix For: 0.10.1
>
> Attachments: TEZ-4207.1.patch, TEZ-4207.wip.patch
>
>
> There are cases when broadcasted data is loaded into hashtable in upstream 
> applications (e.g Hive). Apps tends to predict the number of entries in the 
> hashtable diligently, but there are cases where these estimates can be very 
> complicated at compile time.
>  
> Tez can help in such cases, by providing "approximate number of input records 
> counter", to be processed in UnorderedKVInput. This is to avoid expensive 
> rehash when hashtable sizes are not estimated correctly. It would be good to 
> start with broadcast first and then to move on to unordered partitioned case 
> later.
>  
> This would help in predicting the number of entries at runtime & can get 
> better estimates for hashtable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TEZ-4207) Provide approximate number of input records to be processed in UnorderedKVInput

2020-08-13 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176856#comment-17176856
 ] 

Rajesh Balamohan commented on TEZ-4207:
---

Thanks for the review [~ashutoshc]. Committed to master. 

> Provide approximate number of input records to be processed in 
> UnorderedKVInput
> ---
>
> Key: TEZ-4207
> URL: https://issues.apache.org/jira/browse/TEZ-4207
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: TEZ-4207.1.patch, TEZ-4207.wip.patch
>
>
> There are cases when broadcasted data is loaded into hashtable in upstream 
> applications (e.g Hive). Apps tends to predict the number of entries in the 
> hashtable diligently, but there are cases where these estimates can be very 
> complicated at compile time.
>  
> Tez can help in such cases, by providing "approximate number of input records 
> counter", to be processed in UnorderedKVInput. This is to avoid expensive 
> rehash when hashtable sizes are not estimated correctly. It would be good to 
> start with broadcast first and then to move on to unordered partitioned case 
> later.
>  
> This would help in predicting the number of entries at runtime & can get 
> better estimates for hashtable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TEZ-4175) Consider removing YarnConfiguration where it's possible

2020-08-13 Thread Jira


[ 
https://issues.apache.org/jira/browse/TEZ-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176859#comment-17176859
 ] 

László Bodor commented on TEZ-4175:
---

thanks [~mustafaiman] for the review! for the same reason, I reverted back to 
YarnConfiguration in ResourceMgrDelegate and ShuffleHandler as well

> Consider removing YarnConfiguration where it's possible
> ---
>
> Key: TEZ-4175
> URL: https://issues.apache.org/jira/browse/TEZ-4175
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Attachments: TEZ-4175.01.patch, TEZ-4175.02.patch, TEZ-4175.03.patch, 
> TEZ-4175.03.patch, TEZ-4175.04.patch, TEZ-4175.05.patch
>
>
> A comment in DAGAppmaster made me think that we don't need to rely on 
> [YarnConfiguration|https://github.com/apache/hadoop/blob/branch-3.1.3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java]
>  in all cases, what if it can be replace with base Configuration...
> {code}
>   // TODO Does this really need to be a YarnConfiguration ?
>   Configuration conf = new Configuration(new YarnConfiguration());
> {code}
> In hadoop 3.1.3 source, I cannot see that it adds e.g. yarn-site as a 
> resource:
> {code}
>   public YarnConfiguration() {
> super();
>   }
>   
>   public YarnConfiguration(Configuration conf) {
> super(conf);
> if (! (conf instanceof YarnConfiguration)) {
>   this.reloadConfiguration();
> }
>   }
> {code}
> in current codebase:
> {code}
> grep -iRH "new YarnConfiguration" --include="*.java"
> tez-plugins/tez-history-parser/src/main/java/org/apache/tez/history/ATSImportTool.java:
> YarnConfiguration yarnConf = new YarnConfiguration(conf);
> tez-plugins/tez-aux-services/src/main/java/org/apache/tez/auxservices/ShuffleHandler.java:
> super.serviceInit(new YarnConfiguration(conf));
> tez-api/src/test/java/org/apache/tez/dag/api/client/rpc/TestDAGClient.java:   
>  YarnConfiguration yarnConf = new YarnConfiguration(tezConf);
> tez-api/src/test/java/org/apache/tez/dag/api/client/rpc/TestDAGClient.java:   
>  YarnConfiguration yarnConf = new YarnConfiguration(tezConf);
> tez-api/src/test/java/org/apache/tez/dag/api/client/rpc/TestDAGClient.java:   
>  YarnConfiguration yarnConf = new YarnConfiguration(tezConf);
> tez-api/src/test/java/org/apache/tez/client/TestTezClient.java:
> tezClient.init(new TezConfiguration(false), new YarnConfiguration());
> tez-api/src/main/java/org/apache/tez/client/TezClient.java:
> amConfig.setYarnConfiguration(new 
> YarnConfiguration(amConfig.getTezConfiguration()));
> tez-api/src/main/java/org/apache/tez/client/TezClient.java:
> amConfig.setYarnConfiguration(new 
> YarnConfiguration(amConfig.getTezConfiguration()));
> tez-api/src/main/java/org/apache/tez/client/TezClient.java:return 
> getDAGClient(appId, tezConf, new YarnConfiguration(tezConf), frameworkClient, 
> ugi);
> tez-tests/src/test/java/org/apache/tez/test/FaultToleranceTestRunner.java:
>   tezConf = new TezConfiguration(new YarnConfiguration());
> tez-tests/src/test/java/org/apache/tez/test/FaultToleranceTestRunner.java:
>tezConf = new TezConfiguration(new YarnConfiguration(this.conf));
> tez-mapreduce/src/test/java/org/apache/tez/mapreduce/hadoop/TestMRInputHelpers.java:
> Configuration testConf = new YarnConfiguration(
> tez-mapreduce/src/main/java/org/apache/tez/mapreduce/client/YARNRunner.java:  
>  this(conf, new ResourceMgrDelegate(new YarnConfiguration(conf)));
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration conf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration conf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration tezConf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration tezConf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration tezConf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration tezConf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration tezConf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration tezConf = new Configuration(new YarnConfiguration());
> 

[jira] [Updated] (TEZ-4175) Consider removing YarnConfiguration where it's possible

2020-08-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TEZ-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4175:
--
Attachment: TEZ-4175.05.patch

> Consider removing YarnConfiguration where it's possible
> ---
>
> Key: TEZ-4175
> URL: https://issues.apache.org/jira/browse/TEZ-4175
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Attachments: TEZ-4175.01.patch, TEZ-4175.02.patch, TEZ-4175.03.patch, 
> TEZ-4175.03.patch, TEZ-4175.04.patch, TEZ-4175.05.patch
>
>
> A comment in DAGAppmaster made me think that we don't need to rely on 
> [YarnConfiguration|https://github.com/apache/hadoop/blob/branch-3.1.3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java]
>  in all cases, what if it can be replace with base Configuration...
> {code}
>   // TODO Does this really need to be a YarnConfiguration ?
>   Configuration conf = new Configuration(new YarnConfiguration());
> {code}
> In hadoop 3.1.3 source, I cannot see that it adds e.g. yarn-site as a 
> resource:
> {code}
>   public YarnConfiguration() {
> super();
>   }
>   
>   public YarnConfiguration(Configuration conf) {
> super(conf);
> if (! (conf instanceof YarnConfiguration)) {
>   this.reloadConfiguration();
> }
>   }
> {code}
> in current codebase:
> {code}
> grep -iRH "new YarnConfiguration" --include="*.java"
> tez-plugins/tez-history-parser/src/main/java/org/apache/tez/history/ATSImportTool.java:
> YarnConfiguration yarnConf = new YarnConfiguration(conf);
> tez-plugins/tez-aux-services/src/main/java/org/apache/tez/auxservices/ShuffleHandler.java:
> super.serviceInit(new YarnConfiguration(conf));
> tez-api/src/test/java/org/apache/tez/dag/api/client/rpc/TestDAGClient.java:   
>  YarnConfiguration yarnConf = new YarnConfiguration(tezConf);
> tez-api/src/test/java/org/apache/tez/dag/api/client/rpc/TestDAGClient.java:   
>  YarnConfiguration yarnConf = new YarnConfiguration(tezConf);
> tez-api/src/test/java/org/apache/tez/dag/api/client/rpc/TestDAGClient.java:   
>  YarnConfiguration yarnConf = new YarnConfiguration(tezConf);
> tez-api/src/test/java/org/apache/tez/client/TestTezClient.java:
> tezClient.init(new TezConfiguration(false), new YarnConfiguration());
> tez-api/src/main/java/org/apache/tez/client/TezClient.java:
> amConfig.setYarnConfiguration(new 
> YarnConfiguration(amConfig.getTezConfiguration()));
> tez-api/src/main/java/org/apache/tez/client/TezClient.java:
> amConfig.setYarnConfiguration(new 
> YarnConfiguration(amConfig.getTezConfiguration()));
> tez-api/src/main/java/org/apache/tez/client/TezClient.java:return 
> getDAGClient(appId, tezConf, new YarnConfiguration(tezConf), frameworkClient, 
> ugi);
> tez-tests/src/test/java/org/apache/tez/test/FaultToleranceTestRunner.java:
>   tezConf = new TezConfiguration(new YarnConfiguration());
> tez-tests/src/test/java/org/apache/tez/test/FaultToleranceTestRunner.java:
>tezConf = new TezConfiguration(new YarnConfiguration(this.conf));
> tez-mapreduce/src/test/java/org/apache/tez/mapreduce/hadoop/TestMRInputHelpers.java:
> Configuration testConf = new YarnConfiguration(
> tez-mapreduce/src/main/java/org/apache/tez/mapreduce/client/YARNRunner.java:  
>  this(conf, new ResourceMgrDelegate(new YarnConfiguration(conf)));
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration conf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration conf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration tezConf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration tezConf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration tezConf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration tezConf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration tezConf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration tezConf = new Configuration(new YarnConfiguration());
> tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestContainerReuse.java:
> Configuration tezConf = new Configuration(new YarnConfiguration());
> 

[jira] [Commented] (TEZ-4223) Adding new jars or resources after the first DAG runs does not work.

2020-08-13 Thread Harish JP (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176787#comment-17176787
 ] 

Harish JP commented on TEZ-4223:


Sorry, I messed up and committed the older patch to master and branch-0.9. I've 
reverted both and added a new patch, to master: 
c047fde127a7ec2448c7851e89366cb3b0b03136.

> Adding new jars or resources after the first DAG runs does not work.
> 
>
> Key: TEZ-4223
> URL: https://issues.apache.org/jira/browse/TEZ-4223
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Harish JP
>Assignee: Harish JP
>Priority: Major
> Attachments: TEZ-4223.02.patch, TEZ-4223.03.patch, TEZ-4223.04.patch
>
>
> If we executed DAG which needs additional jars after the first DAG is run, we 
> get ClassNotFoundException.
>  
>  
> {noformat}
> 2020-08-03 13:57:14,776 [INFO] [Dispatcher thread {Central}] |impl.DAGImpl|: 
> Added additional resources : 
> [[file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/commons-pool-1.5.4.jar,
>  
> file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/postgresql-42.2.8.jar,
>  
> file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/hive-jdbc-handler-3.1.3000.7.2.2.0-73.jar,
>  
> file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/mssql-jdbc-6.2.1.jre7.jar,
>  
> file:/dataroot/ycloud/yarn/nm/usercache/hive/appcache/application_1596442677646_0012/container_1596442677646_0012_01_01/commons-dbcp-1.4.jar]]
>  to classpath
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.apache.hive.storage.jdbc.JdbcInputFormat
> Serialization trace:
> inputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
> aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:156)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
> ...
> ...
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hive.storage.jdbc.JdbcInputFormat
> at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:583)
> at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
> at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
> at java.base/java.lang.Class.forName0(Native Method)
> at java.base/java.lang.Class.forName(Class.java:398)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
> ... 46 more{noformat}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)