[jira] [Commented] (TEZ-3985) Correctness: Throw a clear exception for DMEs sent during cleanup

2021-01-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271946#comment-17271946
 ] 

Hadoop QA commented on TEZ-3985:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
49s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  0m 
42s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
37s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 12s{color} 
| {color:red} tez-runtime-internals generated 2 new + 2 unchanged - 0 fixed = 4 
total (was 2) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 11s{color} | {color:orange} tez-runtime-internals: The patch generated 2 new 
+ 73 unchanged - 0 fixed = 75 total (was 73) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
47s{color} | {color:green} tez-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
29s{color} | {color:green} tez-runtime-internals in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 16m 11s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/PreCommit-TEZ-Build/94/artifact/out/Dockerfile 
|
| JIRA Issue | TEZ-3985 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13019402/TEZ-3985.6.patch |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs 
checkstyle compile |
| uname | Linux 587575b524a9 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / 7374b69ed |
| Default Java | Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| javac | 
https://ci-hadoop.apache.org/job/PreCommit-TEZ-Build/94/artifact/out/diff-compile-javac-tez-runtime-internals.txt
 |
| checkstyle | 
https://ci-hadoop.apache.org/job/PreCommit-TEZ-Build/94/artifact/out/diff-checkstyle-tez-runtime-internals.txt
 |
|  Test Results | 
https://ci-hadoop.apache.org/job/PreCommit-TEZ-Build/94/testReport/ |
| Max. process+thread co

[jira] [Updated] (TEZ-3985) Correctness: Throw a clear exception for DMEs sent during cleanup

2021-01-26 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/TEZ-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-3985:
--
Attachment: TEZ-3985.7.patch

> Correctness: Throw a clear exception for DMEs sent during cleanup
> -
>
> Key: TEZ-3985
> URL: https://issues.apache.org/jira/browse/TEZ-3985
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Gopal Vijayaraghavan
>Assignee: Jaume M
>Priority: Major
> Attachments: TEZ-3985.1.patch, TEZ-3985.2.patch, TEZ-3985.3.patch, 
> TEZ-3985.3.patch, TEZ-3985.4.patch, TEZ-3985.5.patch, TEZ-3985.6.patch, 
> TEZ-3985.7.patch
>
>
> If a DME is sent during cleanup, that implies that the .close() of the 
> LogicalIOProcessorRuntimeTask did not succeed, and therefore these events are 
> an error condition.
> These events should not be sent and, more importantly, should not be received 
> by the AM.
> Throw a clear exception in this case and allow developers to locate the 
> extraneous event from the backtrace.
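
A minimal sketch of the kind of guard described above (class and method names are illustrative, not the actual Tez members):

{code:java}
import java.util.List;

// Illustrative only: reject events generated after cleanup has started, so the
// offending sender shows up clearly in the backtrace. Names here are hypothetical,
// not the actual LogicalIOProcessorRuntimeTask members.
class CleanupEventGuard {
  private volatile boolean cleanupStarted = false;

  void startCleanup() {
    cleanupStarted = true;
  }

  void onEventsGenerated(List<Object> events) {
    if (cleanupStarted && events != null && !events.isEmpty()) {
      throw new IllegalStateException(
          "Event(s) generated during cleanup; close() likely did not succeed: " + events);
    }
    // ...normal dispatch to the AM would happen here...
  }
}
{code}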



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TEZ-4271) Add config to limit desiredNumSplits

2021-01-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/TEZ-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271964#comment-17271964
 ] 

László Bodor commented on TEZ-4271:
---

_[~rajesh.balamohan]_ 
_Have you tried adjusting tez.grouping.min/max-size instead to control the 
number of mappers being spun up?_
Yeah, the grouping size should work, assuming the customer knows the data 
characteristics and can set it, but the problem is that they bump into this 
issue from time to time because there is no hard limit. The solution we're 
trying to achieve is to set a hard limit by default, so the problem never comes 
up in the first place and escalations aren't raised.

> Add config to limit desiredNumSplits
> 
>
> Key: TEZ-4271
> URL: https://issues.apache.org/jira/browse/TEZ-4271
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>
> There are multiple config parameters (like tez.grouping.min/max-size, 
> tez.grouping.by-length, tez.grouping.by-count, tez.grouping.node.local.only) 
> that impact the number of grouped input splits, but there is no single 
> property for setting a hard upper limit on the desired count.
> In Hive the max number of buckets is 4095. During an insert overwrite, each 
> task writes its own bucket, and when TEZ runs more than 4095 tasks Hive fails 
> with a bucketId out of range exception.
>  
> When "tez.grouping.by-count" is used, clamping the desiredNumSplits would be 
> easy. However, when "tez.grouping.by-length" is enabled (which is the 
> default), clamping desiredNumSplits is not enough, since TEZ might generate a 
> few more splits than desired.
> For example:
>  * originalSplits: [10, 10, 10, 10, 10, 10, 10, 10, 10, 10] where the first 5 
> are on node0 and the other 5 are on node1.
>  * desiredNumSplits: 4
>  * Total size: 100
>  * lengthPerGroup: 100 / 4 = 25
>  * group0: [node0=>10, node0=>10]
>  * group1: [node1=>10, node1=>10]
>  * group2: [node0=>10, node0=>10]
>  * group3: [node1=>10, node1=>10]
>  * group4: default-rack=>[node0=>10, node1=>10]
>  
> The lengthPerGroup prevents adding more than 2 splits into a group, 
> resulting in 5 groups instead of the 4 desired.
>  
> If 25 were rounded up to 30 (lengthPerGroup = ceil(25 / 10) * 10) it would 
> generate 3 groups. But we can't assume all splits have the same size (?)
> We might need to detect in the loop if groupedSplits.size() is greater than 
> desired, and redistribute the remaining splits across the existing groups 
> (either in a round-robin fashion or by selecting the smallest), instead of 
> creating new groups. This might cause existing groups to be converted to 
> rack-local groups if the node locality of the remaining splits is different 
> from that of the existing groups.
> Alternatively, we could do a second pass after groupedSplits is fully 
> calculated and try to merge existing groups. Either way this complicates the 
> logic even further. At this point I'm not sure what would be best. 
> [~rajesh.balamohan], [~t3rmin4t0r], do you have any suggestions?
> {code:java}
> Error while compiling statement: FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, 
> vertexId=vertex_1610498854304_0004_1_00, diagnostics=[Task failed, 
> taskId=task_1610498854304_0004_1_00_004098, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Error while running task ( failure ) : 
> attempt_1610498854304_0004_1_00_004098_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>  at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>  at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
>  at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
>  at java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:422) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>  at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
>  at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
>  at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.util.concurrent.ThreadPoolExecutor.r

[jira] [Commented] (TEZ-3985) Correctness: Throw a clear exception for DMEs sent during cleanup

2021-01-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271970#comment-17271970
 ] 

Hadoop QA commented on TEZ-3985:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
59s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
49s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
1s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  0m 
43s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
40s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 12s{color} | {color:orange} tez-runtime-internals: The patch generated 2 new 
+ 72 unchanged - 1 fixed = 74 total (was 73) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
34s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
50s{color} | {color:green} tez-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
31s{color} | {color:green} tez-runtime-internals in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 16m 46s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/PreCommit-TEZ-Build/95/artifact/out/Dockerfile 
|
| JIRA Issue | TEZ-3985 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13019404/TEZ-3985.7.patch |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs 
checkstyle compile |
| uname | Linux d69c64cb9434 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / 7374b69ed |
| Default Java | Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| checkstyle | 
https://ci-hadoop.apache.org/job/PreCommit-TEZ-Build/95/artifact/out/diff-checkstyle-tez-runtime-internals.txt
 |
|  Test Results | 
https://ci-hadoop.apache.org/job/PreCommit-TEZ-Build/95/testReport/ |
| Max. process+thread count | 257 (vs. ulimit of 5500) |
| modules | C: tez-api tez-runtime-internals U: . |
| Console output | 
https://ci-hadoop.apache.org/job/PreCommit-TEZ-Build/95/console |
| versions 

[jira] [Commented] (TEZ-4271) Add config to limit desiredNumSplits

2021-01-26 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272068#comment-17272068
 ] 

Rajesh Balamohan commented on TEZ-4271:
---

"tez.grouping.by-count" is kind of experimental in nature and has not been 
battle tested. Also, setting it to much higher value (e.g 4096) can have 
adverse impact on other jobs, as it would bail out, when the original splits 
are lesser than "desired split size". This can have undesired impact on jobs 
when there is enough cluster capacity available.

There were other corner cases with columnar storage which caused skew in the 
split sizes vs actual uncompressed data. This was the reason for introducing 
TEZ-1993 (split estimator). Group by count tries to do a fair computation, but 
it also depends on the incoming data. 
[https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/grouper/TezSplitGrouper.java#L374]
 

If the use case really has to restrict split counts, it would be good to try 
disabling "tez.grouping.by-length" along with "tez.grouping.by-count".

Another option could be to set "tez.grouping.by-count" (and related configs) as 
a corrective option from upstream and recompute the splits depending on the use 
case (i.e., when the computed split count is much higher than a specific value, 
e.g. 4096).


[jira] [Commented] (TEZ-4271) Add config to limit desiredNumSplits

2021-01-26 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272078#comment-17272078
 ] 

Attila Magyar commented on TEZ-4271:


??Another option could be to set "tez.grouping.by-count" (and related configs) 
as a corrective option from upstream and recompute the splits depending on the 
use case (i.e., when the computed split count is much higher than a specific 
value, e.g. 4096)??.

Hi [~rajesh.balamohan], are you suggesting setting "tez.grouping.by-count" on 
the Hive side or in TezSplitGrouper? In either case, if I understand correctly, 
this would need to run the grouping logic, check the resulting count, and rerun 
the grouping with "tez.grouping.by-count".

 


[jira] [Updated] (TEZ-4271) Add config to limit desiredNumSplits

2021-01-26 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/TEZ-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated TEZ-4271:
---
Description: 
There are multiple config parameters (like tez.grouping.min/max-size, 
tez.grouping.by-length, tez.grouping.by-count, tez.grouping.node.local.only) 
that impact the number of grouped input splits, but there is no single property 
for setting a hard upper limit on the desired count.

In Hive the max number of buckets is 4095. During an insert overwrite, each 
task writes its own bucket, and when TEZ runs more than 4095 tasks Hive fails 
with a bucketId out of range exception.

 

When "tez.grouping.by-count" is used then clamping the desiredNumSplits would 
be easy. However when "tez.grouping.by-length" is enabled (which is the 
default) clamping desiredNumSplits is not enough since TEZ might generate a few 
more splits than the desired.

For example:
 * originalSplits: [10, 10, 10, 10, 10, 10, 10, 10, 10, 10] where the first 5 
are on node0 and the other 5 are on node1.
 * desiredNumSplits: 4
 * Total size: 100
 * lengthPerGroup: 100 / 4 = 25
 * group0: [node0=>10, node0=>10]
 * group1: [node1=>10, node1=>10]
 * group2: [node0=>10, node0=>10]
 * group2: [node1=>10, node1=>10]
 * group3: [node1=>10, node1=>10]
 * group4: default-rack=>[node0=>10, node1=>10]

 

The lengthPerGroup prevents adding more than 2 splits into a group, resulting 
in 5 groups instead of the 4 desired. A toy simulation of this overshoot 
follows below.
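
To make the overshoot concrete, here is a toy simulation of the example above; it mimics only this example, not the actual TezSplitGrouper logic:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Toy reproduction of the example above (NOT TezSplitGrouper itself): node-local
// groups are capped at lengthPerGroup, and the per-node leftovers end up in an
// extra rack-level group, giving 5 groups instead of the 4 desired.
public class GroupingOvershootSketch {
  public static void main(String[] args) {
    int[][] splitsPerNode = {
        {10, 10, 10, 10, 10},   // node0
        {10, 10, 10, 10, 10}    // node1
    };
    int desiredNumSplits = 4;
    int totalSize = 100;
    int lengthPerGroup = totalSize / desiredNumSplits; // 25

    List<List<Integer>> groups = new ArrayList<>();
    List<Integer> rackLocalLeftovers = new ArrayList<>();

    for (int[] nodeSplits : splitsPerNode) {
      List<Integer> current = new ArrayList<>();
      int currentLength = 0;
      for (int split : nodeSplits) {
        // A node-local group stops before it would exceed lengthPerGroup.
        if (!current.isEmpty() && currentLength + split > lengthPerGroup) {
          groups.add(current);
          current = new ArrayList<>();
          currentLength = 0;
        }
        current.add(split);
        currentLength += split;
      }
      // In this toy version, a lone leftover split is deferred to a rack-level group.
      if (current.size() == 1) {
        rackLocalLeftovers.addAll(current);
      } else if (!current.isEmpty()) {
        groups.add(current);
      }
    }
    if (!rackLocalLeftovers.isEmpty()) {
      groups.add(rackLocalLeftovers); // the "default-rack" group in the example
    }

    System.out.println("desired=" + desiredNumSplits + ", actual=" + groups.size()
        + " groups: " + groups);
    // Prints: desired=4, actual=5 groups: [[10, 10], [10, 10], [10, 10], [10, 10], [10, 10]]
  }
}
{code}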

 

If 25 were rounded up to 30 (lengthPerGroup = ceil(25 / 10) * 10) it would 
generate 3 groups. But we can't assume all splits have the same size (?)

We might need to detect in the loop if groupedSplits.size() is greater than 
desired, and redistribute the remaining splits across the existing groups 
(either in a round-robin fashion or by selecting the smallest), instead of 
creating new groups. This might cause existing groups to be converted to 
rack-local groups if the node locality of the remaining splits is different 
from that of the existing groups.

Alternatively, we could do a second pass after groupedSplits is fully 
calculated and try to merge existing groups. Either way this complicates the 
logic even further. At this point I'm not sure what would be best. 
[~rajesh.balamohan], [~t3rmin4t0r], do you have any suggestions?
{code:java}
Error while compiling statement: FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, 
vertexId=vertex_1610498854304_0004_1_00, diagnostics=[Task failed, 
taskId=task_1610498854304_0004_1_00_004098, diagnostics=[TaskAttempt 0 failed, 
info=[Error: Error while running task ( failure ) : 
attempt_1610498854304_0004_1_00_004098_0:java.lang.RuntimeException: 
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) 
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
 at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
 at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
 at java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
 at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
 at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
 at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at 
org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
 at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
 at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437)
 at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
 ... 15 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive 
Runtime Error while processing row at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:573) at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
 ... 18 more Caused by: org.apache.ha

[jira] [Commented] (TEZ-4240) Remove SHA-256 from Tez

2021-01-26 Thread Jonathan Turner Eagles (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272243#comment-17272243
 ] 

Jonathan Turner Eagles commented on TEZ-4240:
-

+1. Seems like a good change to get in.

> Remove SHA-256 from Tez
> ---
>
> Key: TEZ-4240
> URL: https://issues.apache.org/jira/browse/TEZ-4240
> Project: Apache Tez
>  Issue Type: Improvement
>Affects Versions: 0.9.2
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Fix For: 0.10.1, 0.9.3
>
> Attachments: TEZ-4240.01.patch
>
>
> SHA-256 is being deprecated, and it's recommended to be replaced by at least 
> SHA-384. Tez uses SHA-256 for resource validation, and even if it doesn't 
> seem to be a direct vulnerability, it's better to upgrade it in order to 
> remain FIPS compliant.
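
For reference, the algorithm swap itself is a one-liner with the JDK's MessageDigest; this is a generic illustration, not the actual Tez resource-validation code:

{code:java}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Generic JDK illustration of moving from SHA-256 to SHA-384 for a digest.
public class DigestSketch {
  public static void main(String[] args) throws NoSuchAlgorithmException {
    byte[] resource = "some-local-resource-bytes".getBytes(StandardCharsets.UTF_8);
    // Before: MessageDigest.getInstance("SHA-256"); after: SHA-384 (both ship with the JDK).
    byte[] digest = MessageDigest.getInstance("SHA-384").digest(resource);
    System.out.println("SHA-384 digest length: " + digest.length + " bytes"); // prints 48
  }
}
{code}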



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TEZ-4240) Remove SHA-256 from Tez

2021-01-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/TEZ-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272253#comment-17272253
 ] 

László Bodor commented on TEZ-4240:
---

Thanks, I'll rebase and retrigger tests, and then commit if everything is fine.

> Remove SHA-256 from Tez
> ---
>
> Key: TEZ-4240
> URL: https://issues.apache.org/jira/browse/TEZ-4240
> Project: Apache Tez
>  Issue Type: Improvement
>Affects Versions: 0.9.2
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Fix For: 0.10.1, 0.9.3
>
> Attachments: TEZ-4240.01.patch
>
>
> SHA-256 is being deprecated, and it's recommended to be replaced by at least 
> SHA-384. Tez uses SHA-256 for resource validation, and even if it doesn't 
> seem to be a direct vulnerability, it's better to upgrade it in order to 
> remain FIPS compliant.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TEZ-4271) Add config to limit desiredNumSplits

2021-01-26 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272469#comment-17272469
 ] 

Rajesh Balamohan commented on TEZ-4271:
---

Hi [~amagyar]: yes, from the Hive side.
