[jira] [Commented] (TEZ-4271) Add config to limit desiredNumSplits

2021-01-26 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17272469#comment-17272469
 ] 

Rajesh Balamohan commented on TEZ-4271:
---

Hi [~amagyar]: yes, from the Hive side.

> Add config to limit desiredNumSplits
> 
>
> Key: TEZ-4271
> URL: https://issues.apache.org/jira/browse/TEZ-4271
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>
> There are multiple config parameters (like tez.grouping.min/max-size, 
> tez.grouping.by-length, tez.grouping.by-count, 
> tez.grouping.node.local.only) that impact the number of grouped input splits, 
> but there is no single property for setting an exact upper limit on the 
> desired count.
> In Hive the max number of buckets is 4095. During an insert overwrite each 
> task writes its own bucket, and when TEZ runs more than 4095 tasks Hive fails 
> with a bucketId out of range exception.
>  
> When "tez.grouping.by-count" is used, clamping the desiredNumSplits would 
> be easy. However, when "tez.grouping.by-length" is enabled (which is the 
> default), clamping desiredNumSplits is not enough, since TEZ might generate a 
> few more splits than desired.
> For example:
>  * originalSplits: [10, 10, 10, 10, 10, 10, 10, 10, 10, 10] where the first 5 
> are on node0 and the other 5 on node1.
>  * desiredNumSplits: 4
>  * Total size: 100
>  * lengthPerGroup: 100 / 4 = 25
>  * group0: [node0=>10, node0=>10]
>  * group1: [node1=>10, node1=>10]
>  * group2: [node0=>10, node0=>10]
>  * group3: [node1=>10, node1=>10]
>  * group4: default-rack=>[node0=>10, node1=>10]
>  
> The lengthPerGroup prevents adding more than 2 splits to a group, 
> resulting in 5 groups instead of the desired 4.
>  
> If 25 was rounded up to 30 (lengthPerGroup = ceil(25 / 10) * 10) it would 
> generate 3. But we can't assume all splits have the same size (?)
> We might need to detect in the loop whether groupedSplits.size() is greater 
> than desired, and redistribute the remaining splits across the existing 
> groups (either in a round-robin fashion or by selecting the smallest), 
> instead of creating new groups. This might cause existing groups to be 
> converted to rack-local groups if the node locality of the remaining splits 
> is different from the locality of the existing groups.
> Or we could do a second pass after groupedSplits is fully calculated and try 
> to merge existing groups. Either way this complicates the logic even further. 
> At this point I'm not sure what the best approach would be. 
> [~rajesh.balamohan], [~t3rmin4t0r] do you have any suggestions?
> {code:java}
> Error while compiling statement: FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, 
> vertexId=vertex_1610498854304_0004_1_00, diagnostics=[Task failed, 
> taskId=task_1610498854304_0004_1_00_004098, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Error while running task ( failure ) : 
> attempt_1610498854304_0004_1_00_004098_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>  at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>  at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
>  at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
>  at java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:422) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>  at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
>  at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
>  at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748) Caused by: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
>  at 
> 
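For reference, the overshoot described in the quoted example above can be reproduced with a small, self-contained sketch. This is not TezSplitGrouper itself, only an illustration of the by-length arithmetic (lengthPerGroup = 100 / 4 = 25, at most two node-local splits per group, leftovers combined into a rack-local group); the numbers are taken from the example and everything else is invented for illustration.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of the by-length arithmetic from the example above.
// NOT TezSplitGrouper; it only reproduces the overshoot: lengthPerGroup =
// 100 / 4 = 25 fits at most two 10-byte splits per node-local group, so the
// ten original splits end up in 5 groups instead of the desired 4.
public class GroupingOvershootSketch {
  public static void main(String[] args) {
    // 5 splits of size 10 on node0 and 5 on node1, as in the example.
    Map<String, List<Long>> splitsByNode = new LinkedHashMap<>();
    splitsByNode.put("node0", new ArrayList<>(Collections.nCopies(5, 10L)));
    splitsByNode.put("node1", new ArrayList<>(Collections.nCopies(5, 10L)));

    long totalSize = 0;
    for (List<Long> splits : splitsByNode.values()) {
      for (long s : splits) {
        totalSize += s;                               // 100
      }
    }
    int desiredNumSplits = 4;
    long lengthPerGroup = totalSize / desiredNumSplits; // 25

    List<List<Long>> groups = new ArrayList<>();
    List<Long> leftovers = new ArrayList<>();

    // Node-local pass: close a group once the next split would exceed lengthPerGroup.
    for (List<Long> nodeSplits : splitsByNode.values()) {
      List<Long> current = new ArrayList<>();
      long currentLen = 0;
      for (long split : nodeSplits) {
        if (!current.isEmpty() && currentLen + split > lengthPerGroup) {
          groups.add(current);
          current = new ArrayList<>();
          currentLen = 0;
        }
        current.add(split);
        currentLen += split;
      }
      if (current.size() == 1) {
        leftovers.addAll(current);   // lone remainder is deferred to a rack-local pass
      } else if (!current.isEmpty()) {
        groups.add(current);
      }
    }
    if (!leftovers.isEmpty()) {
      groups.add(leftovers);         // default-rack group combining node0/node1 leftovers
    }

    System.out.println("desired = " + desiredNumSplits + ", actual = " + groups.size()); // 4 vs 5
  }
}
{code}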

[jira] [Commented] (TEZ-4271) Add config to limit desiredNumSplits

2021-01-26 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17272078#comment-17272078
 ] 

Attila Magyar commented on TEZ-4271:
---

??Another option could be to set "tez.grouping.by-count" (& related configs) as 
a corrective option from upstream and recompute the splits depending on the 
use case (i.e. when the computed splits are much higher than a specific value, 
e.g. 4096)??.

Hi [~rajesh.balamohan], are you suggesting setting "tez.grouping.by-count" on 
the Hive side or in TezSplitGrouper? In any case, if I understand correctly, 
this would require running the grouping logic, checking the resulting count, 
and then rerunning the grouping with "tez.grouping.by-count".

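If that reading is right, the corrective flow would look roughly like the sketch below. This is only an illustration of the control flow under assumptions: {{runGrouping}} is a hypothetical stand-in for whatever grouping entry point the Hive side (or TezSplitGrouper) would use, and the 4095 limit comes from the issue description.

{code:java}
import java.util.List;
import org.apache.hadoop.conf.Configuration;

// Hypothetical two-pass "corrective" grouping. runGrouping() is a stand-in
// for the real split-grouping call; only the control flow is the point here.
public class CorrectiveGroupingSketch {
  static final int MAX_BUCKETS = 4095;   // Hive's bucketId limit from the description

  static List<?> groupWithLimit(Configuration conf, int desiredNumSplits) {
    // First pass: default behaviour (tez.grouping.by-length).
    List<?> grouped = runGrouping(conf, desiredNumSplits);

    if (grouped.size() > MAX_BUCKETS) {
      // Second pass: switch to count-based grouping, clamped to the limit.
      Configuration corrected = new Configuration(conf);
      corrected.setBoolean("tez.grouping.by-length", false);
      corrected.setBoolean("tez.grouping.by-count", true);
      corrected.setInt("tez.grouping.split-count", MAX_BUCKETS);
      grouped = runGrouping(corrected, MAX_BUCKETS);
    }
    return grouped;
  }

  // Placeholder for the actual grouping call (e.g. via TezSplitGrouper).
  static List<?> runGrouping(Configuration conf, int desiredNumSplits) {
    throw new UnsupportedOperationException("illustrative stub");
  }
}
{code}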
 


[jira] [Commented] (TEZ-4271) Add config to limit desiredNumSplits

2021-01-26 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17272068#comment-17272068
 ] 

Rajesh Balamohan commented on TEZ-4271:
---

"tez.grouping.by-count" is kind of experimental in nature and has not been 
battle tested. Also, setting it to a much higher value (e.g. 4096) can have an 
adverse impact on other jobs, as it would bail out when the original splits 
are fewer than the "desired split size". This can have an undesired impact on 
jobs when there is enough cluster capacity available.

There were other corner cases with columnar storage which caused skew between 
the split sizes and the actual uncompressed data. This was the reason for 
introducing TEZ-1993 (split estimator). Grouping by count tries to do a fair 
computation, but it also depends on the incoming data. 
[https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/grouper/TezSplitGrouper.java#L374]
 

If the use case really has to restrict split counts, it would be good to try 
disabling "tez.grouping.by-length" along with "tez.grouping.by-count".
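A minimal sketch of that knob combination, assuming the suggestion means switching from length-based to count-based grouping (my reading, not confirmed here); the 4095 cap is the Hive bucket limit from the description and is illustrative only.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Assumed reading of the suggestion above: disable length-based grouping and
// let the grouper work by count, capped at Hive's bucket limit. Property
// names appear earlier in this thread; 4095 is illustrative.
public class CountBasedGroupingConfig {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setBoolean("tez.grouping.by-length", false);
    conf.setBoolean("tez.grouping.by-count", true);
    conf.setInt("tez.grouping.split-count", 4095);
    System.out.println("split-count = " + conf.get("tez.grouping.split-count"));
  }
}
{code}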

Another option could be to set "tez.grouping.by-count" (& related configs) as 
a corrective option from upstream and recompute the splits depending on the 
use case (i.e. when the computed splits are much higher than a specific value, 
e.g. 4096).


[jira] [Commented] (TEZ-4271) Add config to limit desiredNumSplits

2021-01-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/TEZ-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271964#comment-17271964
 ] 

László Bodor commented on TEZ-4271:
---

_[~rajesh.balamohan]_ 
_Have you tried adjusting tez.grouping.min/max-size instead to control the 
number of mappers being spun up?_
Yeah, the grouping size should work, assuming the customer knows the data 
characteristics and can set it, but the problem is that they bump into this 
issue from time to time because there is no hard limit. The solution we're 
trying to achieve is to set a hard limit by default, so the problem never 
comes up and no escalations are raised.


[jira] [Commented] (TEZ-4271) Add config to limit desiredNumSplits

2021-01-25 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271883#comment-17271883
 ] 

Rajesh Balamohan commented on TEZ-4271:
---

{{tez.grouping.split-count}} mainly helps in initializing the "desired number 
of splits" to the requested value set in the config. It tries to approximate 
the number of splits to the requested value (when the original split count is 
higher than the desired number of splits). It is not a hard bound and does not 
guarantee exactly that split count.

Have you tried adjusting tez.grouping.min/max-size instead to control the 
number of mappers being spun up?
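For reference, a back-of-the-envelope sketch of why the grouping sizes bound the task count: if every group must hold at least {{tez.grouping.min-size}} bytes, the grouper can produce at most roughly totalSize / minSize groups. The numbers below are purely illustrative, and the real grouper also factors in locality and the desired split count.

{code:java}
// Back-of-the-envelope arithmetic (not TezSplitGrouper itself): with a
// minimum group size, the number of grouped splits is bounded by roughly
// totalSize / minSize. All numbers are illustrative.
public class GroupingSizeBound {
  public static void main(String[] args) {
    long totalSize = 1L << 40;              // e.g. 1 TiB of input
    long minSize   = 256L * 1024 * 1024;    // e.g. tez.grouping.min-size = 256 MiB

    long maxGroups = totalSize / minSize;   // upper bound on grouped splits
    System.out.println("at most ~" + maxGroups + " grouped splits"); // ~4096
  }
}
{code}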
