[jira] [Commented] (TEZ-4542) Tez application may fail due to int overflow when record size is large and sort memory is low.
[ https://issues.apache.org/jira/browse/TEZ-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875389#comment-17875389 ]

Chenyu Zheng commented on TEZ-4542:
---

[~abstractdog] [~glapark] [~yigress] I have submitted [https://github.com/apache/tez/pull/367] to fix this problem in a different way. It also solves the problem described in TEZ-4577. As for the earlier discussion about a single particularly large record, let's revisit it after fixing TEZ-4577 first. What do you think?

> Tez application may fail due to int overflow when record size is large and
> sort memory is low.
> --
>
> Key: TEZ-4542
> URL: https://issues.apache.org/jira/browse/TEZ-4542
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.9.2
> Reporter: Chenyu Zheng
> Assignee: Chenyu Zheng
> Priority: Major
> Fix For: 0.10.4
>
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> A Tez application failed with the following error stack:
> {code:java}
>     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370)
>     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:292)
>     ... 18 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException
>     at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:402)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:907)
>     at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:643)
>     at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:675)
>     at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:753)
>     at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinObject(CommonMergeJoinOperator.java:314)
>     at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:277)
>     at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:270)
>     at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:256)
>     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:361)
>     ... 19 more
> Caused by: java.lang.IllegalArgumentException
>     at java.nio.Buffer.position(Buffer.java:244)
>     at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.<init>(PipelinedSorter.java:936)
>     at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.sort(PipelinedSorter.java:350)
>     at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:406)
>     at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.write(PipelinedSorter.java:379)
>     at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput$1.write(OrderedPartitionedKVOutput.java:167)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor$TezKVOutputCollector.collect(TezProcessor.java:204)
>     at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:541)
>     at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:385)
>     ... 28 more
> {code}
> After adding debug logging, the cause was easy to find: the variable
> {{dataSize}} in {{PipelinedSorter::SortSpan}} overflows.
> The problem is triggered when the following two conditions are met at the
> same time:
> * The vertex has too many I/Os, so the memory allocated to each I/O for
> sorting is too small.
> * The average record size is larger than 2K, so {{dataSize}} in
> {{PipelinedSorter::SortSpan}} overflows. The sorter then does not try to
> allocate less meta space, and an exception is raised.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
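The two conditions in the issue description can be reproduced in isolation. The following Java sketch is a hypothetical reconstruction, not the PipelinedSorter source. It assumes 16 bytes of metadata per record ({{METASIZE}}), the default of 1024 * 1024 items mentioned later in this thread, and that {{dataSize}} is an {{int}} product of the item count and the sampled average record size ({{perItem}}). The shrink condition is the one quoted later in the thread. With 1 MB of sort memory and {{perItem}} = 2048, the product wraps to {{Integer.MIN_VALUE}} and the shrink branch is skipped. Positioning a 1 MB buffer at 16 MB then throws the same {{IllegalArgumentException}} from {{Buffer.position}} that appears in the stack trace.

```java
import java.nio.ByteBuffer;

// Hypothetical reproduction of the TEZ-4542 overflow. Names mirror the snippet
// quoted in this thread; this is NOT the actual PipelinedSorter source.
class SortSpanOverflowDemo {
  // Assumption: 16 bytes of metadata per record.
  static final int METASIZE = 16;
  // Assumption: default item count of 1024 * 1024, as mentioned in the thread.
  static final int MAX_ITEMS = 1024 * 1024;

  // Assumed int-only data size: overflows once items * perItem exceeds
  // Integer.MAX_VALUE; with MAX_ITEMS items, perItem = 2048 wraps to Integer.MIN_VALUE.
  static int dataSizeInt(int items, int perItem) {
    return items * perItem;
  }

  // Meta space reserved for a span, following the shrink logic quoted in the thread.
  static int reservedMetasize(int capacity, int perItem) {
    int metasize = METASIZE * MAX_ITEMS; // 16 MB for 1024 * 1024 items
    int dataSize = dataSizeInt(MAX_ITEMS, perItem);
    // When dataSize has wrapped negative, this condition is false and the
    // 16 MB default is kept even though capacity is much smaller.
    if (capacity < (metasize + dataSize)) {
      // try to allocate less meta space, because we have sample data
      metasize = METASIZE * (capacity / (perItem + METASIZE));
    }
    return metasize;
  }

  // Mirrors the failing frame in the stack trace: Buffer.position(newPosition)
  // throws IllegalArgumentException when newPosition is larger than the limit.
  static boolean positionSucceeds(int capacity, int metasize) {
    ByteBuffer buffer = ByteBuffer.allocate(capacity);
    try {
      buffer.position(metasize);
      return true;
    } catch (IllegalArgumentException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    int capacity = 1 << 20; // 1 MB of sort memory (low-memory example value)
    for (int perItem : new int[] {1024, 2048}) {
      int meta = reservedMetasize(capacity, perItem);
      System.out.printf("perItem=%d dataSize=%d metasize=%d positionOk=%b%n",
          perItem, dataSizeInt(MAX_ITEMS, perItem), meta, positionSucceeds(capacity, meta));
    }
    // perItem=1024 dataSize=1073741824 metasize=16128 positionOk=true
    // perItem=2048 dataSize=-2147483648 metasize=16777216 positionOk=false
  }
}
```

With 1 KB records the product still fits in an {{int}}, the shrink branch runs, and the span reserves only about 16 KB of meta space. Doubling the record size is enough to flip both results.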
[jira] [Commented] (TEZ-4542) Tez application may fail due to int overflow when record size is large and sort memory is low.
[ https://issues.apache.org/jira/browse/TEZ-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875368#comment-17875368 ]

Chenyu Zheng commented on TEZ-4542:
---

[~glapark] [~abstractdog] If we revert this patch, we may still hit this problem. Consider an extreme case where one particular record is very large and all the other records are of normal size. With the code below, {{metasize}} will still end up small. I think we may need to remove the {{metasize}} optimization:
{code:java}
if (capacity < (metasize + dataSize)) {
  // try to allocate less meta space, because we have sample data
  metasize = METASIZE * (capacity / (perItem + METASIZE));
}
{code}
We could delete this code, even though that may waste more memory. Alternatively, we could set a minimum value for {{metasize}}. [~rbalamohan] Can you give us some advice?
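The second option in the comment above, a minimum value for {{metasize}}, could look roughly like the following Java sketch. It is hypothetical. {{MIN_ITEMS}} (64K slots) and capping the floor at half the span are illustrative assumptions, not Tez settings. The arithmetic is done in {{long}} so the overflow from this issue cannot recur. The sketch also shows why one huge record matters. If that record inflates the sampled {{perItem}} to about 1 MB, the quoted formula leaves room for metadata for only 67 records in a 64 MB span. The floor keeps 64K slots.

```java
// Hypothetical sketch of the "minimum value for metasize" idea.
// MIN_ITEMS and the capacity / 2 cap are illustrative assumptions, not Tez settings.
class MetasizeFloorSketch {
  static final long METASIZE = 16;          // assumed metadata bytes per record
  static final long MIN_ITEMS = 64 * 1024;  // hypothetical minimum number of metadata slots

  // Meta space to reserve for a span, computed in long so nothing wraps around.
  static long reservedMetasize(long capacity, long perItem, long maxItems) {
    long metasize = METASIZE * maxItems;
    long dataSize = maxItems * perItem;
    if (capacity < metasize + dataSize) {
      // shrink using the sampled average record size, as in the quoted code
      metasize = METASIZE * (capacity / (perItem + METASIZE));
      // floor: at least MIN_ITEMS slots, but never more than half of the span
      long floorItems = Math.min(MIN_ITEMS, capacity / 2 / METASIZE);
      metasize = Math.max(metasize, METASIZE * floorItems);
    }
    return metasize;
  }

  public static void main(String[] args) {
    long capacity = 64L << 20; // 64 MB span (example value)
    long maxItems = 1024 * 1024;
    // ordinary 100-byte records: the floor does not kick in
    System.out.println(reservedMetasize(capacity, 100, maxItems));
    // one huge record inflates the sampled average to ~1 MB per record
    System.out.println(reservedMetasize(capacity, 1_000_000, maxItems));
  }
}
```

With ordinary records, the result matches the unfloored formula. The trade-off raised in the comment applies only in the skewed case: the span spends extra meta space (memory) so that it is not sized around one outlier.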
[jira] [Commented] (TEZ-4542) Tez application may fail due to int overflow when record size is large and sort memory is low.
[ https://issues.apache.org/jira/browse/TEZ-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875356#comment-17875356 ]

Chenyu Zheng commented on TEZ-4542:
---

[~glapark] Do you have any performance test results after reverting this patch?
[jira] [Commented] (TEZ-4542) Tez application may fail due to int overflow when record size is large and sort memory is low.
[ https://issues.apache.org/jira/browse/TEZ-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875355#comment-17875355 ]

Chenyu Zheng commented on TEZ-4542:
---

[~glapark] OK, let's revert this patch first and then solve this problem another way. cc [~abstractdog]
[jira] [Commented] (TEZ-4542) Tez application may fail due to int overflow when record size is large and sort memory is low.
[ https://issues.apache.org/jira/browse/TEZ-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875338#comment-17875338 ]

Sungwoo Park commented on TEZ-4542:
---

I suggest reverting this patch before releasing Tez 0.10.4, because it puts severe memory pressure on the Tez runtime. The performance issue can be reproduced with TPC-DS query 67 at the 10TB scale. In our local cluster, we see:
* Before applying the patch: about 600 seconds
* After applying the patch: about 9800 seconds

This performance issue is probably what is reported in TEZ-4577. We used Hive-MR3 (instead of Hive-Tez), but I am certain that the same issue can be reproduced with Hive-LLAP.

It makes sense to leave maxItems at a constant (1024 * 1024) for efficiency. To fix the bug reported in this JIRA, we could employ new logic that changes maxItems only when necessary.
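The suggestion above to keep maxItems at 1024 * 1024 and change it only when necessary can be sketched as follows. This is a hypothetical Java illustration. It is not the code from this patch or from the follow-up pull request. It assumes 16 bytes of metadata per record. It treats "necessary" as the case where metadata plus data for the default item count, at the sampled average record size ({{perItem}}), would not fit in the span. The product is computed in {{long}}, so a 2 KB record size can no longer skip the check by wrapping negative.

```java
// Hypothetical sketch: keep maxItems at a constant unless it cannot fit.
// Not the code from the TEZ-4542 patch or from the follow-up pull request.
class MaxItemsSketch {
  static final long METASIZE = 16;                  // assumed metadata bytes per record
  static final int DEFAULT_MAX_ITEMS = 1024 * 1024; // constant suggested in the comment

  // Items to plan for in a span of `capacity` bytes, given the sampled average record size.
  static int maxItems(long capacity, long perItem) {
    // long arithmetic: DEFAULT_MAX_ITEMS * (METASIZE + perItem) cannot wrap around
    long needed = DEFAULT_MAX_ITEMS * (METASIZE + perItem);
    if (needed <= capacity) {
      return DEFAULT_MAX_ITEMS; // common case: keep the constant
    }
    // necessary case: only as many items as fit at the sampled size (at least one);
    // fit is below DEFAULT_MAX_ITEMS here, so the int cast is safe
    long fit = capacity / (METASIZE + perItem);
    return (int) Math.max(1, fit);
  }

  public static void main(String[] args) {
    System.out.println(maxItems(1L << 30, 100));  // ample memory: constant kept
    System.out.println(maxItems(1L << 20, 2048)); // low memory, large records: reduced
  }
}
```

When memory is ample, the constant is returned unchanged, which is the case the comment argues matters for performance. With 1 MB and 2 KB records, the sketch falls back to 508 items instead of reserving 16 MB of meta space.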
[jira] [Commented] (TEZ-4542) Tez application may fail due to int overflow when record size is large and sort memory is low.
[ https://issues.apache.org/jira/browse/TEZ-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875307#comment-17875307 ]

Yi Zhang commented on TEZ-4542:
---

[~zhengchenyu] could you take a look at TEZ-4577?
[jira] [Commented] (TEZ-4542) Tez application may fail due to int overflow when record size is large and sort memory is low.
[ https://issues.apache.org/jira/browse/TEZ-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846208#comment-17846208 ]

Chenyu Zheng commented on TEZ-4542:
---

Thanks [~abstractdog] and [~rbalamohan] for the review! [~abstractdog] BTW, do you mind taking a look at HIVE-27985?
[jira] [Commented] (TEZ-4542) Tez application may fail due to int overflow when record size is large and sort memory is low.
[ https://issues.apache.org/jira/browse/TEZ-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846206#comment-17846206 ]

László Bodor commented on TEZ-4542:
---

Merged to master, thanks [~zhengchenyu] for the patch and [~rbalamohan] for the review!