[jira] [Created] (TEZ-2603) Archive old tez releases
Hitesh Shah created TEZ-2603: Summary: Archive old tez releases Key: TEZ-2603 URL: https://issues.apache.org/jira/browse/TEZ-2603 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah This requires updating the website links to point to the dist archive instead of the public mirrors. Also, the old releases should be removed from dist/releases. We should probably keep just the latest versions of 0.5, 0.6, and 0.7 and drop all the old ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: TEZ-2496 PreCommit Build #888
Jira: https://issues.apache.org/jira/browse/TEZ-2496 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/888/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 3031 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12743967/TEZ-2496.8.patch against master revision cb59851. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/888//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/888//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-library.html Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/888//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 272bccd126b7eae25bc48b9a20d825929d1390e9 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #886 Archived 47 artifacts Archive block size is 32768 Received 6 blocks and 2745509 bytes Compression is 6.7% Took 0.64 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (TEZ-2496) Consider scheduling tasks in ShuffleVertexManager based on the partition sizes from the source
[ https://issues.apache.org/jira/browse/TEZ-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616879#comment-14616879 ] TezQA commented on TEZ-2496: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12743967/TEZ-2496.8.patch against master revision cb59851. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/888//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/888//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-library.html Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/888//console This message is automatically generated. Consider scheduling tasks in ShuffleVertexManager based on the partition sizes from the source -- Key: TEZ-2496 URL: https://issues.apache.org/jira/browse/TEZ-2496 Project: Apache Tez Issue Type: Improvement Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Attachments: TEZ-2496.1.patch, TEZ-2496.2.patch, TEZ-2496.3.patch, TEZ-2496.4.patch, TEZ-2496.5.patch, TEZ-2496.6.patch, TEZ-2496.7.patch, TEZ-2496.8.patch Consider scheduling tasks in ShuffleVertexManager based on the partition sizes from the source. 
This would be helpful in scenarios where there are limited resources (or concurrent jobs running, or multiple waves) with data skew, and the task which gets a large amount of data gets scheduled much later. e.g. Consider the following hive query running in a queue with limited capacity (42 slots in total) @ 200 GB scale {noformat} CREATE TEMPORARY TABLE sampleData AS SELECT CASE WHEN ss_sold_time_sk IS NULL THEN 70429 ELSE ss_sold_time_sk END AS ss_sold_time_sk, ss_item_sk, ss_customer_sk, ss_cdemo_sk, ss_hdemo_sk, ss_addr_sk, ss_store_sk, ss_promo_sk, ss_ticket_number, ss_quantity, ss_wholesale_cost, ss_list_price, ss_sales_price, ss_ext_discount_amt, ss_ext_sales_price, ss_ext_wholesale_cost, ss_ext_list_price, ss_ext_tax, ss_coupon_amt, ss_net_paid, ss_net_paid_inc_tax, ss_net_profit, ss_sold_date_sk FROM store_sales distribute by ss_sold_time_sk; {noformat} This generated 39 maps and 134 reduce slots (3 reduce waves). When there are lots of nulls for ss_sold_time_sk, the data would tend to skew towards 70429. If the reducer which gets this data is scheduled much earlier (i.e. in the first wave itself), the entire job would finish faster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
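The skew-aware scheduling idea in the description can be sketched outside of Tez. The following is a minimal, hypothetical illustration (the class, method, and sizes are invented for this sketch, not actual ShuffleVertexManager APIs): pending reduce tasks are ordered by reported partition size, largest first, so a skewed partition (such as the 70429 null bucket above) lands in the first wave.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: choose a scheduling order for pending reduce tasks
// so that the task with the largest reported input size runs first.
public class SkewAwareScheduling {
    public static List<Integer> scheduleOrder(long[] partitionSizes) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < partitionSizes.length; i++) {
            order.add(i);
        }
        // Largest partitions first, so a skewed partition is not left
        // to the last wave when slots are limited.
        order.sort(Comparator.comparingLong((Integer i) -> partitionSizes[i]).reversed());
        return order;
    }

    public static void main(String[] args) {
        long[] sizes = {10, 500, 20, 5};
        System.out.println(scheduleOrder(sizes)); // [1, 2, 0, 3]
    }
}
```

With three waves and one heavily skewed partition, this ordering lets the long-pole reducer start in the first wave instead of the last, which is the speedup the description argues for.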
[jira] [Commented] (TEZ-2602) Throwing EOFException when launching MR job
[ https://issues.apache.org/jira/browse/TEZ-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617299#comment-14617299 ] Hitesh Shah commented on TEZ-2602: -- \cc [~rajesh.balamohan] Throwing EOFException when launching MR job --- Key: TEZ-2602 URL: https://issues.apache.org/jira/browse/TEZ-2602 Project: Apache Tez Issue Type: Bug Affects Versions: 0.8.0 Reporter: Tsuyoshi Ozawa {quote} $hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount -Dmapreduce.framework.name=yarn-tez -Dmapr ed.reduce.tasks=15 -Dtez.runtime.sort.threads=1 wc10g tezwc10g5 15/07/07 13:24:30 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8081 15/07/07 13:24:30 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200 15/07/07 13:24:30 INFO mapreduce.Job: The url to track the job: http://ip-172-31-4-8.ap-northeast-1.compute.internal:8088/proxy/application_1435943097882_0019/ 15/07/07 13:24:30 INFO mapreduce.Job: Running job: job_1435943097882_0019 15/07/07 13:24:31 INFO mapreduce.Job: Job job_1435943097882_0019 running in uber mode : false 15/07/07 13:24:31 INFO mapreduce.Job: map 0% reduce 0% 15/07/07 13:24:59 INFO mapreduce.Job: Job job_1435943097882_0019 failed with state FAILED due to: Vertex failed, vertexName=initialmap, vertexId=vertex_1435943097882_0019_1_00, diagnostics=[Task failed, taskId=task_1435943097882_0019_1_00_05, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.io.EOFException at java.io.DataInputStream.readFully(DataInputStream.java:197) at org.apache.hadoop.io.Text.readWithKnownLength(Text.java:319) at org.apache.hadoop.io.Text.readFields(Text.java:291) at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71) at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42) at 
org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:142) at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121) at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:302) at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170) at org.apache.tez.mapreduce.combine.MRCombiner.runNewCombiner(MRCombiner.java:191) at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:115) at org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.runCombineProcessor(ExternalSorter.java:285) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.spill(PipelinedSorter.java:463) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.sort(PipelinedSorter.java:219) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:311)
[jira] [Comment Edited] (TEZ-2602) Throwing EOFException when launching MR job
[ https://issues.apache.org/jira/browse/TEZ-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617299#comment-14617299 ] Hitesh Shah edited comment on TEZ-2602 at 7/7/15 8:17 PM: -- \cc [~rajesh.balamohan] [~ozawa] Does the job also fail if the sorter is set to LEGACY instead of pipelined? was (Author: hitesh): \cc [~rajesh.balamohan] Throwing EOFException when launching MR job --- Key: TEZ-2602 URL: https://issues.apache.org/jira/browse/TEZ-2602 Project: Apache Tez Issue Type: Bug Affects Versions: 0.8.0 Reporter: Tsuyoshi Ozawa {quote} $hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount -Dmapreduce.framework.name=yarn-tez -Dmapr ed.reduce.tasks=15 -Dtez.runtime.sort.threads=1 wc10g tezwc10g5 15/07/07 13:24:30 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8081 15/07/07 13:24:30 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200 15/07/07 13:24:30 INFO mapreduce.Job: The url to track the job: http://ip-172-31-4-8.ap-northeast-1.compute.internal:8088/proxy/application_1435943097882_0019/ 15/07/07 13:24:30 INFO mapreduce.Job: Running job: job_1435943097882_0019 15/07/07 13:24:31 INFO mapreduce.Job: Job job_1435943097882_0019 running in uber mode : false 15/07/07 13:24:31 INFO mapreduce.Job: map 0% reduce 0% 15/07/07 13:24:59 INFO mapreduce.Job: Job job_1435943097882_0019 failed with state FAILED due to: Vertex failed, vertexName=initialmap, vertexId=vertex_1435943097882_0019_1_00, diagnostics=[Task failed, taskId=task_1435943097882_0019_1_00_05, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.io.EOFException at java.io.DataInputStream.readFully(DataInputStream.java:197) at org.apache.hadoop.io.Text.readWithKnownLength(Text.java:319) at org.apache.hadoop.io.Text.readFields(Text.java:291) at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71) at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42) at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:142) at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121) at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:302) at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170) at org.apache.tez.mapreduce.combine.MRCombiner.runNewCombiner(MRCombiner.java:191) at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:115) at org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.runCombineProcessor(ExternalSorter.java:285) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.spill(PipelinedSorter.java:463) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.sort(PipelinedSorter.java:219) at
[jira] [Updated] (TEZ-2594) Fix licensing and notice file for minimal tarball
[ https://issues.apache.org/jira/browse/TEZ-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2594: - Issue Type: Task (was: Sub-task) Parent: (was: TEZ-2592) Fix licensing and notice file for minimal tarball - Key: TEZ-2594 URL: https://issues.apache.org/jira/browse/TEZ-2594 Project: Apache Tez Issue Type: Task Reporter: Hitesh Shah Minimal tarball needs its own license and notice file -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2496) Consider scheduling tasks in ShuffleVertexManager based on the partition sizes from the source
[ https://issues.apache.org/jira/browse/TEZ-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617362#comment-14617362 ] Bikas Saha commented on TEZ-2496: - Summarizing an offline discussion on the pros and cons of this approach 1) lives in user land - easier to iterate 2) memory efficient - shuffle vertex manager can apply policy specific to the partition stats use case. deterministic sizes mean it can aggregate upon event receipt and discard the raw data 3) cpu efficient - because it's not calling getStatistics() repeatedly 4) works with pipelining and getting early stats from running tasks. getStatistics() would get expensive for this. 5) allows for other use cases like sending partition stats to inputs for runtime optimizations. Only the shuffle vertex manager can correctly do this since it merges partitions during auto reduce. 6) Once this has stabilized and been optimized, we can transfer the logic to a partition stats API that would be generally available as part of the system. Consider scheduling tasks in ShuffleVertexManager based on the partition sizes from the source -- Key: TEZ-2496 URL: https://issues.apache.org/jira/browse/TEZ-2496 Project: Apache Tez Issue Type: Improvement Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Attachments: TEZ-2496.1.patch, TEZ-2496.2.patch, TEZ-2496.3.patch, TEZ-2496.4.patch, TEZ-2496.5.patch, TEZ-2496.6.patch, TEZ-2496.7.patch, TEZ-2496.8.patch Consider scheduling tasks in ShuffleVertexManager based on the partition sizes from the source. This would be helpful in scenarios where there are limited resources (or concurrent jobs running, or multiple waves) with data skew, and the task which gets a large amount of data gets scheduled much later. 
e.g. Consider the following hive query running in a queue with limited capacity (42 slots in total) @ 200 GB scale {noformat} CREATE TEMPORARY TABLE sampleData AS SELECT CASE WHEN ss_sold_time_sk IS NULL THEN 70429 ELSE ss_sold_time_sk END AS ss_sold_time_sk, ss_item_sk, ss_customer_sk, ss_cdemo_sk, ss_hdemo_sk, ss_addr_sk, ss_store_sk, ss_promo_sk, ss_ticket_number, ss_quantity, ss_wholesale_cost, ss_list_price, ss_sales_price, ss_ext_discount_amt, ss_ext_sales_price, ss_ext_wholesale_cost, ss_ext_list_price, ss_ext_tax, ss_coupon_amt, ss_net_paid, ss_net_paid_inc_tax, ss_net_profit, ss_sold_date_sk FROM store_sales distribute by ss_sold_time_sk; {noformat} This generated 39 maps and 134 reduce slots (3 reduce waves). When there are lots of nulls for ss_sold_time_sk, the data would tend to skew towards 70429. If the reducer which gets this data is scheduled much earlier (i.e. in the first wave itself), the entire job would finish faster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2592) [Umbrella] Make it easier to generate binary artifacts for user convenience
[ https://issues.apache.org/jira/browse/TEZ-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2592: - Attachment: TEZ-2592.initial.patch [Umbrella] Make it easier to generate binary artifacts for user convenience Key: TEZ-2592 URL: https://issues.apache.org/jira/browse/TEZ-2592 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Attachments: TEZ-2592.initial.patch Umbrella jira to track various sub-tasks needed to make it easier for a release manager to generate binary artifacts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2602) Throwing EOFException when launching MR job
[ https://issues.apache.org/jira/browse/TEZ-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617594#comment-14617594 ] Tsuyoshi Ozawa commented on TEZ-2602: - [~hitesh] No, it does not fail with the LEGACY sorter. {quote} $ time hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount -Dmapreduce.framework.name=yarn-tez -Dmapred.reduce.tasks=15 -Dtez.runtime.sort.threads=1 -Dtez.runtime.sorter.class=LEGACY wc10g tezwc10g9 {quote} succeeds. Throwing EOFException when launching MR job --- Key: TEZ-2602 URL: https://issues.apache.org/jira/browse/TEZ-2602 Project: Apache Tez Issue Type: Bug Affects Versions: 0.8.0 Reporter: Tsuyoshi Ozawa {quote} $hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount -Dmapreduce.framework.name=yarn-tez -Dmapred.reduce.tasks=15 -Dtez.runtime.sort.threads=1 wc10g tezwc10g5 15/07/07 13:24:30 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8081 15/07/07 13:24:30 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200 15/07/07 13:24:30 INFO mapreduce.Job: The url to track the job: http://ip-172-31-4-8.ap-northeast-1.compute.internal:8088/proxy/application_1435943097882_0019/ 15/07/07 13:24:30 INFO mapreduce.Job: Running job: job_1435943097882_0019 15/07/07 13:24:31 INFO mapreduce.Job: Job job_1435943097882_0019 running in uber mode : false 15/07/07 13:24:31 INFO mapreduce.Job: map 0% reduce 0% 15/07/07 13:24:59 INFO mapreduce.Job: Job job_1435943097882_0019 failed with state FAILED due to: Vertex failed, vertexName=initialmap, vertexId=vertex_1435943097882_0019_1_00, diagnostics=[Task failed, taskId=task_1435943097882_0019_1_00_05, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.io.EOFException at java.io.DataInputStream.readFully(DataInputStream.java:197) at org.apache.hadoop.io.Text.readWithKnownLength(Text.java:319) at org.apache.hadoop.io.Text.readFields(Text.java:291) at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71) at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42) at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:142) at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121) at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:302) at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170) at org.apache.tez.mapreduce.combine.MRCombiner.runNewCombiner(MRCombiner.java:191) at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:115) at org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.runCombineProcessor(ExternalSorter.java:285) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.spill(PipelinedSorter.java:463) at
[jira] [Commented] (TEZ-2591) Remove unneeded methods from EdgeManagerPludinOnDemand
[ https://issues.apache.org/jira/browse/TEZ-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617646#comment-14617646 ] Rajesh Balamohan commented on TEZ-2591: --- Earlier it was handling both cases (i.e. routing model and onDemand). Now EdgeManagerPluginBase becomes the base, which is extended by EdgeManagerPlugin and EdgeManagerPluginOnDemand. However, almost all impls except OneToOneEdgeManager are based on EdgeManagerPluginOnDemand. Would this be a problem for other jiras? (e.g. TEZ-2255). For example, do we need to have a separate set of impls for the EdgeManagerPlugin based approach? Remove unneeded methods from EdgeManagerPludinOnDemand -- Key: TEZ-2591 URL: https://issues.apache.org/jira/browse/TEZ-2591 Project: Apache Tez Issue Type: Bug Reporter: Bikas Saha Assignee: Bikas Saha Attachments: TEZ-2591.1.patch, TEZ-2591.2.patch, TEZ-2591.3.patch EdgeManagerPluginOnDemand inherits from EdgeManager (legacy) due to some common methods. Removing this dependency would be cleaner for the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2496) Consider scheduling tasks in ShuffleVertexManager based on the partition sizes from the source
[ https://issues.apache.org/jira/browse/TEZ-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617708#comment-14617708 ] Bikas Saha commented on TEZ-2496: - Looks good overall. This does not have to be in the API package. {code}diff --git tez-api/src/main/java/org/apache/tez/runtime/api/DATA_RANGE_IN_MB.java tez-api/src/main/java/org/apache/tez/runtime/api/DATA_RANGE_IN_MB.java {code} TEN? Also, looks like the constructor and member var are dead code? {code}+public enum DATA_RANGE_IN_MB { + THOUSAND(1000), HUNDRED(100), TEZ(10), ONE(1), ZERO(0);{code} Do we really need to do Math.ceil()? There is probably a bit manipulation method to do this cheaper. {code}+ public static final DATA_RANGE_IN_MB getRange(long sizeInBytes) { +int sizeInMB = (int) Math.ceil(sizeInBytes / (1024 * 1024 * 1.0));{code} Does runtime-internals need roaring bitmaps? {code}diff --git tez-runtime-internals/pom.xml tez-runtime-internals/pom.xml ... + <groupId>org.roaringbitmap</groupId> + <artifactId>RoaringBitmap</artifactId> {code} Unnecessary diff {code}diff --git tez-runtime-internals/src/main/java/org/apache/tez/runtime/api/impl/TezOutputContextImpl.java tez-runtime-internals/src/main/java/org/apache/tez/runtime/api/impl/TezOutputContextImpl.java {code} Why do the +1 here instead of in getBucket()? Spreading the bucket logic in 3 places - here + getBucket() + DATA_RANGE_MB is error prone. Perhaps replace all 3 with getBucket()? {code}+for (int i = 0; i < sizes.length; i++) { + int bucket = getBucket(sizes[i]) + 1; {code} No point having 2 vars that can be tracked as one? reportPartitionStats === reportPartitionStats() { return partitions != null}, right? {code}+ protected OutputStatisticsReporter statsReporter; + protected final long[] partitionStats;{code} Still needed? 
{code} VertexManagerPluginContext mockContext = mock(VertexManagerPluginContext.class); + when(mockContext.getVertexStatistics(any(String.class))).thenReturn(mock(VertexStatistics.class));{code} Are there existing OrderedPartitionedOutput/PipeLinedSorter/ExternalSorter tests that can be enhanced to verify that partition stats are being recorded? Assuming the ShuffleVertexManager code is the same as when I looked at it the last time. Not sure why the second part of each of the if checks is useful? Any issues in simply over-writing the new value? {code}+if ((totalStats > 0) && (taskInfo.outputStats != totalStats)) { + computedPartitionSizes = true; + taskInfo.outputStats = totalStats; +} + } else { +if ((stats[index] > 0) && (stats[index] != taskInfo.outputStats)) { + computedPartitionSizes = true; + taskInfo.outputStats = stats[index];{code} If I understand this right, the code is trying not to sort based on this check. But could this be done simply by whether we have received a new stats update event? And move the code from computePartitionSizes()+sortPendingTasksBasedOnDataSize into parsePartitionStats()? Nothing should change unless we have received new stats, right? So all stats dependent updates can be made when we receive new stats. Spurious change? {code}- @Test(timeout = 5000) + @Test(timeout = 500) public void testShuffleVertexManagerAutoParallelism() throws Exception {{code} Why did this change? {code} Assert.assertTrue(manager.pendingTasks.size() == 0); // all tasks scheduled -Assert.assertTrue(scheduledTasks.size() == 3); +Assert.assertTrue(scheduledTasks.size() == 1);{code} Can the shuffle vertex manager bucket calculation test be further enhanced to verify that the first task to be scheduled is the largest task? that is the final intent of the jira right :) In a separate jira we need to track the bug that the vertex manager event is not resilient to task retries because it does not provide that info. So the same task rerun would cause double counting. 
It's an existing bug, not introduced in this patch. Consider scheduling tasks in ShuffleVertexManager based on the partition sizes from the source -- Key: TEZ-2496 URL: https://issues.apache.org/jira/browse/TEZ-2496 Project: Apache Tez Issue Type: Improvement Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Attachments: TEZ-2496.1.patch, TEZ-2496.2.patch, TEZ-2496.3.patch, TEZ-2496.4.patch, TEZ-2496.5.patch, TEZ-2496.6.patch, TEZ-2496.7.patch, TEZ-2496.8.patch Consider scheduling tasks in ShuffleVertexManager based on the partition sizes from the source. This would be helpful in scenarios where there are limited resources (or concurrent jobs running, or multiple waves) with data skew, and the task which gets a large amount of data gets scheduled much later. e.g. Consider
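The review above asks whether Math.ceil() is needed for the bytes-to-MB rounding. A cheaper integer alternative could look like the following sketch (the class and method names are hypothetical, and the shift-based rounding assumes non-negative sizes, which holds for byte counts):

```java
// Hedged sketch of the cheaper round-up suggested in the review: integer
// ceiling division by 1 MiB without going through floating point.
public class CeilDiv {
    static final int MB_SHIFT = 20; // 1 MiB == 1 << 20 bytes

    public static int sizeInMB(long sizeInBytes) {
        // Equivalent to (int) Math.ceil(sizeInBytes / (1024 * 1024 * 1.0))
        // for non-negative inputs, with no double arithmetic involved.
        return (int) ((sizeInBytes + (1L << MB_SHIFT) - 1) >> MB_SHIFT);
    }

    public static void main(String[] args) {
        System.out.println(sizeInMB(0));             // 0
        System.out.println(sizeInMB(1));             // 1
        System.out.println(sizeInMB(1 << 20));       // 1
        System.out.println(sizeInMB((1 << 20) + 1)); // 2
    }
}
```

The add-then-shift form rounds up exactly at MiB boundaries, matching the Math.ceil behavior the patch uses while staying in integer arithmetic.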
[jira] [Updated] (TEZ-2496) Consider scheduling tasks in ShuffleVertexManager based on the partition sizes from the source
[ https://issues.apache.org/jira/browse/TEZ-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-2496: -- Attachment: TEZ-2496.7.patch Based on VertexManagerEvent being sent to ShuffleVertexManager directly. Consider scheduling tasks in ShuffleVertexManager based on the partition sizes from the source -- Key: TEZ-2496 URL: https://issues.apache.org/jira/browse/TEZ-2496 Project: Apache Tez Issue Type: Improvement Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Attachments: TEZ-2496.1.patch, TEZ-2496.2.patch, TEZ-2496.3.patch, TEZ-2496.4.patch, TEZ-2496.5.patch, TEZ-2496.6.patch, TEZ-2496.7.patch Consider scheduling tasks in ShuffleVertexManager based on the partition sizes from the source. This would be helpful in scenarios, where there is limited resources (or concurrent jobs running or multiple waves) with dataskew and the task which gets large amount of data gets sceheduled much later. e.g Consider the following hive query running in a queue with limited capacity (42 slots in total) @ 200 GB scale {noformat} CREATE TEMPORARY TABLE sampleData AS SELECT CASE WHEN ss_sold_time_sk IS NULL THEN 70429 ELSE ss_sold_time_sk END AS ss_sold_time_sk, ss_item_sk, ss_customer_sk, ss_cdemo_sk, ss_hdemo_sk, ss_addr_sk, ss_store_sk, ss_promo_sk, ss_ticket_number, ss_quantity, ss_wholesale_cost, ss_list_price, ss_sales_price, ss_ext_discount_amt, ss_ext_sales_price, ss_ext_wholesale_cost, ss_ext_list_price, ss_ext_tax, ss_coupon_amt, ss_net_paid, ss_net_paid_inc_tax, ss_net_profit, ss_sold_date_sk FROM store_sales distribute by ss_sold_time_sk; {noformat} This generated 39 maps and 134 reduce slots (3 reduce waves). When lots of nulls are there for ss_sold_time_sk, it would tend to have data skew towards 70429. If the reducer which gets this data gets scheduled much earlier (i.e in first wave itself), entire job would finish fast. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2604) PipelinedSorter doesn't use number of items when creating SortSpan
[ https://issues.apache.org/jira/browse/TEZ-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617873#comment-14617873 ] Rajesh Balamohan commented on TEZ-2604: --- [~ozawa] - This is covered as a part of TEZ-2574 (patch 3 tries to cover this; review pending). PipelinedSorter doesn't use number of items when creating SortSpan --- Key: TEZ-2604 URL: https://issues.apache.org/jira/browse/TEZ-2604 Project: Apache Tez Issue Type: Bug Affects Versions: 0.8.0 Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Attachments: TEZ-2604.001.patch {quote} int items = 1024*1024; int perItem = 16; if(span.length() != 0) { items = span.length(); perItem = span.kvbuffer.limit()/items; items = (int) ((span.capacity)/(METASIZE+perItem)); if(items > 1024*1024) { // our goal is to have 1M splits and sort early items = 1024*1024; } } Preconditions.checkArgument(listIterator.hasNext(), "block iterator should not be empty"); span = new SortSpan((ByteBuffer)listIterator.next().clear(), (1024*1024), perItem, ConfigUtils.getIntermediateOutputKeyComparator(this.conf)); {quote} Should we use items instead of (1024*1024)? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2604) PipelinedSorter doesn't use number of items when creating SortSpan
[ https://issues.apache.org/jira/browse/TEZ-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated TEZ-2604: Affects Version/s: 0.8.0 PipelinedSorter doesn't use number of items when creating SortSpan --- Key: TEZ-2604 URL: https://issues.apache.org/jira/browse/TEZ-2604 Project: Apache Tez Issue Type: Bug Affects Versions: 0.8.0 Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Attachments: TEZ-2604.001.patch {quote} int items = 1024*1024; int perItem = 16; if(span.length() != 0) { items = span.length(); perItem = span.kvbuffer.limit()/items; items = (int) ((span.capacity)/(METASIZE+perItem)); if(items > 1024*1024) { // our goal is to have 1M splits and sort early items = 1024*1024; } } Preconditions.checkArgument(listIterator.hasNext(), "block iterator should not be empty"); span = new SortSpan((ByteBuffer)listIterator.next().clear(), (1024*1024), perItem, ConfigUtils.getIntermediateOutputKeyComparator(this.conf)); {quote} Should we use items instead of (1024*1024)? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2604) PipelinedSorter doesn't use number of items when creating SortSpan
[ https://issues.apache.org/jira/browse/TEZ-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated TEZ-2604: Attachment: TEZ-2604.001.patch Attaching a first patch. PipelinedSorter doesn't use number of items when creating SortSpan --- Key: TEZ-2604 URL: https://issues.apache.org/jira/browse/TEZ-2604 Project: Apache Tez Issue Type: Bug Affects Versions: 0.8.0 Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Attachments: TEZ-2604.001.patch {quote} int items = 1024*1024; int perItem = 16; if(span.length() != 0) { items = span.length(); perItem = span.kvbuffer.limit()/items; items = (int) ((span.capacity)/(METASIZE+perItem)); if(items > 1024*1024) { // our goal is to have 1M splits and sort early items = 1024*1024; } } Preconditions.checkArgument(listIterator.hasNext(), "block iterator should not be empty"); span = new SortSpan((ByteBuffer)listIterator.next().clear(), (1024*1024), perItem, ConfigUtils.getIntermediateOutputKeyComparator(this.conf)); {quote} Should we use items instead of (1024*1024)? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2604) PipelinedSorter doesn't use number of items when creating SortSpan
Tsuyoshi Ozawa created TEZ-2604: --- Summary: PipelinedSorter doesn't use number of items when creating SortSpan Key: TEZ-2604 URL: https://issues.apache.org/jira/browse/TEZ-2604 Project: Apache Tez Issue Type: Bug Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa {quote} int items = 1024*1024; int perItem = 16; if(span.length() != 0) { items = span.length(); perItem = span.kvbuffer.limit()/items; items = (int) ((span.capacity)/(METASIZE+perItem)); if(items > 1024*1024) { // our goal is to have 1M splits and sort early items = 1024*1024; } } Preconditions.checkArgument(listIterator.hasNext(), "block iterator should not be empty"); span = new SortSpan((ByteBuffer)listIterator.next().clear(), (1024*1024), perItem, ConfigUtils.getIntermediateOutputKeyComparator(this.conf)); {quote} Should we use items instead of (1024*1024)? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
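The quoted snippet recomputes an adaptive item estimate from the previous span but then passes the hard-coded (1024*1024) to the new SortSpan, discarding that estimate. A hedged, self-contained illustration of the estimate being discarded (METASIZE's value and the figures below are assumptions for the sketch, not the actual Tez internals):

```java
// Illustrative sketch (not the actual Tez classes): the adaptive item
// count derived from the observed per-item size can be far smaller than
// the 1024*1024 default that the current code passes to the next SortSpan.
public class SortSpanSizing {
    static final int METASIZE = 16; // assumed per-record metadata size

    // Recompute the expected item count from the observed per-item size,
    // mirroring the arithmetic in the quoted PipelinedSorter snippet.
    public static int estimateItems(int capacity, int observedItems, int bufferLimit) {
        int perItem = bufferLimit / observedItems;
        int items = capacity / (METASIZE + perItem);
        // goal: at most ~1M splits so sorting can start early
        return Math.min(items, 1024 * 1024);
    }

    public static void main(String[] args) {
        // With large records, far fewer than 1M items fit in a 64 MiB span,
        // so passing 1024*1024 instead of this estimate over-allocates metadata.
        System.out.println(estimateItems(64 << 20, 1000, 32 << 20)); // 1999
    }
}
```

Passing the computed estimate instead of the constant, as the issue suggests, would size the span's metadata region to what the data actually needs.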
[jira] [Commented] (TEZ-2591) Remove unneeded methods from EdgeManagerPludinOnDemand
[ https://issues.apache.org/jira/browse/TEZ-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617771#comment-14617771 ] Bikas Saha commented on TEZ-2591: - The base class has all common methods that are shared. The routing methods have been isolated into their specific routing impls. EdgeManagerPlugin has the legacy APIs for routing. EdgeManagerOnDemand has the on-demand APIs. Any new routing policy will create its own routing APIs and inherit the common (non-routing) APIs from the base class. Does that make sense? Remove unneeded methods from EdgeManagerPludinOnDemand -- Key: TEZ-2591 URL: https://issues.apache.org/jira/browse/TEZ-2591 Project: Apache Tez Issue Type: Bug Reporter: Bikas Saha Assignee: Bikas Saha Attachments: TEZ-2591.1.patch, TEZ-2591.2.patch, TEZ-2591.3.patch EdgeManagerPluginOnDemand inherits from EdgeManager (legacy) due to some common methods. Removing this dependency would be cleaner for the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2602) Throwing EOFException when launching MR job
[ https://issues.apache.org/jira/browse/TEZ-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616764#comment-14616764 ] Tsuyoshi Ozawa commented on TEZ-2602: - If -Dtez.runtime.ifile.readahead is set to false, an IndexOutOfBoundsException is thrown instead: {quote} 15/07/07 14:16:35 INFO mapreduce.Job: Job job_1435943097882_0022 failed with state FAILED due to: Vertex failed, vertexName=initialmap, vertexId=vertex_1435943097882_0022_1_00, diagnostics=[Task failed, taskId=task_1435943097882_0022_1_00_05, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.IndexOutOfBoundsException at java.io.DataInputStream.readFully(DataInputStream.java:192) at org.apache.hadoop.io.Text.readWithKnownLength(Text.java:319) at org.apache.hadoop.io.Text.readFields(Text.java:291) at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71) at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42) at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:142) at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121) at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:302) at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170) at org.apache.tez.mapreduce.combine.MRCombiner.runNewCombiner(MRCombiner.java:191) at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:115) at org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.runCombineProcessor(ExternalSorter.java:285) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.spill(PipelinedSorter.java:463) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.sort(PipelinedSorter.java:219) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:311) at 
org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.write(PipelinedSorter.java:267) at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput$1.write(OrderedPartitionedKVOutput.java:164) at org.apache.tez.mapreduce.processor.map.MapProcessor$NewOutputCollector.write(MapProcessor.java:363) at org.apache.tez.mapreduce.hadoop.mapreduce.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:90) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112) at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:47) at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:36) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) at org.apache.tez.mapreduce.processor.map.MapProcessor.runNewMapper(MapProcessor.java:237) at org.apache.tez.mapreduce.processor.map.MapProcessor.run(MapProcessor.java:124) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:345) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) ]], Vertex did not succeed due to OWN_TASK_FAILURE, 
failedTasks:1 killedTasks:89, Vertex vertex_1435943097882_0022_1_00 [initialmap] killed/failed due to:null]. Vertex killed, vertexName=finalreduce, vertexId=vertex_1435943097882_0022_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:15, Vertex vertex_1435943097882_0022_1_01 [finalreduce] killed/failed due to:null]. DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1 15/07/07 14:16:35 INFO mapreduce.Job: Counters: 0 {quote} Throwing EOFException when launching MR job
[jira] [Commented] (TEZ-1421) MRCombiner throws NPE in MapredWordCount on master branch
[ https://issues.apache.org/jira/browse/TEZ-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616765#comment-14616765 ] Tsuyoshi Ozawa commented on TEZ-1421: - The current error I faced is not an NPE but an EOFException. I created TEZ-2602 to address the issue. MRCombiner throws NPE in MapredWordCount on master branch - Key: TEZ-1421 URL: https://issues.apache.org/jira/browse/TEZ-1421 Project: Apache Tez Issue Type: Bug Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Priority: Critical I tested MapredWordCount against 70GB generated by RandomTextWriter. When a Combiner runs, it throws an NPE. It looks like setCombinerClass doesn't work correctly. {quote} Caused by: java.lang.RuntimeException: java.lang.NullPointerException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.tez.mapreduce.combine.MRCombiner.runOldCombiner(MRCombiner.java:122) at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:112) at org.apache.tez.runtime.library.common.shuffle.impl.MergeManager.runCombineProcessor(MergeManager.java:472) at org.apache.tez.runtime.library.common.shuffle.impl.MergeManager$InMemoryMerger.merge(MergeManager.java:605) at org.apache.tez.runtime.library.common.shuffle.impl.MergeThread.run(MergeThread.java:89) {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2602) Throwing EOFException when launching MR job
Tsuyoshi Ozawa created TEZ-2602: --- Summary: Throwing EOFException when launching MR job Key: TEZ-2602 URL: https://issues.apache.org/jira/browse/TEZ-2602 Project: Apache Tez Issue Type: Bug Reporter: Tsuyoshi Ozawa {quote} $hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount -Dmapreduce.framework.name=yarn-tez -Dmapred.reduce.tasks=15 -Dtez.runtime.sort.threads=1 wc10g tezwc10g5 15/07/07 13:24:30 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8081 15/07/07 13:24:30 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200 15/07/07 13:24:30 INFO mapreduce.Job: The url to track the job: http://ip-172-31-4-8.ap-northeast-1.compute.internal:8088/proxy/application_1435943097882_0019/ 15/07/07 13:24:30 INFO mapreduce.Job: Running job: job_1435943097882_0019 15/07/07 13:24:31 INFO mapreduce.Job: Job job_1435943097882_0019 running in uber mode : false 15/07/07 13:24:31 INFO mapreduce.Job: map 0% reduce 0% 15/07/07 13:24:59 INFO mapreduce.Job: Job job_1435943097882_0019 failed with state FAILED due to: Vertex failed, vertexName=initialmap, vertexId=vertex_1435943097882_0019_1_00, diagnostics=[Task failed, taskId=task_1435943097882_0019_1_00_05, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.io.EOFException at java.io.DataInputStream.readFully(DataInputStream.java:197) at org.apache.hadoop.io.Text.readWithKnownLength(Text.java:319) at org.apache.hadoop.io.Text.readFields(Text.java:291) at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71) at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42) at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:142) at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121) at 
org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:302) at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170) at org.apache.tez.mapreduce.combine.MRCombiner.runNewCombiner(MRCombiner.java:191) at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:115) at org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.runCombineProcessor(ExternalSorter.java:285) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.spill(PipelinedSorter.java:463) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.sort(PipelinedSorter.java:219) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:311) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.write(PipelinedSorter.java:267) at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput$1.write(OrderedPartitionedKVOutput.java:164)
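The EOFException in the trace above is raised from Text.readWithKnownLength, which reads a previously decoded record length and then calls readFully for exactly that many bytes; if the combiner's reader and writer disagree on record boundaries, the stream ends mid-record. A minimal java.io-only sketch of that failure mode (this is not the Tez or Hadoop code path, and the class/method names are invented for illustration; only the readFully behavior is real):

```java
import java.io.*;

// Illustrates why a length-prefixed reader such as Text.readWithKnownLength
// ends in EOFException when the stream is truncated mid-record.
public class TruncatedRecordDemo {

    // Write a length prefix plus payload, then read it back trusting the
    // declared length. Returns true if the reader runs off the end of the
    // stream (readFully throws EOFException), false if the read succeeds.
    static boolean hitsEof(int declaredLength, byte[] payload) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(declaredLength); // what the writer claims follows
        out.write(payload);           // what actually follows

        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        int len = in.readInt();
        try {
            in.readFully(new byte[len]); // same call that fails in the trace
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // Writer claims 10 bytes but only 5 follow: reader hits end-of-stream.
        System.out.println(hitsEof(10, "hello".getBytes())); // true
        // Declared length matches the payload: read succeeds.
        System.out.println(hitsEof(5, "hello".getBytes()));  // false
    }
}
```

In TEZ-2602 the mismatch presumably arises in the spill buffer the pipelined sorter hands to the combiner rather than in a file, but the symptom is the same readFully call asking for more bytes than the stream contains.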
[jira] [Commented] (TEZ-2496) Consider scheduling tasks in ShuffleVertexManager based on the partition sizes from the source
[ https://issues.apache.org/jira/browse/TEZ-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616679#comment-14616679 ] TezQA commented on TEZ-2496: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12743948/TEZ-2496.7.patch against master revision cb59851. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.runtime.library.common.sort.impl.TestPipelinedSorter org.apache.tez.runtime.library.common.sort.impl.dflt.TestDefaultSorter Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/887//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/887//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-library.html Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/887//console This message is automatically generated. 
Consider scheduling tasks in ShuffleVertexManager based on the partition sizes from the source -- Key: TEZ-2496 URL: https://issues.apache.org/jira/browse/TEZ-2496 Project: Apache Tez Issue Type: Improvement Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Attachments: TEZ-2496.1.patch, TEZ-2496.2.patch, TEZ-2496.3.patch, TEZ-2496.4.patch, TEZ-2496.5.patch, TEZ-2496.6.patch, TEZ-2496.7.patch Consider scheduling tasks in ShuffleVertexManager based on the partition sizes from the source. This would be helpful in scenarios where there are limited resources (or concurrent jobs running, or multiple waves) with data skew, and the task which gets a large amount of data gets scheduled much later. E.g. consider the following Hive query running in a queue with limited capacity (42 slots in total) @ 200 GB scale: {noformat} CREATE TEMPORARY TABLE sampleData AS SELECT CASE WHEN ss_sold_time_sk IS NULL THEN 70429 ELSE ss_sold_time_sk END AS ss_sold_time_sk, ss_item_sk, ss_customer_sk, ss_cdemo_sk, ss_hdemo_sk, ss_addr_sk, ss_store_sk, ss_promo_sk, ss_ticket_number, ss_quantity, ss_wholesale_cost, ss_list_price, ss_sales_price, ss_ext_discount_amt, ss_ext_sales_price, ss_ext_wholesale_cost, ss_ext_list_price, ss_ext_tax, ss_coupon_amt, ss_net_paid, ss_net_paid_inc_tax, ss_net_profit, ss_sold_date_sk FROM store_sales distribute by ss_sold_time_sk; {noformat} This generated 39 maps and 134 reduce slots (3 reduce waves). When there are lots of nulls for ss_sold_time_sk, the data tends to skew towards 70429. If the reducer which gets this data were scheduled much earlier (i.e. in the first wave itself), the entire job would finish faster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1421) MRCombiner throws NPE in MapredWordCount on master branch
[ https://issues.apache.org/jira/browse/TEZ-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616670#comment-14616670 ] Tsuyoshi Ozawa commented on TEZ-1421: - Sorry for the delay. I met this bug again. Starting to work on this. MRCombiner throws NPE in MapredWordCount on master branch - Key: TEZ-1421 URL: https://issues.apache.org/jira/browse/TEZ-1421 Project: Apache Tez Issue Type: Bug Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Priority: Critical I tested MapredWordCount against 70GB generated by RandomTextWriter. When a Combiner runs, it throws an NPE. It looks like setCombinerClass doesn't work correctly. {quote} Caused by: java.lang.RuntimeException: java.lang.NullPointerException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.tez.mapreduce.combine.MRCombiner.runOldCombiner(MRCombiner.java:122) at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:112) at org.apache.tez.runtime.library.common.shuffle.impl.MergeManager.runCombineProcessor(MergeManager.java:472) at org.apache.tez.runtime.library.common.shuffle.impl.MergeManager$InMemoryMerger.merge(MergeManager.java:605) at org.apache.tez.runtime.library.common.shuffle.impl.MergeThread.run(MergeThread.java:89) {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: TEZ-2496 PreCommit Build #887
Jira: https://issues.apache.org/jira/browse/TEZ-2496 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/887/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2425 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12743948/TEZ-2496.7.patch against master revision cb59851. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.runtime.library.common.sort.impl.TestPipelinedSorter org.apache.tez.runtime.library.common.sort.impl.dflt.TestDefaultSorter Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/887//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/887//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-library.html Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/887//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 00c4108f000177c4b00298dbadac299dd4661b25 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #886 Archived 47 artifacts Archive block size is 32768 Received 22 blocks and 2169247 bytes Compression is 24.9% Took 0.83 sec [description-setter] Could not determine description. 
Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## 19 tests failed. REGRESSION: org.apache.tez.runtime.library.common.sort.impl.TestPipelinedSorter.testKVExceedsBuffer Error Message: null Stack Trace: java.lang.NullPointerException: null at org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.reportStatistics(ExternalSorter.java:391) at org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.close(ExternalSorter.java:74) at org.apache.tez.runtime.library.common.sort.impl.TestPipelinedSorter.writeData(TestPipelinedSorter.java:392) at org.apache.tez.runtime.library.common.sort.impl.TestPipelinedSorter.basicTest(TestPipelinedSorter.java:292) at org.apache.tez.runtime.library.common.sort.impl.TestPipelinedSorter.testKVExceedsBuffer(TestPipelinedSorter.java:155) REGRESSION: org.apache.tez.runtime.library.common.sort.impl.TestPipelinedSorter.basicTest Error Message: null Stack Trace: java.lang.NullPointerException: null at org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.reportStatistics(ExternalSorter.java:391) at org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.close(ExternalSorter.java:74) at org.apache.tez.runtime.library.common.sort.impl.TestPipelinedSorter.writeData(TestPipelinedSorter.java:392) at org.apache.tez.runtime.library.common.sort.impl.TestPipelinedSorter.basicTest(TestPipelinedSorter.java:292) at org.apache.tez.runtime.library.common.sort.impl.TestPipelinedSorter.basicTest(TestPipelinedSorter.java:128) REGRESSION: org.apache.tez.runtime.library.common.sort.impl.TestPipelinedSorter.testWithLargeKeyValue Error Message: null Stack Trace: java.lang.NullPointerException: null at org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.reportStatistics(ExternalSorter.java:391) at org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.close(ExternalSorter.java:74) at
[jira] [Commented] (TEZ-2602) Throwing EOFException when launching MR job
[ https://issues.apache.org/jira/browse/TEZ-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617917#comment-14617917 ] Rajesh Balamohan commented on TEZ-2602: --- [~Tsuyoshi Ozawa] - I am trying to reproduce the issue on my local VM. I tried with a 700 MB text file and a 100 MB sort buffer for PipelinedSorter to ensure multiple spills. The job completed without error. Are there any other settings you enable/disable to get this issue? I have yet to run this at scale (~10GB). Throwing EOFException when launching MR job --- Key: TEZ-2602 URL: https://issues.apache.org/jira/browse/TEZ-2602 Project: Apache Tez Issue Type: Bug Affects Versions: 0.8.0 Reporter: Tsuyoshi Ozawa {quote} $hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount -Dmapreduce.framework.name=yarn-tez -Dmapred.reduce.tasks=15 -Dtez.runtime.sort.threads=1 wc10g tezwc10g5 15/07/07 13:24:30 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8081 15/07/07 13:24:30 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200 15/07/07 13:24:30 INFO mapreduce.Job: The url to track the job: http://ip-172-31-4-8.ap-northeast-1.compute.internal:8088/proxy/application_1435943097882_0019/ 15/07/07 13:24:30 INFO mapreduce.Job: Running job: job_1435943097882_0019 15/07/07 13:24:31 INFO mapreduce.Job: Job job_1435943097882_0019 running in uber mode : false 15/07/07 13:24:31 INFO mapreduce.Job: map 0% reduce 0% 15/07/07 13:24:59 INFO mapreduce.Job: Job job_1435943097882_0019 failed with state FAILED due to: Vertex failed, vertexName=initialmap, vertexId=vertex_1435943097882_0019_1_00, diagnostics=[Task failed, taskId=task_1435943097882_0019_1_00_05, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.io.EOFException at java.io.DataInputStream.readFully(DataInputStream.java:197) at org.apache.hadoop.io.Text.readWithKnownLength(Text.java:319) at org.apache.hadoop.io.Text.readFields(Text.java:291) 
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71) at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42) at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:142) at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121) at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:302) at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170) at org.apache.tez.mapreduce.combine.MRCombiner.runNewCombiner(MRCombiner.java:191) at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:115) at org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.runCombineProcessor(ExternalSorter.java:285) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.spill(PipelinedSorter.java:463) at
[jira] [Commented] (TEZ-2602) Throwing EOFException when launching MR job
[ https://issues.apache.org/jira/browse/TEZ-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617955#comment-14617955 ] Tsuyoshi Ozawa commented on TEZ-2602: - [~rajesh.balamohan] I could reproduce the error with a smaller input by tuning mapreduce.map.sort.spill.percent and io.sort.mb, because the error is a combiner-related issue. {quote} $ time hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount -Dmapreduce.framework.name=yarn-tez -Dmapred.reduce.tasks=15 -Dtez.runtime.sort.threads=1 -Dmapreduce.map.sort.spill.percent=0.1 -Dio.sort.mb=10 wc500mb tezdebug/7 {quote} For reference, here is the complete Tez configuration file I'm using: https://gist.github.com/oza/3ab356c25ec64a2298e0 Throwing EOFException when launching MR job --- Key: TEZ-2602 URL: https://issues.apache.org/jira/browse/TEZ-2602 Project: Apache Tez Issue Type: Bug Affects Versions: 0.8.0 Reporter: Tsuyoshi Ozawa {quote} $hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount -Dmapreduce.framework.name=yarn-tez -Dmapred.reduce.tasks=15 -Dtez.runtime.sort.threads=1 wc10g tezwc10g5 15/07/07 13:24:30 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8081 15/07/07 13:24:30 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200 15/07/07 13:24:30 INFO mapreduce.Job: The url to track the job: http://ip-172-31-4-8.ap-northeast-1.compute.internal:8088/proxy/application_1435943097882_0019/ 15/07/07 13:24:30 INFO mapreduce.Job: Running job: job_1435943097882_0019 15/07/07 13:24:31 INFO mapreduce.Job: Job job_1435943097882_0019 running in uber mode : false 15/07/07 13:24:31 INFO mapreduce.Job: map 0% reduce 0% 15/07/07 13:24:59 INFO mapreduce.Job: Job job_1435943097882_0019 failed with state FAILED due to: Vertex failed, vertexName=initialmap, vertexId=vertex_1435943097882_0019_1_00, diagnostics=[Task failed, taskId=task_1435943097882_0019_1_00_05, diagnostics=[TaskAttempt 0 failed, 
info=[Error: Failure while running task:java.io.EOFException at java.io.DataInputStream.readFully(DataInputStream.java:197) at org.apache.hadoop.io.Text.readWithKnownLength(Text.java:319) at org.apache.hadoop.io.Text.readFields(Text.java:291) at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71) at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42) at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:142) at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121) at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:302) at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170) at org.apache.tez.mapreduce.combine.MRCombiner.runNewCombiner(MRCombiner.java:191) at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:115) at org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.runCombineProcessor(ExternalSorter.java:285) at
[jira] [Comment Edited] (TEZ-2602) Throwing EOFException when launching MR job
[ https://issues.apache.org/jira/browse/TEZ-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617917#comment-14617917 ] Rajesh Balamohan edited comment on TEZ-2602 at 7/8/15 3:38 AM: --- [~ozawa] - I am trying to reproduce the issue on my local vm. I tried with 700 MB text file with 100 mb sort buffer for pipeliendsorter to ensure multiple spills. Job completed without error; Are there any other setting you enable/disable to get this issue? I am yet to run this at scale (~10GB). was (Author: rajesh.balamohan): [~Tsuyoshi Ozawa] - I am trying to reproduce the issue on my local vm. I tried with 700 MB text file with 100 mb sort buffer for pipeliendsorter to ensure multiple spills. Job completed without error; Are there any other setting you enable/disable to get this issue? I am yet to run this at scale (~10GB). Throwing EOFException when launching MR job --- Key: TEZ-2602 URL: https://issues.apache.org/jira/browse/TEZ-2602 Project: Apache Tez Issue Type: Bug Affects Versions: 0.8.0 Reporter: Tsuyoshi Ozawa {quote} $hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount -Dmapreduce.framework.name=yarn-tez -Dmapr ed.reduce.tasks=15 -Dtez.runtime.sort.threads=1 wc10g tezwc10g5 15/07/07 13:24:30 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8081 15/07/07 13:24:30 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200 15/07/07 13:24:30 INFO mapreduce.Job: The url to track the job: http://ip-172-31-4-8.ap-northeast-1.compute.internal:8088/proxy/application_1435943097882_0019/ 15/07/07 13:24:30 INFO mapreduce.Job: Running job: job_1435943097882_0019 15/07/07 13:24:31 INFO mapreduce.Job: Job job_1435943097882_0019 running in uber mode : false 15/07/07 13:24:31 INFO mapreduce.Job: map 0% reduce 0% 15/07/07 13:24:59 INFO mapreduce.Job: Job job_1435943097882_0019 failed with state FAILED due to: Vertex failed, vertexName=initialmap, 
vertexId=vertex_1435943097882_0019_1_00, diagnostics=[Task failed, taskId=task_1435943097882_0019_1_00_05, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.io.EOFException at java.io.DataInputStream.readFully(DataInputStream.java:197) at org.apache.hadoop.io.Text.readWithKnownLength(Text.java:319) at org.apache.hadoop.io.Text.readFields(Text.java:291) at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71) at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42) at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:142) at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121) at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:302) at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170) at org.apache.tez.mapreduce.combine.MRCombiner.runNewCombiner(MRCombiner.java:191) at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:115) at
[jira] [Commented] (TEZ-2604) PipelinedSorter doesn't use number of items when creating SortSpan
[ https://issues.apache.org/jira/browse/TEZ-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617946#comment-14617946 ] Tsuyoshi Ozawa commented on TEZ-2604: - [~rajesh.balamohan] Thanks for sharing! Closing this as a duplicate. PipelinedSorter doesn't use number of items when creating SortSpan --- Key: TEZ-2604 URL: https://issues.apache.org/jira/browse/TEZ-2604 Project: Apache Tez Issue Type: Bug Affects Versions: 0.8.0 Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Attachments: TEZ-2604.001.patch
{quote}
int items = 1024*1024;
int perItem = 16;
if(span.length() != 0) {
  items = span.length();
  perItem = span.kvbuffer.limit()/items;
  items = (int) ((span.capacity)/(METASIZE+perItem));
  if(items > 1024*1024) {
    // our goal is to have 1M splits and sort early
    items = 1024*1024;
  }
}
Preconditions.checkArgument(listIterator.hasNext(), "block iterator should not be empty");
span = new SortSpan((ByteBuffer)listIterator.next().clear(), (1024*1024), perItem, ConfigUtils.getIntermediateOutputKeyComparator(this.conf));
{quote}
Should we use items instead of (1024*1024)? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2604) PipelinedSorter doesn't use number of items when creating SortSpan
[ https://issues.apache.org/jira/browse/TEZ-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617948#comment-14617948 ] TezQA commented on TEZ-2604: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12744119/TEZ-2604.001.patch against master revision cb59851. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/889//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/889//console This message is automatically generated. 
PipelinedSorter doesn't use number of items when creating SortSpan --- Key: TEZ-2604 URL: https://issues.apache.org/jira/browse/TEZ-2604 Project: Apache Tez Issue Type: Bug Affects Versions: 0.8.0 Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Attachments: TEZ-2604.001.patch
{quote}
int items = 1024*1024;
int perItem = 16;
if(span.length() != 0) {
  items = span.length();
  perItem = span.kvbuffer.limit()/items;
  items = (int) ((span.capacity)/(METASIZE+perItem));
  if(items > 1024*1024) {
    // our goal is to have 1M splits and sort early
    items = 1024*1024;
  }
}
Preconditions.checkArgument(listIterator.hasNext(), "block iterator should not be empty");
span = new SortSpan((ByteBuffer)listIterator.next().clear(), (1024*1024), perItem, ConfigUtils.getIntermediateOutputKeyComparator(this.conf));
{quote}
Should we use items instead of (1024*1024)? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2595) Fix licensing and notice file for full tarball
[ https://issues.apache.org/jira/browse/TEZ-2595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2595: - Issue Type: Task (was: Sub-task) Parent: (was: TEZ-2592) Fix licensing and notice file for full tarball Key: TEZ-2595 URL: https://issues.apache.org/jira/browse/TEZ-2595 Project: Apache Tez Issue Type: Task Reporter: Hitesh Shah Full tarball needs its own license and notice file -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2496) Consider scheduling tasks in ShuffleVertexManager based on the partition sizes from the source
[ https://issues.apache.org/jira/browse/TEZ-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-2496: -- Attachment: TEZ-2496.8.patch Missed out the sorter test changes. Ignoring the Findbugs warning in ExternalSorter. Consider scheduling tasks in ShuffleVertexManager based on the partition sizes from the source -- Key: TEZ-2496 URL: https://issues.apache.org/jira/browse/TEZ-2496 Project: Apache Tez Issue Type: Improvement Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Attachments: TEZ-2496.1.patch, TEZ-2496.2.patch, TEZ-2496.3.patch, TEZ-2496.4.patch, TEZ-2496.5.patch, TEZ-2496.6.patch, TEZ-2496.7.patch, TEZ-2496.8.patch Consider scheduling tasks in ShuffleVertexManager based on the partition sizes from the source. This would be helpful in scenarios where there are limited resources (or concurrent jobs running, or multiple waves) with data skew, and the task which gets a large amount of data gets scheduled much later. E.g. consider the following Hive query running in a queue with limited capacity (42 slots in total) @ 200 GB scale: {noformat} CREATE TEMPORARY TABLE sampleData AS SELECT CASE WHEN ss_sold_time_sk IS NULL THEN 70429 ELSE ss_sold_time_sk END AS ss_sold_time_sk, ss_item_sk, ss_customer_sk, ss_cdemo_sk, ss_hdemo_sk, ss_addr_sk, ss_store_sk, ss_promo_sk, ss_ticket_number, ss_quantity, ss_wholesale_cost, ss_list_price, ss_sales_price, ss_ext_discount_amt, ss_ext_sales_price, ss_ext_wholesale_cost, ss_ext_list_price, ss_ext_tax, ss_coupon_amt, ss_net_paid, ss_net_paid_inc_tax, ss_net_profit, ss_sold_date_sk FROM store_sales distribute by ss_sold_time_sk; {noformat} This generated 39 maps and 134 reduce slots (3 reduce waves). When there are lots of nulls for ss_sold_time_sk, the data tends to skew towards 70429. If the reducer which gets this data were scheduled much earlier (i.e. in the first wave itself), the entire job would finish faster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2602) Throwing EOFException when launching MR job
[ https://issues.apache.org/jira/browse/TEZ-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi Ozawa updated TEZ-2602:
    Affects Version/s: 0.8.0

Throwing EOFException when launching MR job
--
    Key: TEZ-2602
    URL: https://issues.apache.org/jira/browse/TEZ-2602
    Project: Apache Tez
    Issue Type: Bug
    Affects Versions: 0.8.0
    Reporter: Tsuyoshi Ozawa

{quote}
$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount -Dmapreduce.framework.name=yarn-tez -Dmapred.reduce.tasks=15 -Dtez.runtime.sort.threads=1 wc10g tezwc10g5
15/07/07 13:24:30 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8081
15/07/07 13:24:30 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200
15/07/07 13:24:30 INFO mapreduce.Job: The url to track the job: http://ip-172-31-4-8.ap-northeast-1.compute.internal:8088/proxy/application_1435943097882_0019/
15/07/07 13:24:30 INFO mapreduce.Job: Running job: job_1435943097882_0019
15/07/07 13:24:31 INFO mapreduce.Job: Job job_1435943097882_0019 running in uber mode : false
15/07/07 13:24:31 INFO mapreduce.Job: map 0% reduce 0%
15/07/07 13:24:59 INFO mapreduce.Job: Job job_1435943097882_0019 failed with state FAILED due to: Vertex failed, vertexName=initialmap, vertexId=vertex_1435943097882_0019_1_00, diagnostics=[Task failed, taskId=task_1435943097882_0019_1_00_05, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:197)
    at org.apache.hadoop.io.Text.readWithKnownLength(Text.java:319)
    at org.apache.hadoop.io.Text.readFields(Text.java:291)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:142)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
    at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:302)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170)
    at org.apache.tez.mapreduce.combine.MRCombiner.runNewCombiner(MRCombiner.java:191)
    at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:115)
    at org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.runCombineProcessor(ExternalSorter.java:285)
    at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.spill(PipelinedSorter.java:463)
    at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.sort(PipelinedSorter.java:219)
    at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:311)
    at
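The top frame of the trace above, DataInputStream.readFully, throws EOFException whenever the underlying stream ends before the requested number of bytes arrives; Text.readWithKnownLength hits exactly this when a record's declared length exceeds the bytes actually available. A minimal, self-contained reproduction of that failure mode (illustrative only; not the Tez combiner code path itself):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Demonstrates the exception seen in TEZ-2602's stack trace:
// readFully throws EOFException when the stream is shorter than
// the length the caller believes it should contain.
public class ReadFullyEof {
    static boolean readTruncated() {
        byte[] onlyThree = {1, 2, 3}; // stream holds 3 bytes
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(onlyThree))) {
            byte[] buf = new byte[8];
            in.readFully(buf); // asks for 8 bytes; only 3 available
            return false;
        } catch (EOFException e) {
            return true; // premature end of stream, as in the trace
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(readTruncated()); // true
    }
}
```

In the reported job the truncation is presumably introduced upstream, in the serialized data handed to the combiner by the pipelined sorter, rather than by the reader itself.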