[jira] [Commented] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-06-28 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066924#comment-16066924
 ] 

Kuhu Shukla commented on TEZ-3605:
--

Committing this to master.

> Detect and prune empty partitions for the Ordered case
> --
>
> Key: TEZ-3605
> URL: https://issues.apache.org/jira/browse/TEZ-3605
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Attachments: TEZ-3605.001.patch, TEZ-3605.002.patch, 
> TEZ-3605.003.patch, TEZ-3605.004.patch, TEZ-3605.005.patch, 
> TEZ-3605.006.patch, TEZ-3605.007.patch, TEZ-3605.008.patch, 
> TEZ-3605.009.patch, TEZ-3605.010.patch, TEZ-3605.011.patch, 
> TEZ-3605.012.patch, TEZ-3605.013.patch
>
>
> Analogous to the Unordered case we should not have empty partition 
> entries/segments in the Ordered/DefaultSorter case. This will save writing 
> unnecessary data.
> Additionally, with tez_shuffle feature (TEZ-3334), in a heavily auto reduced 
> job, this change would allow not fetching empty partitions and then throwing 
> them away.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-06-27 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065525#comment-16065525
 ] 

Kuhu Shukla commented on TEZ-3605:
--

[~sseth], the latest patch has a clean pre-commit run. Requesting one last 
review if needed; otherwise I will commit this tomorrow if there are no 
objections from the community by then. Thanks!



[jira] [Commented] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-06-27 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065516#comment-16065516
 ] 

TezQA commented on TEZ-3605:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12874729/TEZ-3605.013.patch
  against master revision de72fbe.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2555//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2555//console

This message is automatically generated.



[jira] [Commented] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-06-27 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065275#comment-16065275
 ] 

Siddharth Seth commented on TEZ-3605:
-

Thanks for the updated patch. Fixes large records as well.

+1, with one minor fix before committing.

In PipelinedSorter, the if (combiner != null) branch will run into an NPE. 
Adding a simple hasNext check there as well fixes this.
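
A minimal sketch of such a guard, in plain Java rather than the actual Tez code. It assumes the NPE comes from the writer not being created for an empty partition, which is a reading of the comment above and not something the thread states. `CombinerGuardSketch`, `Writer`, and `spillPartition` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

final class CombinerGuardSketch {
  /** Hypothetical writer stand-in: collects the records "written" for one partition. */
  static final class Writer {
    final List<String> records = new ArrayList<>();
    void append(String record) { records.add(record); }
  }

  /**
   * Spills one partition. The writer is created only when the partition has
   * records, so every branch that touches it must be guarded by hasNext().
   * Returns null when the partition is empty (nothing was written).
   */
  static Writer spillPartition(Iterator<String> records,
                               BiConsumer<Iterator<String>, Writer> combiner) {
    // Lazily create the writer: an empty partition gets no writer at all.
    Writer writer = records.hasNext() ? new Writer() : null;
    if (combiner == null) {
      // The loop body never runs for an empty partition, so writer is safe here.
      while (records.hasNext()) {
        writer.append(records.next());
      }
    } else if (records.hasNext()) {
      // Without this hasNext() guard the combiner would receive a null writer.
      combiner.accept(records, writer);
    }
    return writer;
  }
}
```

With the guard in place, an empty partition returns null from both branches instead of failing inside the combiner.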






[jira] [Commented] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-06-27 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064752#comment-16064752
 ] 

Kuhu Shukla commented on TEZ-3605:
--

Request for comments/review on the latest patch [~sseth], [~jeagles]. Thanks a 
lot!



[jira] [Commented] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-06-26 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064225#comment-16064225
 ] 

TezQA commented on TEZ-3605:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12874554/TEZ-3605.012.patch
  against master revision 5b0f5a0.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2546//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2546//console

This message is automatically generated.



[jira] [Commented] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-06-26 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063606#comment-16063606
 ] 

Kuhu Shukla commented on TEZ-3605:
--

bq. and invokes a merger on an empty list (not sure how this is handled)
An empty list is handled fine:
{quote}
if (segments.size() == 0) {
  LOG.info("Nothing to merge. Returning an empty iterator");
  return new EmptyIterator();
}
{quote}
The trouble arises when an individual segment's size is zero, because the 
stream then has no bytes to read.
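
To illustrate the distinction, here is a minimal sketch in plain Java, not the Tez merger. It drops zero-length segments before a merge, so that the merge sees either readable segments or an empty list, which is the case handled above. `SegmentFilterSketch`, `Segment`, and `nonEmpty` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class SegmentFilterSketch {
  /** Hypothetical stand-in for a spill segment: only its byte length matters here. */
  static final class Segment {
    final long length;
    Segment(long length) { this.length = length; }
  }

  /** Keeps only the segments that actually have bytes to read. */
  static List<Segment> nonEmpty(List<Segment> segments) {
    List<Segment> result = new ArrayList<>();
    for (Segment s : segments) {
      if (s.length > 0) {
        result.add(s);
      }
    }
    return result;
  }
}
```

If every segment of a partition is zero-length, the filtered list is empty and the merge can return an empty iterator without opening any stream.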



[jira] [Commented] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-06-09 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045101#comment-16045101
 ] 

Siddharth Seth commented on TEZ-3605:
-

In PipelinedSorter, the final merge does not necessarily skip a fully empty 
partition. The check while creating DiskSegments can end up with a list which 
is empty, and invokes a merger on an empty list (not sure how this is handled).
Similarly in DefaultSorter, I think mergeParts needs some work.

It would be useful to have tests for both, i.e. when there are multiple spills 
involved: 1) where one spill has a partition and another does not, 2) where no 
spill has the partition.
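
The two scenarios can be sketched as follows, in plain Java rather than Tez test code. The `hasData[spill][partition]` matrix is a hypothetical stand-in for the per-spill index information; `EmptyPartitionSketch` and `emptyPartitions` are illustrative names, not Tez APIs.

```java
import java.util.BitSet;

final class EmptyPartitionSketch {
  /**
   * hasData[spill][partition] says whether that spill wrote records for that
   * partition. A partition is empty overall only if no spill has data for it.
   */
  static BitSet emptyPartitions(boolean[][] hasData, int numPartitions) {
    BitSet empty = new BitSet(numPartitions);
    empty.set(0, numPartitions);          // start by assuming every partition is empty
    for (boolean[] spill : hasData) {
      for (int p = 0; p < numPartitions; p++) {
        if (spill[p]) {
          empty.clear(p);                 // at least one spill has data for p
        }
      }
    }
    return empty;
  }
}
```

For two spills over three partitions, where partition 0 appears only in the first spill, partition 2 only in the second, and partition 1 in neither, only partition 1 ends up in the empty set. That covers both cases listed above.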




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-05-31 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031088#comment-16031088
 ] 

Kuhu Shukla commented on TEZ-3605:
--

TestExceptionPropagation failure seems unrelated and locally irreproducible. I 
will continue to investigate this. Other test failures are known and already 
have JIRAs associated. [~jeagles], looking for some comments on the latest 
patch. Thanks a lot!



[jira] [Commented] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-05-24 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16022893#comment-16022893
 ] 

Kuhu Shukla commented on TEZ-3605:
--

The test failure is not reproducible locally and is due to an unexpected state 
transition. I ran the same test in a loop and did not see it fail even once.
{code}
2017-05-22 20:58:09,104 ERROR [Dispatcher thread {Central}] impl.TaskAttemptImpl (TaskAttemptImpl.java:handle(861)) - Can't handle this event at current state for attempt_1495486688894_0001_1_00_03_1
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_SUBMITTED at KILL_IN_PROGRESS
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.tez.dag.app.dag.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:859)
    at org.apache.tez.dag.app.dag.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:124)
    at org.apache.tez.dag.app.DAGAppMaster$TaskAttemptEventDispatcher.handle(DAGAppMaster.java:2299)
    at org.apache.tez.dag.app.DAGAppMaster$TaskAttemptEventDispatcher.handle(DAGAppMaster.java:2284)
    at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180)
    at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
    at java.lang.Thread.run(Thread.java:745)
{code}

Would appreciate any comments/review on the latest patch, 
[~sseth]/[~jeagles]/[~rajesh.balamohan]. Thanks a lot!



[jira] [Commented] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-05-18 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016327#comment-16016327
 ] 

Kuhu Shukla commented on TEZ-3605:
--

Looking at the related test failures. Will update shortly.



[jira] [Commented] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-05-18 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016297#comment-16016297
 ] 

TezQA commented on TEZ-3605:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12868801/TEZ-3605.008.patch
  against master revision e3ee7a6.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   
org.apache.tez.runtime.library.common.sort.impl.TestPipelinedSorter

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2464//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2464//console

This message is automatically generated.



[jira] [Commented] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-05-18 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016051#comment-16016051
 ] 

TezQA commented on TEZ-3605:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12868788/TEZ-3605.007.patch
  against master revision e3ee7a6.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2463//console

This message is automatically generated.



[jira] [Commented] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-03-29 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948010#comment-15948010
 ] 

Siddharth Seth commented on TEZ-3605:
-

It took a while to get to this, and to recollect what is done in the 
UnorderedWriter / DefaultWriter.
If I'm not mistaken, the patch is trying to avoid writing out the default 
4 bytes(?) that an IFile.Writer generates when the partition does not have 
data? (TEZ-941)

The changes to track numRecordsPerPartition are required for this. The Sorters 
already know how to generate the empty partition bitset by making use of 
TezSpillRecord and TezIndexRecord.hasData.
The current changes to track numRecordsPerPartition also break 
PipelinedShuffle / AvoidFinalMerge, since the partition stats are cumulative 
and not per partition. Synchronization will also need to be looked at (I 
suspect there may be some issues with the size stats as well).

The unordered case does not respect "sendEmptyPartitionsViaEvents" as a 
configuration parameter, and always sends empty partition information. IIRC 
this is why it is able to avoid the Writer for an empty partition: the reader 
will never access it.
In the ordered case, if sendEmptyPartitionsViaEvents is disabled, the reader 
may try interpreting the contents of TezIndexRecord, which was not written, and 
fail (need to check how this will behave).

I think the changes to track the number of records should be removed. Instead, 
the main changes should be in DefaultSorter (and maybe the same changes in 
PipelinedSorter). These changes should skip creating the writer only if 
sendEmptyPartitionsViaEvents is enabled.
Also, in the current changes to DefaultSorter, is it possible to move the 
(if (writer == null)) check outside of the while loop?
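
A minimal sketch of the rule proposed above, in plain Java rather than sorter code: a writer is skipped only for an empty partition, and only when empty-partition information is sent via events. `WriterDecisionSketch`, `partitionsToWrite`, and the `hasRecords` array are hypothetical; only the name sendEmptyPartitionsViaEvents comes from the discussion.

```java
import java.util.ArrayList;
import java.util.List;

final class WriterDecisionSketch {
  /**
   * Returns the partitions for which a writer would still be created.
   * An empty partition is skipped only when readers learn about empty
   * partitions through events and therefore never try to read it.
   */
  static List<Integer> partitionsToWrite(boolean[] hasRecords,
                                         boolean sendEmptyPartitionsViaEvents) {
    List<Integer> toWrite = new ArrayList<>();
    for (int p = 0; p < hasRecords.length; p++) {
      if (hasRecords[p] || !sendEmptyPartitionsViaEvents) {
        toWrite.add(p);
      }
    }
    return toWrite;
  }
}
```

With the flag disabled, every partition keeps its writer, so a reader that interprets the index record still finds something to read.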



[jira] [Commented] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-03-23 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938398#comment-15938398
 ] 

Jonathan Eagles commented on TEZ-3605:
--

This approach seems correct. I will want to spend some time testing this to 
understand the implications.



[jira] [Commented] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-03-15 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927083#comment-15927083
 ] 

Kuhu Shukla commented on TEZ-3605:
--

[~jeagles], [~rajesh.balamohan], request for review/comments. Appreciate it!



[jira] [Commented] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-03-15 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927078#comment-15927078
 ] 

TezQA commented on TEZ-3605:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12858951/TEZ-3605.006.patch
  against master revision 57c857d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2330//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2330//console

This message is automatically generated.



[jira] [Commented] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-03-12 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906612#comment-15906612
 ] 

Kuhu Shukla commented on TEZ-3605:
--

[~rajesh.balamohan], could you take a look at this patch and share your 
comments? Thanks a lot!



[jira] [Commented] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-03-07 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15900398#comment-15900398
 ] 

Kuhu Shukla commented on TEZ-3605:
--

[~sseth], [~jeagles], Would appreciate any initial comments on the patch and 
how to proceed with this fix.
Thanks a lot!



[jira] [Commented] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-03-07 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15900376#comment-15900376
 ] 

TezQA commented on TEZ-3605:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12856686/TEZ-3605.005.patch
  against master revision c6d4908.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2313//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2313//console

This message is automatically generated.

> Detect and prune empty partitions for the Ordered case
> --
>
> Key: TEZ-3605
> URL: https://issues.apache.org/jira/browse/TEZ-3605
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Attachments: TEZ-3605.001.patch, TEZ-3605.002.patch, 
> TEZ-3605.003.patch, TEZ-3605.004.patch, TEZ-3605.005.patch
>
>
> Analogous to the Unordered case, we should not have empty partition 
> entries/segments in the Ordered/DefaultSorter case. This will save writing 
> unnecessary data.





[jira] [Commented] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-03-07 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15899672#comment-15899672
 ] 

TezQA commented on TEZ-3605:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12856609/TEZ-3605.004.patch
  against master revision d40f3ad.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestRecovery

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2307//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2307//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-library.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2307//console

This message is automatically generated.

> Detect and prune empty partitions for the Ordered case
> --
>
> Key: TEZ-3605
> URL: https://issues.apache.org/jira/browse/TEZ-3605
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Attachments: TEZ-3605.001.patch, TEZ-3605.002.patch, 
> TEZ-3605.003.patch, TEZ-3605.004.patch
>
>
> Analogous to the Unordered case, we should not have empty partition 
> entries/segments in the Ordered/DefaultSorter case. This will save writing 
> unnecessary data.





[jira] [Commented] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-03-06 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898019#comment-15898019
 ] 

TezQA commented on TEZ-3605:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12856328/TEZ-3605.003.patch
  against master revision a5ffdea.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2303//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2303//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-library.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2303//console

This message is automatically generated.

> Detect and prune empty partitions for the Ordered case
> --
>
> Key: TEZ-3605
> URL: https://issues.apache.org/jira/browse/TEZ-3605
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Attachments: TEZ-3605.001.patch, TEZ-3605.002.patch, 
> TEZ-3605.003.patch
>
>
> Analogous to the Unordered case, we should not have empty partition 
> entries/segments in the Ordered/DefaultSorter case. This will save writing 
> unnecessary data.





[jira] [Commented] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-03-06 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897595#comment-15897595
 ] 

TezQA commented on TEZ-3605:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12856292/TEZ-3605.002.patch
  against master revision 518deb6.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2302//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2302//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-library.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2302//console

This message is automatically generated.

> Detect and prune empty partitions for the Ordered case
> --
>
> Key: TEZ-3605
> URL: https://issues.apache.org/jira/browse/TEZ-3605
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Attachments: TEZ-3605.001.patch, TEZ-3605.002.patch
>
>
> Analogous to the Unordered case, we should not have empty partition 
> entries/segments in the Ordered/DefaultSorter case. This will save writing 
> unnecessary data.





[jira] [Commented] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-03-05 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896696#comment-15896696
 ] 

TezQA commented on TEZ-3605:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12856006/TEZ-3605.001.patch
  against master revision 1b1eb1d.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2299//console

This message is automatically generated.

> Detect and prune empty partitions for the Ordered case
> --
>
> Key: TEZ-3605
> URL: https://issues.apache.org/jira/browse/TEZ-3605
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Attachments: TEZ-3605.001.patch
>
>
> Analogous to the Unordered case, we should not have empty partition 
> entries/segments in the Ordered/DefaultSorter case. This will save writing 
> unnecessary data.


