[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2016-11-17 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15674696#comment-15674696
 ] 

Rohini Palaniswamy commented on TEZ-391:


[~bikassaha],
Would it be possible to make time to get this into Tez 0.9 as well?

> SharedEdge - Support for passing same output from a vertex as input to two 
> different vertices
> -
>
> Key: TEZ-391
> URL: https://issues.apache.org/jira/browse/TEZ-391
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Rohini Palaniswamy
>Assignee: Jeff Zhang
> Attachments: Shared Edge Design.pdf, TEZ-391-WIP-1.patch, 
> TEZ-391-WIP-2.patch, TEZ-391-WIP-3.patch, TEZ-391-WIP-4.patch, 
> TEZ-391-WIP-5.patch, TEZ-391-WIP-6.patch, TEZ-391-WIP-7.patch
>
>
>   We need this for a lot of use cases: for cases where multi-query is turned 
> off, and for optimizing unions. Currently those are BROADCAST or ONE-ONE edges 
> and we write the output multiple times.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2016-07-09 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15369253#comment-15369253
 ] 

TezQA commented on TEZ-391:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12734397/TEZ-391-WIP-7.patch
  against master revision 608e15e.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1838//console

This message is automatically generated.



[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2016-04-08 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15233267#comment-15233267
 ] 

TezQA commented on TEZ-391:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12734397/TEZ-391-WIP-7.patch
  against master revision 53981d4.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1643//console

This message is automatically generated.



[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2016-01-08 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090040#comment-15090040
 ] 

TezQA commented on TEZ-391:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12734397/TEZ-391-WIP-7.patch
  against master revision 85637c6.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1413//console

This message is automatically generated.



[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2015-08-26 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715240#comment-14715240
 ] 

TezQA commented on TEZ-391:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12734397/TEZ-391-WIP-7.patch
  against master revision eb70cb7.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1029//console

This message is automatically generated.



[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2015-06-02 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569873#comment-14569873
 ] 

Hitesh Shah commented on TEZ-391:
-

ping [~bikassaha] for review



[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2015-05-21 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554104#comment-14554104
 ] 

TezQA commented on TEZ-391:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12734397/TEZ-391-WIP-7.patch
  against master revision aa6a84c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 162 javac 
compiler warnings (more than the master's current 161 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/719//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/719//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/719//console

This message is automatically generated.



[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2015-05-21 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553819#comment-14553819
 ] 

TezQA commented on TEZ-391:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12734339/TEZ-391-WIP-7.patch
  against master revision 7c16b10.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 162 javac 
compiler warnings (more than the master's current 161 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/717//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/717//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/717//console

This message is automatically generated.



[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2015-05-20 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553693#comment-14553693
 ] 

Jeff Zhang commented on TEZ-391:


The following table shows the edge types we may need to support (rows are the 
source, columns are the destination).
| Source \ Destination | Vertex | VertexGroup |
| Vertex | Common Edge | SharedOutputEdge |
| VertexGroup | GroupInputEdge | Both SharedOutputEdge & GroupInputEdge (not implemented yet) |

The main changes in this patch:
* Currently SharedOutputEdge only supports One-to-One and Broadcast. 
(ScatterGather requires the two downstream vertices to have the same 
parallelism, otherwise shuffle will break. I made some changes to get 
ScatterGather working, but it still needs more work, especially for reducer 
auto-parallelism.) For Pig's usage scenario, One-to-One and Broadcast should 
be sufficient for now.
* Workflow for a shared output edge
** The shared output edge is specified when building the DAG on the client.
** The AM gets the shared output edge from the DAGPlan and passes the 
resulting SharedOutputSpec through the TaskSpec to TezChild.
** LogicalIOProcessorRuntimeTask gets the TaskSpec, which contains the 
SharedOutputSpec. It creates the corresponding SharedLogicOutput & 
SharedOutputContext, which are very similar to the common LogicOutput & 
OutputContext. The only difference is that SharedLogicOutput & 
SharedOutputContext are associated with the downstream vertex group name 
rather than the downstream vertex name. The key point is that although we 
generate only one copy of the DataMovementEvent, we send that one copy to each 
member of the downstream vertex group. (This is done in 
LogicalIOProcessorRuntimeTask.close().)
* Refactoring changes
** I renamed many occurrences of MergedInput to GroupedInput to align with 
SharedOutput.
** Renamed VertexImpl#sharedOutput to VertexImpl#mergedOutput.
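
The last workflow step can be illustrated with a small, self-contained sketch. It is not code from the patch: Event, SharedOutputFanOut and route are hypothetical names, and the map of group members is an assumed shape. The sketch only shows the idea that one event instance is generated and that the same instance is addressed to every member of the downstream vertex group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a data movement event; not the Tez class.
final class Event {
    final String payload;
    Event(String payload) { this.payload = payload; }
}

final class SharedOutputFanOut {
    // Downstream vertex group name -> member vertex names (assumed shape).
    private final Map<String, List<String>> groupMembers = new LinkedHashMap<>();

    void addGroup(String groupName, List<String> members) {
        groupMembers.put(groupName, new ArrayList<>(members));
    }

    // Returns destination vertex name -> event, reusing the single event instance.
    Map<String, Event> route(String groupName, Event event) {
        List<String> members = groupMembers.get(groupName);
        if (members == null) {
            throw new IllegalArgumentException("Unknown vertex group: " + groupName);
        }
        Map<String, Event> routed = new LinkedHashMap<>();
        for (String member : members) {
            routed.put(member, event); // same object, no per-destination copy
        }
        return routed;
    }

    public static void main(String[] args) {
        SharedOutputFanOut fanOut = new SharedOutputFanOut();
        fanOut.addGroup("group1", Arrays.asList("v2", "v3"));
        Map<String, Event> routed = fanOut.route("group1", new Event("partition-0"));
        // Every member of the group is a destination for the one event.
        System.out.println(routed.keySet());
    }
}
```

Keeping a single instance per source task is what avoids writing (and describing) the output once per consumer.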
 



[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2015-04-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14515040#comment-14515040
 ] 

Bikas Saha commented on TEZ-391:


[~zjffdu] Can you make a call on whether this is for 0.7.0 or not?
IMO, if this was close to being done then perhaps yes.

> SharedEdge - Support for passing same output from a vertex as input to two 
> different vertices
> -
>
> Key: TEZ-391
> URL: https://issues.apache.org/jira/browse/TEZ-391
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Rohini Palaniswamy
>Assignee: Jeff Zhang
> Attachments: Shared Edge Design.pdf, TEZ-391-WIP-1.patch, 
> TEZ-391-WIP-2.patch, TEZ-391-WIP-3.patch, TEZ-391-WIP-4.patch, 
> TEZ-391-WIP-5.patch, TEZ-391-WIP-6.patch
>
>
>   We need this for lot of usecases. For cases where multi-query is turned off 
> and for optimizing unions. Currently those are BROADCAST or ONE-ONE edges and 
> we write the output multiple times.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2015-04-27 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514688#comment-14514688
 ] 

Hitesh Shah commented on TEZ-391:
-

[~bikassaha] [~zjffdu] Is this for 0.7? 



[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2015-04-22 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508396#comment-14508396
 ] 

TezQA commented on TEZ-391:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12727512/TEZ-391-WIP-6.patch
  against master revision fe11c5e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 10 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 161 javac 
compiler warnings (more than the master's current 160 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/520//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/520//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/520//console

This message is automatically generated.



[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2015-04-22 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508332#comment-14508332
 ] 

TezQA commented on TEZ-391:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12727511/TEZ-391-WIP-5.patch
  against master revision fe11c5e.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/519//console

This message is automatically generated.

> SharedEdge - Support for passing same output from a vertex as input to two 
> different vertices
> -
>
> Key: TEZ-391
> URL: https://issues.apache.org/jira/browse/TEZ-391
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Rohini Palaniswamy
>Assignee: Jeff Zhang
> Attachments: Shared Edge Design.pdf, TEZ-391-WIP-1.patch, 
> TEZ-391-WIP-2.patch, TEZ-391-WIP-3.patch, TEZ-391-WIP-4.patch, 
> TEZ-391-WIP-5.patch
>
>
>   We need this for lot of usecases. For cases where multi-query is turned off 
> and for optimizing unions. Currently those are BROADCAST or ONE-ONE edges and 
> we write the output multiple times.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338237#comment-14338237
 ] 

Hadoop QA commented on TEZ-391:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12701014/TEZ-391-WIP-4.patch
  against master revision 1ccb0be.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 10 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 181 javac 
compiler warnings (more than the master's current 180 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/231//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/231//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/231//console

This message is automatically generated.

> SharedEdge - Support for passing same output from a vertex as input to two 
> different vertices
> -
>
> Key: TEZ-391
> URL: https://issues.apache.org/jira/browse/TEZ-391
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Rohini Palaniswamy
>Assignee: Jeff Zhang
> Attachments: Shared Edge Design.pdf, TEZ-391-WIP-1.patch, 
> TEZ-391-WIP-2.patch, TEZ-391-WIP-3.patch, TEZ-391-WIP-4.patch
>
>
>   We need this for lot of usecases. For cases where multi-query is turned off 
> and for optimizing unions. Currently those are BROADCAST or ONE-ONE edges and 
> we write the output multiple times.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2015-02-04 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306425#comment-14306425
 ] 

Jeff Zhang commented on TEZ-391:


bq. Sounds good. But can we call it SharedOutputEdge instead of ShareOutputEdge?
Sure.

bq. We already use GroupInputEdge in pig. Refer to TezDAGBuilder. Not sure how 
you can set up the edge for Vertex Group without that as the mergedinput 
descriptor needs to be set for it.
This may need some API changes. As I understand it, the input descriptor 
depends on the edge property, so the following API should be sufficient for 
creating any kind of edge. Anyway, since this would change the API, it is just 
a proposal; I won't do it in this JIRA.

{code}
Edge.create(vertex/vertexgroup,  vertex/vertexgroup, edge_property)
{code}
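
A rough sketch of what such a unified factory could look like is given below. All types here (Endpoint, VertexRef, VertexGroupRef, UnifiedEdge) are hypothetical and are not part of the Tez API, and the edge property is reduced to a plain string for brevity.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical common supertype for anything an edge can connect.
interface Endpoint {
    String name();
}

final class VertexRef implements Endpoint {
    private final String name;
    VertexRef(String name) { this.name = name; }
    public String name() { return name; }
}

final class VertexGroupRef implements Endpoint {
    private final String name;
    final List<VertexRef> members;
    VertexGroupRef(String name, VertexRef... members) {
        this.name = name;
        this.members = Arrays.asList(members);
    }
    public String name() { return name; }
}

final class UnifiedEdge {
    final Endpoint source;
    final Endpoint destination;
    final String edgeProperty; // placeholder for a real edge property object

    private UnifiedEdge(Endpoint source, Endpoint destination, String edgeProperty) {
        this.source = source;
        this.destination = destination;
        this.edgeProperty = edgeProperty;
    }

    // Single entry point for all four vertex / vertex group combinations.
    static UnifiedEdge create(Endpoint source, Endpoint destination, String edgeProperty) {
        if (source == null || destination == null || edgeProperty == null) {
            throw new IllegalArgumentException("source, destination and edgeProperty are required");
        }
        return new UnifiedEdge(source, destination, edgeProperty);
    }

    public static void main(String[] args) {
        VertexRef v1 = new VertexRef("v1");
        VertexGroupRef group = new VertexGroupRef("g1", new VertexRef("v2"), new VertexRef("v3"));
        UnifiedEdge edge = UnifiedEdge.create(v1, group, "BROADCAST");
        System.out.println(edge.source.name() + " -> " + edge.destination.name());
    }
}
```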


> SharedEdge - Support for passing same output from a vertex as input to two 
> different vertices
> -
>
> Key: TEZ-391
> URL: https://issues.apache.org/jira/browse/TEZ-391
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Rohini Palaniswamy
>Assignee: Jeff Zhang
> Attachments: Shared Edge Design.pdf, TEZ-391-WIP-1.patch, 
> TEZ-391-WIP-2.patch, TEZ-391-WIP-3.patch
>
>
>   We need this for lot of usecases. For cases where multi-query is turned off 
> and for optimizing unions. Currently those are BROADCAST or ONE-ONE edges and 
> we write the output multiple times.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2015-02-04 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305808#comment-14305808
 ] 

Rohini Palaniswamy commented on TEZ-391:


bq. I still think ShareOutputEdge is more suitable. Because for GroupInputEdge, 
there's multiple inputs from upstream vertices, we group them together into 
GroupInput. While for ShareOutputEdge, there's actually only one output from 
upstream vertex. So from semantic perspective I think ShareOutputEdge is better.
   Sounds good. But can we call it SharedOutputEdge instead of ShareOutputEdge?

bq. Besides, I am thinking is it necessary to expose the 
GroupInputEdge/ShareOutputEdge as public API. IMO, I don't think it is 
necessary.
   We already use GroupInputEdge in pig. Refer to TezDAGBuilder.  Not sure how 
you can set up the edge for Vertex Group without that as the mergedinput 
descriptor needs to be set for it.



[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2015-02-04 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304797#comment-14304797
 ] 

Jeff Zhang commented on TEZ-391:


bq. We should probably name it GroupOutputEdge to be symmetric with 
GroupInputEdge.
I still think ShareOutputEdge is more suitable. With GroupInputEdge there are 
multiple inputs from upstream vertices, and we group them together into a 
GroupInput. With ShareOutputEdge there is actually only one output from the 
upstream vertex, so from a semantic perspective I think ShareOutputEdge is 
better. Besides, there is already a concept of SharedOutput in VertexImpl 
(VertexImpl::addSharedOutputs) for output to a data sink. I think that kind of 
output would be much better named GroupOutput.

bq. Is the design suggesting that a group output edge expand into standard 
edges with additional metadata at the source vertex which will enable its 
TezChild to provide a single output to its tasks even though there are multiple 
consumers?
Yes. TezChild has only one output, but it would send multiple events to the AM 
based on the additional metadata about the shared edge.

bq.  What happens to fault tolerance? If a destination vertex reports an error 
about a shared source then what should happen in other destination vertices 
that are sharing that source? 
The upstream vertex will get the InputReadErrorEvent and will send the 
InputFailedEvent to both downstream vertices. In theory this should not be a 
problem. But you are right, I think I need to highlight these cases and verify 
them in unit tests.

bq. Related: When an output of a task is marked bad then it sends an 
InputFailed event to its destination tasks. This happens in the AM and needs to 
be sent to all destination tasks of a shared output. So the AM routing would 
need to take into account shared outputs for this case.
The AM knows the standard edges that are expanded from the shared edge, so all 
the downstream vertices will get the InputFailed event.
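
The routing described in these answers can be pictured as follows. This is only an illustration with hypothetical names (SharedEdgeRouter, expand, destinationsForFailure), not the AM's actual data structures: a shared edge is expanded into one standard edge per destination vertex, and a failed output of the source vertex is reported along every expanded edge.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SharedEdgeRouter {
    // Source vertex name -> destination vertex names of the expanded standard edges.
    private final Map<String, List<String>> expandedEdges = new HashMap<>();

    // Expands a shared edge into one standard edge per member of the destination group.
    void expand(String sourceVertex, List<String> destinationGroupMembers) {
        List<String> destinations = expandedEdges.get(sourceVertex);
        if (destinations == null) {
            destinations = new ArrayList<>();
            expandedEdges.put(sourceVertex, destinations);
        }
        destinations.addAll(destinationGroupMembers);
    }

    // All vertices that must be told when an output of sourceVertex is marked bad.
    List<String> destinationsForFailure(String sourceVertex) {
        List<String> result = new ArrayList<>();
        List<String> destinations = expandedEdges.get(sourceVertex);
        if (destinations != null) {
            result.addAll(destinations);
        }
        return result;
    }

    public static void main(String[] args) {
        SharedEdgeRouter router = new SharedEdgeRouter();
        router.expand("v1", Arrays.asList("v2", "v3"));
        // A read error reported by v2 alone still leads to both v2 and v3 being notified.
        System.out.println(router.destinationsForFailure("v1"));
    }
}
```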

bq. Can it happen that a VertexGroup is connected to another VertexGroup? What 
use case would that be?
Good question. This case would be two unions joined together, where one of 
them is the replicated side. In this case the edge between the two vertex 
groups would be both a GroupInputEdge and a ShareOutputEdge. I need to look 
into it more deeply.

{code}
a = load 'file:///tmp/input' as (x:int, y:chararray);
b = load 'file:///tmp/input' as (y:chararray, x:int);
c = union onschema a, b;
d = load 'file:///tmp/input1' as (x:int, z:chararray);
e = load 'file:///tmp/input2' as (x:int, z:chararray);
f = union onschema d, e;
g = join c by x, f by x using 'replicated';
store g into 'file:///tmp/output';
{code}

Besides, I am wondering whether it is necessary to expose 
GroupInputEdge/ShareOutputEdge as public API. The user just needs to create an 
edge by connecting one Vertex/VertexGroup to another Vertex/VertexGroup (the 
2-by-2 cases).
* If the destination is a vertex group, its members share the one copy of 
output from the source, no matter whether the source is a vertex or a vertex 
group.
* Meanwhile, if the source is a vertex group, the destination uses the merged 
input from the source, no matter whether the destination is a vertex or a 
vertex group.
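
The two rules above amount to a small decision table. The sketch below encodes it with hypothetical names (EndpointKind, EdgeTrait, EdgeKinds.classify); the labels follow the edge names used in this discussion and are not an existing Tez API.

```java
import java.util.EnumSet;
import java.util.Set;

enum EndpointKind { VERTEX, VERTEX_GROUP }

enum EdgeTrait { SHARED_OUTPUT, GROUP_INPUT }

final class EdgeKinds {
    // Derives the edge behaviour purely from what the edge connects.
    static Set<EdgeTrait> classify(EndpointKind source, EndpointKind destination) {
        Set<EdgeTrait> traits = EnumSet.noneOf(EdgeTrait.class);
        if (destination == EndpointKind.VERTEX_GROUP) {
            traits.add(EdgeTrait.SHARED_OUTPUT); // destinations share one copy of the output
        }
        if (source == EndpointKind.VERTEX_GROUP) {
            traits.add(EdgeTrait.GROUP_INPUT); // destination reads a merged input
        }
        return traits; // an empty set means an ordinary vertex-to-vertex edge
    }

    public static void main(String[] args) {
        for (EndpointKind source : EndpointKind.values()) {
            for (EndpointKind destination : EndpointKind.values()) {
                System.out.println(source + " -> " + destination + ": "
                        + classify(source, destination));
            }
        }
    }
}
```

With this shape the caller never names an edge type; the group-to-group case simply carries both traits.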






[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2015-02-03 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304395#comment-14304395
 ] 

Bikas Saha commented on TEZ-391:


Thanks for the doc. It gives a good overall picture.

We should probably name it GroupOutputEdge to be symmetric with GroupInputEdge.

Some parts are not clear. E.g. a group input edge actually expands into a set 
of standard edges along with some metadata on the destination vertex that 
enables the AM to send additional info to the destination TezChild. TezChild 
uses the metadata to create a unified input that wraps around the member 
inputs. This provides a merged view on top of the real inputs, and failure 
handling remains as is.
Is the design suggesting that a group output edge expand into standard edges 
with additional metadata at the source vertex which will enable its TezChild to 
provide a single output to its tasks even though there are multiple consumers?
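
To make the question concrete, here is a rough sketch of the two expansions. It is a simplified model with hypothetical names (`Expansion`, `expandGroupInput`, `expandGroupOutput`), not the actual Tez code, and the group output case is only the interpretation being asked about.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Tez code: a group edge expands into standard edges plus metadata.
final class GroupEdgeExpansionSketch {
    static final class Expansion {
        final List<String> standardEdges = new ArrayList<>(); // each entry is "source->destination"
        final String metadataVertex;                          // vertex that carries the group metadata

        Expansion(String metadataVertex) {
            this.metadataVertex = metadataVertex;
        }
    }

    // Group input edge: one standard edge per source member, metadata on the destination.
    static Expansion expandGroupInput(List<String> sourceMembers, String destination) {
        Expansion expansion = new Expansion(destination);
        for (String source : sourceMembers) {
            expansion.standardEdges.add(source + "->" + destination);
        }
        return expansion;
    }

    // Group output edge (assumed symmetric case): one standard edge per consumer, metadata on the source.
    static Expansion expandGroupOutput(String source, List<String> destinationMembers) {
        Expansion expansion = new Expansion(source);
        for (String destination : destinationMembers) {
            expansion.standardEdges.add(source + "->" + destination);
        }
        return expansion;
    }

    public static void main(String[] args) {
        Expansion in = expandGroupInput(Arrays.asList("v1", "v2"), "v3");
        Expansion out = expandGroupOutput("v1", Arrays.asList("v2", "v3"));
        System.out.println(in.standardEdges + " metadata on " + in.metadataVertex);
        System.out.println(out.standardEdges + " metadata on " + out.metadataVertex);
    }
}
```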

Whether to replicate the event at TezChild or keep it single needs some more 
thought. E.g. replication would multiply event memory by the number of 
replicas. What happens to fault tolerance? If a destination vertex reports an 
error about a shared source, then what should happen in the other destination 
vertices that are sharing that source? 
Related: When an output of a task is marked bad, an InputFailed event is sent 
to its destination tasks. This happens in the AM, and the event needs to be 
sent to all destination tasks of a shared output. So the AM routing would need 
to take shared outputs into account for this case. The 
OutputReportedFailedTransition may need to be updated to consider the case 
where errors are reported from multiple vertices with different task counts.
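
As a rough illustration of the routing concern, the sketch below fans one InputFailed notification out to every task of every vertex sharing the output. All names are hypothetical, it assumes a broadcast-style edge where every destination task reads the failed output, and it is not how the AM is actually implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Tez code: route a failure of a shared output to all its consumers.
final class SharedOutputRoutingSketch {
    // Destination vertex name -> task count; consumers of a shared output may differ in task count.
    private final Map<String, Integer> consumers = new LinkedHashMap<>();

    void addConsumer(String vertexName, int taskCount) {
        consumers.put(vertexName, taskCount);
    }

    // Returns one notification per destination task, across all vertices sharing the output.
    List<String> routeInputFailed(String failedSourceTask) {
        List<String> notifications = new ArrayList<>();
        for (Map.Entry<String, Integer> consumer : consumers.entrySet()) {
            for (int task = 0; task < consumer.getValue(); task++) {
                notifications.add("InputFailed(" + failedSourceTask + ") -> "
                        + consumer.getKey() + "_task_" + task);
            }
        }
        return notifications;
    }

    public static void main(String[] args) {
        SharedOutputRoutingSketch routing = new SharedOutputRoutingSketch();
        routing.addConsumer("v2", 2);
        routing.addConsumer("v3", 3);
        for (String notification : routing.routeInputFailed("v1_task_0")) {
            System.out.println(notification);
        }
    }
}
```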

Shared output to a data sink was already covered in the jira that added 
GroupInputEdge. So we can skip that here.

Can it happen that a VertexGroup is connected to another VertexGroup? What use 
case would that be? Until now standard vertices would be inputs to a 
VertexGroup. Shared edge will allow VertexGroups to be outputs to standard 
vertices.






[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2015-02-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300856#comment-14300856
 ] 

Hadoop QA commented on TEZ-391:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12695866/Shared%20Edge%20Design.pdf
  against master revision cfa637a.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/110//console

This message is automatically generated.






[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2015-01-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291652#comment-14291652
 ] 

Hadoop QA commented on TEZ-391:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12694458/TEZ-391-WIP-3.patch
  against master revision 12e1e66.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 10 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/77//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/77//console

This message is automatically generated.






[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2015-01-26 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291616#comment-14291616
 ] 

Jeff Zhang commented on TEZ-391:


[~bikassaha] Thanks for your comments. I will attach a design doc later for 
your review.








[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2015-01-25 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291516#comment-14291516
 ] 

Bikas Saha commented on TEZ-391:


I am glad that you have come to the conclusion that VertexGroup can be 
symmetrically used to create shared outputs in the same manner as it is 
currently used to create shared inputs. I had thought about the shared edge 
implementation after this jira was opened and this seemed like the most natural 
solution. I should have noted that down in a design note earlier, but it looks 
like we are on the same page. Before going down the implementation path, it 
would be great if you could leave a design note that outlines the flow, from 
the API spec to how the logic flows through to the tasks. This will help 
clarify the design and enable others to understand it better.






[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2015-01-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287210#comment-14287210
 ] 

Hadoop QA commented on TEZ-391:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12693848/TEZ-391-WIP-2.patch
  against master revision 3f4e8a7.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/71//console

This message is automatically generated.






[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2015-01-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282411#comment-14282411
 ] 

Hadoop QA commented on TEZ-391:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12693059/TEZ-391-WIP-1.patch
  against master revision c684653.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 64 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/52//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/52//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/52//artifact/patchprocess/newPatchFindbugsWarningstez-mapreduce.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/52//artifact/patchprocess/newPatchFindbugsWarningstez-examples.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/52//console

This message is automatically generated.






[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2015-01-19 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282355#comment-14282355
 ] 

Jeff Zhang commented on TEZ-391:


Attached a patch for SharedEdge:
* Added a new API in Edge to create a shared edge
{code}
public Edge createSharedEdge(Vertex outputVertex) 
{code}
* Currently it only supports One-to-One and Broadcast. (ScatterGather requires 
the 2 downstream vertices to have the same parallelism, otherwise shuffle will 
break. I made some changes to make ScatterGather work, but it still needs more 
work, especially on the reducer auto-parallelism.)
* Added one example in tez-examples to show the usage (SharedEdgeExample).
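
The parallelism constraint mentioned above could be checked roughly as follows. This is a sketch with hypothetical names (`Movement`, `canShare`), not code from the patch; it only illustrates the stated rule that ScatterGather needs equal parallelism on all downstream vertices.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not code from the patch: when can one output be shared downstream?
final class SharedEdgeCheckSketch {
    // Assumed stand-in for the data movement types named in the comment above.
    enum Movement { ONE_TO_ONE, BROADCAST, SCATTER_GATHER }

    static boolean canShare(Movement movement, List<Integer> downstreamParallelism) {
        if (movement != Movement.SCATTER_GATHER) {
            return true; // One-to-One and Broadcast are the supported cases
        }
        // ScatterGather: all downstream vertices must have the same parallelism,
        // otherwise a single partitioned output cannot serve all of them.
        for (int parallelism : downstreamParallelism) {
            if (parallelism != downstreamParallelism.get(0)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(canShare(Movement.BROADCAST, Arrays.asList(2, 5)));
        System.out.println(canShare(Movement.SCATTER_GATHER, Arrays.asList(4, 4)));
        System.out.println(canShare(Movement.SCATTER_GATHER, Arrays.asList(4, 8)));
    }
}
```

Reducer auto-parallelism is not modeled here; if the parallelism of one downstream vertex changes at runtime, the check would have to be re-evaluated.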

Although this patch works, after more thinking, I think using VertexGroup may 
be more natural and easier to understand. (We just need to make the 2 
downstream vertices a vertex group and connect the upstream vertex to this 
vertex group.) VertexGroup is now used for shared output, so it is also natural 
to make it support shared input. I will attach a new patch using VertexGroup 
later.






