[jira] [Updated] (TEZ-3235) Modify Example TestOrderedWordCount job to test the IPC limit for large dag plans

2016-07-12 Thread Sushmitha Sreenivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushmitha Sreenivasan updated TEZ-3235:
---
Attachment: Tez-3235.3.patch

> Modify Example TestOrderedWordCount job to test the IPC limit for large dag 
> plans
> -
>
> Key: TEZ-3235
> URL: https://issues.apache.org/jira/browse/TEZ-3235
> Project: Apache Tez
>  Issue Type: Task
>Affects Versions: 0.8.3
>Reporter: Sushmitha Sreenivasan
>Assignee: Sushmitha Sreenivasan
> Attachments: TEZ-3235.1.patch, Tez-3235.2.patch, Tez-3235.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3303) Have ShuffleVertexManager consume more precise partition stats

2016-07-12 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374379#comment-15374379
 ] 

Tsuyoshi Ozawa commented on TEZ-3303:
-

Thanks for reviewing and committing :-)

> Have ShuffleVertexManager consume more precise partition stats
> --
>
> Key: TEZ-3303
> URL: https://issues.apache.org/jira/browse/TEZ-3303
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Tsuyoshi Ozawa
> Fix For: 0.9.0
>
> Attachments: TEZ-3303.001.patch, TEZ-3303.002.patch, 
> TEZ-3303.002.patch, TEZ-3303.003.02.patch, TEZ-3303.003.patch
>
>
> TEZ-3216 adds the support for more precise partition stats. 
> ShuffleVertexManager should be updated to consume the more precise partition 
> stats.





[jira] [Commented] (TEZ-3343) sqoop import can't success

2016-07-12 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374251#comment-15374251
 ] 

Jeff Zhang commented on TEZ-3343:
-

Could you attach the YARN application log? Questions like this are best asked 
on the Tez user mailing list before confirming that this is a bug.

> sqoop import can't success
> --
>
> Key: TEZ-3343
> URL: https://issues.apache.org/jira/browse/TEZ-3343
> Project: Apache Tez
>  Issue Type: Bug
> Environment: hadoop-2.6.0,sqoop-1.4.6,tez-0.8.4
>Reporter: lishaoguang
>
> I deployed the Hadoop environment and tried importing data from MySQL to 
> HDFS without Tez. After deploying Tez, I ran 'orderedwordcount' and it 
> succeeded, but when I use Sqoop to import data from MySQL to HDFS, the job 
> stops at 0% map and eventually fails. What can I do? Can anyone help me?





[jira] [Created] (TEZ-3343) sqoop import can't success

2016-07-12 Thread lishaoguang (JIRA)
lishaoguang created TEZ-3343:


 Summary: sqoop import can't success
 Key: TEZ-3343
 URL: https://issues.apache.org/jira/browse/TEZ-3343
 Project: Apache Tez
  Issue Type: Bug
 Environment: hadoop-2.6.0,sqoop-1.4.6,tez-0.8.4
Reporter: lishaoguang


I deployed the Hadoop environment and tried importing data from MySQL to 
HDFS without Tez. After deploying Tez, I ran 'orderedwordcount' and it 
succeeded, but when I use Sqoop to import data from MySQL to HDFS, the job 
stops at 0% map and eventually fails. What can I do? Can anyone help me?





Success: TEZ-3303 PreCommit Build #1848

2016-07-12 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3303
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1848/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 4123 lines...]
[INFO] Tez ... SUCCESS [  0.035 s]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 55:31 min
[INFO] Finished at: 2016-07-13T01:17:13+00:00
[INFO] Final Memory: 86M/1053M
[INFO] 




{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12817556/TEZ-3303.003.02.patch
  against master revision 8131896.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1848//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1848//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
f2b72f48dd84ca2ad2ede90b8b9dc9d19e49bf70 logged out


==
==
Finished build.
==
==


Archiving artifacts
[description-setter] Description set: TEZ-3303
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-3303) Have ShuffleVertexManager consume more precise partition stats

2016-07-12 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374133#comment-15374133
 ] 

TezQA commented on TEZ-3303:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12817556/TEZ-3303.003.02.patch
  against master revision 8131896.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1848//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1848//console

This message is automatically generated.

> Have ShuffleVertexManager consume more precise partition stats
> --
>
> Key: TEZ-3303
> URL: https://issues.apache.org/jira/browse/TEZ-3303
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Tsuyoshi Ozawa
> Attachments: TEZ-3303.001.patch, TEZ-3303.002.patch, 
> TEZ-3303.002.patch, TEZ-3303.003.02.patch, TEZ-3303.003.patch
>
>
> TEZ-3216 adds the support for more precise partition stats. 
> ShuffleVertexManager should be updated to consume the more precise partition 
> stats.





[jira] [Commented] (TEZ-3334) Tez Custom Shuffle Handler

2016-07-12 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374071#comment-15374071
 ] 

Bikas Saha commented on TEZ-3334:
-

Also, errors should be reported properly in the response so that a single error 
does not corrupt the entire data stream. See YARN-1773.

> Tez Custom Shuffle Handler
> --
>
> Key: TEZ-3334
> URL: https://issues.apache.org/jira/browse/TEZ-3334
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>
> For conditions where auto-parallelism is reduced (e.g. TEZ-3222), a custom 
> shuffle handler could help reduce the number of fetches and could more 
> efficiently fetch data. In particular if a reducer is fetching 100 pieces 
> serially from the same mapper it could do this in one fetch call. 
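
The batching idea in the description can be sketched as follows. This is only an illustration under stated assumptions, not the shuffle handler's actual design: `FetchPlanner`, `Piece`, `mapperId`, and `pieceIndex` are hypothetical names, and a real handler would also have to deal with the wire protocol and error reporting.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical planner: groups the pieces a reducer needs by the mapper that holds them. */
final class FetchPlanner {

    /** One piece of map output, identified by its mapper and an index (illustrative only). */
    static final class Piece {
        final String mapperId;
        final int pieceIndex;

        Piece(String mapperId, int pieceIndex) {
            this.mapperId = mapperId;
            this.pieceIndex = pieceIndex;
        }
    }

    /**
     * Returns one entry per mapper, listing every piece to request from it,
     * so that each mapper can be served with a single fetch call.
     */
    static Map<String, List<Integer>> groupByMapper(List<Piece> pieces) {
        Map<String, List<Integer>> plan = new LinkedHashMap<>();
        for (Piece piece : pieces) {
            plan.computeIfAbsent(piece.mapperId, k -> new ArrayList<>())
                .add(piece.pieceIndex);
        }
        return plan;
    }
}
```

With 100 pieces that all come from the same mapper, the plan holds a single entry, which corresponds to one fetch call instead of 100 serial ones.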





[jira] [Commented] (TEZ-3334) Tez Custom Shuffle Handler

2016-07-12 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374065#comment-15374065
 ] 

Bikas Saha commented on TEZ-3334:
-

See YARN-4577 for classpath isolation of aux services. 

Perhaps the first step could be a POC: take the existing MR shuffle and change 
its packaging to org.apache.tez. Then add it as tez_shuffle in YARN alongside 
mapreduce_shuffle, and verify that Tez jobs use the Tez shuffle and MR jobs use 
the MR shuffle (with both shuffle services effectively running the same code). 

After that we can create follow-up JIRAs for new features and improvements to 
the Tez shuffle.

Sounds like a plan?

> Tez Custom Shuffle Handler
> --
>
> Key: TEZ-3334
> URL: https://issues.apache.org/jira/browse/TEZ-3334
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>
> For conditions where auto-parallelism is reduced (e.g. TEZ-3222), a custom 
> shuffle handler could help reduce the number of fetches and could more 
> efficiently fetch data. In particular if a reducer is fetching 100 pieces 
> serially from the same mapper it could do this in one fetch call. 





Success: TEZ-3337 PreCommit Build #1847

2016-07-12 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3337
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1847/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 4121 lines...]
[INFO] Tez ... SUCCESS [  0.030 s]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 54:01 min
[INFO] Finished at: 2016-07-13T00:06:09+00:00
[INFO] Final Memory: 85M/1211M
[INFO] 




{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12817540/TEZ-3337.1.patch
  against master revision 8131896.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1847//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1847//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
f05a669123d2097020adaf99dd531d88b36de504 logged out


==
==
Finished build.
==
==


Archiving artifacts
[description-setter] Description set: TEZ-3337
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-3337) Not log empty fields of TaskAttemptFinishedEvent to avoid confusion

2016-07-12 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374021#comment-15374021
 ] 

TezQA commented on TEZ-3337:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12817540/TEZ-3337.1.patch
  against master revision 8131896.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1847//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1847//console

This message is automatically generated.

> Not log empty fields of TaskAttemptFinishedEvent to avoid confusion
> ---
>
> Key: TEZ-3337
> URL: https://issues.apache.org/jira/browse/TEZ-3337
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: TEZ-3337.1.patch
>
>
> For a successful task attempt, we don't record the containerId, which causes 
> "containerId=," to appear in the INFO logs. We should avoid logging this 
> field if it's empty.
> {code}
> 2016-07-07 22:49:44,935 [INFO] [Dispatcher thread {Central}] 
> |history.HistoryEventHandler|: 
> [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: 
> vertexName=Map 1, taskAttemptId=attempt_1467855578616_0044_1_07_00_0, 
> creationTime=1467956979891, allocationTime=1467956980426, 
> startTime=1467956982433, finishTime=1467956984933, timeTaken=2500, 
> status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, 
> nodeHttpAddress=
> 2016-07-07 22:49:44,937 [INFO] [Dispatcher thread {Central}] 
> |history.HistoryEventHandler|: 
> [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: 
> vertexName=Map 11, taskAttemptId=attempt_1467855578616_0044_1_02_00_0, 
> creationTime=1467956979894, allocationTime=1467956980427, 
> startTime=1467956982437, finishTime=1467956984936, timeTaken=2499, 
> status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, 
> nodeHttpAddress=
> {code}
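
One way to avoid the empty "containerId=," style entries shown above is to skip null or empty values while building the log line. The sketch below is an assumption-laden illustration, not the contents of TEZ-3337.1.patch: `HistoryLogFormatter` and `format` are hypothetical names, and the real event class may build its string differently.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that renders key=value pairs and omits empty ones. */
final class HistoryLogFormatter {

    /**
     * Joins the fields as "key=value" separated by ", ", in iteration order.
     * Fields whose value is null or the empty string are left out entirely.
     */
    static String format(Map<String, String> fields) {
        StringJoiner joiner = new StringJoiner(", ");
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            String value = entry.getValue();
            if (value == null || value.isEmpty()) {
                continue; // nothing useful to log for this field
            }
            joiner.add(entry.getKey() + "=" + value);
        }
        return joiner.toString();
    }
}
```

Passing an insertion-ordered map (for example a `LinkedHashMap`) keeps the fields in the same order as the current log output.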





[jira] [Commented] (TEZ-3331) Add operation specific HDFS counters for Tez UI

2016-07-12 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373981#comment-15373981
 ] 

Hitesh Shah commented on TEZ-3331:
--

Also, TEZ-3168 has a WIP patch that shows how the shims could be enhanced to 
make use of an API that is not in the default version of Hadoop that we compile 
against. 

> Add operation specific HDFS counters for Tez UI
> ---
>
> Key: TEZ-3331
> URL: https://issues.apache.org/jira/browse/TEZ-3331
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jitendra Nath Pandey
> Attachments: TEZ-3331.wip.2.patch, TEZ-3331.wip.3.patch, 
> TEZ-3331.wip.4.patch, TEZ-3331.wip.patch
>
>
> Hadoop has added several operation specific counters in the FileSystem 
> statistics (HADOOP-13065). These counters are useful to track file system 
> operations more granularly. It would be great to track these counters for Tez 
> and expose them via UI as well.





[jira] [Updated] (TEZ-3303) Have ShuffleVertexManager consume more precise partition stats

2016-07-12 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-3303:

Attachment: TEZ-3303.003.02.patch

Uploading the modified patch for precommit.

> Have ShuffleVertexManager consume more precise partition stats
> --
>
> Key: TEZ-3303
> URL: https://issues.apache.org/jira/browse/TEZ-3303
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Tsuyoshi Ozawa
> Attachments: TEZ-3303.001.patch, TEZ-3303.002.patch, 
> TEZ-3303.002.patch, TEZ-3303.003.02.patch, TEZ-3303.003.patch
>
>
> TEZ-3216 adds the support for more precise partition stats. 
> ShuffleVertexManager should be updated to consume the more precise partition 
> stats.





[jira] [Commented] (TEZ-3303) Have ShuffleVertexManager consume more precise partition stats

2016-07-12 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373969#comment-15373969
 ] 

Siddharth Seth commented on TEZ-3303:
-

Very minor: can we make this an "else if (proto.hasDetailedPartitionStats)"? 
One of the two stats is populated; however, this should not double count if 
both were populated.

Thanks [~ozawa] for the patch and [~mingma] for the review. Will commit after 
this change.
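
A minimal sketch of the requested branching, for illustration only. `StatsPayload` is a hypothetical stand-in for the protobuf message, and the field names, accessor names, and branch order are assumptions rather than the code in the patch.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the protobuf message; not the actual Tez type. */
final class StatsPayload {
    final long[] partitionStats;          // null when the coarse stats are absent
    final long[] detailedPartitionStats;  // null when the detailed stats are absent

    StatsPayload(long[] partitionStats, long[] detailedPartitionStats) {
        this.partitionStats = partitionStats;
        this.detailedPartitionStats = detailedPartitionStats;
    }

    boolean hasPartitionStats() { return partitionStats != null; }
    boolean hasDetailedPartitionStats() { return detailedPartitionStats != null; }
}

final class PartitionStatsAccumulator {

    /** Returns a copy of totals with exactly one of the two stats added in. */
    static long[] accumulate(long[] totals, StatsPayload proto) {
        long[] updated = Arrays.copyOf(totals, totals.length);
        if (proto.hasPartitionStats()) {
            addInto(updated, proto.partitionStats);
        } else if (proto.hasDetailedPartitionStats()) {
            // Reached only when the first stats are absent, so a payload that
            // carries both kinds is never counted twice.
            addInto(updated, proto.detailedPartitionStats);
        }
        return updated;
    }

    private static void addInto(long[] totals, long[] sizes) {
        int n = Math.min(totals.length, sizes.length);
        for (int i = 0; i < n; i++) {
            totals[i] += sizes[i];
        }
    }
}
```

With two independent `if` blocks, a payload carrying both stats would be added twice; the `else if` makes the branches mutually exclusive.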

> Have ShuffleVertexManager consume more precise partition stats
> --
>
> Key: TEZ-3303
> URL: https://issues.apache.org/jira/browse/TEZ-3303
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Tsuyoshi Ozawa
> Attachments: TEZ-3303.001.patch, TEZ-3303.002.patch, 
> TEZ-3303.002.patch, TEZ-3303.003.patch
>
>
> TEZ-3216 adds the support for more precise partition stats. 
> ShuffleVertexManager should be updated to consume the more precise partition 
> stats.





[jira] [Commented] (TEZ-3331) Add operation specific HDFS counters for Tez UI

2016-07-12 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373947#comment-15373947
 ] 

Ming Ma commented on TEZ-3331:
--

[~hitesh] thanks for the info about hadoop shim.

bq. Mind adding more details on which features in particular?

I have opened TEZ-3340, TEZ-3341, TEZ-3342 and followed up on [~sseth]'s email 
thread about the release. Do you know if the Hadoop shim can support the 
addition of these features?

> Add operation specific HDFS counters for Tez UI
> ---
>
> Key: TEZ-3331
> URL: https://issues.apache.org/jira/browse/TEZ-3331
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jitendra Nath Pandey
> Attachments: TEZ-3331.wip.2.patch, TEZ-3331.wip.3.patch, 
> TEZ-3331.wip.4.patch, TEZ-3331.wip.patch
>
>
> Hadoop has added several operation specific counters in the FileSystem 
> statistics (HADOOP-13065). These counters are useful to track file system 
> operations more granularly. It would be great to track these counters for Tez 
> and expose them via UI as well.





[jira] [Updated] (TEZ-3340) Add support for YARN Shared Cache

2016-07-12 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated TEZ-3340:
-
Description: 
YARN provides shared cache functionality in YARN-1492. According to [~ctrezzo] 
most of the YARN functionality is in hadoop 2.8 and frameworks can start to use 
it. MR adds the support via MAPREDUCE-5951.

Can anyone confirm if Tez supports the upload of application DAG jar and 
dependent lib jars from client machine to HDFS as part of Tez app submission? 
From my test, that doesn't seem to happen. Instead Tez expects applications to 
upload the jars to HDFS beforehand and then set the tez.aux.uris to the HDFS 
locations.

  was:
YARN provides shared cache functionality in YARN-1492. According to [~ctrezzo] 
most of the YARN functionality is in hadoop 2.8 and frameworks can start to use 
it. MR adds the support via MAPREDUCE-5951.

Can anyone confirm if Tez supports the upload of application DAG jar and 
dependent lib jars from client machine to HDFS as part of Tez app submission? 
From my test, that doesn't seem to happen. Tez expects applications to upload 
the jars to HDFS beforehand and then set the tez.aux.uris to the HDFS locations.


> Add support for YARN Shared Cache
> -
>
> Key: TEZ-3340
> URL: https://issues.apache.org/jira/browse/TEZ-3340
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Ming Ma
>
> YARN provides shared cache functionality in YARN-1492. According to 
> [~ctrezzo] most of the YARN functionality is in hadoop 2.8 and frameworks can 
> start to use it. MR adds the support via MAPREDUCE-5951.
> Can anyone confirm if Tez supports the upload of application DAG jar and 
> dependent lib jars from client machine to HDFS as part of Tez app submission? 
> From my test, that doesn't seem to happen. Instead Tez expects applications 
> to upload the jars to HDFS beforehand and then set the tez.aux.uris to the 
> HDFS locations.





[jira] [Created] (TEZ-3342) Have Tez AM generate thread dump on task attempts timeout before killing them

2016-07-12 Thread Ming Ma (JIRA)
Ming Ma created TEZ-3342:


 Summary: Have Tez AM generate thread dump on task attempts timeout 
before killing them
 Key: TEZ-3342
 URL: https://issues.apache.org/jira/browse/TEZ-3342
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Ming Ma


This is to provide something similar to MAPREDUCE-5044.





[jira] [Moved] (TEZ-3340) Add support for YARN Shared Cache

2016-07-12 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma moved YARN-5365 to TEZ-3340:


Key: TEZ-3340  (was: YARN-5365)
Project: Apache Tez  (was: Hadoop YARN)

> Add support for YARN Shared Cache
> -
>
> Key: TEZ-3340
> URL: https://issues.apache.org/jira/browse/TEZ-3340
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Ming Ma
>
> YARN provides shared cache functionality in YARN-1492. According to 
> [~ctrezzo] most of the YARN functionality is in hadoop 2.8 and frameworks can 
> start to use it. MR adds the support via MAPREDUCE-5951.
> Can anyone confirm if Tez supports the upload of application DAG jar and 
> dependent lib jars from client machine to HDFS as part of Tez app submission? 
> From my test, that doesn't seem to happen. Tez expects applications to upload 
> the jars to HDFS beforehand and then set the tez.aux.uris to the HDFS 
> locations.





[jira] [Comment Edited] (TEZ-3337) Not log empty fields of TaskAttemptFinishedEvent to avoid confusion

2016-07-12 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373841#comment-15373841
 ] 

Zhiyuan Yang edited comment on TEZ-3337 at 7/12/16 10:43 PM:
-

[~hitesh], [~gopalv], [~jeagles], Please help review.


was (Author: aplusplus):
[~hitesh], [~gopalv], Please help review.

> Not log empty fields of TaskAttemptFinishedEvent to avoid confusion
> ---
>
> Key: TEZ-3337
> URL: https://issues.apache.org/jira/browse/TEZ-3337
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: TEZ-3337.1.patch
>
>
> For a successful task attempt, we don't record the containerId, which causes 
> "containerId=," to appear in the INFO logs. We should avoid logging this 
> field if it's empty.
> {code}
> 2016-07-07 22:49:44,935 [INFO] [Dispatcher thread {Central}] 
> |history.HistoryEventHandler|: 
> [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: 
> vertexName=Map 1, taskAttemptId=attempt_1467855578616_0044_1_07_00_0, 
> creationTime=1467956979891, allocationTime=1467956980426, 
> startTime=1467956982433, finishTime=1467956984933, timeTaken=2500, 
> status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, 
> nodeHttpAddress=
> 2016-07-07 22:49:44,937 [INFO] [Dispatcher thread {Central}] 
> |history.HistoryEventHandler|: 
> [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: 
> vertexName=Map 11, taskAttemptId=attempt_1467855578616_0044_1_02_00_0, 
> creationTime=1467956979894, allocationTime=1467956980427, 
> startTime=1467956982437, finishTime=1467956984936, timeTaken=2499, 
> status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, 
> nodeHttpAddress=
> {code}





[jira] [Updated] (TEZ-3337) Not log empty fields of TaskAttemptFinishedEvent to avoid confusion

2016-07-12 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated TEZ-3337:
--
Attachment: TEZ-3337.1.patch

> Not log empty fields of TaskAttemptFinishedEvent to avoid confusion
> ---
>
> Key: TEZ-3337
> URL: https://issues.apache.org/jira/browse/TEZ-3337
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: TEZ-3337.1.patch
>
>
> For a successful task attempt, we don't record the containerId, which causes 
> "containerId=," to appear in the INFO logs. We should avoid logging this 
> field if it's empty.
> {code}
> 2016-07-07 22:49:44,935 [INFO] [Dispatcher thread {Central}] 
> |history.HistoryEventHandler|: 
> [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: 
> vertexName=Map 1, taskAttemptId=attempt_1467855578616_0044_1_07_00_0, 
> creationTime=1467956979891, allocationTime=1467956980426, 
> startTime=1467956982433, finishTime=1467956984933, timeTaken=2500, 
> status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, 
> nodeHttpAddress=
> 2016-07-07 22:49:44,937 [INFO] [Dispatcher thread {Central}] 
> |history.HistoryEventHandler|: 
> [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: 
> vertexName=Map 11, taskAttemptId=attempt_1467855578616_0044_1_02_00_0, 
> creationTime=1467956979894, allocationTime=1467956980427, 
> startTime=1467956982437, finishTime=1467956984936, timeTaken=2499, 
> status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, 
> nodeHttpAddress=
> {code}





[jira] [Created] (TEZ-3339) Add Tez Counters for bytes-read-by-network-distance FileSystem metrics

2016-07-12 Thread Ming Ma (JIRA)
Ming Ma created TEZ-3339:


 Summary: Add Tez Counters for bytes-read-by-network-distance 
FileSystem metrics
 Key: TEZ-3339
 URL: https://issues.apache.org/jira/browse/TEZ-3339
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Ming Ma


This is the Tez part of the change: consume the 
bytes-read-by-network-distance metrics generated by HDFS-9579, similar to what 
we want to have in MAPREDUCE-6660.





[jira] [Commented] (TEZ-3337) Not log empty fields of TaskAttemptFinishedEvent to avoid confusion

2016-07-12 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373787#comment-15373787
 ] 

Zhiyuan Yang commented on TEZ-3337:
---

This issue doesn't apply to anything ATS-related. When we convert 
TaskAttemptFinishedEvent to either TimelineEntity or JSONObject, we skip the 
null fields.

> Not log empty fields of TaskAttemptFinishedEvent to avoid confusion
> ---
>
> Key: TEZ-3337
> URL: https://issues.apache.org/jira/browse/TEZ-3337
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>
> For a successful task attempt, we don't record the containerId, which causes 
> "containerId=," to appear in the INFO logs. We should avoid logging this 
> field if it's empty.
> {code}
> 2016-07-07 22:49:44,935 [INFO] [Dispatcher thread {Central}] 
> |history.HistoryEventHandler|: 
> [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: 
> vertexName=Map 1, taskAttemptId=attempt_1467855578616_0044_1_07_00_0, 
> creationTime=1467956979891, allocationTime=1467956980426, 
> startTime=1467956982433, finishTime=1467956984933, timeTaken=2500, 
> status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, 
> nodeHttpAddress=
> 2016-07-07 22:49:44,937 [INFO] [Dispatcher thread {Central}] 
> |history.HistoryEventHandler|: 
> [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: 
> vertexName=Map 11, taskAttemptId=attempt_1467855578616_0044_1_02_00_0, 
> creationTime=1467956979894, allocationTime=1467956980427, 
> startTime=1467956982437, finishTime=1467956984936, timeTaken=2499, 
> status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, 
> nodeHttpAddress=
> {code}





[jira] [Created] (TEZ-3338) Support classloader isolation

2016-07-12 Thread Ming Ma (JIRA)
Ming Ma created TEZ-3338:


 Summary: Support classloader isolation
 Key: TEZ-3338
 URL: https://issues.apache.org/jira/browse/TEZ-3338
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Ming Ma


HADOOP-10893 and MAPREDUCE-1700 provide classloader isolation on both the client 
side and the container side for MR. We should add the same support for Tez. 
Given that we use the hadoop command to launch Tez, the client side appears to 
be taken care of already, so only the container-side support is needed.





[jira] [Comment Edited] (TEZ-3337) Not log empty fields of TaskAttemptFinishedEvent to avoid confusion

2016-07-12 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373781#comment-15373781
 ] 

Hitesh Shah edited comment on TEZ-3337 at 7/12/16 9:58 PM:
---

If this is the case, -the conversion to ATS should also not set the value if it 
is empty or null- can you confirm that we don't reset the value to empty for 
ATS?


was (Author: hitesh):
If this is the case, the conversion to ATS should also not set the value if it 
is empty or null. 

> Not log empty fields of TaskAttemptFinishedEvent to avoid confusion
> ---
>
> Key: TEZ-3337
> URL: https://issues.apache.org/jira/browse/TEZ-3337
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>
> For a successful task attempt, we don't record the containerId, which causes 
> "containerId=," to appear in the INFO logs. We should avoid logging this 
> field if it's empty.
> {code}
> 2016-07-07 22:49:44,935 [INFO] [Dispatcher thread {Central}] 
> |history.HistoryEventHandler|: 
> [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: 
> vertexName=Map 1, taskAttemptId=attempt_1467855578616_0044_1_07_00_0, 
> creationTime=1467956979891, allocationTime=1467956980426, 
> startTime=1467956982433, finishTime=1467956984933, timeTaken=2500, 
> status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, 
> nodeHttpAddress=
> 2016-07-07 22:49:44,937 [INFO] [Dispatcher thread {Central}] 
> |history.HistoryEventHandler|: 
> [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: 
> vertexName=Map 11, taskAttemptId=attempt_1467855578616_0044_1_02_00_0, 
> creationTime=1467956979894, allocationTime=1467956980427, 
> startTime=1467956982437, finishTime=1467956984936, timeTaken=2499, 
> status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, 
> nodeHttpAddress=
> {code}





[jira] [Commented] (TEZ-3337) Not log empty fields of TaskAttemptFinishedEvent to avoid confusion

2016-07-12 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373781#comment-15373781
 ] 

Hitesh Shah commented on TEZ-3337:
--

If this is the case, the conversion to ATS should also not set the value if it 
is empty or null. 

> Not log empty fields of TaskAttemptFinishedEvent to avoid confusion
> ---
>
> Key: TEZ-3337
> URL: https://issues.apache.org/jira/browse/TEZ-3337
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>
> For a successful task attempt, we don't record the containerId, which causes 
> "containerId=," to appear in the INFO logs. We should avoid logging this 
> field if it's empty.
> {code}
> 2016-07-07 22:49:44,935 [INFO] [Dispatcher thread {Central}] 
> |history.HistoryEventHandler|: 
> [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: 
> vertexName=Map 1, taskAttemptId=attempt_1467855578616_0044_1_07_00_0, 
> creationTime=1467956979891, allocationTime=1467956980426, 
> startTime=1467956982433, finishTime=1467956984933, timeTaken=2500, 
> status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, 
> nodeHttpAddress=
> 2016-07-07 22:49:44,937 [INFO] [Dispatcher thread {Central}] 
> |history.HistoryEventHandler|: 
> [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: 
> vertexName=Map 11, taskAttemptId=attempt_1467855578616_0044_1_02_00_0, 
> creationTime=1467956979894, allocationTime=1467956980427, 
> startTime=1467956982437, finishTime=1467956984936, timeTaken=2499, 
> status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, 
> nodeHttpAddress=
> {code}





[jira] [Updated] (TEZ-3331) Add operation specific HDFS counters for Tez UI

2016-07-12 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-3331:
-
Attachment: TEZ-3331.wip.4.patch

Some tests added. 

> Add operation specific HDFS counters for Tez UI
> ---
>
> Key: TEZ-3331
> URL: https://issues.apache.org/jira/browse/TEZ-3331
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jitendra Nath Pandey
> Attachments: TEZ-3331.wip.2.patch, TEZ-3331.wip.3.patch, 
> TEZ-3331.wip.4.patch, TEZ-3331.wip.patch
>
>
> Hadoop has added several operation specific counters in the FileSystem 
> statistics (HADOOP-13065). These counters are useful to track file system 
> operations more granularly. It would be great to track these counters for Tez 
> and expose them via UI as well.





[jira] [Created] (TEZ-3337) Not log empty fields of TaskAttemptFinishedEvent to avoid confusion

2016-07-12 Thread Zhiyuan Yang (JIRA)
Zhiyuan Yang created TEZ-3337:
-

 Summary: Not log empty fields of TaskAttemptFinishedEvent to avoid 
confusion
 Key: TEZ-3337
 URL: https://issues.apache.org/jira/browse/TEZ-3337
 Project: Apache Tez
  Issue Type: Bug
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang


For a successful task attempt, we don't record the containerId, which causes 
"containerId=," in the INFO logs. We should avoid logging this field if it's 
empty.

{code}
2016-07-07 22:49:44,935 [INFO] [Dispatcher thread {Central}] 
|history.HistoryEventHandler|: 
[HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: 
vertexName=Map 1, taskAttemptId=attempt_1467855578616_0044_1_07_00_0, 
creationTime=1467956979891, allocationTime=1467956980426, 
startTime=1467956982433, finishTime=1467956984933, timeTaken=2500, 
status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, 
nodeHttpAddress=
2016-07-07 22:49:44,937 [INFO] [Dispatcher thread {Central}] 
|history.HistoryEventHandler|: 
[HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: 
vertexName=Map 11, taskAttemptId=attempt_1467855578616_0044_1_02_00_0, 
creationTime=1467956979894, allocationTime=1467956980427, 
startTime=1467956982437, finishTime=1467956984936, timeTaken=2499, 
status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, 
nodeHttpAddress=
{code}
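
The proposal above can be sketched as follows. This is only an illustration, assuming the log line is assembled from key/value pairs; `HistoryLogFormatter` and `format` are hypothetical names and not part of the Tez code base.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Tez: builds the "key=value, key=value"
// portion of a history log line and omits fields that are null or empty.
final class HistoryLogFormatter {

  static String format(Map<String, String> fields) {
    StringJoiner joiner = new StringJoiner(", ");
    for (Map.Entry<String, String> entry : fields.entrySet()) {
      String value = entry.getValue();
      if (value == null || value.isEmpty()) {
        continue; // skip empty fields such as containerId on this event
      }
      joiner.add(entry.getKey() + "=" + value);
    }
    return joiner.toString();
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps insertion order, so fields print in a stable order.
    Map<String, String> fields = new LinkedHashMap<>();
    fields.put("vertexName", "Map 1");
    fields.put("timeTaken", "2500");
    fields.put("status", "SUCCEEDED");
    fields.put("errorEnum", "");
    fields.put("containerId", null);
    System.out.println(format(fields));
  }
}
```

With the sample fields in `main`, `errorEnum` and `containerId` are left out of the printed line instead of appearing as `errorEnum=, containerId=`.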





[jira] [Commented] (TEZ-3334) Tez Custom Shuffle Handler

2016-07-12 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373702#comment-15373702
 ] 

Ming Ma commented on TEZ-3334:
--

[~bikassaha], for the new YARN aux service isolation, do you mean YARN-1593?

> Tez Custom Shuffle Handler
> --
>
> Key: TEZ-3334
> URL: https://issues.apache.org/jira/browse/TEZ-3334
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>
> For conditions where auto-parallelism is reduced (e.g. TEZ-3222), a custom 
> shuffle handler could help reduce the number of fetches and could more 
> efficiently fetch data. In particular if a reducer is fetching 100 pieces 
> serially from the same mapper it could do this in one fetch call. 
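
To illustrate the "one fetch call" idea in the description, here is a rough sketch that groups the pieces a reducer wants by source mapper, so that each mapper would need only one request. `FetchBatchingSketch`, `Piece`, and `batchByMapper` are hypothetical names for illustration; this is not the Tez shuffle API and says nothing about the wire protocol.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the Tez shuffle API.
final class FetchBatchingSketch {

  // One piece of map output that a reducer wants to fetch.
  static final class Piece {
    final String mapperId;
    final int partition;

    Piece(String mapperId, int partition) {
      this.mapperId = mapperId;
      this.partition = partition;
    }
  }

  // Groups the wanted partitions by mapper; each map entry corresponds to
  // a single batched fetch call instead of one call per piece.
  static Map<String, List<Integer>> batchByMapper(List<Piece> pieces) {
    Map<String, List<Integer>> batches = new LinkedHashMap<>();
    for (Piece piece : pieces) {
      batches.computeIfAbsent(piece.mapperId, k -> new ArrayList<>())
          .add(piece.partition);
    }
    return batches;
  }

  public static void main(String[] args) {
    List<Piece> pieces = new ArrayList<>();
    for (int partition = 0; partition < 100; partition++) {
      pieces.add(new Piece("mapper_0", partition));
    }
    Map<String, List<Integer>> batches = batchByMapper(pieces);
    System.out.println("pieces=" + pieces.size() + ", fetch calls=" + batches.size());
  }
}
```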





[jira] [Commented] (TEZ-3336) Hive map-side join job sometimes fails with ROOT_INPUT_INIT_FAILURE

2016-07-12 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373690#comment-15373690
 ] 

Jason Lowe commented on TEZ-3336:
-

Seems like one fix would be to simply have the MR input initializers ignore 
events rather than explode.  I'm guessing those initializers do not care at all 
about what anything else is doing -- they just want to compute splits based 
purely on the MR input.
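
A minimal sketch of that idea, using stand-in types rather than the real Tez initializer classes: every name below (`InitializerEventSketch`, `SketchInitializer`, `StrictInitializer`, `TolerantInitializer`, `deliver`) is hypothetical, and events are modeled as plain strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins; these are not the Tez InputInitializer classes.
final class InitializerEventSketch {

  interface SketchInitializer {
    void handleEvents(List<String> events);
  }

  // Models the behavior reported in this issue: any event is treated as fatal.
  static final class StrictInitializer implements SketchInitializer {
    @Override
    public void handleEvents(List<String> events) {
      throw new UnsupportedOperationException("Not expecting to handle any events");
    }
  }

  // Models the suggested fix: events are counted and otherwise ignored.
  static final class TolerantInitializer implements SketchInitializer {
    private final AtomicInteger ignored = new AtomicInteger();

    @Override
    public void handleEvents(List<String> events) {
      ignored.addAndGet(events.size());
    }

    int ignoredCount() {
      return ignored.get();
    }
  }

  // Returns true if the initializer survived the delivery of the events.
  static boolean deliver(SketchInitializer initializer, List<String> events) {
    try {
      initializer.handleEvents(events);
      return true;
    } catch (UnsupportedOperationException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    List<String> events = Arrays.asList("task_0 succeeded", "task_1 succeeded");
    System.out.println("strict survives: " + deliver(new StrictInitializer(), events));
    System.out.println("tolerant survives: " + deliver(new TolerantInitializer(), events));
  }
}
```

Counting the ignored events is optional; it is included here only so the dropped events remain observable, for example in a debug log.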

> Hive map-side join job sometimes fails with ROOT_INPUT_INIT_FAILURE
> ---
>
> Key: TEZ-3336
> URL: https://issues.apache.org/jira/browse/TEZ-3336
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.1
>Reporter: Jason Lowe
>
> When Hive does a map-side join it can generate a DAG where a vertex has two 
> inputs, one from an upstream task and another using MRInputAMSplitGenerator.  
> If it takes a while for MRInputAMSplitGenerator to compute the splits and one 
> of the tasks for the other upstream vertex completes then the job can fail 
> with an error since MRInputAMSplitGenerator does not expect to receive any 
> events.





[jira] [Commented] (TEZ-3336) Hive map-side join job sometimes fails with ROOT_INPUT_INIT_FAILURE

2016-07-12 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373680#comment-15373680
 ] 

Jason Lowe commented on TEZ-3336:
-

One example of the failure:
{noformat}
Vertex failed, vertexName=Map 1, vertexId=vertex_1467094199147_3081640_1_01, 
diagnostics=[Vertex vertex_1467094199147_3081640_1_01 [Map 1] killed/failed due 
to:ROOT_INPUT_INIT_FAILURE, Vertex Input: input initializer failed, 
vertex=vertex_1467094199147_3081640_1_01 [Map 1], 
java.lang.UnsupportedOperationException: Not expecting to handle any events
    at org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.handleInputInitializerEvent(MRInputAMSplitGenerator.java:170)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InitializerWrapper.sendEvents(RootInputInitializerManager.java:501)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InitializerWrapper.onTaskSucceeded(RootInputInitializerManager.java:451)
    at org.apache.tez.dag.app.dag.StateChangeNotifier.taskSucceeded(StateChangeNotifier.java:290)
    at org.apache.tez.dag.app.dag.impl.TaskImpl$TaskStateChangedCallback.onStateChanged(TaskImpl.java:1524)
    at org.apache.tez.dag.app.dag.impl.TaskImpl$TaskStateChangedCallback.onStateChanged(TaskImpl.java:1508)
    at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:61)
    at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:918)
    at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:112)
    at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:2068)
    at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:2054)
    at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
    at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
    at java.lang.Thread.run(Thread.java:745)
]
{noformat}

RootInputInitializerManager delegates the input initializers to a thread pool 
and listens for vertex/task events while those initializers are running.  Once 
they complete it unregisters from those events.  If the initializer completes 
before an upstream task succeeds we're OK, but if a task succeeds first it ends 
up sending events to the initializer which doesn't expect any events.

Looks like MRInputSplitDistributor could have the same issue, and a fix for 
TEZ-3274 would aggravate the issue further.


> Hive map-side join job sometimes fails with ROOT_INPUT_INIT_FAILURE
> ---
>
> Key: TEZ-3336
> URL: https://issues.apache.org/jira/browse/TEZ-3336
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.1
>Reporter: Jason Lowe
>
> When Hive does a map-side join it can generate a DAG where a vertex has two 
> inputs, one from an upstream task and another using MRInputAMSplitGenerator.  
> If it takes a while for MRInputAMSplitGenerator to compute the splits and one 
> of the tasks for the other upstream vertex completes then the job can fail 
> with an error since MRInputAMSplitGenerator does not expect to receive any 
> events.





[jira] [Commented] (TEZ-3336) Hive map-side join job sometimes fails with ROOT_INPUT_INIT_FAILURE

2016-07-12 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373670#comment-15373670
 ] 

Hitesh Shah commented on TEZ-3336:
--

\cc [~hagleitn] 

> Hive map-side join job sometimes fails with ROOT_INPUT_INIT_FAILURE
> ---
>
> Key: TEZ-3336
> URL: https://issues.apache.org/jira/browse/TEZ-3336
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.1
>Reporter: Jason Lowe
>
> When Hive does a map-side join it can generate a DAG where a vertex has two 
> inputs, one from an upstream task and another using MRInputAMSplitGenerator.  
> If it takes a while for MRInputAMSplitGenerator to compute the splits and one 
> of the tasks for the other upstream vertex completes then the job can fail 
> with an error since MRInputAMSplitGenerator does not expect to receive any 
> events.





[jira] [Created] (TEZ-3336) Hive map-side join job sometimes fails with ROOT_INPUT_INIT_FAILURE

2016-07-12 Thread Jason Lowe (JIRA)
Jason Lowe created TEZ-3336:
---

 Summary: Hive map-side join job sometimes fails with 
ROOT_INPUT_INIT_FAILURE
 Key: TEZ-3336
 URL: https://issues.apache.org/jira/browse/TEZ-3336
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.1
Reporter: Jason Lowe


When Hive does a map-side join it can generate a DAG where a vertex has two 
inputs, one from an upstream task and another using MRInputAMSplitGenerator.  
If it takes a while for MRInputAMSplitGenerator to compute the splits and one 
of the tasks for the other upstream vertex completes then the job can fail with 
an error since MRInputAMSplitGenerator does not expect to receive any events.





[jira] [Commented] (TEZ-3334) Tez Custom Shuffle Handler

2016-07-12 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373610#comment-15373610
 ] 

Hitesh Shah commented on TEZ-3334:
--

Other feature asks:
  - control channel to query the shuffle service about various bits of info
  - potential stats on cache hits, failures, aborted fetches, etc.
  - support for deleting data: in the case of Tez, intermediate data across a 
    long-running session will need cleaning up.
  - can we do something better from a disk usage/quota perspective? What 
    happens if one app takes over too much disk space? I guess that falls under 
    YARN local dirs and not really shuffle, but is it worth thinking about?

> Tez Custom Shuffle Handler
> --
>
> Key: TEZ-3334
> URL: https://issues.apache.org/jira/browse/TEZ-3334
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>
> For conditions where auto-parallelism is reduced (e.g. TEZ-3222), a custom 
> shuffle handler could help reduce the number of fetches and could more 
> efficiently fetch data. In particular if a reducer is fetching 100 pieces 
> serially from the same mapper it could do this in one fetch call. 





[jira] [Commented] (TEZ-3335) DAG client thinks app is still running when app status is null

2016-07-12 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373411#comment-15373411
 ] 

Jason Lowe commented on TEZ-3335:
-

I thought about fixing this on the YARN side.  The YarnClient currently 
auto-redirects to the AHS when the RM doesn't know about an app.  It could 
detect that the AHS report doesn't contain a status, so therefore the app is 
essentially lost at that point.  The RM doesn't know about it, and the AHS 
never got a completion event for it.  However I didn't want the AHS client to 
throw an exception for that case since the app report does contain _some_ 
useful information about the lost app, such as user, queue, start time, app 
name, etc.  Throwing an exception means the user gets no details about the app, 
so returning what we do know seemed more prudent.

The problem with the AHS or client trying to fix this on the YARN side is that 
we don't know what the final status of the application was.  It could be any of 
FAILED, KILLED, or SUCCEEDED if the completion event tried to get posted to the 
AHS but was dropped for some reason.  Therefore it seems a bit dangerous to 
assume one of those three.  We could always add a new status like LOST or 
UNKNOWN, etc., but of course that requires app frameworks to update themselves 
to detect and react properly to the new state.
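
On the client side, one possible handling is to treat a null status as "lost" and stop polling, without guessing which final state the app reached. The sketch below only illustrates that idea; `AppStatusSketch`, `ReportedState`, `ClientView`, `classify`, and `shouldKeepPolling` are hypothetical names, not YARN or Tez APIs, and the enum values are simplified.

```java
// Hypothetical sketch; these enums are simplified stand-ins, not YARN types.
final class AppStatusSketch {

  // What the application report carries; may be null when the report came
  // from the AHS and no finished event was ever recorded.
  enum ReportedState { RUNNING, SUCCEEDED, FAILED, KILLED }

  // How a polling client could interpret the report.
  enum ClientView { RUNNING, TERMINAL, LOST }

  static ClientView classify(ReportedState state) {
    if (state == null) {
      // Final outcome is unknown, so do not map it to FAILED/KILLED/SUCCEEDED.
      return ClientView.LOST;
    }
    switch (state) {
      case RUNNING:
        return ClientView.RUNNING;
      default:
        return ClientView.TERMINAL;
    }
  }

  // A client keeps polling only while the app is known to be running.
  static boolean shouldKeepPolling(ReportedState state) {
    return classify(state) == ClientView.RUNNING;
  }

  public static void main(String[] args) {
    System.out.println("null status -> " + classify(null)
        + ", keep polling: " + shouldKeepPolling(null));
    System.out.println("RUNNING -> " + classify(ReportedState.RUNNING)
        + ", keep polling: " + shouldKeepPolling(ReportedState.RUNNING));
  }
}
```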


> DAG client thinks app is still running when app status is null
> --
>
> Key: TEZ-3335
> URL: https://issues.apache.org/jira/browse/TEZ-3335
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.1
>Reporter: Jason Lowe
>
> When an RM restarts without recovering apps (i.e.: either work-preserving is 
> not enabled or state store was removed) and the YARN application history is 
> enabled then YarnClient can return an application report with the app status 
> as null.  The RM doesn't know about the application, so the client redirects 
> to the AHS.  The AHS knows the app started at some point but will never 
> receive a finished event, hence the null app status.
> The DAG client fails to detect this scenario and believes the app is still 
> running, so for example Hive clients will continue to hammer for status on an 
> app that doesn't exist.





[jira] [Commented] (TEZ-3335) DAG client thinks app is still running when app status is null

2016-07-12 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373352#comment-15373352
 ] 

Hitesh Shah commented on TEZ-3335:
--

\cc [~gtCarrera9] [~vinodkv]

> DAG client thinks app is still running when app status is null
> --
>
> Key: TEZ-3335
> URL: https://issues.apache.org/jira/browse/TEZ-3335
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.1
>Reporter: Jason Lowe
>
> When an RM restarts without recovering apps (i.e.: either work-preserving is 
> not enabled or state store was removed) and the YARN application history is 
> enabled then YarnClient can return an application report with the app status 
> as null.  The RM doesn't know about the application, so the client redirects 
> to the AHS.  The AHS knows the app started at some point but will never 
> receive a finished event, hence the null app status.
> The DAG client fails to detect this scenario and believes the app is still 
> running, so for example Hive clients will continue to hammer for status on an 
> app that doesn't exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TEZ-3335) DAG client thinks app is still running when app status is null

2016-07-12 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373330#comment-15373330
 ] 

Hitesh Shah edited comment on TEZ-3335 at 7/12/16 5:58 PM:
---

Seems like a bug in YARN that should be fixed too? Where if the RM does not 
know about it, it means app has completed with final state/status unknown and 
therefore either the RM or AHS should inject some state denoting completion?


was (Author: hitesh):
Seems like a bug in YARN that should be fixed too? Where if the RM does not 
know about it, it means app has completed with final state/status unknown?

> DAG client thinks app is still running when app status is null
> --
>
> Key: TEZ-3335
> URL: https://issues.apache.org/jira/browse/TEZ-3335
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.1
>Reporter: Jason Lowe
>
> When an RM restarts without recovering apps (i.e.: either work-preserving is 
> not enabled or state store was removed) and the YARN application history is 
> enabled then YarnClient can return an application report with the app status 
> as null.  The RM doesn't know about the application, so the client redirects 
> to the AHS.  The AHS knows the app started at some point but will never 
> receive a finished event, hence the null app status.
> The DAG client fails to detect this scenario and believes the app is still 
> running, so for example Hive clients will continue to hammer for status on an 
> app that doesn't exist.





[jira] [Commented] (TEZ-3335) DAG client thinks app is still running when app status is null

2016-07-12 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373330#comment-15373330
 ] 

Hitesh Shah commented on TEZ-3335:
--

Seems like a bug in YARN that should be fixed too? Where if the RM does not 
know about it, it means app has completed with final state/status unknown?

> DAG client thinks app is still running when app status is null
> --
>
> Key: TEZ-3335
> URL: https://issues.apache.org/jira/browse/TEZ-3335
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.1
>Reporter: Jason Lowe
>
> When an RM restarts without recovering apps (i.e.: either work-preserving is 
> not enabled or state store was removed) and the YARN application history is 
> enabled then YarnClient can return an application report with the app status 
> as null.  The RM doesn't know about the application, so the client redirects 
> to the AHS.  The AHS knows the app started at some point but will never 
> receive a finished event, hence the null app status.
> The DAG client fails to detect this scenario and believes the app is still 
> running, so for example Hive clients will continue to hammer for status on an 
> app that doesn't exist.





[jira] [Created] (TEZ-3335) DAG client thinks app is still running when app status is null

2016-07-12 Thread Jason Lowe (JIRA)
Jason Lowe created TEZ-3335:
---

 Summary: DAG client thinks app is still running when app status is 
null
 Key: TEZ-3335
 URL: https://issues.apache.org/jira/browse/TEZ-3335
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.1
Reporter: Jason Lowe


When an RM restarts without recovering apps (i.e.: either work-preserving is 
not enabled or state store was removed) and the YARN application history is 
enabled then YarnClient can return an application report with the app status as 
null.  The RM doesn't know about the application, so the client redirects to 
the AHS.  The AHS knows the app started at some point but will never receive a 
finished event, hence the null app status.

The DAG client fails to detect this scenario and believes the app is still 
running, so for example Hive clients will continue to hammer for status on an 
app that doesn't exist.





[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property

2016-07-12 Thread Manuel Godbert (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372798#comment-15372798
 ] 

Manuel Godbert commented on TEZ-3330:
-

I already tried that actually, with no success: the configuration property 
becomes available during shuffle but its value is the constant value of the 
tez-site.xml, not the value dynamically built at job setup.

> Error on avro M/R job with Tez: missing configuration property
> --
>
> Key: TEZ-3330
> URL: https://issues.apache.org/jira/browse/TEZ-3330
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Manuel Godbert
>
> I tried running the simple avro M/R job MapredColorCount, that I found in the 
> examples of avro release 1.7.7.
> It failed with the following trace:
> {code}
> errorMessage=Shuffle Runner Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: Error while doing final merge
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:378)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:337)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.NullPointerException
>     at java.io.StringReader.<init>(StringReader.java:50)
>     at org.apache.avro.Schema$Parser.parse(Schema.java:917)
>     at org.apache.avro.Schema.parse(Schema.java:966)
>     at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
>     at org.apache.avro.mapred.AvroKeyComparator.setConf(AvroKeyComparator.java:39)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
>     at org.apache.tez.runtime.library.common.ConfigUtils.getIntermediateInputKeyComparator(ConfigUtils.java:133)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.finalMerge(MergeManager.java:915)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:540)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:376)
>     ... 6 more
> {code}
> Digging a bit I saw that during shuffle Tez can't access some of the 
> configuration properties of the job. In our example it is the 
> avro.output.schema that is missing.
> With some more complicated code I could get one step further and a similar 
> issue happened when the valuesIterator for the reducer was being built:
> {code}
> java.lang.NullPointerException
>     at java.io.StringReader.<init>(StringReader.java:50)
>     at org.apache.avro.Schema$Parser.parse(Schema.java:917)
>     at org.apache.avro.Schema.parse(Schema.java:966)
>     at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
>     at org.apache.avro.mapred.AvroSerialization.getDeserializer(AvroSerialization.java:53)
>     at org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:90)
>     at org.apache.tez.runtime.library.common.ValuesIterator.<init>(ValuesIterator.java:80)
>     at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.createValuesIterator(OrderedGroupedKVInput.java:287)
> {code}
> I am using HDP2.4, Tez 0.7.0, avro 1.7.4





[jira] [Updated] (TEZ-3334) Tez Custom Shuffle Handler

2016-07-12 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated TEZ-3334:

Issue Type: New Feature  (was: Bug)

> Tez Custom Shuffle Handler
> --
>
> Key: TEZ-3334
> URL: https://issues.apache.org/jira/browse/TEZ-3334
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>
> For conditions where auto-parallelism is reduced (e.g. TEZ-3222), a custom 
> shuffle handler could help reduce the number of fetches and could more 
> efficiently fetch data. In particular if a reducer is fetching 100 pieces 
> serially from the same mapper it could do this in one fetch call. 





[jira] [Commented] (TEZ-3334) Tez Custom Shuffle Handler

2016-07-12 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372216#comment-15372216
 ] 

Rajesh Balamohan commented on TEZ-3334:
---

+1 for custom shuffle handler.  

>> "fetching 100 pieces serially from the same mapper " 
If keep-alive connections are enabled in Tez and in the NM, would this mainly 
be to reduce the number of round trips?

> Tez Custom Shuffle Handler
> --
>
> Key: TEZ-3334
> URL: https://issues.apache.org/jira/browse/TEZ-3334
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>
> For conditions where auto-parallelism is reduced (e.g. TEZ-3222), a custom 
> shuffle handler could help reduce the number of fetches and could more 
> efficiently fetch data. In particular if a reducer is fetching 100 pieces 
> serially from the same mapper it could do this in one fetch call. 


