[jira] [Updated] (TEZ-3419) Tez UI: Applications page shows error, for users with only DAG level ACL permission.

2016-09-29 Thread Sreenath Somarajapuram (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreenath Somarajapuram updated TEZ-3419:

Attachment: TEZ-3419.3.patch

Re-attaching the patch because the pre-commit build was picking up the image 
files instead of the patch file.

> Tez UI: Applications page shows error, for users with only DAG level ACL 
> permission.
> 
>
> Key: TEZ-3419
> URL: https://issues.apache.org/jira/browse/TEZ-3419
> Project: Apache Tez
>  Issue Type: Sub-task
>Affects Versions: 0.7.0
>Reporter: Sreenath Somarajapuram
>Assignee: Sreenath Somarajapuram
> Attachments: TEZ-3419.1.patch, TEZ-3419.2.patch, TEZ-3419.3.patch, 
> TEZ-3419.wip.1.patch, Tez data missing.png, YARN & Tez data missing.png, YARN 
> data missing.png
>
>
> Follow this logic and display a better message:
> On loading the app details page, send a request to 
> /ws/v1/timeline/TEZ_APPLICATION/tez_
> - If it succeeds, display the details page as we do now.
> - If it fails, send a request to 
> /ws/v1/timeline/TEZ_DAG_ID?primaryFilter=applicationId%3A
> -- If it succeeds, then we know that DAGs under the app are available, and we 
> assume that the user doesn't have permission to access app-level data.
> --- If AHS is accessible, display application data from there in the details 
> page.
> --- Otherwise, if AHS is not accessible, display a message in the app details 
> tab, something like "Data is not available. Check if you are authorized to 
> access application data!".
> --- Also display the DAGs tab, so the user can see the DAGs under that app.
> -- If it fails, display the error message as we do now.
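
The fallback flow quoted above can be read as a small decision function. The following is a minimal sketch of that reading, not the code in the attached patch: the enum, the class, and the idea of passing each request as a BooleanSupplier are all assumptions made for illustration. Each supplier stands in for one HTTP request and returns true on success.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical outcomes for the app details page; the names are illustrative. */
enum AppDetailsView {
    FULL_DETAILS,        // TEZ_APPLICATION request succeeded: page as it is today
    AHS_DATA_WITH_DAGS,  // app data read from AHS, DAGs tab shown
    NO_ACCESS_WITH_DAGS, // "Data is not available..." message, DAGs tab shown
    ERROR                // both timeline requests failed: error as it is today
}

final class AppDetailsDecision {
    private AppDetailsDecision() {}

    /**
     * Each supplier stands in for one request and returns true on success.
     * A supplier is only invoked when the flow actually needs that request.
     */
    static AppDetailsView decide(BooleanSupplier tezAppRequest,
                                 BooleanSupplier dagListRequest,
                                 BooleanSupplier ahsRequest) {
        if (tezAppRequest.getAsBoolean()) {
            return AppDetailsView.FULL_DETAILS;
        }
        if (!dagListRequest.getAsBoolean()) {
            return AppDetailsView.ERROR;
        }
        // DAGs are visible, so assume the user lacks app-level permission.
        return ahsRequest.getAsBoolean()
                ? AppDetailsView.AHS_DATA_WITH_DAGS
                : AppDetailsView.NO_ACCESS_WITH_DAGS;
    }
}
```

Passing the requests lazily keeps the order described in the issue: the DAG query is only sent after the application query fails, and AHS is only consulted once the DAG query has succeeded.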



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3451) select count(*) fails with tez over cassandra

2016-09-29 Thread jean carlo rivera ura (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532837#comment-15532837
 ] 

jean carlo rivera ura commented on TEZ-3451:


Hi, can I modify the JIRA to move it to the Hive project, or should I create 
another ticket?

> select count(*) fails with tez over cassandra
> -
>
> Key: TEZ-3451
> URL: https://issues.apache.org/jira/browse/TEZ-3451
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: jean carlo rivera ura
>
> Hello,
> We have a cluster whose nodes run Cassandra and Hadoop (Hortonworks 2.3.2), 
> and Tez is our default execution engine.
> I have a table in Cassandra, and I use the hive-cassandra driver to run 
> selects over it. This is the table:
> {code:sql}
> CREATE TABLE table1 ( campaign_id text, sid text, name text, ts timestamp, 
> PRIMARY KEY (campaign_id, sid) ) WITH CLUSTERING ORDER BY (sid ASC)
> {code}
> It has only 3 partitions:
> ||campaign_id ||   sid  ||  name  ||  ts||
> |45sqdqs| sqsd |  dea| NULL|
> |QSHJKA | sqsd |  dea| NULL|
> |45s-qs   | sqsd |  dea| NULL|
> When I run a "select count(*)" over the table using Hive like this 
> (Tez is our default engine):
> {code} hive -e "select count(*) from table1;" {code}
> I get this error:
> {code}
> Status: Failed
> Vertex failed, vertexName=Map 1, 
> vertexId=vertex_1474275943985_0179_1_00, diagnostics=[Task failed, 
> taskId=task_1474275943985_0179_1_00_01, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: 
> org.apache.tez.dag.api.TezUncheckedException: Expected length: 12416 
> actual length: 9223372036854775711
>at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
>at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>at java.security.AccessController.doPrivileged(Native Method)
>at javax.security.auth.Subject.doAs(Subject.java:422)
>at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.tez.dag.api.TezUncheckedException: Expected length: 
> 12416 actual length: 9223372036854775711
>at 
> org.apache.hadoop.mapred.split.TezGroupedSplit.readFields(TezGroupedSplit.java:128)
>at 
> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
>at 
> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
>at 
> org.apache.tez.mapreduce.hadoop.MRInputHelpers.createOldFormatSplitFromUserPayload(MRInputHelpers.java:177)
>at 
> org.apache.tez.mapreduce.lib.MRInputUtils.getOldSplitDetailsFromEvent(MRInputUtils.java:136)
>at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:643)
>at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
>at 
> org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
>at 
> org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
>at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:390)
>at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:128)
>at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
>... 14 more
> {code}
> As far as I understand, in readFields we are getting more data than we are 
> expecting. But considering the size of the table (only 3 records), I don't 
> think the data is the problem. 
> Another thing to add is that if I do a "select *", it works perfectly fine 
> with Tez. Using the mr engine, both select count(*) and select * work fine as 
> well.
> We are using Hortonworks version 2.3.2
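
The "Expected length ... actual length ..." message in the trace is raised while a grouped split is being deserialized. A minimal sketch of that kind of consistency check follows, assuming the writer records a total length and the reader recomputes it from the wrapped splits. The class and method names are hypothetical and this is not the Tez source; it only illustrates how a recorded total that disagrees with the recomputed one would produce a message of this shape.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class GroupedLengthCheck {
    private GroupedLengthCheck() {}

    /** Serializes the wrapped split lengths followed by the total the writer recorded. */
    static byte[] write(long[] wrappedLengths, long recordedTotal) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(wrappedLengths.length);
            for (long length : wrappedLengths) {
                out.writeLong(length);
            }
            out.writeLong(recordedTotal);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Re-sums the wrapped lengths and fails if they disagree with the recorded total. */
    static long read(byte[] payload) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
            int count = in.readInt();
            long actual = 0;
            for (int i = 0; i < count; i++) {
                actual += in.readLong();
            }
            long expected = in.readLong();
            if (expected != actual) {
                throw new IllegalStateException(
                        "Expected length: " + expected + " actual length: " + actual);
            }
            return actual;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Under this reading the check does not depend on how many rows the table holds. It only compares two numbers derived from the split metadata, which would be consistent with the failure appearing on a 3-row table.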

[jira] [Commented] (TEZ-3428) Tez UI: First Tab not needed for few entries in DAG listings

2016-09-29 Thread Harish Jaiprakash (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532801#comment-15532801
 ] 

Harish Jaiprakash commented on TEZ-3428:


What happens if there is no data? Earlier there was a "No records" message; we 
should continue to inform the user when we could not find any DAGs. Other than 
that, the patch looks fine.

> Tez UI: First Tab not needed for few entries in DAG listings
> 
>
> Key: TEZ-3428
> URL: https://issues.apache.org/jira/browse/TEZ-3428
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Sreenath Somarajapuram
>Assignee: Sreenath Somarajapuram
> Attachments: TEZ-3428.1.patch
>
>
> All DAGs get listed in a table in the Tez view (navigated through Hive View's 
> Query Tab) which allows for pagination. But when there are few DAGs (e.g. 
> fewer than 5), the navigation buttons still show Tab "First" and Tab "1", 
> which can be misleading.
> It would be better to show only the page number when there is only one page 
> of DAGs. The "First" tab could become enabled/visible when there are more 
> entries in the table and the user navigates to subsequent pages.
> Suggested by ritticheria.
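
The rule requested above can be sketched as a function from the table state to the list of buttons to render. This is a hedged illustration in Java rather than the Tez UI's own code; the class name, method name, and parameters are assumptions. It treats "First" as visible only when there is more than one page and the user has moved past the first one.

```java
import java.util.ArrayList;
import java.util.List;

final class PaginationButtons {
    private PaginationButtons() {}

    /** Returns the button labels to show for the given table state. */
    static List<String> visibleButtons(int totalRows, int rowsPerPage, int currentPage) {
        if (rowsPerPage <= 0) {
            throw new IllegalArgumentException("rowsPerPage must be positive");
        }
        List<String> buttons = new ArrayList<>();
        if (totalRows <= 0) {
            // Nothing to page through; the caller shows its empty-table message.
            return buttons;
        }
        int pageCount = (totalRows + rowsPerPage - 1) / rowsPerPage; // ceiling division
        if (pageCount > 1 && currentPage > 1) {
            buttons.add("First");
        }
        for (int page = 1; page <= pageCount; page++) {
            buttons.add(String.valueOf(page));
        }
        return buttons;
    }
}
```

With 3 rows and 10 rows per page this yields only the single page number, which is the behavior the issue asks for.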





[jira] [Commented] (TEZ-3432) Add unit test for session timeout

2016-09-29 Thread Harish Jaiprakash (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532694#comment-15532694
 ] 

Harish Jaiprakash commented on TEZ-3432:


Hi Sreenath,

Test mostly looks fine, please consider the following:

* The timeout of 25 is too much; I guess that's because you are iterating 10 
times. We should not be doing this; asserting once should be enough.
* The getDiagnostics method needs to be fixed. It assumes that the top 
diagnostic is the required one and that it will always have values which are 
long; this can make it flaky if someone adds another diagnostic before this. 
How about passing a prefix or substring to getDiagnostics, which can be used 
to filter diagnostics and only parse those? It would be nice if we could just 
use Map so that it will be generic and we can do the parsing later.
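
One way to read the second suggestion is sketched below. It is an illustration only: the helper name, the prefix argument, and the assumed "key=value, key=value" layout of a diagnostic line are hypothetical and are not taken from the patch or from Tez. The helper selects the first diagnostic that starts with the given prefix and returns its pairs as strings, leaving numeric parsing to the caller.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DiagnosticFilter {
    private DiagnosticFilter() {}

    /**
     * Finds the first diagnostic starting with the prefix and splits the rest
     * into key=value pairs (assumed format). Returns an empty map if none match.
     */
    static Map<String, String> parseByPrefix(List<String> diagnostics, String prefix) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String diagnostic : diagnostics) {
            if (!diagnostic.startsWith(prefix)) {
                continue; // unrelated diagnostics no longer affect the result
            }
            String body = diagnostic.substring(prefix.length());
            for (String pair : body.split(",")) {
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    values.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
                }
            }
            break;
        }
        return values;
    }
}
```

Because the lookup is by prefix rather than by position, adding another diagnostic ahead of the one under test would not change what the test sees.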



> Add unit test for session timeout
> -
>
> Key: TEZ-3432
> URL: https://issues.apache.org/jira/browse/TEZ-3432
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Sushmitha Sreenivasan
>Assignee: Sreenath Somarajapuram
>  Labels: newbie
> Attachments: TEZ-3432.1.patch, TEZ-3432.2.patch
>
>
> Add a unit test which sets tez.session.am.dag.submit.timeout.secs to, say, 
> 5 secs and checks that DAG submission times out after the configured time.





[jira] [Commented] (TEZ-3419) Tez UI: Applications page shows error, for users with only DAG level ACL permission.

2016-09-29 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532954#comment-15532954
 ] 

TezQA commented on TEZ-3419:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12830887/TEZ-3419.3.patch
  against master revision 5c2f893.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.dag.history.utils.TestDAGUtils

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1994//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1994//console

This message is automatically generated.



Failed: TEZ-3419 PreCommit Build #1994

2016-09-29 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3419
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1994/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 4627 lines...]
[ERROR] [Help 2] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :tez-dag
[INFO] Build failures were ignored.






==
==
Adding comment to Jira.
==
==


Comment added.
4901eb6e9e6526f73f2d2e71ba2ae17fcbd97e5e logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Compressed 3.37 MB of artifacts by 26.0% relative to #1993
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  
org.apache.tez.dag.history.utils.TestDAGUtils.testConvertDAGPlanToATSMap

Error Message:
test timed out after 5000 milliseconds

Stack Trace:
java.lang.Exception: test timed out after 5000 milliseconds
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2595)
at java.lang.Class.getConstructor0(Class.java:2895)
at java.lang.Class.newInstance(Class.java:354)
at javax.xml.parsers.FactoryFinder.newInstance(FactoryFinder.java:179)
at 
javax.xml.parsers.FactoryFinder.findJarServiceProvider(FactoryFinder.java:333)
at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:255)
at 
javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:121)
at 
org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2387)
at 
org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2364)
at 
org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2281)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:1108)
at 
org.apache.hadoop.yarn.factory.providers.RecordFactoryProvider.getRecordFactory(RecordFactoryProvider.java:49)
at org.apache.hadoop.yarn.util.Records.<clinit>(Records.java:32)
at 
org.apache.hadoop.yarn.api.records.Resource.newInstance(Resource.java:57)
at 
org.apache.tez.dag.history.utils.TestDAGUtils.createDAG(TestDAGUtils.java:63)
at 
org.apache.tez.dag.history.utils.TestDAGUtils.testConvertDAGPlanToATSMap(TestDAGUtils.java:105)




[jira] [Commented] (TEZ-3432) Add unit test for session timeout

2016-09-29 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532956#comment-15532956
 ] 

Hitesh Shah commented on TEZ-3432:
--

bq. I guess thats because you are iterating 10 times

To clarify, my point was to run mvn test -Dtestcase 50-100 times to ensure that 
the test is not flaky, not to have the unit test do multiple iterations.



Failed: TEZ-3450 PreCommit Build #1995

2016-09-29 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3450
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1995/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 84 lines...]
Going to apply patch with: /usr/bin/patch -p0
patching file 
tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java
patching file 
tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java


==
==
Determining number of patched javac warnings.
==
==


/home/jenkins/tools/maven/latest/bin/mvn clean test -DskipTests -Ptest-patch > 
/home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build/../patchprocess/patchJavacWarnings.txt
 2>&1




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12830441/TEZ-3450.1.patch
  against master revision 5c2f893.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1995//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
ff4fe45ea533dea234418b1ad337d3a33b5925b4 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
ERROR: Step 'Publish JUnit test result report' failed: No test report files 
were found. Configuration error?
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
No tests ran.

[jira] [Commented] (TEZ-3450) Avoid reading Max Failed Attempts from configuration for each task

2016-09-29 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532993#comment-15532993
 ] 

TezQA commented on TEZ-3450:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12830441/TEZ-3450.1.patch
  against master revision 5c2f893.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1995//console

This message is automatically generated.

> Avoid reading Max Failed Attempts from configuration for each task
> --
>
> Key: TEZ-3450
> URL: https://issues.apache.org/jira/browse/TEZ-3450
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: darion yaphet
>Assignee: darion yaphet
> Attachments: TEZ-3450.1.patch
>
>






[jira] [Commented] (TEZ-3451) select count(*) fails with tez over cassandra

2016-09-29 Thread darion yaphet (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532795#comment-15532795
 ] 

darion yaphet commented on TEZ-3451:


Hi, this seems to be an exception in Hive.



[jira] [Commented] (TEZ-3454) Support UniformityPartitioner

2016-09-29 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15535042#comment-15535042
 ] 

TezQA commented on TEZ-3454:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12831052/TEZ-3454.1.patch
  against master revision 5c2f893.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1999//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1999//console

This message is automatically generated.

> Support UniformityPartitioner
> -
>
> Key: TEZ-3454
> URL: https://issues.apache.org/jira/browse/TEZ-3454
> Project: Apache Tez
>  Issue Type: New Feature
>Affects Versions: 0.8.4
>Reporter: darion yaphet
>Assignee: darion yaphet
> Attachments: TEZ-3454.1.patch
>
>
> Sometimes we just want to use Tez as a framework to process a large-volume 
> dataset. It's not necessary to collect the same key into a key:value-list, 
> which may cause data skew.
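
The attached patch is not shown in this thread, so the following is only one possible reading of a "uniformity" partitioner: ignore the key and hand out partitions round-robin. The class name is hypothetical, and the method merely mirrors the usual getPartition(key, value, numPartitions) shape; it is a sketch under those assumptions, not the proposed implementation.

```java
final class RoundRobinPartitioner {
    private int next = 0;

    /** Assigns records to partitions in rotation, independent of key and value. */
    int getPartition(Object key, Object value, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        int partition = next % numPartitions;
        next = (partition + 1) % numPartitions;
        return partition;
    }
}
```

The trade-off is that records with the same key no longer land in the same partition, so this only fits jobs that do not need per-key grouping downstream. In exchange, a heavily repeated key cannot overload a single partition.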





Failed: TEZ-3454 PreCommit Build #1999

2016-09-29 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3454
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1999/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 4807 lines...]
[INFO] 
[INFO] Total time: 58:16 min
[INFO] Finished at: 2016-09-30T05:02:00+00:00
[INFO] Final Memory: 78M/1104M
[INFO] 






==
==
Adding comment to Jira.
==
==


Comment added.
ed4d5683a88a039a7c07f419e0cdc4f0e07659bb logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-3437) Improve synchronization and the progress report behavior for Inputs from TEZ-3317

2016-09-29 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15534831#comment-15534831
 ] 

TezQA commented on TEZ-3437:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12831032/TEZ-3437.007.patch
  against master revision 5c2f893.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1997//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1997//console

This message is automatically generated.

> Improve synchronization and the progress report behavior for Inputs from 
> TEZ-3317
> -
>
> Key: TEZ-3437
> URL: https://issues.apache.org/jira/browse/TEZ-3437
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: TEZ-3437.001.patch, TEZ-3437.002.patch, 
> TEZ-3437.003.patch, TEZ-3437.004.patch, TEZ-3437.005.patch, 
> TEZ-3437.006.patch, TEZ-3437.007.patch
>
>
> Follow up from TEZ-3317 to improve the getProgress thread synchronization and 
> replace timerTasks with ScheduledExecutorService. 
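
As a rough illustration of the second half of that description, the sketch below schedules a periodic progress callback on a ScheduledExecutorService instead of a java.util.Timer. The class and method names are hypothetical and the structure is an assumption, not the code from the TEZ-3437 patches. One commonly cited reason for such a switch is that an uncaught exception in a TimerTask terminates the Timer's thread, whereas an executor confines the failure to the task that threw.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ProgressReporter implements AutoCloseable {
    // Daemon thread so a forgotten reporter cannot keep the JVM alive.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "progress-reporter");
                thread.setDaemon(true);
                return thread;
            });
    private final AtomicInteger reports = new AtomicInteger();

    /** Runs the callback immediately and then after each period has elapsed. */
    void start(Runnable notifyProgress, long periodMillis) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                notifyProgress.run();
                reports.incrementAndGet();
            } catch (RuntimeException e) {
                // Swallow so one failed report does not cancel later ones.
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Waits until at least the expected number of reports ran, or the timeout passes. */
    boolean awaitReports(int expected, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (reports.get() < expected) {
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The counter is an AtomicInteger because it is written on the scheduler thread and read from the caller's thread, which is the kind of cross-thread access the issue's synchronization concern is about.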





Success: TEZ-3437 PreCommit Build #1997

2016-09-29 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3437
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1997/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 4827 lines...]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 58:45 min
[INFO] Finished at: 2016-09-30T02:48:51+00:00
[INFO] Final Memory: 82M/1407M
[INFO] 






==
==
Adding comment to Jira.
==
==


Comment added.
fdc882f90a69f35478aa9bfd9088c325adef66f5 logged out


==
==
Finished build.
==
==


Archiving artifacts
Compressed 3.37 MB of artifacts by 23.2% relative to #1993
[description-setter] Description set: TEZ-3437
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Updated] (TEZ-3454) Support UniformityPartitioner

2016-09-29 Thread darion yaphet (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

darion yaphet updated TEZ-3454:
---
Attachment: TEZ-3454.1.patch

> Support UniformityPartitioner
> -
>
> Key: TEZ-3454
> URL: https://issues.apache.org/jira/browse/TEZ-3454
> Project: Apache Tez
>  Issue Type: New Feature
>Affects Versions: 0.8.4
>Reporter: darion yaphet
>Assignee: darion yaphet
> Attachments: TEZ-3454.1.patch
>
>
> Sometimes we just want to use Tez as a framework to process a large 
> dataset. It's not necessary to collect the same key into a key:value-list, 
> which may cause data skew.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-3454) Support UniformityPartitioner

2016-09-29 Thread darion yaphet (JIRA)
darion yaphet created TEZ-3454:
--

 Summary: Support UniformityPartitioner
 Key: TEZ-3454
 URL: https://issues.apache.org/jira/browse/TEZ-3454
 Project: Apache Tez
  Issue Type: New Feature
Affects Versions: 0.8.4
Reporter: darion yaphet
Assignee: darion yaphet


Sometimes we just want to use Tez as a framework to process a large 
dataset. It's not necessary to collect the same key into a key:value-list, 
which may cause data skew.
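
To illustrate the skew concern, here is a minimal sketch that compares a key-hash partitioner with a round-robin one. The SimplePartitioner interface and both classes are hypothetical stand-ins written for this example; they are not the code in TEZ-3454.1.patch, and round-robin is only one possible reading of "uniformity".

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical interface with the usual (key, value, numPartitions) shape.
interface SimplePartitioner {
  int getPartition(Object key, Object value, int numPartitions);
}

// Sends every record with the same key to the same partition (can skew).
class HashKeyPartitioner implements SimplePartitioner {
  public int getPartition(Object key, Object value, int numPartitions) {
    return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}

// Ignores the key and cycles through partitions, spreading records evenly.
class RoundRobinPartitioner implements SimplePartitioner {
  private final AtomicInteger next = new AtomicInteger();

  public int getPartition(Object key, Object value, int numPartitions) {
    return Math.floorMod(next.getAndIncrement(), numPartitions);
  }
}

class PartitionerDemo {
  // Counts how many of "records" copies of one hot key land in each partition.
  static int[] distribute(SimplePartitioner p, int records, int numPartitions) {
    int[] counts = new int[numPartitions];
    for (int i = 0; i < records; i++) {
      counts[p.getPartition("hot-key", i, numPartitions)]++;
    }
    return counts;
  }
}
```

With a single hot key, the hash variant puts every record in one partition, while the round-robin variant gives each partition the same share. The trade-off is that records with equal keys are no longer co-located.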



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3453) Correct the downloaded ATS dag data location for analyzer

2016-09-29 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15534868#comment-15534868
 ] 

TezQA commented on TEZ-3453:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12831027/TEZ-3453.1.patch
  against master revision 5c2f893.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1998//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1998//console

This message is automatically generated.

> Correct the downloaded ATS dag data location for analyzer
> -
>
> Key: TEZ-3453
> URL: https://issues.apache.org/jira/browse/TEZ-3453
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Dharmesh Kakadia
>Priority: Minor
> Attachments: 16.patch, TEZ-3453.1.patch
>
>
> hadoop jar /usr/hdp/current/tez-client/tez-job-analyzer-*.jar CriticalPath 
> --dagId=dag_1475171170456_0002_1 --outputDir=tmp/
> fails with
> INFO history.ATSImportTool: Using 
> baseURL=http://headnodehost:8188/ws/v1/timeline, 
> dagId=dag_1475171170456_0002_1, batchSize=100, downloadDir=tmp
> java.lang.IllegalArgumentException: Zipfile 
> tmp/dag_1475171170456_0002_1/dag_1475171170456_0002_1.zip does not exist
> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> at org.apache.tez.history.parser.ATSFileParser.<init>(ATSFileParser.java:65)
> at 
> org.apache.tez.analyzer.plugins.TezAnalyzerBase.run(TezAnalyzerBase.java:169)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at 
> org.apache.tez.analyzer.plugins.CriticalPathAnalyzer.main(CriticalPathAnalyzer.java:653)
> The code is incorrectly expecting it to be in 
> subfolder(dag_1475171170456_0002_1/dag_1475171170456_0002_1.zip). Moving the 
> downloaded data location to the subfolder fixes the issue. This PR corrects 
> the expected path.
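
The path mismatch described above can be sketched as follows. This assumes the downloader writes the zip directly under the download directory, which the report implies but does not state outright. AtsZipLocation and its methods are hypothetical helpers, not part of the Tez analyzer.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper contrasting the two candidate zip locations.
class AtsZipLocation {
  // Zip placed directly in the download directory: <dir>/<dagId>.zip
  static Path flat(String downloadDir, String dagId) {
    return Paths.get(downloadDir).resolve(dagId + ".zip");
  }

  // Zip placed in a per-DAG subfolder: <dir>/<dagId>/<dagId>.zip
  // This is the location named in the IllegalArgumentException.
  static Path nested(String downloadDir, String dagId) {
    return Paths.get(downloadDir).resolve(dagId).resolve(dagId + ".zip");
  }
}
```

Whichever layout is chosen, the downloader and the parser need to agree on it; the failure arises when one side uses the flat form and the other the nested form.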



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Failed: TEZ-3453 PreCommit Build #1998

2016-09-29 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3453
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1998/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 4811 lines...]
[INFO] 
[INFO] Total time: 01:01 h
[INFO] Finished at: 2016-09-30T03:13:19+00:00
[INFO] Final Memory: 80M/1397M
[INFO] 




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12831027/TEZ-3453.1.patch
  against master revision 5c2f893.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1998//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1998//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
03674b4613f7b56b98621e1406d3f61a7ce203a5 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler

2016-09-29 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15533391#comment-15533391
 ] 

Kuhu Shukla commented on TEZ-3362:
--

The patch missed some checks in LocalContainerLauncher and AMNodeTracker. 
I will update the patch shortly.

> Delete intermediate data at DAG level for Shuffle Handler
> -
>
> Key: TEZ-3362
> URL: https://issues.apache.org/jira/browse/TEZ-3362
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
> Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch, 
> TEZ-3362.003.patch, TEZ-3362.004.patch
>
>
> Applications like hive that use tez in session mode need the ability to 
> delete intermediate data after a DAG completes and while the application 
> continues to run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler

2016-09-29 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15533633#comment-15533633
 ] 

Jonathan Eagles commented on TEZ-3362:
--

[~kshukla], a couple of minor things. In general, I think we can live with the 
current design for dag level delete and can do the future vertex level design 
in a follow-up jira. Once these are fixed, I think this patch can go in.

{code:title=AMNodeImpl}
  // Access should be package
  public int shufflePort = ShuffleUtils.UNDEFINED_PORT;
{code}

{code:title=AMNodeTracker#nodeSeen}
  // This get is using the getName instead of getID().getId()
  Set nodeIds = perDagNodeMap.get(appContext.getCurrentDAG().getName());
{code}

{code:title=AMNodeTracker#dagDelete}
  // we should protect ourselves from null pointer
  for (NodeId nodeId : getPerDagNodeMap().get(dag.getID().getId())) {
{code}
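
For the last two points, a minimal sketch of one way to keep the map key consistent and avoid a null pointer when a DAG has no recorded nodes. PerDagNodes is a hypothetical stand-in for the per-DAG bookkeeping in AMNodeTracker, with Integer and String used in place of the real DAG and node id types.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the per-DAG node bookkeeping under review.
class PerDagNodes {
  private final Map<Integer, Set<String>> perDagNodeMap = new HashMap<>();

  // Records a node under the numeric DAG id, the same key used for lookup.
  void nodeSeen(int dagId, String nodeId) {
    perDagNodeMap.computeIfAbsent(dagId, k -> new HashSet<>()).add(nodeId);
  }

  // Returns an empty set for unknown DAGs so callers can iterate safely.
  Set<String> nodesFor(int dagId) {
    return perDagNodeMap.getOrDefault(dagId, Collections.emptySet());
  }

  // Iterates without a null check and drops the entry afterwards.
  int dagDelete(int dagId) {
    int notified = 0;
    for (String nodeId : nodesFor(dagId)) {
      notified++; // a real implementation would send a delete request per node
    }
    perDagNodeMap.remove(dagId);
    return notified;
  }
}
```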


> Delete intermediate data at DAG level for Shuffle Handler
> -
>
> Key: TEZ-3362
> URL: https://issues.apache.org/jira/browse/TEZ-3362
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
> Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch, 
> TEZ-3362.003.patch, TEZ-3362.004.patch
>
>
> Applications like hive that use tez in session mode need the ability to 
> delete intermediate data after a DAG completes and while the application 
> continues to run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-3453) Correct the downloaded ATS dag data location for analyzer

2016-09-29 Thread Dharmesh Kakadia (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dharmesh Kakadia updated TEZ-3453:
--
Attachment: 16.patch

> Correct the downloaded ATS dag data location for analyzer
> -
>
> Key: TEZ-3453
> URL: https://issues.apache.org/jira/browse/TEZ-3453
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Dharmesh Kakadia
>Priority: Minor
> Attachments: 16.patch
>
>
> hadoop jar /usr/hdp/current/tez-client/tez-job-analyzer-*.jar CriticalPath 
> --dagId=dag_1475171170456_0002_1 --outputDir=tmp/
> fails with
> INFO history.ATSImportTool: Using 
> baseURL=http://headnodehost:8188/ws/v1/timeline, 
> dagId=dag_1475171170456_0002_1, batchSize=100, downloadDir=tmp
> java.lang.IllegalArgumentException: Zipfile 
> tmp/dag_1475171170456_0002_1/dag_1475171170456_0002_1.zip does not exist
> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> at org.apache.tez.history.parser.ATSFileParser.<init>(ATSFileParser.java:65)
> at 
> org.apache.tez.analyzer.plugins.TezAnalyzerBase.run(TezAnalyzerBase.java:169)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at 
> org.apache.tez.analyzer.plugins.CriticalPathAnalyzer.main(CriticalPathAnalyzer.java:653)
> The code is incorrectly expecting it to be in 
> subfolder(dag_1475171170456_0002_1/dag_1475171170456_0002_1.zip). Moving the 
> downloaded data location to the subfolder fixes the issue. This PR corrects 
> the expected path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3453) Correct the downloaded ATS dag data location for analyzer

2016-09-29 Thread Dharmesh Kakadia (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15534137#comment-15534137
 ] 

Dharmesh Kakadia commented on TEZ-3453:
---

cc [~hitesh] [~bikassaha]

> Correct the downloaded ATS dag data location for analyzer
> -
>
> Key: TEZ-3453
> URL: https://issues.apache.org/jira/browse/TEZ-3453
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Dharmesh Kakadia
>Priority: Minor
> Attachments: 16.patch
>
>
> hadoop jar /usr/hdp/current/tez-client/tez-job-analyzer-*.jar CriticalPath 
> --dagId=dag_1475171170456_0002_1 --outputDir=tmp/
> fails with
> INFO history.ATSImportTool: Using 
> baseURL=http://headnodehost:8188/ws/v1/timeline, 
> dagId=dag_1475171170456_0002_1, batchSize=100, downloadDir=tmp
> java.lang.IllegalArgumentException: Zipfile 
> tmp/dag_1475171170456_0002_1/dag_1475171170456_0002_1.zip does not exist
> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> at org.apache.tez.history.parser.ATSFileParser.<init>(ATSFileParser.java:65)
> at 
> org.apache.tez.analyzer.plugins.TezAnalyzerBase.run(TezAnalyzerBase.java:169)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at 
> org.apache.tez.analyzer.plugins.CriticalPathAnalyzer.main(CriticalPathAnalyzer.java:653)
> The code is incorrectly expecting it to be in 
> subfolder(dag_1475171170456_0002_1/dag_1475171170456_0002_1.zip). Moving the 
> downloaded data location to the subfolder fixes the issue. This PR corrects 
> the expected path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-3453) Correct the downloaded ATS dag data location for analyzer

2016-09-29 Thread Dharmesh Kakadia (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dharmesh Kakadia updated TEZ-3453:
--
Summary: Correct the downloaded ATS dag data location for analyzer  (was: 
Hive + Tez not able to push-down common sub expression )

> Correct the downloaded ATS dag data location for analyzer
> -
>
> Key: TEZ-3453
> URL: https://issues.apache.org/jira/browse/TEZ-3453
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Dharmesh Kakadia
>Priority: Minor
>
> hadoop jar /usr/hdp/current/tez-client/tez-job-analyzer-*.jar CriticalPath 
> --dagId=dag_1475171170456_0002_1 --outputDir=tmp/
> fails with
> INFO history.ATSImportTool: Using 
> baseURL=http://headnodehost:8188/ws/v1/timeline, 
> dagId=dag_1475171170456_0002_1, batchSize=100, downloadDir=tmp
> java.lang.IllegalArgumentException: Zipfile 
> tmp/dag_1475171170456_0002_1/dag_1475171170456_0002_1.zip does not exist
> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> at org.apache.tez.history.parser.ATSFileParser.<init>(ATSFileParser.java:65)
> at 
> org.apache.tez.analyzer.plugins.TezAnalyzerBase.run(TezAnalyzerBase.java:169)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at 
> org.apache.tez.analyzer.plugins.CriticalPathAnalyzer.main(CriticalPathAnalyzer.java:653)
> The code is incorrectly expecting it to be in 
> subfolder(dag_1475171170456_0002_1/dag_1475171170456_0002_1.zip). Moving the 
> downloaded data location to the subfolder fixes the issue. This PR corrects 
> the expected path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler

2016-09-29 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3362:
-
Attachment: TEZ-3362.005.patch

Thank you [~jeagles] for the review comments. I addressed those and additionally 
made the deletion executor service use a fixed thread pool sized by a 
configured number of threads, and shut down the executor when serviceStop is called.
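
A minimal sketch of that pattern, assuming a hypothetical DeletionService class. The thread count would come from configuration in the real code, and stop() plays the role that serviceStop plays in the patch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical service; not the class touched by TEZ-3362.005.patch.
class DeletionService {
  private final ExecutorService deletionExecutor;
  private final AtomicInteger deleted = new AtomicInteger();

  // numThreads stands in for a configured value.
  DeletionService(int numThreads) {
    deletionExecutor = Executors.newFixedThreadPool(numThreads);
  }

  // Queues one deletion; the counter stands in for the real delete work.
  void deleteDagData(int dagId) {
    deletionExecutor.execute(deleted::incrementAndGet);
  }

  // Stops accepting work and waits for queued deletions to finish.
  boolean stop(long timeoutMillis) {
    deletionExecutor.shutdown();
    try {
      return deletionExecutor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  int deletedCount() { return deleted.get(); }
  boolean isStopped() { return deletionExecutor.isShutdown(); }
}
```

A fixed pool bounds the number of concurrent deletions, and shutting it down on stop avoids leaving non-daemon threads behind when the service exits.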

> Delete intermediate data at DAG level for Shuffle Handler
> -
>
> Key: TEZ-3362
> URL: https://issues.apache.org/jira/browse/TEZ-3362
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
> Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch, 
> TEZ-3362.003.patch, TEZ-3362.004.patch, TEZ-3362.005.patch
>
>
> Applications like hive that use tez in session mode need the ability to 
> delete intermediate data after a DAG completes and while the application 
> continues to run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler

2016-09-29 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15533903#comment-15533903
 ] 

Jonathan Eagles commented on TEZ-3362:
--

+1. Looks good to me. We can address the vertex design in the vertex delete 
JIRA or perhaps another jira. I'll leave this open for today to allow [~hitesh] 
to comment.

> Delete intermediate data at DAG level for Shuffle Handler
> -
>
> Key: TEZ-3362
> URL: https://issues.apache.org/jira/browse/TEZ-3362
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
> Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch, 
> TEZ-3362.003.patch, TEZ-3362.004.patch, TEZ-3362.005.patch
>
>
> Applications like hive that use tez in session mode need the ability to 
> delete intermediate data after a DAG completes and while the application 
> continues to run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-3451) select count(*) fails with tez over cassandra

2016-09-29 Thread jean carlo rivera ura (JIRA)
jean carlo rivera ura created TEZ-3451:
--

 Summary: select count(*) fails with tez over cassandra
 Key: TEZ-3451
 URL: https://issues.apache.org/jira/browse/TEZ-3451
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: jean carlo rivera ura


Hello,

We have a cluster with nodes having cassandra and hadoop (hortonworks 2.3.2) 
and we have tez as our engine by default.

I have a table in cassandra, and I use the driver hive-cassandra to do selects 
over it. This is the table

{code:sql}
CREATE TABLE table1 ( campaign_id text, sid text, name text, ts timestamp, 
PRIMARY KEY (campaign_id, sid) ) WITH CLUSTERING ORDER BY (sid ASC)
{code}
And I have only 3 partitions

campaign_id | sid  | name | ts
------------+------+------+------
45sqdqs     | sqsd | dea  | NULL
QSHJKA      | sqsd | dea  | NULL
45s-qs      | sqsd | dea  | NULL

When I run a "select count(*)" over the table using hive like this (tez is 
our engine by default)
{code} hive -e "select count(*) from table1;" {code}

I got this error:

{code}
Status: Failed
Vertex failed, vertexName=Map 1, 
vertexId=vertex_1474275943985_0179_1_00, diagnostics=[Task failed, 
taskId=task_1474275943985_0179_1_00_01, diagnostics=[TaskAttempt 0 
failed, info=[Error: Failure while running 
task:java.lang.RuntimeException: 
org.apache.tez.dag.api.TezUncheckedException: Expected length: 12416 
actual length: 9223372036854775711
   at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
   at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
   at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
   at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
   at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
   at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tez.dag.api.TezUncheckedException: Expected length: 12416 
actual length: 9223372036854775711
   at 
org.apache.hadoop.mapred.split.TezGroupedSplit.readFields(TezGroupedSplit.java:128)
   at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
   at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
   at 
org.apache.tez.mapreduce.hadoop.MRInputHelpers.createOldFormatSplitFromUserPayload(MRInputHelpers.java:177)
   at 
org.apache.tez.mapreduce.lib.MRInputUtils.getOldSplitDetailsFromEvent(MRInputUtils.java:136)
   at 
org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:643)
   at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
   at 
org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
   at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
   at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:390)
   at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:128)
   at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
   ... 14 more
{code}

As far as I understand, readFields is getting more data than it 
expects. But considering the size of the table (only 3 records), I don't think 
the data is the problem. 
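
One observation on the numbers in the exception, offered as a hint and not as a diagnosis: the reported actual length sits just below Long.MAX_VALUE, which can be checked with plain arithmetic. The LengthCheck class below is written only for this check and is not part of Tez or Hive.

```java
// Illustrative arithmetic on the two lengths quoted in the exception.
class LengthCheck {
  static final long EXPECTED = 12416L;
  static final long ACTUAL = 9223372036854775711L;

  // How far a value is below the largest representable long.
  static long distanceFromLongMax(long value) {
    return Long.MAX_VALUE - value;
  }

  public static void main(String[] args) {
    System.out.println(distanceFromLongMax(ACTUAL)); // prints 96
  }
}
```

A value this close to Long.MAX_VALUE looks more like a sentinel or placeholder length somewhere in the split metadata than like a real byte count, although the stack trace alone does not show where it originates.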

Another thing to add is that a "select *" works perfectly fine 
with tez :) . With the mr engine, both select count(*) and select * work fine as 
well.

We are using hortonworks version 2.3.2



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-3451) select count(*) fails with tez over cassandra

2016-09-29 Thread jean carlo rivera ura (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jean carlo rivera ura updated TEZ-3451:
---
Description: 
Hello,

We have a cluster with nodes having cassandra and hadoop (hortonworks 2.3.2) 
and we have tez as our engine by default.

I have a table in cassandra, and I use the driver hive-cassandra to do selects 
over it. This is the table

{code:sql}
CREATE TABLE table1 ( campaign_id text, sid text, name text, ts timestamp, 
PRIMARY KEY (campaign_id, sid) ) WITH CLUSTERING ORDER BY (sid ASC)
{code}
And I have only 3 partitions


||campaign_id ||   sid  ||  name  ||  ts||

|45sqdqs| sqsd |  dea| NULL|
|QSHJKA | sqsd |  dea| NULL|
|45s-qs   | sqsd |  dea| NULL|


When I run a "select count(*)" over the table using hive like this (tez is 
our engine by default)
{code} hive -e "select count(*) from table1;" {code}

I got this error:

{code}
Status: Failed
Vertex failed, vertexName=Map 1, 
vertexId=vertex_1474275943985_0179_1_00, diagnostics=[Task failed, 
taskId=task_1474275943985_0179_1_00_01, diagnostics=[TaskAttempt 0 
failed, info=[Error: Failure while running 
task:java.lang.RuntimeException: 
org.apache.tez.dag.api.TezUncheckedException: Expected length: 12416 
actual length: 9223372036854775711
   at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
   at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
   at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
   at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
   at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
   at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tez.dag.api.TezUncheckedException: Expected length: 12416 
actual length: 9223372036854775711
   at 
org.apache.hadoop.mapred.split.TezGroupedSplit.readFields(TezGroupedSplit.java:128)
   at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
   at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
   at 
org.apache.tez.mapreduce.hadoop.MRInputHelpers.createOldFormatSplitFromUserPayload(MRInputHelpers.java:177)
   at 
org.apache.tez.mapreduce.lib.MRInputUtils.getOldSplitDetailsFromEvent(MRInputUtils.java:136)
   at 
org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:643)
   at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
   at 
org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
   at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
   at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:390)
   at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:128)
   at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
   ... 14 more
{code}

As far as I understand, readFields is getting more data than it 
expects. But considering the size of the table (only 3 records), I don't think 
the data is the problem. 

Another thing to add is that a "select *" works perfectly fine 
with tez :) . With the mr engine, both select count(*) and select * work fine as 
well.

We are using hortonworks version 2.3.2

  was:
Hello,

We have a cluster with nodes having cassandra and hadoop (hortonworks 2.3.2) 
and we have tez as our engine by default.

I have a table in cassandra, and I use the driver hive-cassandra to do selects 
over it. This is the table

{code:sql}
CREATE TABLE table1 ( campaign_id text, sid text, name text, ts timestamp, 
PRIMARY KEY (campaign_id, sid) ) WITH CLUSTERING ORDER BY (sid ASC)
{code}
And I have only 3 partitions

campaign_id |   sid  |  name  |  ts
-
45sqdqs| sqsd |  dea| NULL
QSHJKA | sqsd |  dea| NULL
45s-qs   | sqsd |  dea 

[jira] [Updated] (TEZ-3451) select count(*) fails with tez over cassandra

2016-09-29 Thread jean carlo rivera ura (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jean carlo rivera ura updated TEZ-3451:
---
Description: 
Hello,

We have a cluster with nodes having cassandra and hadoop (hortonworks 2.3.2) 
and we have tez as our engine by default.

I have a table in cassandra, and I use the driver hive-cassandra to do selects 
over it. This is the table

{code:sql}
CREATE TABLE table1 ( campaign_id text, sid text, name text, ts timestamp, 
PRIMARY KEY (campaign_id, sid) ) WITH CLUSTERING ORDER BY (sid ASC)
{code}
And I have only 3 partitions


||campaign_id ||   sid  ||  name  ||  ts||

|45sqdqs| sqsd |  dea| NULL|
|QSHJKA | sqsd |  dea| NULL|
|45s-qs   | sqsd |  dea| NULL|


When I run a "select count ( * )" over the table using hive like this (tez 
is our engine by default)
{code} hive -e "select count(*) from table1;" {code}

I got this error:

{code}
Status: Failed
Vertex failed, vertexName=Map 1, 
vertexId=vertex_1474275943985_0179_1_00, diagnostics=[Task failed, 
taskId=task_1474275943985_0179_1_00_01, diagnostics=[TaskAttempt 0 
failed, info=[Error: Failure while running 
task:java.lang.RuntimeException: 
org.apache.tez.dag.api.TezUncheckedException: Expected length: 12416 
actual length: 9223372036854775711
   at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
   at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
   at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
   at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
   at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
   at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tez.dag.api.TezUncheckedException: Expected length: 12416 
actual length: 9223372036854775711
   at 
org.apache.hadoop.mapred.split.TezGroupedSplit.readFields(TezGroupedSplit.java:128)
   at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
   at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
   at 
org.apache.tez.mapreduce.hadoop.MRInputHelpers.createOldFormatSplitFromUserPayload(MRInputHelpers.java:177)
   at 
org.apache.tez.mapreduce.lib.MRInputUtils.getOldSplitDetailsFromEvent(MRInputUtils.java:136)
   at 
org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:643)
   at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
   at 
org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
   at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
   at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:390)
   at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:128)
   at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
   ... 14 more
{code}

As far as I understand, readFields is getting more data than it 
expects. But considering the size of the table (only 3 records), I don't think 
the data is the problem. 

Another thing to add is that a "select *" works perfectly fine 
with tez. With the mr engine, both select count ( * ) and select * work fine as 
well.

We are using hortonworks version 2.3.2

  was:
Hello,

We have a cluster with nodes having cassandra and hadoop (hortonworks 2.3.2) 
and we have tez as our engine by default.

I have a table in cassandra, and I use the driver hive-cassandra to do selects 
over it. This is the table

{code:sql}
CREATE TABLE table1 ( campaign_id text, sid text, name text, ts timestamp, 
PRIMARY KEY (campaign_id, sid) ) WITH CLUSTERING ORDER BY (sid ASC)
{code}
And I have only 3 partitions


||campaign_id ||   sid  ||  name  ||  ts||

|45sqdqs| sqsd |  dea| NULL|
|QSHJKA | sqsd |  dea| NULL|
|45s-qs   | sqsd |  dea| NULL|


At the 

[jira] [Comment Edited] (TEZ-3450) Avoid reading Max Failed Attempts from configuration for each task

2016-09-29 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15534402#comment-15534402
 ] 

Hitesh Shah edited comment on TEZ-3450 at 9/29/16 11:29 PM:


Additional point - the max attempts should only be read from the vertex conf 
after the vertex conf has been updated with additional vertex specific params 
from the plan proto. 
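
A rough sketch of that ordering, for illustration only: the Vertex and Task classes below are hypothetical stand-ins rather than Tez classes, a plain Map replaces the real configuration object, and the key name and default value are assumptions. The point is simply that the vertex merges the plan-specific params first, reads the value once, and hands the cached value to each task.

```java
import java.util.HashMap;
import java.util.Map;

public class MaxAttemptsSketch {
  // Assumed key and default, used here only for illustration.
  static final String MAX_FAILED_ATTEMPTS_KEY = "tez.am.task.max.failed.attempts";
  static final int DEFAULT_MAX_FAILED_ATTEMPTS = 4;

  // Hypothetical stand-in for a vertex that owns its configuration.
  static class Vertex {
    private final Map<String, String> vertexConf;
    private final int maxFailedAttempts;

    Vertex(Map<String, String> baseConf, Map<String, String> planOverrides) {
      // Apply the vertex-specific params from the plan first...
      this.vertexConf = new HashMap<>(baseConf);
      this.vertexConf.putAll(planOverrides);
      // ...then read the value once, so tasks never look it up again.
      String raw = vertexConf.get(MAX_FAILED_ATTEMPTS_KEY);
      this.maxFailedAttempts =
          raw == null ? DEFAULT_MAX_FAILED_ATTEMPTS : Integer.parseInt(raw);
    }

    Task newTask() {
      return new Task(maxFailedAttempts);
    }
  }

  // Hypothetical task that receives the cached value from its vertex.
  static class Task {
    final int maxFailedAttempts;

    Task(int maxFailedAttempts) {
      this.maxFailedAttempts = maxFailedAttempts;
    }
  }

  public static void main(String[] args) {
    Map<String, String> base = new HashMap<>();
    base.put(MAX_FAILED_ATTEMPTS_KEY, "4");
    Map<String, String> overrides = new HashMap<>();
    overrides.put(MAX_FAILED_ATTEMPTS_KEY, "2");

    Vertex vertex = new Vertex(base, overrides);
    System.out.println(vertex.newTask().maxFailedAttempts);
  }
}
```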


was (Author: hitesh):
Additional point - the max attempts should only be set after the vertex conf 
has been updated with additional vertex specific params from the plan proto. 

> Avoid reading Max Failed Attempts from configuration for each task
> --
>
> Key: TEZ-3450
> URL: https://issues.apache.org/jira/browse/TEZ-3450
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: darion yaphet
>Assignee: darion yaphet
> Attachments: TEZ-3450.1.patch
>
>






[jira] [Updated] (TEZ-3361) Fetch Multiple Partitions from the Shuffle Handler

2016-09-29 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3361:
-
Attachment: TEZ-3361.4.patch

> Fetch Multiple Partitions from the Shuffle Handler
> --
>
> Key: TEZ-3361
> URL: https://issues.apache.org/jira/browse/TEZ-3361
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: TEZ-3361.1.patch, TEZ-3361.2.patch, TEZ-3361.3.patch, 
> TEZ-3361.4.patch
>
>
> Provide an API that allows for fetching multiple partitions at once from a 
> single upstream task. This is to better support auto-reduce parallelism where 
> a single downstream task is impersonating several (possibly?) consecutive 
> downstream tasks.





[jira] [Commented] (TEZ-3361) Fetch Multiple Partitions from the Shuffle Handler

2016-09-29 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15534411#comment-15534411
 ] 

Jonathan Eagles commented on TEZ-3361:
--

[~kshukla], I addressed your comments. Posted new patch and verified tests are 
passing.

> Fetch Multiple Partitions from the Shuffle Handler
> --
>
> Key: TEZ-3361
> URL: https://issues.apache.org/jira/browse/TEZ-3361
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: TEZ-3361.1.patch, TEZ-3361.2.patch, TEZ-3361.3.patch, 
> TEZ-3361.4.patch
>
>
> Provide an API that allows for fetching multiple partitions at once from a 
> single upstream task. This is to better support auto-reduce parallelism where 
> a single downstream task is impersonating several (possibly?) consecutive 
> downstream tasks.





[jira] [Updated] (TEZ-3452) Auto-reduce parallelism calculation can overflow with large inputs

2016-09-29 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3452:
-
Description: 
Overflow can occur when the numTasks is high (say 45000) and outputSize is high 
(say 311TB) and slow start is set to 1.0. 
{code:title=ShuffleVertexManager}
for (Map.Entry<String, SourceVertexInfo> vInfo : getBipartiteInfo()) {
  SourceVertexInfo srcInfo = vInfo.getValue();
  if (srcInfo.numTasks > 0 && srcInfo.numVMEventsReceived > 0) {
    // this assumes that 1 vmEvent is received per completed task - TEZ-2961
    expectedTotalSourceTasksOutputSize += 
        (srcInfo.numTasks * srcInfo.outputSize) / 
        srcInfo.numVMEventsReceived;
  }
}
{code}
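
To make the overflow concrete, here is a minimal, self-contained sketch. It is not the ShuffleVertexManager code: the class and method names are hypothetical, and it assumes numTasks and numVMEventsReceived are ints and outputSize is a long byte count. It contrasts the formula from the description with one possible workaround that computes the ratio in floating point first; the actual fix may well take a different approach.

```java
public class OverflowSketch {
  // Mirrors the formula in the description: the product is computed in long
  // arithmetic and can wrap around before the division happens.
  static long naiveEstimate(int numTasks, long outputSize, int numVMEventsReceived) {
    return (numTasks * outputSize) / numVMEventsReceived;
  }

  // One possible workaround (an assumption, not necessarily the committed fix):
  // compute the task/event ratio as a double, then scale the output size.
  static long scaledEstimate(int numTasks, long outputSize, int numVMEventsReceived) {
    double ratio = (double) numTasks / numVMEventsReceived;
    return (long) (outputSize * ratio);
  }

  public static void main(String[] args) {
    int numTasks = 45000;
    long outputSize = 311L << 40; // 311 TB expressed in bytes
    int events = 45000;           // slow start 1.0: every source task has reported

    System.out.println("naive:  " + naiveEstimate(numTasks, outputSize, events));
    System.out.println("scaled: " + scaledEstimate(numTasks, outputSize, events));
  }
}
```

With these inputs the product 45000 * 311 TB is roughly 1.5e19, which exceeds Long.MAX_VALUE (about 9.2e18), so the naive estimate wraps to a negative number.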

> Auto-reduce parallelism calculation can overflow with large inputs
> --
>
> Key: TEZ-3452
> URL: https://issues.apache.org/jira/browse/TEZ-3452
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>
> Overflow can occur when the numTasks is high (say 45000) and outputSize is 
> high (say 311TB) and slow start is set to 1.0. 
> {code:title=ShuffleVertexManager}
> for (Map.Entry<String, SourceVertexInfo> vInfo : getBipartiteInfo()) {
>   SourceVertexInfo srcInfo = vInfo.getValue();
>   if (srcInfo.numTasks > 0 && srcInfo.numVMEventsReceived > 0) {
>     // this assumes that 1 vmEvent is received per completed task - TEZ-2961
>     expectedTotalSourceTasksOutputSize += 
>         (srcInfo.numTasks * srcInfo.outputSize) / 
>         srcInfo.numVMEventsReceived;
>   }
> }
> {code}





[jira] [Updated] (TEZ-3437) hhjerekretudubjdkbktjvgjrd

2016-09-29 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3437:
-
Summary: hhjerekretudubjdkbktjvgjrd  (was: Improve synchronization and the 
progress report behavior for Inputs from TEZ-3317)

> hhjerekretudubjdkbktjvgjrd
> --
>
> Key: TEZ-3437
> URL: https://issues.apache.org/jira/browse/TEZ-3437
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: TEZ-3437.001.patch, TEZ-3437.002.patch, 
> TEZ-3437.003.patch, TEZ-3437.004.patch, TEZ-3437.005.patch, TEZ-3437.006.patch
>
>
> Follow up from TEZ-3317 to improve the getProgress thread synchronization and 
> replace timerTasks with ScheduledExecutorService. 
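
For readers unfamiliar with the replacement mentioned in the description, the following is a small sketch of periodic progress reporting with a ScheduledExecutorService instead of a Timer/TimerTask. It is only an illustration: the class, the method names, and the "ProgressHelper-" thread-name prefix are hypothetical, and a plain ThreadFactory lambda is used so the example has no dependencies outside the JDK.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ProgressSchedulerSketch {
  // Builds named daemon threads so the reporter never blocks JVM shutdown.
  static ThreadFactory daemonFactory(String name) {
    return runnable -> {
      Thread t = new Thread(runnable, name);
      t.setDaemon(true);
      return t;
    };
  }

  // Schedules the report task to run repeatedly, starting immediately.
  static ScheduledExecutorService startReporter(
      String invoker, Runnable report, long periodMillis) {
    ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(
            daemonFactory("ProgressHelper-" + invoker));
    scheduler.scheduleWithFixedDelay(report, 0, periodMillis, TimeUnit.MILLISECONDS);
    return scheduler;
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicInteger ticks = new AtomicInteger();
    ScheduledExecutorService scheduler =
        startReporter("MapProcessor", ticks::incrementAndGet, 50);
    Thread.sleep(200);
    scheduler.shutdownNow();
    System.out.println("progress reports sent: " + ticks.get());
  }
}
```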





[jira] [Commented] (TEZ-3437) Improve synchronization and the progress report behavior for Inputs from TEZ-3317

2016-09-29 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15534341#comment-15534341
 ] 

Siddharth Seth commented on TEZ-3437:
-

- On the interface, can we include InterruptedException as well?
- I think this bit was missed in the consolidation: MapProcessor is creating 
the same Runnable that is created in ProgressHelper.
- A bunch of getters/setters can be removed from ProgressHelper after this, and 
some final fields? Or was this MapProcessor change done for some reason?
- Nit: Name the thread in ProgressHelper based on the invoker, and set it as 
daemon (ThreadFactoryBuilder).
- {code}
+if (e.getCause() instanceof IOException) {
+  throw new IOException(e);
+}
{code}
Cast the cause to IOException instead of creating a new IOException. Likewise 
for InterruptedException.
In the same place, instead of returning 0.0f as progress, this should throw a 
RuntimeException.
- TezTaskContextImpl.getRuntimeTask - why is this required? (I don't think it's 
used anywhere.)
- Nit: private AtomicLong totalBytesRead = new AtomicLong(0); - should be 
final. Also other similar fields.
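
The exception-handling suggestion in the list above can be sketched roughly as follows. This is an illustration, not the patch: ProgressUnwrapSketch and awaitProgress are hypothetical names, and the sketch assumes the progress value comes from a Future whose failure surfaces as an ExecutionException.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ProgressUnwrapSketch {
  // Rethrows the original cause instead of wrapping it in a new IOException.
  static float awaitProgress(Future<Float> future)
      throws IOException, InterruptedException {
    try {
      return future.get();
    } catch (ExecutionException e) {
      Throwable cause = e.getCause();
      if (cause instanceof IOException) {
        throw (IOException) cause;
      }
      if (cause instanceof InterruptedException) {
        throw (InterruptedException) cause;
      }
      // Anything else is unexpected: fail loudly rather than report 0.0f.
      throw new RuntimeException(cause);
    }
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Callable<Float> failing = () -> { throw new IOException("disk error"); };
      try {
        awaitProgress(pool.submit(failing));
      } catch (IOException e) {
        System.out.println("caught original cause: " + e.getMessage());
      }
    } finally {
      pool.shutdown();
    }
  }
}
```

Casting keeps the original stack trace and message at the top level, whereas new IOException(e) would bury them under the ExecutionException wrapper.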


> Improve synchronization and the progress report behavior for Inputs from 
> TEZ-3317
> -
>
> Key: TEZ-3437
> URL: https://issues.apache.org/jira/browse/TEZ-3437
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: TEZ-3437.001.patch, TEZ-3437.002.patch, 
> TEZ-3437.003.patch, TEZ-3437.004.patch, TEZ-3437.005.patch, TEZ-3437.006.patch
>
>
> Follow up from TEZ-3317 to improve the getProgress thread synchronization and 
> replace timerTasks with ScheduledExecutorService. 





[jira] [Commented] (TEZ-3437) Improve synchronization and the progress report behavior for Inputs from TEZ-3317

2016-09-29 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15534366#comment-15534366
 ] 

Siddharth Seth commented on TEZ-3437:
-

On the setProgress bit: setProgress() is mainly used as a keep-alive. The 
change made earlier, which invokes setProgress(), makes sense. Apologies for 
the confusion caused by previous comments. The float check there, however, 
likely needs to be changed. Exact equality checks on floats are unreliable.
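
As a small illustration of the float concern (not code from the patch), the sketch below compares progress values with a tolerance rather than with ==. The class name, the method name, and the tolerance value are assumptions chosen for the example.

```java
public class ProgressCompareSketch {
  // Hypothetical tolerance; the right granularity depends on how finely
  // progress needs to be reported.
  static final float EPSILON = 0.001f;

  // True when the two progress values differ by more than the tolerance.
  static boolean progressChanged(float previous, float current) {
    return Math.abs(current - previous) > EPSILON;
  }

  public static void main(String[] args) {
    float sum = 0.0f;
    for (int i = 0; i < 10; i++) {
      sum += 0.1f; // each addition may introduce a small rounding error
    }
    System.out.println("sum == 1.0f: " + (sum == 1.0f));
    System.out.println("progressChanged(1.0f, sum): " + progressChanged(1.0f, sum));
  }
}
```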

> Improve synchronization and the progress report behavior for Inputs from 
> TEZ-3317
> -
>
> Key: TEZ-3437
> URL: https://issues.apache.org/jira/browse/TEZ-3437
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: TEZ-3437.001.patch, TEZ-3437.002.patch, 
> TEZ-3437.003.patch, TEZ-3437.004.patch, TEZ-3437.005.patch, TEZ-3437.006.patch
>
>
> Follow up from TEZ-3317 to improve the getProgress thread synchronization and 
> replace timerTasks with ScheduledExecutorService. 





[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes

2016-09-29 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15534371#comment-15534371
 ] 

Rajesh Balamohan commented on TEZ-2741:
---

Key/value references cannot be updated higher up in the chain. 
An alternative is to use {{--hiveconf hive.compute.splits.in.am=false}} to 
work around this issue.

> Hive on Tez does not work well with Sequence Files Schema changes
> -
>
> Key: TEZ-2741
> URL: https://issues.apache.org/jira/browse/TEZ-2741
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajat Jain
>Assignee: Gopal V
> Attachments: TEZ-2741.1.patch, garbled_text
>
>
> {code}
> hive> create external table foo (a string) partitioned by (p string) stored 
> as sequencefile location 'hdfs:///user/hive/foo'
> # A useless file with some text in hdfs
> hive> create external table tmp_foo (a string) location 
> 'hdfs:///tmp/random_data'
> hive> insert overwrite table foo partition (p = '1') select * from tmp_foo
> {code}
> After this step, {{foo}} contains one partition with a text file.
> Now use this Java program to generate the second sequence file (but with a 
> different key class)
> {code}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.BytesWritable;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.Mapper;
> import org.apache.hadoop.mapreduce.Reducer;
> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
> import java.io.IOException;
> public class SequenceFileWriter {
>   public static void main(String[] args) throws IOException,
>       InterruptedException, ClassNotFoundException {
>     Configuration conf = new Configuration();
>     Job job = new Job(conf);
>     job.setJobName("Convert Text");
>     job.setJarByClass(Mapper.class);
>     job.setMapperClass(Mapper.class);
>     job.setReducerClass(Reducer.class);
>     // increase if you need sorting or a special number of files
>     job.setNumReduceTasks(0);
>     job.setOutputKeyClass(LongWritable.class);
>     job.setOutputValueClass(Text.class);
>     job.setOutputFormatClass(SequenceFileOutputFormat.class);
>     job.setInputFormatClass(TextInputFormat.class);
>     TextInputFormat.addInputPath(job, new Path("/tmp/random_data"));
>     SequenceFileOutputFormat.setOutputPath(job,
>         new Path("/user/hive/foo/p=2/"));
>     // submit and wait for completion
>     job.waitForCompletion(true);
>   }
> }
> {code}
> Now run {{select count(*) from foo;}}. It passes with MapReduce, but fails 
> with Tez with the following error:
> {code}
> hive> set hive.execution.engine=tez;
> hive> select count(*) from foo;
> Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1438013895843_0007_1_00, 
> diagnostics=[Task failed, taskId=task_1438013895843_0007_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: While processing file 
> hdfs://localhost:9000/user/hive/foo/p=2/part-m-0. wrong key class: 
> org.apache.hadoop.io.BytesWritable is not class 
> org.apache.hadoop.io.LongWritable
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1635)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 

[jira] [Commented] (TEZ-3450) Avoid reading Max Failed Attempts from configuration for each task

2016-09-29 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15534372#comment-15534372
 ] 

Siddharth Seth commented on TEZ-3450:
-

Good catch. Thought we had already cleaned up all such config lookups. Think 
the patch will need some test changes to compile.

> Avoid reading Max Failed Attempts from configuration for each task
> --
>
> Key: TEZ-3450
> URL: https://issues.apache.org/jira/browse/TEZ-3450
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: darion yaphet
>Assignee: darion yaphet
> Attachments: TEZ-3450.1.patch
>
>






[jira] [Updated] (TEZ-3437) Improve synchronization and the progress report behavior for Inputs from TEZ-3317

2016-09-29 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3437:
-
Summary: Improve synchronization and the progress report behavior for 
Inputs from TEZ-3317  (was: hhjerekretudubjdkbktjvgjrd)

> Improve synchronization and the progress report behavior for Inputs from 
> TEZ-3317
> -
>
> Key: TEZ-3437
> URL: https://issues.apache.org/jira/browse/TEZ-3437
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: TEZ-3437.001.patch, TEZ-3437.002.patch, 
> TEZ-3437.003.patch, TEZ-3437.004.patch, TEZ-3437.005.patch, TEZ-3437.006.patch
>
>
> Follow up from TEZ-3317 to improve the getProgress thread synchronization and 
> replace timerTasks with ScheduledExecutorService. 





[jira] [Updated] (TEZ-3453) Correct the downloaded ATS dag data location for analyzer

2016-09-29 Thread Dharmesh Kakadia (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dharmesh Kakadia updated TEZ-3453:
--
Attachment: TEZ-3453.1.patch

Looks like the patch file name needs to be in a specific format.

> Correct the downloaded ATS dag data location for analyzer
> -
>
> Key: TEZ-3453
> URL: https://issues.apache.org/jira/browse/TEZ-3453
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Dharmesh Kakadia
>Priority: Minor
> Attachments: 16.patch, TEZ-3453.1.patch
>
>
> hadoop jar /usr/hdp/current/tez-client/tez-job-analyzer-*.jar CriticalPath 
> --dagId=dag_1475171170456_0002_1 --outputDir=tmp/
> fails with
> INFO history.ATSImportTool: Using 
> baseURL=http://headnodehost:8188/ws/v1/timeline, 
> dagId=dag_1475171170456_0002_1, batchSize=100, downloadDir=tmp
> java.lang.IllegalArgumentException: Zipfile 
> tmp/dag_1475171170456_0002_1/dag_1475171170456_0002_1.zip does not exist
> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> at org.apache.tez.history.parser.ATSFileParser.(ATSFileParser.java:65)
> at 
> org.apache.tez.analyzer.plugins.TezAnalyzerBase.run(TezAnalyzerBase.java:169)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at 
> org.apache.tez.analyzer.plugins.CriticalPathAnalyzer.main(CriticalPathAnalyzer.java:653)
> The code is incorrectly expecting it to be in 
> subfolder(dag_1475171170456_0002_1/dag_1475171170456_0002_1.zip). Moving the 
> downloaded data location to the subfolder fixes the issue. This PR corrects 
> the expected path.





[jira] [Updated] (TEZ-3437) Improve synchronization and the progress report behavior for Inputs from TEZ-3317

2016-09-29 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3437:
-
Attachment: TEZ-3437.007.patch

Thank you so much [~sseth] for the quick and detailed comments. 

I have added the setProgress check with a 0.001f epsilon. Is that a good 
granularity?

Also, for adding InterruptedException to the interface, I added it to the 
getProgress call in AbstractLogicalInput/MergedLogicalInput. Is that what you 
meant? 
Also, throwing a RuntimeException would bubble up and may cause failures if the 
progress update fails. Is that OK to have?

I would appreciate more comments. Thanks!


> Improve synchronization and the progress report behavior for Inputs from 
> TEZ-3317
> -
>
> Key: TEZ-3437
> URL: https://issues.apache.org/jira/browse/TEZ-3437
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: TEZ-3437.001.patch, TEZ-3437.002.patch, 
> TEZ-3437.003.patch, TEZ-3437.004.patch, TEZ-3437.005.patch, 
> TEZ-3437.006.patch, TEZ-3437.007.patch
>
>
> Follow up from TEZ-3317 to improve the getProgress thread synchronization and 
> replace timerTasks with ScheduledExecutorService. 





Failed: TEZ-3453 PreCommit Build #1996

2016-09-29 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3453
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1996/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 4812 lines...]
[INFO] Total time: 56:46 min
[INFO] Finished at: 2016-09-30T00:22:20+00:00
[INFO] Final Memory: 83M/1363M
[INFO] 




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12831002/16.patch
  against master revision 5c2f893.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1996//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1996//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
df303e1cbfbabef9134faa112ce9e885507e9918 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Compressed 3.37 MB of artifacts by 26.0% relative to #1993
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-3453) Correct the downloaded ATS dag data location for analyzer

2016-09-29 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15534554#comment-15534554
 ] 

TezQA commented on TEZ-3453:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12831002/16.patch
  against master revision 5c2f893.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1996//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1996//console

This message is automatically generated.

> Correct the downloaded ATS dag data location for analyzer
> -
>
> Key: TEZ-3453
> URL: https://issues.apache.org/jira/browse/TEZ-3453
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Dharmesh Kakadia
>Priority: Minor
> Attachments: 16.patch, TEZ-3453.1.patch
>
>
> hadoop jar /usr/hdp/current/tez-client/tez-job-analyzer-*.jar CriticalPath 
> --dagId=dag_1475171170456_0002_1 --outputDir=tmp/
> fails with
> INFO history.ATSImportTool: Using 
> baseURL=http://headnodehost:8188/ws/v1/timeline, 
> dagId=dag_1475171170456_0002_1, batchSize=100, downloadDir=tmp
> java.lang.IllegalArgumentException: Zipfile 
> tmp/dag_1475171170456_0002_1/dag_1475171170456_0002_1.zip does not exist
> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> at org.apache.tez.history.parser.ATSFileParser.(ATSFileParser.java:65)
> at 
> org.apache.tez.analyzer.plugins.TezAnalyzerBase.run(TezAnalyzerBase.java:169)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at 
> org.apache.tez.analyzer.plugins.CriticalPathAnalyzer.main(CriticalPathAnalyzer.java:653)
> The code is incorrectly expecting it to be in 
> subfolder(dag_1475171170456_0002_1/dag_1475171170456_0002_1.zip). Moving the 
> downloaded data location to the subfolder fixes the issue. This PR corrects 
> the expected path.





[jira] [Commented] (TEZ-3453) Hive + Tez not able to push-down common sub expression

2016-09-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15534110#comment-15534110
 ] 

ASF GitHub Bot commented on TEZ-3453:
-

Github user dharmeshkakadia commented on the issue:

https://github.com/apache/tez/pull/16
  
Here : https://issues.apache.org/jira/browse/TEZ-3453


> Hive + Tez not able to push-down common sub expression 
> ---
>
> Key: TEZ-3453
> URL: https://issues.apache.org/jira/browse/TEZ-3453
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Dharmesh Kakadia
>Priority: Minor
>
> hadoop jar /usr/hdp/current/tez-client/tez-job-analyzer-*.jar CriticalPath 
> --dagId=dag_1475171170456_0002_1 --outputDir=tmp/
> fails with
> INFO history.ATSImportTool: Using 
> baseURL=http://headnodehost:8188/ws/v1/timeline, 
> dagId=dag_1475171170456_0002_1, batchSize=100, downloadDir=tmp
> java.lang.IllegalArgumentException: Zipfile 
> tmp/dag_1475171170456_0002_1/dag_1475171170456_0002_1.zip does not exist
> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> at org.apache.tez.history.parser.ATSFileParser.(ATSFileParser.java:65)
> at 
> org.apache.tez.analyzer.plugins.TezAnalyzerBase.run(TezAnalyzerBase.java:169)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at 
> org.apache.tez.analyzer.plugins.CriticalPathAnalyzer.main(CriticalPathAnalyzer.java:653)
> The code is incorrectly expecting it to be in 
> subfolder(dag_1475171170456_0002_1/dag_1475171170456_0002_1.zip). Moving the 
> downloaded data location to the subfolder fixes the issue. This PR corrects 
> the expected path.





[jira] [Created] (TEZ-3453) Hive + Tez not able to push-down common sub expression

2016-09-29 Thread Dharmesh Kakadia (JIRA)
Dharmesh Kakadia created TEZ-3453:
-

 Summary: Hive + Tez not able to push-down common sub expression 
 Key: TEZ-3453
 URL: https://issues.apache.org/jira/browse/TEZ-3453
 Project: Apache Tez
  Issue Type: Bug
Reporter: Dharmesh Kakadia
Priority: Minor


hadoop jar /usr/hdp/current/tez-client/tez-job-analyzer-*.jar CriticalPath 
--dagId=dag_1475171170456_0002_1 --outputDir=tmp/
fails with

INFO history.ATSImportTool: Using 
baseURL=http://headnodehost:8188/ws/v1/timeline, 
dagId=dag_1475171170456_0002_1, batchSize=100, downloadDir=tmp
java.lang.IllegalArgumentException: Zipfile 
tmp/dag_1475171170456_0002_1/dag_1475171170456_0002_1.zip does not exist
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
at org.apache.tez.history.parser.ATSFileParser.(ATSFileParser.java:65)
at org.apache.tez.analyzer.plugins.TezAnalyzerBase.run(TezAnalyzerBase.java:169)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at 
org.apache.tez.analyzer.plugins.CriticalPathAnalyzer.main(CriticalPathAnalyzer.java:653)

The code incorrectly expects the zip file to be in a 
subfolder (dag_1475171170456_0002_1/dag_1475171170456_0002_1.zip). Moving the 
downloaded data to that subfolder works around the issue. This PR corrects the 
expected path.
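
The two candidate locations can be spelled out with a short sketch. This is illustrative only: the class and method names are hypothetical, and the "flat" location (the zip directly under the download directory) is an assumption inferred from the description, not something the log output states.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class DagZipPathSketch {
  // Location the exception message shows the parser looking in:
  // <downloadDir>/<dagId>/<dagId>.zip
  static Path nestedLocation(String downloadDir, String dagId) {
    return Paths.get(downloadDir, dagId, dagId + ".zip");
  }

  // Assumed location of the downloaded file: <downloadDir>/<dagId>.zip
  static Path flatLocation(String downloadDir, String dagId) {
    return Paths.get(downloadDir, dagId + ".zip");
  }

  public static void main(String[] args) {
    String dagId = "dag_1475171170456_0002_1";
    System.out.println("parser expects: " + nestedLocation("tmp", dagId));
    System.out.println("assumed actual: " + flatLocation("tmp", dagId));
  }
}
```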





[jira] [Created] (TEZ-3452) Auto-reduce parallelism calculation can overflow with large inputs

2016-09-29 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created TEZ-3452:


 Summary: Auto-reduce parallelism calculation can overflow with 
large inputs
 Key: TEZ-3452
 URL: https://issues.apache.org/jira/browse/TEZ-3452
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles








[jira] [Commented] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler

2016-09-29 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15534755#comment-15534755
 ] 

Kuhu Shukla commented on TEZ-3362:
--

bq.  I am not sure how this works in the hive-llap mode.
This implementation would not work in the LLAP world AFAIK. The generic service 
plugin design that will follow shortly after this will move the AMNodeTracker 
changes and do scheduler service lookups and node connections for deleting 
paths instead of handling specific cases (as it does today), similar to 
QueryTracker and similar components in Hive. I need to understand the hybrid 
case and how LLAP daemons work, and will look into that starting tomorrow. 
More comments, ideas, or pointers on how to model the design for the deletion 
service plugin are welcome. Thanks a lot!

> Delete intermediate data at DAG level for Shuffle Handler
> -
>
> Key: TEZ-3362
> URL: https://issues.apache.org/jira/browse/TEZ-3362
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
> Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch, 
> TEZ-3362.003.patch, TEZ-3362.004.patch, TEZ-3362.005.patch
>
>
> Applications like hive that use tez in session mode need the ability to 
> delete intermediate data after a DAG completes and while the application 
> continues to run.


