[jira] [Created] (TEZ-2732) DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers

2015-08-20 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created TEZ-2732:
-

 Summary: DefaultSorter throws ArrayIndex exceptions on 2047 Mb 
size sort buffers
 Key: TEZ-2732
 URL: https://issues.apache.org/jira/browse/TEZ-2732
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan


{noformat}
  kvbuffer.length = 2146435072 (2047 MB)
  Corner case: bufIndex=2026133899, kvbidx=523629312.
  distkvi = mod - i + j = 2146435072 - 2026133899 + 523629312 = 643930485
  newPos = (2026133899 + (max(.., min(643930485/2, 271128624)))) (this would overflow)
{noformat}

It would be good to restrict the maximum allowed sort buffer to 1800 MB instead of 2047 MB.
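
The arithmetic above can be checked with a small standalone sketch. It is not DefaultSorter code: the class and method names are made up for illustration, and because the first argument of max(..) is elided in the report, the sketch uses only the min(...) term, which is a lower bound on the offset being added.

```java
/** Hypothetical demo class; only the numbers come from the report. */
final class SortBufferOverflowDemo {
    static final int BUF_INDEX = 2026133899; // bufIndex from the corner case
    static final int DIST_KVI = 643930485;   // distkvi from the corner case
    static final int CAP = 271128624;        // second argument of min(...)

    /** The addition in 32-bit int arithmetic; it wraps around silently. */
    static int newPosInt() {
        return BUF_INDEX + Math.min(DIST_KVI / 2, CAP);
    }

    /** The same addition widened to long, showing the true sum. */
    static long newPosLong() {
        return (long) BUF_INDEX + Math.min(DIST_KVI / 2, CAP);
    }

    public static void main(String[] args) {
        System.out.println("int result:  " + newPosInt());   // negative after wrap-around
        System.out.println("long result: " + newPosLong());  // 2297262523
        System.out.println("int max:     " + Integer.MAX_VALUE);
    }
}
```

The true sum, 2297262523, exceeds Integer.MAX_VALUE (2147483647), so the int result wraps to a negative value and is unusable as an array index.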



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-2731) Fix Tez GenericCounter performance bottleneck

2015-08-20 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-2731:
-
Issue Type: Sub-task  (was: Improvement)
Parent: TEZ-2605

 Fix Tez GenericCounter performance bottleneck
 -

 Key: TEZ-2731
 URL: https://issues.apache.org/jira/browse/TEZ-2731
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Gopal V
 Attachments: lock-inc.png, mr-reader-next.png


 GenericCounter::increment(1) shows up as a ~16% performance penalty inside 
 the unvectorized codepath of Hive queries.
 The vectorized codepath amortizes this entirely by running through that 
 exactly once every 1024 rows; the performance improvement is dramatic.
 !lock-inc.png!
 !mr-reader-next.png!
 Optimize the GenericCounter impl for mostly uncontended atomic operations.
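
One common way to optimize a counter for mostly uncontended updates is to replace a synchronized increment with java.util.concurrent.atomic.AtomicLong. The sketch below is not the attached patch and not the real GenericCounter; SynchronizedCounter, AtomicCounter and CounterDemo are hypothetical names used only to contrast the two approaches.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical lock-based counter: every increment takes the monitor. */
final class SynchronizedCounter {
    private long value;
    synchronized void increment(long delta) { value += delta; }
    synchronized long getValue() { return value; }
}

/** Hypothetical lock-free counter: increments are atomic hardware operations. */
final class AtomicCounter {
    private final AtomicLong value = new AtomicLong();
    void increment(long delta) { value.addAndGet(delta); }
    long getValue() { return value.get(); }
}

final class CounterDemo {
    /** Increments one AtomicCounter from several threads and returns the total. */
    static long countWithThreads(int threads, int perThread) {
        AtomicCounter counter = new AtomicCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    counter.increment(1);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.getValue();
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 100_000));
    }
}
```

Both counters give the same totals; the difference is only in how each increment is protected, which is what matters on a per-row hot path.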





[jira] [Commented] (TEZ-2541) DAGClientImpl enable TimelineClient check is wrong.

2015-08-20 Thread Prakash Ramachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14704373#comment-14704373
 ] 

Prakash Ramachandran commented on TEZ-2541:
---

[~hitesh] This has been handled as part of the TEZ-1529 port to branch 0.5, so 
no action is required here for 0.5.

 DAGClientImpl enable TimelineClient check is wrong.
 ---

 Key: TEZ-2541
 URL: https://issues.apache.org/jira/browse/TEZ-2541
 Project: Apache Tez
  Issue Type: Bug
Reporter: Prakash Ramachandran
Assignee: Prakash Ramachandran
 Fix For: 0.6.2, 0.8.0, 0.7.1

 Attachments: TEZ-2541.1.patch, TEZ-2541.2.patch








Failed: TEZ-2690 PreCommit Build #1007

2015-08-20 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2690
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1007/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 3297 lines...]


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12751363/TEZ-2690.2.patch
  against master revision 24ca1de.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 16 new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1007//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1007//artifact/patchprocess/newPatchFindbugsWarningsjob-analyzer.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1007//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
b7e441a06c04ec9fd740272940d90f7d5cf0 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #1000
Archived 50 artifacts
Archive block size is 32768
Received 2 blocks and 3065049 bytes
Compression is 2.1%
Took 1 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2690) Add critical path analyser

2015-08-20 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14704372#comment-14704372
 ] 

TezQA commented on TEZ-2690:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12751363/TEZ-2690.2.patch
  against master revision 24ca1de.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 16 new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1007//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1007//artifact/patchprocess/newPatchFindbugsWarningsjob-analyzer.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1007//console

This message is automatically generated.

 Add critical path analyser
 --

 Key: TEZ-2690
 URL: https://issues.apache.org/jira/browse/TEZ-2690
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: TEZ-2690.1.patch, TEZ-2690.2.patch, criticalPath.jpg, 
 dag_1439860407967_0030_1.svg


 Use input and scheduling dependencies to create critical path for a DAG.





[jira] [Commented] (TEZ-2731) Fix Tez GenericCounter performance bottleneck

2015-08-20 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706236#comment-14706236
 ] 

Rajesh Balamohan commented on TEZ-2731:
---

java version: 1.7.0_67

java -XX:+AggressiveOpts -XX:+UnlockDiagnosticVMOptions 
-XX:+UnlockExperimentalVMOptions -XX:+PrintFlagsFinal | grep Bias
  bool UseBiasedLocking  = true{product}

It is enabled by default in recent JVMs.

 Fix Tez GenericCounter performance bottleneck
 -

 Key: TEZ-2731
 URL: https://issues.apache.org/jira/browse/TEZ-2731
 Project: Apache Tez
  Issue Type: Sub-task
Affects Versions: 0.6.0, 0.7.0, 0.8.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: TEZ-2731.1.patch, atomic-long-cntr.png, lock-inc.png, 
 mr-reader-next.png


 GenericCounter::increment(1) shows up as a ~16% performance penalty inside 
 the unvectorized codepath of Hive queries.
 The vectorized codepath amortizes this entirely by running through that 
 exactly once every 1024 rows; the performance improvement is dramatic.
 !lock-inc.png!
 !mr-reader-next.png!
 Optimize the GenericCounter impl for mostly uncontended atomic operations.





[jira] [Commented] (TEZ-2732) DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers

2015-08-20 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706254#comment-14706254
 ] 

Rajesh Balamohan commented on TEZ-2732:
---

Attaching the patch for review. [~hitesh], [~sseth] - please review when you 
find time.
- Capped the buffer at 1800 MB.
- Added tests that reproduce this issue with 
DefaultSorter.MAX_IO_SORT_MB=2047 and io.sort.mb set to 2047.
- Disabled these tests by default, as they would need 2 GB containers in the 
test env.

 DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers
 ---

 Key: TEZ-2732
 URL: https://issues.apache.org/jira/browse/TEZ-2732
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: TEZ-2732.1.patch


 {noformat}
   kvbuffer.length = 2146435072 (2047 MB)
   Corner case: bufIndex=2026133899, kvbidx=523629312.
   distkvi = mod - i + j = 2146435072 - 2026133899 + 523629312 = 643930485
   newPos = (2026133899 + (max(.., min(643930485/2, 271128624)))) (this would overflow)
 {noformat}
 It would be good to restrict the maximum allowed sort buffer to 1800 MB instead of 
 2047 MB.





[jira] [Commented] (TEZ-2687) ATS History shutdown happens before the min-held containers are released

2015-08-20 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706230#comment-14706230
 ] 

Hitesh Shah commented on TEZ-2687:
--

bq. Is there any difference between sleeping before and after the ATS events are 
flushed to ATS? Are you concerned about the DAGClient?

Consider a test which wants to verify that containers are released but also at 
the same time verify that the data is being pushed into timeline. No real 
functional concern apart from the fact that it would be better to finish all 
the real work as soon as possible and then wait instead of waiting first. 

 ATS History shutdown happens before the min-held containers are released
 

 Key: TEZ-2687
 URL: https://issues.apache.org/jira/browse/TEZ-2687
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.2, 0.8.0, 0.7.1
Reporter: Gopal V
Assignee: Jeff Zhang
 Attachments: TEZ-2687-1.patch, TEZ-2687-2.patch, TEZ-2687-3.patch, 
 TEZ-2687-4.patch, TEZ-2687-6.patch, TEZ-2687-7.patch


 When ATS goes into a GC pause under heavy loads and while it recovers, each 
 Tez AM holds onto a few containers even though it is shutting down and will 
 never accept any more DAGs.





[jira] [Comment Edited] (TEZ-2731) Fix Tez GenericCounter performance bottleneck

2015-08-20 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706236#comment-14706236
 ] 

Rajesh Balamohan edited comment on TEZ-2731 at 8/21/15 5:04 AM:


java version: 1.7.0_67

{noformat}
java -XX:+AggressiveOpts -XX:+UnlockDiagnosticVMOptions 
-XX:+UnlockExperimentalVMOptions -XX:+PrintFlagsFinal | grep Bias
  bool UseBiasedLocking  = true{product}
{noformat}
It is enabled by default in recent JVMs.


was (Author: rajesh.balamohan):
java version: 1.7.0_67

java -XX:+AggressiveOpts -XX:+UnlockDiagnosticVMOptions 
-XX:+UnlockExperimentalVMOptions -XX:+PrintFlagsFinal | grep Bias
  bool UseBiasedLocking  = true{product}

It is enabled as default in latest JVMs.

 Fix Tez GenericCounter performance bottleneck
 -

 Key: TEZ-2731
 URL: https://issues.apache.org/jira/browse/TEZ-2731
 Project: Apache Tez
  Issue Type: Sub-task
Affects Versions: 0.6.0, 0.7.0, 0.8.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: TEZ-2731.1.patch, atomic-long-cntr.png, lock-inc.png, 
 mr-reader-next.png


 GenericCounter::increment(1) shows up as a ~16% performance penalty inside 
 the unvectorized codepath of Hive queries.
 The vectorized codepath amortizes this entirely by running through that 
 exactly once every 1024 rows; the performance improvement is dramatic.
 !lock-inc.png!
 !mr-reader-next.png!
 Optimize the GenericCounter impl for mostly uncontended atomic operations.





[jira] [Commented] (TEZ-2732) DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers

2015-08-20 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706267#comment-14706267
 ] 

Hitesh Shah commented on TEZ-2732:
--

+1 pending pre-commit 

 DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers
 ---

 Key: TEZ-2732
 URL: https://issues.apache.org/jira/browse/TEZ-2732
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: TEZ-2732.1.patch


 {noformat}
   kvbuffer.length = 2146435072 (2047 MB)
   Corner case: bufIndex=2026133899, kvbidx=523629312.
   distkvi = mod - i + j = 2146435072 - 2026133899 + 523629312 = 643930485
   newPos = (2026133899 + (max(.., min(643930485/2, 271128624)))) (this would overflow)
 {noformat}
 It would be good to restrict the maximum allowed sort buffer to 1800 MB instead of 
 2047 MB.





[jira] [Commented] (TEZ-2731) Fix Tez GenericCounter performance bottleneck

2015-08-20 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706228#comment-14706228
 ] 

Rajesh Balamohan commented on TEZ-2731:
---

lgtm. +1.
Had this been uncontended synchronization, UseBiasedLocking (I believe it is 
enabled by default in JDK 7) would have kicked in and we would not have seen 
this level of perf drop with sync. It appears that the lock is relatively 
contended even with few threads, in which case AtomicLong helps a lot in 
improving perf.

 Fix Tez GenericCounter performance bottleneck
 -

 Key: TEZ-2731
 URL: https://issues.apache.org/jira/browse/TEZ-2731
 Project: Apache Tez
  Issue Type: Sub-task
Affects Versions: 0.6.0, 0.7.0, 0.8.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: TEZ-2731.1.patch, atomic-long-cntr.png, lock-inc.png, 
 mr-reader-next.png


 GenericCounter::increment(1) shows up as a ~16% performance penalty inside 
 the unvectorized codepath of Hive queries.
 The vectorized codepath amortizes this entirely by running through that 
 exactly once every 1024 rows; the performance improvement is dramatic.
 !lock-inc.png!
 !mr-reader-next.png!
 Optimize the GenericCounter impl for mostly uncontended atomic operations.





[jira] [Commented] (TEZ-2731) Fix Tez GenericCounter performance bottleneck

2015-08-20 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706229#comment-14706229
 ] 

Gopal V commented on TEZ-2731:
--

[~rajesh.balamohan]: I thought UseBiasedLocking had some issues with 
IdentityHashMap (so is not on by default)?

 Fix Tez GenericCounter performance bottleneck
 -

 Key: TEZ-2731
 URL: https://issues.apache.org/jira/browse/TEZ-2731
 Project: Apache Tez
  Issue Type: Sub-task
Affects Versions: 0.6.0, 0.7.0, 0.8.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: TEZ-2731.1.patch, atomic-long-cntr.png, lock-inc.png, 
 mr-reader-next.png


 GenericCounter::increment(1) shows up as a ~16% performance penalty inside 
 the unvectorized codepath of Hive queries.
 The vectorized codepath amortizes this entirely by running through that 
 exactly once every 1024 rows; the performance improvement is dramatic.
 !lock-inc.png!
 !mr-reader-next.png!
 Optimize the GenericCounter impl for mostly uncontended atomic operations.





[jira] [Commented] (TEZ-2732) DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers

2015-08-20 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706253#comment-14706253
 ] 

Rajesh Balamohan commented on TEZ-2732:
---

One more place where a similar overflow can happen is in write() (bufindex + len 
can go negative). In such cases, it ends up throwing the following 
exception:
{noformat}
java.lang.ArrayIndexOutOfBoundsException
    at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter$Buffer.write(DefaultSorter.java:648)
    at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter$Buffer.write(DefaultSorter.java:544)
    at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
    at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:273)
    at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:253)
    at org.apache.hadoop.io.Text.write(Text.java:330)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
{noformat}
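
To illustrate how bufindex + len can go negative, here is a minimal sketch of a naive bounds check next to an overflow-safe one. It is not the code at DefaultSorter.java:648; WrapCheckDemo, fitsNaive and fitsSafe are hypothetical names, the buffer values are taken from the issue description, and the write length of 200000000 is an assumed value chosen to trigger the wrap.

```java
/** Hypothetical helper showing why an int sum is unsafe near 2 GB. */
final class WrapCheckDemo {
    /** Naive check: the int sum can wrap negative and wrongly report "fits". */
    static boolean fitsNaive(int bufIndex, int len, int capacity) {
        return bufIndex + len <= capacity;
    }

    /**
     * Overflow-safe check: for 0 <= bufIndex <= capacity and len >= 0,
     * capacity - bufIndex cannot overflow.
     */
    static boolean fitsSafe(int bufIndex, int len, int capacity) {
        return len <= capacity - bufIndex;
    }

    public static void main(String[] args) {
        int capacity = 2146435072;  // kvbuffer.length for a 2047 MB buffer
        int bufIndex = 2026133899;  // bufIndex from the reported corner case
        int len = 200000000;        // assumed write length, larger than the space left
        System.out.println("naive: " + fitsNaive(bufIndex, len, capacity));
        System.out.println("safe:  " + fitsSafe(bufIndex, len, capacity));
    }
}
```

With these inputs only 120301173 bytes remain before the end of the buffer, yet the naive check passes because the sum wraps to a negative number. Capping the buffer at 1800 MB, as proposed, leaves more headroom below Integer.MAX_VALUE; comparing against the remaining space is a complementary guard.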

 DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers
 ---

 Key: TEZ-2732
 URL: https://issues.apache.org/jira/browse/TEZ-2732
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: TEZ-2732.1.patch


 {noformat}
   kvbuffer.length = 2146435072 (2047 MB)
   Corner case: bufIndex=2026133899, kvbidx=523629312.
   distkvi = mod - i + j = 2146435072 - 2026133899 + 523629312 = 643930485
   newPos = (2026133899 + (max(.., min(643930485/2, 271128624)))) (this would overflow)
 {noformat}
 It would be good to restrict the maximum allowed sort buffer to 1800 MB instead of 
 2047 MB.





[jira] [Updated] (TEZ-2732) DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers

2015-08-20 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-2732:
--
Attachment: TEZ-2732.1.patch

 DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers
 ---

 Key: TEZ-2732
 URL: https://issues.apache.org/jira/browse/TEZ-2732
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: TEZ-2732.1.patch


 {noformat}
   kvbuffer.length = 2146435072 (2047 MB)
   Corner case: bufIndex=2026133899, kvbidx=523629312.
   distkvi = mod - i + j = 2146435072 - 2026133899 + 523629312 = 643930485
   newPos = (2026133899 + (max(.., min(643930485/2, 271128624)))) (this would overflow)
 {noformat}
 It would be good to restrict the maximum allowed sort buffer to 1800 MB instead of 
 2047 MB.





[jira] [Updated] (TEZ-2687) ATS History shutdown happens before the min-held containers are released

2015-08-20 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2687:

Attachment: TEZ-2687-4.patch

Minor update to address the comments. Will commit it soon. 

 ATS History shutdown happens before the min-held containers are released
 

 Key: TEZ-2687
 URL: https://issues.apache.org/jira/browse/TEZ-2687
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.2, 0.8.0, 0.7.1
Reporter: Gopal V
Assignee: Jeff Zhang
 Attachments: TEZ-2687-1.patch, TEZ-2687-2.patch, TEZ-2687-3.patch, 
 TEZ-2687-4.patch


 When ATS goes into a GC pause under heavy loads and while it recovers, each 
 Tez AM holds onto a few containers even though it is shutting down and will 
 never accept any more DAGs.





[jira] [Updated] (TEZ-2687) ATS History shutdown happens before the min-held containers are released

2015-08-20 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2687:

Attachment: TEZ-2687-6.patch

Added a new config, tez.test.history-service.stop.sleep.secs, for system tests 
to simulate the ATS hang behavior.

 ATS History shutdown happens before the min-held containers are released
 

 Key: TEZ-2687
 URL: https://issues.apache.org/jira/browse/TEZ-2687
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.2, 0.8.0, 0.7.1
Reporter: Gopal V
Assignee: Jeff Zhang
 Attachments: TEZ-2687-1.patch, TEZ-2687-2.patch, TEZ-2687-3.patch, 
 TEZ-2687-4.patch, TEZ-2687-6.patch


 When ATS goes into a GC pause under heavy loads and while it recovers, each 
 Tez AM holds onto a few containers even though it is shutting down and will 
 never accept any more DAGs.





[jira] [Commented] (TEZ-2687) ATS History shutdown happens before the min-held containers are released

2015-08-20 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706142#comment-14706142
 ] 

Hitesh Shah commented on TEZ-2687:
--

[~zjffdu] I think the test config sleep fix should be restricted to 
ATSHistoryLoggingService. Furthermore, the new config property does not need to 
be declared in TezConfiguration; it can live in ATSHistoryLoggingService 
alone. Lastly, the sleep should happen *after* all ATS events are flushed to 
ATS. The current sleep is done before the flush happens, which seems 
incorrect. 
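
The ordering being asked for (flush first, then sleep) could look roughly like the sketch below. It is a self-contained illustration, not ATSHistoryLoggingService code: FlushThenSleepSketch, its queue and its trace list are all hypothetical, and the sleep length is passed to the constructor rather than read from a config property.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical logger used only to illustrate shutdown ordering. */
final class FlushThenSleepSketch {
    private final Queue<String> pending = new ConcurrentLinkedQueue<>();
    private final List<String> trace = new ArrayList<>();
    private final long stopSleepMillis;

    FlushThenSleepSketch(long stopSleepMillis) {
        this.stopSleepMillis = stopSleepMillis;
    }

    /** Queues an event for later delivery. */
    void log(String event) {
        pending.add(event);
    }

    /** Drains all pending events first, and only then applies the test delay. */
    void stop() {
        String event;
        while ((event = pending.poll()) != null) {
            trace.add("flush:" + event);   // stands in for sending the event to ATS
        }
        trace.add("sleep");
        try {
            Thread.sleep(stopSleepMillis); // simulated hang, after the real work is done
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        trace.add("stopped");
    }

    /** Returns the recorded order of operations. */
    List<String> trace() {
        return trace;
    }

    public static void main(String[] args) {
        FlushThenSleepSketch logger = new FlushThenSleepSketch(10);
        logger.log("DAG_FINISHED");
        logger.stop();
        System.out.println(logger.trace());
    }
}
```

Recording the order in a trace list lets a test assert that every flush happened before the sleep.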

 ATS History shutdown happens before the min-held containers are released
 

 Key: TEZ-2687
 URL: https://issues.apache.org/jira/browse/TEZ-2687
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.2, 0.8.0, 0.7.1
Reporter: Gopal V
Assignee: Jeff Zhang
 Attachments: TEZ-2687-1.patch, TEZ-2687-2.patch, TEZ-2687-3.patch, 
 TEZ-2687-4.patch, TEZ-2687-6.patch


 When ATS goes into a GC pause under heavy loads and while it recovers, each 
 Tez AM holds onto a few containers even though it is shutting down and will 
 never accept any more DAGs.





[jira] [Commented] (TEZ-2687) ATS History shutdown happens before the min-held containers are released

2015-08-20 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705970#comment-14705970
 ] 

Bikas Saha commented on TEZ-2687:
-

Minor typo - +LOG.info("Realease held containers");
Minor: the array list could be given an initial size - +List<Object> tasks = new 
ArrayList<Object>();

Rest looks good. +1. Not sure this needs to go all the way to 0.5

 ATS History shutdown happens before the min-held containers are released
 

 Key: TEZ-2687
 URL: https://issues.apache.org/jira/browse/TEZ-2687
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.2, 0.8.0, 0.7.1
Reporter: Gopal V
Assignee: Jeff Zhang
 Attachments: TEZ-2687-1.patch, TEZ-2687-2.patch, TEZ-2687-3.patch


 When ATS goes into a GC pause under heavy loads and while it recovers, each 
 Tez AM holds onto a few containers even though it is shutting down and will 
 never accept any more DAGs.





[jira] [Updated] (TEZ-2003) [Umbrella] Allow Tez to co-ordinate execution to external services

2015-08-20 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-2003:

Attachment: 2003_20150820.1.txt

 [Umbrella] Allow Tez to co-ordinate execution to external services
 --

 Key: TEZ-2003
 URL: https://issues.apache.org/jira/browse/TEZ-2003
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
 Attachments: 2003_20150728.1.txt, 2003_20150807.1.txt, 
 2003_20150807.2.txt, 2003_20150812.1.txt, 2003_20150812.2.txt, 
 2003_20150814.1.txt, 2003_20150814.2.txt, 2003_20150820.1.txt, Tez With 
 External Services.pdf


 The Tez engine itself takes care of co-ordinating execution - controlling how 
 data gets routed (different connection patterns), fault tolerance, scheduling 
 of work, etc.
 This is currently tied to TaskSpecs defined within Tez and on containers 
 launched by Tez itself (TezChild).
 The proposal is to allow Tez to work with external services instead of just 
 containers launched by Tez. This involves several more pluggable layers to 
 work with alternate Task Specifications, custom launch and task allocation 
 mechanics, as well as custom scheduling sources.
 A simple example would be a process with the capability to execute 
 multiple Tez TaskSpecs as threads. In such a case, a container launch isn't 
 really needed and can be mocked. Sourcing / scheduling containers would need to 
 be pluggable.
 A more advanced example would be LLAP (HIVE-7926; 
 https://issues.apache.org/jira/secure/attachment/12665704/LLAPdesigndocument.pdf).
 This works with custom interfaces - which would need to be supported by Tez, 
 along with a custom event model which would need translation hooks.
 Tez should be able to work with a combination of certain vertices running in 
 external services and others running in regular Tez containers.





[jira] [Commented] (TEZ-2687) ATS History shutdown happens before the min-held containers are released

2015-08-20 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706150#comment-14706150
 ] 

Jeff Zhang commented on TEZ-2687:
-

bq. Instead of touching code that will be run in prod, I suggest writing a 
VerySlowHistoryLoggingService impl for the test-cases.
Makes sense, since HistoryLoggingService is pluggable. 

 ATS History shutdown happens before the min-held containers are released
 

 Key: TEZ-2687
 URL: https://issues.apache.org/jira/browse/TEZ-2687
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.2, 0.8.0, 0.7.1
Reporter: Gopal V
Assignee: Jeff Zhang
 Attachments: TEZ-2687-1.patch, TEZ-2687-2.patch, TEZ-2687-3.patch, 
 TEZ-2687-4.patch, TEZ-2687-6.patch


 When ATS goes into a GC pause under heavy loads and while it recovers, each 
 Tez AM holds onto a few containers even though it is shutting down and will 
 never accept any more DAGs.





[jira] [Updated] (TEZ-2687) ATS History shutdown happens before the min-held containers are released

2015-08-20 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2687:

Attachment: TEZ-2687-7.patch

 ATS History shutdown happens before the min-held containers are released
 

 Key: TEZ-2687
 URL: https://issues.apache.org/jira/browse/TEZ-2687
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.2, 0.8.0, 0.7.1
Reporter: Gopal V
Assignee: Jeff Zhang
 Attachments: TEZ-2687-1.patch, TEZ-2687-2.patch, TEZ-2687-3.patch, 
 TEZ-2687-4.patch, TEZ-2687-6.patch, TEZ-2687-7.patch


 When ATS goes into a GC pause under heavy loads and while it recovers, each 
 Tez AM holds onto a few containers even though it is shutting down and will 
 never accept any more DAGs.





[jira] [Commented] (TEZ-2164) Shade the guava version used by Tez

2015-08-20 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705422#comment-14705422
 ] 

Hitesh Shah commented on TEZ-2164:
--

Thanks for the review [~rajesh.balamohan]. The unintentional changes are due to 
the handling of core.autocrlf in git. It seems the general good practice is to 
have a newline at the end of files. Will file a follow-up jira for 
the import re-orders. 

[~sseth] Any suggestions on the build issue for guava-tez? Should there be a 
one-off build to publish/deploy the guava-tez shaded jar before committing this 
patch?  

 Shade the guava version used by Tez
 ---

 Key: TEZ-2164
 URL: https://issues.apache.org/jira/browse/TEZ-2164
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Hitesh Shah
Priority: Critical
 Attachments: TEZ-2164.3.patch, TEZ-2164.wip.2.patch, 
 allow-guava-16.0.1.patch


 Should allow us to upgrade to a newer version without shipping a guava 
 dependency.
 Would be good to do this in 0.7 so that we stop shipping guava as early as 
 possible.





[jira] [Updated] (TEZ-2629) LimitExceededException in Tez client when DAG has exceeds the default max

2015-08-20 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-2629:

Attachment: TEZ-2629_branch07_1.txt
TEZ-2629_branch06_1.txt
TEZ-2629_branch05_1.txt

Additional patches for different branches.

Thanks for the review. Committing.

 LimitExceededException in Tez client when DAG has exceeds the default max
 -

 Key: TEZ-2629
 URL: https://issues.apache.org/jira/browse/TEZ-2629
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Jason Dere
Assignee: Siddharth Seth
 Attachments: TEZ-2629.1.txt, TEZ-2629_branch05_1.txt, 
 TEZ-2629_branch06_1.txt, TEZ-2629_branch07_1.txt


 Original issue was HIVE-11303, seeing LimitExceededException when the client 
 tries to get the counters for a completed job:
 {noformat}
 2015-07-17 18:18:11,830 INFO  [main]: counters.Limits (Limits.java:ensureInitialized(59)) - Counter limits initialized with parameters:  GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=1200
 2015-07-17 18:18:11,841 ERROR [main]: exec.Task (TezTask.java:execute(189)) - Failed to execute tez graph.
 org.apache.tez.common.counters.LimitExceededException: Too many counters: 1201 max=1200
     at org.apache.tez.common.counters.Limits.checkCounters(Limits.java:87)
     at org.apache.tez.common.counters.Limits.incrCounters(Limits.java:94)
     at org.apache.tez.common.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:76)
     at org.apache.tez.common.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:93)
     at org.apache.tez.common.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:104)
     at org.apache.tez.dag.api.DagTypeConverters.convertTezCountersFromProto(DagTypeConverters.java:567)
     at org.apache.tez.dag.api.client.DAGStatus.getDAGCounters(DAGStatus.java:148)
     at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:175)
     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1673)
     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1432)
     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1213)
     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1064)
     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054)
     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
     at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
     at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
     at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
     at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:497)
     at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
     at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
 {noformat}
 It looks like Limits.ensureInitialized() is defaulting to an empty 
 configuration, resulting in COUNTERS_MAX being set to the default of 1200 
 (even though Hive's configuration specified tez.counters.max=16000).
 Per [~sseth]:
 {quote}
 I think the Tez client does need to make this call to set up the Configuration 
 correctly. We do this for the AM and the executing task - which is why it 
 works. Could you please open a Tez JIRA for this?
 Also, Limits is making use of Configuration instead of TezConfiguration for 
 default initialization, which implies changes to tez-site on the local node 
 won't be picked up.
 {quote}
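
 The failure mode described above can be sketched as follows. This is a minimal illustration, not the Tez Limits class: CounterLimitsSketch, setConfiguration, maxCounters and resetForDemo are hypothetical names, and a plain Map stands in for a Hadoop Configuration. Only the key tez.counters.max and the values 1200 and 16000 come from the report. The sketch assumes the limit is latched on first use, so a client that never hands over its configuration ends up with the default.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a process-wide counter limit; NOT the Tez Limits class.
final class CounterLimitsSketch {
    static final String MAX_KEY = "tez.counters.max"; // key quoted in the report
    static final int DEFAULT_MAX = 1200;               // default quoted in the report

    private static Integer maxCounters;                // null until first initialized

    // Explicit setup, analogous to what the AM and the tasks are said to do.
    static synchronized void setConfiguration(Map<String, String> conf) {
        if (maxCounters == null) {
            maxCounters = parse(conf);
        }
    }

    // Lazy path: with no prior setup, this falls back to an empty configuration.
    static synchronized int maxCounters() {
        if (maxCounters == null) {
            maxCounters = parse(new HashMap<>());
        }
        return maxCounters;
    }

    // Demo-only helper so both paths can be shown in one process.
    static synchronized void resetForDemo() {
        maxCounters = null;
    }

    private static int parse(Map<String, String> conf) {
        String value = conf.get(MAX_KEY);
        return value == null ? DEFAULT_MAX : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        Map<String, String> hiveConf = new HashMap<>();
        hiveConf.put(MAX_KEY, "16000");

        // Client path from the report: the limit is read before any setup call.
        System.out.println(maxCounters()); // 1200

        // Path with explicit setup: the configured value is honored.
        resetForDemo();
        setConfiguration(hiveConf);
        System.out.println(maxCounters()); // 16000
    }
}
```

 Under these assumptions, the fix amounts to making the client perform the explicit setup step before any counter is deserialized.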





[jira] [Updated] (TEZ-2629) LimitExceededException in Tez client when DAG has exceeds the default max counters

2015-08-20 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-2629:

Summary: LimitExceededException in Tez client when DAG has exceeds the 
default max counters  (was: LimitExceededException in Tez client when DAG has 
exceeds the default max)

 LimitExceededException in Tez client when DAG has exceeds the default max 
 counters
 --

 Key: TEZ-2629
 URL: https://issues.apache.org/jira/browse/TEZ-2629
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.0, 0.6.0, 0.7.0
Reporter: Jason Dere
Assignee: Siddharth Seth
 Attachments: TEZ-2629.1.txt, TEZ-2629_branch05_1.txt, 
 TEZ-2629_branch06_1.txt, TEZ-2629_branch07_1.txt







[jira] [Updated] (TEZ-2629) LimitExceededException in Tez client when DAG has exceeds the default max

2015-08-20 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-2629:

Affects Version/s: 0.6.0
   0.7.0

 LimitExceededException in Tez client when DAG has exceeds the default max
 -

 Key: TEZ-2629
 URL: https://issues.apache.org/jira/browse/TEZ-2629
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.0, 0.6.0, 0.7.0
Reporter: Jason Dere
Assignee: Siddharth Seth
 Attachments: TEZ-2629.1.txt, TEZ-2629_branch05_1.txt, 
 TEZ-2629_branch06_1.txt, TEZ-2629_branch07_1.txt







[jira] [Created] (TEZ-2733) Add information about container assignment to attempts

2015-08-20 Thread Bikas Saha (JIRA)
Bikas Saha created TEZ-2733:
---

 Summary: Add information about container assignment to attempts
 Key: TEZ-2733
 URL: https://issues.apache.org/jira/browse/TEZ-2733
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Bikas Saha








[jira] [Updated] (TEZ-2690) Add critical path analyser

2015-08-20 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-2690:

Attachment: TEZ-2690.3.patch

 Add critical path analyser
 --

 Key: TEZ-2690
 URL: https://issues.apache.org/jira/browse/TEZ-2690
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: TEZ-2690.1.patch, TEZ-2690.2.patch, TEZ-2690.3.patch, 
 criticalPath.jpg, dag_1439860407967_0030_1.svg


 Use input and scheduling dependencies to create critical path for a DAG.





[jira] [Updated] (TEZ-2628) History logging plugin to write ATS events to HDFS

2015-08-20 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated TEZ-2628:

Attachment: TEZ-2628.002.patch

Yes, there's a problem with retention on a secure cluster.  The timeline server 
does not have write permissions to the application directory which prevents it 
from removing the app directory tree when it's time to remove it from the done 
directory.  Updating the original patch with a quick fix that provides group 
write permissions on the app directory.

 History logging plugin to write ATS events to HDFS
 --

 Key: TEZ-2628
 URL: https://issues.apache.org/jira/browse/TEZ-2628
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: TEZ-2628.001.patch, TEZ-2628.002.patch, 
 hive-timeline.json


 This provides another history logging alternative that is conceptually the same 
 as the timeline logging service but logs the entities to a file rather than 
 posting the events to the timeline server directly.  When coupled with the 
 timeline store plugin from YARN-3942 it allows the Tez job to be decoupled 
 from the timeline server yet the Tez UI can still function properly.





Failed: TEZ-2690 PreCommit Build #1008

2015-08-20 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2690
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1008/

###
## LAST 60 LINES OF THE CONSOLE 
###
Started by remote host 0:0:0:0:0:0:0:1
Building remotely on H5 (Mapreduce Falcon Hadoop Pig Zookeeper Tez Hdfs) in 
workspace /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build
  git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
  git config remote.origin.url https://git-wip-us.apache.org/repos/asf/tez.git 
  # timeout=10
FATAL: Failed to fetch from https://git-wip-us.apache.org/repos/asf/tez.git
hudson.plugins.git.GitException: Failed to fetch from 
https://git-wip-us.apache.org/repos/asf/tez.git
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:647)
at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:889)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:914)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1252)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:615)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:524)
at hudson.model.Run.execute(Run.java:1706)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:232)
Caused by: hudson.plugins.git.GitException: Error performing git command
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1444)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1411)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1407)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1110)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1120)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.setRemoteUrl(CliGitAPIImpl.java:832)
at hudson.plugins.git.GitAPI.setRemoteUrl(GitAPI.java:120)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:310)
at 
hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:290)
at 
hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:249)
at hudson.remoting.UserRequest.perform(UserRequest.java:118)
at hudson.remoting.UserRequest.perform(UserRequest.java:48)
at hudson.remoting.Request$2.run(Request.java:328)
at 
hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at hudson.Proc$LocalProc.init(Proc.java:276)
at hudson.Proc$LocalProc.init(Proc.java:216)
at hudson.Launcher$LocalLauncher.launch(Launcher.java:780)
at hudson.Launcher$ProcStarter.start(Launcher.java:360)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1431)
... 21 more



###
## FAILED TESTS (if any) 
##
No tests ran.

[jira] [Commented] (TEZ-2687) ATS History shutdown happens before the min-held containers are released

2015-08-20 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706258#comment-14706258
 ] 

Jeff Zhang commented on TEZ-2687:
-

[~hitesh], Thanks for clarification.

Committed TEZ-2687-7.patch to 0.5/0.6/0.7/master


 ATS History shutdown happens before the min-held containers are released
 

 Key: TEZ-2687
 URL: https://issues.apache.org/jira/browse/TEZ-2687
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.2, 0.8.0, 0.7.1
Reporter: Gopal V
Assignee: Jeff Zhang
 Attachments: TEZ-2687-1.patch, TEZ-2687-2.patch, TEZ-2687-3.patch, 
 TEZ-2687-4.patch, TEZ-2687-6.patch, TEZ-2687-7.patch


 When ATS goes into a GC pause under heavy load, each Tez AM holds on to a few 
 containers while ATS recovers, even though the AM is shutting down and will 
 never accept any more DAGs.





[jira] [Commented] (TEZ-2164) Shade the guava version used by Tez

2015-08-20 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705742#comment-14705742
 ] 

Hitesh Shah commented on TEZ-2164:
--

[~fs111] [~cchepelov] Any comments on this approach? 

 Shade the guava version used by Tez
 ---

 Key: TEZ-2164
 URL: https://issues.apache.org/jira/browse/TEZ-2164
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Hitesh Shah
Priority: Critical
 Attachments: TEZ-2164.3.patch, TEZ-2164.wip.2.patch, 
 allow-guava-16.0.1.patch


 Should allow us to upgrade to a newer version without shipping a guava 
 dependency.
 Would be good to do this in 0.7 so that we stop shipping guava as early as 
 possible.





[jira] [Commented] (TEZ-2164) Shade the guava version used by Tez

2015-08-20 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705740#comment-14705740
 ] 

Hitesh Shah commented on TEZ-2164:
--

bq. Should guava-tez reside in tez-tools, or some such sub-module?

Kept it separate as it is not meant to be built each time and therefore outside 
of the main build tree. 

 Shade the guava version used by Tez
 ---

 Key: TEZ-2164
 URL: https://issues.apache.org/jira/browse/TEZ-2164
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Hitesh Shah
Priority: Critical
 Attachments: TEZ-2164.3.patch, TEZ-2164.wip.2.patch, 
 allow-guava-16.0.1.patch


 Should allow us to upgrade to a newer version without shipping a guava 
 dependency.
 Would be good to do this in 0.7 so that we stop shipping guava as early as 
 possible.





[jira] [Commented] (TEZ-2731) Fix Tez GenericCounter performance bottleneck

2015-08-20 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705660#comment-14705660
 ] 

Gopal V commented on TEZ-2731:
--

[~rajesh.balamohan]: can you review?

 Fix Tez GenericCounter performance bottleneck
 -

 Key: TEZ-2731
 URL: https://issues.apache.org/jira/browse/TEZ-2731
 Project: Apache Tez
  Issue Type: Sub-task
Affects Versions: 0.6.0, 0.7.0, 0.8.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: TEZ-2731.1.patch, atomic-long-cntr.png, lock-inc.png, 
 mr-reader-next.png


 GenericCounter::increment(1) shows up as a ~16% performance penalty inside 
 the unvectorized codepath of Hive queries.
 The vectorized codepath amortizes this entirely by running through that 
 exactly once every 1024 rows; the performance improvement is dramatic.
 !lock-inc.png!
 !mr-reader-next.png!
 Optimize the GenericCounter impl for mostly uncontested atomic operations.
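
 A minimal sketch of the kind of change proposed, assuming the existing increment takes a lock on every call (as the attachment name lock-inc.png suggests) and that the replacement uses java.util.concurrent.atomic.AtomicLong (as atomic-long-cntr.png suggests). LockedCounter and AtomicCounter are hypothetical classes for illustration; neither is the Tez GenericCounter, and the attached patch may differ.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counters for illustration; neither is the Tez GenericCounter.
final class CounterSketch {
    // Lock-based variant: every increment acquires the object's monitor.
    static final class LockedCounter {
        private long value;

        synchronized void increment(long delta) {
            value += delta;
        }

        synchronized long get() {
            return value;
        }
    }

    // Atomic variant: an increment is a single atomic add, with no monitor involved.
    static final class AtomicCounter {
        private final AtomicLong value = new AtomicLong();

        void increment(long delta) {
            value.addAndGet(delta);
        }

        long get() {
            return value.get();
        }
    }

    public static void main(String[] args) {
        LockedCounter locked = new LockedCounter();
        AtomicCounter atomic = new AtomicCounter();
        // One increment per row, as in the unvectorized codepath described above.
        for (int row = 0; row < 1024; row++) {
            locked.increment(1);
            atomic.increment(1);
        }
        System.out.println(locked.get() + " " + atomic.get()); // 1024 1024
    }
}
```

 Both variants give the same totals; the difference is the per-call cost when there is little or no contention, which is the case the issue targets.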





[jira] [Commented] (TEZ-2164) Shade the guava version used by Tez

2015-08-20 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705642#comment-14705642
 ] 

Siddharth Seth commented on TEZ-2164:
-

If we're taking this approach, I think we'll have to publish the guava-tez jar 
into a repository. Changing the build step to first compile guava-tez and then 
run the mvn install command would be a terrible experience.

This also means any project which depends on Tez will end up seeing two 
versions of Guava classes - which can lead to accidental usage of the tez 
version.
I'm not sure about this, but we may be able to continue depending on guava, and 
set the dependency to optional - so that downstream components do not 
automatically get the dependency. Don't think it's possible to set this for 
guava-tez though.

Should guava-tez reside in tez-tools, or some such sub-package?

 Shade the guava version used by Tez
 ---

 Key: TEZ-2164
 URL: https://issues.apache.org/jira/browse/TEZ-2164
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Hitesh Shah
Priority: Critical
 Attachments: TEZ-2164.3.patch, TEZ-2164.wip.2.patch, 
 allow-guava-16.0.1.patch


 Should allow us to upgrade to a newer version without shipping a guava 
 dependency.
 Would be good to do this in 0.7 so that we stop shipping guava as early as 
 possible.





[jira] [Comment Edited] (TEZ-2164) Shade the guava version used by Tez

2015-08-20 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705642#comment-14705642
 ] 

Siddharth Seth edited comment on TEZ-2164 at 8/20/15 7:50 PM:
--

If we're taking this approach, I think we'll have to publish the guava-tez jar 
into a repository. Changing the build step to first compile guava-tez and then 
run the mvn install command would be a terrible experience.

This also means any project which depends on Tez will end up seeing two 
versions of Guava classes - which can lead to accidental usage of the tez 
version.
I'm not sure about this, but we may be able to continue depending on guava, and 
set the dependency to optional - so that downstream components do not 
automatically get the dependency. Don't think it's possible to set this for 
guava-tez though.

Should guava-tez reside in tez-tools, or some such sub-module?


was (Author: sseth):
If we're taking this approach, I think we'll have to publish the guava-tez jar 
into a repository. Changing the build step to first compile guava-tez and then 
run the mvn install command would be a terrible experience.

This also means any project which depends on Tez will end up seeing two 
versions of Guava classes - which can lead to accidental usage of the tez 
version.
I'm not sure about this, but we may be able to continue depending on guava, and 
set the dependency to optional - so that downstream components do not 
automatically get the dependency. Don't think it's possible to set this for 
guava-tez though.

Should guava-tez reside in tez-tools, or some such sub-package?

 Shade the guava version used by Tez
 ---

 Key: TEZ-2164
 URL: https://issues.apache.org/jira/browse/TEZ-2164
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Hitesh Shah
Priority: Critical
 Attachments: TEZ-2164.3.patch, TEZ-2164.wip.2.patch, 
 allow-guava-16.0.1.patch


 Should allow us to upgrade to a newer version without shipping a guava 
 dependency.
 Would be good to do this in 0.7 so that we stop shipping guava as early as 
 possible.





[jira] [Assigned] (TEZ-2731) Fix Tez GenericCounter performance bottleneck

2015-08-20 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned TEZ-2731:


Assignee: Gopal V

 Fix Tez GenericCounter performance bottleneck
 -

 Key: TEZ-2731
 URL: https://issues.apache.org/jira/browse/TEZ-2731
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Gopal V
Assignee: Gopal V
 Attachments: atomic-long-cntr.png, lock-inc.png, mr-reader-next.png


 GenericCounter::increment(1) shows up as a ~16% performance penalty inside 
 the unvectorized codepath of Hive queries.
 The vectorized codepath amortizes this entirely by running through that 
 exactly once every 1024 rows; the performance improvement is dramatic.
 !lock-inc.png!
 !mr-reader-next.png!
 Optimize the GenericCounter impl for mostly uncontested atomic operations.





[jira] [Updated] (TEZ-2731) Fix Tez GenericCounter performance bottleneck

2015-08-20 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-2731:
-
Attachment: atomic-long-cntr.png

 Fix Tez GenericCounter performance bottleneck
 -

 Key: TEZ-2731
 URL: https://issues.apache.org/jira/browse/TEZ-2731
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Gopal V
Assignee: Gopal V
 Attachments: atomic-long-cntr.png, lock-inc.png, mr-reader-next.png


 GenericCounter::increment(1) shows up as a ~16% performance penalty inside 
 the unvectorized codepath of Hive queries.
 The vectorized codepath amortizes this entirely by running through that 
 exactly once every 1024 rows; the performance improvement is dramatic.
 !lock-inc.png!
 !mr-reader-next.png!
 Optimize the GenericCounter impl for mostly uncontested atomic operations.





[jira] [Updated] (TEZ-2731) Fix Tez GenericCounter performance bottleneck

2015-08-20 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-2731:
-
Attachment: TEZ-2731.1.patch

 Fix Tez GenericCounter performance bottleneck
 -

 Key: TEZ-2731
 URL: https://issues.apache.org/jira/browse/TEZ-2731
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Gopal V
Assignee: Gopal V
 Attachments: TEZ-2731.1.patch, atomic-long-cntr.png, lock-inc.png, 
 mr-reader-next.png


 GenericCounter::increment(1) shows up as a ~16% performance penalty inside 
 the unvectorized codepath of Hive queries.
 The vectorized codepath amortizes this entirely by running through that 
 exactly once every 1024 rows; the performance improvement is dramatic.
 !lock-inc.png!
 !mr-reader-next.png!
 Optimize the GenericCounter impl for mostly uncontested atomic operations.





[jira] [Updated] (TEZ-2732) DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers

2015-08-20 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2732:
-
Assignee: Rajesh Balamohan

 DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers
 ---

 Key: TEZ-2732
 URL: https://issues.apache.org/jira/browse/TEZ-2732
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan

 {noformat}
   kvbuffer.length = 2146435072 (2047 MB)
   Corner case: bufIndex=2026133899, kvbidx=523629312.
   distkvi = mod - i + j = 2146435072 - 2026133899 + 523629312 = 643930485
   newPos = (2026133899 + (max(.., min(643930485/2, 271128624)))) (This would 
 overflow)
 {noformat}
 It would be good to restrict the maximum allowed sort buffer size to 1800 MB 
 instead of 2047 MB.
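
 The arithmetic in the report can be checked with a short sketch. The numbers are taken from the {noformat} block above; the class and method names are hypothetical, and the elided max(..) operand is ignored, which is safe for this purpose because max can only make the step larger. The sketch shows the sum exceeding Integer.MAX_VALUE (2147483647) and wrapping to a negative value in int arithmetic, which would then be an invalid array index.

```java
// Reproduces the arithmetic quoted in the report; not code from DefaultSorter.
final class SortBufferOverflowSketch {
    static final int KVBUFFER_LENGTH = 2146435072; // 2047 MB
    static final int BUF_INDEX = 2026133899;
    static final int KVBIDX = 523629312;
    static final int CAP = 271128624;              // second argument of min(...) in the report

    // distkvi = mod - i + j
    static int distkvi() {
        return KVBUFFER_LENGTH - BUF_INDEX + KVBIDX;
    }

    // Lower bound on the step; the elided max(..) operand is ignored here.
    static int step() {
        return Math.min(distkvi() / 2, CAP);
    }

    // 32-bit addition: silently wraps around.
    static int newPosInt() {
        return BUF_INDEX + step();
    }

    // 64-bit addition: the mathematically exact value.
    static long newPosLong() {
        return (long) BUF_INDEX + step();
    }

    public static void main(String[] args) {
        System.out.println(distkvi());    // 643930485
        System.out.println(newPosLong()); // 2297262523
        System.out.println(newPosInt());  // -1997704773
    }
}
```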





[jira] [Updated] (TEZ-2731) Fix Tez GenericCounter performance bottleneck

2015-08-20 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-2731:
-
Attachment: lock-inc.png
mr-reader-next.png

 Fix Tez GenericCounter performance bottleneck
 -

 Key: TEZ-2731
 URL: https://issues.apache.org/jira/browse/TEZ-2731
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Gopal V
 Attachments: lock-inc.png, mr-reader-next.png


 GenericCounter::increment(1) shows up as a ~16% performance penalty inside 
 the unvectorized codepath of Hive queries.
 The vectorized codepath amortizes this entirely by running through that 
 exactly once every 1024 rows; the performance improvement is dramatic.
 Optimize the GenericCounter impl for mostly uncontested atomic operations.


