[jira] [Created] (TEZ-2732) DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers
Rajesh Balamohan created TEZ-2732:

Summary: DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers
Key: TEZ-2732
URL: https://issues.apache.org/jira/browse/TEZ-2732
Project: Apache Tez
Issue Type: Bug
Reporter: Rajesh Balamohan

{noformat}
kvbuffer.length = 2146435072 (2047 MB)
Corner case: bufIndex=2026133899, kvbidx=523629312.
distkvi = mod - i + j = 2146435072 - 2026133899 + 523629312 = 643930485
newPos = (2026133899 + (max(.., min(643930485/2, 271128624))) (This would overflow)
{noformat}

Would be good to restrict the max allowed sort buffer to 1800 instead of 2047.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
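The arithmetic in the report can be replayed in isolation. The following is a minimal sketch, not DefaultSorter code: the class and method names are made up for illustration, and the first argument of max(..), which the report elides, is left out (max can only make the sum larger, so the overflow still applies).

```java
// Illustrative sketch only: replays the numbers from the report.
// Class, method and constant names are hypothetical, not Tez identifiers.
class SortBufferOverflowDemo {
    // Values copied from the corner case in the report.
    static final int KVBUFFER_LENGTH = 2146435072; // 2047 MB
    static final int BUF_INDEX = 2026133899;
    static final int KVBIDX = 523629312;
    static final int MIN_CAP = 271128624; // second argument of min(...) in the report

    // distkvi = mod - i + j, as written in the report.
    static int distance(int i, int j, int mod) {
        return mod - i + j;
    }

    // Sum computed in 32-bit int arithmetic: wraps around to a negative value.
    static int newPosInt() {
        int step = Math.min(distance(BUF_INDEX, KVBIDX, KVBUFFER_LENGTH) / 2, MIN_CAP);
        return BUF_INDEX + step;
    }

    // Same sum widened to long: shows the value the int could not hold.
    static long newPosLong() {
        long step = Math.min(distance(BUF_INDEX, KVBIDX, KVBUFFER_LENGTH) / 2, MIN_CAP);
        return (long) BUF_INDEX + step;
    }

    public static void main(String[] args) {
        System.out.println("distkvi       = " + distance(BUF_INDEX, KVBIDX, KVBUFFER_LENGTH));
        System.out.println("newPos (int)  = " + newPosInt());
        System.out.println("newPos (long) = " + newPosLong());
        System.out.println("int max       = " + Integer.MAX_VALUE);
    }
}
```

With these inputs the long result is 2297262523, which exceeds Integer.MAX_VALUE (2147483647), so the int version wraps to a negative position.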
[jira] [Updated] (TEZ-2731) Fix Tez GenericCounter performance bottleneck
[ https://issues.apache.org/jira/browse/TEZ-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated TEZ-2731:

Issue Type: Sub-task (was: Improvement)
Parent: TEZ-2605

Fix Tez GenericCounter performance bottleneck

Key: TEZ-2731
URL: https://issues.apache.org/jira/browse/TEZ-2731
Project: Apache Tez
Issue Type: Sub-task
Reporter: Gopal V
Attachments: lock-inc.png, mr-reader-next.png

GenericCounter::increment(1) shows up as a ~16% performance penalty inside the unvectorized codepath of Hive queries. The vectorized codepath amortizes this entirely by running through it exactly once every 1024 rows, and the performance improvement is dramatic.

!lock-inc.png! !mr-reader-next.png!

Optimize the GenericCounter impl for mostly uncontested atomic operations.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2541) DAGClientImpl enable TimelineClient check is wrong.
[ https://issues.apache.org/jira/browse/TEZ-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704373#comment-14704373 ]

Prakash Ramachandran commented on TEZ-2541:

[~hitesh] This has been handled as part of the TEZ-1529 port to branch 0.5, so no action is required here for 0.5.

DAGClientImpl enable TimelineClient check is wrong.

Key: TEZ-2541
URL: https://issues.apache.org/jira/browse/TEZ-2541
Project: Apache Tez
Issue Type: Bug
Reporter: Prakash Ramachandran
Assignee: Prakash Ramachandran
Fix For: 0.6.2, 0.8.0, 0.7.1
Attachments: TEZ-2541.1.patch, TEZ-2541.2.patch

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: TEZ-2690 PreCommit Build #1007
Jira: https://issues.apache.org/jira/browse/TEZ-2690
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1007/

###
## LAST 60 LINES OF THE CONSOLE
###
[...truncated 3297 lines...]

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12751363/TEZ-2690.2.patch
against master revision 24ca1de.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:red}-1 findbugs{color}. The patch appears to introduce 16 new Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1007//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/1007//artifact/patchprocess/newPatchFindbugsWarningsjob-analyzer.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1007//console

This message is automatically generated.

==
==
Adding comment to Jira.
==
==

Comment added.
b7e441a06c04ec9fd740272940d90f7d5cf0 logged out

==
==
Finished build.
==
==

Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #1000
Archived 50 artifacts
Archive block size is 32768
Received 2 blocks and 3065049 bytes
Compression is 2.1%
Took 1 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure

###
## FAILED TESTS (if any)
##
All tests passed
[jira] [Commented] (TEZ-2690) Add critical path analyser
[ https://issues.apache.org/jira/browse/TEZ-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14704372#comment-14704372 ] TezQA commented on TEZ-2690: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12751363/TEZ-2690.2.patch against master revision 24ca1de. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 findbugs{color}. The patch appears to introduce 16 new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1007//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/1007//artifact/patchprocess/newPatchFindbugsWarningsjob-analyzer.html Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1007//console This message is automatically generated. Add critical path analyser -- Key: TEZ-2690 URL: https://issues.apache.org/jira/browse/TEZ-2690 Project: Apache Tez Issue Type: Sub-task Reporter: Bikas Saha Assignee: Bikas Saha Attachments: TEZ-2690.1.patch, TEZ-2690.2.patch, criticalPath.jpg, dag_1439860407967_0030_1.svg Use input and scheduling dependencies to create critical path for a DAG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2731) Fix Tez GenericCounter performance bottleneck
[ https://issues.apache.org/jira/browse/TEZ-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706236#comment-14706236 ]

Rajesh Balamohan commented on TEZ-2731:

java version: 1.7.0_67

java -XX:+AggressiveOpts -XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions -XX:+PrintFlagsFinal | grep Bias
bool UseBiasedLocking = true {product}

It is enabled by default in the latest JVMs.

Fix Tez GenericCounter performance bottleneck

Key: TEZ-2731
URL: https://issues.apache.org/jira/browse/TEZ-2731
Project: Apache Tez
Issue Type: Sub-task
Affects Versions: 0.6.0, 0.7.0, 0.8.0
Reporter: Gopal V
Assignee: Gopal V
Attachments: TEZ-2731.1.patch, atomic-long-cntr.png, lock-inc.png, mr-reader-next.png

GenericCounter::increment(1) shows up as a ~16% performance penalty inside the unvectorized codepath of Hive queries. The vectorized codepath amortizes this entirely by running through it exactly once every 1024 rows, and the performance improvement is dramatic.

!lock-inc.png! !mr-reader-next.png!

Optimize the GenericCounter impl for mostly uncontested atomic operations.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2732) DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers
[ https://issues.apache.org/jira/browse/TEZ-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706254#comment-14706254 ]

Rajesh Balamohan commented on TEZ-2732:

Attaching the patch for review. [~hitesh], [~sseth] - Please review when you find time.

- Capping buffer to 1800.
- Added tests which would reproduce this issue with DefaultSorter.MAX_IO_SORT_MB=2047 and when io.sort.mb is set to 2047.
- Disabled these tests by default as it would need 2 GB containers in test env.

DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers

Key: TEZ-2732
URL: https://issues.apache.org/jira/browse/TEZ-2732
Project: Apache Tez
Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Attachments: TEZ-2732.1.patch

{noformat}
kvbuffer.length = 2146435072 (2047 MB)
Corner case: bufIndex=2026133899, kvbidx=523629312.
distkvi = mod - i + j = 2146435072 - 2026133899 + 523629312 = 643930485
newPos = (2026133899 + (max(.., min(643930485/2, 271128624))) (This would overflow)
{noformat}

Would be good to restrict the max allowed sort buffer to 1800 instead of 2047.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2687) ATS History shutdown happens before the min-held containers are released
[ https://issues.apache.org/jira/browse/TEZ-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706230#comment-14706230 ]

Hitesh Shah commented on TEZ-2687:

bq. Any difference between sleeping before and after ats events flushed to ATS ? Do you concern about the DAGClient ?

Consider a test which wants to verify that containers are released but, at the same time, also verify that the data is being pushed into timeline. No real functional concern, apart from the fact that it would be better to finish all the real work as soon as possible and then wait, instead of waiting first.

ATS History shutdown happens before the min-held containers are released

Key: TEZ-2687
URL: https://issues.apache.org/jira/browse/TEZ-2687
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.6.2, 0.8.0, 0.7.1
Reporter: Gopal V
Assignee: Jeff Zhang
Attachments: TEZ-2687-1.patch, TEZ-2687-2.patch, TEZ-2687-3.patch, TEZ-2687-4.patch, TEZ-2687-6.patch, TEZ-2687-7.patch

When ATS goes into a GC pause under heavy loads and while it recovers, each Tez AM holds onto a few containers even though it is shutting down and will never accept any more DAGs.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-2731) Fix Tez GenericCounter performance bottleneck
[ https://issues.apache.org/jira/browse/TEZ-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706236#comment-14706236 ] Rajesh Balamohan edited comment on TEZ-2731 at 8/21/15 5:04 AM: java version: 1.7.0_67 {noformat} java -XX:+AggressiveOpts -XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions -XX:+PrintFlagsFinal | grep Bias bool UseBiasedLocking = true{product} {noformat} It is enabled as default in latest JVMs. was (Author: rajesh.balamohan): java version: 1.7.0_67 java -XX:+AggressiveOpts -XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions -XX:+PrintFlagsFinal | grep Bias bool UseBiasedLocking = true{product} It is enabled as default in latest JVMs. Fix Tez GenericCounter performance bottleneck - Key: TEZ-2731 URL: https://issues.apache.org/jira/browse/TEZ-2731 Project: Apache Tez Issue Type: Sub-task Affects Versions: 0.6.0, 0.7.0, 0.8.0 Reporter: Gopal V Assignee: Gopal V Attachments: TEZ-2731.1.patch, atomic-long-cntr.png, lock-inc.png, mr-reader-next.png GenericCounter::increment(1) shows up as a ~16% performance penalty inside the unvectorized codepath of Hive queries. The vectorized codepath amortizes this entirely by running through that exactly once every 1024 rows the performance improvement is dramatic. !lock-inc.png! !mr-reader-next.png! Optimize the GenericCounter impl for mostly uncontested atomic operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2732) DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers
[ https://issues.apache.org/jira/browse/TEZ-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706267#comment-14706267 ] Hitesh Shah commented on TEZ-2732: -- +1 pending pre-commit DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers --- Key: TEZ-2732 URL: https://issues.apache.org/jira/browse/TEZ-2732 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Attachments: TEZ-2732.1.patch {noformat} kvbuffer.length = 2146435072 (2047 MB) Corner case: bufIndex=2026133899, kvbidx=523629312. distkvi = mod - i + j = 2146435072 - 2026133899 + 523629312 = 643930485 newPos = (2026133899 + (max(.., min(643930485/2, 271128624))) (This would overflow) {noformat} Would be good to restrict the max allowed sort buffer to 1800 instead of 2047. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2731) Fix Tez GenericCounter performance bottleneck
[ https://issues.apache.org/jira/browse/TEZ-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706228#comment-14706228 ]

Rajesh Balamohan commented on TEZ-2731:

lgtm. +1. Had it been uncontended synchronization, UseBiasedLocking (I believe this is enabled by default in JDK 7) would have kicked in and we would not have seen this level of perf drop with sync. But it appears to be relatively contended even with fewer threads, in which case atomic long helps a lot in improving the perf.

Fix Tez GenericCounter performance bottleneck

Key: TEZ-2731
URL: https://issues.apache.org/jira/browse/TEZ-2731
Project: Apache Tez
Issue Type: Sub-task
Affects Versions: 0.6.0, 0.7.0, 0.8.0
Reporter: Gopal V
Assignee: Gopal V
Attachments: TEZ-2731.1.patch, atomic-long-cntr.png, lock-inc.png, mr-reader-next.png

GenericCounter::increment(1) shows up as a ~16% performance penalty inside the unvectorized codepath of Hive queries. The vectorized codepath amortizes this entirely by running through it exactly once every 1024 rows, and the performance improvement is dramatic.

!lock-inc.png! !mr-reader-next.png!

Optimize the GenericCounter impl for mostly uncontested atomic operations.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
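The two approaches contrasted in this comment, a synchronized increment versus an atomic long, can be sketched side by side. This is not the TEZ-2731 patch and not the GenericCounter source; the class names below are hypothetical and the snippet only shows that both variants count correctly under concurrent use. It makes no claim about their relative speed.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only: hypothetical counter classes, not Tez code.
class CounterSketch {
    // Lock-based variant: every increment acquires the object's monitor.
    static final class SynchronizedCounter {
        private long value;
        synchronized void increment(long delta) { value += delta; }
        synchronized long getValue() { return value; }
    }

    // Lock-free variant: AtomicLong performs an atomic read-modify-write
    // without acquiring a monitor.
    static final class AtomicCounter {
        private final AtomicLong value = new AtomicLong();
        void increment(long delta) { value.addAndGet(delta); }
        long getValue() { return value.get(); }
    }

    // Runs the given increment action perThread times on each of `threads` threads.
    static void hammer(Runnable increment, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    increment.run();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
    }

    public static void main(String[] args) {
        SynchronizedCounter sync = new SynchronizedCounter();
        AtomicCounter atomic = new AtomicCounter();
        hammer(() -> sync.increment(1), 4, 100000);
        hammer(() -> atomic.increment(1), 4, 100000);
        System.out.println("synchronized: " + sync.getValue());
        System.out.println("atomic:       " + atomic.getValue());
    }
}
```

Timing the two loops separately under a profiler would be the way to check whether the behaviour described in the comment reproduces on a given JVM.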
[jira] [Commented] (TEZ-2731) Fix Tez GenericCounter performance bottleneck
[ https://issues.apache.org/jira/browse/TEZ-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706229#comment-14706229 ] Gopal V commented on TEZ-2731: -- [~rajesh.balamohan]: I thought UseBiasedLocking had some issues with IdentityHashMap (so is not on by default)? Fix Tez GenericCounter performance bottleneck - Key: TEZ-2731 URL: https://issues.apache.org/jira/browse/TEZ-2731 Project: Apache Tez Issue Type: Sub-task Affects Versions: 0.6.0, 0.7.0, 0.8.0 Reporter: Gopal V Assignee: Gopal V Attachments: TEZ-2731.1.patch, atomic-long-cntr.png, lock-inc.png, mr-reader-next.png GenericCounter::increment(1) shows up as a ~16% performance penalty inside the unvectorized codepath of Hive queries. The vectorized codepath amortizes this entirely by running through that exactly once every 1024 rows the performance improvement is dramatic. !lock-inc.png! !mr-reader-next.png! Optimize the GenericCounter impl for mostly uncontested atomic operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2732) DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers
[ https://issues.apache.org/jira/browse/TEZ-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706253#comment-14706253 ]

Rajesh Balamohan commented on TEZ-2732:

One more place where a similar overflow can happen is in write() (bufindex + len can go negative). In such cases, it would end up throwing the following exception:

{noformat}
java.lang.ArrayIndexOutOfBoundsException
    at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter$Buffer.write(DefaultSorter.java:648)
    at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter$Buffer.write(DefaultSorter.java:544)
    at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
    at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:273)
    at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:253)
    at org.apache.hadoop.io.Text.write(Text.java:330)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
{noformat}

DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers

Key: TEZ-2732
URL: https://issues.apache.org/jira/browse/TEZ-2732
Project: Apache Tez
Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Attachments: TEZ-2732.1.patch

{noformat}
kvbuffer.length = 2146435072 (2047 MB)
Corner case: bufIndex=2026133899, kvbidx=523629312.
distkvi = mod - i + j = 2146435072 - 2026133899 + 523629312 = 643930485
newPos = (2026133899 + (max(.., min(643930485/2, 271128624))) (This would overflow)
{noformat}

Would be good to restrict the max allowed sort buffer to 1800 instead of 2047.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
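Why a wrapped bufindex + len slips past a bounds check can be shown with a small sketch. The code below is an assumption-laden illustration, not the body of DefaultSorter$Buffer.write: the method names are hypothetical, and the len value is invented for the example (bufIndex and the buffer length are taken from the issue description).

```java
// Illustrative sketch only: hypothetical helper, not DefaultSorter code.
class WrapCheckSketch {
    // Naive check: the sum is computed in int and can wrap negative,
    // in which case the comparison wrongly reports that the write fits.
    static boolean fitsNaive(int bufIndex, int len, int capacity) {
        return bufIndex + len <= capacity;
    }

    // Widened check: the sum is computed in long, so it cannot wrap
    // for any pair of int inputs.
    static boolean fitsWidened(int bufIndex, int len, int capacity) {
        return (long) bufIndex + len <= capacity;
    }

    public static void main(String[] args) {
        int capacity = 2146435072; // kvbuffer.length from the issue description
        int bufIndex = 2026133899; // bufIndex from the issue description
        int len = 200000000;       // hypothetical write length, chosen to force a wrap
        System.out.println("int sum:     " + (bufIndex + len));
        System.out.println("fitsNaive:   " + fitsNaive(bufIndex, len, capacity));
        System.out.println("fitsWidened: " + fitsWidened(bufIndex, len, capacity));
    }
}
```

A check of the naive kind lets the subsequent array copy proceed with an out-of-range position, which is consistent with the ArrayIndexOutOfBoundsException in the trace above.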
[jira] [Updated] (TEZ-2732) DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers
[ https://issues.apache.org/jira/browse/TEZ-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-2732: -- Attachment: TEZ-2732.1.patch DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers --- Key: TEZ-2732 URL: https://issues.apache.org/jira/browse/TEZ-2732 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Attachments: TEZ-2732.1.patch {noformat} kvbuffer.length = 2146435072 (2047 MB) Corner case: bufIndex=2026133899, kvbidx=523629312. distkvi = mod - i + j = 2146435072 - 2026133899 + 523629312 = 643930485 newPos = (2026133899 + (max(.., min(643930485/2, 271128624))) (This would overflow) {noformat} Would be good to restrict the max allowed sort buffer to 1800 instead of 2047. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2687) ATS History shutdown happens before the min-held containers are released
[ https://issues.apache.org/jira/browse/TEZ-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated TEZ-2687: Attachment: TEZ-2687-4.patch Minor update to address the comments. Commit it soon. ATS History shutdown happens before the min-held containers are released Key: TEZ-2687 URL: https://issues.apache.org/jira/browse/TEZ-2687 Project: Apache Tez Issue Type: Bug Affects Versions: 0.6.2, 0.8.0, 0.7.1 Reporter: Gopal V Assignee: Jeff Zhang Attachments: TEZ-2687-1.patch, TEZ-2687-2.patch, TEZ-2687-3.patch, TEZ-2687-4.patch When ATS goes into a GC pause under heavy loads and while it recovers, each Tez AM holds onto a few containers even though it is shutting down and will never accept any more DAGs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2687) ATS History shutdown happens before the min-held containers are released
[ https://issues.apache.org/jira/browse/TEZ-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated TEZ-2687: Attachment: TEZ-2687-6.patch Add new config tez.test.history-service.stop.sleep.secs for system test to simulate the ATS hang behavior ATS History shutdown happens before the min-held containers are released Key: TEZ-2687 URL: https://issues.apache.org/jira/browse/TEZ-2687 Project: Apache Tez Issue Type: Bug Affects Versions: 0.6.2, 0.8.0, 0.7.1 Reporter: Gopal V Assignee: Jeff Zhang Attachments: TEZ-2687-1.patch, TEZ-2687-2.patch, TEZ-2687-3.patch, TEZ-2687-4.patch, TEZ-2687-6.patch When ATS goes into a GC pause under heavy loads and while it recovers, each Tez AM holds onto a few containers even though it is shutting down and will never accept any more DAGs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2687) ATS History shutdown happens before the min-held containers are released
[ https://issues.apache.org/jira/browse/TEZ-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706142#comment-14706142 ]

Hitesh Shah commented on TEZ-2687:

[~zjffdu] I think the test config sleep fix should be restricted to the ATSHistoryLoggingService. Furthermore, the new config property does not need to be declared in TezConfiguration; it can live in ATSHistoryLoggingService only. Lastly, the sleep should happen *after* all ATS events are flushed to ATS. The current sleep is done before the flush happens, which seems incorrect.

ATS History shutdown happens before the min-held containers are released

Key: TEZ-2687
URL: https://issues.apache.org/jira/browse/TEZ-2687
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.6.2, 0.8.0, 0.7.1
Reporter: Gopal V
Assignee: Jeff Zhang
Attachments: TEZ-2687-1.patch, TEZ-2687-2.patch, TEZ-2687-3.patch, TEZ-2687-4.patch, TEZ-2687-6.patch

When ATS goes into a GC pause under heavy loads and while it recovers, each Tez AM holds onto a few containers even though it is shutting down and will never accept any more DAGs.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
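The ordering requested in this comment (flush first, then apply the test-only delay) can be sketched as follows. Everything here is hypothetical: the class, its fields and its methods are invented for illustration and do not mirror ATSHistoryLoggingService or any attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a stand-in for a history logging service
// with a test-only delay on stop. All names are hypothetical.
class SlowStopSketch {
    // Records the order of shutdown steps so the ordering can be inspected.
    final List<String> steps = new ArrayList<>();
    private final long stopSleepMillis;

    SlowStopSketch(long stopSleepMillis) {
        this.stopSleepMillis = stopSleepMillis;
    }

    // Stand-in for pushing any queued events to the timeline server.
    void flushPendingEvents() {
        steps.add("flush");
    }

    // Real work (the flush) completes first; the artificial delay comes after.
    void stop() {
        flushPendingEvents();
        if (stopSleepMillis > 0) {
            steps.add("sleep");
            try {
                Thread.sleep(stopSleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        steps.add("stopped");
    }

    public static void main(String[] args) {
        SlowStopSketch service = new SlowStopSketch(50);
        service.stop();
        System.out.println(service.steps);
    }
}
```

Keeping the delay after the flush means a test can observe both effects at once: events have already reached the timeline store while the service is still "hanging" in stop.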
[jira] [Commented] (TEZ-2687) ATS History shutdown happens before the min-held containers are released
[ https://issues.apache.org/jira/browse/TEZ-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705970#comment-14705970 ]

Bikas Saha commented on TEZ-2687:

Minor typo -
+LOG.info("Realease held containers");

Minor: the array list could be given an initial size -
+List<Object> tasks = new ArrayList<Object>();

Rest looks good. +1.
Not sure this needs to go all the way to 0.5.

ATS History shutdown happens before the min-held containers are released

Key: TEZ-2687
URL: https://issues.apache.org/jira/browse/TEZ-2687
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.6.2, 0.8.0, 0.7.1
Reporter: Gopal V
Assignee: Jeff Zhang
Attachments: TEZ-2687-1.patch, TEZ-2687-2.patch, TEZ-2687-3.patch

When ATS goes into a GC pause under heavy loads and while it recovers, each Tez AM holds onto a few containers even though it is shutting down and will never accept any more DAGs.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2003) [Umbrella] Allow Tez to co-ordinate execution to external services
[ https://issues.apache.org/jira/browse/TEZ-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated TEZ-2003:

Attachment: 2003_20150820.1.txt

[Umbrella] Allow Tez to co-ordinate execution to external services

Key: TEZ-2003
URL: https://issues.apache.org/jira/browse/TEZ-2003
Project: Apache Tez
Issue Type: Improvement
Reporter: Siddharth Seth
Attachments: 2003_20150728.1.txt, 2003_20150807.1.txt, 2003_20150807.2.txt, 2003_20150812.1.txt, 2003_20150812.2.txt, 2003_20150814.1.txt, 2003_20150814.2.txt, 2003_20150820.1.txt, Tez With External Services.pdf

The Tez engine itself takes care of co-ordinating execution - controlling how data gets routed (different connection patterns), fault tolerance, scheduling of work, etc. This is currently tied to TaskSpecs defined within Tez and on containers launched by Tez itself (TezChild).

The proposal is to allow Tez to work with external services instead of just containers launched by Tez. This involves several more pluggable layers to work with alternate Task Specifications, custom launch and task allocation mechanics, as well as custom scheduling sources.

A simple example would be a simple process with the capability to execute multiple Tez TaskSpecs as threads. In such a case, a container launch isn't really needed and can be mocked. Sourcing / scheduling containers would need to be pluggable.

A more advanced example would be LLAP (HIVE-7926; https://issues.apache.org/jira/secure/attachment/12665704/LLAPdesigndocument.pdf). This works with custom interfaces - which would need to be supported by Tez, along with a custom event model which would need translation hooks.

Tez should be able to work with a combination of certain vertices running in external services and others running in regular Tez containers.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2687) ATS History shutdown happens before the min-held containers are released
[ https://issues.apache.org/jira/browse/TEZ-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706150#comment-14706150 ]

Jeff Zhang commented on TEZ-2687:

bq. Instead of touching code that will be run in prod, I suggest writing a VerySlowHistoryLoggingService impl for the test-cases.

Makes sense, since HistoryLoggingService is pluggable.

ATS History shutdown happens before the min-held containers are released

Key: TEZ-2687
URL: https://issues.apache.org/jira/browse/TEZ-2687
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.6.2, 0.8.0, 0.7.1
Reporter: Gopal V
Assignee: Jeff Zhang
Attachments: TEZ-2687-1.patch, TEZ-2687-2.patch, TEZ-2687-3.patch, TEZ-2687-4.patch, TEZ-2687-6.patch

When ATS goes into a GC pause under heavy loads and while it recovers, each Tez AM holds onto a few containers even though it is shutting down and will never accept any more DAGs.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2687) ATS History shutdown happens before the min-held containers are released
[ https://issues.apache.org/jira/browse/TEZ-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated TEZ-2687: Attachment: TEZ-2687-7.patch ATS History shutdown happens before the min-held containers are released Key: TEZ-2687 URL: https://issues.apache.org/jira/browse/TEZ-2687 Project: Apache Tez Issue Type: Bug Affects Versions: 0.6.2, 0.8.0, 0.7.1 Reporter: Gopal V Assignee: Jeff Zhang Attachments: TEZ-2687-1.patch, TEZ-2687-2.patch, TEZ-2687-3.patch, TEZ-2687-4.patch, TEZ-2687-6.patch, TEZ-2687-7.patch When ATS goes into a GC pause under heavy loads and while it recovers, each Tez AM holds onto a few containers even though it is shutting down and will never accept any more DAGs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2164) Shade the guava version used by Tez
[ https://issues.apache.org/jira/browse/TEZ-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705422#comment-14705422 ]

Hitesh Shah commented on TEZ-2164:

Thanks for the review [~rajesh.balamohan]. The unintentional changes are due to the handling of core.autocrlf in git. It seems the general good practice is to have a newline at the end of files. Will file a follow-up jira for the import re-orders.

[~sseth] Any suggestions on the build issue for guava-tez? Should there be a one-off build to publish/deploy the guava-tez shaded jar before committing this patch?

Shade the guava version used by Tez

Key: TEZ-2164
URL: https://issues.apache.org/jira/browse/TEZ-2164
Project: Apache Tez
Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Hitesh Shah
Priority: Critical
Attachments: TEZ-2164.3.patch, TEZ-2164.wip.2.patch, allow-guava-16.0.1.patch

Should allow us to upgrade to a newer version without shipping a guava dependency. Would be good to do this in 0.7 so that we stop shipping guava as early as possible.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2629) LimitExceededException in Tez client when DAG has exceeds the default max
[ https://issues.apache.org/jira/browse/TEZ-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated TEZ-2629:

Attachment: TEZ-2629_branch07_1.txt
            TEZ-2629_branch06_1.txt
            TEZ-2629_branch05_1.txt

Additional patches for different branches. Thanks for the review. Committing.

LimitExceededException in Tez client when DAG has exceeds the default max

Key: TEZ-2629
URL: https://issues.apache.org/jira/browse/TEZ-2629
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Jason Dere
Assignee: Siddharth Seth
Attachments: TEZ-2629.1.txt, TEZ-2629_branch05_1.txt, TEZ-2629_branch06_1.txt, TEZ-2629_branch07_1.txt

Original issue was HIVE-11303, seeing LimitExceededException when the client tries to get the counters for a completed job:

{noformat}
2015-07-17 18:18:11,830 INFO [main]: counters.Limits (Limits.java:ensureInitialized(59)) - Counter limits initialized with parameters: GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=1200
2015-07-17 18:18:11,841 ERROR [main]: exec.Task (TezTask.java:execute(189)) - Failed to execute tez graph.
org.apache.tez.common.counters.LimitExceededException: Too many counters: 1201 max=1200
    at org.apache.tez.common.counters.Limits.checkCounters(Limits.java:87)
    at org.apache.tez.common.counters.Limits.incrCounters(Limits.java:94)
    at org.apache.tez.common.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:76)
    at org.apache.tez.common.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:93)
    at org.apache.tez.common.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:104)
    at org.apache.tez.dag.api.DagTypeConverters.convertTezCountersFromProto(DagTypeConverters.java:567)
    at org.apache.tez.dag.api.client.DAGStatus.getDAGCounters(DAGStatus.java:148)
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:175)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1673)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1432)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1213)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1064)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
    at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
    at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
{noformat}

It looks like Limits.ensureInitialized() is defaulting to an empty configuration, resulting in COUNTERS_MAX being set to the default of 1200 (even though Hive's configuration specified tez.counters.max=16000).

Per [~sseth]:
{quote}
I think the Tez client does need to make this call to setup the Configuration correctly. We do this for the AM and the executing task - which is why it works. Could you please open a Tez jira for this ?
Also, Limits is making use of Configuration instead of TezConfiguration for default initialization, which implies changes to tez-site on the local node won't be picked up.
{quote}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
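The failure mode described here, a lazily initialized limit that falls back to defaults when nobody hands it the caller's configuration, can be sketched in a few lines. This is not the Tez Limits class: the class below is hypothetical, uses a plain Map in place of a Hadoop Configuration, and only borrows the key tez.counters.max, the default of 1200 and the message format from the log above.

```java
import java.util.Collections;
import java.util.Map;

// Illustrative sketch only: a hypothetical stand-in for a lazily
// initialized counter limit. Not the Tez implementation.
class LimitsSketch {
    static final String MAX_KEY = "tez.counters.max";
    static final int DEFAULT_MAX = 1200;

    private static Map<String, String> conf; // stays null unless the caller sets it
    private static boolean initialized;
    private static int maxCounters;

    // Must be called before the first check for the setting to take effect.
    static synchronized void setConfiguration(Map<String, String> c) {
        if (!initialized && c != null) {
            conf = c;
        }
    }

    // Falls back to an empty configuration, and therefore to the default,
    // when setConfiguration was never called.
    static synchronized void ensureInitialized() {
        if (initialized) {
            return;
        }
        Map<String, String> effective =
                (conf != null) ? conf : Collections.<String, String>emptyMap();
        String raw = effective.get(MAX_KEY);
        maxCounters = (raw != null) ? Integer.parseInt(raw) : DEFAULT_MAX;
        initialized = true;
    }

    static synchronized void checkCounters(int size) {
        ensureInitialized();
        if (size > maxCounters) {
            throw new IllegalStateException(
                    "Too many counters: " + size + " max=" + maxCounters);
        }
    }

    // Test helper: forget any previous initialization.
    static synchronized void reset() {
        conf = null;
        initialized = false;
    }

    public static void main(String[] args) {
        setConfiguration(Collections.singletonMap(MAX_KEY, "16000"));
        checkCounters(1201); // passes because the configured limit was picked up
        System.out.println("1201 counters accepted with " + MAX_KEY + "=16000");
    }
}
```

In this sketch, skipping the setConfiguration call reproduces the "Too many counters: 1201 max=1200" message even when the caller's own settings specify a larger limit, which matches the diagnosis quoted above.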
[jira] [Updated] (TEZ-2629) LimitExceededException in Tez client when DAG has exceeds the default max counters
[ https://issues.apache.org/jira/browse/TEZ-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-2629: Summary: LimitExceededException in Tez client when DAG has exceeds the default max counters (was: LimitExceededException in Tez client when DAG has exceeds the default max) LimitExceededException in Tez client when DAG has exceeds the default max counters -- Key: TEZ-2629 URL: https://issues.apache.org/jira/browse/TEZ-2629 Project: Apache Tez Issue Type: Bug Affects Versions: 0.5.0, 0.6.0, 0.7.0 Reporter: Jason Dere Assignee: Siddharth Seth Attachments: TEZ-2629.1.txt, TEZ-2629_branch05_1.txt, TEZ-2629_branch06_1.txt, TEZ-2629_branch07_1.txt Original issue was HIVE-11303, seeing LimitExceededException when the client tries to get the counters for a completed job: {noformat} 2015-07-17 18:18:11,830 INFO [main]: counters.Limits (Limits.java:ensureInitialized(59)) - Counter limits initialized with parameters: GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=1200 2015-07-17 18:18:11,841 ERROR [main]: exec.Task (TezTask.java:execute(189)) - Failed to execute tez graph. 
org.apache.tez.common.counters.LimitExceededException: Too many counters: 1201 max=1200 at org.apache.tez.common.counters.Limits.checkCounters(Limits.java:87) at org.apache.tez.common.counters.Limits.incrCounters(Limits.java:94) at org.apache.tez.common.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:76) at org.apache.tez.common.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:93) at org.apache.tez.common.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:104) at org.apache.tez.dag.api.DagTypeConverters.convertTezCountersFromProto(DagTypeConverters.java:567) at org.apache.tez.dag.api.client.DAGStatus.getDAGCounters(DAGStatus.java:148) at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:175) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1673) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1432) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1213) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1064) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} It looks like Limits.ensureInitialized() is defaulting to an empty configuration, resulting in COUNTERS_MAX being set to the default of 1200 (even though Hive's configuration specified tez.counters.max=16000). Per [~sseth]: {quote} I think the Tez client does need to make this call to setup the Configuration correctly. We do this for the AM and the executing task - which is why it works. Could you please open a Tez jira for this ? Also, Limits is making use of Configuration instead of TezConfiguration for default initialization, which implies changes to tez-site on the local node won't be picked up. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
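The initialization-order problem described above can be illustrated with a small stand-alone sketch. It is not the Tez `Limits` class: `CounterLimitsSketch`, `setConfiguration`, `resetForDemo` and the use of a plain `Map` in place of a Hadoop/Tez configuration object are assumptions made for illustration. Only the key `tez.counters.max`, the default of 1200 and the configured value of 16000 come from the report.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical stand-in for a statically initialized limits holder.
final class CounterLimitsSketch {
    static final String COUNTERS_MAX_KEY = "tez.counters.max"; // key named in the report
    static final int COUNTERS_MAX_DEFAULT = 1200;              // default named in the report

    private static boolean initialized;
    private static int countersMax;

    // Explicit initialization; only the first call takes effect.
    static synchronized void setConfiguration(Map<String, String> conf) {
        if (!initialized && conf != null) {
            String raw = conf.get(COUNTERS_MAX_KEY);
            countersMax = (raw == null) ? COUNTERS_MAX_DEFAULT : Integer.parseInt(raw);
            initialized = true;
        }
    }

    // Lazy fallback: with no earlier setConfiguration call, an empty
    // configuration is used, so the default limit silently applies.
    static synchronized void ensureInitialized() {
        if (!initialized) {
            setConfiguration(Collections.<String, String>emptyMap());
        }
    }

    static synchronized void checkCounters(int size) {
        ensureInitialized();
        if (size > countersMax) {
            throw new IllegalStateException(
                "Too many counters: " + size + " max=" + countersMax);
        }
    }

    static synchronized int getCountersMax() {
        ensureInitialized();
        return countersMax;
    }

    // Demo-only helper so both orderings can be shown in one process.
    static synchronized void resetForDemo() {
        initialized = false;
        countersMax = 0;
    }

    public static void main(String[] args) {
        // Client never hands over its configuration: the default wins.
        System.out.println(getCountersMax()); // prints 1200

        // Client hands over its configuration before first use.
        resetForDemo();
        setConfiguration(Collections.singletonMap(COUNTERS_MAX_KEY, "16000"));
        checkCounters(1201); // no exception with the larger limit
        System.out.println(getCountersMax()); // prints 16000
    }
}
```

In this sketch the fix is purely a matter of ordering: whichever side reads counters has to pass its configuration in before the first lazy initialization happens.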
[jira] [Updated] (TEZ-2629) LimitExceededException in Tez client when DAG has exceeds the default max
[ https://issues.apache.org/jira/browse/TEZ-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-2629: Affects Version/s: 0.6.0 0.7.0 LimitExceededException in Tez client when DAG has exceeds the default max - Key: TEZ-2629 URL: https://issues.apache.org/jira/browse/TEZ-2629 Project: Apache Tez Issue Type: Bug Affects Versions: 0.5.0, 0.6.0, 0.7.0 Reporter: Jason Dere Assignee: Siddharth Seth Attachments: TEZ-2629.1.txt, TEZ-2629_branch05_1.txt, TEZ-2629_branch06_1.txt, TEZ-2629_branch07_1.txt Original issue was HIVE-11303, seeing LimitExceededException when the client tries to get the counters for a completed job: {noformat} 2015-07-17 18:18:11,830 INFO [main]: counters.Limits (Limits.java:ensureInitialized(59)) - Counter limits initialized with parameters: GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=1200 2015-07-17 18:18:11,841 ERROR [main]: exec.Task (TezTask.java:execute(189)) - Failed to execute tez graph. org.apache.tez.common.counters.LimitExceededException: Too many counters: 1201 max=1200 at org.apache.tez.common.counters.Limits.checkCounters(Limits.java:87) at org.apache.tez.common.counters.Limits.incrCounters(Limits.java:94) at org.apache.tez.common.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:76) at org.apache.tez.common.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:93) at org.apache.tez.common.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:104) at org.apache.tez.dag.api.DagTypeConverters.convertTezCountersFromProto(DagTypeConverters.java:567) at org.apache.tez.dag.api.client.DAGStatus.getDAGCounters(DAGStatus.java:148) at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:175) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1673) at 
org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1432) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1213) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1064) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} It looks like Limits.ensureInitialized() is defaulting to an empty configuration, resulting in COUNTERS_MAX being set to the default of 1200 (even though Hive's configuration specified tez.counters.max=16000). Per [~sseth]: {quote} I think the Tez client does need to make this call to setup the Configuration correctly. We do this for the AM and the executing task - which is why it works. Could you please open a Tez jira for this ? Also, Limits is making use of Configuration instead of TezConfiguration for default initialization, which implies changes to tez-site on the local node won't be picked up. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2733) Add information about container assignment to attempts
Bikas Saha created TEZ-2733: --- Summary: Add information about container assignment to attempts Key: TEZ-2733 URL: https://issues.apache.org/jira/browse/TEZ-2733 Project: Apache Tez Issue Type: Sub-task Reporter: Bikas Saha Assignee: Bikas Saha -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2690) Add critical path analyser
[ https://issues.apache.org/jira/browse/TEZ-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-2690: Attachment: TEZ-2690.3.patch Add critical path analyser -- Key: TEZ-2690 URL: https://issues.apache.org/jira/browse/TEZ-2690 Project: Apache Tez Issue Type: Sub-task Reporter: Bikas Saha Assignee: Bikas Saha Attachments: TEZ-2690.1.patch, TEZ-2690.2.patch, TEZ-2690.3.patch, criticalPath.jpg, dag_1439860407967_0030_1.svg Use input and scheduling dependencies to create critical path for a DAG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2628) History logging plugin to write ATS events to HDFS
[ https://issues.apache.org/jira/browse/TEZ-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated TEZ-2628: Attachment: TEZ-2628.002.patch Yes, there's a problem with retention on a secure cluster. The timeline server does not have write permissions to the application directory, which prevents it from removing the app directory tree when it's time to remove it from the done directory. Updating the original patch with a quick fix that provides group write permissions on the app directory. History logging plugin to write ATS events to HDFS -- Key: TEZ-2628 URL: https://issues.apache.org/jira/browse/TEZ-2628 Project: Apache Tez Issue Type: Improvement Reporter: Jason Lowe Assignee: Jason Lowe Attachments: TEZ-2628.001.patch, TEZ-2628.002.patch, hive-timeline.json This provides another history logging alternative that is conceptually the same as the timeline logging service but logs the entities to a file rather than posting the events to the timeline server directly. When coupled with the timeline store plugin from YARN-3942, it allows the Tez job to be decoupled from the timeline server while the Tez UI can still function properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: TEZ-2690 PreCommit Build #1008
Jira: https://issues.apache.org/jira/browse/TEZ-2690 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1008/ ### ## LAST 60 LINES OF THE CONSOLE ### Started by remote host 0:0:0:0:0:0:0:1 Building remotely on H5 (Mapreduce Falcon Hadoop Pig Zookeeper Tez Hdfs) in workspace /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository git config remote.origin.url https://git-wip-us.apache.org/repos/asf/tez.git # timeout=10 FATAL: Failed to fetch from https://git-wip-us.apache.org/repos/asf/tez.git hudson.plugins.git.GitException: Failed to fetch from https://git-wip-us.apache.org/repos/asf/tez.git at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:647) at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:889) at hudson.plugins.git.GitSCM.checkout(GitSCM.java:914) at hudson.model.AbstractProject.checkout(AbstractProject.java:1252) at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:615) at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86) at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:524) at hudson.model.Run.execute(Run.java:1706) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:232) Caused by: hudson.plugins.git.GitException: Error performing git command at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1444) at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1411) at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1407) at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1110) at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1120) at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.setRemoteUrl(CliGitAPIImpl.java:832) at hudson.plugins.git.GitAPI.setRemoteUrl(GitAPI.java:120) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:310) at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:290) at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:249) at hudson.remoting.UserRequest.perform(UserRequest.java:118) at hudson.remoting.UserRequest.perform(UserRequest.java:48) at hudson.remoting.Request$2.run(Request.java:328) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:714) at hudson.Proc$LocalProc.init(Proc.java:276) at hudson.Proc$LocalProc.init(Proc.java:216) at hudson.Launcher$LocalLauncher.launch(Launcher.java:780) at hudson.Launcher$ProcStarter.start(Launcher.java:360) at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1431) ... 21 more ### ## FAILED TESTS (if any) ## No tests ran.
[jira] [Commented] (TEZ-2687) ATS History shutdown happens before the min-held containers are released
[ https://issues.apache.org/jira/browse/TEZ-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706258#comment-14706258 ] Jeff Zhang commented on TEZ-2687: - [~hitesh], Thanks for clarification. Committed TEZ-2687-7.patch to 0.5/0.6/0.7/master ATS History shutdown happens before the min-held containers are released Key: TEZ-2687 URL: https://issues.apache.org/jira/browse/TEZ-2687 Project: Apache Tez Issue Type: Bug Affects Versions: 0.6.2, 0.8.0, 0.7.1 Reporter: Gopal V Assignee: Jeff Zhang Attachments: TEZ-2687-1.patch, TEZ-2687-2.patch, TEZ-2687-3.patch, TEZ-2687-4.patch, TEZ-2687-6.patch, TEZ-2687-7.patch When ATS goes into a GC pause under heavy loads and while it recovers, each Tez AM holds onto a few containers even though it is shutting down and will never accept any more DAGs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2164) Shade the guava version used by Tez
[ https://issues.apache.org/jira/browse/TEZ-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705742#comment-14705742 ] Hitesh Shah commented on TEZ-2164: -- [~fs111] [~cchepelov] Any comments on this approach? Shade the guava version used by Tez --- Key: TEZ-2164 URL: https://issues.apache.org/jira/browse/TEZ-2164 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth Assignee: Hitesh Shah Priority: Critical Attachments: TEZ-2164.3.patch, TEZ-2164.wip.2.patch, allow-guava-16.0.1.patch Should allow us to upgrade to a newer version without shipping a guava dependency. Would be good to do this in 0.7 so that we stop shipping guava as early as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2164) Shade the guava version used by Tez
[ https://issues.apache.org/jira/browse/TEZ-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705740#comment-14705740 ] Hitesh Shah commented on TEZ-2164: -- bq. Should guava-tez reside in tez-tools, or some such sub-module. Kept it separate as it is not meant to be built each time and therefore outside of the main build tree. Shade the guava version used by Tez --- Key: TEZ-2164 URL: https://issues.apache.org/jira/browse/TEZ-2164 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth Assignee: Hitesh Shah Priority: Critical Attachments: TEZ-2164.3.patch, TEZ-2164.wip.2.patch, allow-guava-16.0.1.patch Should allow us to upgrade to a newer version without shipping a guava dependency. Would be good to do this in 0.7 so that we stop shipping guava as early as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2731) Fix Tez GenericCounter performance bottleneck
[ https://issues.apache.org/jira/browse/TEZ-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705660#comment-14705660 ] Gopal V commented on TEZ-2731: -- [~rajesh.balamohan]: can you review? Fix Tez GenericCounter performance bottleneck - Key: TEZ-2731 URL: https://issues.apache.org/jira/browse/TEZ-2731 Project: Apache Tez Issue Type: Sub-task Affects Versions: 0.6.0, 0.7.0, 0.8.0 Reporter: Gopal V Assignee: Gopal V Attachments: TEZ-2731.1.patch, atomic-long-cntr.png, lock-inc.png, mr-reader-next.png GenericCounter::increment(1) shows up as a ~16% performance penalty inside the unvectorized codepath of Hive queries. The vectorized codepath amortizes this entirely by running through that exactly once every 1024 rows; the performance improvement is dramatic. !lock-inc.png! !mr-reader-next.png! Optimize the GenericCounter impl for mostly uncontested atomic operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
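As a rough illustration of what "mostly uncontested atomic operations" can look like, here is a minimal sketch of a counter backed by `java.util.concurrent.atomic.AtomicLong` instead of a lock. It is not the TEZ-2731 patch and not the real `GenericCounter`; the class name `AtomicCounterSketch` and the `runWorkers` helper are hypothetical and exist only for this example.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counter; the real GenericCounter API is not reproduced here.
final class AtomicCounterSketch {
    private final String name;
    private final AtomicLong value = new AtomicLong();

    AtomicCounterSketch(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    long getValue() {
        return value.get();
    }

    // Lock-free add; cheap when a single thread does most of the updates.
    void increment(long delta) {
        value.addAndGet(delta);
    }

    // Demo helper: several threads bump one counter, then the total is returned.
    static long runWorkers(int threads, int incrementsPerThread) {
        AtomicCounterSketch counter = new AtomicCounterSketch("RECORDS");
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < incrementsPerThread; n++) {
                    counter.increment(1);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.getValue();
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4, 100000)); // prints 400000
    }
}
```

A caller that processes rows in batches could also call `increment(batchSize)` once per batch, which is the amortization the description attributes to the vectorized path.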
[jira] [Commented] (TEZ-2164) Shade the guava version used by Tez
[ https://issues.apache.org/jira/browse/TEZ-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705642#comment-14705642 ] Siddharth Seth commented on TEZ-2164: - If we're taking this approach, I think we'll have to publish the guava-tez jar into a repository. Changing the build step to first compile guava-tez and then run the mvn install command would be a terrible experience. This also means any project which depends on Tez will end up seeing two versions of Guava classes - which can lead to accidental usage of the tez version. I'm not sure about this, but we may be able to continue depending on guava, and set the dependency to optional - so that downstream components do not automatically get the dependency. Don't think it's possible to set this for guava-tez though. Should guava-tez reside in tez-tools, or some such sub-package. Shade the guava version used by Tez --- Key: TEZ-2164 URL: https://issues.apache.org/jira/browse/TEZ-2164 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth Assignee: Hitesh Shah Priority: Critical Attachments: TEZ-2164.3.patch, TEZ-2164.wip.2.patch, allow-guava-16.0.1.patch Should allow us to upgrade to a newer version without shipping a guava dependency. Would be good to do this in 0.7 so that we stop shipping guava as early as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-2164) Shade the guava version used by Tez
[ https://issues.apache.org/jira/browse/TEZ-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705642#comment-14705642 ] Siddharth Seth edited comment on TEZ-2164 at 8/20/15 7:50 PM: -- If we're taking this approach, I think we'll have to publish the guava-tez jar into a repository. Changing the build step to first compile guava-tez and then run the mvn install command would be a terrible experience. This also means any project which depends on Tez will end up seeing two versions of Guava classes - which can lead to accidental usage of the tez version. I'm not sure about this, but we may be able to continue depending on guava, and set the dependency to optional - so that downstream components do not automatically get the dependency. Don't think it's possible to set this for guava-tez though. Should guava-tez reside in tez-tools, or some such sub-module. was (Author: sseth): If we're taking this approach, I think we'll have to publish the guava-tez jar into a repository. Changing the build step to first compile guava-tez and then run the mvn install command would be a terrible experience. This also means any project which depends on Tez will end up seeing two versions of Guava classes - which can lead to accidental usage of the tez version. I'm not sure about this, but we may be able to continue depending on guava, and set the dependency to optional - so that downstream components do not automatically get the dependency. Don't think it's possible to set this for guava-tez though. Should guava-tez reside in tez-tools, or some such sub-package. Shade the guava version used by Tez --- Key: TEZ-2164 URL: https://issues.apache.org/jira/browse/TEZ-2164 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth Assignee: Hitesh Shah Priority: Critical Attachments: TEZ-2164.3.patch, TEZ-2164.wip.2.patch, allow-guava-16.0.1.patch Should allow us to upgrade to a newer version without shipping a guava dependency. 
Would be good to do this in 0.7 so that we stop shipping guava as early as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (TEZ-2731) Fix Tez GenericCounter performance bottleneck
[ https://issues.apache.org/jira/browse/TEZ-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V reassigned TEZ-2731: Assignee: Gopal V Fix Tez GenericCounter performance bottleneck - Key: TEZ-2731 URL: https://issues.apache.org/jira/browse/TEZ-2731 Project: Apache Tez Issue Type: Sub-task Reporter: Gopal V Assignee: Gopal V Attachments: atomic-long-cntr.png, lock-inc.png, mr-reader-next.png GenericCounter::increment(1) shows up as a ~16% performance penalty inside the unvectorized codepath of Hive queries. The vectorized codepath amortizes this entirely by running through that exactly once every 1024 rows; the performance improvement is dramatic. !lock-inc.png! !mr-reader-next.png! Optimize the GenericCounter impl for mostly uncontested atomic operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2731) Fix Tez GenericCounter performance bottleneck
[ https://issues.apache.org/jira/browse/TEZ-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated TEZ-2731: - Attachment: atomic-long-cntr.png Fix Tez GenericCounter performance bottleneck - Key: TEZ-2731 URL: https://issues.apache.org/jira/browse/TEZ-2731 Project: Apache Tez Issue Type: Sub-task Reporter: Gopal V Assignee: Gopal V Attachments: atomic-long-cntr.png, lock-inc.png, mr-reader-next.png GenericCounter::increment(1) shows up as a ~16% performance penalty inside the unvectorized codepath of Hive queries. The vectorized codepath amortizes this entirely by running through that exactly once every 1024 rows; the performance improvement is dramatic. !lock-inc.png! !mr-reader-next.png! Optimize the GenericCounter impl for mostly uncontested atomic operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2731) Fix Tez GenericCounter performance bottleneck
[ https://issues.apache.org/jira/browse/TEZ-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated TEZ-2731: - Attachment: TEZ-2731.1.patch Fix Tez GenericCounter performance bottleneck - Key: TEZ-2731 URL: https://issues.apache.org/jira/browse/TEZ-2731 Project: Apache Tez Issue Type: Sub-task Reporter: Gopal V Assignee: Gopal V Attachments: TEZ-2731.1.patch, atomic-long-cntr.png, lock-inc.png, mr-reader-next.png GenericCounter::increment(1) shows up as a ~16% performance penalty inside the unvectorized codepath of Hive queries. The vectorized codepath amortizes this entirely by running through that exactly once every 1024 rows; the performance improvement is dramatic. !lock-inc.png! !mr-reader-next.png! Optimize the GenericCounter impl for mostly uncontested atomic operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2732) DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers
[ https://issues.apache.org/jira/browse/TEZ-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2732: - Assignee: Rajesh Balamohan DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers --- Key: TEZ-2732 URL: https://issues.apache.org/jira/browse/TEZ-2732 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan {noformat} kvbuffer.length = 2146435072 (2047 MB) Corner case: bufIndex=2026133899, kvbidx=523629312. distkvi = mod - i + j = 2146435072 - 2026133899 + 523629312 = 643930485 newPos = (2026133899 + (max(.., min(643930485/2, 271128624)))) (This would overflow) {noformat} Would be good to restrict the max allowed sort buffer to 1800 MB instead of 2047 MB. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
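The arithmetic in the corner case above can be replayed in a few lines of Java. This is only a sketch, not the DefaultSorter code: the class and method names are hypothetical, the numbers are copied from the report, and it assumes that the elided first argument of `max(..)` is no larger than the `min(...)` term, so that `max` returns 271128624. Under that assumption the 32-bit sum wraps to a negative value, while the same sum in 64-bit arithmetic stays inside the buffer.

```java
// Hypothetical demo class; values come from the issue report.
final class SortBufferOverflowDemo {
    static final int KVBUFFER_LENGTH = 2146435072; // 2047 MB
    static final int BUF_INDEX = 2026133899;
    static final int KVBIDX = 523629312;
    static final int SECOND_MIN_TERM = 271128624;

    // Forward distance from i to j in a circular buffer of size mod.
    static int distanceTo(int i, int j, int mod) {
        return i <= j ? j - i : mod - i + j;
    }

    // Assumes the elided max(..) argument is smaller than this value.
    static int step() {
        int distkvi = distanceTo(BUF_INDEX, KVBIDX, KVBUFFER_LENGTH); // 643930485
        return Math.min(distkvi / 2, SECOND_MIN_TERM);                // 271128624
    }

    // 32-bit addition: 2026133899 + 271128624 exceeds Integer.MAX_VALUE and wraps.
    static int newPosWithIntMath() {
        return (BUF_INDEX + step()) % KVBUFFER_LENGTH;
    }

    // Same computation with the sum widened to long before the modulo.
    static int newPosWithLongMath() {
        return (int) (((long) BUF_INDEX + step()) % KVBUFFER_LENGTH);
    }

    public static void main(String[] args) {
        System.out.println(newPosWithIntMath());  // prints -1997704773
        System.out.println(newPosWithLongMath()); // prints 150827451
    }
}
```

A negative position like the first one is the kind of value that would surface as an array index exception when used to address the buffer. Widening to `long` is one possible remedy shown for contrast; the report itself only suggests capping the buffer size.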
[jira] [Updated] (TEZ-2731) Fix Tez GenericCounter performance bottleneck
[ https://issues.apache.org/jira/browse/TEZ-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated TEZ-2731: - Attachment: lock-inc.png mr-reader-next.png Fix Tez GenericCounter performance bottleneck - Key: TEZ-2731 URL: https://issues.apache.org/jira/browse/TEZ-2731 Project: Apache Tez Issue Type: Improvement Reporter: Gopal V Attachments: lock-inc.png, mr-reader-next.png GenericCounter::increment(1) shows up as a ~16% performance penalty inside the unvectorized codepath of Hive queries. The vectorized codepath amortizes this entirely by running through that exactly once every 1024 rows; the performance improvement is dramatic. Optimize the GenericCounter impl for mostly uncontested atomic operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)