[jira] [Commented] (HIVE-4103) Remove System.gc() call from the map-join local-task loop
[ https://issues.apache.org/jira/browse/HIVE-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637709#comment-13637709 ]

Hudson commented on HIVE-4103:

Integrated in Hive-trunk-h0.21 #2073 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2073/])
HIVE-4103 : Remove System.gc() call from the map-join local-task loop (Gopal V via Ashutosh Chauhan) (Revision 1470227)

Result = FAILURE
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1470227
Files :
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HashMapWrapper.java

Remove System.gc() call from the map-join local-task loop

Key: HIVE-4103
URL: https://issues.apache.org/jira/browse/HIVE-4103
Project: Hive
Issue Type: Bug
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
Fix For: 0.12.0
Attachments: HIVE-4103.patch

Hive's HashMapWrapper calls System.gc() twice within the HashMapWrapper::isAbort() which produces a significant slow-down during the loop.
{code}
2013-03-01 04:54:28 The gc calls took 677 ms
2013-03-01 04:54:28 Processing rows:20 Hashtable size: 19 Memory usage: 62955432 rate: 0.033
2013-03-01 04:54:31 The gc calls took 956 ms
2013-03-01 04:54:31 Processing rows:30 Hashtable size: 29 Memory usage: 90826656 rate: 0.048
2013-03-01 04:54:33 The gc calls took 967 ms
2013-03-01 04:54:33 Processing rows:384160 Hashtable size: 384160 Memory usage: 114412712 rate: 0.06
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4103) Remove System.gc() call from the map-join local-task loop
[ https://issues.apache.org/jira/browse/HIVE-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637467#comment-13637467 ]

Hudson commented on HIVE-4103:

Integrated in Hive-trunk-hadoop2 #168 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/168/])
HIVE-4103 : Remove System.gc() call from the map-join local-task loop (Gopal V via Ashutosh Chauhan) (Revision 1470227)

Result = FAILURE
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1470227
Files :
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HashMapWrapper.java
[jira] [Commented] (HIVE-4103) Remove System.gc() call from the map-join local-task loop
[ https://issues.apache.org/jira/browse/HIVE-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637109#comment-13637109 ]

Ashutosh Chauhan commented on HIVE-4103:

Thanks, Gunther, for running experiments. Difference of 56 vs 120 seconds is quite substantial. I agree, we should move ahead with the patch. +1
[jira] [Commented] (HIVE-4103) Remove System.gc() call from the map-join local-task loop
[ https://issues.apache.org/jira/browse/HIVE-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635782#comment-13635782 ]

Gunther Hagleitner commented on HIVE-4103:

I took some time to test the two versions of the code. I ran a number of map joins, ranging from small to at the limit and finally over the limit.

In summary: without the gc calls we overestimate the used memory very slightly. The biggest overestimate I've seen is ~1%. The errors always make the estimates more conservative, never less. The performance benefit, on the other hand, is quite substantial: on the largest run the local task went from 120s to 56s with Gopal's patch. I think we should move forward with this.

Largest run:

With patch:
{noformat}
2013-04-18 05:29:36 Starting to launch local task to process map join; maximum memory = 1065484288
2013-04-18 05:29:42 Processing rows:20 Hashtable size: 19 Memory usage: 108807528 rate: 0.102
2013-04-18 05:29:44 Processing rows:30 Hashtable size: 29 Memory usage: 158575416 rate: 0.149
2013-04-18 05:29:46 Processing rows:40 Hashtable size: 39 Memory usage: 211033848 rate: 0.198
2013-04-18 05:29:48 Processing rows:50 Hashtable size: 49 Memory usage: 260673400 rate: 0.245
2013-04-18 05:29:50 Processing rows:60 Hashtable size: 59 Memory usage: 310156256 rate: 0.291
2013-04-18 05:29:53 Processing rows:70 Hashtable size: 69 Memory usage: 359750536 rate: 0.338
2013-04-18 05:29:54 Processing rows:80 Hashtable size: 79 Memory usage: 417989768 rate: 0.392
2013-04-18 05:29:57 Processing rows:90 Hashtable size: 89 Memory usage: 460568536 rate: 0.432
2013-04-18 05:29:58 Processing rows:100 Hashtable size: 99 Memory usage: 510475320 rate: 0.479
2013-04-18 05:30:01 Processing rows:110 Hashtable size: 109 Memory usage: 559513584 rate: 0.525
2013-04-18 05:30:03 Processing rows:120 Hashtable size: 119 Memory usage: 609277088 rate: 0.572
2013-04-18 05:30:06 Processing rows:130 Hashtable size: 129 Memory usage: 659366968 rate: 0.619
2013-04-18 05:30:07 Processing rows:140 Hashtable size: 139 Memory usage: 708744832 rate: 0.665
2013-04-18 05:30:08 Processing rows:150 Hashtable size: 149 Memory usage: 758335688 rate: 0.712
2013-04-18 05:30:13 Processing rows:160 Hashtable size: 159 Memory usage: 825625224 rate: 0.775
2013-04-18 05:30:14 Processing rows:1646400 Hashtable size: 1646400 Memory usage: 848652056 rate: 0.796
2013-04-18 05:30:14 Dump the hashtable into file: file:/tmp/hrt_qa/hive_2013-04-18_17-29-31_517_1864473086199137375/-local-10002/HashTable-Stage-1/MapJoin-cd-11--.hashtable
2013-04-18 05:30:32 Upload 1 File to: file:/tmp/hrt_qa/hive_2013-04-18_17-29-31_517_1864473086199137375/-local-10002/HashTable-Stage-1/MapJoin-cd-11--.hashtable File size: 127593266
2013-04-18 05:30:32 End of local task; Time Taken: 56.264 sec.
{noformat}

Without patch:
{noformat}
2013-04-18 05:55:22 Starting to launch local task to process map join; maximum memory = 1065484288
2013-04-18 05:55:29 Processing rows:20 Hashtable size: 19 Memory usage: 108779608 rate: 0.102
2013-04-18 05:55:33 Processing rows:30 Hashtable size: 29 Memory usage: 157203744 rate: 0.148
2013-04-18 05:55:37 Processing rows:40 Hashtable size: 39 Memory usage: 208667552 rate: 0.196
2013-04-18 05:55:42 Processing rows:50 Hashtable size: 49 Memory usage: 258126352 rate: 0.242
2013-04-18 05:55:46 Processing rows:60 Hashtable size: 59 Memory usage: 307734104 rate: 0.289
2013-04-18 05:55:51 Processing rows:70 Hashtable size: 69 Memory usage: 357043768 rate: 0.335
2013-04-18 05:55:57 Processing rows:80 Hashtable size: 79 Memory usage: 415059928 rate: 0.39
2013-04-18 05:56:04 Processing rows:90 Hashtable size: 89 Memory usage: 460135344 rate: 0.432
2013-04-18 05:56:10 Processing rows:100 Hashtable size: 99 Memory usage: 509690176 rate: 0.478
2013-04-18 05:56:18 Processing rows:110 Hashtable size: 109 Memory usage: 559042448 rate: 0.525
2013-04-18 05:56:25 Processing rows:120 Hashtable size: 119 Memory usage: 608652728 rate: 0.571
2013-04-18 05:56:33 Processing rows:130 Hashtable
[jira] [Commented] (HIVE-4103) Remove System.gc() call from the map-join local-task loop
[ https://issues.apache.org/jira/browse/HIVE-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13611226#comment-13611226 ]

Ashutosh Chauhan commented on HIVE-4103:

[~gopalv] The System.gc() calls were made to get a better estimate of free memory, which is then used to decide whether to kill the task if it is using too much memory. I don't think it's safe to remove these calls. The downside is that we may kill a local task thinking it's using too much memory when, in reality, it could have gone on to completion.
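
The check being debated can be sketched roughly as follows. This is a minimal illustration, not the code in HashMapWrapper: the class name `MapJoinMemoryCheck`, its method names, and the 0.90 threshold are all hypothetical, and the formula rate = used memory / maximum memory is inferred from the logged numbers in this thread (for example 848652056 / 1065484288 is about 0.796). The sketch samples heap usage without forcing a collection first, so garbage that has not yet been collected is counted as used, which is why the estimate can only err on the conservative side.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.util.Locale;

// Hypothetical helper for illustration only; not Hive's HashMapWrapper.
final class MapJoinMemoryCheck {
    private static final MemoryMXBean MEMORY_BEAN = ManagementFactory.getMemoryMXBean();

    /** Fraction of the maximum memory that is in use (assumed formula: used / max). */
    static double usageRate(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        return (double) usedBytes / (double) maxBytes;
    }

    /** True when the usage rate exceeds the given threshold. */
    static boolean shouldAbort(long usedBytes, long maxBytes, double maxUsageRate) {
        return usageRate(usedBytes, maxBytes) > maxUsageRate;
    }

    /** Samples current heap usage as-is, without calling System.gc() beforehand. */
    static boolean shouldAbortNow(double maxUsageRate) {
        MemoryUsage heap = MEMORY_BEAN.getHeapMemoryUsage();
        // getMax() returns -1 when the maximum is undefined; fall back to the runtime limit.
        long max = heap.getMax() > 0 ? heap.getMax() : Runtime.getRuntime().maxMemory();
        return shouldAbort(heap.getUsed(), max, maxUsageRate);
    }

    public static void main(String[] args) {
        // Values taken from the last "Processing rows" line of the patched run in this thread.
        long used = 848652056L;
        long max = 1065484288L;
        System.out.println(String.format(Locale.ROOT, "rate: %.3f", usageRate(used, max)));
        // 0.90 is a placeholder threshold chosen for this example.
        System.out.println("abort: " + shouldAbort(used, max, 0.90));
    }
}
```

Keeping the arithmetic in `usageRate` and `shouldAbort` separate from the sampling in `shouldAbortNow` makes the decision testable against logged values without depending on the state of the running JVM.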
[jira] [Commented] (HIVE-4103) Remove System.gc() call from the map-join local-task loop
[ https://issues.apache.org/jira/browse/HIVE-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13611542#comment-13611542 ]

Brock Noland commented on HIVE-4103:

Agreed, it's bad practice in general to call System.gc(), but I think the issue here is that we are trying to decide whether it's possible to do a map join out on the cluster, and therefore we need a good estimate of the memory consumed?
[jira] [Commented] (HIVE-4103) Remove System.gc() call from the map-join local-task loop
[ https://issues.apache.org/jira/browse/HIVE-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590988#comment-13590988 ]

Gopal V commented on HIVE-4103:

On a run, the difference was
{code}
2013-03-01 04:57:21 Upload 1 File to: file:/tmp/root/hive_2013-03-01_16-56-53_785_1192800933446838868/-local-10002/HashTable-Stage-1/MapJoin-demographics-01--.hashtable File size: 18426794
2013-03-01 04:57:21 End of local task; Time Taken: 22.426 sec.
{code}
versus, after-fix
{code}
2013-03-01 04:56:26 Upload 1 File to: file:/tmp/root/hive_2013-03-01_16-56-01_539_5116929752955084952/-local-10002/HashTable-Stage-1/MapJoin-demographics-01--.hashtable File size: 18426794
2013-03-01 04:56:26 End of local task; Time Taken: 19.874 sec.
{code}
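
For readers who want to reproduce a figure like "The gc calls took 677 ms" on their own JVM, one way to time a pair of System.gc() calls is sketched below. The class `GcTiming` and its method are hypothetical and this is not the instrumentation used to produce the logs in the issue description. Note that System.gc() is only a request to the JVM, and with HotSpot's -XX:+DisableExplicitGC option it does nothing, so measured times depend heavily on heap size, collector, and flags.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical timing helper for illustration; not taken from the Hive patch.
final class GcTiming {
    /** Requests a collection the given number of times and returns the elapsed milliseconds. */
    static long timeGcCallsMillis(int calls) {
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            System.gc(); // a request only; the JVM may ignore it
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Two calls, matching the "twice within isAbort()" behaviour described in the issue.
        System.out.println("The gc calls took " + timeGcCallsMillis(2) + " ms");
    }
}
```

Because the cost is paid on every progress check in the loop, even sub-second pauses add up over a large hash table build.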