[jira] [Commented] (HIVE-4103) Remove System.gc() call from the map-join local-task loop

2013-04-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637709#comment-13637709
 ] 

Hudson commented on HIVE-4103:
--

Integrated in Hive-trunk-h0.21 #2073 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2073/])
HIVE-4103 : Remove System.gc() call from the map-join local-task loop 
(Gopal V via Ashutosh Chauhan) (Revision 1470227)

 Result = FAILURE
hashutosh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1470227
Files : 
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HashMapWrapper.java


 Remove System.gc() call from the map-join local-task loop
 -

 Key: HIVE-4103
 URL: https://issues.apache.org/jira/browse/HIVE-4103
 Project: Hive
  Issue Type: Bug
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Fix For: 0.12.0

 Attachments: HIVE-4103.patch


 Hive's HashMapWrapper calls System.gc() twice within 
 HashMapWrapper::isAbort(), which produces a significant slow-down during the 
 loop.
 {code}
 2013-03-01 04:54:28 The gc calls took 677 ms
 2013-03-01 04:54:28 Processing rows:20  Hashtable size: 19  Memory usage:   62955432  rate:   0.033
 2013-03-01 04:54:31 The gc calls took 956 ms
 2013-03-01 04:54:31 Processing rows:30  Hashtable size: 29  Memory usage:   90826656  rate:   0.048
 2013-03-01 04:54:33 The gc calls took 967 ms
 2013-03-01 04:54:33 Processing rows:384160  Hashtable size: 384160  Memory usage:   114412712   rate:   0.06
 {code}
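
 For illustration only, the kind of check described above might look roughly like the sketch below. It is a hypothetical reconstruction, not the actual HashMapWrapper code: the class name MemoryCheckSketch, the method shouldAbort, the forceGc flag and the 0.90 threshold are all assumptions. It reads heap usage through the standard java.lang.management.MemoryMXBean and times the two optional System.gc() calls, which is where the "gc calls took N ms" cost would come from.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical sketch: names and the threshold are illustrative assumptions,
// not taken from the Hive source.
final class MemoryCheckSketch {
    private final MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
    private final long maxMemory;
    private final double threshold;

    MemoryCheckSketch(double threshold) {
        long max = memoryBean.getHeapMemoryUsage().getMax();
        // getMax() returns -1 when the maximum is undefined; fall back to Runtime.
        this.maxMemory = max > 0 ? max : Runtime.getRuntime().maxMemory();
        this.threshold = threshold;
    }

    /** Fraction of the maximum heap currently in use (may include uncollected garbage). */
    double usageRate() {
        long used = memoryBean.getHeapMemoryUsage().getUsed();
        return (double) used / (double) maxMemory;
    }

    /** Returns true when the usage rate exceeds the threshold. */
    boolean shouldAbort(boolean forceGc) {
        if (forceGc) {
            long start = System.nanoTime();
            System.gc(); // only a hint to the JVM, but usually a full collection
            System.gc();
            long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
            System.out.println("The gc calls took " + elapsedMs + " ms");
        }
        return usageRate() > threshold;
    }

    public static void main(String[] args) {
        MemoryCheckSketch check = new MemoryCheckSketch(0.90);
        System.out.println("abort (with gc): " + check.shouldAbort(true));
        System.out.println("abort (no gc):   " + check.shouldAbort(false));
    }
}
```

 Calling shouldAbort(false) skips the collections entirely, so the check costs only a heap-usage read, at the price of a reading that may still include garbage.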

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4103) Remove System.gc() call from the map-join local-task loop

2013-04-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637467#comment-13637467
 ] 

Hudson commented on HIVE-4103:
--

Integrated in Hive-trunk-hadoop2 #168 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/168/])
HIVE-4103 : Remove System.gc() call from the map-join local-task loop 
(Gopal V via Ashutosh Chauhan) (Revision 1470227)

 Result = FAILURE
hashutosh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1470227
Files : 
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HashMapWrapper.java




[jira] [Commented] (HIVE-4103) Remove System.gc() call from the map-join local-task loop

2013-04-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637109#comment-13637109
 ] 

Ashutosh Chauhan commented on HIVE-4103:


Thanks, Gunther, for running the experiments. A difference of 56 vs. 120 seconds is 
quite substantial. I agree, we should move ahead with the patch. 
+1



[jira] [Commented] (HIVE-4103) Remove System.gc() call from the map-join local-task loop

2013-04-18 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635782#comment-13635782
 ] 

Gunther Hagleitner commented on HIVE-4103:
--

I took some time to test the two versions of the code. I ran a number of 
map joins, ranging from small ones to ones at the limit and finally over the limit. In 
summary: without the gc calls we overestimate the used memory very slightly. 
The biggest error I've seen is ~1%. The errors, by the way, always make the 
estimates more conservative, never less. The performance benefit, on the other 
hand, is quite substantial: on the largest run the time went from 120s to 56s 
with Gopal's patch. I think we should move forward with this.
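
As a rough cross-check of the numbers in this comment, the sketch below recomputes the overestimate, the logged rate and the speedup from figures copied out of the log excerpts of the largest run. The class and method names are hypothetical. The two excerpts come from separate runs, so the row-by-row comparison is only approximate, and the 120s figure is the rounded value quoted in the text.

```java
// Hypothetical helper for checking the figures quoted in this comment.
final class PatchComparison {
    /** Relative overestimate of a reading taken without GC against one taken after GC. */
    static double overestimate(long withoutGcBytes, long withGcBytes) {
        return (double) (withoutGcBytes - withGcBytes) / (double) withGcBytes;
    }

    /** The "rate" column: used memory as a fraction of maximum memory. */
    static double rate(long usedBytes, long maxBytes) {
        return (double) usedBytes / (double) maxBytes;
    }

    /** How many times faster the patched run is. */
    static double speedup(double beforeSeconds, double afterSeconds) {
        return beforeSeconds / afterSeconds;
    }

    public static void main(String[] args) {
        // {with patch (no gc calls), without patch (gc calls)} memory usage taken
        // from the 1st, 3rd and 7th progress lines of each excerpt.
        long[][] pairs = {
            {108807528L, 108779608L},
            {211033848L, 208667552L},
            {417989768L, 415059928L},
        };
        for (long[] p : pairs) {
            System.out.printf("overestimate: %.2f%%%n", 100.0 * overestimate(p[0], p[1]));
        }
        // Last progress line of the patched run against the reported maximum memory.
        System.out.printf("rate: %.3f%n", rate(848652056L, 1065484288L));
        System.out.printf("speedup: %.2fx%n", speedup(120.0, 56.264));
    }
}
```

All three sampled pairs show the patched reading slightly above the unpatched one, which matches the observation that skipping the gc calls errs on the conservative side.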

Largest run:

With Patch:

{noformat}
2013-04-18 05:29:36 Starting to launch local task to process map join;  maximum memory = 1065484288
2013-04-18 05:29:42 Processing rows:20  Hashtable size: 19  Memory usage:   108807528   rate:   0.102
2013-04-18 05:29:44 Processing rows:30  Hashtable size: 29  Memory usage:   158575416   rate:   0.149
2013-04-18 05:29:46 Processing rows:40  Hashtable size: 39  Memory usage:   211033848   rate:   0.198
2013-04-18 05:29:48 Processing rows:50  Hashtable size: 49  Memory usage:   260673400   rate:   0.245
2013-04-18 05:29:50 Processing rows:60  Hashtable size: 59  Memory usage:   310156256   rate:   0.291
2013-04-18 05:29:53 Processing rows:70  Hashtable size: 69  Memory usage:   359750536   rate:   0.338
2013-04-18 05:29:54 Processing rows:80  Hashtable size: 79  Memory usage:   417989768   rate:   0.392
2013-04-18 05:29:57 Processing rows:90  Hashtable size: 89  Memory usage:   460568536   rate:   0.432
2013-04-18 05:29:58 Processing rows:100 Hashtable size: 99  Memory usage:   510475320   rate:   0.479
2013-04-18 05:30:01 Processing rows:110 Hashtable size: 109 Memory usage:   559513584   rate:   0.525
2013-04-18 05:30:03 Processing rows:120 Hashtable size: 119 Memory usage:   609277088   rate:   0.572
2013-04-18 05:30:06 Processing rows:130 Hashtable size: 129 Memory usage:   659366968   rate:   0.619
2013-04-18 05:30:07 Processing rows:140 Hashtable size: 139 Memory usage:   708744832   rate:   0.665
2013-04-18 05:30:08 Processing rows:150 Hashtable size: 149 Memory usage:   758335688   rate:   0.712
2013-04-18 05:30:13 Processing rows:160 Hashtable size: 159 Memory usage:   825625224   rate:   0.775
2013-04-18 05:30:14 Processing rows:1646400 Hashtable size: 1646400 Memory usage:   848652056   rate:   0.796
2013-04-18 05:30:14 Dump the hashtable into file: file:/tmp/hrt_qa/hive_2013-04-18_17-29-31_517_1864473086199137375/-local-10002/HashTable-Stage-1/MapJoin-cd-11--.hashtable
2013-04-18 05:30:32 Upload 1 File to: file:/tmp/hrt_qa/hive_2013-04-18_17-29-31_517_1864473086199137375/-local-10002/HashTable-Stage-1/MapJoin-cd-11--.hashtable File size: 127593266
2013-04-18 05:30:32 End of local task; Time Taken: 56.264 sec.
{noformat}

Without patch:

{noformat}
2013-04-18 05:55:22 Starting to launch local task to process map join;  maximum memory = 1065484288
2013-04-18 05:55:29 Processing rows:20  Hashtable size: 19  Memory usage:   108779608   rate:   0.102
2013-04-18 05:55:33 Processing rows:30  Hashtable size: 29  Memory usage:   157203744   rate:   0.148
2013-04-18 05:55:37 Processing rows:40  Hashtable size: 39  Memory usage:   208667552   rate:   0.196
2013-04-18 05:55:42 Processing rows:50  Hashtable size: 49  Memory usage:   258126352   rate:   0.242
2013-04-18 05:55:46 Processing rows:60  Hashtable size: 59  Memory usage:   307734104   rate:   0.289
2013-04-18 05:55:51 Processing rows:70  Hashtable size: 69  Memory usage:   357043768   rate:   0.335
2013-04-18 05:55:57 Processing rows:80  Hashtable size: 79  Memory usage:   415059928   rate:   0.39
2013-04-18 05:56:04 Processing rows:90  Hashtable size: 89  Memory usage:   460135344   rate:   0.432
2013-04-18 05:56:10 Processing rows:100 Hashtable size: 99  Memory usage:   509690176   rate:   0.478
2013-04-18 05:56:18 Processing rows:110 Hashtable size: 109 Memory usage:   559042448   rate:   0.525
2013-04-18 05:56:25 Processing rows:120 Hashtable size: 119 Memory usage:   608652728   rate:   0.571
2013-04-18 05:56:33 Processing rows:130 Hashtable 

[jira] [Commented] (HIVE-4103) Remove System.gc() call from the map-join local-task loop

2013-03-22 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13611226#comment-13611226
 ] 

Ashutosh Chauhan commented on HIVE-4103:


[~gopalv] The System.gc() calls were made to get a better estimate of free memory, 
which is then used to decide whether to kill the task if it's using too much 
memory. I don't think it's safe to remove this call. As a downside, we may kill 
a local task thinking it's using too much memory, whereas in reality it could 
have gone on to completion.
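
The downside described here can be made concrete with a small sketch. All figures below are made up for illustration, and the class name, method name and threshold are hypothetical. The point is only that a reading taken without a prior collection counts garbage as used memory, so it can cross a kill threshold that the live data alone would not.

```java
// Hypothetical illustration with made-up numbers; not Hive code.
final class ConservativeEstimateSketch {
    /** True when used/max exceeds the kill threshold. */
    static boolean exceedsThreshold(long usedBytes, long maxBytes, double threshold) {
        return (double) usedBytes / (double) maxBytes > threshold;
    }

    public static void main(String[] args) {
        long maxBytes = 1_000_000_000L;   // assumed heap limit
        long liveBytes = 880_000_000L;    // assumed live data (what a reading after GC would see)
        long garbageBytes = 40_000_000L;  // assumed garbage not yet collected
        double threshold = 0.90;          // assumed kill threshold

        // Reading after a collection: only live data counts (0.88, below threshold).
        boolean killAfterGc = exceedsThreshold(liveBytes, maxBytes, threshold);
        // Reading without a collection: garbage is included (0.92, above threshold).
        boolean killWithoutGc = exceedsThreshold(liveBytes + garbageBytes, maxBytes, threshold);

        System.out.println("kill after gc:   " + killAfterGc);
        System.out.println("kill without gc: " + killWithoutGc);
    }
}
```

The error only ever goes in one direction: garbage can inflate the reading but never shrink it, so the worst case is a task killed unnecessarily, not one allowed to run out of memory.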



[jira] [Commented] (HIVE-4103) Remove System.gc() call from the map-join local-task loop

2013-03-22 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13611542#comment-13611542
 ] 

Brock Noland commented on HIVE-4103:


Agreed, it's bad practice in general to call System.gc(), but I think the issue 
here is that we are trying to decide whether it's possible to do a map join out 
on the cluster, and therefore we need a good estimate of the memory consumed? 



[jira] [Commented] (HIVE-4103) Remove System.gc() call from the map-join local-task loop

2013-03-01 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590988#comment-13590988
 ] 

Gopal V commented on HIVE-4103:
---

On one run, the difference was:

{code}
2013-03-01 04:57:21 Upload 1 File to: file:/tmp/root/hive_2013-03-01_16-56-53_785_1192800933446838868/-local-10002/HashTable-Stage-1/MapJoin-demographics-01--.hashtable File size: 18426794
2013-03-01 04:57:21 End of local task; Time Taken: 22.426 sec.
{code}

versus, after the fix:

{code}
2013-03-01 04:56:26 Upload 1 File to: file:/tmp/root/hive_2013-03-01_16-56-01_539_5116929752955084952/-local-10002/HashTable-Stage-1/MapJoin-demographics-01--.hashtable File size: 18426794
2013-03-01 04:56:26 End of local task; Time Taken: 19.874 sec.
{code}
