[jira] [Commented] (HIVE-5144) HashTableSink allocates empty new Object[] arrays & OOMs - use a static emptyRow instead

2013-08-28 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752349#comment-13752349 ]

Hudson commented on HIVE-5144:
--

FAILURE: Integrated in Hive-trunk-hadoop2 #386 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/386/])
HIVE-5144 : HashTableSink allocates empty new Object[] arrays & OOMs - use a
static emptyRow instead (Gopal V via Ashutosh Chauhan) (hashutosh:
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1517877)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java


 HashTableSink allocates empty new Object[] arrays & OOMs - use a static
 emptyRow instead
 

            Key: HIVE-5144
            URL: https://issues.apache.org/jira/browse/HIVE-5144
        Project: Hive
     Issue Type: Bug
     Components: Query Processor
    Environment: Ubuntu LXC + -Xmx512m client opts
       Reporter: Gopal V
       Assignee: Gopal V
       Priority: Minor
         Labels: perfomance
        Fix For: 0.12.0

    Attachments: HIVE-5144.01.patch, HIVE-5144.02.patch


 The map-join hash table sink in the local task builds an in-memory hash table
 with the following code:
 {code}
 Object[] value = JoinUtil.computeMapJoinValues(row, joinValues[alias],
 ...
 MapJoinRowContainer rowContainer = tableContainer.get(key);
 if (rowContainer == null) {
   rowContainer = new MapJoinRowContainer();
   rowContainer.add(value);
   ...
 }
 {code}
 But for a query where joinValues[alias].size() == 0, this results in a large
 number of unnecessary allocations, which would be better served by a
 copy-on-write default value container and a pre-allocated zero-length Object
 array; a zero-length array is immutable (the only immutable array there is in
 Java).
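 A minimal sketch of the idea in plain Java rather than Hive's own classes:
 EmptyRowSketch, computeValues and distinctValueArrays are hypothetical names,
 and a HashMap stands in for the map-join table container. When a row has no
 value columns, returning one shared static zero-length array means every key
 points at the same object instead of at its own freshly allocated empty array.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

public class EmptyRowSketch {
    // Hypothetical shared row: a zero-length array has no slots to write,
    // so one instance can safely back every empty row.
    static final Object[] EMPTY_ROW = new Object[0];

    // Hypothetical stand-in for computing the value columns of one row.
    static Object[] computeValues(Object[] row, int numValueColumns, boolean shareEmpty) {
        if (numValueColumns == 0) {
            // Per-row behavior allocates a new empty array; shared behavior reuses EMPTY_ROW.
            return shareEmpty ? EMPTY_ROW : new Object[0];
        }
        return Arrays.copyOf(row, numValueColumns);
    }

    // Builds a key -> values table and counts distinct array objects by identity.
    static int distinctValueArrays(int numKeys, boolean shareEmpty) {
        Map<Integer, Object[]> table = new HashMap<>();
        for (int key = 0; key < numKeys; key++) {
            Object[] row = {key};
            table.put(key, computeValues(row, 0, shareEmpty));
        }
        Map<Object[], Boolean> distinct = new IdentityHashMap<>();
        for (Object[] values : table.values()) {
            distinct.put(values, Boolean.TRUE);
        }
        return distinct.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctValueArrays(100_000, false)); // one empty array per key
        System.out.println(distinctValueArrays(100_000, true));  // a single shared array
    }
}
```

 With 100,000 keys, main prints 100000 followed by 1. How much heap each
 avoided empty array saves depends on the JVM's object header size, so treat
 the exact saving as JVM-specific.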
 The query tested is roughly the following, which scans all of
 customer_demographics in the hash sink:
 {code}
 select c_salutation, count(1)
   from customer
   JOIN customer_demographics
     ON customer.c_current_cdemo_sk = customer_demographics.cd_demo_sk
  group by c_salutation
  limit 10;
 {code}
 With current trunk, this query runs out of memory (OOM) with a 512 MB heap:
 {code}
 2013-08-23 05:11:26   Processing rows:140 Hashtable size: 139 
 Memory usage:   292418944   percentage: 0.579
 Execution failed with exit status: 3
 Obtaining error information
 {code}
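 Assuming (this is not stated in the issue) that "percentage" in the log is the
 used heap divided by the JVM's maximum heap, the two numbers imply a heap
 ceiling of roughly 482 MiB, which fits the -Xmx512m client opts, since the
 maximum a JVM reports can sit somewhat below -Xmx. A small sketch of that
 arithmetic; HeapUsageSketch and its methods are hypothetical helpers:

```java
public class HeapUsageSketch {
    // Assumption: the logged "percentage" is usedHeapBytes / maxHeapBytes.
    static double impliedMaxHeapBytes(long usedHeapBytes, double usedFraction) {
        return usedHeapBytes / usedFraction;
    }

    // Converts bytes to mebibytes (1 MiB = 1024 * 1024 bytes).
    static double toMiB(double bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long used = 292_418_944L; // "Memory usage" from the log line
        double fraction = 0.579;  // "percentage" from the log line
        double max = impliedMaxHeapBytes(used, fraction);
        System.out.printf("used: %.1f MiB%n", toMiB(used));
        System.out.printf("implied max heap: %.1f MiB%n", toMiB(max));
    }
}
```

 With these inputs it prints about 278.9 MiB used and an implied maximum of
 about 481.6 MiB. The percentage is rounded to three decimals, so the implied
 ceiling is only approximate.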

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5144) HashTableSink allocates empty new Object[] arrays & OOMs - use a static emptyRow instead

2013-08-27 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13751824#comment-13751824 ]

Hudson commented on HIVE-5144:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #73 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/73/])
HIVE-5144 : HashTableSink allocates empty new Object[] arrays & OOMs - use a
static emptyRow instead (Gopal V via Ashutosh Chauhan) (hashutosh:
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1517877)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java


[jira] [Commented] (HIVE-5144) HashTableSink allocates empty new Object[] arrays & OOMs - use a static emptyRow instead

2013-08-27 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752025#comment-13752025 ]

Hudson commented on HIVE-5144:
--

FAILURE: Integrated in Hive-trunk-hadoop1-ptest #141 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/141/])
HIVE-5144 : HashTableSink allocates empty new Object[] arrays & OOMs - use a
static emptyRow instead (Gopal V via Ashutosh Chauhan) (hashutosh:
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1517877)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java


[jira] [Commented] (HIVE-5144) HashTableSink allocates empty new Object[] arrays & OOMs - use a static emptyRow instead

2013-08-27 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752072#comment-13752072 ]

Hudson commented on HIVE-5144:
--

SUCCESS: Integrated in Hive-trunk-h0.21 #2294 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2294/])
HIVE-5144 : HashTableSink allocates empty new Object[] arrays & OOMs - use a
static emptyRow instead (Gopal V via Ashutosh Chauhan) (hashutosh:
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1517877)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java


[jira] [Commented] (HIVE-5144) HashTableSink allocates empty new Object[] arrays & OOMs - use a static emptyRow instead

2013-08-26 Thread Ashutosh Chauhan (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750182#comment-13750182 ]

Ashutosh Chauhan commented on HIVE-5144:


+1


[jira] [Commented] (HIVE-5144) HashTableSink allocates empty new Object[] arrays & OOMs - use a static emptyRow instead

2013-08-23 Thread Ashutosh Chauhan (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748839#comment-13748839 ]

Ashutosh Chauhan commented on HIVE-5144:


Nice findings, Gopal!

A couple of comments:
* Shall we initialize the empty row and container in the constructor?
* Also, mark these fields as transient.
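As a hedged illustration of what the transient modifier implies, the sketch
below uses Java's built-in java.io serialization, which may not be the
mechanism Hive uses to ship query plans, and SinkLike is a hypothetical class,
not HashTableSinkOperator. Under java.io serialization a transient field is
skipped, and the class's own constructor does not run on deserialization, so a
transient field that is set only in the constructor comes back as null and has
to be re-initialized afterwards. A static final field is class state and is
not affected either way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientFieldSketch {
    // Hypothetical operator-like class; not Hive's HashTableSinkOperator.
    static class SinkLike implements Serializable {
        private static final long serialVersionUID = 1L;

        // Class-level constant: never part of an instance's serialized state.
        static final Object[] SHARED_EMPTY_ROW = new Object[0];

        // Instance field skipped by Java serialization.
        transient Object[] emptyRow;

        SinkLike() {
            // Runs for "new SinkLike()", but not when an instance is deserialized.
            emptyRow = new Object[0];
        }
    }

    // Serializes and deserializes an object with java.io object streams.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SinkLike copy = (SinkLike) roundTrip(new SinkLike());
        System.out.println(copy.emptyRow == null);                 // true
        System.out.println(SinkLike.SHARED_EMPTY_ROW.length == 0); // true
    }
}
```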


[jira] [Commented] (HIVE-5144) HashTableSink allocates empty new Object[] arrays & OOMs - use a static emptyRow instead

2013-08-23 Thread Gopal V (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748846#comment-13748846 ]

Gopal V commented on HIVE-5144:
---

I followed the pattern from MapJoinKey and MapJoinRowContainer, which both use a
static final EMPTY_OBJECT_ARRAY.

Since the empty items are immutable, a single static instance can be shared
safely, so I followed the same pattern.
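The claim that a zero-length array is immutable can be checked directly: it has
no valid index, so any store into it throws ArrayIndexOutOfBoundsException, and
one static final instance can be handed out everywhere without defensive
copies. A minimal sketch; the class and method names are illustrative, and only
the field name mirrors the EMPTY_OBJECT_ARRAY constant mentioned above.

```java
public class ZeroLengthArraySketch {
    // Shared empty array in the static final style described above.
    static final Object[] EMPTY_OBJECT_ARRAY = new Object[0];

    // Returns true if storing into index 0 is rejected.
    // A zero-length array has no valid index, so every store throws.
    static boolean rejectsWrites(Object[] array) {
        try {
            array[0] = "value";
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsWrites(EMPTY_OBJECT_ARRAY)); // true
        System.out.println(rejectsWrites(new Object[1]));      // false
    }
}
```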


