[jira] [Commented] (HIVE-5144) HashTableSink allocates empty new Object[] arrays OOMs - use a static emptyRow instead
[ https://issues.apache.org/jira/browse/HIVE-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752349#comment-13752349 ]

Hudson commented on HIVE-5144:

FAILURE: Integrated in Hive-trunk-hadoop2 #386 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/386/])
HIVE-5144 : HashTableSink allocates empty new Object[] arrays OOMs - use a static emptyRow instead (Gopal V via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1517877)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java

HashTableSink allocates empty new Object[] arrays OOMs - use a static emptyRow instead

Key: HIVE-5144
URL: https://issues.apache.org/jira/browse/HIVE-5144
Project: Hive
Issue Type: Bug
Components: Query Processor
Environment: Ubuntu LXC + -Xmx512m client opts
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
Labels: perfomance
Fix For: 0.12.0
Attachments: HIVE-5144.01.patch, HIVE-5144.02.patch

The map-join hashtable sink in the local task builds its in-memory hashtable with the following code:

{code}
Object[] value = JoinUtil.computeMapJoinValues(row, joinValues[alias], ...
MapJoinRowContainer rowContainer = tableContainer.get(key);
if (rowContainer == null) {
  rowContainer = new MapJoinRowContainer();
  rowContainer.add(value);
{code}

But for a query where joinValues[alias].size() == 0, this results in a large number of unnecessary allocations. Such rows would be better served by a copy-on-write default value container and a pre-allocated zero-length Object array, which is immutable (the only kind of immutable array in Java).

The query tested is roughly the following, which scans all of customer_demographics in the hash sink:

{code}
select c_salutation, count(1)
from customer
JOIN customer_demographics ON customer.c_current_cdemo_sk = customer_demographics.cd_demo_sk
group by c_salutation
limit 10;
{code}

When run on current trunk with 512 MB of RAM, the code runs out of memory (OOM):
{code}
2013-08-23 05:11:26 Processing rows:140 Hashtable size: 139 Memory usage: 292418944 percentage: 0.579
Execution failed with exit status: 3
Obtaining error information
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
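A minimal sketch of the fix proposed in the description, in plain Java. It is a hypothetical model, not Hive's code: `computeValues` stands in for `JoinUtil.computeMapJoinValues`, a `List<Object[]>` stands in for `MapJoinRowContainer`, and the names `EMPTY_ROW`, `SHARED_SINGLE_EMPTY` and `addRow` are assumptions. A key's first empty row reuses one shared, unmodifiable container holding the shared empty array; only a second row for the same key pays for a private copy (copy-on-write).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical model of the hash-sink build loop; not Hive's actual classes.
class EmptyRowSketch {
    // A zero-length array has no slots to write, so one instance can be shared safely.
    static final Object[] EMPTY_ROW = new Object[0];

    // Copy-on-write default: an unmodifiable container holding exactly one empty row.
    static final List<Object[]> SHARED_SINGLE_EMPTY = Collections.singletonList(EMPTY_ROW);

    // Stand-in for JoinUtil.computeMapJoinValues: with no value columns, reuse EMPTY_ROW.
    static Object[] computeValues(int numValueColumns) {
        return numValueColumns == 0 ? EMPTY_ROW : new Object[numValueColumns];
    }

    // Adds one row for key, sharing the default container until the key gets a second row.
    static void addRow(Map<Object, List<Object[]>> table, Object key, Object[] value) {
        List<Object[]> rows = table.get(key);
        if (rows == null) {
            // First row for this key: no container is allocated when the row is empty.
            table.put(key, value == EMPTY_ROW ? SHARED_SINGLE_EMPTY : newList(value));
        } else if (rows == SHARED_SINGLE_EMPTY) {
            // Second row for this key: copy the shared container before writing to it.
            List<Object[]> copy = new ArrayList<>(rows);
            copy.add(value);
            table.put(key, copy);
        } else {
            // Key already owns a private, mutable container.
            rows.add(value);
        }
    }

    private static List<Object[]> newList(Object[] value) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(value);
        return rows;
    }
}
```

In this model, loading a small table whose join key is unique (as cd_demo_sk is in customer_demographics) with no value columns leaves every map entry pointing at the same two shared objects, so no value array or row container is allocated per row.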
[jira] [Commented] (HIVE-5144) HashTableSink allocates empty new Object[] arrays OOMs - use a static emptyRow instead
[ https://issues.apache.org/jira/browse/HIVE-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13751824#comment-13751824 ]

Hudson commented on HIVE-5144:

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #73 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/73/])
HIVE-5144 : HashTableSink allocates empty new Object[] arrays OOMs - use a static emptyRow instead (Gopal V via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1517877)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
[jira] [Commented] (HIVE-5144) HashTableSink allocates empty new Object[] arrays OOMs - use a static emptyRow instead
[ https://issues.apache.org/jira/browse/HIVE-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752025#comment-13752025 ]

Hudson commented on HIVE-5144:

FAILURE: Integrated in Hive-trunk-hadoop1-ptest #141 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/141/])
HIVE-5144 : HashTableSink allocates empty new Object[] arrays OOMs - use a static emptyRow instead (Gopal V via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1517877)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
[jira] [Commented] (HIVE-5144) HashTableSink allocates empty new Object[] arrays OOMs - use a static emptyRow instead
[ https://issues.apache.org/jira/browse/HIVE-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752072#comment-13752072 ]

Hudson commented on HIVE-5144:

SUCCESS: Integrated in Hive-trunk-h0.21 #2294 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2294/])
HIVE-5144 : HashTableSink allocates empty new Object[] arrays OOMs - use a static emptyRow instead (Gopal V via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1517877)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
[jira] [Commented] (HIVE-5144) HashTableSink allocates empty new Object[] arrays OOMs - use a static emptyRow instead
[ https://issues.apache.org/jira/browse/HIVE-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750182#comment-13750182 ]

Ashutosh Chauhan commented on HIVE-5144:

+1
[jira] [Commented] (HIVE-5144) HashTableSink allocates empty new Object[] arrays OOMs - use a static emptyRow instead
[ https://issues.apache.org/jira/browse/HIVE-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748839#comment-13748839 ]

Ashutosh Chauhan commented on HIVE-5144:

Nice findings, Gopal! A couple of comments:
* Shall we do the initialization of the empty row and container in the constructor?
* Also, mark these fields as transient.
[jira] [Commented] (HIVE-5144) HashTableSink allocates empty new Object[] arrays OOMs - use a static emptyRow instead
[ https://issues.apache.org/jira/browse/HIVE-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748846#comment-13748846 ]

Gopal V commented on HIVE-5144:

I followed the pattern from MapJoinKey and MapJoinRowContainer, which both use a static final EMPTY_OBJECT_ARRAY. Since the empty items are immutable, a single shared static instance is safe, so I followed the same pattern.
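The two options discussed in this thread (per-instance transient fields initialized in the constructor, as suggested in the review, versus a static final constant, as in the reply) can be compared in a small sketch. `EmptyRowHolder` and `roundTrip` are hypothetical names, and plain java.io serialization is used only as a stand-in: Hive may serialize its operator plans by other means that instantiate objects differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical operator holding both variants discussed in the review thread.
class EmptyRowHolder implements Serializable {
    private static final long serialVersionUID = 1L;

    // Reply's pattern: one class-level constant, created when the class is loaded and
    // shared by all instances; java.io serialization never writes static fields.
    static final Object[] EMPTY_OBJECT_ARRAY = new Object[0];

    // Review suggestion: a per-instance transient field, set in the constructor.
    transient Object[] emptyRow;

    EmptyRowHolder() {
        emptyRow = new Object[0]; // one allocation per operator instance, not per row
    }

    // Serializes and deserializes obj with plain java.io serialization (stand-in only).
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deserialization failed", e);
        }
    }
}
```

Under java.io serialization, deserializing a `Serializable` class does not run that class's own constructor, so the transient field comes back `null` and would need to be re-created in an initialization step or lazily, while the static constant needs no initialization at all. Whether this matters for Hive depends on how its plan serializer instantiates operators, which this sketch does not model.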