[jira] [Updated] (HIVE-4421) Improve memory usage by ORC dictionaries
[ https://issues.apache.org/jira/browse/HIVE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HIVE-4421:

    Fix Version/s: 0.11.0

Improve memory usage by ORC dictionaries

    Key: HIVE-4421
    URL: https://issues.apache.org/jira/browse/HIVE-4421
    Project: Hive
    Issue Type: Bug
    Components: Serializers/Deserializers
    Reporter: Owen O'Malley
    Assignee: Owen O'Malley
    Fix For: 0.11.0, 0.12.0
    Attachments: HIVE-4421.D10545.1.patch, HIVE-4421.D10545.2.patch, HIVE-4421.D10545.3.patch, HIVE-4421.D10545.4.patch

Currently, for tables with many string columns, it is possible to significantly underestimate the memory used by the ORC dictionaries and cause the query to run out of memory in the task.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4421) Improve memory usage by ORC dictionaries
[ https://issues.apache.org/jira/browse/HIVE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-4421:

    Resolution: Fixed
    Fix Version/s: 0.12.0
    Status: Resolved (was: Patch Available)

Committed to trunk. Thanks, Owen!

Improve memory usage by ORC dictionaries

    Key: HIVE-4421
    URL: https://issues.apache.org/jira/browse/HIVE-4421
    Project: Hive
    Issue Type: Bug
    Components: Serializers/Deserializers
    Reporter: Owen O'Malley
    Assignee: Owen O'Malley
    Fix For: 0.12.0
    Attachments: HIVE-4421.D10545.1.patch, HIVE-4421.D10545.2.patch, HIVE-4421.D10545.3.patch, HIVE-4421.D10545.4.patch

Currently, for tables with many string columns, it is possible to significantly underestimate the memory used by the ORC dictionaries and cause the query to run out of memory in the task.
[jira] [Updated] (HIVE-4421) Improve memory usage by ORC dictionaries
[ https://issues.apache.org/jira/browse/HIVE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HIVE-4421:

    Fix Version/s: (was: 0.11.0)

Improve memory usage by ORC dictionaries

    Key: HIVE-4421
    URL: https://issues.apache.org/jira/browse/HIVE-4421
    Project: Hive
    Issue Type: Bug
    Components: Serializers/Deserializers
    Reporter: Owen O'Malley
    Assignee: Owen O'Malley
    Attachments: HIVE-4421.D10545.1.patch, HIVE-4421.D10545.2.patch, HIVE-4421.D10545.3.patch, HIVE-4421.D10545.4.patch

Currently, for tables with many string columns, it is possible to significantly underestimate the memory used by the ORC dictionaries and cause the query to run out of memory in the task.
[jira] [Updated] (HIVE-4421) Improve memory usage by ORC dictionaries
[ https://issues.apache.org/jira/browse/HIVE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-4421:

    Attachment: HIVE-4421.D10545.3.patch

omalley updated the revision HIVE-4421 [jira] Improve memory usage by ORC dictionaries.

I've updated the TestOrcFile unit test to reflect the changes.

Reviewers: JIRA

REVISION DETAIL
    https://reviews.facebook.net/D10545

CHANGE SINCE LAST DIFF
    https://reviews.facebook.net/D10545?vs=33201&id=33219#toc

AFFECTED FILES
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicIntArray.java
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/MemoryManager.java
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/OutStream.java
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/PositionedOutputStream.java
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/RedBlackTree.java
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
    ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestMemoryManager.java
    ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
    ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestStringRedBlackTree.java
    ql/src/test/resources/orc-file-dump.out

To: JIRA, omalley

Improve memory usage by ORC dictionaries

    Key: HIVE-4421
    URL: https://issues.apache.org/jira/browse/HIVE-4421
    Project: Hive
    Issue Type: Bug
    Components: Serializers/Deserializers
    Reporter: Owen O'Malley
    Assignee: Owen O'Malley
    Fix For: 0.11.0
    Attachments: HIVE-4421.D10545.1.patch, HIVE-4421.D10545.2.patch, HIVE-4421.D10545.3.patch

Currently, for tables with many string columns, it is possible to significantly underestimate the memory used by the ORC dictionaries and cause the query to run out of memory in the task.
[jira] [Updated] (HIVE-4421) Improve memory usage by ORC dictionaries
[ https://issues.apache.org/jira/browse/HIVE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-4421:

    Attachment: HIVE-4421.D10545.4.patch

omalley updated the revision HIVE-4421 [jira] Improve memory usage by ORC dictionaries.

Addressed Ashutosh's suggestions.

Reviewers: ashutoshc, JIRA

REVISION DETAIL
    https://reviews.facebook.net/D10545

CHANGE SINCE LAST DIFF
    https://reviews.facebook.net/D10545?vs=33219&id=33249#toc

AFFECTED FILES
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicIntArray.java
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/MemoryManager.java
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/OutStream.java
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/PositionedOutputStream.java
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/RedBlackTree.java
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
    ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestMemoryManager.java
    ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
    ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestStringRedBlackTree.java
    ql/src/test/resources/orc-file-dump.out

To: JIRA, ashutoshc, omalley

Improve memory usage by ORC dictionaries

    Key: HIVE-4421
    URL: https://issues.apache.org/jira/browse/HIVE-4421
    Project: Hive
    Issue Type: Bug
    Components: Serializers/Deserializers
    Reporter: Owen O'Malley
    Assignee: Owen O'Malley
    Fix For: 0.11.0
    Attachments: HIVE-4421.D10545.1.patch, HIVE-4421.D10545.2.patch, HIVE-4421.D10545.3.patch, HIVE-4421.D10545.4.patch

Currently, for tables with many string columns, it is possible to significantly underestimate the memory used by the ORC dictionaries and cause the query to run out of memory in the task.
[jira] [Updated] (HIVE-4421) Improve memory usage by ORC dictionaries
[ https://issues.apache.org/jira/browse/HIVE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-4421:

    Attachment: HIVE-4421.D10545.2.patch

omalley updated the revision HIVE-4421 [jira] Improve memory usage by ORC dictionaries.

Changed the memory manager to check on each 5000 total rows added. This seems to give the best trade off between handling too many writers in a small heap and still managing memory pretty accurately.

Reviewers: JIRA

REVISION DETAIL
    https://reviews.facebook.net/D10545

CHANGE SINCE LAST DIFF
    https://reviews.facebook.net/D10545?vs=32889&id=33201#toc

AFFECTED FILES
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicIntArray.java
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/MemoryManager.java
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/OutStream.java
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/PositionedOutputStream.java
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/RedBlackTree.java
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
    ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestMemoryManager.java
    ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestStringRedBlackTree.java
    ql/src/test/resources/orc-file-dump.out

To: JIRA, omalley

Improve memory usage by ORC dictionaries

    Key: HIVE-4421
    URL: https://issues.apache.org/jira/browse/HIVE-4421
    Project: Hive
    Issue Type: Bug
    Components: Serializers/Deserializers
    Reporter: Owen O'Malley
    Assignee: Owen O'Malley
    Fix For: 0.11.0
    Attachments: HIVE-4421.D10545.1.patch, HIVE-4421.D10545.2.patch

Currently, for tables with many string columns, it is possible to significantly underestimate the memory used by the ORC dictionaries and cause the query to run out of memory in the task.
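The row-count-based check described in this update might look roughly like the sketch below. It is an illustration only, not the code from the patch: the class name, the `Callback` interface, and the proportional scale factor are assumptions; only the figure of 5000 total rows comes from the comment above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a shared memory manager; not Hive's actual class. */
final class RowCountingMemoryManager {
    /** Assumed callback: each writer is asked to re-check its own memory. */
    interface Callback {
        void checkMemory(double allocationScale);
    }

    /** Taken from the update above: check once per 5000 rows across all writers. */
    static final int ROWS_BETWEEN_CHECKS = 5000;

    private final long poolBytes;
    private final Map<Callback, Long> writers = new LinkedHashMap<>();
    private long totalRequestedBytes = 0;
    private int rowsSinceLastCheck = 0;

    RowCountingMemoryManager(long poolBytes) {
        this.poolBytes = poolBytes;
    }

    /** Registers a writer together with the memory it would like to use. */
    void addWriter(Callback writer, long requestedBytes) {
        Long previous = writers.put(writer, requestedBytes);
        if (previous != null) {
            totalRequestedBytes -= previous;
        }
        totalRequestedBytes += requestedBytes;
    }

    /** Called by every writer for every row, so the counter is a total. */
    void addedRow() {
        if (++rowsSinceLastCheck >= ROWS_BETWEEN_CHECKS) {
            rowsSinceLastCheck = 0;
            // Assumption: shrink every writer proportionally when the pool is oversubscribed.
            double scale = totalRequestedBytes <= poolBytes
                ? 1.0
                : (double) poolBytes / totalRequestedBytes;
            for (Callback writer : writers.keySet()) {
                writer.checkMemory(scale);
            }
        }
    }
}
```

Counting rows across all writers, rather than per writer, means that a task with many open writers still triggers checks regularly even when each individual writer has received only a few rows.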
[jira] [Updated] (HIVE-4421) Improve memory usage by ORC dictionaries
[ https://issues.apache.org/jira/browse/HIVE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-4421:

    Attachment: HIVE-4421.D10545.1.patch

omalley requested code review of HIVE-4421 [jira] Improve memory usage by ORC dictionaries.

Reviewers: JIRA

HIVE-4421 Improve ORC dictionary memory usage and tracking

Currently, for tables with many string columns, it is possible to significantly underestimate the memory used by the ORC dictionaries and cause the query to run out of memory in the task.

TEST PLAN
    EMPTY

REVISION DETAIL
    https://reviews.facebook.net/D10545

AFFECTED FILES
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicIntArray.java
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/RedBlackTree.java
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
    ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
    ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestStringRedBlackTree.java
    ql/src/test/resources/orc-file-dump.out

To: JIRA, omalley

Improve memory usage by ORC dictionaries

    Key: HIVE-4421
    URL: https://issues.apache.org/jira/browse/HIVE-4421
    Project: Hive
    Issue Type: Bug
    Components: Serializers/Deserializers
    Reporter: Owen O'Malley
    Assignee: Owen O'Malley
    Attachments: HIVE-4421.D10545.1.patch

Currently, for tables with many string columns, it is possible to significantly underestimate the memory used by the ORC dictionaries and cause the query to run out of memory in the task.
[jira] [Updated] (HIVE-4421) Improve memory usage by ORC dictionaries
[ https://issues.apache.org/jira/browse/HIVE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HIVE-4421:

    Fix Version/s: 0.11.0
    Status: Patch Available (was: Open)

This patch does three things:
* Improves the memory usage while writing ORC dictionaries by removing the counts and just storing offsets instead of offsets and lengths.
* Improves the tracking of how much memory is used by the dictionaries by tracking the allocation rather than the usage.
* Reduces the size of some of the allocation sizes of the integer arrays.

Improve memory usage by ORC dictionaries

    Key: HIVE-4421
    URL: https://issues.apache.org/jira/browse/HIVE-4421
    Project: Hive
    Issue Type: Bug
    Components: Serializers/Deserializers
    Reporter: Owen O'Malley
    Assignee: Owen O'Malley
    Fix For: 0.11.0
    Attachments: HIVE-4421.D10545.1.patch

Currently, for tables with many string columns, it is possible to significantly underestimate the memory used by the ORC dictionaries and cause the query to run out of memory in the task.
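As a rough illustration of the first two points in the patch description, the sketch below stores only a start offset per dictionary entry and reports allocated capacity alongside bytes in use. It is a minimal sketch under stated assumptions, not the patch itself: the class and method names are hypothetical, and it assumes entries are appended contiguously, so each entry ends where the next one starts.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical offsets-only string store; names are illustrative, not Hive's. */
final class OffsetOnlyStringStore {
    private byte[] bytes = new byte[16];   // concatenated UTF-8 data
    private int bytesUsed = 0;
    private int[] offsets = new int[4];    // start offset of each entry, no lengths
    private int count = 0;

    /** Appends a value and returns its index. */
    int add(String value) {
        byte[] encoded = value.getBytes(StandardCharsets.UTF_8);
        if (bytesUsed + encoded.length > bytes.length) {
            bytes = Arrays.copyOf(bytes, Math.max(bytes.length * 2, bytesUsed + encoded.length));
        }
        if (count == offsets.length) {
            offsets = Arrays.copyOf(offsets, offsets.length * 2);
        }
        System.arraycopy(encoded, 0, bytes, bytesUsed, encoded.length);
        offsets[count] = bytesUsed;
        bytesUsed += encoded.length;
        return count++;
    }

    /** The length is derived from the next entry's offset, or the end of the data. */
    String get(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int start = offsets[index];
        int end = (index + 1 == count) ? bytesUsed : offsets[index + 1];
        return new String(bytes, start, end - start, StandardCharsets.UTF_8);
    }

    /** Bytes actually holding data. */
    long usedBytes() {
        return bytesUsed + (long) count * Integer.BYTES;
    }

    /** Bytes reserved by the backing arrays, which is what the heap really pays for. */
    long allocatedBytes() {
        return bytes.length + (long) offsets.length * Integer.BYTES;
    }
}
```

After adding "alpha", "", "beta", "gamma", and "delta", this store reports 39 bytes in use but 64 bytes allocated. With many string columns, that kind of gap in every dictionary is how an estimate based on usage can fall well short of the real heap footprint.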