[jira] [Updated] (HIVE-4421) Improve memory usage by ORC dictionaries

2013-05-08 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-4421:


Fix Version/s: 0.11.0

 Improve memory usage by ORC dictionaries
 

 Key: HIVE-4421
 URL: https://issues.apache.org/jira/browse/HIVE-4421
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.11.0, 0.12.0

 Attachments: HIVE-4421.D10545.1.patch, HIVE-4421.D10545.2.patch, 
 HIVE-4421.D10545.3.patch, HIVE-4421.D10545.4.patch


 Currently, for tables with many string columns, it is possible to 
 significantly underestimate the memory used by the ORC dictionaries and cause 
 the query to run out of memory in the task. 
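The sketch below shows how a usage-based estimate can undercount. It is a minimal Java example built on a hypothetical chunked byte buffer, ChunkedBytes. That class is not from the patch; it only loosely follows the idea behind DynamicByteArray. The buffer grows in fixed-size chunks and reports two numbers: the bytes written and the bytes actually allocated. The patch notes in this thread describe switching the dictionaries from tracking usage to tracking allocation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical chunked byte buffer (illustration only, not Hive's class).
 * Storage is allocated one whole chunk at a time, so allocated memory can be
 * far larger than the bytes actually written.
 */
class ChunkedBytes {
  private final int chunkSize;
  private final List<byte[]> chunks = new ArrayList<>();
  private int length = 0; // number of bytes actually written

  ChunkedBytes(int chunkSize) {
    this.chunkSize = chunkSize;
  }

  /** Appends bytes, allocating a new full chunk whenever the current one is full. */
  void add(byte[] value) {
    for (byte b : value) {
      int chunk = length / chunkSize;
      if (chunk == chunks.size()) {
        chunks.add(new byte[chunkSize]);
      }
      chunks.get(chunk)[length % chunkSize] = b;
      length++;
    }
  }

  /** What a usage-based estimate would report. */
  long usedBytes() {
    return length;
  }

  /** What the heap actually holds for this buffer's data. */
  long allocatedBytes() {
    return (long) chunks.size() * chunkSize;
  }
}
```

For example, suppose there are 100 string columns, each writing only 10 bytes into its own buffer, with a 32 KiB chunk size (an arbitrary assumption). The usedBytes() values sum to 1,000 bytes, while the allocatedBytes() values sum to 3,276,800 bytes.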

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4421) Improve memory usage by ORC dictionaries

2013-05-07 Thread Ashutosh Chauhan (JIRA)

Ashutosh Chauhan updated HIVE-4421:
---

   Resolution: Fixed
Fix Version/s: 0.12.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Owen!

 Improve memory usage by ORC dictionaries
 

 Key: HIVE-4421
 URL: https://issues.apache.org/jira/browse/HIVE-4421
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.12.0

 Attachments: HIVE-4421.D10545.1.patch, HIVE-4421.D10545.2.patch, 
 HIVE-4421.D10545.3.patch, HIVE-4421.D10545.4.patch



[jira] [Updated] (HIVE-4421) Improve memory usage by ORC dictionaries

2013-05-06 Thread Owen O'Malley (JIRA)

Owen O'Malley updated HIVE-4421:


Fix Version/s: (was: 0.11.0)

 Improve memory usage by ORC dictionaries
 

 Key: HIVE-4421
 URL: https://issues.apache.org/jira/browse/HIVE-4421
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: HIVE-4421.D10545.1.patch, HIVE-4421.D10545.2.patch, 
 HIVE-4421.D10545.3.patch, HIVE-4421.D10545.4.patch



[jira] [Updated] (HIVE-4421) Improve memory usage by ORC dictionaries

2013-05-01 Thread Phabricator (JIRA)

Phabricator updated HIVE-4421:
--

Attachment: HIVE-4421.D10545.3.patch

omalley updated the revision HIVE-4421 [jira] Improve memory usage by ORC 
dictionaries.

  I've updated the TestOrcFile unit test to reflect the changes.

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D10545

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D10545?vs=33201&id=33219#toc

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicIntArray.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/MemoryManager.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OutStream.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/PositionedOutputStream.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RedBlackTree.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestMemoryManager.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestStringRedBlackTree.java
  ql/src/test/resources/orc-file-dump.out

To: JIRA, omalley


 Improve memory usage by ORC dictionaries
 

 Key: HIVE-4421
 URL: https://issues.apache.org/jira/browse/HIVE-4421
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.11.0

 Attachments: HIVE-4421.D10545.1.patch, HIVE-4421.D10545.2.patch, 
 HIVE-4421.D10545.3.patch



[jira] [Updated] (HIVE-4421) Improve memory usage by ORC dictionaries

2013-05-01 Thread Phabricator (JIRA)

Phabricator updated HIVE-4421:
--

Attachment: HIVE-4421.D10545.4.patch

omalley updated the revision HIVE-4421 [jira] Improve memory usage by ORC 
dictionaries.

  Addressed Ashutosh's suggestions.

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D10545

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D10545?vs=33219&id=33249#toc

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicIntArray.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/MemoryManager.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OutStream.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/PositionedOutputStream.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RedBlackTree.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestMemoryManager.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestStringRedBlackTree.java
  ql/src/test/resources/orc-file-dump.out

To: JIRA, ashutoshc, omalley


 Improve memory usage by ORC dictionaries
 

 Key: HIVE-4421
 URL: https://issues.apache.org/jira/browse/HIVE-4421
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.11.0

 Attachments: HIVE-4421.D10545.1.patch, HIVE-4421.D10545.2.patch, 
 HIVE-4421.D10545.3.patch, HIVE-4421.D10545.4.patch



[jira] [Updated] (HIVE-4421) Improve memory usage by ORC dictionaries

2013-04-30 Thread Phabricator (JIRA)

Phabricator updated HIVE-4421:
--

Attachment: HIVE-4421.D10545.2.patch

omalley updated the revision HIVE-4421 [jira] Improve memory usage by ORC 
dictionaries.

  Changed the memory manager to check memory after every 5000 total rows added. 
This seems to give the best trade-off between handling too many writers in a 
small heap and still managing memory pretty accurately.
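The check-every-N-rows policy can be sketched in Java as follows. This is a minimal illustration, not the committed MemoryManager.java. SimpleMemoryManager, its nested Callback interface, and the methods addWriter, removeWriter, getAllocationScale and addedRow are all hypothetical names chosen for this example. In the sketch, writers share a fixed pool. If their combined requests exceed the pool, each writer is offered a proportional share. The manager recomputes and broadcasts that share only once per 5000 rows added across all writers. The assumption is that a writer would compare its own estimate with its scaled share and flush when it is over.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical memory manager shared by several ORC-style writers (sketch only). */
class SimpleMemoryManager {
  /** Implemented by writers; newScale is the fraction of its request a writer may use. */
  interface Callback {
    void checkMemory(double newScale);
  }

  /** Check interval taken from the revision comment: every 5000 rows across all writers. */
  static final int ROWS_BETWEEN_CHECKS = 5000;

  private final long totalMemoryPool;
  private final Map<Callback, Long> requests = new HashMap<>();
  private long totalRequested = 0;
  private long rowsAdded = 0;

  SimpleMemoryManager(long totalMemoryPool) {
    this.totalMemoryPool = totalMemoryPool;
  }

  void addWriter(Callback writer, long requestedBytes) {
    Long old = requests.put(writer, requestedBytes);
    if (old != null) {
      totalRequested -= old; // re-registering replaces the earlier request
    }
    totalRequested += requestedBytes;
  }

  void removeWriter(Callback writer) {
    Long old = requests.remove(writer);
    if (old != null) {
      totalRequested -= old;
    }
  }

  /** 1.0 unless the pool is oversubscribed; otherwise everyone shrinks proportionally. */
  double getAllocationScale() {
    if (totalRequested <= totalMemoryPool) {
      return 1.0;
    }
    return (double) totalMemoryPool / totalRequested;
  }

  /** Called once per row written by any writer; notifies every writer each N rows. */
  void addedRow() {
    rowsAdded++;
    if (rowsAdded % ROWS_BETWEEN_CHECKS == 0) {
      double scale = getAllocationScale();
      // Iterate over a copy so a callback can safely remove itself.
      List<Callback> writers = new ArrayList<>(requests.keySet());
      for (Callback writer : writers) {
        writer.checkMemory(scale);
      }
    }
  }
}
```

One plausible reading of the trade-off in the comment is this. Checking more often catches memory growth sooner, which matters when many writers share a small heap, but it costs more work per row. Checking less often is cheaper but lets writers overshoot their share between checks.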

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D10545

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D10545?vs=32889&id=33201#toc

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicIntArray.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/MemoryManager.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OutStream.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/PositionedOutputStream.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RedBlackTree.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestMemoryManager.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestStringRedBlackTree.java
  ql/src/test/resources/orc-file-dump.out

To: JIRA, omalley


 Improve memory usage by ORC dictionaries
 

 Key: HIVE-4421
 URL: https://issues.apache.org/jira/browse/HIVE-4421
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.11.0

 Attachments: HIVE-4421.D10545.1.patch, HIVE-4421.D10545.2.patch



[jira] [Updated] (HIVE-4421) Improve memory usage by ORC dictionaries

2013-04-25 Thread Phabricator (JIRA)

Phabricator updated HIVE-4421:
--

Attachment: HIVE-4421.D10545.1.patch

omalley requested code review of HIVE-4421 [jira] Improve memory usage by ORC 
dictionaries.

Reviewers: JIRA

HIVE-4421 Improve ORC dictionary memory usage and tracking

Currently, for tables with many string columns, it is possible to significantly 
underestimate the memory used by the ORC dictionaries and cause the query to 
run out of memory in the task.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D10545

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicIntArray.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RedBlackTree.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestStringRedBlackTree.java
  ql/src/test/resources/orc-file-dump.out

To: JIRA, omalley


 Improve memory usage by ORC dictionaries
 

 Key: HIVE-4421
 URL: https://issues.apache.org/jira/browse/HIVE-4421
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: HIVE-4421.D10545.1.patch



[jira] [Updated] (HIVE-4421) Improve memory usage by ORC dictionaries

2013-04-25 Thread Owen O'Malley (JIRA)

Owen O'Malley updated HIVE-4421:


Fix Version/s: 0.11.0
   Status: Patch Available  (was: Open)

This patch does three things:
* Improves memory usage while writing ORC dictionaries by removing the 
counts and storing only offsets instead of both offsets and lengths.
* Improves the tracking of how much memory the dictionaries use by 
tracking the allocated size rather than the used size.
* Reduces some of the allocation sizes of the integer arrays.
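The first item, storing only offsets, can be sketched in Java as below. The class OffsetOnlyStrings is hypothetical; it is not the patch's DynamicByteArray or StringRedBlackTree. It only illustrates one idea. When values are appended back to back into a single byte array, the length of entry i equals the next entry's start offset minus its own. The last entry uses the total byte count instead. A separate lengths array is therefore redundant.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical append-only string store that keeps start offsets but no lengths. */
class OffsetOnlyStrings {
  private byte[] bytes = new byte[16];
  private int byteCount = 0;
  private int[] offsets = new int[4];
  private int size = 0;

  /** Appends a value and returns its index. */
  int add(String value) {
    byte[] b = value.getBytes(StandardCharsets.UTF_8);
    if (byteCount + b.length > bytes.length) {
      bytes = Arrays.copyOf(bytes, Math.max(bytes.length * 2, byteCount + b.length));
    }
    System.arraycopy(b, 0, bytes, byteCount, b.length);
    if (size == offsets.length) {
      offsets = Arrays.copyOf(offsets, offsets.length * 2);
    }
    offsets[size] = byteCount; // only the start offset is recorded
    byteCount += b.length;
    return size++;
  }

  /** Length is derived from the next offset, or from byteCount for the last entry. */
  int length(int i) {
    int end = (i + 1 < size) ? offsets[i + 1] : byteCount;
    return end - offsets[i];
  }

  String get(int i) {
    return new String(bytes, offsets[i], length(i), StandardCharsets.UTF_8);
  }

  int size() {
    return size;
  }
}
```

The cost is one extra array read when computing a length. The saving is one int, or 4 bytes, per entry compared with keeping a parallel lengths array.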

 Improve memory usage by ORC dictionaries
 

 Key: HIVE-4421
 URL: https://issues.apache.org/jira/browse/HIVE-4421
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.11.0

 Attachments: HIVE-4421.D10545.1.patch

