[jira] [Commented] (HIVE-4248) Implement a memory manager for ORC

2014-08-09 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091872#comment-14091872
 ] 

Lefty Leverenz commented on HIVE-4248:
--

This added the configuration parameter *hive.exec.orc.memory.pool* to 
HiveConf.java in 0.11.0.  It's documented in the wiki here:

* [Configuration Properties -- hive.exec.orc.memory.pool | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.orc.memory.pool]
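
As a rough illustration of what the setting controls, the sketch below assumes 
the parameter is a fraction of the maximum JVM heap that ORC writers may share. 
The value 0.5 is only an example (it matches the 50% figure mentioned in the 
review discussion), and the class and method names are made up for this 
example; this is not the Hive code itself.

```java
class OrcPoolSizeSketch {
    // Hypothetical helper: turn a heap fraction into a byte budget.
    static long poolBytes(long maxHeapBytes, double fraction) {
        if (fraction <= 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        return Math.round(maxHeapBytes * fraction);
    }

    public static void main(String[] args) {
        // Runtime.maxMemory() is the most heap the JVM will try to use.
        long maxHeap = Runtime.getRuntime().maxMemory();
        double fraction = 0.5; // example value standing in for hive.exec.orc.memory.pool
        System.out.println("max heap bytes:   " + maxHeap);
        System.out.println("ORC writer pool:  " + poolBytes(maxHeap, fraction));
    }
}
```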
 

 Implement a memory manager for ORC
 --

 Key: HIVE-4248
 URL: https://issues.apache.org/jira/browse/HIVE-4248
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.11.0

 Attachments: HIVE-4248.D9993.1.patch, HIVE-4248.D9993.2.patch, 
 HIVE-4248.D9993.4.patch


 With the large default stripe size (256MB) and dynamic partitions, it is 
 quite easy for users to run out of memory when writing ORC files. We probably 
 need a solution that keeps track of the total number of concurrent ORC 
 writers and divides the available heap space between them. 
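
 One way to picture the proposal, as a sketch only: keep a registry of open 
 writers with the stripe size each one asked for, and scale every writer's 
 budget down once the requests add up to more than the pool. The class and 
 method names here are illustrative assumptions, not the committed 
 MemoryManager API.

```java
import java.util.HashMap;
import java.util.Map;

class WriterPoolSketch {
    private final long poolBytes;
    private final Map<String, Long> requested = new HashMap<>();
    private long totalRequested = 0;

    WriterPoolSketch(long poolBytes) {
        this.poolBytes = poolBytes;
    }

    // Register a writer (keyed by output path) and the stripe size it wants.
    synchronized void addWriter(String path, long stripeBytes) {
        Long old = requested.put(path, stripeBytes);
        totalRequested += stripeBytes - (old == null ? 0 : old);
    }

    synchronized void removeWriter(String path) {
        Long old = requested.remove(path);
        if (old != null) {
            totalRequested -= old;
        }
    }

    // 1.0 while everything fits; otherwise the fraction each writer may keep.
    synchronized double allocationScale() {
        return totalRequested <= poolBytes ? 1.0 : (double) poolBytes / totalRequested;
    }

    // Bytes a given writer may buffer before it should flush its stripe.
    synchronized long budgetFor(String path) {
        Long r = requested.get(path);
        return r == null ? 0 : (long) (r * allocationScale());
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        WriterPoolSketch pool = new WriterPoolSketch(512 * mb);
        for (int i = 0; i < 4; i++) {
            pool.addWriter("part=" + i, 256 * mb);
            System.out.println((i + 1) + " writers, scale = " + pool.allocationScale());
        }
    }
}
```

 With a fixed pool, every additional dynamic partition shrinks the budget of 
 all open writers rather than growing total memory use.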



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-4248) Implement a memory manager for ORC

2013-04-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637710#comment-13637710
 ] 

Hudson commented on HIVE-4248:
--

Integrated in Hive-trunk-h0.21 #2073 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2073/])
HIVE-4248 : Implement a memory manager for ORC (Owen Omalley via Ashutosh 
Chauhan) (Revision 1470249)

 Result = FAILURE
hashutosh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1470249
Files : 
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/MemoryManager.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFileDump.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java


 Implement a memory manager for ORC
 --

 Key: HIVE-4248
 URL: https://issues.apache.org/jira/browse/HIVE-4248
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.12.0

 Attachments: HIVE-4248.D9993.1.patch, HIVE-4248.D9993.2.patch, 
 HIVE-4248.D9993.4.patch


 With the large default stripe size (256MB) and dynamic partitions, it is 
 quite easy for users to run out of memory when writing ORC files. We probably 
 need a solution that keeps track of the total number of concurrent ORC 
 writers and divides the available heap space between them. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4248) Implement a memory manager for ORC

2013-04-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637468#comment-13637468
 ] 

Hudson commented on HIVE-4248:
--

Integrated in Hive-trunk-hadoop2 #168 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/168/])
HIVE-4248 : Implement a memory manager for ORC (Owen Omalley via Ashutosh 
Chauhan) (Revision 1470249)

 Result = FAILURE
hashutosh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1470249
Files : 
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/MemoryManager.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFileDump.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java




[jira] [Commented] (HIVE-4248) Implement a memory manager for ORC

2013-04-19 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636917#comment-13636917
 ] 

Phabricator commented on HIVE-4248:
---

ashutoshc has accepted the revision HIVE-4248 [jira] Implement a memory 
manager for ORC.

  +1 will commit if tests pass.

REVISION DETAIL
  https://reviews.facebook.net/D9993

BRANCH
  h-4248

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, omalley
Cc: kevinwilfong


 Implement a memory manager for ORC
 --

 Key: HIVE-4248
 URL: https://issues.apache.org/jira/browse/HIVE-4248
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: HIVE-4248.D9993.1.patch, HIVE-4248.D9993.2.patch, 
 HIVE-4248.D9993.4.patch


 With the large default stripe size (256MB) and dynamic partitions, it is 
 quite easy for users to run out of memory when writing ORC files. We probably 
 need a solution that keeps track of the total number of concurrent ORC 
 writers and divides the available heap space between them. 



[jira] [Commented] (HIVE-4248) Implement a memory manager for ORC

2013-04-09 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626957#comment-13626957
 ] 

Owen O'Malley commented on HIVE-4248:
-

Kevin,
  After thinking about it a bit more, how about if I ask the writers to 
re-check their memory relative to their allocation when the pool has shrunk by 
more than 10% since the last time they checked? I ran a quick experiment where 
I had a pool of 1GB and an increasing set of 250MB writers. By only doing the 
check when the pool has changed by more than 10%, as 1000 writers were added it 
cut down the number of checks from 1000 to 49. Does that sound reasonable?
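
The experiment can be approximated with the sketch below. It assumes "the pool 
has shrunk" means the per-writer allocation scale (pool / total requested, 
capped at 1.0) has dropped by more than the threshold since the last check; the 
comment does not spell out the exact rule, so the count this prints need not 
match the 49 reported above. All names are hypothetical.

```java
class PoolChangeThrottleSketch {
    // Count how often writers would be asked to re-check their memory while
    // `writers` writers of `writerBytes` each are added one at a time.
    static int countChecks(long poolBytes, long writerBytes, int writers, double threshold) {
        double lastCheckedScale = 1.0;
        long totalRequested = 0;
        int checks = 0;
        for (int i = 0; i < writers; i++) {
            totalRequested += writerBytes;
            double scale = Math.min(1.0, (double) poolBytes / totalRequested);
            // Only notify when the scale fell by more than `threshold`
            // relative to the value seen at the previous check.
            if (scale < lastCheckedScale * (1.0 - threshold)) {
                checks++;
                lastCheckedScale = scale;
            }
        }
        return checks;
    }

    public static void main(String[] args) {
        long pool = 1L << 30;        // 1GB pool
        long writer = 250L << 20;    // 250MB per writer
        System.out.println("threshold 10%: " + countChecks(pool, writer, 1000, 0.10));
        System.out.println("threshold 0%:  " + countChecks(pool, writer, 1000, 0.0));
    }
}
```

Because each check resets the baseline, the gaps between checks grow roughly 
geometrically, which is why the number of checks grows with the logarithm of 
the writer count rather than linearly.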

 Implement a memory manager for ORC
 --

 Key: HIVE-4248
 URL: https://issues.apache.org/jira/browse/HIVE-4248
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: HIVE-4248.D9993.1.patch, HIVE-4248.D9993.2.patch


 With the large default stripe size (256MB) and dynamic partitions, it is 
 quite easy for users to run out of memory when writing ORC files. We probably 
 need a solution that keeps track of the total number of concurrent ORC 
 writers and divides the available heap space between them. 



[jira] [Commented] (HIVE-4248) Implement a memory manager for ORC

2013-04-08 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626037#comment-13626037
 ] 

Phabricator commented on HIVE-4248:
---

kevinwilfong has commented on the revision HIVE-4248 [jira] Implement a memory 
manager for ORC.

  This allows for cases where the memory used could significantly exceed the 
amount of memory allocated.

  E.g. say totalMemoryPool = 256 MB = stripe size, and say we have a writer 
that has buffered 255 MB in a stripe. Then a second writer is created (e.g. a 
new dynamic partition value is encountered) and all new rows get written to 
this second writer. With two writers each one's share drops to 128 MB, but the 
first writer is never re-checked, so nothing will get written out until the 
second writer accumulates 128 MB of data in its stripe, for a total of 383 MB 
used against the allocated 256 MB.  In theory, with some terrible luck, these 
could be chained together to use significantly more memory (first writer writes 
255 MB, second writes 127 MB, third writes 85 MB, etc.)
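
  The chained case can be worked through numerically. This sketch assumes the 
k-th writer is created while k writers are open, so its share is pool / k, and 
that it stops just short of that share (in whole MB) before the next writer 
takes over; the names are made up for illustration.

```java
class OvershootChainSketch {
    // Total MB buffered if writer k holds the largest whole number of MB
    // strictly below its share pool / k, and is never re-checked afterwards.
    static long worstCaseMb(long poolMb, int writers) {
        long total = 0;
        for (int k = 1; k <= writers; k++) {
            long shareCeilMb = (poolMb + k - 1) / k; // ceil(pool / k)
            total += shareCeilMb - 1;                // just under the share
        }
        return total;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 3; n++) {
            System.out.println(n + " writer(s): " + worstCaseMb(256, n) + " MB buffered");
        }
    }
}
```

  For a 256 MB pool the per-writer terms are 255, 127 and 85 MB, matching the 
figures above, and the sum keeps growing (roughly like a harmonic series) as 
more writers are chained.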

  Could you loop through the writers whenever a writer is added (shouldn't 
happen too frequently) and check if the estimated stripe size of any of these 
writers exceeds the value of stripeSize * memoryManager.getAllocationScale()? 
(Should be doable by making a couple of methods public and storing a reference 
to the WriterImpl along with or instead of the Path.)
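
  Roughly what I have in mind, as a sketch rather than a patch. StripeWriter 
and FakeWriter are hypothetical stand-ins for WriterImpl, and the sketch 
assumes every writer uses the same stripe size.

```java
import java.util.ArrayList;
import java.util.List;

class CheckOnAddSketch {
    // Hypothetical view of a writer: what the manager needs to see and do.
    interface StripeWriter {
        long estimatedStripeBytes();
        void flushStripe();
    }

    // Trivial writer used only to exercise the sketch.
    static class FakeWriter implements StripeWriter {
        long buffered;
        FakeWriter(long buffered) { this.buffered = buffered; }
        public long estimatedStripeBytes() { return buffered; }
        public void flushStripe() { buffered = 0; }
    }

    private final long poolBytes;
    private final long stripeBytes;
    private final List<StripeWriter> writers = new ArrayList<>();

    CheckOnAddSketch(long poolBytes, long stripeBytes) {
        this.poolBytes = poolBytes;
        this.stripeBytes = stripeBytes;
    }

    double allocationScale() {
        long total = stripeBytes * writers.size();
        return total <= poolBytes ? 1.0 : (double) poolBytes / total;
    }

    // Register the writer, then flush every writer whose buffered stripe is
    // already over the new, smaller limit. Returns how many were flushed.
    int addWriter(StripeWriter w) {
        writers.add(w);
        double limit = stripeBytes * allocationScale();
        int flushed = 0;
        for (StripeWriter x : writers) {
            if (x.estimatedStripeBytes() > limit) {
                x.flushStripe();
                flushed++;
            }
        }
        return flushed;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        CheckOnAddSketch mgr = new CheckOnAddSketch(256 * mb, 256 * mb);
        FakeWriter first = new FakeWriter(255 * mb);
        System.out.println("flushed on first add:  " + mgr.addWriter(first));
        System.out.println("flushed on second add: " + mgr.addWriter(new FakeWriter(0)));
    }
}
```

  In the 255 MB example, adding the second writer drops the limit to 128 MB, 
so the first writer is flushed right away instead of sitting over budget.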

  Also (could be done in a follow-up) could there be an additional check to 
see what the total HeapMemoryUsage is?  E.g. in the shouldBeFlushed method of 
GroupByOperator, every 1000 rows, it checks whether more than 90% of the total 
heap has been used, and if so it flushes the hash map.  Something similar could 
be done for WriterImpl, and given the MemoryManager, it could even flush the 
largest stripe, rather than just the one that pushed it over the edge.   This 
would be particularly useful in the case of a map join followed by a map 
aggregation: the mapjoin is allowed to use 55% of the memory and the group by 
another 30%, so if there was also a FileSinkOperator, allowing the ORC 
WriterImpl to use 50% could be too much.
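
  A sketch of that follow-up idea, under stated assumptions: the 1000-row 
interval and 90% threshold are borrowed from the GroupByOperator description 
above, the heap reading comes from the standard MemoryMXBean, and the class, 
its methods and the injected heap probe are all hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.HashMap;
import java.util.Map;
import java.util.function.DoubleSupplier;

class HeapGuardSketch {
    static final int CHECK_INTERVAL_ROWS = 1000;
    static final double MAX_HEAP_FRACTION = 0.90;

    private final DoubleSupplier heapFraction; // injectable so it can be tested
    private final Map<String, Long> bufferedBytes = new HashMap<>();
    private long rows = 0;

    HeapGuardSketch(DoubleSupplier heapFraction) {
        this.heapFraction = heapFraction;
    }

    // Fraction of the maximum heap currently in use (0.0 if the max is undefined).
    static double currentHeapFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax();
        return max <= 0 ? 0.0 : (double) heap.getUsed() / max;
    }

    // Record bytes buffered by a writer since its last flush.
    void record(String writer, long bytes) {
        bufferedBytes.merge(writer, bytes, Long::sum);
    }

    // Call once per row. Returns the writer that should flush, or null.
    String onRow() {
        rows++;
        if (rows % CHECK_INTERVAL_ROWS != 0) {
            return null; // only look at the heap every CHECK_INTERVAL_ROWS rows
        }
        if (heapFraction.getAsDouble() <= MAX_HEAP_FRACTION) {
            return null;
        }
        String largest = null;
        long largestBytes = -1;
        for (Map.Entry<String, Long> e : bufferedBytes.entrySet()) {
            if (e.getValue() > largestBytes) {
                largest = e.getKey();
                largestBytes = e.getValue();
            }
        }
        if (largest != null) {
            bufferedBytes.put(largest, 0L); // treat the chosen writer as flushed
        }
        return largest;
    }

    public static void main(String[] args) {
        HeapGuardSketch guard = new HeapGuardSketch(HeapGuardSketch::currentHeapFraction);
        guard.record("part=a", 10);
        guard.record("part=b", 30);
        for (int i = 0; i < CHECK_INTERVAL_ROWS; i++) {
            String toFlush = guard.onRow();
            if (toFlush != null) {
                System.out.println("would flush " + toFlush);
            }
        }
        System.out.println("heap fraction now: " + currentHeapFraction());
    }
}
```

  Picking the largest buffered stripe frees the most memory per flush and 
avoids writing out a tiny stripe just because that writer happened to receive 
the row that crossed the threshold.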

INLINE COMMENTS
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:490 Could you add 
this to conf/hive-default.xml.template as well?

REVISION DETAIL
  https://reviews.facebook.net/D9993

To: JIRA, omalley
Cc: kevinwilfong




[jira] [Commented] (HIVE-4248) Implement a memory manager for ORC

2013-04-08 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626193#comment-13626193
 ] 

Phabricator commented on HIVE-4248:
---

omalley has commented on the revision HIVE-4248 [jira] Implement a memory 
manager for ORC.

  I agree that it can overshoot, but it likely won't be by that much. Of course 
the normal case is that the dynamic partitions are distributed randomly, in 
which case the current version will do fine. Granted, if the data is already 
sorted by the dynamic partition, it will not do well.

  Ok, I'll add a check when we add a new partition. I was just concerned that 
with each new partition added, it will take longer and longer to do all of the 
checks.

REVISION DETAIL
  https://reviews.facebook.net/D9993

To: JIRA, omalley
Cc: kevinwilfong




[jira] [Commented] (HIVE-4248) Implement a memory manager for ORC

2013-04-05 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624184#comment-13624184
 ] 

Phabricator commented on HIVE-4248:
---

omalley updated the revision HIVE-4248 [jira] Implement a memory manager for 
ORC.

  removed other patch

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D9993

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D9993?vs=31311&id=31317#toc

AFFECTED FILES
  metastore/src/java/org/apache/hadoop/hive/metastore/PartitionNameWhitelistPreEventListener.java
  metastore/src/test/org/apache/hadoop/hive/metastore/TestPartitionNameWhitelistPreEventHook.java

To: JIRA, omalley




[jira] [Commented] (HIVE-4248) Implement a memory manager for ORC

2013-03-28 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616691#comment-13616691
 ] 

Owen O'Malley commented on HIVE-4248:
-

This may result in ORC files with smaller stripes, but that seems far better 
than letting users hit out-of-memory exceptions.

 Implement a memory manager for ORC
 --

 Key: HIVE-4248
 URL: https://issues.apache.org/jira/browse/HIVE-4248
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley

 With the large default stripe size (256MB) and dynamic partitions, it is 
 quite easy for users to run out of memory when writing ORC files. We probably 
 need a solution that keeps track of the total number of concurrent ORC 
 writers and divides the available heap space between them. 
