[jira] [Updated] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-05-10 Thread Selina Zhang (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Selina Zhang updated HIVE-10036:

Attachment: HIVE-10036.9.patch

Fixed the unit tests.

Writing ORC format big table causes OOM - too many fixed sized stream buffers
-

Key: HIVE-10036
URL: https://issues.apache.org/jira/browse/HIVE-10036
Project: Hive
Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
Labels: orcfile
Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch,
HIVE-10036.3.patch, HIVE-10036.5.patch, HIVE-10036.6.patch,
HIVE-10036.7.patch, HIVE-10036.8.patch, HIVE-10036.9.patch

The ORC writer keeps multiple output streams for each column. Each output
stream is allocated a fixed-size ByteBuffer (configurable, 256K by default).
For a big table, the memory cost is unbearable, especially when HCatalog
dynamic partitioning is involved: several hundred files may be open and being
written at the same time (FileSinkOperator has the same problem).
The global ORC memory manager controls the buffer size, but it only kicks in
at 5000-row intervals. An enhancement could be made there, but the problem is
that reducing the buffer size leads to worse compression and more I/O on the
read path. Sacrificing read performance is never a good choice.
I changed the fixed-size ByteBuffer to a dynamically growing buffer whose
upper bound is the existing configurable buffer size. Most of the streams do
not need a large buffer, so performance improved significantly. Compared to
Facebook's hive-dwrf, I observed a 2x performance gain with this fix.
Solving OOM for ORC completely may take a lot of effort, but this is
definitely low-hanging fruit.
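
For illustration only, the bounded dynamic-growth idea described above can be
sketched in Java roughly as follows. This is not the code from the attached
patches: the class name GrowableBuffer, its methods, and the doubling growth
policy are all assumptions made for this example.

```java
// Hypothetical sketch of a buffer that starts small and grows on demand,
// but never beyond a configured maximum. Not taken from the HIVE-10036 patches.
final class GrowableBuffer {
    private final int maxCapacity; // upper bound, e.g. the configured stream buffer size
    private byte[] data;           // backing array, starts small
    private int size;              // number of bytes written so far

    GrowableBuffer(int initialCapacity, int maxCapacity) {
        if (initialCapacity <= 0 || initialCapacity > maxCapacity) {
            throw new IllegalArgumentException("need 0 < initialCapacity <= maxCapacity");
        }
        this.maxCapacity = maxCapacity;
        this.data = new byte[initialCapacity];
    }

    /**
     * Copies as many of the requested bytes as fit under maxCapacity.
     * Returns the number of bytes accepted; a value smaller than len
     * means the buffer is full and the caller should flush and reset.
     */
    int write(byte[] src, int off, int len) {
        int n = Math.min(len, maxCapacity - size);
        ensureCapacity(size + n);
        System.arraycopy(src, off, data, size, n);
        size += n;
        return n;
    }

    // Grow by doubling (assumed policy), clamped to maxCapacity.
    private void ensureCapacity(int needed) {
        if (needed <= data.length) {
            return;
        }
        long doubled = 2L * data.length; // long avoids int overflow
        int newCapacity = (int) Math.min(maxCapacity, Math.max(doubled, needed));
        byte[] grown = new byte[newCapacity];
        System.arraycopy(data, 0, grown, 0, size);
        data = grown;
    }

    boolean isFull() { return size == maxCapacity; }

    int size() { return size; }

    int capacity() { return data.length; }

    // Discards the contents but keeps the already-grown backing array.
    void reset() { size = 0; }
}
```

With a 4-byte initial capacity and a 16-byte cap, writing 10 bytes grows the
backing array to 10 bytes; a further 10-byte write accepts only 6 bytes, at
which point the caller would flush the contents and call reset(). A stream
that only ever holds a few bytes never allocates the full maximum.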

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-05-07 Thread Selina Zhang (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Selina Zhang updated HIVE-10036:

Attachment: HIVE-10036.8.patch


[jira] [Updated] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-05-05 Thread Selina Zhang (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Selina Zhang updated HIVE-10036:

Attachment: HIVE-10036.8.patch


[jira] [Updated] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-05-03 Thread Selina Zhang (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Selina Zhang updated HIVE-10036:

Attachment: HIVE-10036.7.patch

Fixed ql/pom.xml


[jira] [Updated] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-04-15 Thread Damien Carol (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Damien Carol updated HIVE-10036:

Labels: orcfile (was: )


[jira] [Updated] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-04-10 Thread Selina Zhang (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Selina Zhang updated HIVE-10036:

Attachment: HIVE-10036.5.patch

Thanks Mithun and Prasanth! Uploaded modified patch.

Writing ORC format big table causes OOM - too many fixed sized stream buffers
-

Key: HIVE-10036
URL: https://issues.apache.org/jira/browse/HIVE-10036
Project: Hive
Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch,
HIVE-10036.3.patch, HIVE-10036.5.patch


[jira] [Updated] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-03-23 Thread Selina Zhang (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Selina Zhang updated HIVE-10036:

Attachment: HIVE-10036.2.patch


[jira] [Updated] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-03-20 Thread Selina Zhang (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Selina Zhang updated HIVE-10036:

Attachment: HIVE-10036.1.patch

Writing ORC format big table causes OOM - too many fixed sized stream buffers
-

Key: HIVE-10036
URL: https://issues.apache.org/jira/browse/HIVE-10036
Project: Hive
Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
Attachments: HIVE-10036.1.patch

