[jira] [Updated] (HIVE-11414) Fix OOM in MapTask with many input partitions with RCFile

2015-07-30 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-11414:
--
Summary: Fix OOM in MapTask with many input partitions with RCFile  (was: 
Fix OOM in MapTask with many input partitions by making ColumnarSerDeBase's 
cachedLazyStruct weakly referenced)

 Fix OOM in MapTask with many input partitions with RCFile
 -

 Key: HIVE-11414
 URL: https://issues.apache.org/jira/browse/HIVE-11414
 Project: Hive
  Issue Type: Improvement
  Components: File Formats, Serializers/Deserializers
Affects Versions: 0.11.0, 0.12.0, 0.14.0, 0.13.1, 1.2.0
Reporter: Zheng Shao
Priority: Minor

 MapTask hit OOM in the following situation in our production environment:
 * src: 2048 partitions, each with 1 file of about 2MB using RCFile format
 * query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
 * Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
 * MapTask memory Xmx: 1.5GB
 By analyzing the heap dump using jhat, we realized that the problem is:
 * A single mapper processes many partitions (because of CombineHiveInputFormat)
 * Each input path (equivalent to a partition here) constructs its own SerDe
 * Each SerDe does its own caching of the deserialized object (and tries to reuse 
 it), but never releases it (in this case, serde2.columnar.ColumnarSerDeBase has 
 a field cachedLazyStruct which can take a lot of space - roughly the last N rows 
 of a file, where N is the number of rows in a columnar block).
 * This problem may exist in other SerDes as well, but columnar file formats are 
 affected the most because they cache the last N rows instead of just 1 row.
 Proposed solution:
 * Make cachedLazyStruct a weakly referenced object.  Make similar changes to 
 other columnar SerDes, if any (e.g. maybe ORCFile's SerDe as well).
 Alternative solutions:
 * We can also free up the whole SerDe after processing a block/file.  The 
 problem with that is that the input splits may contain multiple blocks/files 
 that map to the same SerDe, and recreating a SerDe is just more work.
 * We can also move the SerDe creation/free-up to the point where the input file 
 changes.  But that requires a much bigger change to the code.
 * We can also add a cleanup() method to the SerDe interface that releases the 
 cached object, but that change is not backward compatible with the many SerDes 
 that people have written.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11414) Fix OOM in MapTask with many input partitions with RCFile

2015-07-30 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-11414:
--
Description: 
MapTask hit OOM in the following situation in our production environment:
* src: 2048 partitions, each with 1 file of about 2MB using RCFile format
* query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
* Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
* MapTask memory Xmx: 1.5GB

By analyzing the heap dump using jhat, we realized that the problem is:
* A single mapper processes many partitions (because of CombineHiveInputFormat)
* Each input path (equivalent to a partition here) constructs its own SerDe
* Each SerDe does its own caching of the deserialized object (and tries to reuse 
it), but never releases it (in this case, serde2.columnar.ColumnarSerDeBase has a 
field cachedLazyStruct which can take a lot of space - roughly the last N rows of 
a file, where N is the number of rows in a columnar block).
* This problem may exist in other SerDes as well, but columnar file formats are 
affected the most because they cache the last N rows instead of just 1 row (a 
rough back-of-envelope estimate of the retained memory follows this list).
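
To make the scale of the retention concrete, here is a back-of-envelope sketch; 
the rows-per-block and bytes-per-row numbers are assumptions for illustration, 
not measurements taken from the heap dump:

{code:java}
// Back-of-envelope estimate: one combined MapTask touches ~2048 partitions,
// each partition gets its own SerDe, and each SerDe pins roughly one columnar
// block worth of deserialized rows in its cachedLazyStruct-style field.
public class RetainedCacheEstimate {
  public static void main(String[] args) {
    int partitions = 2048;      // from the report: 2048 input partitions
    int rowsPerBlock = 10000;   // assumed rows per columnar block (illustrative)
    int bytesPerRow = 64;       // assumed deserialized footprint per row (illustrative)
    long retainedBytes = (long) partitions * rowsPerBlock * bytesPerRow;
    System.out.printf("~%.1f GB retained by per-SerDe caches%n", retainedBytes / 1e9);
    // ~1.3 GB with these assumptions, which already approaches the 1.5 GB Xmx.
  }
}
{code}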

Proposed solution:
* Make cachedLazyStruct in serde2.columnar.ColumnarSerDeBase a weakly 
referenced object (sketched below).
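
A minimal sketch of the weak-reference idea, written against a generic cache 
holder rather than the actual ColumnarSerDeBase code; the class and method names 
are illustrative only:

{code:java}
import java.lang.ref.SoftReference;

// Illustrative only: wrap the per-SerDe cached row object in a reference the GC
// may clear under memory pressure, instead of a hard field that pins it forever.
// A SoftReference keeps the reuse benefit longer than a plain WeakReference would.
public class CachedRowHolder<T> {
  public interface Factory<T> { T create(); }

  private SoftReference<T> cachedRow;   // stands in for a hard cachedLazyStruct field

  public T getOrCreate(Factory<T> factory) {
    T row = (cachedRow == null) ? null : cachedRow.get();
    if (row == null) {                  // never built, or already reclaimed by the GC
      row = factory.create();
      cachedRow = new SoftReference<T>(row);
    }
    return row;
  }
}
{code}

With many idle per-partition SerDes in one combined MapTask, the GC can then 
drop their cached rows before the heap fills up.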

Alternative solutions:
* We can also free up the whole SerDe after processing a block/file.  The 
problem with that is that the input splits may contain multiple blocks/files 
that map to the same SerDe, and recreating a SerDe is just more work.
* We can also move the SerDe creation/free-up to the point where the input file 
changes.  But that requires a much bigger change to the code.
* We can also add a cleanup() method to the SerDe interface that releases the 
cached object, but that change is not backward compatible with the many SerDes 
that people have written (a hypothetical sketch of this variant follows this 
list).
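
For completeness, a hypothetical sketch of the cleanup() alternative. 
ReleasableSerDe is an invented name, not an existing Hive interface; the point 
is that callers would have to probe for it, because the existing SerDe interface 
cannot gain a new method without breaking every implementation:

{code:java}
// Hypothetical optional interface (not part of Hive) that a SerDe could implement
// to let callers drop its cached deserialized objects explicitly.
public interface ReleasableSerDe {
  /** Release any cached deserialized objects so they become collectable. */
  void cleanup();
}

// Caller side, e.g. when an operator finishes one input file:
//   if (serDe instanceof ReleasableSerDe) {
//     ((ReleasableSerDe) serDe).cleanup();
//   }
{code}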


  was:
MapTask hit OOM in the following situation in our production environment:
* src: 2048 partitions, each with 1 file of about 2MB using RCFile format
* query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
* Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
* MapTask memory Xmx: 1.5GB

By analyzing the heap dump using jhat, we realized that the problem is:
* A single mapper processes many partitions (because of CombineHiveInputFormat)
* Each input path (equivalent to a partition here) constructs its own SerDe
* Each SerDe does its own caching of the deserialized object (and tries to reuse 
it), but never releases it (in this case, serde2.columnar.ColumnarSerDeBase has a 
field cachedLazyStruct which can take a lot of space - roughly the last N rows of 
a file, where N is the number of rows in a columnar block).
* This problem may exist in other SerDes as well, but columnar file formats are 
affected the most because they cache the last N rows instead of just 1 row.

Proposed solution:
* Make cachedLazyStruct a weakly referenced object.  Make similar changes to 
other columnar SerDes, if any (e.g. maybe ORCFile's SerDe as well).

Alternative solutions:
* We can also free up the whole SerDe after processing a block/file.  The 
problem with that is that the input splits may contain multiple blocks/files 
that map to the same SerDe, and recreating a SerDe is just more work.
* We can also move the SerDe creation/free-up to the point where the input file 
changes.  But that requires a much bigger change to the code.
* We can also add a cleanup() method to the SerDe interface that releases the 
cached object, but that change is not backward compatible with the many SerDes 
that people have written.



 Fix OOM in MapTask with many input partitions with RCFile
 -

 Key: HIVE-11414
 URL: https://issues.apache.org/jira/browse/HIVE-11414
 Project: Hive
  Issue Type: Improvement
  Components: File Formats, Serializers/Deserializers
Affects Versions: 0.11.0, 0.12.0, 0.14.0, 0.13.1, 1.2.0
Reporter: Zheng Shao
Priority: Minor


[jira] [Updated] (HIVE-11414) Fix OOM in MapTask with many input partitions with RCFile

2015-07-30 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-11414:
--
Description: 
MapTask hit OOM in the following situation in our production environment:
* src: 2048 partitions, each with 1 file of about 2MB using RCFile format
* query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
* Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
* MapTask memory Xmx: 1.5GB

By analyzing the heap dump using jhat, we realized that the problem is:
* A single mapper processes many partitions (because of CombineHiveInputFormat)
* Each input path (equivalent to a partition here) constructs its own SerDe
* Each SerDe does its own caching of the deserialized object (and tries to reuse 
it), but never releases it (in this case, serde2.columnar.ColumnarSerDeBase has a 
field cachedLazyStruct which can take a lot of space - roughly the last N rows of 
a file, where N is the number of rows in a columnar block).
* This problem may exist in other SerDes as well, but columnar file formats are 
affected the most because they cache the last N rows instead of just 1 row.

Proposed solution:
* Remove cachedLazyStruct in serde2.columnar.ColumnarSerDeBase.  The cost 
saving of not re-creating a single object is too small compared to the cost of 
processing the N rows of a columnar block (see the sketch after the alternatives 
list below).

Alternative solutions:
* We can also free up the whole SerDe after processing a block/file.  The 
problem with that is that the input splits may contain multiple blocks/files 
that map to the same SerDe, and recreating a SerDe is a much bigger change to 
the code.
* We can also move the SerDe creation/free-up to the point where the input file 
changes.  But that requires a much bigger change to the code.
* We can also add a cleanup() method to the SerDe interface that releases the 
cached object, but that change is not backward compatible with the many SerDes 
that people have written.
* We can make cachedLazyStruct in serde2.columnar.ColumnarSerDeBase a weakly 
referenced object, but that feels like overkill.
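
A minimal sketch of the no-cache approach, under the assumption that one fresh 
row wrapper per deserialize() call is acceptable; the class and method names are 
illustrative, not the actual ColumnarSerDeBase code:

{code:java}
// Illustrative only: drop the long-lived cached field and build a fresh,
// short-lived row wrapper on every deserialize() call.  Nothing stays pinned
// between calls, and the one extra small allocation is dwarfed by the cost of
// materializing the N rows of a columnar block.
public class NoCacheColumnarDeserializer {

  public Object deserialize(Object columnarBlob) {
    ColumnarRow row = new ColumnarRow();  // hypothetical stand-in for the cached struct's type
    row.init(columnarBlob);
    return row;                           // becomes garbage as soon as the caller is done
  }

  private static final class ColumnarRow {
    private Object backingData;
    void init(Object blob) { this.backingData = blob; }
  }
}
{code}

Either way (no cache, or a soft/weak reference), the per-partition SerDes stop 
pinning a full columnar block of rows for the lifetime of the MapTask.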



  was:
MapTask hit OOM in the following situation in our production environment:
* src: 2048 partitions, each with 1 file of about 2MB using RCFile format
* query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
* Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
* MapTask memory Xmx: 1.5GB

By analyzing the heap dump using jhat, we realized that the problem is:
* A single mapper processes many partitions (because of CombineHiveInputFormat)
* Each input path (equivalent to a partition here) constructs its own SerDe
* Each SerDe does its own caching of the deserialized object (and tries to reuse 
it), but never releases it (in this case, serde2.columnar.ColumnarSerDeBase has a 
field cachedLazyStruct which can take a lot of space - roughly the last N rows of 
a file, where N is the number of rows in a columnar block).
* This problem may exist in other SerDes as well, but columnar file formats are 
affected the most because they cache the last N rows instead of just 1 row.

Proposed solution:
* Remove cachedLazyStruct in serde2.columnar.ColumnarSerDeBase.  The cost 
saving of not re-creating a single object is too small compared to the cost of 
processing the N rows of a columnar block.

Alternative solutions:
* We can also free up the whole SerDe after processing a block/file.  The 
problem with that is that the input splits may contain multiple blocks/files 
that map to the same SerDe, and recreating a SerDe is just more work.
* We can also move the SerDe creation/free-up to the point where the input file 
changes.  But that requires a much bigger change to the code.
* We can also add a cleanup() method to the SerDe interface that releases the 
cached object, but that change is not backward compatible with the many SerDes 
that people have written.
* We can make cachedLazyStruct in serde2.columnar.ColumnarSerDeBase a weakly 
referenced object, but that feels like overkill.




 Fix OOM in MapTask with many input partitions with RCFile
 -

 Key: HIVE-11414
 URL: https://issues.apache.org/jira/browse/HIVE-11414
 Project: Hive
  Issue Type: Improvement
  Components: File Formats, Serializers/Deserializers
Affects Versions: 0.11.0, 0.12.0, 0.14.0, 0.13.1, 1.2.0
Reporter: Zheng Shao
Priority: Minor
