[ 
https://issues.apache.org/jira/browse/HIVE-13755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-13755:
----------------------------------
    Priority: Critical  (was: Major)

> Hybrid mapjoin allocates memory the same for multi broadcast
> ------------------------------------------------------------
>
>                 Key: HIVE-13755
>                 URL: https://issues.apache.org/jira/browse/HIVE-13755
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.1.0
>            Reporter: Wei Zheng
>            Assignee: Wei Zheng
>            Priority: Critical
>
> PROBLEM:
> When hybrid mapjoin works out the memory it needs, it treats every 
> hashtable the same when estimating the memory needed. This may cause 
> problems when there are multiple broadcast inputs, as the total may exceed 
> the memory intended for it.
> An example reducer task log is attached. This task has 5 broadcast inputs:
> Reducer 3 <- Map 10 (BROADCAST_EDGE), Map 11 (BROADCAST_EDGE), Map 12 
> (BROADCAST_EDGE), Map 8 (SIMPLE_EDGE), Map 9 (BROADCAST_EDGE), Reducer 2 
> (SIMPLE_EDGE)
> An excerpt of the log:
> {code}
> 2016-03-15 19:23:50,811 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: 
> Memory manager allocates 0 bytes for the loading hashtable.
> 2016-03-15 19:23:50,811 [INFO] [pool-47-thread-1] 
> |persistence.HashMapWrapper|: Key count from statistics is 210; setting map 
> size to 280
> 2016-03-15 19:23:50,811 [INFO] [pool-47-thread-1] 
> |persistence.HybridHashTableContainer|: Total available memory: 1968177152
> 2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] 
> |persistence.HybridHashTableContainer|: Estimated small table size: 155190
> 2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] 
> |persistence.HybridHashTableContainer|: Number of hash partitions to be 
> created: 16
> 2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] 
> |persistence.HybridHashTableContainer|: Write buffer size: 524288
> 2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] 
> |persistence.HybridHashTableContainer|: Number of partitions created: 16
> 2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] 
> |persistence.HybridHashTableContainer|: Number of partitions spilled directly 
> to disk on creation: 0
> 2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: 
> Using tableContainer HybridHashTableContainer
> 2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] 
> |persistence.HybridHashTableContainer|: Initializing container with 
> org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe and 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
> 2016-03-15 19:23:50,814 [INFO] [pool-47-thread-1] 
> |readers.UnorderedKVReader|: Num Records read: 20
> 2016-03-15 19:23:50,814 [INFO] [pool-47-thread-1] |log.PerfLogger|: </PERFLOG 
> method=LoadHashtable start=1458069830811 end=1458069830814 duration=3 
> from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
> 2016-03-15 19:23:50,814 [INFO] [pool-47-thread-1] |tez.ObjectCache|: Caching 
> key: 
> svc-phx-efmhadoop_20160315191303_8c53ce88-e64f-4d36-bad0-846bbf096f57__HASH_MAP_MAPJOIN_126_container
> 2016-03-15 19:23:50,814 [INFO] [TezChild] |exec.HashTableDummyOperator|: 
> Initializing operator HASHTABLEDUMMY[32]
> 2016-03-15 19:23:50,814 [INFO] [TezChild] |exec.MapJoinOperator|: 
> Initializing operator MAPJOIN[26]
> 2016-03-15 19:23:50,816 [INFO] [TezChild] |exec.CommonJoinOperator|: JOIN 
> struct<_col3:string,_col4:decimal(5,0),_col5:char(1),_col6:char(1),_col7:date,_col8:string,_col9:string,_col12:string,_col13:string,_col14:string,_col15:string,_col16:string,_col19:decimal(13,3),_col20:string,_col22:decimal(5,0),_col23:decimal(5,0),_col24:decimal(5,0),_col25:decimal(5,0),_col26:decimal(13,2),_col27:decimal(5,0),_col28:decimal(15,2),_col29:decimal(15,2),_col31:decimal(3,0),_col33:char(1),_col41:decimal(3,1),_col42:char(1),_col43:decimal(3,1),_col44:string,_col45:char(1),_col48:char(1),_col55:char(1),_col57:char(1),_col59:char(1),_col60:string,_col64:string,_col65:string,_col67:decimal(15,2),_col76:decimal(3,0),_col81:char(1),_col98:string,_col99:string,_col105:string,_col108:string,_col122:string,_col123:decimal(5,0),_col127:string,_col128:decimal(5,0),_col129:string,_col137:char(1),_col139:string,_col145:string,_col151:string,_col152:string,_col154:string,_col158:char(1),_col164:char(1),_col204:string,_col213:string,_col214:char(1),_col215:string,_col218:char(1),_col219:date,_col220:string,_col221:decimal(5,0),_col222:decimal(5,0),_col223:string,_col224:char(1),_col225:string,_col226:decimal(3,0),_col231:string,_col232:string,_col233:string,_col234:decimal(9,5),_col236:date,_col240:date,_col256:string,_col257:string,_col268:string,_col269:string,_col270:char(1),_col271:string,_col272:char(1),_col324:string,_col344:string,_col464:string,_col478:decimal(5,0),_col479:decimal(5,0),_col519:string,_col532:string,_col534:char(1),_col540:decimal(13,3),_col541:decimal(13,3),_col561:string,_col568:char(1),_col570:string>
>  totalsz = 95
> 2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |log.PerfLogger|: <PERFLOG 
> method=LoadHashtable from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
> 2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: 
> Memory manager allocates 0 bytes for the loading hashtable.
> 2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] 
> |persistence.HashMapWrapper|: Key count from statistics is 5942112; setting 
> map size to 7922816
> 2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] 
> |persistence.HybridHashTableContainer|: Total available memory: 1968177152
> 2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] 
> |persistence.HybridHashTableContainer|: Estimated small table size: 1324101915
> 2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] 
> |persistence.HybridHashTableContainer|: Number of hash partitions to be 
> created: 16
> 2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] 
> |persistence.HybridHashTableContainer|: Write buffer size: 8388608
> 2016-03-15 19:23:50,831 [INFO] [pool-47-thread-1] 
> |persistence.HybridHashTableContainer|: Number of partitions created: 16
> 2016-03-15 19:23:50,831 [INFO] [pool-47-thread-1] 
> |persistence.HybridHashTableContainer|: Number of partitions spilled directly 
> to disk on creation: 0
> 2016-03-15 19:23:50,831 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: 
> Using tableContainer HybridHashTableContainer
> 2016-03-15 19:23:50,831 [INFO] [pool-47-thread-1] 
> |persistence.HybridHashTableContainer|: Initializing container with 
> org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe and 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
> 2016-03-15 19:23:51,543 [INFO] [pool-47-thread-1] 
> |readers.UnorderedKVReader|: Num Records read: 852596
> 2016-03-15 19:23:51,563 [INFO] [pool-47-thread-1] |log.PerfLogger|: </PERFLOG 
> method=LoadHashtable start=1458069830817 end=1458069831563 duration=746 
> from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
> 2016-03-15 19:23:51,563 [INFO] [pool-47-thread-1] |tez.ObjectCache|: Caching 
> key: 
> svc-phx-efmhadoop_20160315191303_8c53ce88-e64f-4d36-bad0-846bbf096f57__HASH_MAP_MAPJOIN_127_container
> 2016-03-15 19:23:51,563 [INFO] [TezChild] |exec.HashTableDummyOperator|: 
> Initializing operator HASHTABLEDUMMY[31]
> 2016-03-15 19:23:51,564 [INFO] [TezChild] |exec.MapJoinOperator|: 
> Initializing operator MAPJOIN[27]
> 2016-03-15 19:23:51,566 [INFO] [TezChild] |exec.CommonJoinOperator|: JOIN 
> struct<_col3:string,_col4:decimal(5,0),_col5:char(1),_col6:char(1),_col7:date,_col8:string,_col9:string,_col12:string,_col13:string,_col14:string,_col15:string,_col16:string,_col19:decimal(13,3),_col20:string,_col22:decimal(5,0),_col23:decimal(5,0),_col24:decimal(5,0),_col25:decimal(5,0),_col26:decimal(13,2),_col27:decimal(5,0),_col28:decimal(15,2),_col29:decimal(15,2),_col31:decimal(3,0),_col33:char(1),_col41:decimal(3,1),_col42:char(1),_col43:decimal(3,1),_col44:string,_col45:char(1),_col48:char(1),_col55:char(1),_col57:char(1),_col59:char(1),_col60:string,_col64:string,_col65:string,_col67:decimal(15,2),_col76:decimal(3,0),_col81:char(1),_col98:string,_col99:string,_col105:string,_col108:string,_col122:string,_col123:decimal(5,0),_col127:string,_col128:decimal(5,0),_col129:string,_col137:char(1),_col139:string,_col145:string,_col151:string,_col152:string,_col154:string,_col158:char(1),_col164:char(1),_col204:string,_col213:string,_col214:char(1),_col215:string,_col218:char(1),_col219:date,_col220:string,_col221:decimal(5,0),_col222:decimal(5,0),_col223:string,_col224:char(1),_col225:string,_col226:decimal(3,0),_col231:string,_col232:string,_col233:string,_col234:decimal(9,5),_col236:date,_col240:date,_col256:string,_col257:string,_col268:string,_col269:string,_col270:char(1),_col271:string,_col272:char(1),_col324:string,_col344:string,_col464:string,_col478:decimal(5,0),_col479:decimal(5,0),_col519:string,_col532:string,_col534:char(1),_col540:decimal(13,3),_col541:decimal(13,3),_col561:string>
>  totalsz = 93
> 2016-03-15 19:23:51,566 [INFO] [pool-47-thread-1] |log.PerfLogger|: <PERFLOG 
> method=LoadHashtable from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
> 2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: 
> Memory manager allocates 0 bytes for the loading hashtable.
> 2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] 
> |persistence.HashMapWrapper|: Key count from statistics is 293380; setting 
> map size to 391174
> 2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] 
> |persistence.HybridHashTableContainer|: Total available memory: 1968177152
> 2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] 
> |persistence.HybridHashTableContainer|: Estimated small table size: 69929471
> 2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] 
> |persistence.HybridHashTableContainer|: Number of hash partitions to be 
> created: 16
> 2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] 
> |persistence.HybridHashTableContainer|: Write buffer size: 4194304
> 2016-03-15 19:23:51,568 [INFO] [pool-47-thread-1] 
> |persistence.HybridHashTableContainer|: Number of partitions created: 16
> 2016-03-15 19:23:51,568 [INFO] [pool-47-thread-1] 
> |persistence.HybridHashTableContainer|: Number of partitions spilled directly 
> to disk on creation: 0
> 2016-03-15 19:23:51,568 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: 
> Using tableContainer HybridHashTableContainer
> 2016-03-15 19:23:51,569 [INFO] [pool-47-thread-1] 
> |persistence.HybridHashTableContainer|: Initializing container with 
> org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe and 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
> 2016-03-15 19:23:51,980 [INFO] [pool-47-thread-1] 
> |readers.UnorderedKVReader|: Num Records read: 586760
> {code}
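
To make the arithmetic in the excerpt concrete, here is a minimal sketch (not Hive code). It takes the "Total available memory" and "Estimated small table size" values from the three hashtable loads logged above and compares two budgeting schemes. The first is what the log suggests happens, where each load reports the same total. The second is a purely hypothetical proportional split of one shared total. The function names and the proportional scheme are assumptions for illustration only and do not describe the actual fix.

```python
# Figures copied from the log excerpt above (bytes).
TOTAL_AVAILABLE = 1_968_177_152
# Estimated small table sizes of the three hashtable loads shown in the excerpt.
ESTIMATED_SIZES = [155_190, 1_324_101_915, 69_929_471]


def uniform_budgets(total, n_tables):
    """What the log suggests: every hashtable load sees the same total."""
    return [total] * n_tables


def proportional_budgets(total, estimated_sizes):
    """Hypothetical alternative: divide one shared total by estimated size."""
    if not estimated_sizes:
        return []
    combined = sum(estimated_sizes)
    if combined == 0:
        # No size information: fall back to an even split.
        return [total // len(estimated_sizes)] * len(estimated_sizes)
    # Integer division rounds down, so the shares never sum past the total.
    return [total * size // combined for size in estimated_sizes]


uniform = uniform_budgets(TOTAL_AVAILABLE, len(ESTIMATED_SIZES))
shared = proportional_budgets(TOTAL_AVAILABLE, ESTIMATED_SIZES)

print("sum of uniform budgets:     ", sum(uniform))
print("sum of proportional budgets:", sum(shared))
print("task total:                 ", TOTAL_AVAILABLE)
```

With the uniform scheme the budgets add up to a multiple of the task's total, which is the over-commitment described in the issue. A shared split keeps the sum within the total by construction.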



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)