[jira] [Commented] (HIVE-1830) mappers in group followed by joins may die OOM
[ https://issues.apache.org/jira/browse/HIVE-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290901#comment-14290901 ]

Lefty Leverenz commented on HIVE-1830:
--

Doc note: This added three configuration parameters to HiveConf.java, with descriptions in the template file. They are documented in the wiki.

* [hive.mapjoin.followby.map.aggr.hash.percentmemory | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.mapjoin.followby.map.aggr.hash.percentmemory]
* [hive.map.aggr.hash.force.flush.memory.threshold | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.map.aggr.hash.force.flush.memory.threshold]
* [hive.mapjoin.followby.gby.localtask.max.memory.usage | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.mapjoin.followby.gby.localtask.max.memory.usage]

> mappers in group followed by joins may die OOM
> --
>
> Key: HIVE-1830
> URL: https://issues.apache.org/jira/browse/HIVE-1830
> Project: Hive
> Issue Type: Bug
> Reporter: Namit Jain
> Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive-1830-1.patch, hive-1830-2.patch, hive-1830-3.patch, hive-1830-4.patch, hive-1830-5.patch

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] Commented: (HIVE-1830) mappers in group followed by joins may die OOM
[ https://issues.apache.org/jira/browse/HIVE-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969549#action_12969549 ]

Namit Jain commented on HIVE-1830:
--

+1

--
This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1830) mappers in group followed by joins may die OOM
[ https://issues.apache.org/jira/browse/HIVE-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969203#action_12969203 ]

Namit Jain commented on HIVE-1830:
--

    if (groupByOp.getConf() == null) {
      System.out.println("Group by desc is null");
      return null;
    }

This should never happen.

GroupByOperator:

    memoryThreshold = HiveConf.getFloatVar(hconf, HiveConf.ConfVars.HIVEMAPAGGRMEMORYTHRESHOLD);

This should also be in groupByDesc.
[jira] Commented: (HIVE-1830) mappers in group followed by joins may die OOM
[ https://issues.apache.org/jira/browse/HIVE-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969070#action_12969070 ]

Namit Jain commented on HIVE-1830:
--

Add descriptions of the new parameters to hive-default.xml.

Move the memory threshold into the descriptors: e.g. all the needed confs should be copied into groupByDesc and accessed from there, instead of being read from HiveConf at runtime. You already did this for HashTableSinkOperator.
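To illustrate the review request above, here is a minimal sketch of the "copy the conf into the descriptor at plan time" pattern. It is not Hive code: `DescriptorConfSketch`, `GroupByDescSketch`, `OperatorSketch`, `plan`, and the fallback value are all hypothetical names and numbers made up for the example, and a plain `Map` stands in for HiveConf. Only the property name is taken from this thread.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only; every class here is a hypothetical stand-in, not Hive's API. */
final class DescriptorConfSketch {
    // Property name as listed in the doc note on this issue.
    static final String FLUSH_THRESHOLD_KEY =
        "hive.map.aggr.hash.force.flush.memory.threshold";
    // Placeholder fallback for the sketch; not claimed to be Hive's default.
    static final float PLACEHOLDER_THRESHOLD = 0.9f;

    /** Stand-in for a plan descriptor: it carries the value into the task. */
    static final class GroupByDescSketch implements Serializable {
        private static final long serialVersionUID = 1L;
        private final float memoryThreshold;

        GroupByDescSketch(float memoryThreshold) {
            this.memoryThreshold = memoryThreshold;
        }

        float getMemoryThreshold() {
            return memoryThreshold;
        }
    }

    /** Plan time: read the configuration once and copy it into the descriptor. */
    static GroupByDescSketch plan(Map<String, String> conf) {
        String raw = conf.get(FLUSH_THRESHOLD_KEY);
        float threshold = (raw == null) ? PLACEHOLDER_THRESHOLD : Float.parseFloat(raw);
        return new GroupByDescSketch(threshold);
    }

    /** Run time: the operator only looks at its descriptor, never at the conf. */
    static final class OperatorSketch {
        private final float memoryThreshold;

        OperatorSketch(GroupByDescSketch desc) {
            this.memoryThreshold = desc.getMemoryThreshold();
        }

        float memoryThreshold() {
            return memoryThreshold;
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(FLUSH_THRESHOLD_KEY, "0.8");
        OperatorSketch op = new OperatorSketch(plan(conf));
        System.out.println("threshold seen by operator: " + op.memoryThreshold());
    }
}
```

The point of the pattern is that the value is fixed when the plan is built, so later changes to the configuration do not silently alter an operator that is already planned.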
[jira] Commented: (HIVE-1830) mappers in group followed by joins may die OOM
[ https://issues.apache.org/jira/browse/HIVE-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969061#action_12969061 ]

Namit Jain commented on HIVE-1830:
--

I will take a look
[jira] Commented: (HIVE-1830) mappers in group followed by joins may die OOM
[ https://issues.apache.org/jira/browse/HIVE-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967105#action_12967105 ]

Namit Jain commented on HIVE-1830:
--

After HIVE-1642, joins are automatically converted into map-joins at physical optimization time. However, this may lead to problems. For example, consider the query:

    select T1.val, count(1) from T1 join T2 on T1.key=T2.key group by T1.val

This will have 2 map-reduce jobs, one for the join and the other for the group by. Before HIVE-1642, the partial aggregation for the group by was performed in the reducer where the join is performed. After HIVE-1642, it is performed in the mapper.

The local task confirms that there is just enough memory to hold the map-join data. However, it does not take into account the memory needed for the partial group by.

So, when a map-join is followed by a group by, it is a good idea to reduce the memory limit that the local task uses to validate that the small table fits. This can be controlled by a new configuration parameter with some default, say 70% of total memory (instead of 90%).

Also, the group by may still run out of memory, so it might be a good idea for the group by to check free memory periodically and flush when it runs low.
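The two ideas in the comment above (a lower memory limit for the local task when a group by follows the map-join, and a periodic memory check with flush inside the group by) can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch attached to this issue: `MemoryGuardSketch`, `PartialCounter`, and all method names are hypothetical, the 0.90 and 0.70 limits are simply the figures quoted in the comment, and heap usage is estimated with `java.lang.Runtime`.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only; a hypothetical stand-in, not Hive's local task or GroupByOperator. */
final class MemoryGuardSketch {
    // Figures quoted in the comment: 90% normally, say 70% before a group by.
    static final double LOCAL_TASK_LIMIT = 0.90;
    static final double LOCAL_TASK_LIMIT_BEFORE_GROUP_BY = 0.70;

    /** Limit the local task applies while loading the small table. */
    static double localTaskLimit(boolean followedByGroupBy) {
        return followedByGroupBy ? LOCAL_TASK_LIMIT_BEFORE_GROUP_BY : LOCAL_TASK_LIMIT;
    }

    /** True when used/max is strictly above the given fraction. */
    static boolean exceeds(long usedBytes, long maxBytes, double limit) {
        return maxBytes > 0 && (double) usedBytes / (double) maxBytes > limit;
    }

    /** Rough estimate of heap currently in use by this JVM. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Partial count(1) per group key, with a memory check every checkInterval rows. */
    static final class PartialCounter {
        private final Map<String, Long> counts = new HashMap<>();
        private final double flushLimit;
        private final int checkInterval;
        private int rowsSinceCheck;
        private int flushes;

        PartialCounter(double flushLimit, int checkInterval) {
            this.flushLimit = flushLimit;
            this.checkInterval = checkInterval;
        }

        void add(String groupKey) {
            counts.merge(groupKey, 1L, Long::sum);
            if (++rowsSinceCheck >= checkInterval) {
                rowsSinceCheck = 0;
                long max = Runtime.getRuntime().maxMemory();
                if (exceeds(usedHeapBytes(), max, flushLimit)) {
                    flush();
                }
            }
        }

        /** A real operator would forward the partial results downstream before clearing. */
        void flush() {
            counts.clear();
            flushes++;
        }

        long countFor(String groupKey) {
            return counts.getOrDefault(groupKey, 0L);
        }

        int flushes() {
            return flushes;
        }
    }

    public static void main(String[] args) {
        PartialCounter counter = new PartialCounter(0.90, 1000);
        for (int i = 0; i < 10_000; i++) {
            counter.add("val" + (i % 10));
        }
        System.out.println("flushes so far: " + counter.flushes());
    }
}
```

Checking only every `checkInterval` rows keeps the cost of the memory probe off the per-row path; the interval and both limits would be tuning knobs in any real implementation.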