[jira] [Commented] (HIVE-1830) mappers in group followed by joins may die OOM

2015-01-24 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290901#comment-14290901
 ] 

Lefty Leverenz commented on HIVE-1830:
--

Doc note:  This added three configuration parameters to HiveConf.java, with 
descriptions in the template file.  They are documented in the wiki.

* [hive.mapjoin.followby.map.aggr.hash.percentmemory | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.mapjoin.followby.map.aggr.hash.percentmemory]
* [hive.map.aggr.hash.force.flush.memory.threshold | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.map.aggr.hash.force.flush.memory.threshold]
* [hive.mapjoin.followby.gby.localtask.max.memory.usage | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.mapjoin.followby.gby.localtask.max.memory.usage]

> mappers in group followed by joins may die OOM
> --
>
> Key: HIVE-1830
> URL: https://issues.apache.org/jira/browse/HIVE-1830
> Project: Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive-1830-1.patch, hive-1830-2.patch, hive-1830-3.patch, 
> hive-1830-4.patch, hive-1830-5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] Commented: (HIVE-1830) mappers in group followed by joins may die OOM

2010-12-08 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969549#action_12969549
 ] 

Namit Jain commented on HIVE-1830:
--

+1

> mappers in group followed by joins may die OOM
> --
>
> Key: HIVE-1830
> URL: https://issues.apache.org/jira/browse/HIVE-1830
> Project: Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Attachments: hive-1830-1.patch, hive-1830-2.patch, hive-1830-3.patch, 
> hive-1830-4.patch, hive-1830-5.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1830) mappers in group followed by joins may die OOM

2010-12-07 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969203#action_12969203
 ] 

Namit Jain commented on HIVE-1830:
--

  if (groupByOp.getConf() == null) {
    System.out.println("Group by desc is null");
    return null;
  }




This should never happen


GroupByOperator:
memoryThreshold = HiveConf.getFloatVar(hconf, 
HiveConf.ConfVars.HIVEMAPAGGRMEMORYTHRESHOLD);


This should also be in groupByDesc



> mappers in group followed by joins may die OOM
> --
>
> Key: HIVE-1830
> URL: https://issues.apache.org/jira/browse/HIVE-1830
> Project: Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Attachments: hive-1830-1.patch, hive-1830-2.patch, hive-1830-3.patch, 
> hive-1830-4.patch
>
>





[jira] Commented: (HIVE-1830) mappers in group followed by joins may die OOM

2010-12-07 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969070#action_12969070
 ] 

Namit Jain commented on HIVE-1830:
--

Add descriptions of the new parameters to hive-default.xml.
Move the memory threshold into the descriptors (e.g., all the needed confs 
should be copied into groupByDesc and accessed from there, instead of being 
read from HiveConf at runtime). You already did this for 
HashTableSinkOperator.
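
A minimal sketch of the pattern requested above, not the actual Hive code: the 
setting is read from the configuration once at plan time, stored in the 
descriptor, and the operator reads only the descriptor at runtime. All class 
names here (PlanTimeConfDemo, GroupByDescSketch, GroupByOperatorSketch) are 
hypothetical, a plain Map stands in for HiveConf, and the fallback value 0.9 
is an illustrative assumption.

```java
import java.util.HashMap;
import java.util.Map;

final class PlanTimeConfDemo {
    // Plan-time step: copy the setting out of the configuration into the descriptor.
    // The Map is a stand-in for HiveConf; "0.9" is an assumed fallback for illustration.
    static GroupByDescSketch buildDesc(Map<String, String> conf) {
        String raw = conf.getOrDefault(
            "hive.map.aggr.hash.force.flush.memory.threshold", "0.9");
        return new GroupByDescSketch(Float.parseFloat(raw));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("hive.map.aggr.hash.force.flush.memory.threshold", "0.8");

        GroupByOperatorSketch op = new GroupByOperatorSketch();
        op.initialize(buildDesc(conf));
        System.out.println("threshold seen by operator: " + op.memoryThreshold());
    }
}

// Hypothetical stand-in for a plan descriptor such as groupByDesc:
// it carries the values the operator needs, fixed when the plan is built.
final class GroupByDescSketch {
    private final float memoryThreshold;

    GroupByDescSketch(float memoryThreshold) {
        this.memoryThreshold = memoryThreshold;
    }

    float getMemoryThreshold() {
        return memoryThreshold;
    }
}

// Hypothetical operator: it never touches the configuration at runtime,
// only the descriptor it was initialized with.
final class GroupByOperatorSketch {
    private float memoryThreshold;

    void initialize(GroupByDescSketch desc) {
        this.memoryThreshold = desc.getMemoryThreshold();
    }

    float memoryThreshold() {
        return memoryThreshold;
    }
}
```

Because the descriptor is immutable, a later change to the configuration does 
not affect an operator that has already been initialized.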


> mappers in group followed by joins may die OOM
> --
>
> Key: HIVE-1830
> URL: https://issues.apache.org/jira/browse/HIVE-1830
> Project: Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Attachments: hive-1830-1.patch, hive-1830-2.patch, hive-1830-3.patch
>
>





[jira] Commented: (HIVE-1830) mappers in group followed by joins may die OOM

2010-12-07 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969061#action_12969061
 ] 

Namit Jain commented on HIVE-1830:
--

I will take a look 

> mappers in group followed by joins may die OOM
> --
>
> Key: HIVE-1830
> URL: https://issues.apache.org/jira/browse/HIVE-1830
> Project: Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Attachments: hive-1830-1.patch, hive-1830-2.patch, hive-1830-3.patch
>
>





[jira] Commented: (HIVE-1830) mappers in group followed by joins may die OOM

2010-12-05 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967105#action_12967105
 ] 

Namit Jain commented on HIVE-1830:
--

After HIVE-1642, joins are automatically converted into map-joins at physical 
optimization time.

However, this may lead to problems.


For example, consider the query:

select T1.val, count(1) from T1 join T2 on T1.key=T2.key group by T1.val


This will have 2 map-reduce jobs, one for the join and the other for group by.

Before HIVE-1642, the partial aggregation for the group by was performed in the 
reducer where the join was performed.
After HIVE-1642, it is performed in the mapper. The local task confirms that 
there is just enough memory to hold the map-join data. However, it does not 
take into account the memory needed for the partial group by.

So, when a group by follows a join, it is a good idea to lower the memory limit 
that the local task uses when it validates that the small table fits in memory. 
This can be controlled by a new configuration parameter with some default, say 
70% of total memory (instead of 90%).
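
The check described above could look roughly like the following sketch. It is 
an illustration under stated assumptions, not the local task's real code: the 
class and method names are hypothetical, and the 90% and 70% limits are simply 
the figures mentioned in this comment.

```java
final class LocalTaskMemoryCheck {
    // Illustrative limits taken from the comment above, not confirmed defaults.
    static final double DEFAULT_MAX_USAGE = 0.90;
    static final double FOLLOWED_BY_GROUP_BY_MAX_USAGE = 0.70;

    // Pick the stricter limit when a map-side group by will share the mapper's heap.
    static double maxUsage(boolean followedByGroupBy) {
        return followedByGroupBy ? FOLLOWED_BY_GROUP_BY_MAX_USAGE : DEFAULT_MAX_USAGE;
    }

    // True if the small table's hash table still fits under the applicable limit.
    static boolean fitsInMemory(long usedBytes, long maxBytes, boolean followedByGroupBy) {
        double usage = (double) usedBytes / maxBytes;
        return usage <= maxUsage(followedByGroupBy);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        long max = rt.maxMemory();
        System.out.println("plain map-join fits:      " + fitsInMemory(used, max, false));
        System.out.println("with following group by:  " + fitsInMemory(used, max, true));
    }
}
```

With the lower limit, a small table that fills 80% of the heap is rejected for 
a map-join when a group by follows, which leaves headroom for the partial 
aggregation in the same mapper.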

Also, the group by may still run out of memory, so it might be a good idea to 
check for free memory in the group by and periodically flush its in-memory 
partial aggregates.
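
As a rough sketch of that idea (hypothetical names throughout, not the 
GroupByOperator implementation): a map-side count aggregator that checks heap 
usage every few rows and flushes its partial results when usage exceeds a 
threshold. The usage probe is injected so the behaviour can be exercised 
without actually filling the heap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoubleSupplier;

final class FlushingHashAggregator {
    private final Map<String, Long> partialCounts = new HashMap<>();
    private final List<Map<String, Long>> flushed = new ArrayList<>();
    private final DoubleSupplier memoryUsage; // fraction of the heap in use, 0..1
    private final double flushThreshold;      // flush when usage exceeds this
    private final int checkInterval;          // rows between memory checks
    private int rowsSinceCheck = 0;

    FlushingHashAggregator(DoubleSupplier memoryUsage, double flushThreshold, int checkInterval) {
        this.memoryUsage = memoryUsage;
        this.flushThreshold = flushThreshold;
        this.checkInterval = checkInterval;
    }

    // Count one row for the key, then periodically check memory and flush if needed.
    void add(String key) {
        partialCounts.merge(key, 1L, Long::sum);
        if (++rowsSinceCheck >= checkInterval) {
            rowsSinceCheck = 0;
            if (memoryUsage.getAsDouble() > flushThreshold) {
                flush();
            }
        }
    }

    // Emit the current partial aggregates (here: into a list) and free the hash table.
    void flush() {
        if (!partialCounts.isEmpty()) {
            flushed.add(new HashMap<>(partialCounts));
            partialCounts.clear();
        }
    }

    List<Map<String, Long>> flushedBatches() {
        return flushed;
    }

    int inMemoryKeys() {
        return partialCounts.size();
    }

    // Default probe based on the JVM's own view of its heap.
    static double heapUsage() {
        Runtime rt = Runtime.getRuntime();
        return (double) (rt.totalMemory() - rt.freeMemory()) / rt.maxMemory();
    }

    public static void main(String[] args) {
        FlushingHashAggregator agg =
            new FlushingHashAggregator(FlushingHashAggregator::heapUsage, 0.9, 1000);
        for (int i = 0; i < 10_000; i++) {
            agg.add("val" + (i % 100));
        }
        agg.flush(); // final flush at end of input
        System.out.println("batches emitted: " + agg.flushedBatches().size());
    }
}
```

Flushing early is safe for a count because the flushed values are only partial 
aggregates; the same key may be emitted more than once and the final 
aggregation merges them.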

> mappers in group followed by joins may die OOM
> --
>
> Key: HIVE-1830
> URL: https://issues.apache.org/jira/browse/HIVE-1830
> Project: Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Liyin Tang
>

