As the amount of data got larger and larger, the same OOM occurred
again, so we lowered
hive.exec.reducers.bytes.per.reducer
from 256MB to 64MB, and everything goes well after that ~
So the root cause of the issue is that one reducer cannot process so much data in
a single round. Hope it helps.
By decreasing mapreduce.reduce.shuffle.parallelcopies from 20 to 5, it
seems that everything goes well; no more OOM ~~
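For anyone following along, a sketch of setting that property per-job from a
Hive session (the cluster-wide default would live in mapred-site.xml; the
value 20 is what this cluster happened to use, not the Hadoop default):

    -- fewer parallel shuffle fetch threads means less reducer memory
    -- pressure during the copy phase
    SET mapreduce.reduce.shuffle.parallelcopies=5;  -- was 20 on this cluster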
2017-08-23 17:19 GMT+08:00 panfei :
> The full error stack is (as described here:
> https://issues.apache.org/jira/browse/MAPREDUCE-6108):
>
> This error does not reproduce every time; after retrying several times,
> the job finished successfully.
The full error stack is (as described here:
https://issues.apache.org/jira/browse/MAPREDUCE-6108):
This error does not reproduce every time; after retrying several times, the job
finished successfully.
2017-08-23 17:16:03,574 WARN [main]
org.apache.hadoop.mapred.YarnChild: Exception running child
Hi Gopal, thanks for all the information and suggestions.
The Hive version is 2.0.1, and we use Hive-on-MR as the execution engine.
I think I should create an intermediate table which includes all the
dimensions (including the several kinds of ids), and then use spark-sql to
calculate the distinct values.
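A rough sketch of that two-step plan (all table and column names here are
hypothetical, invented for illustration):

    -- step 1, in Hive: materialize the dimension columns once
    CREATE TABLE user_activity_dims AS
    SELECT dt, monthly_user_id, weekly_user_id
    FROM events;  -- hypothetical source table

    -- step 2, in spark-sql: compute the distincts from the smaller table
    SELECT dt,
           COUNT(DISTINCT monthly_user_id) AS monthly_active_users,
           COUNT(DISTINCT weekly_user_id)  AS weekly_active_users
    FROM user_activity_dims
    GROUP BY dt;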
> COUNT(DISTINCT monthly_user_id) AS monthly_active_users,
> COUNT(DISTINCT weekly_user_id) AS weekly_active_users,
…
> GROUPING_ID() AS gid,
> COUNT(1) AS dummy
There are two things which prevent Hive from optimizing multiple count distincts.
Another aggregate like a count(1) or a Grouping sets li