Thanks Jorn. Really appreciate your response. Based on your inputs I have 
increased the default heap size from 1000MB to 3000 MB. Will let you know the 
results soon. My hive execution engine is ‘Tez’ and the hive version is ‘Hive 
2.1.1-amzn-0’. I don’t have the liberty to switch back to mr. My actual hive 
query is as below where I want to find the top 50 performers for my partition 
based on the page view time. You have any other inputs that I can play around 
with?

create table t1_content_pages_agg_by_month stored as orc
as
select * from (
select A.dt
               ,A.year
               ,A.month
               ,A.bouncer
               ,A.visitor_type
               ,A.device_type
               ,A.pg_domain_name
               ,A.pg_page_url
               ,A.class1_id
               ,A.class2_id
               ,A.total_page_view_time
               ,row_number() over ( PARTITION BY A.dt,A.year, A.month, 
A.bouncer,A.visitor_type,A.device_type order by A.total_page_view_time desc ) 
as rank
from content_pages_agg_by_month A
)AA
where rank <= ${TOP_K_VAL};



Regards,

Sujeet Singh Pardeshi

Software Specialist

SAS Research and Development (India) Pvt. Ltd.
Level 2A and Level 3, Cybercity, Magarpatta, Hadapsar  Pune, Maharashtra, 411 
013
off: +91-20-30418810
[Description: untitled]
 "When the solution is simple, God is answering…"

From: Jörn Franke <jornfra...@gmail.com>
Sent: 10 January 2019 PM 01:29
To: user@hive.apache.org
Cc: Shashikant Deore <shashikant.de...@sas.com>
Subject: Re: Out Of Memory Error


EXTERNAL
Which hive version and engine?

If it is tez then you can also try mr as an engine set hive.execution.engine=mr 
that will use less memory. Check also the max heap space configuration on the 
nodes . Maybe you have physically 16 gb memory but the Java process takes only 
4 or so memory.

Maybe your query could be also expressed differently, but I miss background 
information on the use case.

Last but not least : size on disk != size in memory especially if you go beyond 
simple queries.

Try also with a smaller subset of the data when you reach the memory limit

Am 10.01.2019 um 07:57 schrieb Sujeet Pardeshi 
<sujeet.parde...@sas.com<mailto:sujeet.parde...@sas.com>>:
Hi Pals,
I have the below Hive SQL which is hitting the following error “at 
java.lang.Thread.run(Thread.java:745) Caused by: java.lang.OutOfMemoryError: 
Java heap space at”. It’s basically going out of memory. The table on which the 
query is being hit has 246608473 (246 million) records, its size is around 43 
GB’s. I am running this sql on a Hadoop cluster which has 4 nodes, every node 
has 16GB memory and 128 GB disk space. I can definitely increase the memory, 
can scale up more clusters and try but is there something that I can do to make 
this query work without having to touch the clusters or the memory?

create table t1_content_pages_agg_by_month stored as orc
as
select * from (
select A.dt
               ,A.year
               ,A.month
               ,A.bouncer
               ,A.visitor_type
               ,A.device_type
               ,A.pg_domain_name
               ,A.pg_page_url
               ,A.class1_id
               ,A.class2_id
               ,A.total_page_view_time
               ,row_number() over ( PARTITION BY A.dt,A.year, A.month, 
A.bouncer,A.visitor_type,A.device_type) as rank
from content_pages_agg_by_month A
)AA
;


Regards,

Sujeet Singh Pardeshi

Software Specialist

SAS Research and Development (India) Pvt. Ltd.
Level 2A and Level 3, Cybercity, Magarpatta, Hadapsar  Pune, Maharashtra, 411 
013
off: +91-20-30418810
<image001.png>
 "When the solution is simple, God is answering…"

Reply via email to