Thanks Jörn, really appreciate your response. Based on your inputs I have increased the default heap size from 1000 MB to 3000 MB; I will let you know the results soon. My Hive execution engine is Tez and the Hive version is Hive 2.1.1-amzn-0. I don't have the liberty to switch back to MR. My actual Hive query is below; I want to find the top 50 performers per partition based on page view time. Do you have any other inputs that I can play around with?
create table t1_content_pages_agg_by_month stored as orc as
select * from (
    select
        A.dt,
        A.year,
        A.month,
        A.bouncer,
        A.visitor_type,
        A.device_type,
        A.pg_domain_name,
        A.pg_page_url,
        A.class1_id,
        A.class2_id,
        A.total_page_view_time,
        row_number() over (
            partition by A.dt, A.year, A.month, A.bouncer, A.visitor_type, A.device_type
            order by A.total_page_view_time desc
        ) as rank
    from content_pages_agg_by_month A
) AA
where rank <= ${TOP_K_VAL};

Regards,
Sujeet Singh Pardeshi
Software Specialist
SAS Research and Development (India) Pvt. Ltd.
Level 2A and Level 3, Cybercity, Magarpatta, Hadapsar
Pune, Maharashtra, 411 013
off: +91-20-30418810
"When the solution is simple, God is answering…"

From: Jörn Franke <jornfra...@gmail.com>
Sent: 10 January 2019 01:29 PM
To: user@hive.apache.org
Cc: Shashikant Deore <shashikant.de...@sas.com>
Subject: Re: Out Of Memory Error

EXTERNAL

Which Hive version and engine? If it is Tez, you can also try MR as the engine (set hive.execution.engine=mr); that will use less memory. Also check the max heap space configuration on the nodes: you may have 16 GB of physical memory, yet the Java process may be allowed only 4 GB or so. Maybe your query could also be expressed differently, but I am missing background information on the use case. Last but not least: size on disk != size in memory, especially if you go beyond simple queries. Also try a smaller subset of the data when you reach the memory limit.

On 10.01.2019 at 07:57, Sujeet Pardeshi <sujeet.parde...@sas.com> wrote:

Hi Pals, I have the below Hive SQL, which is hitting the following error: "... at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.OutOfMemoryError: Java heap space ...". It is basically going out of memory. The table on which the query is being run has 246,608,473 (246 million) records and is around 43 GB in size. I am running this SQL on a Hadoop cluster of 4 nodes; every node has 16 GB memory and 128 GB disk space.
I can definitely increase the memory, or scale the cluster up and try, but is there something I can do to make this query work without having to touch the cluster or the memory?

create table t1_content_pages_agg_by_month stored as orc as
select * from (
    select
        A.dt,
        A.year,
        A.month,
        A.bouncer,
        A.visitor_type,
        A.device_type,
        A.pg_domain_name,
        A.pg_page_url,
        A.class1_id,
        A.class2_id,
        A.total_page_view_time,
        row_number() over (
            partition by A.dt, A.year, A.month, A.bouncer, A.visitor_type, A.device_type
        ) as rank
    from content_pages_agg_by_month A
) AA;

Regards,
Sujeet Singh Pardeshi
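[Editor's note: alongside the heap-size change discussed above, container-level memory settings are the usual knobs for Hive-on-Tez heap pressure. The properties below (hive.tez.container.size, hive.tez.java.opts, tez.am.resource.memory.mb) are standard Hive/Tez configuration properties; the specific values are illustrative assumptions for 16 GB worker nodes, not a recommendation, and would need tuning per cluster.]

```sql
-- Illustrative values only; sized loosely for ~16 GB worker nodes.
SET hive.execution.engine=tez;
-- Memory (MB) requested for each Tez container:
SET hive.tez.container.size=4096;
-- JVM heap for the container JVM; conventionally ~80% of the container size:
SET hive.tez.java.opts=-Xmx3276m;
-- Memory (MB) for the Tez ApplicationMaster:
SET tez.am.resource.memory.mb=4096;
```

These can be set per session before running the CTAS, so the cluster-wide defaults stay untouched.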