Thanks Rajesh, this helped solve the OOM issue. I was going through the wiki documentation for this parameter and was not able to understand it clearly, can you please explain the significance of this ?
I have 2 questions:- 1. Also, this parameter is set to false by default should this be set to true ? 2. I see that the number of mappers generated by setting this parameter to true is less than the number of mappers generated by setting the split.strategy=BI. Therefore, I am hoping that using this parameter along with HYBRID is better than using BI split strategy. Can you please comment on this? Thanks, Jayadeep On Wed, Sep 13, 2017 at 3:14 PM, Rajesh Balamohan <rbalamo...@apache.org> wrote: > With "HYBRID" can you try with "hive.orc.cache.use.soft.references=true"? > That should help in preventing OOM with Hybrid strategy. > > ~Rajesh.B > > On Wed, Sep 13, 2017 at 2:54 PM, Jay <jayadeep.jayara...@gmail.com> wrote: > >> Hi All, >> >> I am running a simple select query as below >> >> select distinct vehicle_no from >> rmd.gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3 >> where incident_dt = '2999-01-01'; >> >> The table is a 2 level partitioned table as shown below >> >> drwx------ - gpadmin hdfs 0 2017-09-12 14:36 >> /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_ >> concat_final_eng3/source_type_cd=ENG3/incident_dt=2010-01-01 >> drwx------ - gpadmin hdfs 0 2017-09-12 14:36 >> /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_ >> concat_final_eng3/source_type_cd=ENG3/incident_dt=2011-01-01 >> drwx------ - gpadmin hdfs 0 2017-09-12 14:35 >> /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_ >> concat_final_eng3/source_type_cd=ENG3/incident_dt=2012-01-01 >> drwx------ - gpadmin hdfs 0 2017-09-12 14:36 >> /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_ >> concat_final_eng3/source_type_cd=ENG3/incident_dt=2013-01-01 >> drwx------ - gpadmin hdfs 0 2017-09-12 14:36 >> /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_ >> concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-01-01 >> drwx------ - gpadmin hdfs 0 2017-09-12 14:36 >> /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_ >> concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-02-01 >> drwx------ - gpadmin hdfs 0 2017-09-12 14:36 >> /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_ >> concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-03-01 >> drwx------ - gpadmin hdfs 0 2017-09-12 14:36 >> /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_ >> concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-04-01 >> drwx------ - gpadmin hdfs 0 2017-09-12 14:34 >> /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_ >> concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-05-01 >> drwx------ - gpadmin hdfs 0 2017-09-12 14:33 >> /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_ >> concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-06-01 >> drwx------ - gpadmin hdfs 0 2017-09-12 14:33 >> /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_ >> concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-07-01 >> >> >> The ORC files have been created with a rough size of 2 GB and have ZLIB >> compression. >> >> When the hive.exec.orc.split.strategy is set to HYBRID in our HDP 2.6.1 >> cluster the MAP phase is stuck in the INITIALIZATION phases and after about >> 20 minutes it fails with OOM. >> >> When I change hive.exec.orc.split.strategy to BI the SQL runs fine >> without any issues. >> >> My question is what parameter controls the memory assigned while Hive/Tez >> generates the splits? >> >> the hive container size is set to 8GB >> >> Thanks, >> Jayadeep >> > >