Hi All,

I am running a simple select query, shown below:
select distinct vehicle_no from rmd.gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3 where incident_dt = '2999-01-01';

The table is a two-level partitioned table, as shown below:

drwx------ - gpadmin hdfs 0 2017-09-12 14:36 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2010-01-01
drwx------ - gpadmin hdfs 0 2017-09-12 14:36 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2011-01-01
drwx------ - gpadmin hdfs 0 2017-09-12 14:35 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2012-01-01
drwx------ - gpadmin hdfs 0 2017-09-12 14:36 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2013-01-01
drwx------ - gpadmin hdfs 0 2017-09-12 14:36 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-01-01
drwx------ - gpadmin hdfs 0 2017-09-12 14:36 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-02-01
drwx------ - gpadmin hdfs 0 2017-09-12 14:36 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-03-01
drwx------ - gpadmin hdfs 0 2017-09-12 14:36 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-04-01
drwx------ - gpadmin hdfs 0 2017-09-12 14:34 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-05-01
drwx------ - gpadmin hdfs 0 2017-09-12 14:33 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-06-01
drwx------ - gpadmin hdfs 0 2017-09-12 14:33 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-07-01

The ORC files were created with a size of roughly 2 GB and use ZLIB compression.

When hive.exec.orc.split.strategy is set to HYBRID on our HDP 2.6.1 cluster, the map phase gets stuck in the INITIALIZATION phase and, after about 20 minutes, fails with an OOM error. When I change hive.exec.orc.split.strategy to BI, the query runs fine without any issues.

My question is: what parameter controls the memory assigned while Hive/Tez generates the splits? The Hive container size is set to 8 GB.

Thanks,
Jayadeep
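
P.S. For reference, the two runs described above can be sketched as the session settings below. This is only a sketch under stated assumptions: it assumes that "Hive container size" refers to hive.tez.container.size and that 8 GB corresponds to a value of 8192 (MB). The commented-out lines rest on a further assumption: when hive.compute.splits.in.am is true (its default), splits are generated in the Tez application master, so the AM memory setting, rather than the task container size, may be the relevant limit. The value 4096 is a placeholder, not a recommendation, and an AM memory change would likely only apply to a newly started Tez session.

```sql
-- Assumed mapping of "Hive container size is set to 8 GB" (value in MB).
set hive.tez.container.size=8192;

-- Run 1: gets stuck in map initialization and fails with OOM after ~20 minutes.
set hive.exec.orc.split.strategy=HYBRID;
select distinct vehicle_no
from rmd.gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3
where incident_dt = '2999-01-01';

-- Run 2: completes without issues.
set hive.exec.orc.split.strategy=BI;
select distinct vehicle_no
from rmd.gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3
where incident_dt = '2999-01-01';

-- Assumption, not confirmed for this cluster: if splits are computed in the
-- Tez AM, these would be the settings to inspect. 4096 is a placeholder.
-- set hive.compute.splits.in.am=true;
-- set tez.am.resource.memory.mb=4096;
```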