Thanks Rajesh, this helped solve the OOM issue. I went through the wiki
documentation for this parameter but could not understand it clearly. Can
you please explain its significance?

I have two questions:
1. This parameter is set to false by default. Should it be set to true?
2. The number of mappers generated with this parameter set to true is lower
than the number generated with hive.exec.orc.split.strategy=BI. So I am
hoping that using this parameter along with HYBRID is better than using the
BI split strategy. Can you please comment on this?

Thanks,
Jayadeep

On Wed, Sep 13, 2017 at 3:14 PM, Rajesh Balamohan <rbalamo...@apache.org>
wrote:

> With "HYBRID", can you try setting "hive.orc.cache.use.soft.references=true"?
> That should help prevent OOM with the HYBRID strategy.
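>
> A minimal session-level sketch of that suggestion, assuming the settings
> are applied with SET in Beeline or the Hive CLI before the query runs
> (they could also be placed in hive-site.xml to apply cluster-wide):
>
> ```sql
> -- Keep the HYBRID split strategy that was running out of memory.
> SET hive.exec.orc.split.strategy=HYBRID;
> -- Let the ORC footer cache hold its entries through soft references, so
> -- the JVM can reclaim them under memory pressure (can help avoid OOM).
> SET hive.orc.cache.use.soft.references=true;
>
> select distinct vehicle_no
> from rmd.gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3
> where incident_dt = '2999-01-01';
> ```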
>
> ~Rajesh.B
>
> On Wed, Sep 13, 2017 at 2:54 PM, Jay <jayadeep.jayara...@gmail.com> wrote:
>
>> Hi All,
>>
>> I am running the following simple select query:
>>
>> select distinct vehicle_no from 
>> rmd.gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3
>> where incident_dt = '2999-01-01';
>>
>> The table is partitioned on two levels, as shown below:
>>
>> drwx------   - gpadmin hdfs          0 2017-09-12 14:36 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2010-01-01
>> drwx------   - gpadmin hdfs          0 2017-09-12 14:36 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2011-01-01
>> drwx------   - gpadmin hdfs          0 2017-09-12 14:35 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2012-01-01
>> drwx------   - gpadmin hdfs          0 2017-09-12 14:36 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2013-01-01
>> drwx------   - gpadmin hdfs          0 2017-09-12 14:36 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-01-01
>> drwx------   - gpadmin hdfs          0 2017-09-12 14:36 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-02-01
>> drwx------   - gpadmin hdfs          0 2017-09-12 14:36 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-03-01
>> drwx------   - gpadmin hdfs          0 2017-09-12 14:36 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-04-01
>> drwx------   - gpadmin hdfs          0 2017-09-12 14:34 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-05-01
>> drwx------   - gpadmin hdfs          0 2017-09-12 14:33 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-06-01
>> drwx------   - gpadmin hdfs          0 2017-09-12 14:33 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-07-01
>>
>>
>> The ORC files were written at roughly 2 GB each and use ZLIB
>> compression.
>>
>> When hive.exec.orc.split.strategy is set to HYBRID on our HDP 2.6.1
>> cluster, the MAP phase gets stuck in the INITIALIZATION phase and, after
>> about 20 minutes, fails with an OOM error.
>>
>> When I change hive.exec.orc.split.strategy to BI, the query runs
>> without any issues.
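>>
>> For comparison, a sketch of the two session settings described above,
>> assuming they are switched with SET in the same Hive session. The comments
>> paraphrase how the strategies are generally described; they were not
>> measured on this cluster.
>>
>> ```sql
>> -- Use one of the two settings per run; the later SET wins.
>>
>> -- HYBRID: Hive chooses between ETL (reads ORC file footers during split
>> -- generation) and BI (does not) using file-count/file-size heuristics.
>> -- This is the setting that stalled in INITIALIZATION and hit OOM.
>> SET hive.exec.orc.split.strategy=HYBRID;
>>
>> -- BI: splits are generated without reading or caching file footers.
>> -- This is the setting that ran without issues.
>> SET hive.exec.orc.split.strategy=BI;
>> ```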
>>
>> My question is: which parameter controls the memory assigned while
>> Hive/Tez generates the splits?
>>
>> The Hive container size is set to 8 GB.
>>
>> Thanks,
>> Jayadeep
>>
>
>
