Do you use the HiveContext in Spark? Do you configure the same options there? Can you share some code?
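For reference, the workaround most often suggested for this symptom is to stop Spark from using its native ORC reader for metastore tables and to forward the recursion flags to the underlying Hadoop/Hive configuration. A hedged sketch only: the property names below are standard Spark/Hadoop settings, but whether this resolves the count = 0 behavior varies by Spark version, so treat it as something to try, not a confirmed fix.

```shell
# Sketch: fall back to the Hive SerDe path for ORC tables and pass the
# recursion flags through to the Hadoop configuration via Spark's
# spark.hadoop.* passthrough prefix. "your_job.py" is a placeholder.
spark-submit \
  --conf spark.sql.hive.convertMetastoreOrc=false \
  --conf spark.hadoop.mapred.input.dir.recursive=true \
  --conf spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive=true \
  --conf spark.hadoop.hive.mapred.supports.subdirectories=true \
  your_job.py
```

The same properties can also be set in spark-defaults.conf; settings that affect how a metastore table is resolved generally need to be in place before the session first reads the table.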
> On 07.08.2019 at 08:50, Rishikesh Gawade <rishikeshg1...@gmail.com> wrote:
>
> Hi.
> I am using Spark 2.3.2 and Hive 3.1.0.
> Even if I use Parquet files the result would be the same, because after all
> Spark SQL isn't able to descend into the subdirectories over which the table
> is created. Could there be any other way?
> Thanks,
> Rishikesh
>
>> On Tue, Aug 6, 2019, 1:03 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>> Which versions of Spark and Hive are you using?
>>
>> What will happen if you use Parquet tables instead?
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>> http://talebzadehmich.wordpress.com
>>
>> Disclaimer: Use it at your own risk. Any and all responsibility for any
>> loss, damage or destruction of data or any other property which may arise
>> from relying on this email's technical content is explicitly disclaimed.
>> The author will in no case be liable for any monetary damages arising from
>> such loss, damage or destruction.
>>
>>> On Tue, 6 Aug 2019 at 07:58, Rishikesh Gawade <rishikeshg1...@gmail.com> wrote:
>>>
>>> Hi.
>>> I have built a Hive external table on top of a directory 'A' which has
>>> data stored in ORC format. This directory has several subdirectories,
>>> each of which contains the actual ORC files. These subdirectories are
>>> created by Spark jobs that ingest data from other sources and write it
>>> into this directory.
>>> I created the table and set its table properties
>>> hive.mapred.supports.subdirectories=TRUE and
>>> mapred.input.dir.recursive=TRUE.
>>> As a result, when I fire the simplest query, select count(*) from
>>> ExtTable, via the Hive CLI, it successfully gives me the expected count
>>> of records in the table.
>>> However, when I fire the same query via Spark SQL, I get count = 0.
>>>
>>> I think Spark SQL isn't able to descend into the subdirectories to get
>>> the data, while Hive is able to do so.
>>> Are there any configurations that need to be set on the Spark side so
>>> that this works as it does via the Hive CLI?
>>> I am using Spark on YARN.
>>>
>>> Thanks,
>>> Rishikesh
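The listing difference described above can be reproduced without Spark at all. A minimal, hypothetical Python sketch (directory and file names invented) showing why a reader that lists only the table's root directory sees zero data files when every file lives one level down:

```python
import tempfile
from pathlib import Path

# Build a layout like the one described: table root 'A' contains only
# subdirectories; the ORC files live inside those subdirectories.
root = Path(tempfile.mkdtemp()) / "A"
for sub in ("batch_1", "batch_2"):
    (root / sub).mkdir(parents=True)
    (root / sub / "part-0000.orc").write_bytes(b"ORC")

# Non-recursive listing: what a reader that stops at the top level sees.
shallow = list(root.glob("*.orc"))

# Recursive listing: what Hive does once the recursion flags are set.
deep = list(root.rglob("*.orc"))

print(len(shallow), len(deep))  # 0 files at the top level, 2 overall
```

This is why the Hive CLI (with recursion enabled) returns the expected count while a reader that only scans the root directory returns 0.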