Do you use the HiveContext in Spark? Do you configure the same options there? Can you share some code?
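For reference, the workaround most often suggested for this symptom is to stop Spark from using its native ORC reader for metastore tables and to forward the recursion flags to the underlying Hadoop/Hive configuration. A hedged sketch only: the property names below are standard Spark/Hadoop settings, but whether this resolves the count = 0 behavior varies by Spark version, so treat it as something to try, not a confirmed fix.

```shell
# Sketch: fall back to the Hive SerDe path for ORC tables and pass the
# recursion flags through to the Hadoop configuration via Spark's
# spark.hadoop.* passthrough prefix. "your_job.py" is a placeholder.
spark-submit \
  --conf spark.sql.hive.convertMetastoreOrc=false \
  --conf spark.hadoop.mapred.input.dir.recursive=true \
  --conf spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive=true \
  --conf spark.hadoop.hive.mapred.supports.subdirectories=true \
  your_job.py
```

The same properties can also be set in spark-defaults.conf; settings that affect how a metastore table is resolved generally need to be in place before the session first reads the table.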
> On 07.08.2019 at 08:50, Rishikesh Gawade <rishikeshg1...@gmail.com> wrote:
>
> Hi.
> I am using Spark 2.3.2 and Hive 3.1.0.
> Even if I use Parquet files the result would be the same, because after all
> Spark SQL isn't able to descend into the subdirectories over which the table
> is created. Could there be any other way?
> Thanks,
> Rishikesh
>
>> On Tue, Aug 6, 2019, 1:03 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>> Which versions of Spark and Hive are you using?
>>
>> What will happen if you use Parquet tables instead?
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>> http://talebzadehmich.wordpress.com
>>
>> Disclaimer: Use it at your own risk. Any and all responsibility for any
>> loss, damage or destruction of data or any other property which may arise
>> from relying on this email's technical content is explicitly disclaimed.
>> The author will in no case be liable for any monetary damages arising from
>> such loss, damage or destruction.
>>
>>> On Tue, 6 Aug 2019 at 07:58, Rishikesh Gawade <rishikeshg1...@gmail.com> wrote:
>>>
>>> Hi.
>>> I have built a Hive external table on top of a directory 'A' which has
>>> data stored in ORC format. This directory has several subdirectories,
>>> each of which contains the actual ORC files. These subdirectories are
>>> created by Spark jobs that ingest data from other sources and write it
>>> into this directory.
>>> I created the table and set its table properties
>>> hive.mapred.supports.subdirectories=TRUE and
>>> mapred.input.dir.recursive=TRUE.
>>> As a result, when I fire the simplest query, select count(*) from
>>> ExtTable, via the Hive CLI, it successfully gives me the expected count
>>> of records in the table.
>>> However, when I fire the same query via Spark SQL, I get count = 0.
>>>
>>> I think Spark SQL isn't able to descend into the subdirectories to get
>>> the data, while Hive is able to do so.
>>> Are there any configurations that need to be set on the Spark side so
>>> that this works as it does via the Hive CLI?
>>> I am using Spark on YARN.
>>>
>>> Thanks,
>>> Rishikesh
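The listing difference described above can be reproduced without Spark at all. A minimal, hypothetical Python sketch (directory and file names invented) showing why a reader that lists only the table's root directory sees zero data files when every file lives one level down:

```python
import tempfile
from pathlib import Path

# Build a layout like the one described: table root 'A' contains only
# subdirectories; the ORC files live inside those subdirectories.
root = Path(tempfile.mkdtemp()) / "A"
for sub in ("batch_1", "batch_2"):
    (root / sub).mkdir(parents=True)
    (root / sub / "part-0000.orc").write_bytes(b"ORC")

# Non-recursive listing: what a reader that stops at the top level sees.
shallow = list(root.glob("*.orc"))

# Recursive listing: what Hive does once the recursion flags are set.
deep = list(root.rglob("*.orc"))

print(len(shallow), len(deep))  # 0 files at the top level, 2 overall
```

This is why the Hive CLI (with recursion enabled) returns the expected count while a reader that only scans the root directory returns 0.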