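What you can do is create the Hive table with a partition column, for example a date column, then repartition the DataFrame on that column with val finalDf = dataFrame.repartition(dataFrame.col("date-column")), register finalDf as the temp view, and later run INSERT OVERWRITE TABLE tablename PARTITION (date-column) SELECT * FROM tempview.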
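A minimal PySpark sketch of the steps above, assuming Spark 1.6 with a HiveContext named hc as in the quoted code. The helper names, the partition column dt, and the temp view name tempRawByPartition are hypothetical, and the target table is assumed to already exist with a PARTITIONED BY clause on that column. The two SET statements enable Hive dynamic partitioning, which an insert without a fixed partition value normally needs.

```python
def build_dynamic_partition_insert(table, partition_col, temp_view, data_cols):
    """Build an INSERT OVERWRITE statement for a Hive dynamic-partition insert.

    Hive expects the dynamic partition column to come last in the SELECT list,
    so the columns are listed explicitly instead of relying on SELECT *.
    """
    select_list = ", ".join(list(data_cols) + [partition_col])
    return (
        "INSERT OVERWRITE TABLE {t} PARTITION ({p}) "
        "SELECT {cols} FROM {v}"
    ).format(t=table, p=partition_col, cols=select_list, v=temp_view)


def insert_overwrite_partitioned(hc, df, table, partition_col,
                                 temp_view="tempRawByPartition"):
    """Repartition df on partition_col, then insert it through Hive SQL.

    hc is assumed to be a HiveContext and df a DataFrame (hypothetical helper).
    """
    # Allow Hive to derive partition values from the data itself.
    hc.sql("SET hive.exec.dynamic.partition=true")
    hc.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    # Rows with the same partition value are shuffled to the same task,
    # so each Hive partition should be written by a single task.
    final_df = df.repartition(partition_col)
    final_df.registerTempTable(temp_view)

    data_cols = [c for c in df.columns if c != partition_col]
    statement = build_dynamic_partition_insert(
        table, partition_col, temp_view, data_cols)
    hc.sql(statement)
    return statement
```

With the thread's union_df, a call might look like insert_overwrite_partitioned(hc, union_df, "blab.pyspark_dpprq_by_dt", "dt"), where both the table name and the dt column are placeholders for whatever partitioned table you create.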
That should work as expected.

On 11-Aug-2017 11:03 PM, "KhajaAsmath Mohammed" <mdkhajaasm...@gmail.com> wrote:

> we had spark.sql.partitions as 4 but in hdfs it is ending up with 200
> files and 4 files are actually having data and rest of them are having zero
> bytes.
>
> My only requirement is to run fast for hive insert overwrite query from
> spark temporary table and end up having less files instead of more files
> with zero bytes.
>
> I am using spark sql query of hive insert overwrite not the write method on
> dataframe as it is not supported in 1.6 version of spark for kerberos
> cluster.
>
>
> On Fri, Aug 11, 2017 at 12:23 PM, Lukas Bradley <lukasbrad...@gmail.com>
> wrote:
>
>> Please show the write() call, and the results in HDFS. What are all the
>> files you see?
>>
>> On Fri, Aug 11, 2017 at 1:10 PM, KhajaAsmath Mohammed <
>> mdkhajaasm...@gmail.com> wrote:
>>
>>> tempTable = union_df.registerTempTable("tempRaw")
>>>
>>> create = hc.sql('CREATE TABLE IF NOT EXISTS blab.pyspark_dpprq (vin
>>> string, utctime timestamp, description string, descriptionuom string,
>>> providerdesc string, dt_map string, islocation string, latitude double,
>>> longitude double, speed double, value string)')
>>>
>>> insert = hc.sql('INSERT OVERWRITE TABLE blab.pyspark_dpprq SELECT * FROM
>>> tempRaw')
>>>
>>>
>>> On Fri, Aug 11, 2017 at 11:00 AM, Daniel van der Ende <
>>> daniel.vandere...@gmail.com> wrote:
>>>
>>>> Hi Asmath,
>>>>
>>>> Could you share the code you're running?
>>>>
>>>> Daniel
>>>>
>>>> On Fri, 11 Aug 2017, 17:53 KhajaAsmath Mohammed, <
>>>> mdkhajaasm...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am using spark sql to write data back to hdfs and it is resulting in
>>>>> multiple output files.
>>>>>
>>>>> I tried changing number spark.sql.shuffle.partitions=1 but it
>>>>> resulted in very slow performance.
>>>>>
>>>>> Also tried coalesce and repartition still the same issue. any
>>>>> suggestions?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Asmath