Please show the write() call, and the results in HDFS.  What are all the
files you see?
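
For example, a direct DataFrame write usually looks something like the
sketch below (the variable name, path and format are placeholders, not
taken from your job):

    # hypothetical write call; union_df and the output path are placeholders
    union_df.write.mode("overwrite").parquet("/tmp/pyspark_dpprq_out")

A listing of that directory (hdfs dfs -ls /tmp/pyspark_dpprq_out) would
typically show one part-* file per partition of the final stage, plus a
_SUCCESS marker.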

On Fri, Aug 11, 2017 at 1:10 PM, KhajaAsmath Mohammed <
mdkhajaasm...@gmail.com> wrote:

> tempTable = union_df.registerTempTable("tempRaw")
>
> create = hc.sql('CREATE TABLE IF NOT EXISTS blab.pyspark_dpprq (vin string,
> utctime timestamp, description string, descriptionuom string,
> providerdesc string, dt_map string, islocation string, latitude double,
> longitude double, speed double, value string)')
>
> insert = hc.sql('INSERT OVERWRITE TABLE blab.pyspark_dpprq SELECT * FROM
> tempRaw')
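>
> For reference, the write to HDFS here happens through the INSERT OVERWRITE
> above. Roughly the same thing expressed with the DataFrame API would be
> something like the sketch below (illustrative rather than the exact code
> that ran; it assumes union_df is a plain DataFrame and the table already
> exists):
>
>     # sketch: DataFrameWriter equivalent of the INSERT OVERWRITE above
>     union_df.write.insertInto('blab.pyspark_dpprq', overwrite=True)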
>
>
>
>
> On Fri, Aug 11, 2017 at 11:00 AM, Daniel van der Ende <
> daniel.vandere...@gmail.com> wrote:
>
>> Hi Asmath,
>>
>> Could you share the code you're running?
>>
>> Daniel
>>
>> On Fri, 11 Aug 2017, 17:53 KhajaAsmath Mohammed, <mdkhajaasm...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>>
>>>
>>> I am using Spark SQL to write data back to HDFS, and it is resulting in
>>> multiple output files.
>>>
>>>
>>>
>>> I tried setting spark.sql.shuffle.partitions=1, but it resulted in very
>>> slow performance.
>>>
>>>
>>>
>>> I also tried coalesce and repartition, but I still see the same issue; a
>>> rough sketch of these attempts is below. Any suggestions?
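>>>
>>> Roughly (df and hc stand in for my DataFrame and HiveContext; this is
>>> illustrative rather than the exact code):
>>>
>>>     # force SQL shuffles down to a single partition (this made the job very slow)
>>>     hc.setConf('spark.sql.shuffle.partitions', '1')
>>>
>>>     # or reduce the number of partitions on the DataFrame before writing
>>>     df = df.coalesce(1)      # narrow dependency, avoids a full shuffle
>>>     df = df.repartition(1)   # full shuffle into a single partition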
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Asmath
>>>
>>
>
