In my case I am just writing the DataFrame back to Hive, so when is the
best point to repartition it? I did repartition before calling insert
overwrite on the table.
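
For reference, a minimal PySpark-style sketch of the arrangement discussed
in this thread: the repartition is the last step before the write, so its
argument, rather than spark.sql.shuffle.partitions, decides how many tasks
write files. The helper names target_partitions and write_compacted and the
128 MB target file size are assumptions for illustration, not something
from the thread; df is assumed to be a Spark DataFrame whose columns
already match the target Hive table.

```python
import math


def target_partitions(total_bytes, target_file_bytes=128 * 1024 * 1024):
    """Estimate how many output files give roughly target_file_bytes each.

    The 128 MB default is an assumed target, not a Spark setting.
    """
    return max(1, math.ceil(total_bytes / target_file_bytes))


def write_compacted(df, table_name, num_files):
    """Repartition as the final step, then insert-overwrite into the table.

    Upstream joins/aggregations still shuffle with
    spark.sql.shuffle.partitions; only the final write stage runs with
    num_files tasks, so each Hive partition receives at most num_files files.
    """
    (df.repartition(num_files)
       .write
       .insertInto(table_name, overwrite=True))
```

With the figures from the original post (2000 files of about 10 MB per
partition), target_partitions(2000 * 10 * 1024 * 1024) gives 157, so
write_compacted(df, "mydb.mytable", 157) would be one thing to try
("mydb.mytable" is a placeholder name). repartition adds one extra shuffle;
coalesce avoids it but can also reduce the parallelism of the stage feeding
it, so which one keeps the same performance would need measuring.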

On Tue, Oct 17, 2017 at 3:07 PM, Sebastian Piu <sebastian....@gmail.com>
wrote:

> You have to repartition/coalesce *after* the operation that is causing
> the shuffle, as that shuffle will use the value you've set in
> spark.sql.shuffle.partitions.
>
> On Tue, Oct 17, 2017 at 8:40 PM KhajaAsmath Mohammed <
> mdkhajaasm...@gmail.com> wrote:
>
>> Yes, I still see a large number of part files, exactly the number I
>> set in spark.sql.shuffle.partitions.
>>
>> On Oct 17, 2017, at 2:32 PM, Michael Artz <michaelea...@gmail.com> wrote:
>>
>> Have you tried caching it and using a coalesce?
>>
>>
>>
>> On Oct 17, 2017 1:47 PM, "KhajaAsmath Mohammed" <mdkhajaasm...@gmail.com>
>> wrote:
>>
>>> I tried repartition, but spark.sql.shuffle.partitions is taking
>>> precedence over repartition or coalesce. How can I get fewer files
>>> with the same performance?
>>>
>>> On Fri, Oct 13, 2017 at 3:45 AM, Tushar Adeshara <
>>> tushar_adesh...@persistent.com> wrote:
>>>
>>>> You can also try coalesce, as it avoids a full shuffle.
>>>>
>>>>
>>>> Regards,
>>>>
>>>> *Tushar Adeshara*
>>>> ------------------------------
>>>> *From:* KhajaAsmath Mohammed <mdkhajaasm...@gmail.com>
>>>> *Sent:* 13 October 2017 09:35
>>>> *To:* user @spark
>>>> *Subject:* Spark - Partitions
>>>>
>>>> Hi,
>>>>
>>>> I am reading data with a Hive query and writing it back into Hive
>>>> after doing some transformations.
>>>>
>>>> I have changed the setting spark.sql.shuffle.partitions to 2000, and
>>>> since then the job completes fast, but the main problem is that I am
>>>> getting 2000 files for each partition, each about 10 MB in size.
>>>>
>>>> Is there a way to get the same performance but write fewer files?
>>>>
>>>> I am trying repartition now but would like to know if there are any
>>>> other options.
>>>>
>>>> Thanks,
>>>> Asmath
>>>>
>>>
>>>
