Hi Pradeep,
Here is a way to partition your data into different files by calling
repartition() on the DataFrame:

df.repartition(12, $"Month")
  .write
  .format(...)
This assumes you want to partition by a "Month" column that has 12
distinct values. Each partition will be stored in a separate file.
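To make the idea concrete outside of Spark, here is a plain-Python sketch (not Spark code) of the layout being discussed: one directory per distinct value of a column, in the Hive-style "col=value" naming that Spark's partitioned writes use, each holding a tab-delimited part file. The paths, file names, and sample rows are made up for illustration.

```python
import csv
import os
import tempfile
from collections import defaultdict

# Made-up sample rows: (month, amount)
rows = [
    ("Jan", "10"),
    ("Feb", "20"),
    ("Jan", "30"),
]

out = tempfile.mkdtemp()

# Group rows by the partition column value
groups = defaultdict(list)
for month, amount in rows:
    groups[month].append((month, amount))

# Write one "Month=<value>" directory per distinct value
for month, group in groups.items():
    part_dir = os.path.join(out, f"Month={month}")
    os.makedirs(part_dir)
    with open(os.path.join(part_dir, "part-00000.tsv"), "w", newline="") as f:
        csv.writer(f, delimiter="\t").writerows(group)

print(sorted(os.listdir(out)))  # → ['Month=Feb', 'Month=Jan']
```

In Spark itself, this directory structure is what DataFrameWriter.partitionBy produces on sources that support it.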
Hi,
I don't want to reduce the number of partitions; I need files written
depending on the column value. I'm trying to understand how reducing the
partition count would make this work.
Regards,
Pradeep
> On May 9, 2016, at 6:42 PM, Gourav Sengupta wrote:
>
> Hi,
>
> its supported, try to use coalesce(1) (the spelling is wrong) and after
> that do the partitions.
Hi,
it's supported. Try to use coalesce(1) (the spelling may be wrong) and
after that do the partitioning.
Regards,
Gourav
On Mon, May 9, 2016 at 7:12 PM, Mail.com wrote:
> Hi,
>
> I have to write tab delimited file and need to have one directory for each
> unique value of a column.
>
> I tried using
Hi,
I have to write tab-delimited files and need one directory for each
unique value of a column.
I tried using spark-csv with partitionBy, and it seems it is not
supported. Is there any other option for doing this?
Regards,
Pradeep
Hello,
I want to save Spark job result as LZO compressed CSV files partitioned by
one or more columns.
Given that partitionBy is not supported by spark-csv, is there any
recommendation for achieving this in user code?
One quick option is to
i) cache the result dataframe
ii) get the unique partition column values
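The sketch below works through that plan in plain Python (not Spark; all names and paths are illustrative): materialize the result, collect the distinct partition values, then filter and write one compressed tab-delimited file per value under its own directory. gzip stands in for LZO here, since LZO needs a third-party codec; in Spark the equivalent step per value would be a filter on the cached DataFrame followed by a write.

```python
import csv
import gzip
import os
import tempfile

# Made-up result rows: (month, payload)
rows = [
    ("2016-01", "x"),
    ("2016-02", "y"),
    ("2016-01", "z"),
]

out_dir = tempfile.mkdtemp()

# i) cache the result: in this sketch the rows are already in memory
# ii) get the unique values of the partition column
unique_values = sorted({month for month, _ in rows})

# iii) filter and write each subset under its own directory,
#      compressed (gzip as a stand-in for LZO)
for v in unique_values:
    part_dir = os.path.join(out_dir, f"month={v}")
    os.makedirs(part_dir)
    with gzip.open(os.path.join(part_dir, "part-0.tsv.gz"), "wt",
                   newline="") as f:
        csv.writer(f, delimiter="\t").writerows(
            row for row in rows if row[0] == v)

print(unique_values)  # → ['2016-01', '2016-02']
```

The per-value filter scans the cached data once per distinct value, which is why caching the result first matters: without it, each write would recompute the whole upstream job.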