Hi Pradeep,
Here is a way to partition your data into different files by calling
repartition() on the DataFrame:

df.repartition(12, $"Month")
  .write
  .format(...)
This assumes you want to partition by a "Month" column that has 12
distinct values. Each partition will be stored in a separate file.
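To make the idea concrete outside of Spark, here is a plain-Python sketch (not Spark code) of the layout being discussed: one directory per distinct value of a column, in the Hive-style "col=value" naming that Spark's partitioned writes use, each holding a tab-delimited part file. The paths, file names, and sample rows are made up for illustration.

```python
import csv
import os
import tempfile
from collections import defaultdict

# Made-up sample rows: (month, amount)
rows = [
    ("Jan", "10"),
    ("Feb", "20"),
    ("Jan", "30"),
]

out = tempfile.mkdtemp()

# Group rows by the partition column value
groups = defaultdict(list)
for month, amount in rows:
    groups[month].append((month, amount))

# Write one "Month=<value>" directory per distinct value
for month, group in groups.items():
    part_dir = os.path.join(out, f"Month={month}")
    os.makedirs(part_dir)
    with open(os.path.join(part_dir, "part-00000.tsv"), "w", newline="") as f:
        csv.writer(f, delimiter="\t").writerows(group)

print(sorted(os.listdir(out)))  # → ['Month=Feb', 'Month=Jan']
```

In Spark itself, this directory structure is what DataFrameWriter.partitionBy produces on sources that support it.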
Hi,
I don't want to reduce the number of partitions; I need files written
depending on the column value. I'm trying to understand how reducing the
partition count would make this work.
Regards,
Pradeep
> On May 9, 2016, at 6:42 PM, Gourav Sengupta wrote:
>
> Hi,
>
> its supported, try to use coalesce(1) (the spelling is wrong) and after
> that do the partitions.
Hi,
it's supported. Try to use coalesce(1) (the spelling may be wrong) and
after that do the partitioning.
Regards,
Gourav
On Mon, May 9, 2016 at 7:12 PM, Mail.com wrote:
> Hi,
>
> I have to write tab delimited file and need to have one directory for each
> unique value of a column.
>
> I tried using
Hi,
I have to write tab-delimited files and need one directory for each
unique value of a column.
I tried using spark-csv with partitionBy, and it seems it is not
supported. Is there any other option for doing this?
Regards,
Pradeep
Hello,
I want to save Spark job result as LZO compressed CSV files partitioned by
one or more columns.
Given that partitionBy is not supported by spark-csv, is there any
recommendation for achieving this in user code?
One quick option is to
i) cache the result dataframe
ii) get the unique partition column values
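The sketch below works through that plan in plain Python (not Spark; all names and paths are illustrative): materialize the result, collect the distinct partition values, then filter and write one compressed tab-delimited file per value under its own directory. gzip stands in for LZO here, since LZO needs a third-party codec; in Spark the equivalent step per value would be a filter on the cached DataFrame followed by a write.

```python
import csv
import gzip
import os
import tempfile

# Made-up result rows: (month, payload)
rows = [
    ("2016-01", "x"),
    ("2016-02", "y"),
    ("2016-01", "z"),
]

out_dir = tempfile.mkdtemp()

# i) cache the result: in this sketch the rows are already in memory
# ii) get the unique values of the partition column
unique_values = sorted({month for month, _ in rows})

# iii) filter and write each subset under its own directory,
#      compressed (gzip as a stand-in for LZO)
for v in unique_values:
    part_dir = os.path.join(out_dir, f"month={v}")
    os.makedirs(part_dir)
    with gzip.open(os.path.join(part_dir, "part-0.tsv.gz"), "wt",
                   newline="") as f:
        csv.writer(f, delimiter="\t").writerows(
            row for row in rows if row[0] == v)

print(unique_values)  # → ['2016-01', '2016-02']
```

The per-value filter scans the cached data once per distinct value, which is why caching the result first matters: without it, each write would recompute the whole upstream job.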