Thanks a lot!

2018-03-18 9:30 GMT+01:00 Denis Bolshakov <bolshakov.de...@gmail.com>:
> Please check out
>
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand
>
> and
>
> org.apache.spark.sql.execution.datasources.WriteRelation
>
> I guess it's managed by
>
> job.getConfiguration.set(DATASOURCE_WRITEJOBUUID, uniqueWriteJobId.toString)
>
> On 17 March 2018 at 20:46, Serega Sheypak <serega.shey...@gmail.com> wrote:
>
>> Hi Denis, great to see you here :)
>> It works, thanks!
>>
>> Do you know how Spark generates data file names? Names look like
>> part-00000 with a UUID appended after:
>>
>> part-00000-124a8c43-83b9-44e1-a9c4-dcc8676cdb99.c000.snappy.parquet
>>
>> 2018-03-17 14:15 GMT+01:00 Denis Bolshakov <bolshakov.de...@gmail.com>:
>>
>>> Hello Serega,
>>>
>>> https://spark.apache.org/docs/latest/sql-programming-guide.html
>>>
>>> Please try the SaveMode.Append option. Does it work for you?
>>>
>>> Sat, 17 Mar 2018, 15:19 Serega Sheypak <serega.shey...@gmail.com>:
>>>
>>>> Hi, I'm using spark-sql to process my data and store the result as parquet
>>>> partitioned by several columns:
>>>>
>>>> ds.write
>>>>   .partitionBy("year", "month", "day", "hour", "workflowId")
>>>>   .parquet("/here/is/my/dir")
>>>>
>>>> I want to run more jobs that will produce new partitions or add more
>>>> files to existing partitions.
>>>> What is the right way to do it?
>
> --
> //with Best Regards
> --Denis Bolshakov
> e-mail: bolshakov.de...@gmail.com
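
For anyone landing on this thread later: a minimal sketch of the append pattern Denis suggested, assuming an existing Dataset named ds with the columns from the original question (the output path is just the placeholder from the question):

    import org.apache.spark.sql.SaveMode

    // Each job appends: new partition directories are created as needed,
    // and extra part files are added to partitions that already exist,
    // instead of failing on or overwriting the existing output path.
    ds.write
      .partitionBy("year", "month", "day", "hour", "workflowId")
      .mode(SaveMode.Append)
      .parquet("/here/is/my/dir")

    // Each write job carries its own UUID (the DATASOURCE_WRITEJOBUUID
    // mentioned above), which is why output files get names like
    // part-00000-124a8c43-83b9-44e1-a9c4-dcc8676cdb99.c000.snappy.parquet
    // and do not collide across jobs appending to the same partitions.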