Re: Question about SaveMode.Ignore behaviour

2019-05-10 Thread Juho Autio
at 4:28 PM Juho Autio wrote:
> Does spark handle 'ignore' mode on file level or partition level?
>
> My code is like this:
>
> df.write \
>     .option('mapreduce.fileoutputcommitter.algorithm.version', '2') \
>     .mode('ignore') \
>     .partit

Question about SaveMode.Ignore behaviour

2019-05-09 Thread Juho Autio
Does spark handle 'ignore' mode on file level or partition level?

My code is like this:

    df.write \
        .option('mapreduce.fileoutputcommitter.algorithm.version', '2') \
        .mode('ignore') \
        .partitionBy('p') \
        .orc(target_path)

When I used mode('append') my job
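As an aside to the question above, here is a minimal sketch (not from the thread) of how one might inspect which partition directories already exist under a target path before deciding what to write. It assumes a local filesystem and Hive-style `p=<value>` directory names, as partitionBy('p') produces; the helper name existing_partition_values is hypothetical. It does not answer how mode('ignore') itself behaves, it only shows one way to look at what is already on disk.

```python
from pathlib import Path


def existing_partition_values(target_path, column="p"):
    """Return the values that already have a `column=value` directory.

    Hypothetical helper: assumes a local path and Hive-style partition
    directory names. Values are returned exactly as they appear in the
    directory name (no unescaping is attempted).
    """
    root = Path(target_path)
    if not root.is_dir():
        # Nothing has been written to this path yet.
        return set()
    prefix = column + "="
    return {
        child.name[len(prefix):]
        for child in root.iterdir()
        # Skip plain files such as _SUCCESS and directories of other columns.
        if child.is_dir() and child.name.startswith(prefix)
    }
```

The returned set could then be used to drop already-written partitions before an append, for example with df.filter(~df['p'].isin(list(values))). That workflow is an assumption for illustration, not something the thread prescribes, and on HDFS or S3 the directory listing would need the corresponding filesystem API instead of pathlib.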

Re: [Spark SQL]: Slow insertInto overwrite if target table has many partitions

2019-04-25 Thread Juho Autio
l challenge
>
> On Thu, Apr 25, 2019 at 15:10, Juho Autio wrote:
>
>> Ok, I've verified that hive> SHOW PARTITIONS is using get_partition_names,
>> which is always quite fast. Spark's insertInto uses
>> get_partitions_with_auth which is much slower (it als

Re: [Spark SQL]: Slow insertInto overwrite if target table has many partitions

2019-04-25 Thread Juho Autio
uld.

On Thu, Apr 25, 2019 at 10:12 AM vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
> Which metastore are you using?
>
> On Thu, Apr 25, 2019 at 09:02, Juho Autio wrote:
>
>> Would anyone be able to answer this question about the non-optimal
>> implement

Re: [Spark SQL]: Slow insertInto overwrite if target table has many partitions

2019-04-25 Thread Juho Autio
Would anyone be able to answer this question about the non-optimal implementation of insertInto?

On Thu, Apr 18, 2019 at 4:45 PM Juho Autio wrote:
> Hi,
>
> My job is writing ~10 partitions with insertInto. With the same input /
> output data the total duration of the job is ve

[Spark SQL]: Slow insertInto overwrite if target table has many partitions

2019-04-18 Thread Juho Autio
Hi,

My job is writing ~10 partitions with insertInto. With the same input / output data the total duration of the job is very different depending on how many partitions the target table has.

Target table with 10 of partitions: 1 min 30 s
Target table with ~1 partitions: 13 min 0 s

It seems