RE: [Spark SQL]: Slow insertInto overwrite if target table has many partitions

2019-04-26 Thread van den Heever, Christian CC
Hi, how do I get the filename from textFileStream using streaming? Thanks a mill.
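The DStream textFileStream API does not expose the source filename directly. One common workaround is to switch to the Structured Streaming file source, where the built-in input_file_name() function surfaces it per row. A minimal sketch (the input path is hypothetical):

```python
# Sketch: attach the originating filename to each streamed row.
# Assumes the Structured Streaming text file source; the path is hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import input_file_name

spark = SparkSession.builder.appName("filename-demo").getOrCreate()

lines = (spark.readStream
         .text("hdfs:///incoming/")                 # hypothetical input directory
         .withColumn("filename", input_file_name()))  # source file per row
```

Note that input_file_name() works the same way for batch reads (spark.read.text), so the pattern is easy to test outside streaming first.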

Re: [Spark SQL]: Slow insertInto overwrite if target table has many partitions

2019-04-25 Thread Juho Autio
> Not sure if the dynamic overwrite logic is implemented in Spark or in Hive AFAIK I'm using the Spark implementation(s). Does the thread dump that I posted show that? I'd like to remain within the Spark impl. What I'm trying to ask is: do you Spark developers see some ways to optimize this? Otherwise,

Re: [Spark SQL]: Slow insertInto overwrite if target table has many partitions

2019-04-25 Thread vincent gromakowski
There is probably a limit on the number of elements you can pass in the list of partitions for the listPartitionsWithAuthInfo API call. Not sure if the dynamic overwrite logic is implemented in Spark or in Hive; in the latter case, using Hive 1.2.1 is probably the reason for the un-optimized logic, but also
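For reference, the Spark-side dynamic partition overwrite path is controlled by a session config introduced in Spark 2.3 (a sketch, not taken from the thread; whether it helps here depends on the table type):

```
# spark-defaults.conf (Spark >= 2.3): overwrite only the partitions
# present in the incoming data instead of the whole target table
spark.sql.sources.partitionOverwriteMode  dynamic
```

The same setting can also be applied per session via SET or spark.conf.set.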

Re: [Spark SQL]: Slow insertInto overwrite if target table has many partitions

2019-04-25 Thread Juho Autio
Ok, I've verified that hive> SHOW PARTITIONS is using get_partition_names, which is always quite fast. Spark's insertInto uses get_partitions_with_auth, which is much slower (it also gets the location etc. of each partition). I created a test in Java with a local metastore client to measure the
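The kind of measurement described above can be sketched against a metastore directly. This is only an illustration using the third-party `hmsclient` Thrift wrapper (pip install hmsclient); the host, port, database, and table names are hypothetical, and the method names follow the Hive Thrift IDL:

```python
# Hypothetical sketch: compare the cheap name-only listing against the
# full listing that also fetches location, auth info, etc. per partition.
import time
from hmsclient import hmsclient

client = hmsclient.HMSClient(host="localhost", port=9083)  # hypothetical metastore
with client as c:
    t0 = time.time()
    names = c.get_partition_names("mydb", "mytable", -1)   # what SHOW PARTITIONS uses
    t1 = time.time()
    parts = c.get_partitions_with_auth("mydb", "mytable", -1, "hive", [])
    t2 = time.time()

print(f"get_partition_names:      {t1 - t0:.2f}s for {len(names)} partitions")
print(f"get_partitions_with_auth: {t2 - t1:.2f}s (full Partition objects)")
```

The gap between the two timings grows with the partition count, since the second call materializes a full Partition object per entry.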

Re: [Spark SQL]: Slow insertInto overwrite if target table has many partitions

2019-04-25 Thread Khare, Ankit
Why do you need 1 partition when 10 partitions are doing the job? Thanks Ankit From: vincent gromakowski Date: Thursday, 25. April 2019 at 09:12 To: Juho Autio Cc: user Subject: Re: [Spark SQL]: Slow insertInto overwrite if target table has many partitions Which metastore are you

Re: [Spark SQL]: Slow insertInto overwrite if target table has many partitions

2019-04-25 Thread vincent gromakowski
Which metastore are you using? On Thu, 25 Apr 2019 at 09:02, Juho Autio wrote: > Would anyone be able to answer this question about the non-optimal > implementation of insertInto? > > On Thu, Apr 18, 2019 at 4:45 PM Juho Autio wrote: > >> Hi, >> >> My job is writing ~10 partitions with

Re: [Spark SQL]: Slow insertInto overwrite if target table has many partitions

2019-04-25 Thread Juho Autio
Would anyone be able to answer this question about the non-optimal implementation of insertInto? On Thu, Apr 18, 2019 at 4:45 PM Juho Autio wrote: > Hi, > > My job is writing ~10 partitions with insertInto. With the same input / > output data the total duration of the job is very different
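The write pattern being discussed looks roughly like this (a sketch; the table and source names are hypothetical, and the dynamic-overwrite config is only honored on Spark 2.3+):

```python
# Hypothetical reconstruction of the job described in the thread.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Overwrite only the partitions present in df, not the whole table (Spark >= 2.3)
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

df = spark.table("staging.events")  # hypothetical source, ~10 partitions of data

# Before writing, Spark lists the target table's existing partitions via the
# metastore; with many partitions this listing dominates the job's runtime,
# which is the slowdown this thread is about.
df.write.insertInto("warehouse.events", overwrite=True)
```

Note that insertInto resolves columns by position, not by name, so the source schema must match the target table's column order.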