at 4:28 PM Juho Autio wrote:
Does Spark handle 'ignore' mode at the file level or at the partition level?

My code is like this:

df.write \
    .option('mapreduce.fileoutputcommitter.algorithm.version', '2') \
    .mode('ignore') \
    .partitionBy('p') \
    .orc(target_path)

When I used mode('append') my job
> On Thu, Apr 25, 2019 at 3:10 PM, Juho Autio wrote:
>
>> Ok, I've verified that hive> SHOW PARTITIONS is using get_partition_names,
>> which is always quite fast. Spark's insertInto uses
>> get_partitions_with_auth, which is much slower (it als
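To make that difference concrete, here is a toy, pure-Python illustration (not the real Thrift API; the Partition and StorageDescriptor classes and both functions are stand-ins) of why a names-only listing returns far less data than a call that materializes full partition objects with storage and authorization details:

```python
from dataclasses import dataclass, field

# Toy stand-ins for the two Hive metastore calls discussed above.
# The real calls go over Thrift; the point is only that a names-only
# listing moves a few bytes per partition, while the "with auth"
# variant builds a full object (location, format, privileges) each time.

@dataclass
class StorageDescriptor:
    location: str
    input_format: str = "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"

@dataclass
class Partition:
    values: list
    sd: StorageDescriptor
    privileges: dict = field(default_factory=dict)

def get_partition_names(table, names):
    # Cheap: answerable from the partition-name index alone.
    return list(names)

def get_partitions_with_auth(table, names, user="spark"):
    # Expensive: one full object per partition, plus per-partition
    # privilege resolution for `user`.
    return [
        Partition(values=[n.split("=", 1)[1]],
                  sd=StorageDescriptor(location=f"/warehouse/{table}/{n}"),
                  privileges={user: ["SELECT"]})
        for n in names
    ]
```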
On Thu, Apr 25, 2019 at 10:12 AM vincent gromakowski <
vincent.gromakow...@gmail.com> wrote:
> Which metastore are you using?
>
> On Thu, Apr 25, 2019 at 9:02 AM, Juho Autio wrote:
Would anyone be able to answer this question about the non-optimal
implementation of insertInto?
On Thu, Apr 18, 2019 at 4:45 PM Juho Autio wrote:
Hi,
My job is writing ~10 partitions with insertInto. With the same input /
output data the total duration of the job is very different depending on
how many partitions the target table has.
Target table with 10 partitions:
1 min 30 s

Target table with ~1 partitions:
13 min 0 s
It seems