Re: S3 DirectParquetOutputCommitter + PartitionBy + SaveMode.Append

Igor Berman Sat, 01 Oct 2016 03:15:07 -0700

Takeshi, why are you saying this, how have you checked it's only used from
2.7.3?
We use spark 2.0 which is shipped with hadoop dependency of 2.7.2 and we
use this setting.
We've sort of "verified" it's used by configuring log of file output
commiter


On 30 September 2016 at 03:12, Takeshi Yamamuro <linguin....@gmail.com>
wrote:

> Hi,
>
> FYI: Seems 
> `sc.hadoopConfiguration.set("mapreduce.fileoutputcommitter.algorithm.version","2”)`
> is only available at hadoop-2.7.3+.
>
> // maropu
>
>
> On Thu, Sep 29, 2016 at 9:28 PM, joffe.tal <joffe....@gmail.com> wrote:
>
>> You can use partition explicitly by adding "/<col_name>=<partition
>> value>" to
>> the end of the path you are writing to and then use overwrite.
>>
>> BTW in Spark 2.0 you just need to use:
>>
>> sc.hadoopConfiguration.set("mapreduce.fileoutputcommitter.al
>> gorithm.version","2”)
>> and use s3a://
>>
>> and you can work with regular output committer (actually
>> DirectParquetOutputCommitter is no longer available in Spark 2.0)
>>
>> so if you are planning on upgrading this could be another motivation
>>
>>
>>
>> --
>> View this message in context: http://apache-spark-user-list.
>> 1001560.n3.nabble.com/S3-DirectParquetOutputCommitter-Partit
>> ionBy-SaveMode-Append-tp26398p27810.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>
>
> --
> ---
> Takeshi Yamamuro
>

Re: S3 DirectParquetOutputCommitter + PartitionBy + SaveMode.Append

Reply via email to