I'm happy to help with this effort; I'll look at that label and see which
tests I can investigate and/or fix.
From: Kay Ousterhout
Sent: Monday, March 27, 2017 9:47 PM
To: Reynold Xin
Cc: Saikat Kanjilal; Sean Owen;
Steve is right that the S3 committer isn't a ParquetOutputCommitter. I
think that the reason that check exists is to make sure Parquet writes
_metadata summary files to an output directory. But, I think the **summary
files are a bad idea**, so we bypass that logic and use the committer
directly if
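(For context, the kind of configuration being discussed might look like the sketch below. The exact keys and the committer class name are assumptions based on Parquet's and the rdblue/s3committer project's conventions of that era, not taken from this thread.)

```properties
# Route Parquet output through a non-ParquetOutputCommitter committer
# (class name assumed from the rdblue/s3committer repository)
spark.sql.parquet.output.committer.class  com.netflix.bdp.s3.S3DirectoryOutputCommitter

# Skip writing _metadata/_common_metadata summary files entirely
spark.hadoop.parquet.enable.summary-metadata  false
```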
We just fixed the build yesterday. I'll kick off a new RC today.
On Tue, Mar 28, 2017 at 8:04 AM, Asher Krim wrote:
> Hey Michael,
> any update on this? We're itching for a 2.1.1 release (specifically
> SPARK-14804 which is currently blocking us)
>
> Thanks,
> Asher Krim
>
Hey Michael,
any update on this? We're itching for a 2.1.1 release (specifically
SPARK-14804 which is currently blocking us)
Thanks,
Asher Krim
Senior Software Engineer
On Wed, Mar 22, 2017 at 7:44 PM, Michael Armbrust
wrote:
> An update: I cut the tag for RC1 last
> On 28 Mar 2017, at 05:20, sririshindra wrote:
>
> Hi
>
> I have a job which saves a dataframe as parquet file to s3.
>
> I built a jar using your repository https://github.com/rdblue/s3committer.
>
> I added the following config to the Spark Session
>
I am not sure why you want to transform rows in the dataframe using
mapPartitions like that.
If you want to project the rows with some expressions, you can use an API
like selectExpr and let Spark SQL resolve the expressions. To resolve
expressions manually, you need to (at least) deal with a