Re: PutParquet with S3

2017-12-05 Thread Bryan Bende
Take a look at the MergeRecord processor; you can use it before PutParquet to create appropriately sized files.

On Tue, Dec 5, 2017 at 10:36 PM Madhukar Thota wrote:
> Thanks Joey,
>
> It worked. Do you know how to control the parquet file size when it writes
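As a rough illustration of the idea behind this advice, the Java sketch below packs incoming records into a bin and only emits the bin once it reaches a minimum size, which is how many small inputs become fewer, larger outputs. `SizeBinner` is a hypothetical class, not NiFi's MergeRecord code; in NiFi you would instead configure MergeRecord properties such as "Minimum Bin Size" and "Max Bin Age". Note that the bin size counts the records going into PutParquet, so the Parquet file that lands in S3 will generally not match the bin size exactly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of size-threshold binning; not NiFi's implementation.
final class SizeBinner {
    private final long minBinBytes;
    private final List<byte[]> current = new ArrayList<>();
    private long currentBytes = 0;
    private final List<List<byte[]>> emitted = new ArrayList<>();

    SizeBinner(long minBinBytes) {
        this.minBinBytes = minBinBytes;
    }

    /** Adds one record and emits the bin as soon as it reaches the minimum size. */
    void add(byte[] record) {
        current.add(record);
        currentBytes += record.length;
        if (currentBytes >= minBinBytes) {
            flush();
        }
    }

    /** Emits whatever is in the current bin (e.g. when a bin gets too old), if anything. */
    void flush() {
        if (!current.isEmpty()) {
            emitted.add(new ArrayList<>(current));
            current.clear();
            currentBytes = 0;
        }
    }

    /** All bins emitted so far; each inner list would become one output file. */
    List<List<byte[]>> emitted() {
        return emitted;
    }
}
```

For example, with 100-byte records and a 512-byte minimum, adding twelve records and then calling `flush()` yields two bins of six records each.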

Re: PutParquet with S3

2017-12-05 Thread Madhukar Thota
Thanks Joey,

It worked. Do you know how to control the Parquet file size when it writes to S3? I see a lot of small files in S3. Is it possible to write files of either 512 MB or 1 GB?

On Tue, Dec 5, 2017 at 8:57 PM, Joey Frazee wrote:
> PutParquet doesn't have the AWS S3

Re: PutParquet with S3

2017-12-05 Thread Joey Frazee
PutParquet doesn't include the AWS S3 SDK itself, but it provides an "Additional Classpath Resources" property that you need to point at a directory containing all the S3 dependencies. I just tested this the other day with the following jars:

  aws-java-sdk-1.7.4.jar
  hadoop-aws-2.7.3.jar
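The "Additional Classpath Resources" property accepts files and/or directories, so one way to catch a missing jar before starting the processor is to check that directory yourself. The Java sketch below uses a hypothetical `ClasspathDirCheck` helper to report which of the two jars named above are absent from a given directory. It matches exact file names, so adjust the list if you use other versions; keeping hadoop-aws in line with the Hadoop version your NiFi build uses is an assumption worth verifying for your release.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: checks a directory intended for PutParquet's
// "Additional Classpath Resources" property for specific jar files.
final class ClasspathDirCheck {
    // The jar versions reported as working in this thread; adjust as needed.
    static final List<String> TESTED_JARS =
            List.of("aws-java-sdk-1.7.4.jar", "hadoop-aws-2.7.3.jar");

    /** Returns the names from requiredJars that are not regular files in dir. */
    static List<String> missingJars(Path dir, List<String> requiredJars) {
        List<String> missing = new ArrayList<>();
        for (String jar : requiredJars) {
            if (!Files.isRegularFile(dir.resolve(jar))) {
                missing.add(jar);
            }
        }
        return missing;
    }
}
```

An empty result from `missingJars(dir, ClasspathDirCheck.TESTED_JARS)` means both jars are present in `dir`.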

PutParquet with S3

2017-12-05 Thread Madhukar Thota
Hi,

Is it possible to use the PutParquet processor to write files into S3? I tried setting the S3 bucket in the core-site.xml file, but I am getting *No FileSystem for scheme: s3a*.

core-site.xml:

  fs.defaultFS        s3a://testing
  fs.s3a.access.key
  fs.s3a.secret.key
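The flattened settings above correspond to Hadoop's standard core-site.xml layout, where each setting is a <property> element with <name> and <value> children inside a <configuration> root. The Java sketch below renders that layout from a name/value map; the class name `CoreSiteXml` and any key values you pass in are hypothetical. Per the reply in this thread, the "No FileSystem for scheme: s3a" error itself went away once the S3 jars were added to the processor's classpath, not by changing this file.

```java
import java.util.Map;

// Hypothetical helper that renders Hadoop-style configuration XML.
final class CoreSiteXml {
    /** Renders <configuration><property><name/><value/></property>... for each entry, in map order. */
    static String render(Map<String, String> props) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\"?>\n<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            xml.append("  <property>\n")
               .append("    <name>").append(escape(e.getKey())).append("</name>\n")
               .append("    <value>").append(escape(e.getValue())).append("</value>\n")
               .append("  </property>\n");
        }
        xml.append("</configuration>\n");
        return xml.toString();
    }

    /** Minimal escaping for XML text content; '&' must be replaced first. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Passing a `LinkedHashMap` keeps the properties in insertion order, e.g. `fs.defaultFS` = `s3a://testing` followed by the two `fs.s3a.*` keys with your own credentials.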