Re: PutParquet with S3
Take a look at the MergeRecord processor; you can use it before PutParquet to merge records into appropriately sized files.

On Tue, Dec 5, 2017 at 10:36 PM, Madhukar Thota wrote:
> Thanks Joey,
>
> It worked. Do you know how to control the Parquet file size when it writes to S3? I see a lot of small files in S3. Is it possible to write files of either 512 MB or 1 GB?
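To illustrate why putting MergeRecord in front of PutParquet changes the output file size, here is a deliberately simplified model of size-based binning. Incoming chunks are appended to a bin until it reaches a minimum size, and a chunk that would push the bin past a maximum starts a new bin. This sketches the idea only and is not NiFi's actual MergeRecord implementation. The class name `BinSketch`, the method `binSizes`, and the 512 MB / 1 GB thresholds are assumptions for illustration. MergeRecord measures the size of the incoming record data, so the Parquet file written afterwards may be noticeably smaller once Parquet's encoding and compression are applied.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of size-based binning (NOT NiFi's MergeRecord code).
public class BinSketch {
    static final long MB = 1024L * 1024L;

    /**
     * Groups incoming chunk sizes (bytes) into bins. A bin is emitted once it holds at least
     * minBytes; a chunk that would push an open bin above maxBytes closes that bin first and
     * starts a new one. A leftover partial bin is returned last (in NiFi a partial bin is
     * held until it fills or is flushed by other settings, such as a bin-age limit).
     */
    static List<Long> binSizes(List<Long> chunkSizes, long minBytes, long maxBytes) {
        List<Long> bins = new ArrayList<>();
        long current = 0;
        for (long size : chunkSizes) {
            if (current > 0 && current + size > maxBytes) {
                bins.add(current); // adding this chunk would overflow: close the current bin
                current = 0;
            }
            current += size;
            if (current >= minBytes) {
                bins.add(current); // minimum reached: emit the bin
                current = 0;
            }
        }
        if (current > 0) {
            bins.add(current); // leftover partial bin
        }
        return bins;
    }

    public static void main(String[] args) {
        // 40 incoming chunks of 30 MB each, targeting bins between 512 MB and 1 GB.
        List<Long> chunks = new ArrayList<>();
        for (int i = 0; i < 40; i++) {
            chunks.add(30 * MB);
        }
        for (long bin : binSizes(chunks, 512 * MB, 1024 * MB)) {
            System.out.println(bin / MB + " MB"); // prints 540 MB, 540 MB, 120 MB
        }
    }
}
```

With 30 MB inputs, each bin closes at the first multiple of 30 MB that reaches 512 MB (18 chunks, 540 MB), which is why merged output sizes land near, but rarely exactly on, the minimum.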
Re: PutParquet with S3
Thanks Joey,

It worked. Do you know how to control the Parquet file size when it writes to S3? I see a lot of small files in S3. Is it possible to write files of either 512 MB or 1 GB?

On Tue, Dec 5, 2017 at 8:57 PM, Joey Frazee wrote:
> PutParquet doesn't include the AWS S3 SDK itself, but it provides an "Additional Classpath Resources" property that you need to point at a directory with all the S3 dependencies.
Re: PutParquet with S3
PutParquet doesn't include the AWS S3 SDK itself, but it provides an "Additional Classpath Resources" property that you need to point at a directory containing all the S3 dependencies. I tested this the other day with the following jars:

aws-java-sdk-1.7.4.jar
hadoop-aws-2.7.3.jar
hadoop-common-2.7.3.jar
httpclient-4.5.3.jar
httpcore-4.4.4.jar
jackson-annotations-2.6.0.jar
jackson-core-2.6.1.jar
jackson-databind-2.6.1.jar

So just grab those from Maven Central and you should be good to go.

-joey

On Dec 5, 2017, 6:53 PM -0600, Madhukar Thota wrote:
> Is it possible to use the PutParquet processor to write files into S3? I tried setting the S3 bucket in the core-site.xml file, but I am getting "No FileSystem for scheme: s3a".
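Before pointing "Additional Classpath Resources" at the directory, it can help to confirm that every jar from the list above is actually there. The sketch below is a hypothetical helper (`JarDirCheck` and `missingJars` are illustrative names, not part of NiFi) that reports which expected file names are absent from a directory; the jar names are the ones listed in this message.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: report which expected jars are missing from a directory.
public class JarDirCheck {
    // The jar set reported in this thread as working with PutParquet and s3a.
    static final List<String> S3A_JARS = Arrays.asList(
            "aws-java-sdk-1.7.4.jar",
            "hadoop-aws-2.7.3.jar",
            "hadoop-common-2.7.3.jar",
            "httpclient-4.5.3.jar",
            "httpcore-4.4.4.jar",
            "jackson-annotations-2.6.0.jar",
            "jackson-core-2.6.1.jar",
            "jackson-databind-2.6.1.jar");

    /** Returns the names in required that are not regular files directly inside dir. */
    static List<String> missingJars(Path dir, List<String> required) {
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            if (!Files.isRegularFile(dir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Directory to check; defaults to the current directory when no argument is given.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> missing = missingJars(dir, S3A_JARS);
        if (missing.isEmpty()) {
            System.out.println("All expected jars are present in " + dir.toAbsolutePath());
        } else {
            System.out.println("Missing from " + dir.toAbsolutePath() + ": " + missing);
        }
    }
}
```

Run it with the intended classpath directory as its only argument; anything it reports as missing still needs to be downloaded.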
PutParquet with S3
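Hi,

Is it possible to use the PutParquet processor to write files into S3? I tried setting the S3 bucket in the core-site.xml file, but I am getting *No FileSystem for scheme: s3a*.

*core-site.xml*

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>s3a://testing</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value></value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>xxx</value>
  </property>
</configuration>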
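The "No FileSystem for scheme: s3a" error most often means Hadoop could not find an implementation for the s3a scheme; that implementation, `org.apache.hadoop.fs.s3a.S3AFileSystem`, ships in the hadoop-aws jar. As a rough diagnostic, the sketch below builds a `URLClassLoader` over every jar in a directory (similar in spirit to what "Additional Classpath Resources" provides, though not NiFi's own class-loading code) and checks whether that class can be loaded. `S3aClassCheck` and `canLoad` are hypothetical names, and the sketch assumes Java 9 or later for `ClassLoader.getPlatformClassLoader()`.

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic: can the s3a FileSystem class be loaded from the jars in a directory?
public class S3aClassCheck {
    static final String S3A_CLASS = "org.apache.hadoop.fs.s3a.S3AFileSystem";

    /** Returns true if className loads from the jars in dir (plus the JDK's own classes). */
    static boolean canLoad(File dir, String className) throws IOException {
        List<URL> urls = new ArrayList<>();
        File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
        if (jars != null) {
            for (File jar : jars) {
                urls.add(jar.toURI().toURL());
            }
        }
        // Parent is the platform loader, so only JDK classes and the listed jars are visible.
        try (URLClassLoader loader =
                     new URLClassLoader(urls.toArray(new URL[0]), ClassLoader.getPlatformClassLoader())) {
            Class.forName(className, false, loader); // load without running static initializers
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Either the class itself is absent, or a class it depends on
            // (for example one from hadoop-common) could not be found.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(args.length > 0 ? args[0] : ".");
        if (canLoad(dir, S3A_CLASS)) {
            System.out.println(S3A_CLASS + " is loadable from " + dir.getAbsolutePath());
        } else {
            System.out.println(S3A_CLASS + " not found; check hadoop-aws and its dependencies in "
                    + dir.getAbsolutePath());
        }
    }
}
```

A successful load only shows the class and its superclasses resolve; the AWS SDK and HTTP client jars are still needed at runtime when the filesystem actually connects to S3.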