Take a look at the MergeRecord processor. You can put it before PutParquet to merge records into appropriately sized flow files, so that PutParquet writes fewer, larger files.
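
As a rough starting point, MergeRecord settings along these lines should get you close. The values are assumptions to tune for your flow, not tested settings, and the reader and writer entries are placeholders for whatever controller services you already use:

```text
Record Reader              : (placeholder: your existing record reader)
Record Writer              : (placeholder: your existing record writer)
Merge Strategy             : Bin-Packing Algorithm
Minimum Bin Size           : 512 MB
Maximum Bin Size           : 1 GB
Minimum Number of Records  : 1
Maximum Number of Records  : 10000000   (assumed value; the default, 1000 as far as I recall, would close bins far too early)
Max Bin Age                : 10 min     (assumed value; flushes a partial bin when data is slow)
```

Keep in mind that the bin sizes are measured against the incoming flow file content, so the Parquet files that PutParquet writes will usually come out smaller after encoding and compression. You may need to raise the sizes to actually land on 512 MB to 1 GB in S3.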

On Tue, Dec 5, 2017 at 10:36 PM Madhukar Thota <[email protected]> wrote:

> Thanks Joey,
>
> It worked. Do you know how to control the Parquet file size when it writes
> to S3? I see a lot of small files in S3. Is it possible to write files of
> either 512 MB or 1 GB?
>
> On Tue, Dec 5, 2017 at 8:57 PM, Joey Frazee <[email protected]> wrote:
>
>> PutParquet doesn't include the AWS S3 SDK itself, but it provides an
>> "Additional Classpath Resources" property that you need to point at a
>> directory with all the S3 dependencies. I just tested this the other day
>> with the following jars:
>>
>> aws-java-sdk-1.7.4.jar
>> hadoop-aws-2.7.3.jar
>> hadoop-common-2.7.3.jar
>> httpclient-4.5.3.jar
>> httpcore-4.4.4.jar
>> jackson-annotations-2.6.0.jar
>> jackson-core-2.6.1.jar
>> jackson-databind-2.6.1.jar
>>
>> So just grab those from Maven Central and you should be good to go.
>>
>> -joey
>>
>> On Dec 5, 2017, 6:53 PM -0600, Madhukar Thota <[email protected]>, wrote:
>>
>> Hi
>>
>> Is it possible to use the PutParquet processor to write files into S3? I
>> tried setting the S3 bucket in the core-site.xml file, but I am getting
>> *No FileSystem for scheme: s3a*
>>
>> *core-site.xml*
>>
>> <?xml version="1.0"?>
>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>> <!--
>>   Licensed to the Apache Software Foundation (ASF) under one or more
>>   contributor license agreements. See the NOTICE file distributed with
>>   this work for additional information regarding copyright ownership.
>>   The ASF licenses this file to You under the Apache License, Version 2.0
>>   (the "License"); you may not use this file except in compliance with
>>   the License. You may obtain a copy of the License at
>>
>>       http://www.apache.org/licenses/LICENSE-2.0
>>
>>   Unless required by applicable law or agreed to in writing, software
>>   distributed under the License is distributed on an "AS IS" BASIS,
>>   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>   See the License for the specific language governing permissions and
>>   limitations under the License.
>> -->
>>
>> <!-- Put site-specific property overrides in this file. -->
>>
>> <configuration>
>>   <property>
>>     <name>fs.defaultFS</name>
>>     <value>s3a://testing</value>
>>   </property>
>>   <property>
>>     <name>fs.s3a.access.key</name>
>>     <value>xxxxxxxxxxxxxxxx</value>
>>   </property>
>>   <property>
>>     <name>fs.s3a.secret.key</name>
>>     <value>xxxxxxxxxxxxxxxxxxx</value>
>>   </property>
>> </configuration>
