Take a look at the MergeRecord processor. You can put it before
PutParquet to build appropriately sized files.
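
As a rough starting point, the MergeRecord settings could look something
like the listing below. The property names are the ones I recall from the
MergeRecord configuration tab. The values are assumptions aimed at a
512 MB to 1 GB target and will need tuning for your data rate.

```
Record Reader              = <reader for the incoming flowfiles>
Record Writer              = <writer whose output PutParquet's reader can parse>
Merge Strategy             = Bin-Packing Algorithm
Minimum Bin Size           = 512 MB
Maximum Bin Size           = 1 GB
Minimum Number of Records  = 1
Maximum Number of Records  = 10000000   (assumed; high enough that size, not count, closes the bin)
Max Bin Age                = 10 min     (assumed; flushes a partial bin when traffic is slow)
```

Keep in mind that the bin sizes are measured on the flowfiles going into
MergeRecord, not on the Parquet output, so the files that land in S3 will
usually come out smaller once Parquet's encoding and compression are
applied. You may have to raise the bin sizes to hit the target.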

On Tue, Dec 5, 2017 at 10:36 PM Madhukar Thota <[email protected]>
wrote:

> Thanks Joey,
>
> It worked. Do you know how to control the Parquet file size when it writes
> to S3? I see a lot of small files in S3. Is it possible to write files of
> either 512 MB or 1 GB?
>
>
> On Tue, Dec 5, 2017 at 8:57 PM, Joey Frazee <[email protected]>
> wrote:
>
>> PutParquet doesn't bundle the AWS S3 SDK itself, but it provides an
>> "Additional Classpath Resources" property that you need to point at a
>> directory containing all the S3 dependencies. I tested this the other day
>> with the following jars:
>>
>> aws-java-sdk-1.7.4.jar
>> hadoop-aws-2.7.3.jar
>> hadoop-common-2.7.3.jar
>> httpclient-4.5.3.jar
>> httpcore-4.4.4.jar
>> jackson-annotations-2.6.0.jar
>> jackson-core-2.6.1.jar
>> jackson-databind-2.6.1.jar
>>
>> So just grab those from Maven Central and you should be good to go.
>>
>> -joey
>>
>> On Dec 5, 2017, 6:53 PM -0600, Madhukar Thota <[email protected]>,
>> wrote:
>>
>> Hi
>>
>> Is it possible to use the PutParquet processor to write files to S3? I
>> tried setting the S3 bucket in core-site.xml, but I am getting *No
>> FileSystem for scheme: s3a*
>>
>> *core-site.xml*
>>
>> <?xml version="1.0"?>
>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>> <!--
>> Licensed to the Apache Software Foundation (ASF) under one or more
>> contributor license agreements. See the NOTICE file distributed with
>> this work for additional information regarding copyright ownership.
>> The ASF licenses this file to You under the Apache License, Version 2.0
>> (the "License"); you may not use this file except in compliance with
>> the License. You may obtain a copy of the License at
>> http://www.apache.org/licenses/LICENSE-2.0
>> Unless required by applicable law or agreed to in writing, software
>> distributed under the License is distributed on an "AS IS" BASIS,
>> WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>> See the License for the specific language governing permissions and
>> limitations under the License.
>> -->
>>
>> <!-- Put site-specific property overrides in this file. -->
>>
>> <configuration>
>> <property>
>> <name>fs.defaultFS</name>
>> <value>s3a://testing</value>
>> </property>
>> <property>
>> <name>fs.s3a.access.key</name>
>> <value>xxxxxxxxxxxxxxxx</value>
>> </property>
>> <property>
>> <name>fs.s3a.secret.key</name>
>> <value>xxxxxxxxxxxxxxxxxxx</value>
>> </property>
>> </configuration>
>>
>>
