Re: PutParquet with S3

2017-12-05 Thread Bryan Bende
Take a look at the MergeRecord processor. You can put it before
PutParquet so that records are merged into appropriately sized files
before they are written.
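
To illustrate the idea behind merging before PutParquet, here is a minimal Python sketch of size-based bin packing. It is not NiFi's implementation; `pack_bins`, `min_bin_bytes` and `max_bin_bytes` are hypothetical names that loosely mirror what I understand the MergeRecord "Minimum Bin Size" and "Maximum Bin Size" properties to control, so check the processor documentation for your NiFi version.

```python
from typing import Iterable, List

MB = 1024 * 1024


def pack_bins(sizes: Iterable[int], min_bin_bytes: int, max_bin_bytes: int) -> List[List[int]]:
    """Group item sizes in arrival order into bins.

    A bin is closed as soon as it reaches min_bin_bytes, and an item that
    would push a non-empty bin past max_bin_bytes starts a new bin instead.
    """
    bins: List[List[int]] = []
    current: List[int] = []
    current_total = 0
    for size in sizes:
        # Close the current bin early if this item would overflow the maximum.
        if current and current_total + size > max_bin_bytes:
            bins.append(current)
            current, current_total = [], 0
        current.append(size)
        current_total += size
        # A bin that has reached the minimum size is ready to be written out.
        if current_total >= min_bin_bytes:
            bins.append(current)
            current, current_total = [], 0
    if current:
        # Leftover partial bin (assumption: a real flow would hold it until
        # more data arrives or a timeout such as a maximum bin age expires).
        bins.append(current)
    return bins


if __name__ == "__main__":
    # Six 200 MB inputs with a 512 MB minimum and a 1 GB maximum.
    for merged in pack_bins([200 * MB] * 6, 512 * MB, 1024 * MB):
        print(len(merged), "inputs,", sum(merged) // MB, "MB")
```

With a 512 MB minimum, the small inputs are grouped until each output crosses the threshold, which is the behaviour you want in front of PutParquet to avoid many small files in S3.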

On Tue, Dec 5, 2017 at 10:36 PM Madhukar Thota wrote:

> Thanks Joey,
>
> It worked. Do you know how to control the Parquet file size when it writes
> to S3? I see a lot of small files in S3. Is it possible to write files of
> either 512 MB or 1 GB?


Re: PutParquet with S3

2017-12-05 Thread Madhukar Thota
Thanks Joey,

It worked. Do you know how to control the Parquet file size when it writes
to S3? I see a lot of small files in S3. Is it possible to write files of
either 512 MB or 1 GB?




Re: PutParquet with S3

2017-12-05 Thread Joey Frazee
PutParquet doesn't bundle the AWS S3 SDK itself, but it provides an
"Additional Classpath Resources" property that you need to point at a
directory containing all the S3 dependencies. I tested this the other day
with the following jars:

aws-java-sdk-1.7.4.jar
hadoop-aws-2.7.3.jar
hadoop-common-2.7.3.jar
httpclient-4.5.3.jar
httpcore-4.4.4.jar
jackson-annotations-2.6.0.jar
jackson-core-2.6.1.jar
jackson-databind-2.6.1.jar

So just grab those from Maven Central and you should be good to go.
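
Before pointing "Additional Classpath Resources" at a directory, it can help to confirm that every jar listed above is actually there. The following Python sketch does only that; `missing_jars` and `REQUIRED_JARS` are hypothetical names, and the version numbers are simply the ones from this message, not a statement about what other NiFi or Hadoop versions need.

```python
import sys
from pathlib import Path
from typing import List

# Jar file names exactly as listed in this message.
REQUIRED_JARS = [
    "aws-java-sdk-1.7.4.jar",
    "hadoop-aws-2.7.3.jar",
    "hadoop-common-2.7.3.jar",
    "httpclient-4.5.3.jar",
    "httpcore-4.4.4.jar",
    "jackson-annotations-2.6.0.jar",
    "jackson-core-2.6.1.jar",
    "jackson-databind-2.6.1.jar",
]


def missing_jars(directory: str) -> List[str]:
    """Return the required jar names that are not present in `directory`."""
    present = {path.name for path in Path(directory).glob("*.jar")}
    return [jar for jar in REQUIRED_JARS if jar not in present]


if __name__ == "__main__":
    # Usage: python check_jars.py /path/to/s3-deps
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_jars(target)
    if missing:
        print("Missing from", target + ":")
        for jar in missing:
            print("  " + jar)
    else:
        print("All listed jars are present in", target)
```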

-joey



PutParquet with S3

2017-12-05 Thread Madhukar Thota
Hi

Is it possible to use the PutParquet processor to write files to S3? I tried
setting the S3 bucket in the core-site.xml file, but I am getting *No
FileSystem for scheme: s3a*

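*core-site.xml*

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>s3a://testing</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value></value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>xxx</value>
  </property>
</configuration>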
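
For anyone debugging the same error: "No FileSystem for scheme: s3a" generally means Hadoop could not find a FileSystem implementation for that scheme on the classpath, so the configuration itself may be fine. As a quick sanity check of what the file declares, here is a small Python sketch that reads a core-site.xml and reports the scheme of fs.defaultFS. The helper names `read_hadoop_conf` and `default_fs_scheme` are hypothetical, and the sample values are the placeholders from this message.

```python
import xml.etree.ElementTree as ET
from typing import Dict
from urllib.parse import urlparse

# Sample configuration with the placeholder values from this thread.
SAMPLE = """
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>s3a://testing</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value></value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>xxx</value>
  </property>
</configuration>
"""


def read_hadoop_conf(xml_text: str) -> Dict[str, str]:
    """Parse Hadoop-style <property><name/><value/></property> entries into a dict."""
    root = ET.fromstring(xml_text.strip())
    conf: Dict[str, str] = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            # A missing or empty <value> is stored as an empty string.
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


def default_fs_scheme(conf: Dict[str, str]) -> str:
    """Return the URI scheme of fs.defaultFS, or an empty string if it is unset."""
    return urlparse(conf.get("fs.defaultFS", "")).scheme


if __name__ == "__main__":
    conf = read_hadoop_conf(SAMPLE)
    print("default filesystem scheme:", default_fs_scheme(conf))
    for key in ("fs.s3a.access.key", "fs.s3a.secret.key"):
        print(key, "is set" if conf.get(key) else "is empty or missing")
```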