Hi,

Thanks for the response.

How can this streaming data be written to S3, and where do I specify the
output path? Also, I see that the FileSink takes GenericRecord, so how can
the DataStream<Row> be converted to a DataStream<GenericRecord>?

Please bear with me if my questions don't make any sense.
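
In the meantime, here is a rough sketch of what I'm trying, pieced together
from the FileSink docs, starting from the DataStream<Row> stream in my
earlier snippet. The Avro schema, the field names and the
s3://my-bucket/output/ path are all placeholders for illustration, so
please correct me if I've misunderstood the API:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.types.Row;

// Hypothetical Avro schema matching the columns of the "test" table.
String schemaString =
    "{\"type\":\"record\",\"name\":\"Test\",\"fields\":["
        + "{\"name\":\"id\",\"type\":\"long\"},"
        + "{\"name\":\"value\",\"type\":\"string\"}]}";
Schema schema = new Schema.Parser().parse(schemaString);

// Map each Row to a GenericRecord. The schema is re-parsed in open()
// because Avro's Schema class is not serializable in older Avro versions.
DataStream<GenericRecord> records = stream
    .map(new RichMapFunction<Row, GenericRecord>() {
        private transient Schema schema;

        @Override
        public void open(Configuration parameters) {
            schema = new Schema.Parser().parse(schemaString);
        }

        @Override
        public GenericRecord map(Row row) {
            GenericRecord record = new GenericData.Record(schema);
            record.put("id", row.getField("id"));
            record.put("value", row.getField("value"));
            return record;
        }
    })
    .returns(new GenericRecordAvroTypeInfo(schema));

// The S3 location is simply the base path given to forBulkFormat.
FileSink<GenericRecord> sink = FileSink
    .forBulkFormat(
        new Path("s3://my-bucket/output/"),
        ParquetAvroWriters.forGenericRecord(schema))
    .build();
records.sinkTo(sink);

Is this roughly the right direction?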

On Sun, Sep 26, 2021 at 9:12 AM Guowei Ma <guowei....@gmail.com> wrote:

> Hi, Harshvardhan
>
> I think Caizhi is right.
> I only have a small addition. Since you want to convert the Table to a
> DataStream, you can take a look at FileSink (ParquetWriterFactory)[1].
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/docs/connectors/datastream/file_sink/#bulk-encoded-formats
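>
> For the hourly partitioning mentioned earlier in the thread, note that
> FileSink's default bucket assigner is DateTimeBucketAssigner, which
> already creates one bucket directory per hour ("yyyy-MM-dd--HH"). A
> minimal sketch, assuming a DataStream of GenericRecord and a placeholder
> bucket path:
>
> FileSink<GenericRecord> sink = FileSink
>     .forBulkFormat(
>         new Path("s3://bucket/dir"),
>         ParquetAvroWriters.forGenericRecord(schema))
>     .withBucketAssigner(new DateTimeBucketAssigner<>("yyyy-MM-dd--HH"))
>     .build();
>
> Also keep in mind that bulk formats roll part files on every checkpoint,
> so checkpointing must be enabled for the files to be finalized.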
>
> Best,
> Guowei
>
>
> On Sun, Sep 26, 2021 at 10:31 AM Caizhi Weng <tsreape...@gmail.com> wrote:
>
>> Hi!
>>
>> Try the PARTITIONED BY clause. See
>> https://ci.apache.org/projects/flink/flink-docs-master/docs/connectors/table/formats/parquet/
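>>
>> A minimal sketch of the pure SQL route, assuming the source table has a
>> timestamp column ts; the sink columns and the s3://bucket/dir path are
>> placeholders:
>>
>> tableEnv.executeSql(
>>     "CREATE TABLE s3_sink ("
>>         + "  id BIGINT,"
>>         + "  v STRING,"
>>         + "  dt STRING,"
>>         + "  hr STRING"
>>         + ") PARTITIONED BY (dt, hr) WITH ("
>>         + "  'connector' = 'filesystem',"
>>         + "  'path' = 's3://bucket/dir',"
>>         + "  'format' = 'parquet')");
>>
>> tableEnv.executeSql(
>>     "INSERT INTO s3_sink SELECT id, v, "
>>         + "DATE_FORMAT(ts, 'yyyy-MM-dd'), DATE_FORMAT(ts, 'HH') "
>>         + "FROM test");
>>
>> This lays the files out under dt=.../hr=... directories, which gives you
>> the hourly partitioning without leaving SQL.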
>>
>> Harshvardhan Shinde <harshvardhan.shi...@oyorooms.com> wrote on Friday,
>> September 24, 2021 at 5:52 PM:
>>
>>> Hi,
>>> I wanted to know if we can write streaming data to S3 in parquet format
>>> with partitioning.
>>> Here's what I want to achieve:
>>> I have a Kafka table that gets updated with data from a Kafka topic. I'm
>>> using a SELECT statement to get the data into a Table and converting it
>>> into a stream as follows:
>>>
>>> // Create a table environment on top of the existing streaming environment.
>>> StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
>>> // Query the Kafka-backed table.
>>> Table table = tableEnv.sqlQuery("SELECT * FROM test");
>>> // Convert the Table into an append-only stream of Rows.
>>> DataStream<Row> stream = tableEnv.toDataStream(table);
>>>
>>> Now I want to write this stream to S3 as Parquet files with hourly
>>> partitions.
>>>
>>> Here are my questions:
>>> 1. Is this possible?
>>> 2. If yes, how can it be achieved? A link to the appropriate documentation
>>> would also help.
>>>
>>> Thanks and Regards,
>>> Harshvardhan
>>>
>>>

-- 
Thanks and Regards,
Harshvardhan
Data Platform
