Hello,
We are currently on Flink 1.5 and use the BucketingSink to write the
results of our job processing to HDFS.
The data is in JSON format, and we store one object per line in the
resulting files.

We are planning to upgrade to Flink 1.6 and see that there is a new
StreamingFileSink. From the description it looks very similar to
BucketingSink when used with a row-encoded output format. Should we
consider moving to StreamingFileSink?

I would like to better understand the suggested use cases for each of
the two options.
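For context, based on the 1.6 docs, I would expect the row-encoded equivalent of our current setup to look roughly like the sketch below (the output path is a placeholder, and I'm assuming SimpleStringEncoder is the right encoder for newline-delimited JSON strings):

```java
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

// jsonStream is a DataStream<String>, one JSON object per element.
// Each element is written as one line, matching our current BucketingSink output.
StreamingFileSink<String> sink = StreamingFileSink
        .forRowFormat(new Path("hdfs:///path/to/output"),
                      new SimpleStringEncoder<String>("UTF-8"))
        .build();

jsonStream.addSink(sink);
```

Please correct me if this is not the intended replacement pattern.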

We are also considering additionally outputting the data in Parquet
format for our data scientists (also stored in HDFS). For this I see
some utilities that work with StreamingFileSink, so I guess that option
is the recommended one for this case?
Is it possible to use the Parquet writers even when the schema of the data
may evolve ?
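Concretely, I was looking at something like the following sketch via the flink-parquet module (MyEvent is a hypothetical POJO standing in for our record type; the path is a placeholder):

```java
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

// Bulk-encoded sink: ParquetAvroWriters derives an Avro schema from the
// MyEvent class via reflection and writes Parquet files; files are rolled
// on checkpoint rather than on size/time as with row formats.
StreamingFileSink<MyEvent> parquetSink = StreamingFileSink
        .forBulkFormat(new Path("hdfs:///path/to/parquet"),
                       ParquetAvroWriters.forReflectRecord(MyEvent.class))
        .build();

eventStream.addSink(parquetSink);
```

Since the schema here is derived from the Java class, I'm unsure how a schema change on our side would interact with already-written files, hence the question above.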

Thanks in advance for your help.
(Sorry if I put too many questions in the same message)


