Have you looked at Hive's streaming ingest?
https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest
It is designed for exactly this case, though it only handles inserts (not
updates), so if you need updates you'd still have to do the merge as you are
currently doing.
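
A minimal sketch with that API (the classic org.apache.hive.hcatalog.streaming
package documented on the page above) looks roughly like this; the metastore
URI, database, table, and column names are placeholders, and the target table
must be stored as ORC, bucketed, and created with transactional=true:

    import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
    import org.apache.hive.hcatalog.streaming.HiveEndPoint;
    import org.apache.hive.hcatalog.streaming.StreamingConnection;
    import org.apache.hive.hcatalog.streaming.TransactionBatch;

    public class StreamingIngestSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder metastore URI, database, and (unpartitioned) table.
            HiveEndPoint endPoint = new HiveEndPoint(
                    "thrift://metastore-host:9083", "mydb", "mytable", null);
            StreamingConnection conn = endPoint.newConnection(true);

            // Column names must match the target table's schema.
            String[] fieldNames = {"id", "name"};
            DelimitedInputWriter writer =
                    new DelimitedInputWriter(fieldNames, ",", endPoint);

            // Fetch a batch of transactions and write delimited records.
            TransactionBatch txnBatch = conn.fetchTransactionBatch(10, writer);
            txnBatch.beginNextTransaction();
            txnBatch.write("1,alice".getBytes());
            txnBatch.write("2,bob".getBytes());
            txnBatch.commit();  // rows become visible to readers on commit

            txnBatch.close();
            conn.close();
        }
    }

Each commit makes the batch visible atomically, so this would replace the
insert half of your pipeline; updates would still go through the merge.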

Alan.

On Mon, Mar 11, 2019 at 2:09 PM David Morin <morin.david....@gmail.com>
wrote:

> Hello,
>
> I've just implemented a pipeline based on Apache Flink to synchronize data
> between MySQL and Hive (transactional + bucketed) on an HDP cluster. The
> Flink jobs run on YARN.
> I've used ORC files, but without ACID properties.
> We then created external tables on the HDFS directories that contain these
> delta ORC files.
> MERGE INTO queries are then executed periodically to merge the data into
> the Hive target table.
> It works pretty well, but we want to avoid these MERGE queries.
> How can I update ORC files directly from my Flink job?
>
> Thanks,
> David
>
>
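
For reference, the periodic merge described above would typically be issued
over Hive JDBC along these lines (a rough sketch; the connection string,
table, and column names are placeholders, and the target must be an ACID
table):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class PeriodicMergeSketch {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:hive2://hiveserver2-host:10000/mydb", "user", "");
                 Statement stmt = conn.createStatement()) {
                // delta is the external table over the ORC files written by
                // Flink; target is the transactional Hive table.
                stmt.execute(
                    "MERGE INTO target t USING delta d ON t.id = d.id "
                    + "WHEN MATCHED THEN UPDATE SET name = d.name "
                    + "WHEN NOT MATCHED THEN INSERT VALUES (d.id, d.name)");
            }
        }
    }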
