That's the old (Hive 2) version of ACID.  In the newer version (Hive 3)
there's no update operation, just insert and delete (an update is written
as an insert plus a delete).  If you're working against Hive 2, what you
have is what you want.  If you're working against Hive 3, you'll need the
newer format.
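
Purely for illustration, here's a rough sketch of what a single "update"
becomes in the v2 layout from https://orc.apache.org/docs/acid.html: a
DELETE event for the old row id plus an INSERT event for the new version.
It uses the orc-core writer API rather than Hive's own writer; the table
columns, path, transaction ids and bucket value are all made up, and a
real Hive 3 table keeps the delete event in a delete_delta_* directory,
separate from the delta_* directory that holds the insert.

import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.StructColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class AcidV2UpdateSketch {
  public static void main(String[] args) throws Exception {
    // ACID event layout from the ORC ACID doc, wrapping a made-up (id, name) row.
    TypeDescription schema = TypeDescription.fromString(
        "struct<operation:int,originalTransaction:bigint,bucket:int,"
        + "rowId:bigint,currentTransaction:bigint,"
        + "row:struct<id:bigint,name:string>>");
    Writer writer = OrcFile.createWriter(new Path("/tmp/acid_sketch.orc"),
        OrcFile.writerOptions(new Configuration()).setSchema(schema));
    VectorizedRowBatch batch = schema.createRowBatch();
    // "Update" of row id 42: DELETE (2) the version written by txn 5 as row 7,
    // then INSERT (0) the new version in txn 12.  (A real delete event writes
    // a null row payload, and the bucket field is really an encoded bucket id;
    // the values here are only illustrative.)
    event(batch, 2, 5L, 0, 7L, 12L, 42L, "old-name");
    event(batch, 0, 12L, 0, 0L, 12L, 42L, "new-name");
    writer.addRowBatch(batch);
    writer.close();
  }

  static void event(VectorizedRowBatch b, int op, long origTxn, int bucket,
                    long rowId, long curTxn, long id, String name) {
    int r = b.size++;
    ((LongColumnVector) b.cols[0]).vector[r] = op;       // operation
    ((LongColumnVector) b.cols[1]).vector[r] = origTxn;  // originalTransaction
    ((LongColumnVector) b.cols[2]).vector[r] = bucket;   // bucket
    ((LongColumnVector) b.cols[3]).vector[r] = rowId;    // rowId
    ((LongColumnVector) b.cols[4]).vector[r] = curTxn;   // currentTransaction
    StructColumnVector row = (StructColumnVector) b.cols[5];
    ((LongColumnVector) row.fields[0]).vector[r] = id;
    byte[] bytes = name.getBytes(StandardCharsets.UTF_8);
    ((BytesColumnVector) row.fields[1]).setRef(r, bytes, 0, bytes.length);
  }
}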

Alan.

On Tue, Mar 12, 2019 at 12:24 PM David Morin <morin.david....@gmail.com>
wrote:

> Thanks Alan.
> Yes, the problem, in fact, was that this streaming API does not handle
> updates and deletes.
> I've used native ORC files, and the next step I plan is to use the ACID
> support described here: https://orc.apache.org/docs/acid.html
> The INSERT/UPDATE/DELETE operations seem to be implemented:
> OPERATION   SERIALIZATION
> INSERT      0
> UPDATE      1
> DELETE      2
> Do you think this approach is suitable?
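>
> For reference, a small sketch that dumps the operation column of a delta
> file with the orc-core reader (the file path comes in as an argument and
> is just a placeholder):
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
> import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
> import org.apache.orc.OrcFile;
> import org.apache.orc.Reader;
> import org.apache.orc.RecordReader;
>
> public class DumpAcidOps {
>   public static void main(String[] args) throws Exception {
>     Reader reader = OrcFile.createReader(new Path(args[0]),
>         OrcFile.readerOptions(new Configuration()));
>     VectorizedRowBatch batch = reader.getSchema().createRowBatch();
>     try (RecordReader rows = reader.rows()) {
>       while (rows.nextBatch(batch)) {
>         LongColumnVector op = (LongColumnVector) batch.cols[0];
>         for (int r = 0; r < batch.size; r++) {
>           // 0 = INSERT, 1 = UPDATE (Hive 2 only), 2 = DELETE
>           System.out.println("operation=" + op.vector[r]);
>         }
>       }
>     }
>   }
> }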
>
>
>
> On Tue, Mar 12, 2019 at 7:30 PM, Alan Gates <alanfga...@gmail.com> wrote:
>
>> Have you looked at Hive's streaming ingest?
>> https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest
>> It is designed for this case, though it only handles insert (not update),
>> so if you need updates you'd have to do the merge as you are currently
>> doing.
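>>
>> A minimal sketch along the lines of the example on that wiki page; the
>> metastore URI, database/table and columns are placeholders, and the
>> target table has to be an ORC, transactional, bucketed table:
>>
>> import java.util.Arrays;
>> import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
>> import org.apache.hive.hcatalog.streaming.HiveEndPoint;
>> import org.apache.hive.hcatalog.streaming.StreamingConnection;
>> import org.apache.hive.hcatalog.streaming.TransactionBatch;
>>
>> public class StreamingIngestSketch {
>>   public static void main(String[] args) throws Exception {
>>     HiveEndPoint endPoint = new HiveEndPoint("thrift://metastore-host:9083",
>>         "mydb", "mytable", Arrays.asList("2019-03-12"));
>>     StreamingConnection conn = endPoint.newConnection(true);
>>     DelimitedInputWriter writer =
>>         new DelimitedInputWriter(new String[]{"id", "name"}, ",", endPoint);
>>     TransactionBatch txnBatch = conn.fetchTransactionBatch(10, writer);
>>     txnBatch.beginNextTransaction();
>>     txnBatch.write("1,alice".getBytes());  // inserts only; no update/delete
>>     txnBatch.write("2,bob".getBytes());
>>     txnBatch.commit();
>>     txnBatch.close();
>>     conn.close();
>>   }
>> }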
>>
>> Alan.
>>
>> On Mon, Mar 11, 2019 at 2:09 PM David Morin <morin.david....@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I've just implemented a pipeline based on Apache Flink to synchronize data
>>> between MySQL and Hive (transactional + bucketed tables) on an HDP cluster.
>>> Flink jobs run on YARN.
>>> I've used ORC files, but without ACID properties.
>>> Then we've created external tables on the HDFS directories that contain
>>> these delta ORC files.
>>> Then, MERGE INTO queries are executed periodically to merge data into the
>>> Hive target table.
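>>>
>>> The periodic step is roughly this kind of MERGE submitted over Hive JDBC
>>> (a sketch; the host, schema, and the "op" flag column on the external
>>> delta table are examples, not our real model):
>>>
>>> import java.sql.Connection;
>>> import java.sql.DriverManager;
>>> import java.sql.Statement;
>>>
>>> public class PeriodicMergeSketch {
>>>   public static void main(String[] args) throws Exception {
>>>     Class.forName("org.apache.hive.jdbc.HiveDriver");
>>>     try (Connection conn = DriverManager.getConnection(
>>>              "jdbc:hive2://hiveserver2-host:10000/mydb", "user", "");
>>>          Statement stmt = conn.createStatement()) {
>>>       stmt.execute(
>>>           "MERGE INTO mydb.target AS t "
>>>           + "USING mydb.delta_external AS s ON t.id = s.id "
>>>           + "WHEN MATCHED AND s.op = 'D' THEN DELETE "
>>>           + "WHEN MATCHED THEN UPDATE SET name = s.name "
>>>           + "WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.name)");
>>>     }
>>>   }
>>> }
>>>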
>>> It works pretty well, but we want to avoid the use of these MERGE queries.
>>> How can I update ORC files directly from my Flink job?
>>>
>>> Thanks,
>>> David
>>>
>>>
