Hive cannot easily handle updates. The most creative way I have seen this done: someone managed to capture all updates and then used union queries that rewrote the same Hive table with the newest values.

original + union delta + column with latest timestamp = new original

But that is a lot of processing, especially when you may not have many updates.
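As a rough illustration only, here is a minimal HiveQL sketch of that union-plus-latest-timestamp rewrite. The tables and columns are hypothetical: orders(order_id, status, updated_at) is the current snapshot, orders_delta holds the captured changes, and the merged result goes into a separate orders_merged table that then takes the place of the original.

-- Keep only the newest row per order_id across the snapshot and the delta.
INSERT OVERWRITE TABLE orders_merged
SELECT t.order_id, t.status, t.updated_at
FROM (
  SELECT order_id, status, updated_at FROM orders
  UNION ALL
  SELECT order_id, status, updated_at FROM orders_delta
) t
JOIN (
  SELECT order_id, MAX(updated_at) AS max_ts
  FROM (
    SELECT order_id, updated_at FROM orders
    UNION ALL
    SELECT order_id, updated_at FROM orders_delta
  ) u
  GROUP BY order_id
) latest
ON (t.order_id = latest.order_id AND t.updated_at = latest.max_ts);
-- Note: if the same order_id/updated_at pair exists in both inputs, an extra
-- de-duplication pass is still needed.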
Hive has storage handlers that let you lay a table over HBase and Cassandra data. Store your data in those systems (they take updates), then use Hive to query them.
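For the storage-handler route, a minimal sketch of laying a Hive table over an existing HBase table; the HBase table name "orders", the column family "d", and the columns are all assumptions for illustration:

-- Updates are written to HBase; Hive queries read the current cell values.
CREATE EXTERNAL TABLE orders_hbase (
  order_id   STRING,
  status     STRING,
  updated_at STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,d:status,d:updated_at")
TBLPROPERTIES ("hbase.table.name" = "orders");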
On Mon, Dec 24, 2012 at 9:29 AM, Ibrahim Yakti <iya...@souq.com> wrote:

> Edward, can you explain more please? Are you suggesting that I should use
> HBase for such tasks instead of Hive?
>
> --
> Ibrahim
>
>
> On Mon, Dec 24, 2012 at 5:28 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
>> You can only do the last_update idea if this is an insert-only dataset.
>>
>> If your table takes updates you need a different strategy:
>> 1) full dumps every interval, or
>> 2) using a storage handler like HBase or Cassandra that takes update
>> operations.
>>
>> On Mon, Dec 24, 2012 at 9:22 AM, Jeremiah Peschka
>> <jeremiah.pesc...@gmail.com> wrote:
>>
>>> If it were me, I would find a way to identify the partitions that have
>>> modified data and then re-load a subset of the partitions (only the ones
>>> with changes) on a regular basis. Instead of updating/deleting data, you'll
>>> be re-loading specific partitions as an all-or-nothing action.
>>>
>>> On Monday, December 24, 2012, Ibrahim Yakti wrote:
>>>
>>>> This is already done, but Hive supports neither update nor deletion of
>>>> data, so when I import the records after a specific "last_update_time",
>>>> Hive will append them, not replace them.
>>>>
>>>> --
>>>> Ibrahim
>>>>
>>>> On Mon, Dec 24, 2012 at 5:03 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>>
>>>> You can use Apache Oozie to schedule your imports.
>>>>
>>>> Alternatively, you can have an additional column in your SQL table, say
>>>> LastUpdatedTime or something. As soon as there is a change in this column
>>>> you can start the import from this point. This way you don't have to import
>>>> everything every time there is a change in your table. You just have to
>>>> move the most recent data, say only the 'delta' amount of data.
>>>>
>>>> Best Regards,
>>>> Tariq
>>>> +91-9741563634
>>>> https://mtariq.jux.com/
>>>>
>>>> On Mon, Dec 24, 2012 at 7:08 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>>>
>>>> My question was how to reflect MySQL updates in Hadoop/Hive; this is
>>>> our problem now.
>>>>
>>>> --
>>>> Ibrahim
>>>>
>>>> On Mon, Dec 24, 2012 at 4:35 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>>
>>>> Cool. Then go ahead :)
>>>>
>>>> Just in case you need something in real time, you can have a look at
>>>> Impala. (I know nobody likes to get preached to, but just in case ;) )
>>>>
>>>> Best Regards,
>>>> Tariq
>>>> +91-9741563634
>>>> https://mtariq.jux.com/
>>>>
>>>> On Mon, Dec 24, 2012 at 7:00 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>>>
>>>> Thanks Mohammad. No, we do not have any plans to replace our RDBMS with
>>>> Hive. Hadoop/Hive will be used for data warehousing and batch processing;
>>>> as I said, we want to use Hive for analytical queries.
>>>>
>>>> --
>>>> Ibrahim
>>>>
>>>> On Mon, Dec 24, 2012 at 4:19 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>>
>>>> Hello Ibrahim,
>>>>
>>>> A quick question: are you planning to replace your SQL DB with Hive? If
>>>> that is the case, I would not suggest doing that. Both are meant for
>>>> entirely different purposes. Hive is for batch processing, not for
>>>> real-time systems. So if your requirements involve real-time things, you
>>>> need to think before moving ahead.
>>>>
>>>> Yes, Sqoop is 'the' tool. It is primarily meant for this purpose.
>>>>
>>>> HTH
>>>>
>>>> Best Regards,
>>>> Tariq
>>>> +91-9741563634
>>>> https://mtariq.jux.com/
>>>>
>>>> On Mon, Dec 24, 2012 at 6:38 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> We are new to Hadoop and Hive. We are trying to use Hive to run
>>>> analytical queries, and we are using Sqoop to import data into Hive. In
>>>> our RDBMS the data is updated very frequently, and this needs to be
>>>> reflected in Hive. Hive does not support update/delete, but there are
>>>> many workarounds to do this task.
>>>>
>>>> What's in our mind is importing all the
>>>
>>> --
>>> ---
>>> Jeremiah Peschka
>>> Founder, Brent Ozar Unlimited
>>> Microsoft SQL Server MVP
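For the partition re-load approach Jeremiah describes above, a minimal sketch, assuming orders is partitioned by day (dt) and a hypothetical orders_staging table holds the freshly re-exported rows for the days that changed:

-- Overwrite only the partitions known to contain modified rows.
INSERT OVERWRITE TABLE orders PARTITION (dt = '2012-12-23')
SELECT order_id, status, updated_at
FROM orders_staging
WHERE dt = '2012-12-23';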