Hive cannot easily handle updates. The most creative way I have seen this done: someone managed to capture all updates and then used union queries that rewrote the same Hive table with the newest values.

original + union delta + column with latest timestamp = new original

But that is a lot of processing, especially when you may not have many updates.
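As a rough illustration only, here is a minimal HiveQL sketch of that union-plus-latest-timestamp rewrite. The tables and columns are hypothetical: orders(order_id, status, updated_at) is the current snapshot, orders_delta holds the captured changes, and the merged result goes into a separate orders_merged table that then takes the place of the original.

-- Keep only the newest row per order_id across the snapshot and the delta.
INSERT OVERWRITE TABLE orders_merged
SELECT t.order_id, t.status, t.updated_at
FROM (
  SELECT order_id, status, updated_at FROM orders
  UNION ALL
  SELECT order_id, status, updated_at FROM orders_delta
) t
JOIN (
  SELECT order_id, MAX(updated_at) AS max_ts
  FROM (
    SELECT order_id, updated_at FROM orders
    UNION ALL
    SELECT order_id, updated_at FROM orders_delta
  ) u
  GROUP BY order_id
) latest
ON (t.order_id = latest.order_id AND t.updated_at = latest.max_ts);
-- Note: if the same order_id/updated_at pair exists in both inputs, an extra
-- de-duplication pass is still needed.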
Hive has storage handlers that let you lay a table over HBase and Cassandra data. Store your data in those systems (they take updates), then use Hive to query them.
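For the storage-handler route, a minimal sketch of laying a Hive table over an existing HBase table; the HBase table name "orders", the column family "d", and the columns are all assumptions for illustration:

-- Updates are written to HBase; Hive queries read the current cell values.
CREATE EXTERNAL TABLE orders_hbase (
  order_id   STRING,
  status     STRING,
  updated_at STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,d:status,d:updated_at")
TBLPROPERTIES ("hbase.table.name" = "orders");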
On Mon, Dec 24, 2012 at 9:29 AM, Ibrahim Yakti <iya...@souq.com> wrote:

> Edward, can you explain more please? Are you suggesting that I should use
> HBase for such tasks instead of Hive?
>
> --
> Ibrahim
>
>
> On Mon, Dec 24, 2012 at 5:28 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
>> You can only do the last_update idea if this is an insert-only dataset.
>>
>> If your table takes updates you need a different strategy:
>> 1) full dumps every interval, or
>> 2) using a storage handler like HBase or Cassandra that takes update
>> operations.
>>
>> On Mon, Dec 24, 2012 at 9:22 AM, Jeremiah Peschka
>> <jeremiah.pesc...@gmail.com> wrote:
>>
>>> If it were me, I would find a way to identify the partitions that have
>>> modified data and then re-load a subset of the partitions (only the ones
>>> with changes) on a regular basis. Instead of updating/deleting data, you'll
>>> be re-loading specific partitions as an all-or-nothing action.
>>>
>>> On Monday, December 24, 2012, Ibrahim Yakti wrote:
>>>
>>>> This is already done, but Hive supports neither update nor deletion of
>>>> data, so when I import the records after a specific "last_update_time",
>>>> Hive will append them, not replace them.
>>>>
>>>> --
>>>> Ibrahim
>>>>
>>>> On Mon, Dec 24, 2012 at 5:03 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>>
>>>> You can use Apache Oozie to schedule your imports.
>>>>
>>>> Alternatively, you can have an additional column in your SQL table, say
>>>> LastUpdatedTime or something. As soon as there is a change in this column
>>>> you can start the import from this point. This way you don't have to import
>>>> everything every time there is a change in your table. You just have to
>>>> move the most recent data, say only the 'delta' amount of data.
>>>>
>>>> Best Regards,
>>>> Tariq
>>>> +91-9741563634
>>>> https://mtariq.jux.com/
>>>>
>>>> On Mon, Dec 24, 2012 at 7:08 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>>>
>>>> My question was how to reflect MySQL updates in Hadoop/Hive; this is
>>>> our problem now.
>>>>
>>>> --
>>>> Ibrahim
>>>>
>>>> On Mon, Dec 24, 2012 at 4:35 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>>
>>>> Cool. Then go ahead :)
>>>>
>>>> Just in case you need something in real time, you can have a look at
>>>> Impala. (I know nobody likes to get preached to, but just in case ;) )
>>>>
>>>> Best Regards,
>>>> Tariq
>>>> +91-9741563634
>>>> https://mtariq.jux.com/
>>>>
>>>> On Mon, Dec 24, 2012 at 7:00 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>>>
>>>> Thanks Mohammad. No, we do not have any plans to replace our RDBMS with
>>>> Hive. Hadoop/Hive will be used for data warehousing and batch processing;
>>>> as I said, we want to use Hive for analytical queries.
>>>>
>>>> --
>>>> Ibrahim
>>>>
>>>> On Mon, Dec 24, 2012 at 4:19 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>>
>>>> Hello Ibrahim,
>>>>
>>>> A quick question: are you planning to replace your SQL DB with Hive? If
>>>> that is the case, I would not suggest doing that. Both are meant for
>>>> entirely different purposes. Hive is for batch processing, not for
>>>> real-time systems. So if your requirements involve real-time things, you
>>>> need to think before moving ahead.
>>>>
>>>> Yes, Sqoop is 'the' tool. It is primarily meant for this purpose.
>>>>
>>>> HTH
>>>>
>>>> Best Regards,
>>>> Tariq
>>>> +91-9741563634
>>>> https://mtariq.jux.com/
>>>>
>>>> On Mon, Dec 24, 2012 at 6:38 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> We are new to Hadoop and Hive. We are trying to use Hive to run
>>>> analytical queries, and we are using Sqoop to import data into Hive. In
>>>> our RDBMS the data is updated very frequently, and this needs to be
>>>> reflected in Hive. Hive does not support update/delete, but there are
>>>> many workarounds to do this task.
>>>>
>>>> What's in our mind is importing all the
>>>
>>> --
>>> ---
>>> Jeremiah Peschka
>>> Founder, Brent Ozar Unlimited
>>> Microsoft SQL Server MVP
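For the partition re-load approach Jeremiah describes above, a minimal sketch, assuming orders is partitioned by day (dt) and a hypothetical orders_staging table holds the freshly re-exported rows for the days that changed:

-- Overwrite only the partitions known to contain modified rows.
INSERT OVERWRITE TABLE orders PARTITION (dt = '2012-12-23')
SELECT order_id, status, updated_at
FROM orders_staging
WHERE dt = '2012-12-23';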