Bottom line: use Sqoop to import the data into HBase/Cassandra for storage, and use Hive to query it through external tables. Did I miss anything?
--
Ibrahim


On Mon, Dec 24, 2012 at 5:37 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> Hive cannot easily handle updates. The most creative way I saw this done
> was someone managed to capture all updates and then use union queries which
> rewrote the same hive table with the newest value.
>
> original + union delta + column with latest timestamp = new original
>
> But that is a lot of processing, especially when you may not have many
> updates. Hive has storage handlers that let you lay a table over hbase and
> cassandra data. Store your data in those systems, they take updates, then
> use hive to query those.
>
>
> On Mon, Dec 24, 2012 at 9:29 AM, Ibrahim Yakti <iya...@souq.com> wrote:
>
>> Edward, can you explain more please? Are you suggesting that I should use
>> HBase for such tasks instead of hive?
>>
>>
>> --
>> Ibrahim
>>
>>
>> On Mon, Dec 24, 2012 at 5:28 PM, Edward Capriolo
>> <edlinuxg...@gmail.com> wrote:
>>
>>> You can only do the last_update idea if this is an insert-only dataset.
>>>
>>> If your table takes updates you need a different strategy.
>>> 1) full dumps every interval.
>>> 2) Using a storage handler like hbase or cassandra that takes update
>>> operations
>>>
>>>
>>>
>>> On Mon, Dec 24, 2012 at 9:22 AM, Jeremiah Peschka <
>>> jeremiah.pesc...@gmail.com> wrote:
>>>
>>>> If it were me, I would find a way to identify the partitions that have
>>>> modified data and then re-load a subset of the partitions (only the ones
>>>> with changes) on a regular basis. Instead of updating/deleting data, you'll
>>>> be re-loading specific partitions as an all-or-nothing action.
>>>>
>>>> On Monday, December 24, 2012, Ibrahim Yakti wrote:
>>>>
>>>>> This is already done, but Hive does not support update or deletion of
>>>>> data, so when I import the records after a specific "last_update_time",
>>>>> hive will append them, not replace.
>>>>>
>>>>>
>>>>> --
>>>>> Ibrahim
>>>>>
>>>>>
>>>>> On Mon, Dec 24, 2012 at 5:03 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>>>
>>>>> You can use Apache Oozie to schedule your imports.
>>>>>
>>>>> Alternatively, you can have an additional column in your SQL table,
>>>>> say LastUpdatedTime or something. As soon as there is a change in this
>>>>> column you can start the import from this point. This way you don't have
>>>>> to import all the things every time there is a change in your table. You
>>>>> just have to move only the most recent data, say only the 'delta' amount
>>>>> of data.
>>>>>
>>>>> Best Regards,
>>>>> Tariq
>>>>> +91-9741563634
>>>>> https://mtariq.jux.com/
>>>>>
>>>>>
>>>>> On Mon, Dec 24, 2012 at 7:08 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>>>>
>>>>> My question was how to reflect MySQL updates to hadoop/hive, this is
>>>>> our problem now.
>>>>>
>>>>>
>>>>> --
>>>>> Ibrahim
>>>>>
>>>>>
>>>>> On Mon, Dec 24, 2012 at 4:35 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>>>
>>>>> Cool. Then go ahead :)
>>>>>
>>>>> Just in case you need something in realtime, you can have a look at
>>>>> Impala. (I know nobody likes to get preached, but just in case ;) ).
>>>>>
>>>>> Best Regards,
>>>>> Tariq
>>>>> +91-9741563634
>>>>> https://mtariq.jux.com/
>>>>>
>>>>>
>>>>> On Mon, Dec 24, 2012 at 7:00 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>>>>
>>>>> Thanks Mohammad. No, we do not have any plans to replace our RDBMS
>>>>> with Hive. Hadoop/Hive will be used as a Data Warehouse & for batch
>>>>> processing; as I said, we want to use Hive for analytical queries.
>>>>>
>>>>>
>>>>> --
>>>>> Ibrahim
>>>>>
>>>>>
>>>>> On Mon, Dec 24, 2012 at 4:19 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>>>
>>>>> Hello Ibrahim,
>>>>>
>>>>> A quick question. Are you planning to replace your SQL DB with
>>>>> Hive? If that is the case, I would not suggest doing that. Both are meant
>>>>> for entirely different purposes. Hive is for batch processing and not for
>>>>> real-time systems. So if your requirements involve real-time things, you
>>>>> need to think before moving ahead.
>>>>>
>>>>> Yes, Sqoop is 'the' tool. It is primarily meant for this purpose.
>>>>>
>>>>> HTH
>>>>>
>>>>> Best Regards,
>>>>> Tariq
>>>>> +91-9741563634
>>>>> https://mtariq.jux.com/
>>>>>
>>>>>
>>>>> On Mon, Dec 24, 2012 at 6:38 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>>>>
>>>>> Hi All,
>>>>>
>>>>> We are new to hadoop and hive. We are trying to use hive to
>>>>> run analytical queries and we are using sqoop to import data into hive.
>>>>> In our RDBMS the data is updated very frequently and this needs to be
>>>>> reflected in hive. Hive does not support update/delete but there are
>>>>> many workarounds to do this task.
>>>>>
>>>>> What's in our mind is importing all the
>>>>>
>>>>>
>>>>
>>>> --
>>>> ---
>>>> Jeremiah Peschka
>>>> Founder, Brent Ozar Unlimited
>>>> Microsoft SQL Server MVP
>>>>
>>>>
>>>
>>
>
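
For readers of the archive, the "original + union delta + column with latest timestamp = new original" idea quoted above can be sketched roughly as follows. This is only an illustration of the logic in plain Python, not how Hive executes it and not anything posted in the thread; the function name merge_latest, the column names id and last_update, and the sample rows are all assumptions made up for the example.

```python
from datetime import datetime


def merge_latest(original, delta, key="id", ts="last_update"):
    """Union two row sets and keep, per key, the row with the newest timestamp.

    `key` and `ts` are hypothetical column names chosen for this sketch.
    """
    latest = {}
    # The "union" step: walk the original rows first, then the delta rows.
    for row in list(original) + list(delta):
        current = latest.get(row[key])
        # Keep the row if the key is new or this version is at least as recent.
        if current is None or row[ts] >= current[ts]:
            latest[row[key]] = row
    # Sort by key only so the result is deterministic to inspect.
    return sorted(latest.values(), key=lambda r: r[key])


# Made-up sample data: the table as last imported, plus newly captured changes.
original = [
    {"id": 1, "status": "new", "last_update": datetime(2012, 12, 23, 10, 0)},
    {"id": 2, "status": "new", "last_update": datetime(2012, 12, 23, 11, 0)},
]
delta = [
    {"id": 2, "status": "shipped", "last_update": datetime(2012, 12, 24, 9, 0)},
    {"id": 3, "status": "new", "last_update": datetime(2012, 12, 24, 9, 30)},
]

new_original = merge_latest(original, delta)
for row in new_original:
    print(row["id"], row["status"], row["last_update"])
```

Note that this sketch only covers inserts and updates. Rows deleted in the source database never appear in the delta, so they would survive the merge unless deletions are captured separately, which is one reason the full-dump and storage-handler options come up in the thread.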