What if you have many columns that need to be updated? A simple example: confirmation date, payment status(es) plus the status update time, delivery, etc. On what basis would you set your partitions, and how would the old data be removed? If I partition by payment status, for example, the updated rows will be loaded into a different partition while the old rows stay where they were.
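To make the concern concrete, here is a small Python sketch. It is only a toy model: the dict of lists standing in for Hive partitions, the `append_import` helper, and the order fields are all made up for illustration and are not our real schema or any Hive/Sqoop API.

```python
from collections import defaultdict

# Toy model (assumption): one list of rows per partition, keyed by payment_status.
partitions = defaultdict(list)

def append_import(rows):
    """Mimic an append-only import: each row lands in the partition
    that matches its current payment_status; nothing is ever replaced."""
    for row in rows:
        partitions[row["payment_status"]].append(row)

# First import: order 1 is still pending.
append_import([
    {"order_id": 1, "payment_status": "pending",
     "last_update_time": "2012-12-24 10:00"},
])

# Later the order is paid in MySQL, and the delta import picks up the changed row.
append_import([
    {"order_id": 1, "payment_status": "paid",
     "last_update_time": "2012-12-24 12:00"},
])

# The same order now sits in two partitions; the "pending" copy is stale.
copies = {status: [r["order_id"] for r in rows]
          for status, rows in partitions.items()}
print(copies)  # {'pending': [1], 'paid': [1]}
```

The stale "pending" row is the one I do not know how to get rid of when the partition key itself is one of the columns that changes.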
--
Ibrahim

On Mon, Dec 24, 2012 at 5:25 PM, Mohammad Tariq <donta...@gmail.com> wrote:

> I was actually trying to answer your actual questions. What are you
> currently doing to tackle this update problem, and what kind of tweak are
> you looking for? There is no direct, out-of-the-box solution to achieve
> this, as you have said.
>
> Best Regards,
> Tariq
> +91-9741563634
> https://mtariq.jux.com/
>
>
> On Mon, Dec 24, 2012 at 7:38 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>
>> This is already done, but Hive does not support updating or deleting
>> data, so when I import the records after a specific "last_update_time",
>> Hive will append them, not replace the existing ones.
>>
>> --
>> Ibrahim
>>
>>
>> On Mon, Dec 24, 2012 at 5:03 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>
>>> You can use Apache Oozie to schedule your imports.
>>>
>>> Alternatively, you can have an additional column in your SQL table, say
>>> LastUpdatedTime or something. As soon as there is a change in this
>>> column you can start the import from that point. This way you don't
>>> have to import everything every time there is a change in your table.
>>> You only have to move the most recent data, that is, only the 'delta'.
>>>
>>> Best Regards,
>>> Tariq
>>> +91-9741563634
>>> https://mtariq.jux.com/
>>>
>>>
>>> On Mon, Dec 24, 2012 at 7:08 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>>
>>>> My question was how to reflect MySQL updates in Hadoop/Hive; this is
>>>> our problem now.
>>>>
>>>> --
>>>> Ibrahim
>>>>
>>>>
>>>> On Mon, Dec 24, 2012 at 4:35 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>>
>>>>> Cool. Then go ahead :)
>>>>>
>>>>> Just in case you need something in real time, you can have a look at
>>>>> Impala. (I know nobody likes to get preached at, but just in case ;) ).
>>>>>
>>>>> Best Regards,
>>>>> Tariq
>>>>> +91-9741563634
>>>>> https://mtariq.jux.com/
>>>>>
>>>>>
>>>>> On Mon, Dec 24, 2012 at 7:00 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>>>>
>>>>>> Thanks Mohammad. No, we do not have any plans to replace our RDBMS
>>>>>> with Hive. Hadoop/Hive will be used as a data warehouse and for
>>>>>> batch processing; as I said, we want to use Hive for analytical
>>>>>> queries.
>>>>>>
>>>>>> --
>>>>>> Ibrahim
>>>>>>
>>>>>>
>>>>>> On Mon, Dec 24, 2012 at 4:19 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>>>>
>>>>>>> Hello Ibrahim,
>>>>>>>
>>>>>>> A quick question. Are you planning to replace your SQL DB with
>>>>>>> Hive? If that is the case, I would not suggest doing that. Both
>>>>>>> are meant for entirely different purposes. Hive is for batch
>>>>>>> processing and not for real-time systems. So if your requirements
>>>>>>> involve real-time things, you need to think before moving ahead.
>>>>>>>
>>>>>>> Yes, Sqoop is 'the' tool. It is primarily meant for this purpose.
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Tariq
>>>>>>> +91-9741563634
>>>>>>> https://mtariq.jux.com/
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Dec 24, 2012 at 6:38 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> We are new to Hadoop and Hive. We are trying to use Hive to run
>>>>>>>> analytical queries, and we are using Sqoop to import data into
>>>>>>>> Hive. In our RDBMS the data is updated very frequently, and this
>>>>>>>> needs to be reflected in Hive. Hive does not support
>>>>>>>> update/delete, but there are many workarounds for this task.
>>>>>>>>
>>>>>>>> What we have in mind is importing all the tables into Hive as is,
>>>>>>>> and then building the required tables for reporting.
>>>>>>>>
>>>>>>>> My questions are:
>>>>>>>>
>>>>>>>> 1. What is the best way to reflect MySQL updates in Hive with
>>>>>>>>    minimal resources?
>>>>>>>> 2. Is Sqoop the right tool to do the ETL?
>>>>>>>> 3. Is Hive the right tool for this kind of query, or should we
>>>>>>>>    search for alternatives?
>>>>>>>>
>>>>>>>> Any hint will be useful, thanks in advance.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ibrahim