Thanks Mohammad, I will be waiting ... meanwhile, it seems I will get into
HBase and give it a try ... unless someone advises something
better/easier.
--
Ibrahim
On Wed, Dec 26, 2012 at 5:52 PM, Mohammad Tariq wrote:
Hello Ibrahim,
Sorry for the late response. Those replies were for Kshiva. I saw his
question (exactly the same as this one) multiple times on the Pig mailing
list as well, so I just thought of giving him some pointers on how to use
the list. I should have specified it properly. Apologies for the confusion.
After more reading, a suggested scenario looks like:
MySQL ---(Extract / Load)---> HDFS ---> Load into HBase --> Read as
external in Hive ---(Transform Data & Join Tables)--> Use hive for Joins &
Queries ---> Update HBase as needed & Reload in Hive.
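For the "Read as external in Hive" step, a minimal sketch using Hive's HBase
storage handler could look like this (table, column family, and column names
are my assumptions, not anything agreed in this thread):

-- Hypothetical orders table stored in HBase, exposed to Hive for queries.
CREATE EXTERNAL TABLE orders_hbase (
  order_id    STRING,
  status      STRING,
  last_update STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,d:status,d:last_update")
TBLPROPERTIES ("hbase.table.name" = "orders");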
What do you think please?
--
Ibrahim
Mohammad, I am not sure if the answers & the link were to me or to Kshiva's
question.
If I have partitioned my data based on status, for example, when I run the
update query it will add the updated data in a new partition (success or
shipped, for example) and it will keep the old data (confirmed or pending,
for example) in the old partition.
Also, have a look at this:
http://www.catb.org/~esr/faqs/smart-questions.html
Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/
On Tue, Dec 25, 2012 at 11:26 AM, Mohammad Tariq wrote:
Have a look at Beeswax.
BTW, do you have access to Google at your station? Same question on the Pig
mailing list as well, that too twice.
Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/
On Tue, Dec 25, 2012 at 11:20 AM, Kshiva Kps wrote:
Hi,
Are there any Hive editors where we can write 100 to 150 Hive scripts? I
believe it is not easy to do all the scripts in CLI mode.
Something like an IDE for Java or TOAD for SQL. Please advise, many thanks.
Thanks
On Mon, Dec 24, 2012 at 8:21 PM, Dean Wampler <dean.wamp...@thinkbiganalytics.com> wrote:
My problem is in eliminating the duplicates and keeping only the correct
data. Any advice, please?
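One way to express "keep only the correct data" in HiveQL, assuming the
newest row per key wins (orders_raw, orders_clean, order_id, and
last_update_time are hypothetical names):

-- Keep only the newest row per order_id; if two rows share the same
-- timestamp, duplicates survive, so the timestamp must be fine-grained.
INSERT OVERWRITE TABLE orders_clean
SELECT o.*
FROM orders_raw o
JOIN (
  SELECT order_id, MAX(last_update_time) AS max_ts
  FROM orders_raw
  GROUP BY order_id
) latest
ON o.order_id = latest.order_id
AND o.last_update_time = latest.max_ts;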
On Dec 24, 2012 9:13 PM, "Dean Wampler" wrote:
Looks good, but a few suggestions. If you can eliminate duplicates, etc. as
you ingest the data into HDFS, that would eliminate a cleansing step. Note
that if the target directory in HDFS IS the specified location for an
external Hive table/partition, then there will be no separate step to "load
into Hive".
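For example, such an external table might be declared like this (the path
and schema are assumptions):

-- Hive sees any file dropped into this directory immediately; there is
-- no separate "load into Hive" step.
CREATE EXTERNAL TABLE orders (
  order_id         STRING,
  status           STRING,
  last_update_time STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/mysql_export/orders';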
Thanks Dean for the great reply. Setting up the incremental import should be
easy, but if I partitioned my data, how will Hive get me the updated rows
only, considering that a row may have multiple fields that will be updated
over time? And how will I manage the tables that are based on multiple
sources? And do y
This is not as hard as it sounds. The hardest part is setting up the
incremental query against your MySQL database. Then you can write the
results to new files in the HDFS directory for the table and Hive will see
them immediately. Yes, even though Hive doesn't support updates, it doesn't
care how the files get into a table's directory.
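The incremental extract itself is plain SQL against MySQL, something like
this (the column name and the high-water mark value are assumptions):

-- Pull only rows changed since the last successful run; the result is
-- then written as new files under the table's HDFS directory.
SELECT *
FROM orders
WHERE last_update_time > '2012-12-23 00:00:00';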
Bottom line: use Sqoop to import data into HBase/Cassandra for storage and
use Hive to query the data using external tables. Did I miss anything?
--
Ibrahim
On Mon, Dec 24, 2012 at 5:37 PM, Edward Capriolo wrote:
Hive cannot easily handle updates. The most creative way I saw this done
was someone managed to capture all updates and then use union queries which
rewrote the same Hive table with the newest value.
original + union delta + column with latest timestamp = new original
But that is a lot of processing.
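A sketch of that union rewrite in HiveQL, with all names assumed (orders is
the current table, orders_delta holds the captured updates):

-- Combine original and delta, keep the newest version of each row, and
-- overwrite the table with the result. Hive stages the output before
-- replacing the data, so reading and overwriting the same table works.
INSERT OVERWRITE TABLE orders
SELECT u.*
FROM (
  SELECT * FROM orders
  UNION ALL
  SELECT * FROM orders_delta
) u
JOIN (
  SELECT order_id, MAX(last_update_time) AS max_ts
  FROM (
    SELECT * FROM orders
    UNION ALL
    SELECT * FROM orders_delta
  ) a
  GROUP BY order_id
) latest
ON u.order_id = latest.order_id
AND u.last_update_time = latest.max_ts;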
Good points by Edward. I especially love point no. 2.
Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/
On Mon, Dec 24, 2012 at 7:58 PM, Edward Capriolo wrote:
Edward, can you explain more please? Are you suggesting that I should use
HBase for such tasks instead of Hive?
--
Ibrahim
On Mon, Dec 24, 2012 at 5:28 PM, Edward Capriolo wrote:
What if you have many columns that need to be updated? A simple example:
confirmation date, payment status(es) + status update time, delivery, etc.
On what basis will you set your partition, and how will the old data be
removed, given that the updated data will be reloaded into another partition
if I partition by status?
You can only do the last_update idea if this is an insert-only dataset.
If your table takes updates you need a different strategy.
1) Full dumps every interval (sketched below).
2) Using a storage handler like HBase or Cassandra that takes update
operations.
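For option 1, one possible layout is to land each full dump in its own
partition and pin queries to the newest snapshot (all names and dates here
are hypothetical):

-- Each interval's full dump becomes a partition; readers query the
-- latest snapshot instead of updating rows in place.
CREATE TABLE orders_snapshot (
  order_id STRING,
  status   STRING
)
PARTITIONED BY (dump_date STRING);

SELECT COUNT(*)
FROM orders_snapshot
WHERE dump_date = '2012-12-24'
AND status = 'shipped';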
On Mon, Dec 24, 2012 at 9:22 AM, Jeremiah Peschka wrote:
I was actually trying to answer your actual questions. What are you
currently doing to tackle this update problem, and what kind of tweak are
you looking for? There is no direct solution to achieve this
out-of-the-box, as you have said.
Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/
If it were me, I would find a way to identify the partitions that have
modified data and then re-load a subset of the partitions (only the ones
with changes) on a regular basis. Instead of updating/deleting data, you'll
be re-loading specific partitions as an all-or-nothing action.
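In HiveQL that re-load could look like this (the partition column and
staging table are assumptions):

-- Rebuild only the changed partition; untouched partitions stay as-is.
INSERT OVERWRITE TABLE orders PARTITION (status = 'shipped')
SELECT order_id, customer_id, last_update_time
FROM orders_staging
WHERE status = 'shipped';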
This is already done, but Hive does not support updates or deletion of data,
so when I import the records after a specific "last_update_time", Hive will
append them, not replace them.
--
Ibrahim
On Mon, Dec 24, 2012 at 5:03 PM, Mohammad Tariq wrote:
You can use Apache Oozie to schedule your imports.
Alternatively, you can have an additional column in your SQL table, say
LastUpdatedTime or something. As soon as there is a change in this column
you can start the import from that point. This way you don't have to import
everything every time.
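On the MySQL side that tracking column could be set up like this (a sketch;
orders is a hypothetical table name):

-- MySQL keeps this column current automatically on every UPDATE, giving
-- each import run a reliable high-water mark to filter on.
ALTER TABLE orders
  ADD COLUMN LastUpdatedTime TIMESTAMP NOT NULL
  DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;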
My question was how to reflect MySQL updates to Hadoop/Hive; this is our
problem now.
--
Ibrahim
On Mon, Dec 24, 2012 at 4:35 PM, Mohammad Tariq wrote:
Cool. Then go ahead :)
Just in case you need something in real time, you can have a look at
Impala. (I know nobody likes to get preached to, but just in case ;) ).
Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/
On Mon, Dec 24, 2012 at 7:00 PM, Ibrahim Yakti wrote:
Thanks Mohammad. No, we do not have any plans to replace our RDBMS with
Hive. Hadoop/Hive will be used for data warehousing & batch processing; as
I said, we want to use Hive for analytical queries.
--
Ibrahim
On Mon, Dec 24, 2012 at 4:19 PM, Mohammad Tariq wrote:
Hello Ibrahim,
A quick question: are you planning to replace your SQL DB with Hive? If
that is the case, I would not suggest doing that. Both are meant for
entirely different purposes. Hive is for batch processing and not for
real-time systems, so if your requirements involve real-time things, Hive
is not the right tool.