Re: on duplicate update equivalent?

2016-09-24 Thread Damien Carol
Another solution is to use Hive over HBase. When you insert into this table, Hive does an upsert.

2016-09-23 21:00 GMT+02:00 Mich Talebzadeh :
> The fundamental question is: do you need these recurring updates to
> dimension tables throttling your Hive tables.
> >

Re: on duplicate update equivalent?

2016-09-23 Thread Mich Talebzadeh
The fundamental question is: do you need these recurring updates to dimension tables throttling your Hive tables? Besides, why bother with ETL when one can do ELT? For a dimension table, just add two additional columns, namely op_type int and op_time timestamp, with op_type = 1/2/3
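One possible reading of this suggestion is that changes are appended to the dimension table, each tagged with op_type and op_time, and the current state is derived by taking the most recent row per key. The Python sketch below illustrates only that "latest row per key" step. The names `ChangeRow`, `latest_per_key`, `key` and `value` are hypothetical, and the op_type codes are carried through without being interpreted, because the truncated message does not say what 1/2/3 stand for.

```python
from datetime import datetime
from typing import Dict, List, NamedTuple


class ChangeRow(NamedTuple):
    key: int            # hypothetical primary key of the dimension
    value: str          # hypothetical payload column
    op_type: int        # 1/2/3 as in the message; meaning not interpreted here
    op_time: datetime   # when the change was recorded


def latest_per_key(rows: List[ChangeRow]) -> Dict[int, ChangeRow]:
    """Return the most recent row for each key, ordered by op_time."""
    latest: Dict[int, ChangeRow] = {}
    for row in rows:
        current = latest.get(row.key)
        # Keep the row only if it is newer than what we have for this key.
        if current is None or row.op_time > current.op_time:
            latest[row.key] = row
    return latest


# Append-only change log: key 1 is written twice, key 2 once.
log = [
    ChangeRow(1, "old", 1, datetime(2016, 9, 23, 10, 0)),
    ChangeRow(2, "only", 1, datetime(2016, 9, 23, 10, 5)),
    ChangeRow(1, "new", 2, datetime(2016, 9, 23, 11, 0)),
]
current = latest_per_key(log)
```

In this sketch nothing is updated in place, so the stored table only ever receives inserts; the cost moves to read time, where the latest version of each key has to be selected.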

Re: on duplicate update equivalent?

2016-09-23 Thread Gopal Vijayaraghavan
> Dimensions change, and I'd rather do update than recreate a snapshot.

Slowly changing dimensions are the common use case for Hive's ACID MERGE. The feature you need is most likely covered by https://issues.apache.org/jira/browse/HIVE-10924

2nd comment from that JIRA: "Once an hour, a set of

Re: on duplicate update equivalent?

2016-09-23 Thread Mich Talebzadeh
Hi Vijay,

If the dimension tables are of reasonable size and frequently updated, then you can deploy *Spark SQL* to get data directly from your MySQL table through JDBC and do your join with your fact table stored in Hive. In general these days one can do better with Spark SQL. Your fact table still

RE: on duplicate update equivalent?

2016-09-23 Thread Vijay Ramachandran
handran [mailto:vi...@linkedin.com] > *Sent:* Friday, September 23, 2016 1:46 PM > *To:* user@hive.apache.org > *Subject:* Re: on duplicate update equivalent? > > > > > > On Fri, Sep 23, 2016 at 3:47 PM, Mich Talebzadeh < > mich.talebza...@gmail.com> wrote

RE: on duplicate update equivalent?

2016-09-23 Thread Markovitz, Dudu
If these are dimension tables, what do you need to update there?

Dudu

From: Vijay Ramachandran [mailto:vi...@linkedin.com]
Sent: Friday, September 23, 2016 1:46 PM
To: user@hive.apache.org
Subject: Re: on duplicate update equivalent?

On Fri, Sep 23, 2016 at 3:47 PM, Mich Talebzadeh

Re: on duplicate update equivalent?

2016-09-23 Thread Vijay Ramachandran
On Fri, Sep 23, 2016 at 3:47 PM, Mich Talebzadeh wrote:
> What is the use case for UPSERT in Hive. The functionality does not exist
> but there are other solutions.
>
> Are we talking about a set of dimension tables with primary keys that need
> to be updated (existing

RE: on duplicate update equivalent?

2016-09-23 Thread Markovitz, Dudu
) from src as s full join trg as t on t.i = s.i;
alter view trg as select * from trg1;
drop table if exists trg2;
etc…

From: Markovitz, Dudu [mailto:dmarkov...@paypal.com]
Sent: Friday, September 23, 2016 1:02 PM
To: user@hive.apache.org
Subject: RE: on duplicate update equivalent?

We’re
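The full-join fragment above appears to rebuild the target from a full join of src and trg on the key column i, and then repoint a view at the rebuilt table. The Python sketch below shows the row-level logic such a join could implement. It rests on two assumptions that the truncated select list does not confirm: each table has one payload value per key, and the src row wins when a key exists on both sides. The helper name `full_join_upsert` is hypothetical.

```python
from typing import Dict, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def full_join_upsert(src: Dict[K, V], trg: Dict[K, V]) -> Dict[K, V]:
    """Emulate `src FULL JOIN trg ON t.i = s.i`, preferring src on a match.

    Hypothetical helper: dict keys stand in for the join column `i`.
    """
    merged: Dict[K, V] = {}
    # A full join yields every key present on either side.
    for key in trg.keys() | src.keys():
        if key in src:
            # Key exists in src: a new row (insert) or a replacement (update).
            merged[key] = src[key]
        else:
            # Key only in trg: the existing row is carried over unchanged.
            merged[key] = trg[key]
    return merged


trg = {1: "a", 2: "b"}   # existing target rows
src = {2: "B", 3: "c"}   # incoming rows: one update, one insert
new_trg = full_join_upsert(src, trg)
```

The function returns a new mapping and leaves `trg` untouched, which mirrors the pattern of writing a fresh table and only afterwards switching the view to it.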

Re: on duplicate update equivalent?

2016-09-23 Thread Mich Talebzadeh
Hi Vijay,

What is the use case for UPSERT in Hive? The functionality does not exist but there are other solutions.

Are we talking about a set of dimension tables with primary keys that need to be updated (existing rows) or inserted (new rows)?

HTH

Dr Mich Talebzadeh

RE: on duplicate update equivalent?

2016-09-23 Thread Markovitz, Dudu
We’re not there yet…
https://issues.apache.org/jira/browse/HIVE-10924

Dudu

From: Vijay Ramachandran [mailto:vi...@linkedin.com]
Sent: Friday, September 23, 2016 11:47 AM
To: user@hive.apache.org
Subject: on duplicate update equivalent?

Hello. Is there a way to write a query with a behaviour

on duplicate update equivalent?

2016-09-23 Thread Vijay Ramachandran
Hello. Is there a way to write a query with behaviour equivalent to MySQL's "on duplicate update"? I.e., try to insert, and if the key exists, update the row instead? Thanks,