Another solution is to use Hive over HBase.
When you insert into such a table, Hive does an upsert.
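A minimal sketch of that idea, assuming the Hive HBase storage handler is installed. The names hbase_dim, dim, cf, src and the columns id and name are hypothetical and only illustrate the shape:

```sql
-- Hypothetical names throughout: hbase_dim, dim, cf, src, id and name
-- are illustrative and do not come from the thread.
CREATE TABLE hbase_dim (id INT, name STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:name')
TBLPROPERTIES ('hbase.table.name' = 'dim');

-- Each row is written as an HBase put keyed on id, so a row whose id
-- already exists is overwritten and a new id is simply added.
INSERT INTO TABLE hbase_dim
SELECT id, name FROM src;
```

The upsert behaviour comes from HBase's row-key semantics rather than from any Hive statement, so it only applies to the column mapped to :key.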
2016-09-23 21:00 GMT+02:00 Mich Talebzadeh :
> The fundamental question is: do you need these recurring updates to
> dimension tables throttling your Hive tables?
The fundamental question is: do you need these recurring updates to
dimension tables throttling your Hive tables?
Besides, why bother with ETL when one can do ELT?
For a dimension table, just add two additional columns, namely
, op_type int
, op_time timestamp
op_type = 1/2/3
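A sketch of how the two extra columns might be used. The table dim_customer, its id and name columns, and the meaning of the op_type codes (1 = insert, 2 = update, 3 = delete) are assumptions; the thread only gives the column names and the values 1/2/3:

```sql
-- Hypothetical dimension table; only op_type and op_time come from the thread.
CREATE TABLE IF NOT EXISTS dim_customer (
    id      INT,
    name    STRING,
    op_type INT,        -- assumed coding: 1 = insert, 2 = update, 3 = delete
    op_time TIMESTAMP   -- when the change was captured
);

-- Current state: the latest version of each key, skipping keys whose
-- latest change is (under the assumed coding) a delete.
CREATE VIEW IF NOT EXISTS dim_customer_current AS
SELECT id, name
FROM (
    SELECT id, name, op_type,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY op_time DESC) AS rn
    FROM dim_customer
) latest
WHERE rn = 1
  AND op_type <> 3;
```

With this layout every change is a plain append, and readers who want the current dimension query the view instead of the base table.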
> Dimensions change, and I'd rather do update than recreate a snapshot.
Slowly changing dimensions are the common use case for Hive's ACID MERGE.
The feature you need is most likely covered by
https://issues.apache.org/jira/browse/HIVE-10924
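For illustration, a sketch of such a statement, assuming a Hive release that includes the MERGE work tracked in that JIRA and a transactional (ACID) target table. The names trg, src and the key i are borrowed from elsewhere in this thread; the column c and the bucket count are hypothetical:

```sql
-- Target must be a transactional table; c is a hypothetical payload column.
CREATE TABLE IF NOT EXISTS trg (i INT, c STRING)
CLUSTERED BY (i) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- Update rows whose key already exists, insert the rest.
MERGE INTO trg AS t
USING src AS s
ON t.i = s.i
WHEN MATCHED THEN UPDATE SET c = s.c
WHEN NOT MATCHED THEN INSERT VALUES (s.i, s.c);
```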
2nd comment from that JIRA
"Once an hour, a set of
Hi Vijay,
If the dimension tables are of reasonable size and frequently updated, then you
can deploy *Spark SQL* to get data directly from your MySQL table through
JDBC and do your join with your fact table stored in Hive.
In general these days one can do better with Spark SQL. Your fact table
still
If these are dimension tables, what do you need to update there?
Dudu
From: Vijay Ramachandran [mailto:vi...@linkedin.com]
Sent: Friday, September 23, 2016 1:46 PM
To: user@hive.apache.org
Subject: Re: on duplicate update equivalent?
On Fri, Sep 23, 2016 at 3:47 PM, Mich Talebzadeh
wrote:
> What is the use case for UPSERT in Hive? The functionality does not exist
> but there are other solutions.
>
> Are we talking about a set of dimension tables with primary keys that need
> to be updated (existing
) from src as s full join trg as t on t.i = s.i;
alter view trg as select * from trg1;
drop table if exists trg2;
etc…
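One way a full cycle of this table-swap approach might look, as a sketch only. It assumes trg is a view currently defined over trg2, and that both tables carry the key i plus a hypothetical payload column c; the select list of the original statement is not shown in the thread, so the one below is an assumption:

```sql
-- Build the next version of the target from the new rows (src) and the
-- current contents of the view (trg, which points at trg2 at this point).
DROP TABLE IF EXISTS trg1;
CREATE TABLE trg1 AS
SELECT COALESCE(s.i, t.i) AS i,
       -- prefer the incoming row when the key is present in src
       CASE WHEN s.i IS NOT NULL THEN s.c ELSE t.c END AS c
FROM src AS s
FULL JOIN trg AS t ON t.i = s.i;

-- Repoint the view, then discard the previous version.
ALTER VIEW trg AS SELECT * FROM trg1;
DROP TABLE IF EXISTS trg2;
```

The next cycle would do the same with the roles of trg1 and trg2 swapped, so readers of the view always see a complete table.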
From: Markovitz, Dudu [mailto:dmarkov...@paypal.com]
Sent: Friday, September 23, 2016 1:02 PM
To: user@hive.apache.org
Subject: RE: on duplicate update equivalent?
We’re
Hi Vijay,
What is the use case for UPSERT in Hive? The functionality does not exist
but there are other solutions.
Are we talking about a set of dimension tables with primary keys that need
to be updated (existing rows) or inserted (new rows)?
HTH
Dr Mich Talebzadeh
We’re not there yet…
https://issues.apache.org/jira/browse/HIVE-10924
Dudu
From: Vijay Ramachandran [mailto:vi...@linkedin.com]
Sent: Friday, September 23, 2016 11:47 AM
To: user@hive.apache.org
Subject: on duplicate update equivalent?
Hello.
Is there a way to write a query with a behaviour equivalent to MySQL's "on
duplicate key update"? i.e., try to insert, and if the key exists, update the
row instead?
thanks,