Re: Hive metadata on Hbase

2016-10-26 Thread Furcy Pin
Hi Mich,

No, I am not using HBase as a metastore now, but I am eager for it to
become production ready and released in CDH and HDP.

Concerning locks, I think HBase would do fine because it is ACID at the row
level. It only appends data on HDFS, but it works by keeping the active part
of each region in memory, plus a write-ahead log for failure recovery.
So updates on a row are atomic and ACID.
This gives ACID guarantees between elements that are stored on the same row.
Since HBase supports a great number of dynamic columns in each row
(a wide-column store, like Cassandra), the smart way to design your tables is
quite different from an RDBMS.
I would expect them to have something like an HBase table with one row per
Hive table, holding all the associated metadata. This would make all
modifications to a table atomic.

As for locks that involve multiple tables, I guess they would have to
manually take a global lock on the "hbase lock table" before editing it.

I agree that you should not touch the system tables too much, but sometimes
you have to remove a deadlock or fix an inconsistency yourself. I guess
removing deadlocks in HBase should not be much harder, using the
hbase shell (new syntax to learn, however).

It would be nice if Hive had some syntax to manually remove deadlocks when
they happen; then you would not have to worry about the metastore
implementation at all.
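For the record, here is roughly how such a "lock table" could be implemented
on HBase, and what removing a stuck lock by hand would amount to. This is only
a sketch with invented table/column names under my own assumptions; the actual
HBase metastore work goes through a transaction manager and is more involved.

```java
// Hypothetical global-lock sketch on a dedicated "GLOBAL_LOCKS" HBase table.
// Names are invented; this is not how the Hive-on-HBase metastore locks things.
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class GlobalLockSketch {
  private static final byte[] ROW = Bytes.toBytes("GLOBAL");
  private static final byte[] CF  = Bytes.toBytes("l");
  private static final byte[] Q   = Bytes.toBytes("owner");

  /** Try to take the lock; succeeds only if no owner cell exists yet. */
  static boolean tryLock(Connection conn, String owner) throws Exception {
    try (Table locks = conn.getTable(TableName.valueOf("GLOBAL_LOCKS"))) {
      Put put = new Put(ROW);
      put.addColumn(CF, Q, Bytes.toBytes(owner));
      // checkAndPut with a null expected value means "put only if the cell is
      // absent", which HBase guarantees to be atomic within a single row.
      return locks.checkAndPut(ROW, CF, Q, null, put);
    }
  }

  /** What "removing a stuck lock by hand" would amount to (hbase shell: deleteall). */
  static void forceUnlock(Connection conn) throws Exception {
    try (Table locks = conn.getTable(TableName.valueOf("GLOBAL_LOCKS"))) {
      locks.delete(new Delete(ROW));
    }
  }
}
```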



On Wed, Oct 26, 2016 at 12:58 AM, Mich Talebzadeh  wrote:

> Hi Furcy,
>
> Having used Hbase for part of the Batch layer in a Lambda Architecture, I
> have come to the conclusion that it is a very good product, even though,
> because of its cryptic nature, it is not much loved or appreciated. However,
> it may be useful to have a Hive metastore skin on top of Hbase tables so
> admins and others can interrogate Hbase tables. There is definitely a need
> for some sort of interface to a Hive metastore on Hbase, whether through
> Hive or Phoenix.
>
> Then we still have to handle locking and concurrency on the metastore
> tables. An RDBMS is transactional and ACID compliant. I do not know enough
> about Hbase; as far as I know, Hbase appends data. Currently, when I have an
> issue with transactions and locks, I go to the metadata and do some plastic
> surgery on the TRXN and LOCKS tables, which resolves the issue. I am not
> sure how I am going to achieve that in Hbase. Purists might argue that one
> should not touch these system tables, but things are not generally that
> simple.
>
> Are you using Hbase as Hive metastore now?
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 25 October 2016 at 13:44, Furcy Pin  wrote:
>
>> Hi Mich,
>>
>> I mostly agree with you, but I would comment on the part about using
>> HBase as a maintenance free core product:
>> I would say that most medium company using Hadoop rely on Hortonworks or
>> Cloudera, that both provides a pre-packaged HBase installation. It would
>> probably make sense for them to ship pre-installed versions of Hive relying
>> on HBase as metastore.
>> And as Alan stated, it would also be a good way to improve the
>> integration between Hive and HBase.
>>
>> I am not well placed to give an opinion on this, but I agree that
>> maintaining integration between both HBase and regular RDBMS might be a
>> real pain.
>> I am also worried about the fact that if indeed HBase grant us the
>> possibility to have all nodes calling the metastore, then any optimization
>> making use
>> of this will only work for a cluster with a Hive metastore on HBase?
>>
>> Anyway, I am still looking forward to this, as despite working in a small
>> company, our metastore sometimes seems to be a bottleneck, especially
>> when running more than 20 queries on tables with 10 000 partitions...
>> But perhaps migrating it on a bigger host would be enough for us...
>>
>>
>>
>> On Mon, Oct 24, 2016 at 10:21 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Thanks Alan for detailed explanation.
>>>
>>> Please bear in mind that any tool that needs to work with some
>>> repository (Oracle TimesTen IMDB has its metastore on Oracle classic),
>>> SAP Replication Server has its repository RSSD on SAP ASE and others
>>> First thing they do, they go and cache those tables and keep it in
>>> memory of the big brother database until they are shutdown. I reversed
>>> engineered and created Hive data model from physical schema (on Oracle).
>>> There are 

Re: Hive metadata on Hbase

2016-10-25 Thread Furcy Pin
Hi Mich,

I mostly agree with you, but I would comment on the part about using HBase
as a maintenance-free core product:
I would say that most medium-sized companies using Hadoop rely on Hortonworks
or Cloudera, which both provide a pre-packaged HBase installation. It would
probably make sense for them to ship pre-installed versions of Hive relying
on HBase as the metastore.
And as Alan stated, it would also be a good way to improve the integration
between Hive and HBase.

I am not well placed to give an opinion on this, but I agree that maintaining
integration with both HBase and regular RDBMSs might be a real pain.
I am also worried that if HBase does indeed give us the possibility of having
all nodes call the metastore, then any optimization making use of this will
only work for clusters with a Hive metastore on HBase.

Anyway, I am still looking forward to this: despite working in a small
company, our metastore sometimes seems to be a bottleneck, especially
when running more than 20 queries on tables with 10,000 partitions...
But perhaps migrating it to a bigger host would be enough for us...



On Mon, Oct 24, 2016 at 10:21 PM, Mich Talebzadeh  wrote:

> Thanks Alan for detailed explanation.
>
> Please bear in mind that any tool that needs to work with some repository
> (Oracle TimesTen IMDB has its metastore on Oracle classic), SAP Replication
> Server has its repository RSSD on SAP ASE and others
> First thing they do, they go and cache those tables and keep it in memory
> of the big brother database until they are shutdown. I reversed engineered
> and created Hive data model from physical schema (on Oracle). There are
> around 194 tables in total that can be easily cached.
>
> For small medium enterprise (SME), they don't really have much data so
> anything will do and they are the ones that use open source databases. For
> bigger companies, they already pay bucks for Oracle and alike and they are
> the one that would not touch an open source database (not talking about big
> data), because in this new capital-sensitive risk-averse world, they do
> not want to expose themselves to unnecessary risk.  So I am not sure
> whether they will take something like Hbase as a core product, unless it is
> going to be maintenance free.
>
> Going back to your point
>
> ".. but you have to pay for an expensive commercial license to make the
> metadata really work well is a non-starter"
>
> They already do and pay more if they have to. We will stick with Hive
> metadata on Oracle with schema on SSD
> .
>
> HTH
>
>
>
>
>
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 24 October 2016 at 20:14, Alan Gates  wrote:
>
>> Some thoughts on this:
>>
>> First, there’s no plan to remove the option to use an RDBMS such as
>> Oracle as your backend.  Hive’s RawStore interface is built such that
>> various implementations of the metadata storage can easily coexist.
>> Obviously different users will make different choices about what metadata
>> store makes sense for them.
>>
>> As to why HBase:
>> 1) We desperately need to get rid of the ORM layer.  It’s causing us
>> performance problems, as evidenced by things like it taking several minutes
>> to fetch all of the partition data for queries that span many partitions.
>> HBase is a way to achieve this, not the only way.  See in particular
>> Yahoo’s work on optimizing Oracle access https://issues.apache.org/jira
>> /browse/HIVE-14870  The question around this is whether we can optimize
>> for Oracle, MySQL, Postgres, and SQLServer without creating a maintenance
>> and testing nightmare for ourselves.  I’m skeptical, but others think it’s
>> possible.  See comments on that JIRA.
>>
>> 2) We’d like to scale to much larger sizes, both in terms of data and
>> access from nodes.  Not that we’re worried about the amount of metadata,
>> but we’d like to be able to cache more stats, file splits, etc.  And we’d
>> like to allow nodes in the cluster to contact the metastore, which we do
>> not today since many RDBMSs don’t handle a thousand plus simultaneous
>> connections well.  Obviously both data and connection scale can be met with
>> high end commercial stores.  But saying that we have this great open source
>> database but you have to pay for an expensive commercial license to make
>> the metadata really work well is a non-starter.
>>
>> 3) By using tools within the 

Re: Hive metadata on Hbase

2016-10-24 Thread Mich Talebzadeh
Thanks Alan for detailed explanation.

Please bear in mind what any tool that needs to work with some repository
does (Oracle TimesTen IMDB has its metastore on Oracle classic, SAP
Replication Server has its repository RSSD on SAP ASE, and so on): the first
thing they do is cache those tables and keep them in the memory of the
big-brother database until they are shut down. I reverse-engineered and
created the Hive data model from the physical schema (on Oracle). There are
around 194 tables in total that can easily be cached.

Small and medium enterprises (SMEs) don't really have much data, so anything
will do, and they are the ones that use open-source databases. Bigger
companies already pay big bucks for Oracle and the like, and they are the
ones that would not touch an open-source database (not talking about big
data), because in this new capital-sensitive, risk-averse world they do not
want to expose themselves to unnecessary risk. So I am not sure whether they
will take on something like Hbase as a core product, unless it is going to be
maintenance-free.

Going back to your point

".. but you have to pay for an expensive commercial license to make the
metadata really work well is a non-starter"

They already do, and they will pay more if they have to. We will stick with
Hive metadata on Oracle, with the schema on SSD.

HTH









Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 24 October 2016 at 20:14, Alan Gates  wrote:

> Some thoughts on this:
>
> First, there’s no plan to remove the option to use an RDBMS such as Oracle
> as your backend.  Hive’s RawStore interface is built such that various
> implementations of the metadata storage can easily coexist.  Obviously
> different users will make different choices about what metadata store makes
> sense for them.
>
> As to why HBase:
> 1) We desperately need to get rid of the ORM layer.  It’s causing us
> performance problems, as evidenced by things like it taking several minutes
> to fetch all of the partition data for queries that span many partitions.
> HBase is a way to achieve this, not the only way.  See in particular
> Yahoo’s work on optimizing Oracle access https://issues.apache.org/
> jira/browse/HIVE-14870  The question around this is whether we can
> optimize for Oracle, MySQL, Postgres, and SQLServer without creating a
> maintenance and testing nightmare for ourselves.  I’m skeptical, but others
> think it’s possible.  See comments on that JIRA.
>
> 2) We’d like to scale to much larger sizes, both in terms of data and
> access from nodes.  Not that we’re worried about the amount of metadata,
> but we’d like to be able to cache more stats, file splits, etc.  And we’d
> like to allow nodes in the cluster to contact the metastore, which we do
> not today since many RDBMSs don’t handle a thousand plus simultaneous
> connections well.  Obviously both data and connection scale can be met with
> high end commercial stores.  But saying that we have this great open source
> database but you have to pay for an expensive commercial license to make
> the metadata really work well is a non-starter.
>
> 3) By using tools within the Hadoop ecosystem like HBase we are helping to
> drive improvement in the system
>
> To explain the HBase work a little more, it doesn’t use Phoenix, but works
> directly against HBase, with the help of a transaction manager (Omid).  In
> performance tests we’ve done so far it’s faster than Hive 1 with the ORM
> layer, but not yet to the 10x range that we’d like to see.  We haven’t yet
> done the work to put in co-processors and such that we expect would speed
> it up further.
>
> Alan.
>
> > On Oct 23, 2016, at 15:46, Mich Talebzadeh 
> wrote:
> >
> >
> > A while back there was some notes on having Hive metastore on Hbase as
> opposed to conventional RDBMSs
> >
> > I am currently involved with some hefty work with Hbase and Phoenix for
> batch ingestion of trade data. As long as you define your Hbase table
> through Phoenix and with secondary Phoenix indexes on Hbase, the speed is
> impressive.
> >
> > I am not sure how much having Hbase as Hive metastore is going to add to
> Hive performance. We use Oracle 12c as Hive metastore and the Hive
> database/schema is built on solid state disks. Never had any issues with
> lock and concurrency.
> >
> > Therefore I am not sure what one is going to gain by having Hbase as the
> Hive metastore? I trust that we can still use our existing schemas 

Re: Hive metadata on Hbase

2016-10-24 Thread Alan Gates
Some thoughts on this:

First, there’s no plan to remove the option to use an RDBMS such as Oracle as 
your backend.  Hive’s RawStore interface is built such that various 
implementations of the metadata storage can easily coexist.  Obviously 
different users will make different choices about what metadata store makes 
sense for them.
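Roughly speaking, the pluggability works through the hive.metastore.rawstore.impl
property, which names the RawStore implementation to load. The sketch below is
a paraphrase under my own assumptions, not the actual metastore bootstrap code;
the property name and the ObjectStore default are standard, but real startup
does considerably more than this.

```java
// Rough sketch of the idea behind RawStore pluggability (a paraphrase, not the
// actual Hive metastore bootstrap code).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.metastore.RawStore;

public class RawStoreSelectionSketch {
  public static RawStore createRawStore(Configuration conf) throws Exception {
    // "hive.metastore.rawstore.impl" selects the backend; ObjectStore is the
    // default (RDBMS via the ORM layer), and the HBase work plugs in its own
    // implementation of the same interface.
    String impl = conf.get("hive.metastore.rawstore.impl",
        "org.apache.hadoop.hive.metastore.ObjectStore");
    RawStore store = (RawStore) Class.forName(impl)
        .getDeclaredConstructor().newInstance();
    store.setConf(conf);   // RawStore is Configurable; real startup does more
    return store;
  }
}
```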

As to why HBase:
1) We desperately need to get rid of the ORM layer.  It’s causing us 
performance problems, as evidenced by things like it taking several minutes to 
fetch all of the partition data for queries that span many partitions.  HBase 
is a way to achieve this, not the only way.  See in particular Yahoo’s work on 
optimizing Oracle access: https://issues.apache.org/jira/browse/HIVE-14870.  The 
question around this is whether we can optimize for Oracle, MySQL, Postgres, 
and SQLServer without creating a maintenance and testing nightmare for 
ourselves.  I’m skeptical, but others think it’s possible.  See comments on 
that JIRA.
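To give an idea of what bypassing the ORM buys (a hedged illustration only:
the query below follows the usual RDBMS metastore schema, but table and column
names should be checked against your own installation, and this is not the
code from that JIRA), fetching partition data object-by-object through the ORM
means many round trips, whereas a direct-SQL approach collapses it into one
set-based query:

```java
// Hedged illustration: one set-based query against the RDBMS metastore schema
// replacing many per-partition ORM round trips. Verify DBS/TBLS/PARTITIONS
// names against your own metastore before relying on this.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class DirectSqlPartitionFetch {
  public static void main(String[] args) throws Exception {
    String url = args[0];   // e.g. a JDBC URL pointing at the metastore database
    try (Connection conn = DriverManager.getConnection(url, args[1], args[2]);
         PreparedStatement ps = conn.prepareStatement(
             "SELECT p.PART_NAME, p.CREATE_TIME " +
             "FROM PARTITIONS p " +
             "JOIN TBLS t ON t.TBL_ID = p.TBL_ID " +
             "JOIN DBS d ON d.DB_ID = t.DB_ID " +
             "WHERE d.NAME = ? AND t.TBL_NAME = ?")) {
      ps.setString(1, "default");
      ps.setString(2, "sales");
      try (ResultSet rs = ps.executeQuery()) {
        int n = 0;
        while (rs.next()) {
          n++;   // one row per partition, a single round trip overall
        }
        System.out.println(n + " partitions fetched in a single query");
      }
    }
  }
}
```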

2) We’d like to scale to much larger sizes, both in terms of data and access 
from nodes.  Not that we’re worried about the amount of metadata, but we’d like 
to be able to cache more stats, file splits, etc.  And we’d like to allow nodes 
in the cluster to contact the metastore, which we do not today since many 
RDBMSs don’t handle a thousand plus simultaneous connections well.  Obviously 
both data and connection scale can be met with high end commercial stores.  But 
saying that we have this great open source database but you have to pay for an 
expensive commercial license to make the metadata really work well is a 
non-starter.

3) By using tools within the Hadoop ecosystem like HBase, we are helping to
drive improvement in the system.

To explain the HBase work a little more, it doesn’t use Phoenix, but works 
directly against HBase, with the help of a transaction manager (Omid).  In 
performance tests we’ve done so far it’s faster than Hive 1 with the ORM layer, 
but not yet to the 10x range that we’d like to see.  We haven’t yet done the 
work to put in co-processors and such that we expect would speed it up further.

Alan.

> On Oct 23, 2016, at 15:46, Mich Talebzadeh  wrote:
> 
> 
> A while back there was some notes on having Hive metastore on Hbase as 
> opposed to conventional RDBMSs
> 
> I am currently involved with some hefty work with Hbase and Phoenix for batch 
> ingestion of trade data. As long as you define your Hbase table through 
> Phoenix and with secondary Phoenix indexes on Hbase, the speed is impressive.
> 
> I am not sure how much having Hbase as Hive metastore is going to add to Hive 
> performance. We use Oracle 12c as Hive metastore and the Hive database/schema 
> is built on solid state disks. Never had any issues with lock and concurrency.
> 
> Therefore I am not sure what one is going to gain by having Hbase as the Hive 
> metastore? I trust that we can still use our existing schemas on Oracle.
> 
> HTH
> 
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
> 
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
> damage or destruction of data or any other property which may arise from 
> relying on this email's technical content is explicitly disclaimed. The 
> author will in no case be liable for any monetary damages arising from such 
> loss, damage or destruction.
>  



Re: Hive metadata on Hbase

2016-10-24 Thread Mich Talebzadeh
Hi Furcy,

Thanks for updates.

Transactional tables create issues for us. When many updates are done, they
create many delta files that require compaction.

This by itself is not an issue for Hive. However, Spark fails to read these
delta files, so the job crashes.
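One mitigation would be to force a major compaction so the deltas get merged
into a base file before Spark reads the table. Below is only a sketch over the
HiveServer2 JDBC interface; the URL, table and partition names are made up,
and I am not claiming this fully solves the Spark reader problem.

```java
// Sketch: manually requesting a major compaction over JDBC so that
// accumulated delta files are merged into a base file. URL, table and
// partition names are illustrative only.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ForceCompaction {
  public static void main(String[] args) throws Exception {
    String url = "jdbc:hive2://localhost:10000/default";   // adjust to your HiveServer2
    try (Connection conn = DriverManager.getConnection(url, "hive", "");
         Statement st = conn.createStatement()) {
      // ALTER TABLE ... COMPACT is standard HiveQL for ACID tables.
      st.execute("ALTER TABLE trades PARTITION (trade_date='2016-10-24') COMPACT 'major'");
      // SHOW COMPACTIONS lists queued/running compactions so you can check
      // when the major compaction has finished.
      st.execute("SHOW COMPACTIONS");
    }
  }
}
```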

Regards,

Mich

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 24 October 2016 at 08:39, Furcy Pin  wrote:

> Hi Mich,
>
> the umbrella JIRA for this gives a few reason.
> https://issues.apache.org/jira/browse/HIVE-9452
> (with even more details in the attached pdf https://issues.apache.org/
> jira/secure/attachment/12697601/HBaseMetastoreApproach.pdf)
>
> In my experience, Hive tables with a lot of partitions (> 10 000) may
> become really slow, especially with Spark.
> The latency induced by the metastore can be really big compared to the
> whole duration of the query itself,
> because the driver needs to fetch a lot of info about partitions just to
> optimize the query, before even running it.
>
> I guess another advantage is that using a RDBMS as metastore makes it a
> SPOF, unless you setup replication etc. while, HBase would give HA for free.
>
>
>
> On Mon, Oct 24, 2016 at 9:06 AM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> @Per
>>
>> We run full transactional enabled Hive metadb on an Oracle DB.
>>
>> I don't have statistics now but will collect from AWR reports no problem.
>>
>> @Jorn,
>>
>> The primary reason Oracle was chosen is because the company has global
>> licenses for Oracle + MSSQL + SAP and they are classified as Enterprise
>> Grade databases.
>>
>> None of MySQL and others are classified as such so they cannot be
>> deployed in production.
>>
>> Besides, for us to have Hive metadata on Oracle makes sense as our
>> infrastructure does all the support, HA etc for it and they have trained
>> DBAs to look after it 24x7.
>>
>> Admittedly we are now relying on HDFS itself plus Hbase as well for
>> persistent storage. So the situation might change.
>>
>> HTH
>>
>>
>>
>>
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 24 October 2016 at 06:46, Per Ullberg  wrote:
>>
>>> I thought the main gain was to get ACID on Hive performant enough.
>>>
>>> @Mich: Do you run with ACID-enabled tables? How many
>>> Create/Update/Deletes do you do per second?
>>>
>>> best regards
>>> /Pelle
>>>
>>> On Mon, Oct 24, 2016 at 7:39 AM, Jörn Franke 
>>> wrote:
>>>
 I think the main gain is more about getting rid of a dedicated database
 including maintenance and potential license cost.
 For really large clusters and a lot of users this might be even more
 beneficial. You can avoid clustering the database etc.

 On 24 Oct 2016, at 00:46, Mich Talebzadeh 
 wrote:


 A while back there was some notes on having Hive metastore on Hbase as
 opposed to conventional RDBMSs

 I am currently involved with some hefty work with Hbase and Phoenix for
 batch ingestion of trade data. As long as you define your Hbase table
 through Phoenix and with secondary Phoenix indexes on Hbase, the speed is
 impressive.

 I am not sure how much having Hbase as Hive metastore is going to add
 to Hive performance. We use Oracle 12c as Hive metastore and the Hive
 database/schema is built on solid state disks. Never had any issues with
 lock and concurrency.

 Therefore I am not sure what one is going to gain by having Hbase as
 the Hive metastore? I trust that we can still use our existing schemas on
 Oracle.

 HTH



 Dr Mich Talebzadeh



 LinkedIn * 
 https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 *



Re: Hive metadata on Hbase

2016-10-24 Thread Furcy Pin
Hi Mich,

The umbrella JIRA for this gives a few reasons:
https://issues.apache.org/jira/browse/HIVE-9452
(with even more details in the attached pdf
https://issues.apache.org/jira/secure/attachment/12697601/HBaseMetastoreApproach.pdf
)

In my experience, Hive tables with a lot of partitions (> 10,000) may
become really slow, especially with Spark.
The latency induced by the metastore can be really big compared to the
whole duration of the query itself, because the driver needs to fetch a lot
of information about the partitions just to optimize the query, before even
running it.
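You can see this overhead from the client side just by timing a bare partition
listing against the metastore. A rough sketch follows; the database and table
names are placeholders, and it assumes the Hive metastore client jars plus a
reachable metastore (hive.metastore.uris) in the configuration.

```java
// Rough timing sketch of the metastore round trip the driver pays before a
// query even starts. Database/table names are placeholders.
import java.util.List;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Partition;

public class PartitionFetchTiming {
  public static void main(String[] args) throws Exception {
    HiveConf conf = new HiveConf();
    HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
    try {
      long start = System.currentTimeMillis();
      // (short) -1 means "all partitions"; on a table with more than 10,000
      // partitions this single call can dominate short queries.
      List<Partition> parts =
          client.listPartitions("default", "big_partitioned_table", (short) -1);
      long elapsed = System.currentTimeMillis() - start;
      System.out.println(parts.size() + " partitions fetched in " + elapsed + " ms");
    } finally {
      client.close();
    }
  }
}
```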

I guess another advantage is that using an RDBMS as the metastore makes it a
SPOF unless you set up replication etc., while HBase would give HA for free.



On Mon, Oct 24, 2016 at 9:06 AM, Mich Talebzadeh 
wrote:

> @Per
>
> We run full transactional enabled Hive metadb on an Oracle DB.
>
> I don't have statistics now but will collect from AWR reports no problem.
>
> @Jorn,
>
> The primary reason Oracle was chosen is because the company has global
> licenses for Oracle + MSSQL + SAP and they are classified as Enterprise
> Grade databases.
>
> None of MySQL and others are classified as such so they cannot be deployed
> in production.
>
> Besides, for us to have Hive metadata on Oracle makes sense as our
> infrastructure does all the support, HA etc for it and they have trained
> DBAs to look after it 24x7.
>
> Admittedly we are now relying on HDFS itself plus Hbase as well for
> persistent storage. So the situation might change.
>
> HTH
>
>
>
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 24 October 2016 at 06:46, Per Ullberg  wrote:
>
>> I thought the main gain was to get ACID on Hive performant enough.
>>
>> @Mich: Do you run with ACID-enabled tables? How many
>> Create/Update/Deletes do you do per second?
>>
>> best regards
>> /Pelle
>>
>> On Mon, Oct 24, 2016 at 7:39 AM, Jörn Franke 
>> wrote:
>>
>>> I think the main gain is more about getting rid of a dedicated database
>>> including maintenance and potential license cost.
>>> For really large clusters and a lot of users this might be even more
>>> beneficial. You can avoid clustering the database etc.
>>>
>>> On 24 Oct 2016, at 00:46, Mich Talebzadeh 
>>> wrote:
>>>
>>>
>>> A while back there was some notes on having Hive metastore on Hbase as
>>> opposed to conventional RDBMSs
>>>
>>> I am currently involved with some hefty work with Hbase and Phoenix for
>>> batch ingestion of trade data. As long as you define your Hbase table
>>> through Phoenix and with secondary Phoenix indexes on Hbase, the speed is
>>> impressive.
>>>
>>> I am not sure how much having Hbase as Hive metastore is going to add to
>>> Hive performance. We use Oracle 12c as Hive metastore and the Hive
>>> database/schema is built on solid state disks. Never had any issues with
>>> lock and concurrency.
>>>
>>> Therefore I am not sure what one is going to gain by having Hbase as the
>>> Hive metastore? I trust that we can still use our existing schemas on
>>> Oracle.
>>>
>>> HTH
>>>
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * 
>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> *
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>>
>>
>>
>> --
>>
>> *Per Ullberg*
>> Data Vault Tech Lead
>> Odin Uppsala
>> +46 701612693 <+46+701612693>
>>
>> Klarna AB (publ)
>> Sveavägen 46, 111 34 Stockholm
>> Tel: +46 8 120 120 00 <+46812012000>
>> Reg no: 556737-0431
>> klarna.com
>>
>>
>


Re: Hive metadata on Hbase

2016-10-24 Thread Mich Talebzadeh
Hive 2.0.1
Subversion git://reznor-mbp-2.local/Users/sergey/git/hivegit -r
e3cfeebcefe9a19c5055afdcbb00646908340694
Compiled by sergey on Tue May 3 21:03:11 PDT 2016
From source with checksum 5a49522e4b572555dbbe5dd4773bc7c2

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 24 October 2016 at 08:29, Per Ullberg  wrote:

> What version of hive are you running?
>
> /Pelle
>
>
> On Monday, October 24, 2016, Mich Talebzadeh 
> wrote:
>
>> @Per
>>
>> We run full transactional enabled Hive metadb on an Oracle DB.
>>
>> I don't have statistics now but will collect from AWR reports no problem.
>>
>> @Jorn,
>>
>> The primary reason Oracle was chosen is because the company has global
>> licenses for Oracle + MSSQL + SAP and they are classified as Enterprise
>> Grade databases.
>>
>> None of MySQL and others are classified as such so they cannot be
>> deployed in production.
>>
>> Besides, for us to have Hive metadata on Oracle makes sense as our
>> infrastructure does all the support, HA etc for it and they have trained
>> DBAs to look after it 24x7.
>>
>> Admittedly we are now relying on HDFS itself plus Hbase as well for
>> persistent storage. So the situation might change.
>>
>> HTH
>>
>>
>>
>>
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 24 October 2016 at 06:46, Per Ullberg  wrote:
>>
>>> I thought the main gain was to get ACID on Hive performant enough.
>>>
>>> @Mich: Do you run with ACID-enabled tables? How many
>>> Create/Update/Deletes do you do per second?
>>>
>>> best regards
>>> /Pelle
>>>
>>> On Mon, Oct 24, 2016 at 7:39 AM, Jörn Franke 
>>> wrote:
>>>
 I think the main gain is more about getting rid of a dedicated database
 including maintenance and potential license cost.
 For really large clusters and a lot of users this might be even more
 beneficial. You can avoid clustering the database etc.

 On 24 Oct 2016, at 00:46, Mich Talebzadeh 
 wrote:


 A while back there was some notes on having Hive metastore on Hbase as
 opposed to conventional RDBMSs

 I am currently involved with some hefty work with Hbase and Phoenix for
 batch ingestion of trade data. As long as you define your Hbase table
 through Phoenix and with secondary Phoenix indexes on Hbase, the speed is
 impressive.

 I am not sure how much having Hbase as Hive metastore is going to add
 to Hive performance. We use Oracle 12c as Hive metastore and the Hive
 database/schema is built on solid state disks. Never had any issues with
 lock and concurrency.

 Therefore I am not sure what one is going to gain by having Hbase as
 the Hive metastore? I trust that we can still use our existing schemas on
 Oracle.

 HTH



 Dr Mich Talebzadeh



 LinkedIn * 
 https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 *



 http://talebzadehmich.wordpress.com


 *Disclaimer:* Use it at your own risk. Any and all responsibility for
 any loss, damage or destruction of data or any other property which may
 arise from relying on this email's technical content is explicitly
 disclaimed. The author will in no case be liable for any monetary damages
 arising from such loss, damage or destruction.




>>>
>>>
>>> --
>>>
>>> *Per Ullberg*
>>> Data Vault Tech Lead
>>> Odin Uppsala
>>> +46 701612693 <+46+701612693>
>>>
>>> Klarna AB (publ)
>>> Sveavägen 46, 111 34 Stockholm
>>> Tel: +46 8 120 120 00 <+46812012000>
>>> Reg no: 556737-0431
>>> klarna.com
>>>
>>>
>>
>
> --
>
> *Per Ullberg*
> Data Vault Tech Lead
> Odin Uppsala
> 

Re: Hive metadata on Hbase

2016-10-24 Thread Per Ullberg
What version of hive are you running?

/Pelle

On Monday, October 24, 2016, Mich Talebzadeh 
wrote:

> @Per
>
> We run full transactional enabled Hive metadb on an Oracle DB.
>
> I don't have statistics now but will collect from AWR reports no problem.
>
> @Jorn,
>
> The primary reason Oracle was chosen is because the company has global
> licenses for Oracle + MSSQL + SAP and they are classified as Enterprise
> Grade databases.
>
> None of MySQL and others are classified as such so they cannot be deployed
> in production.
>
> Besides, for us to have Hive metadata on Oracle makes sense as our
> infrastructure does all the support, HA etc for it and they have trained
> DBAs to look after it 24x7.
>
> Admittedly we are now relying on HDFS itself plus Hbase as well for
> persistent storage. So the situation might change.
>
> HTH
>
>
>
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 24 October 2016 at 06:46, Per Ullberg  > wrote:
>
>> I thought the main gain was to get ACID on Hive performant enough.
>>
>> @Mich: Do you run with ACID-enabled tables? How many
>> Create/Update/Deletes do you do per second?
>>
>> best regards
>> /Pelle
>>
>> On Mon, Oct 24, 2016 at 7:39 AM, Jörn Franke > > wrote:
>>
>>> I think the main gain is more about getting rid of a dedicated database
>>> including maintenance and potential license cost.
>>> For really large clusters and a lot of users this might be even more
>>> beneficial. You can avoid clustering the database etc.
>>>
>>> On 24 Oct 2016, at 00:46, Mich Talebzadeh >> > wrote:
>>>
>>>
>>> A while back there was some notes on having Hive metastore on Hbase as
>>> opposed to conventional RDBMSs
>>>
>>> I am currently involved with some hefty work with Hbase and Phoenix for
>>> batch ingestion of trade data. As long as you define your Hbase table
>>> through Phoenix and with secondary Phoenix indexes on Hbase, the speed is
>>> impressive.
>>>
>>> I am not sure how much having Hbase as Hive metastore is going to add to
>>> Hive performance. We use Oracle 12c as Hive metastore and the Hive
>>> database/schema is built on solid state disks. Never had any issues with
>>> lock and concurrency.
>>>
>>> Therefore I am not sure what one is going to gain by having Hbase as the
>>> Hive metastore? I trust that we can still use our existing schemas on
>>> Oracle.
>>>
>>> HTH
>>>
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * 
>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> *
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>>
>>
>>
>> --
>>
>> *Per Ullberg*
>> Data Vault Tech Lead
>> Odin Uppsala
>> +46 701612693 <+46+701612693>
>>
>> Klarna AB (publ)
>> Sveavägen 46, 111 34 Stockholm
>> Tel: +46 8 120 120 00 <+46812012000>
>> Reg no: 556737-0431
>> klarna.com
>>
>>
>

-- 

*Per Ullberg*
Data Vault Tech Lead
Odin Uppsala
+46 701612693 <+46+701612693>

Klarna AB (publ)
Sveavägen 46, 111 34 Stockholm
Tel: +46 8 120 120 00 <+46812012000>
Reg no: 556737-0431
klarna.com


Re: Hive metadata on Hbase

2016-10-24 Thread Mich Talebzadeh
@Per

We run a fully transaction-enabled Hive metadata database on an Oracle DB.

I don't have statistics right now, but I can collect them from AWR reports,
no problem.

@Jorn,

The primary reason Oracle was chosen is that the company has global licenses
for Oracle + MSSQL + SAP, and they are classified as enterprise-grade
databases.

MySQL and the others are not classified as such, so they cannot be deployed
in production.

Besides, having the Hive metadata on Oracle makes sense for us, as our
infrastructure team does all the support, HA, etc. for it, and they have
trained DBAs to look after it 24x7.

Admittedly, we are now relying on HDFS itself, plus Hbase as well, for
persistent storage, so the situation might change.

HTH







Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 24 October 2016 at 06:46, Per Ullberg  wrote:

> I thought the main gain was to get ACID on Hive performant enough.
>
> @Mich: Do you run with ACID-enabled tables? How many Create/Update/Deletes
> do you do per second?
>
> best regards
> /Pelle
>
> On Mon, Oct 24, 2016 at 7:39 AM, Jörn Franke  wrote:
>
>> I think the main gain is more about getting rid of a dedicated database
>> including maintenance and potential license cost.
>> For really large clusters and a lot of users this might be even more
>> beneficial. You can avoid clustering the database etc.
>>
>> On 24 Oct 2016, at 00:46, Mich Talebzadeh 
>> wrote:
>>
>>
>> A while back there was some notes on having Hive metastore on Hbase as
>> opposed to conventional RDBMSs
>>
>> I am currently involved with some hefty work with Hbase and Phoenix for
>> batch ingestion of trade data. As long as you define your Hbase table
>> through Phoenix and with secondary Phoenix indexes on Hbase, the speed is
>> impressive.
>>
>> I am not sure how much having Hbase as Hive metastore is going to add to
>> Hive performance. We use Oracle 12c as Hive metastore and the Hive
>> database/schema is built on solid state disks. Never had any issues with
>> lock and concurrency.
>>
>> Therefore I am not sure what one is going to gain by having Hbase as the
>> Hive metastore? I trust that we can still use our existing schemas on
>> Oracle.
>>
>> HTH
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>
>
> --
>
> *Per Ullberg*
> Data Vault Tech Lead
> Odin Uppsala
> +46 701612693 <+46+701612693>
>
> Klarna AB (publ)
> Sveavägen 46, 111 34 Stockholm
> Tel: +46 8 120 120 00 <+46812012000>
> Reg no: 556737-0431
> klarna.com
>
>


Re: Hive metadata on Hbase

2016-10-23 Thread Per Ullberg
I thought the main gain was to make ACID on Hive performant enough.

@Mich: Do you run with ACID-enabled tables? How many Create/Update/Deletes
do you do per second?

best regards
/Pelle

On Mon, Oct 24, 2016 at 7:39 AM, Jörn Franke  wrote:

> I think the main gain is more about getting rid of a dedicated database
> including maintenance and potential license cost.
> For really large clusters and a lot of users this might be even more
> beneficial. You can avoid clustering the database etc.
>
> On 24 Oct 2016, at 00:46, Mich Talebzadeh 
> wrote:
>
>
> A while back there was some notes on having Hive metastore on Hbase as
> opposed to conventional RDBMSs
>
> I am currently involved with some hefty work with Hbase and Phoenix for
> batch ingestion of trade data. As long as you define your Hbase table
> through Phoenix and with secondary Phoenix indexes on Hbase, the speed is
> impressive.
>
> I am not sure how much having Hbase as Hive metastore is going to add to
> Hive performance. We use Oracle 12c as Hive metastore and the Hive
> database/schema is built on solid state disks. Never had any issues with
> lock and concurrency.
>
> Therefore I am not sure what one is going to gain by having Hbase as the
> Hive metastore? I trust that we can still use our existing schemas on
> Oracle.
>
> HTH
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>


-- 

*Per Ullberg*
Data Vault Tech Lead
Odin Uppsala
+46 701612693 <+46+701612693>

Klarna AB (publ)
Sveavägen 46, 111 34 Stockholm
Tel: +46 8 120 120 00 <+46812012000>
Reg no: 556737-0431
klarna.com


Re: Hive metadata on Hbase

2016-10-23 Thread Jörn Franke
I think the main gain is more about getting rid of a dedicated database 
including maintenance and potential license cost. 
For really large clusters and a lot of users this might be even more 
beneficial. You can avoid clustering the database etc.

> On 24 Oct 2016, at 00:46, Mich Talebzadeh  wrote:
> 
> 
> A while back there was some notes on having Hive metastore on Hbase as 
> opposed to conventional RDBMSs
> 
> I am currently involved with some hefty work with Hbase and Phoenix for batch 
> ingestion of trade data. As long as you define your Hbase table through 
> Phoenix and with secondary Phoenix indexes on Hbase, the speed is impressive.
> 
> I am not sure how much having Hbase as Hive metastore is going to add to Hive 
> performance. We use Oracle 12c as Hive metastore and the Hive database/schema 
> is built on solid state disks. Never had any issues with lock and concurrency.
> 
> Therefore I am not sure what one is going to gain by having Hbase as the Hive 
> metastore? I trust that we can still use our existing schemas on Oracle.
> 
> HTH
> 
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
> 
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
> damage or destruction of data or any other property which may arise from 
> relying on this email's technical content is explicitly disclaimed. The 
> author will in no case be liable for any monetary damages arising from such 
> loss, damage or destruction.
>