Thanks John,

 

I have already registered my interest in development work on Hive, so
hopefully I will be able to contribute at some level.

 

Regards,

 

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Author of the books "A Practitioner's Guide to Upgrading to Sybase ASE 15",
ISBN 978-0-9563693-0-7. 

co-author "Sybase Transact SQL Guidelines Best Practices", ISBN
978-0-9759693-0-4

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
Coherence Cache

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume
one out shortly

 

NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Ltd, its
subsidiaries or their employees, unless expressly so stated. It is the
responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Ltd, its subsidiaries nor their employees accept
any responsibility.

 

From: John Pullokkaran [mailto:jpullokka...@hortonworks.com] 
Sent: 19 April 2015 20:37
To: user@hive.apache.org
Subject: Re: Orc file and Hive Optimiser

 

The ORC format is transparent to the CBO.

We are currently working on a new cost model that might reflect ORC's
performance advantages in optimization decisions.

 

Thanks

John

 

From: Mich Talebzadeh <m...@peridale.co.uk>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Sunday, April 19, 2015 at 12:32 PM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Orc file and Hive Optimiser

 

My understanding is that the Optimized Row Columnar (ORC) file format
provides a highly efficient way to store Hive data. 

 

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC

 

 

In a nutshell, columnar storage allows pretty efficient compression of
columns, on par with what data warehouse databases like Sybase IQ provide.
In short, if a normal Hive table is a "row-based implementation of the
relational model", then ORC is the equivalent "column-based implementation
of the relational model".

 

I find the ORC file format pretty interesting, as it performs more
efficiently than other Hive file formats (I am testing it). My only
question is whether Hive's Cost Based Optimiser (CBO) is aware of the ORC
storage format and treats the table accordingly.

 

Finally, this is more of a speculative question. If ORC files provide good
functionality, is there any reason why one should deploy a columnar
database such as HBase or Cassandra if Hive can do the job as well?

 

Thanks,

 

 

Mich Talebzadeh

 


 
