If I understand you correctly, this could be just another Hive storage format.
> On 06 Jan 2016, at 07:24, Mich Talebzadeh <m...@peridale.co.uk> wrote:
>
> Hi,
>
> Thinking aloud.
>
> Ideally we should consider a fully columnar storage offering in which each column of a table is stored as compressed values (I disregard for now how ORC actually does this, but obviously it is not exactly columnar storage).
>
> So each table can be considered a loose federation of columnar storage, and each column is effectively an index?
>
> As columns are far narrower than tables, each index block will have a much higher density, and operations such as aggregates can be done directly on the index rather than on the table.
>
> This type of table offering would be in the true nature of data warehouse storage. Of course, row operations (get me all rows for this table) will be slower, but that is the trade-off we need to consider.
>
> Expecting users to write their own IndexHandler may be technically interesting, but it is not commercially viable, as Hive needs to be a product on its own merit, not a development base. Writing your own storage attributes etc. requires skills that will put people off seeing Hive as an attractive proposition (requiring considerable investment in skill sets in order to maintain Hive).
>
> Thus my thinking on this is to offer true columnar storage in Hive so that it is a proper data warehouse. In addition, the development tools can be made available to those interested in tailoring their own specific Hive solutions.
>
> HTH
>
> Dr Mich Talebzadeh
>
> -----Original Message-----
> From: Gopal Vijayaraghavan [mailto:go...@hortonworks.com] On Behalf Of Gopal Vijayaraghavan
> Sent: 05 January 2016 23:55
> To: user@hive.apache.org
> Subject: Re: Is Hive Index officially not recommended?
>
> >> So in a nutshell, in Hive, if "external" indexes are not used for improving query response, what value do they add, and can we forget them for now?
>
> The built-in indexes (those that write data as smaller tables) are only useful in a pre-columnar world, where the indexes offer a huge reduction in IO.
>
> Part #1 of using Hive indexes effectively is to write your own HiveIndexHandler, with usesIndexTable=false.
>
> And then write an IndexPredicateAnalyzer, which lets you map arbitrary lookups into other range conditions.
>
> Not coincidentally, we're adding an "ANALYZE TABLE ... CACHE METADATA" command, which consolidates the "internal" index into an external store (HBase).
>
> Some of the index data now lives in the HBase metastore, so that the inclusion/exclusion of whole partitions can be done off the consolidated index.
>
> https://issues.apache.org/jira/browse/HIVE-11676
>
> The experience from BI workloads run by customers is that in general, the lookup to the right "slice" of data is more of a problem than the actual aggregate.
>
> And that for a workhorse data warehouse, this has to survive even if there's a non-stop stream of updates into it.
>
> Cheers,
> Gopal
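Mich's idea (each column stored on its own, with aggregates computed from one column) and Gopal's point that finding the right "slice" of data matters more than the aggregate itself can be illustrated together with a toy in-memory column store. This is a minimal Python sketch under stated assumptions, not Hive or ORC code: `ColumnStore`, `ColumnChunk` and `CHUNK_SIZE` are hypothetical names, and the per-chunk min/max summary stands in for the kind of lightweight index that lets a reader skip whole slices of data (loosely similar in spirit to the min/max column statistics ORC keeps, but not its actual format).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

CHUNK_SIZE = 4  # hypothetical rows per chunk; real columnar formats use far larger groups


@dataclass
class ColumnChunk:
    """One slice of a single column plus a min/max summary of its values."""
    values: List[int]
    lo: int = field(init=False)
    hi: int = field(init=False)

    def __post_init__(self) -> None:
        # The summary is the lightweight "index" used to decide whether to skip the chunk.
        self.lo = min(self.values)
        self.hi = max(self.values)


class ColumnStore:
    """Hypothetical table stored column by column, each column split into chunks."""

    def __init__(self, columns: Dict[str, List[int]]) -> None:
        self.columns = {
            name: [ColumnChunk(vals[i:i + CHUNK_SIZE])
                   for i in range(0, len(vals), CHUNK_SIZE)]
            for name, vals in columns.items()
        }

    def total(self, column: str) -> int:
        # An aggregate reads only the one column it needs, never whole rows.
        return sum(sum(chunk.values) for chunk in self.columns[column])

    def count_in_range(self, column: str, lo: int, hi: int) -> Tuple[int, int]:
        """Return (matching values in [lo, hi], number of chunks actually scanned)."""
        count = scanned = 0
        for chunk in self.columns[column]:
            if chunk.hi < lo or chunk.lo > hi:
                continue  # the min/max summary proves no value in this chunk can match
            scanned += 1
            count += sum(1 for v in chunk.values if lo <= v <= hi)
        return count, scanned


store = ColumnStore({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "amount": [10, 20, 30, 40, 50, 60, 70, 80],
})
print(store.total("amount"))             # 360
print(store.count_in_range("id", 6, 7))  # (2, 1): the first chunk is skipped
```

The trade-off Mich mentions shows up directly here: rebuilding a full row would mean visiting every column's chunks, whereas a single-column aggregate or a range lookup touches only one column and, thanks to the min/max summary, only the chunks that can contain matches.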