Apache Trafodion provides SQL on top of HBase.

On Thu, Nov 17, 2016 at 7:40 AM, Mich Talebzadeh wrote:
> Thanks John.
>
> What about using Phoenix, or Spark RDDs, on top of HBase?
>
> Do many people think Phoenix is not a good choice?
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>
> On 17 November 2016 at 14:24, John Leach wrote:
>
> > Mich,
> >
> > In my experience, I have not found many happy users of Hive on top of HBase. For every query in Hive, the data has to be read from the filesystem into HBase and then serialized through an HBase scanner into Hive. Throughput along this path is pretty poor: when you read 1 million records, you actually read 1 million records in HBase and another 1 million records in Hive. There are significant resource management issues with this approach as well.
> >
> > At Splice Machine (open source), we have written an implementation that reads the store files directly from the filesystem (via embedded Spark) and then applies incremental deltas from HBase to maintain consistency. When we read 1 million records, Spark reads most of them directly from the filesystem. Spark also provides resource management and fair scheduling for those queries.
> >
> > We released some of our performance results at HBaseCon East in NYC. Here is the video: https://www.youtube.com/watch?v=cgIz-cjehJ0
> >
> > Regards,
> > John Leach
> >
> > > On Nov 17, 2016, at 6:09 AM, Mich Talebzadeh wrote:
> > >
> > > Hi,
> > >
> > > My approach to putting a SQL engine on top of HBase (excluding Spark and Phoenix for now) has been to create the HBase table as is, then create an EXTERNAL Hive table over it using org.apache.hadoop.hive.hbase.HBaseStorageHandler to interface with the HBase table.
> > >
> > > My reasoning for creating an external Hive table is to avoid accidentally dropping the HBase table etc. Is this a reasonable approach?
> > >
> > > That Hive table can then be used by a variety of tools such as Spark, Tableau and Zeppelin.
> > >
> > > Is this a viable solution, given that Hive seems to be preferred over Phoenix etc. on top of HBase?
> > >
> > > Thanks
> > >
> > > Dr Mich Talebzadeh

--
Thanks,
Gunnar

*If you think you can you can, if you think you can't you're right.*
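Mich's setup (an existing HBase table exposed to Hive as an EXTERNAL table through `HBaseStorageHandler`) is easiest to see as the DDL it produces. The Python below only builds that HiveQL string; it is a minimal sketch assuming a single STRING row key mapped to `:key` and one Hive column per HBase `family:qualifier`. The function name `hive_hbase_external_ddl` and the example table and column names are hypothetical; the storage handler class and the `hbase.columns.mapping` / `hbase.table.name` properties are the ones documented for Hive's HBase integration.

```python
from typing import List, Tuple


# Hypothetical helper: builds (but does not execute) the HiveQL that maps an
# existing HBase table into Hive as an EXTERNAL table.
def hive_hbase_external_ddl(
    hive_table: str,
    hbase_table: str,
    row_key_col: str,
    columns: List[Tuple[str, str, str]],  # (hive_name, hive_type, "family:qualifier")
) -> str:
    # Assumption: the HBase row key is exposed as a single STRING column.
    col_defs = [f"{row_key_col} STRING"] + [f"{name} {htype}" for name, htype, _ in columns]
    # ":key" maps the row key; each later entry maps the next Hive column, in order.
    mapping = ",".join([":key"] + [fq for _, _, fq in columns])
    lines = [
        f"CREATE EXTERNAL TABLE {hive_table} (",
        ",\n".join(f"  {c}" for c in col_defs),
        ")",
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'",
        f'WITH SERDEPROPERTIES ("hbase.columns.mapping" = "{mapping}")',
        f'TBLPROPERTIES ("hbase.table.name" = "{hbase_table}");',
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Example names below are made up for illustration.
    print(hive_hbase_external_ddl(
        "trades_hive", "trades", "rowkey",
        [("ticker", "STRING", "price_info:ticker"), ("price", "DOUBLE", "price_info:price")],
    ))
```

The printed statement would then be run in Hive (for example through Beeline). Because the table is declared EXTERNAL, dropping it in Hive is intended to leave the underlying HBase table in place, which is the safeguard Mich describes; it is worth confirming that behaviour on the Hive version in use.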

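John's description of reading store files directly from the filesystem and reconciling them with incremental deltas from HBase comes down to overlaying newer row versions on a base snapshot. The Python below is a toy illustration of that idea only, not Splice Machine's implementation: it assumes the store-file data and the HBase-side deltas are already loaded into in-memory structures keyed by row key, that the newest timestamp wins (ties go to the delta), and that `None` is a hypothetical delete marker. `merge_base_and_deltas` is a made-up name.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical cell layout: (timestamp, value); a value of None marks a delete.
Cell = Tuple[int, Optional[str]]


def merge_base_and_deltas(
    base: Dict[str, Cell],
    deltas: Iterable[Tuple[str, int, Optional[str]]],
) -> Dict[str, str]:
    """Overlay newer deltas on a base snapshot; the newest timestamp wins."""
    merged = dict(base)  # copy so the base snapshot itself is not mutated
    for row_key, ts, value in deltas:
        current = merged.get(row_key)
        # A delta replaces the existing cell only if it is at least as new.
        if current is None or ts >= current[0]:
            merged[row_key] = (ts, value)
    # Drop rows whose newest version is a delete marker.
    return {k: v for k, (_, v) in merged.items() if v is not None}
```

For example, with a base of `{"r1": (1, "a"), "r2": (1, "b")}` and deltas `[("r2", 2, "B"), ("r1", 4, None)]`, the merged view is `{"r2": "B"}`. In the setup John describes, most records come from the bulk filesystem read, so only the delta set has to pass through HBase.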