Hi Demai,

If you think something helpful can be done within HBase, feel free to propose it on the JIRA.

Jerry

On Fri, Oct 21, 2016 at 2:41 PM, Mich Talebzadeh <[email protected]> wrote:

> Hi Demai,
>
> As I understand it, you want to use HBase as the real-time layer and the
> Hive data warehouse as the batch layer for analytics.
>
> In other words, ingest data in real time from the source into HBase and
> push that data into Hive on a recurring basis.
>
> If you partition your target ORC table by DtStamp and INSERT OVERWRITE
> into this table using Spark as the execution engine for Hive (as opposed
> to map-reduce), it should be pretty fast.
>
> Hive is going to get an in-memory database in the next release, so it is
> a perfect choice.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On 21 October 2016 at 22:28, Demai Ni <[email protected]> wrote:
>
> > Mich,
> >
> > thanks for the detailed instructions.
> >
> > While aware of the Hive method, I have a few questions/concerns:
> > 1) the Hive method is an "INSERT FROM SELECT", which usually does not
> > perform as well as a bulk load, though I am not familiar with the real
> > implementation
> > 2) I have another SQL-on-Hadoop engine working well with ORC files. So
> > if possible, I'd like to avoid the system dependency on Hive (one fewer
> > component to maintain).
> > 3) HBase has well-running back-end processes for replication
> > (HBASE-1295) and backup (HBASE-7912), so I am wondering whether
> > anything can be piggy-backed on them to deal with the day-to-day work.
> >
> > The goal is to have HBase as the OLTP front end (to receive data), and
> > the ORC files (with a SQL engine) as the OLAP end for
> > reporting/analytics. The ORC files will also serve as my backup in the
> > case of DR.
> >
> > Demai
> >
> > On Fri, Oct 21, 2016 at 1:57 PM, Mich Talebzadeh <[email protected]> wrote:
> >
> > > Create an external table in Hive on the HBase table. Pretty
> > > straightforward.
> > >
> > > hive> CREATE EXTERNAL TABLE marketDataHbase (key STRING, ticker STRING,
> > > timecreated STRING, price STRING)
> > > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> > > WITH SERDEPROPERTIES ("hbase.columns.mapping" =
> > > ":key,price_info:ticker,price_info:timecreated,price_info:price")
> > > TBLPROPERTIES ("hbase.table.name" = "marketDataHbase");
> > >
> > > Then create a normal table in Hive stored as ORC:
> > >
> > > CREATE TABLE IF NOT EXISTS marketData (
> > >     KEY string
> > >   , TICKER string
> > >   , TIMECREATED string
> > >   , PRICE float
> > > )
> > > PARTITIONED BY (DateStamp string)
> > > STORED AS ORC
> > > TBLPROPERTIES (
> > >   "orc.create.index"="true",
> > >   "orc.bloom.filter.columns"="KEY",
> > >   "orc.bloom.filter.fpp"="0.05",
> > >   "orc.compress"="SNAPPY",
> > >   "orc.stripe.size"="16777216",
> > >   "orc.row.index.stride"="10000" );
> > > --show create table marketData;
> > >
> > > -- Populate the target table
> > > INSERT OVERWRITE TABLE marketData PARTITION (DateStamp = "${TODAY}")
> > > SELECT
> > >     KEY
> > >   , TICKER
> > >   , TIMECREATED
> > >   , PRICE
> > > FROM marketDataHbase;
> > >
> > > Run this job as a cron every so often.
> > >
> > > HTH
> > >
> > > Dr Mich Talebzadeh
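[Editor's note: Mich's recipe ends with "run this job as a cron", which is left abstract. Below is a minimal driver sketch, assuming the `hive` CLI is on the PATH and the two tables above exist; the `--execute` flag and the script itself are my own invention, not from the thread. By default it only prints the statement (a dry run).]

```python
#!/usr/bin/env python
"""Daily loader sketch: rewrite today's partition of the ORC table from
the HBase-backed external table (table/column names come from the thread)."""
import datetime
import subprocess
import sys


def build_statement(today=None):
    # Mirrors the ${TODAY} placeholder in Mich's INSERT OVERWRITE example.
    today = today or datetime.date.today().isoformat()
    return (
        'INSERT OVERWRITE TABLE marketData '
        'PARTITION (DateStamp = "%s") '
        'SELECT KEY, TICKER, TIMECREATED, PRICE FROM marketDataHbase' % today
    )


if __name__ == "__main__":
    stmt = build_statement()
    if "--execute" in sys.argv:
        # Real run: requires a working Hive installation.
        subprocess.run(["hive", "-e", stmt], check=True)
    else:
        # Dry run: just show what would be submitted.
        print(stmt)
```

Scheduled from cron it might look like `0 1 * * * /path/to/load_marketdata.py --execute`, so each night overwrites (and thus makes idempotent) only that day's partition.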
> > > http://talebzadehmich.wordpress.com
> > >
> > > On 21 October 2016 at 21:48, Demai Ni <[email protected]> wrote:
> > >
> > > > hi,
> > > >
> > > > I am wondering whether there are existing methods to ETL HBase data
> > > > to ORC (or other open-source columnar) files?
> > > >
> > > > I understand that in Hive, "INSERT INTO Hive_ORC_Table SELECT * FROM
> > > > HIVE_HBase_Table" can probably get the job done. Is this the common
> > > > way to do so? Is the performance acceptable, and can it handle the
> > > > delta update in the case the HBase table changed?
> > > >
> > > > I did a bit of googling, and found this
> > > > https://community.hortonworks.com/questions/2632/loading-hbase-from-hive-orc-tables.html
> > > >
> > > > which is the other way around.
> > > >
> > > > Will it perform better (compared to the above Hive statement) to use
> > > > either the replication logic or a snapshot backup to generate ORC
> > > > files from HBase tables, with incremental update ability?
> > > >
> > > > I hope to have as few dependencies as possible. In the example of
> > > > ORC, it would only depend on Apache ORC's API, and not depend on
> > > > Hive.
> > > >
> > > > Demai
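[Editor's note: the "delta update" Demai asks about is usually done with a watermark: remember the newest cell timestamp already exported, then scan only newer rows (e.g. an HBase scan with a TimeRange filter) and append them to the current partition. The sketch below shows just that bookkeeping with the standard library; the `rows` tuples stand in for scan results, and the file-based watermark store is my own illustration, not an HBase or ORC feature.]

```python
"""Watermark bookkeeping sketch for incremental HBase-to-columnar export."""
import json
import os


def load_watermark(path):
    """Return the last exported cell timestamp (ms), or 0 on the first run."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["last_ts"]
    return 0


def select_delta(rows, watermark):
    """rows: iterable of (row_key, cell_ts, payload) tuples -- stand-ins for
    an HBase scan restricted with a TimeRange filter. Only rows written
    after the watermark are part of the next export batch."""
    return [r for r in rows if r[1] > watermark]


def save_watermark(path, exported_rows):
    """Persist the newest timestamp seen, so the next run starts after it."""
    if exported_rows:
        with open(path, "w") as f:
            json.dump({"last_ts": max(r[1] for r in exported_rows)}, f)
```

The exported batch would then be handed to whatever writes the columnar file (the Hive INSERT above, or Apache ORC's writer API directly). Note that deletes and in-place updates in HBase are not captured by a pure timestamp watermark; that is one reason the snapshot/replication route Demai mentions may still be worth exploring.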
