Re: RDBMS to HBASE schema migration

grashmi13 Thu, 28 Jun 2012 12:01:38 -0700

Assets table is having numeric sequential ID and a one number out of
(1,2,3,4,5,6,7, 8, 9, 10) for AssetName. This is a master table with say 10
rows only.

hmmm.. after some more surfing, i came to know that we have to manually
denormalize a relational DB. there are no preset rules for denormalization. 

today I was doing a denormalization.. can you please confirm if rdbms to
hbase change i have done is correct or not: 
These tables are from stock trading application: 

1. I have two tables... 
Assets 
========= 
AssetID | assetName 

second table is: 

INSTRUMENTS 
============ 
InstrumentID | assetID | Symbol | name | created | modifiedAt 

While converting them into Hbase two questions were there in my mind: 
1. Will there be two tables or one? If we have a web GUI and have two
screens, one for assets management and one for instuments management. In
instrument management, there would be a combo having assets. An asset can be
assigned to multiple instruments.

So if user will first populate all assets thru assets screen. So there
should be one master table for Assets. because it is not dependent on any
other entity. Am I right here? 

And now user will populate instruments thru instrument management screen.
Here another table would come into picture. "INSTRUMENTS". this table would
be denormalized, as hbase doesnt support JOIN. Also, to maintain ACID
properties, all assets information should be duplicated here too. Now
question here, if I delete or update some asset in future, and there are
corresponding records in INSTRUMENTS table. what if asset is deletion or
updation goes successful for assets table but fails for instruments table?
How we can maintain consistancy and integrity here? 

HBASE table conversion here: 

ASSETS 
================== 
row key - AssetID 
================ 
Columns: CF1 - AssetName 
================================================ 

INSTRUMENTS 
===================================== 
row key-- instrumentID(a sequential numeric value) 
==================================== 
InstrumentInfo(family) - 
Symbol 
name 
AssetsInfo(family2) - 
AssetID 
AssetName 
===================================== 

Please confirm, two tables would be required or one only? If one only, how
assets population and after that instruments population would be possible?
If two, how we can maintain integrity and consistancy during assets
deletion\updation? 

Please confirm if my approach is correct? 

Yes, as read everywhere, in hbase, we must think about purpose of data in
our application before creating tables. So purpose here is, to
populate\delete\update assets and instruments thru web UI and then run a job
which extract instruments by asset name. Also, assets by instrument name. 

======================================

After above conversion, I got stuck on one more point. "SELF-JOIN" 

Lets say, I have a table InstrumentsStock having 10 columns in it, out of it
one primary key stockID and other normal column BaseCurrencyID, both points
to an instrumentID. 

StockID is an ID of an instrument whose assetID is 1. and BaseCurrencyID is
ID of an instrument whose assetID is 3. 

InstrumentsStock 
======================= 
stockID | AssetID | BaseCurrencyID | 10 more columns here 
======================================== 

where StockID = InstrumentID of Instrument with AssetID 1 
and BaseCurrencyID = InstrumentID of Instrument with AssetID 3 

StockID is primary key 

E.g. I have below data in tables: 
Instruments: 
====================================== 
InstrumentID | assetsID | Symbol | Name | Comment 
====================================== 
22 | 1 | WMT | Ins1| createdByRashmi 
23 | 1 | HOG | Ins2| createdByRashmi 
.... 
.... 
.... 
40 | 3 | HPQ | Ins3| createdByRashmi 
41 | 3 | KO | Ins4| createdByRashmi 

InstrumentsStock 
========================================== 
StockID | assetsID | BasecurrencyID | x | y | z 
====================================== 
22 | 1 | 41 | stock1 | y1 | z1 
23 | 1 | 40 | stock2 | y2 | z2 
24 | 1 | 41 | stock3 | y3 | z3 
25 | 1 | 40 | stock4 | y4 | z4 
... 
... 

Relationship between Instruments and InstrumentsStock is one to one. 

Question: retrieve all Stocks where baseCurrencyID is 40. 

Change I made: Combined all three tables, Instruments, Assets,
InstrumentsStock and created one table "INstruments_Asset_Stocks" with three
families (AssetInfo, InstrumentInfo, StockINfo) 

INstruments_Asset_Stocks 
============================================================== 
rowKey : | | | 
InstrumentID | CF1: AssetInfo | CF2: StockInfo |cf3: InstrumentInfo 
=============================================================== 
| assetID | assetName | x, y, z, BaseCurrencyID | symbol, name 
=============================================================== 
22 | 1 | assetOne |stock1 | y1 | z1 | 41 | WMT, Ins1 
23 | 1 | assetOne |stock2 | y2 | z2 | 40 | HOG, Ins2 
24 | 1 | assetOne |stock3 | y3 | z3 | 41 | WMT1, Ins3 
25 | 1 | assetOne |stock4 | y4 | z4 | 40 | WMT2, Ins4 
40 | 3 | assetTwo | | HPQ, Ins5 
41 | 3 | assetTwo | | KO, Ins6 

On basis of StockInfo and qualifier BaseCurrencyID, we can retrieve all
Stock relation detail from this table. On basis of AssetInfo family, assets
related stocks, on basis of IntrumentInfo, related stock we can retrieve... 

Is my understanding correct, or I am so far very wrong. 

Here SELF-JOIn was used in relational schema... how do we achieve same JOIn
in hbase, as above or some other way? 

What is nested entities and how do we create nested entities using "Hbase
Shell" and using hbase java API. There is API to create family, column... is
there API to create Nested Entities? 

There is one more table: 

Same StockID and localCurrencyID are populated using InstrumentID only. Data
of this table is like: 
Listing 
============================== 
ID | stockID | LocalCurrencyID | location 
============================== 
L1| 22 | 51 | India 
L1| 22 | 52 | Japan 
L1| 22 | 40 | London 
L1| 23 | 57 | USA 
L1| 23 | 41 | Africa

Here again self-join is used and many-many relationship... 

How do I integrate this table with above INstruments_Asset_Stocks table? I
want to get All stocks details for given locaCurrency. 

What I did: 
INstruments_Asset_Stocks_Listing 
=============================================== 
rowKey - instrumentID_ListingID 
=============================================== 
Cf1: AssetINnfo (assetID, assetName) 
Cf2: InstrumentInfo(symbol, name, comment) 
cf3: StockInfo (X1, Y1, Z1, BaseCurrencyID, nested_enties_for_LocalCurrency
) 

Nested entities for local currency would be having localCurrencyID as key
and location_listingID as value. 

Is this change correct to get all Stocks for given localCurrency. As read
somewhere, I have to use map\reduce to get this data from nested entities?
is it? 

How do we implement nested entities? How(from which API or shell command)
client scans, searched these entities? Can I search it like I do for column
qualifier BaseCurrencyID? 

Your inputs would really help me in understanding this Hbase design. 

Regards, 
Rashmi

grashmi13 wrote:
> 
> Hi,
> 
> I want to change my RDBMS to HBASE schema, to be used with Hadoop
> platform.
> 
> I have changed two RDBMS tables into HBASE tables. I have ignored
> constraints, indexes and foreign key relationship. Because I dont know how
> to convert these relationships in Hbase schema.
> 
> Please confirm if the change I have made is correct?
> 
> Relational Schema of tables are:
> ==========================
> 
> TABLE : ASSTES
> ------------------------------------------------------------------
> 
> CREATE TABLE [dbo].[Assets](
> [AssetId] [int] NOT NULL,
> [AssetName] [varchar](50) NOT NULL,
> CONSTRAINT [PK_Assets] PRIMARY KEY CLUSTERED 
> (
> [AssetId] ASC
> )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY =
> OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
> ) ON [PRIMARY]
> -------------------------------------------------------------------------------------------------------------------------------
> 
> 
> --------------------------------
> TABLE: Instruments
> ----------------------------------
> CREATE TABLE [dbo].[Instruments](
> [InstrumentId] [int] IDENTITY(1,1) NOT NULL,
> [AssetId] [int] NOT NULL,
> [Symbol] [varchar](50) NOT NULL,
> [Name] [varchar](250) NOT NULL,
> [Created] [datetime] NOT NULL,
> [Modified] [datetime] NULL,
> [Comments] [varchar](250) NULL,
> CONSTRAINT [PK_Instruments_InstrumentId] PRIMARY KEY CLUSTERED 
> (
> [InstrumentId] ASC
> )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY =
> OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
> CONSTRAINT [PK_Instruments_InstrumentIdAssetId] UNIQUE NONCLUSTERED 
> (
> [InstrumentId] ASC,
> [AssetId] ASC
> )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY =
> OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
> ) ON [PRIMARY]
> 
> GO
> 
> SET ANSI_PADDING ON
> GO
> 
> ALTER TABLE [dbo].[Instruments] WITH CHECK ADD CONSTRAINT
> [FK_Instruments_Assets] FOREIGN KEY([AssetId])
> REFERENCES [dbo].[Assets] ([AssetId])
> ON UPDATE CASCADE
> GO
> 
> ALTER TABLE [dbo].[Instruments] CHECK CONSTRAINT [FK_Instruments_Assets]
> GO
> 
> ALTER TABLE [dbo].[Instruments] ADD CONSTRAINT [DF_Instruments_Created_1]
> DEFAULT (getdate()) FOR [Created]
> GO
> 
> ====================================
> 
> 
> 
> 
> 
> HBASE CONVERSION OF ASSTES AND INSTRUMENTS tables:
> ====================================================================
> 
> Assets table
> ===============
> RowKey        -- AssetID
> ColumnFamilies        (AssetName      ) ---- ColumnName - (Name)
> 
> 
> 
> Instruments table
> ==============
> RowKey        - InstrumentID
> ColumnFamilies - (Content) --- Columns: Symbol, Name, Created, Modified,
> Comments
> (Assets) ---  Columns: AssetID
> 
> 
> Please confirm if given conversion is proper?
> 
> Also, how do i convert constraints and indexes and foreign key
> relationship?
> 
> 
> 
> 
> Thanks in advance
> 
> Regards,
> rashmi
> 

-- 
View this message in context: 
http://old.nabble.com/RDBMS-to-HBASE-schema-migration-tp34087951p34087958.html
Sent from the HBase User mailing list archive at Nabble.com.

Re: RDBMS to HBASE schema migration

Reply via email to