Hi there-

I commend your enthusiasm for the Hbase project.  For the "ground rules of
Hbase" you probably want to read this closelyŠ

http://hbase.apache.org/book.html#datamodel

Š as it covers things like having one PK per table, no secondary indexes,
etc.  With a solid understanding of these rules the table relationship and
search questions you're asking will become clearer.






On 6/28/12 3:01 PM, "grashmi13" <[email protected]> wrote:

>
>Assets table is having numeric sequential ID and a one number out of
>(1,2,3,4,5,6,7, 8, 9, 10) for AssetName. This is a master table with say
>10
>rows only. 
>
>
>hmmm.. after some more surfing, i came to know that we have to manually
>denormalize a relational DB. there are no preset rules for
>denormalization. 
>
>today I was doing a denormalization.. can you please confirm if rdbms to
>hbase change i have done is correct or not:
>These tables are from stock trading application:
>
>1. I have two tables...
>Assets 
>========= 
>AssetID | assetName
>
>
>second table is: 
>
>INSTRUMENTS 
>============ 
>InstrumentID | assetID | Symbol | name | created | modifiedAt
>
>
>While converting them into Hbase two questions were there in my mind:
>1. Will there be two tables or one? If we have a web GUI and have two
>screens, one for assets management and one for instuments management. In
>instrument management, there would be a combo having assets. An asset can
>be
>assigned to multiple instruments.
>
>So if user will first populate all assets thru assets screen. So there
>should be one master table for Assets. because it is not dependent on any
>other entity. Am I right here?
>
>And now user will populate instruments thru instrument management screen.
>Here another table would come into picture. "INSTRUMENTS". this table
>would
>be denormalized, as hbase doesnt support JOIN. Also, to maintain ACID
>properties, all assets information should be duplicated here too. Now
>question here, if I delete or update some asset in future, and there are
>corresponding records in INSTRUMENTS table. what if asset is deletion or
>updation goes successful for assets table but fails for instruments table?
>How we can maintain consistancy and integrity here?
>
>
>HBASE table conversion here:
>
>ASSETS 
>==================
>row key - AssetID 
>================ 
>Columns: CF1 - AssetName
>================================================
>
>INSTRUMENTS 
>=====================================
>row key-- instrumentID(a sequential numeric value)
>====================================
>InstrumentInfo(family) -
>Symbol 
>name 
>AssetsInfo(family2) -
>AssetID 
>AssetName 
>=====================================
>
>
>Please confirm, two tables would be required or one only? If one only, how
>assets population and after that instruments population would be possible?
>If two, how we can maintain integrity and consistancy during assets
>deletion\updation?
>
>Please confirm if my approach is correct?
>
>Yes, as read everywhere, in hbase, we must think about purpose of data in
>our application before creating tables. So purpose here is, to
>populate\delete\update assets and instruments thru web UI and then run a
>job
>which extract instruments by asset name. Also, assets by instrument name.
>
>======================================
>
>After above conversion, I got stuck on one more point. "SELF-JOIN"
>
>Lets say, I have a table InstrumentsStock having 10 columns in it, out of
>it
>one primary key stockID and other normal column BaseCurrencyID, both
>points
>to an instrumentID.
>
>StockID is an ID of an instrument whose assetID is 1. and BaseCurrencyID
>is
>ID of an instrument whose assetID is 3.
>
>InstrumentsStock 
>=======================
>stockID | AssetID | BaseCurrencyID | 10 more columns here
>========================================
>
>where StockID = InstrumentID of Instrument with AssetID 1
>and BaseCurrencyID = InstrumentID of Instrument with AssetID 3
>
>StockID is primary key
>
>
>E.g. I have below data in tables:
>Instruments: 
>======================================
>InstrumentID | assetsID | Symbol | Name | Comment
>======================================
>22 | 1 | WMT | Ins1| createdByRashmi
>23 | 1 | HOG | Ins2| createdByRashmi
>.... 
>.... 
>.... 
>40 | 3 | HPQ | Ins3| createdByRashmi
>41 | 3 | KO | Ins4| createdByRashmi
>
>InstrumentsStock 
>==========================================
>StockID | assetsID | BasecurrencyID | x | y | z
>======================================
>22 | 1 | 41 | stock1 | y1 | z1
>23 | 1 | 40 | stock2 | y2 | z2
>24 | 1 | 41 | stock3 | y3 | z3
>25 | 1 | 40 | stock4 | y4 | z4
>... 
>... 
>
>
>Relationship between Instruments and InstrumentsStock is one to one.
>
>Question: retrieve all Stocks where baseCurrencyID is 40.
>
>
>Change I made: Combined all three tables, Instruments, Assets,
>InstrumentsStock and created one table "INstruments_Asset_Stocks" with
>three
>families (AssetInfo, InstrumentInfo, StockINfo)
>
>
>
>INstruments_Asset_Stocks
>==============================================================
>rowKey : | | | 
>InstrumentID | CF1: AssetInfo | CF2: StockInfo |cf3: InstrumentInfo
>===============================================================
>| assetID | assetName | x, y, z, BaseCurrencyID | symbol, name
>===============================================================
>22 | 1 | assetOne |stock1 | y1 | z1 | 41 | WMT, Ins1
>23 | 1 | assetOne |stock2 | y2 | z2 | 40 | HOG, Ins2
>24 | 1 | assetOne |stock3 | y3 | z3 | 41 | WMT1, Ins3
>25 | 1 | assetOne |stock4 | y4 | z4 | 40 | WMT2, Ins4
>40 | 3 | assetTwo | | HPQ, Ins5
>41 | 3 | assetTwo | | KO, Ins6
>
>
>
>On basis of StockInfo and qualifier BaseCurrencyID, we can retrieve all
>Stock relation detail from this table. On basis of AssetInfo family,
>assets
>related stocks, on basis of IntrumentInfo, related stock we can
>retrieve... 
>
>Is my understanding correct, or I am so far very wrong.
>
>
>Here SELF-JOIn was used in relational schema... how do we achieve same
>JOIn
>in hbase, as above or some other way?
>
>
>What is nested entities and how do we create nested entities using "Hbase
>Shell" and using hbase java API. There is API to create family, column...
>is
>there API to create Nested Entities?
>
>
>There is one more table:
>
>Same StockID and localCurrencyID are populated using InstrumentID only.
>Data
>of this table is like:
>Listing 
>==============================
>ID | stockID | LocalCurrencyID | location
>==============================
>L1| 22 | 51 | India
>L1| 22 | 52 | Japan
>L1| 22 | 40 | London
>L1| 23 | 57 | USA 
>L1| 23 | 41 | Africa
> 
>Here again self-join is used and many-many relationship...
>
>How do I integrate this table with above INstruments_Asset_Stocks table? I
>want to get All stocks details for given locaCurrency.
>
>
>What I did: 
>INstruments_Asset_Stocks_Listing
>===============================================
>rowKey - instrumentID_ListingID
>===============================================
>Cf1: AssetINnfo (assetID, assetName)
>Cf2: InstrumentInfo(symbol, name, comment)
>cf3: StockInfo (X1, Y1, Z1, BaseCurrencyID,
>nested_enties_for_LocalCurrency
>) 
>
>Nested entities for local currency would be having localCurrencyID as key
>and location_listingID as value.
>
>Is this change correct to get all Stocks for given localCurrency. As read
>somewhere, I have to use map\reduce to get this data from nested entities?
>is it? 
>
>How do we implement nested entities? How(from which API or shell command)
>client scans, searched these entities? Can I search it like I do for
>column
>qualifier BaseCurrencyID?
>
>
>Your inputs would really help me in understanding this Hbase design.
>
>
>Regards, 
>Rashmi
>
>
>grashmi13 wrote:
>> 
>> Hi,
>> 
>> I want to change my RDBMS to HBASE schema, to be used with Hadoop
>> platform.
>> 
>> I have changed two RDBMS tables into HBASE tables. I have ignored
>> constraints, indexes and foreign key relationship. Because I dont know
>>how
>> to convert these relationships in Hbase schema.
>> 
>> Please confirm if the change I have made is correct?
>> 
>> Relational Schema of tables are:
>> ==========================
>> 
>> TABLE : ASSTES
>> ------------------------------------------------------------------
>> 
>> CREATE TABLE [dbo].[Assets](
>> [AssetId] [int] NOT NULL,
>> [AssetName] [varchar](50) NOT NULL,
>> CONSTRAINT [PK_Assets] PRIMARY KEY CLUSTERED
>> (
>> [AssetId] ASC
>> )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY =
>> OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
>> ) ON [PRIMARY]
>> 
>>-------------------------------------------------------------------------
>>------------------------------------------------------
>> 
>> 
>> --------------------------------
>> TABLE: Instruments
>> ----------------------------------
>> CREATE TABLE [dbo].[Instruments](
>> [InstrumentId] [int] IDENTITY(1,1) NOT NULL,
>> [AssetId] [int] NOT NULL,
>> [Symbol] [varchar](50) NOT NULL,
>> [Name] [varchar](250) NOT NULL,
>> [Created] [datetime] NOT NULL,
>> [Modified] [datetime] NULL,
>> [Comments] [varchar](250) NULL,
>> CONSTRAINT [PK_Instruments_InstrumentId] PRIMARY KEY CLUSTERED
>> (
>> [InstrumentId] ASC
>> )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY =
>> OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
>> CONSTRAINT [PK_Instruments_InstrumentIdAssetId] UNIQUE NONCLUSTERED
>> (
>> [InstrumentId] ASC,
>> [AssetId] ASC
>> )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY =
>> OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
>> ) ON [PRIMARY]
>> 
>> GO
>> 
>> SET ANSI_PADDING ON
>> GO
>> 
>> ALTER TABLE [dbo].[Instruments] WITH CHECK ADD CONSTRAINT
>> [FK_Instruments_Assets] FOREIGN KEY([AssetId])
>> REFERENCES [dbo].[Assets] ([AssetId])
>> ON UPDATE CASCADE
>> GO
>> 
>> ALTER TABLE [dbo].[Instruments] CHECK CONSTRAINT [FK_Instruments_Assets]
>> GO
>> 
>> ALTER TABLE [dbo].[Instruments] ADD CONSTRAINT
>>[DF_Instruments_Created_1]
>> DEFAULT (getdate()) FOR [Created]
>> GO
>> 
>> ====================================
>> 
>> 
>> 
>> 
>> 
>> HBASE CONVERSION OF ASSTES AND INSTRUMENTS tables:
>> ====================================================================
>> 
>> Assets table
>> ===============
>> RowKey       -- AssetID
>> ColumnFamilies       (AssetName      ) ---- ColumnName - (Name)
>> 
>> 
>> 
>> Instruments table
>> ==============
>> RowKey       - InstrumentID
>> ColumnFamilies - (Content) --- Columns: Symbol, Name, Created, Modified,
>> Comments
>> (Assets) --- Columns: AssetID
>> 
>> 
>> Please confirm if given conversion is proper?
>> 
>> Also, how do i convert constraints and indexes and foreign key
>> relationship?
>> 
>> 
>> 
>> 
>> Thanks in advance
>> 
>> Regards,
>> rashmi
>> 
>
>-- 
>View this message in context:
>http://old.nabble.com/RDBMS-to-HBASE-schema-migration-tp34087951p34087958.
>html
>Sent from the HBase User mailing list archive at Nabble.com.
>
>


Reply via email to