Hi there- I commend your enthusiasm for the Hbase project. For the "ground rules of Hbase" you probably want to read this closelyŠ
http://hbase.apache.org/book.html#datamodel Š as it covers things like having one PK per table, no secondary indexes, etc. With a solid understanding of these rules the table relationship and search questions you're asking will become clearer. On 6/28/12 3:01 PM, "grashmi13" <[email protected]> wrote: > >Assets table is having numeric sequential ID and a one number out of >(1,2,3,4,5,6,7, 8, 9, 10) for AssetName. This is a master table with say >10 >rows only. > > >hmmm.. after some more surfing, i came to know that we have to manually >denormalize a relational DB. there are no preset rules for >denormalization. > >today I was doing a denormalization.. can you please confirm if rdbms to >hbase change i have done is correct or not: >These tables are from stock trading application: > >1. I have two tables... >Assets >========= >AssetID | assetName > > >second table is: > >INSTRUMENTS >============ >InstrumentID | assetID | Symbol | name | created | modifiedAt > > >While converting them into Hbase two questions were there in my mind: >1. Will there be two tables or one? If we have a web GUI and have two >screens, one for assets management and one for instuments management. In >instrument management, there would be a combo having assets. An asset can >be >assigned to multiple instruments. > >So if user will first populate all assets thru assets screen. So there >should be one master table for Assets. because it is not dependent on any >other entity. Am I right here? > >And now user will populate instruments thru instrument management screen. >Here another table would come into picture. "INSTRUMENTS". this table >would >be denormalized, as hbase doesnt support JOIN. Also, to maintain ACID >properties, all assets information should be duplicated here too. Now >question here, if I delete or update some asset in future, and there are >corresponding records in INSTRUMENTS table. what if asset is deletion or >updation goes successful for assets table but fails for instruments table? >How we can maintain consistancy and integrity here? > > >HBASE table conversion here: > >ASSETS >================== >row key - AssetID >================ >Columns: CF1 - AssetName >================================================ > >INSTRUMENTS >===================================== >row key-- instrumentID(a sequential numeric value) >==================================== >InstrumentInfo(family) - >Symbol >name >AssetsInfo(family2) - >AssetID >AssetName >===================================== > > >Please confirm, two tables would be required or one only? If one only, how >assets population and after that instruments population would be possible? >If two, how we can maintain integrity and consistancy during assets >deletion\updation? > >Please confirm if my approach is correct? > >Yes, as read everywhere, in hbase, we must think about purpose of data in >our application before creating tables. So purpose here is, to >populate\delete\update assets and instruments thru web UI and then run a >job >which extract instruments by asset name. Also, assets by instrument name. > >====================================== > >After above conversion, I got stuck on one more point. "SELF-JOIN" > >Lets say, I have a table InstrumentsStock having 10 columns in it, out of >it >one primary key stockID and other normal column BaseCurrencyID, both >points >to an instrumentID. > >StockID is an ID of an instrument whose assetID is 1. and BaseCurrencyID >is >ID of an instrument whose assetID is 3. > >InstrumentsStock >======================= >stockID | AssetID | BaseCurrencyID | 10 more columns here >======================================== > >where StockID = InstrumentID of Instrument with AssetID 1 >and BaseCurrencyID = InstrumentID of Instrument with AssetID 3 > >StockID is primary key > > >E.g. I have below data in tables: >Instruments: >====================================== >InstrumentID | assetsID | Symbol | Name | Comment >====================================== >22 | 1 | WMT | Ins1| createdByRashmi >23 | 1 | HOG | Ins2| createdByRashmi >.... >.... >.... >40 | 3 | HPQ | Ins3| createdByRashmi >41 | 3 | KO | Ins4| createdByRashmi > >InstrumentsStock >========================================== >StockID | assetsID | BasecurrencyID | x | y | z >====================================== >22 | 1 | 41 | stock1 | y1 | z1 >23 | 1 | 40 | stock2 | y2 | z2 >24 | 1 | 41 | stock3 | y3 | z3 >25 | 1 | 40 | stock4 | y4 | z4 >... >... > > >Relationship between Instruments and InstrumentsStock is one to one. > >Question: retrieve all Stocks where baseCurrencyID is 40. > > >Change I made: Combined all three tables, Instruments, Assets, >InstrumentsStock and created one table "INstruments_Asset_Stocks" with >three >families (AssetInfo, InstrumentInfo, StockINfo) > > > >INstruments_Asset_Stocks >============================================================== >rowKey : | | | >InstrumentID | CF1: AssetInfo | CF2: StockInfo |cf3: InstrumentInfo >=============================================================== >| assetID | assetName | x, y, z, BaseCurrencyID | symbol, name >=============================================================== >22 | 1 | assetOne |stock1 | y1 | z1 | 41 | WMT, Ins1 >23 | 1 | assetOne |stock2 | y2 | z2 | 40 | HOG, Ins2 >24 | 1 | assetOne |stock3 | y3 | z3 | 41 | WMT1, Ins3 >25 | 1 | assetOne |stock4 | y4 | z4 | 40 | WMT2, Ins4 >40 | 3 | assetTwo | | HPQ, Ins5 >41 | 3 | assetTwo | | KO, Ins6 > > > >On basis of StockInfo and qualifier BaseCurrencyID, we can retrieve all >Stock relation detail from this table. On basis of AssetInfo family, >assets >related stocks, on basis of IntrumentInfo, related stock we can >retrieve... > >Is my understanding correct, or I am so far very wrong. > > >Here SELF-JOIn was used in relational schema... how do we achieve same >JOIn >in hbase, as above or some other way? > > >What is nested entities and how do we create nested entities using "Hbase >Shell" and using hbase java API. There is API to create family, column... >is >there API to create Nested Entities? > > >There is one more table: > >Same StockID and localCurrencyID are populated using InstrumentID only. >Data >of this table is like: >Listing >============================== >ID | stockID | LocalCurrencyID | location >============================== >L1| 22 | 51 | India >L1| 22 | 52 | Japan >L1| 22 | 40 | London >L1| 23 | 57 | USA >L1| 23 | 41 | Africa > >Here again self-join is used and many-many relationship... > >How do I integrate this table with above INstruments_Asset_Stocks table? I >want to get All stocks details for given locaCurrency. > > >What I did: >INstruments_Asset_Stocks_Listing >=============================================== >rowKey - instrumentID_ListingID >=============================================== >Cf1: AssetINnfo (assetID, assetName) >Cf2: InstrumentInfo(symbol, name, comment) >cf3: StockInfo (X1, Y1, Z1, BaseCurrencyID, >nested_enties_for_LocalCurrency >) > >Nested entities for local currency would be having localCurrencyID as key >and location_listingID as value. > >Is this change correct to get all Stocks for given localCurrency. As read >somewhere, I have to use map\reduce to get this data from nested entities? >is it? > >How do we implement nested entities? How(from which API or shell command) >client scans, searched these entities? Can I search it like I do for >column >qualifier BaseCurrencyID? > > >Your inputs would really help me in understanding this Hbase design. > > >Regards, >Rashmi > > >grashmi13 wrote: >> >> Hi, >> >> I want to change my RDBMS to HBASE schema, to be used with Hadoop >> platform. >> >> I have changed two RDBMS tables into HBASE tables. I have ignored >> constraints, indexes and foreign key relationship. Because I dont know >>how >> to convert these relationships in Hbase schema. >> >> Please confirm if the change I have made is correct? >> >> Relational Schema of tables are: >> ========================== >> >> TABLE : ASSTES >> ------------------------------------------------------------------ >> >> CREATE TABLE [dbo].[Assets]( >> [AssetId] [int] NOT NULL, >> [AssetName] [varchar](50) NOT NULL, >> CONSTRAINT [PK_Assets] PRIMARY KEY CLUSTERED >> ( >> [AssetId] ASC >> )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = >> OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] >> ) ON [PRIMARY] >> >>------------------------------------------------------------------------- >>------------------------------------------------------ >> >> >> -------------------------------- >> TABLE: Instruments >> ---------------------------------- >> CREATE TABLE [dbo].[Instruments]( >> [InstrumentId] [int] IDENTITY(1,1) NOT NULL, >> [AssetId] [int] NOT NULL, >> [Symbol] [varchar](50) NOT NULL, >> [Name] [varchar](250) NOT NULL, >> [Created] [datetime] NOT NULL, >> [Modified] [datetime] NULL, >> [Comments] [varchar](250) NULL, >> CONSTRAINT [PK_Instruments_InstrumentId] PRIMARY KEY CLUSTERED >> ( >> [InstrumentId] ASC >> )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = >> OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY], >> CONSTRAINT [PK_Instruments_InstrumentIdAssetId] UNIQUE NONCLUSTERED >> ( >> [InstrumentId] ASC, >> [AssetId] ASC >> )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = >> OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] >> ) ON [PRIMARY] >> >> GO >> >> SET ANSI_PADDING ON >> GO >> >> ALTER TABLE [dbo].[Instruments] WITH CHECK ADD CONSTRAINT >> [FK_Instruments_Assets] FOREIGN KEY([AssetId]) >> REFERENCES [dbo].[Assets] ([AssetId]) >> ON UPDATE CASCADE >> GO >> >> ALTER TABLE [dbo].[Instruments] CHECK CONSTRAINT [FK_Instruments_Assets] >> GO >> >> ALTER TABLE [dbo].[Instruments] ADD CONSTRAINT >>[DF_Instruments_Created_1] >> DEFAULT (getdate()) FOR [Created] >> GO >> >> ==================================== >> >> >> >> >> >> HBASE CONVERSION OF ASSTES AND INSTRUMENTS tables: >> ==================================================================== >> >> Assets table >> =============== >> RowKey -- AssetID >> ColumnFamilies (AssetName ) ---- ColumnName - (Name) >> >> >> >> Instruments table >> ============== >> RowKey - InstrumentID >> ColumnFamilies - (Content) --- Columns: Symbol, Name, Created, Modified, >> Comments >> (Assets) --- Columns: AssetID >> >> >> Please confirm if given conversion is proper? >> >> Also, how do i convert constraints and indexes and foreign key >> relationship? >> >> >> >> >> Thanks in advance >> >> Regards, >> rashmi >> > >-- >View this message in context: >http://old.nabble.com/RDBMS-to-HBASE-schema-migration-tp34087951p34087958. >html >Sent from the HBase User mailing list archive at Nabble.com. > >
