Hi Rashmi, I know you were in a hurry so the following may not meet your time constraints, but I found the following video to be helpful when considering HBase Schema design http://www.cloudera.com/resource/video-hbasecon-2012-hbasecon-2012/
Best, Adam Smieszny On Thu, Jun 28, 2012 at 8:25 PM, grashmi13 <[email protected]>wrote: > > Thanks Doug Meil for your valuable comments. > > Actually I need to provide some output to my manager by today.. so i asked > help from experts point of view. > > I will go thru with suggested maerial. But right now I dont have much time. > > Also, this is my perception towards hbase desgin by so far surfing and > reading blogs and other material. > > As written by you, tables must have one PK.... I have also taken one PK > only. Also, I havent used any secondary index here. > > I need your inputs if my change is correct or not. if not, what should be > hbase schema as per your perspective and expertise for above given tables? > > I used nested entities in last hbase table. if that single table design for > all the RDBMS tables, is correct or not? if not, what should be it as per > your perception? How do we program nested entities using hbase API? > > > design suggested by you, would help in into further lEARNING ANd further > designing. > > Thanks again, looking forward for your valuable inputs. > > Regards, > Rashmi > > Doug Meil-2 wrote: > > > > > > Hi there- > > > > I commend your enthusiasm for the Hbase project. For the "ground rules > of > > Hbase" you probably want to read this closelyŠ > > > > http://hbase.apache.org/book.html#datamodel > > > > Š as it covers things like having one PK per table, no secondary indexes, > > etc. With a solid understanding of these rules the table relationship > and > > search questions you're asking will become clearer. > > > > > > > > > > > > > > On 6/28/12 3:01 PM, "grashmi13" <[email protected]> wrote: > > > >> > >>Assets table is having numeric sequential ID and a one number out of > >>(1,2,3,4,5,6,7, 8, 9, 10) for AssetName. This is a master table with say > >>10 > >>rows only. > >> > >> > >>hmmm.. after some more surfing, i came to know that we have to manually > >>denormalize a relational DB. there are no preset rules for > >>denormalization. > >> > >>today I was doing a denormalization.. can you please confirm if rdbms to > >>hbase change i have done is correct or not: > >>These tables are from stock trading application: > >> > >>1. I have two tables... > >>Assets > >>========= > >>AssetID | assetName > >> > >> > >>second table is: > >> > >>INSTRUMENTS > >>============ > >>InstrumentID | assetID | Symbol | name | created | modifiedAt > >> > >> > >>While converting them into Hbase two questions were there in my mind: > >>1. Will there be two tables or one? If we have a web GUI and have two > >>screens, one for assets management and one for instuments management. In > >>instrument management, there would be a combo having assets. An asset can > >>be > >>assigned to multiple instruments. > >> > >>So if user will first populate all assets thru assets screen. So there > >>should be one master table for Assets. because it is not dependent on any > >>other entity. Am I right here? > >> > >>And now user will populate instruments thru instrument management screen. > >>Here another table would come into picture. "INSTRUMENTS". this table > >>would > >>be denormalized, as hbase doesnt support JOIN. Also, to maintain ACID > >>properties, all assets information should be duplicated here too. Now > >>question here, if I delete or update some asset in future, and there are > >>corresponding records in INSTRUMENTS table. what if asset is deletion or > >>updation goes successful for assets table but fails for instruments > table? > >>How we can maintain consistancy and integrity here? > >> > >> > >>HBASE table conversion here: > >> > >>ASSETS > >>================== > >>row key - AssetID > >>================ > >>Columns: CF1 - AssetName > >>================================================ > >> > >>INSTRUMENTS > >>===================================== > >>row key-- instrumentID(a sequential numeric value) > >>==================================== > >>InstrumentInfo(family) - > >>Symbol > >>name > >>AssetsInfo(family2) - > >>AssetID > >>AssetName > >>===================================== > >> > >> > >>Please confirm, two tables would be required or one only? If one only, > how > >>assets population and after that instruments population would be > possible? > >>If two, how we can maintain integrity and consistancy during assets > >>deletion\updation? > >> > >>Please confirm if my approach is correct? > >> > >>Yes, as read everywhere, in hbase, we must think about purpose of data in > >>our application before creating tables. So purpose here is, to > >>populate\delete\update assets and instruments thru web UI and then run a > >>job > >>which extract instruments by asset name. Also, assets by instrument name. > >> > >>====================================== > >> > >>After above conversion, I got stuck on one more point. "SELF-JOIN" > >> > >>Lets say, I have a table InstrumentsStock having 10 columns in it, out of > >>it > >>one primary key stockID and other normal column BaseCurrencyID, both > >>points > >>to an instrumentID. > >> > >>StockID is an ID of an instrument whose assetID is 1. and BaseCurrencyID > >>is > >>ID of an instrument whose assetID is 3. > >> > >>InstrumentsStock > >>======================= > >>stockID | AssetID | BaseCurrencyID | 10 more columns here > >>======================================== > >> > >>where StockID = InstrumentID of Instrument with AssetID 1 > >>and BaseCurrencyID = InstrumentID of Instrument with AssetID 3 > >> > >>StockID is primary key > >> > >> > >>E.g. I have below data in tables: > >>Instruments: > >>====================================== > >>InstrumentID | assetsID | Symbol | Name | Comment > >>====================================== > >>22 | 1 | WMT | Ins1| createdByRashmi > >>23 | 1 | HOG | Ins2| createdByRashmi > >>.... > >>.... > >>.... > >>40 | 3 | HPQ | Ins3| createdByRashmi > >>41 | 3 | KO | Ins4| createdByRashmi > >> > >>InstrumentsStock > >>========================================== > >>StockID | assetsID | BasecurrencyID | x | y | z > >>====================================== > >>22 | 1 | 41 | stock1 | y1 | z1 > >>23 | 1 | 40 | stock2 | y2 | z2 > >>24 | 1 | 41 | stock3 | y3 | z3 > >>25 | 1 | 40 | stock4 | y4 | z4 > >>... > >>... > >> > >> > >>Relationship between Instruments and InstrumentsStock is one to one. > >> > >>Question: retrieve all Stocks where baseCurrencyID is 40. > >> > >> > >>Change I made: Combined all three tables, Instruments, Assets, > >>InstrumentsStock and created one table "INstruments_Asset_Stocks" with > >>three > >>families (AssetInfo, InstrumentInfo, StockINfo) > >> > >> > >> > >>INstruments_Asset_Stocks > >>============================================================== > >>rowKey : | | | > >>InstrumentID | CF1: AssetInfo | CF2: StockInfo |cf3: InstrumentInfo > >>=============================================================== > >>| assetID | assetName | x, y, z, BaseCurrencyID | symbol, name > >>=============================================================== > >>22 | 1 | assetOne |stock1 | y1 | z1 | 41 | WMT, Ins1 > >>23 | 1 | assetOne |stock2 | y2 | z2 | 40 | HOG, Ins2 > >>24 | 1 | assetOne |stock3 | y3 | z3 | 41 | WMT1, Ins3 > >>25 | 1 | assetOne |stock4 | y4 | z4 | 40 | WMT2, Ins4 > >>40 | 3 | assetTwo | | HPQ, Ins5 > >>41 | 3 | assetTwo | | KO, Ins6 > >> > >> > >> > >>On basis of StockInfo and qualifier BaseCurrencyID, we can retrieve all > >>Stock relation detail from this table. On basis of AssetInfo family, > >>assets > >>related stocks, on basis of IntrumentInfo, related stock we can > >>retrieve... > >> > >>Is my understanding correct, or I am so far very wrong. > >> > >> > >>Here SELF-JOIn was used in relational schema... how do we achieve same > >>JOIn > >>in hbase, as above or some other way? > >> > >> > >>What is nested entities and how do we create nested entities using "Hbase > >>Shell" and using hbase java API. There is API to create family, column... > >>is > >>there API to create Nested Entities? > >> > >> > >>There is one more table: > >> > >>Same StockID and localCurrencyID are populated using InstrumentID only. > >>Data > >>of this table is like: > >>Listing > >>============================== > >>ID | stockID | LocalCurrencyID | location > >>============================== > >>L1| 22 | 51 | India > >>L1| 22 | 52 | Japan > >>L1| 22 | 40 | London > >>L1| 23 | 57 | USA > >>L1| 23 | 41 | Africa > >> > >>Here again self-join is used and many-many relationship... > >> > >>How do I integrate this table with above INstruments_Asset_Stocks table? > I > >>want to get All stocks details for given locaCurrency. > >> > >> > >>What I did: > >>INstruments_Asset_Stocks_Listing > >>=============================================== > >>rowKey - instrumentID_ListingID > >>=============================================== > >>Cf1: AssetINnfo (assetID, assetName) > >>Cf2: InstrumentInfo(symbol, name, comment) > >>cf3: StockInfo (X1, Y1, Z1, BaseCurrencyID, > >>nested_enties_for_LocalCurrency > >>) > >> > >>Nested entities for local currency would be having localCurrencyID as key > >>and location_listingID as value. > >> > >>Is this change correct to get all Stocks for given localCurrency. As read > >>somewhere, I have to use map\reduce to get this data from nested > entities? > >>is it? > >> > >>How do we implement nested entities? How(from which API or shell command) > >>client scans, searched these entities? Can I search it like I do for > >>column > >>qualifier BaseCurrencyID? > >> > >> > >>Your inputs would really help me in understanding this Hbase design. > >> > >> > >>Regards, > >>Rashmi > >> > >> > >>grashmi13 wrote: > >>> > >>> Hi, > >>> > >>> I want to change my RDBMS to HBASE schema, to be used with Hadoop > >>> platform. > >>> > >>> I have changed two RDBMS tables into HBASE tables. I have ignored > >>> constraints, indexes and foreign key relationship. Because I dont know > >>>how > >>> to convert these relationships in Hbase schema. > >>> > >>> Please confirm if the change I have made is correct? > >>> > >>> Relational Schema of tables are: > >>> ========================== > >>> > >>> TABLE : ASSTES > >>> ------------------------------------------------------------------ > >>> > >>> CREATE TABLE [dbo].[Assets]( > >>> [AssetId] [int] NOT NULL, > >>> [AssetName] [varchar](50) NOT NULL, > >>> CONSTRAINT [PK_Assets] PRIMARY KEY CLUSTERED > >>> ( > >>> [AssetId] ASC > >>> )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = > >>> OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] > >>> ) ON [PRIMARY] > >>> > > >>>------------------------------------------------------------------------- > >>>------------------------------------------------------ > >>> > >>> > >>> -------------------------------- > >>> TABLE: Instruments > >>> ---------------------------------- > >>> CREATE TABLE [dbo].[Instruments]( > >>> [InstrumentId] [int] IDENTITY(1,1) NOT NULL, > >>> [AssetId] [int] NOT NULL, > >>> [Symbol] [varchar](50) NOT NULL, > >>> [Name] [varchar](250) NOT NULL, > >>> [Created] [datetime] NOT NULL, > >>> [Modified] [datetime] NULL, > >>> [Comments] [varchar](250) NULL, > >>> CONSTRAINT [PK_Instruments_InstrumentId] PRIMARY KEY CLUSTERED > >>> ( > >>> [InstrumentId] ASC > >>> )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = > >>> OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY], > >>> CONSTRAINT [PK_Instruments_InstrumentIdAssetId] UNIQUE NONCLUSTERED > >>> ( > >>> [InstrumentId] ASC, > >>> [AssetId] ASC > >>> )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = > >>> OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] > >>> ) ON [PRIMARY] > >>> > >>> GO > >>> > >>> SET ANSI_PADDING ON > >>> GO > >>> > >>> ALTER TABLE [dbo].[Instruments] WITH CHECK ADD CONSTRAINT > >>> [FK_Instruments_Assets] FOREIGN KEY([AssetId]) > >>> REFERENCES [dbo].[Assets] ([AssetId]) > >>> ON UPDATE CASCADE > >>> GO > >>> > >>> ALTER TABLE [dbo].[Instruments] CHECK CONSTRAINT > [FK_Instruments_Assets] > >>> GO > >>> > >>> ALTER TABLE [dbo].[Instruments] ADD CONSTRAINT > >>>[DF_Instruments_Created_1] > >>> DEFAULT (getdate()) FOR [Created] > >>> GO > >>> > >>> ==================================== > >>> > >>> > >>> > >>> > >>> > >>> HBASE CONVERSION OF ASSTES AND INSTRUMENTS tables: > >>> ==================================================================== > >>> > >>> Assets table > >>> =============== > >>> RowKey -- AssetID > >>> ColumnFamilies (AssetName ) ---- ColumnName - (Name) > >>> > >>> > >>> > >>> Instruments table > >>> ============== > >>> RowKey - InstrumentID > >>> ColumnFamilies - (Content) --- Columns: Symbol, Name, Created, > Modified, > >>> Comments > >>> (Assets) --- Columns: AssetID > >>> > >>> > >>> Please confirm if given conversion is proper? > >>> > >>> Also, how do i convert constraints and indexes and foreign key > >>> relationship? > >>> > >>> > >>> > >>> > >>> Thanks in advance > >>> > >>> Regards, > >>> rashmi > >>> > >> > >>-- > >>View this message in context: > >> > http://old.nabble.com/RDBMS-to-HBASE-schema-migration-tp34087951p34087958. > >>html > >>Sent from the HBase User mailing list archive at Nabble.com. > >> > >> > > > > > > > > > > -- > View this message in context: > http://old.nabble.com/RDBMS-to-HBASE-schema-migration-tp34087951p34089135.html > Sent from the HBase User mailing list archive at Nabble.com. > > -- Adam Smieszny Cloudera | Systems Engineer | http://tiny.cloudera.com/about 917.830.4156 | http://www.linkedin.com/in/adamsmieszny
