Re: How to Manage Data Architecture & Modeling for HBase

Michael Segel Mon, 06 Apr 2015 09:55:31 -0700

So this is the hardest thing to do… teach someone not to look at the data in 
terms of an RDBMS model.

And there aren’t any hard and fast rules… 

Let’s look at an example. 

You’re creating an application for Medicare/Medicaid to help identify potential 
abuses and fraud within the system. 

As part of your application, you’re going to store all relevant patient 
information and billing/claim records. 

Within your patient claim data, you have a procedure code. 

In a traditional RDBMS DW, you’d have a fact table for the claims, a separate 
lookup (dimension) table relating the code to its description and whatever 
other data, and a foreign key linking to it from within your patient record. 

But in HBase, your claim record would have all of this information with no 
reference to the lookup table. 

You would still want the lookup table in your application so that you could 
load it into memory when you’re writing or processing records, but you store 
the relevant lookup data in the record itself. The lookup table isn’t 
associated with your base claim data. (When the claim comes in, you may get 
only the diagnostic code, but during the ingestion process you’d want to add 
the relevant information surrounding that code. This could be anything from 
just the description to the entire lookup record.)
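A minimal sketch of that ingestion step, in Python for brevity. It assumes the lookup table fits in memory as a dict and that a claim arrives as a dict of column family → {qualifier → value}. The names `PROCEDURE_LOOKUP` and `enrich_claim`, the `claim`/`proc` family names, and the codes and descriptions are all hypothetical. Real HBase cells hold byte arrays, and strings are used here only for readability.

```python
from copy import deepcopy

# Hypothetical lookup table, loaded into memory once when ingestion starts.
# Codes and descriptions are illustrative placeholders, not real billing codes.
PROCEDURE_LOOKUP = {
    "P-001": {"description": "Example procedure A", "category": "outpatient"},
    "P-002": {"description": "Example procedure B", "category": "lab"},
}


def enrich_claim(raw_claim: dict, lookup: dict) -> dict:
    """Return a denormalized copy of raw_claim with the code's details copied in.

    raw_claim is assumed to look like {"claim": {"code": ..., ...}}.
    The matching lookup entry is copied into a hypothetical "proc" family,
    so reading the stored record later never needs the lookup table.
    """
    record = deepcopy(raw_claim)  # leave the caller's input untouched
    code = record["claim"]["code"]
    details = lookup.get(code)
    if details is None:
        # Unknown code: keep the raw code and flag it instead of failing.
        record["proc"] = {"description": "UNKNOWN"}
    else:
        record["proc"] = dict(details)  # copy the values, do not reference them
    return record
```

For example, `enrich_claim({"claim": {"code": "P-001", "amount": "125.00"}}, PROCEDURE_LOOKUP)` returns a record whose `proc` family carries the description and category alongside the original claim fields. The design choice is to copy rather than reference: a later read of the claim needs no second lookup.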

In theory, data in HBase should not be normalized.  The idea is that when I pull a 
record from the base table, most if not all of the data should be present. 
This is why a hierarchical model is a better fit. 

In terms of a DW, you don’t have a star schema.  In fact, you really shouldn’t 
have much of a schema beyond a single box, or a simple schema with a box and 
children representing the column families. 
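The following toy in-memory model in Python makes "a box with children for the column families" concrete. `ToyTable` is hypothetical and is not the HBase client API. It only mirrors the shape of the data: the table and its column families are the only fixed schema, qualifiers are free-form per row, and a single `get` on the row key returns the whole record with no join.

```python
from collections import defaultdict


class ToyTable:
    """Toy, in-memory stand-in for an HBase table (NOT the HBase API).

    The only static schema is the table and its column families;
    qualifiers inside a family are free-form and may differ per row.
    """

    def __init__(self, families):
        self.families = frozenset(families)
        # row key -> column family -> qualifier -> value
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row][family][qualifier] = value

    def get(self, row):
        # One lookup by row key returns the whole record: every family, every qualifier.
        return {fam: dict(cols) for fam, cols in self.rows.get(row, {}).items()}


# Usage: one row per patient, like one folder per patient.
patients = ToyTable(["info", "visits"])
patients.put("patient#42", "info", "name", "Jane Doe")
patients.put("patient#42", "visits", "2015-03-01", "checkup")
record = patients.get("patient#42")
```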

The best example that I can give comes from the BBC’s Sherlock Holmes series. 
In one episode, the villain has built a mental image of a library full of 
record cards, and that is how he accesses the information he uses to 
blackmail people. 

So think of a medical records filing cabinet. When you go to see the doctor, he 
pulls out your folder, and it contains everything he has on you and your 
medical history. It’s all there in one record. Inside the folder, your medical 
history is in reverse chronological order: each patient visit, lab result, 
etc. 
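HBase keeps qualifiers within a column family (and row keys within a table) sorted by their raw bytes. One common way to get the "newest first" folder ordering is therefore a reverse timestamp: store `Long.MAX_VALUE - timestamp` as a fixed-width big-endian value. The Python sketch below assumes that technique, which is a widely used pattern rather than something prescribed above. `reverse_ts` and the `history` dict standing in for one patient's row are hypothetical, and `sorted()` stands in for HBase's byte ordering.

```python
import struct

# Same value as Java's Long.MAX_VALUE.
JAVA_LONG_MAX = 2**63 - 1


def reverse_ts(epoch_millis: int) -> bytes:
    """Encode a timestamp so that ascending byte order means newest first.

    Fixed-width big-endian encoding makes byte-wise comparison agree with
    numeric comparison; subtracting from JAVA_LONG_MAX flips the order.
    """
    return struct.pack(">Q", JAVA_LONG_MAX - epoch_millis)


# Hypothetical stand-in for one patient's "history" column family:
# qualifier (reverse timestamp) -> event description.
events = [
    (1420070400000, "visit 2015-01-01"),
    (1425168000000, "lab result 2015-03-01"),
    (1422748800000, "visit 2015-02-01"),
]
history = {reverse_ts(ts): event for ts, event in events}

# HBase returns qualifiers in sorted byte order; sorted() mimics that here.
newest_first = [history[q] for q in sorted(history)]
```

With the events keyed this way, reading the row walks the folder in the order the doctor flips through it, most recent first.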

You have to remember that in HBase, you don’t want to join tables to get a 
result. Too slow and too cumbersome.  Remember, it’s a distributed database. 

This is why you have to look at systems from the ’80s like Revelation (based 
on Dick Pick’s OS/database model), UniVerse/U2 (Ascential/Informix/IBM), and 
other systems. 

HTH

-Mike

> On Apr 6, 2015, at 8:34 AM, Ben Liang <[email protected]> wrote:
> 
> Thank you for your prompt reply.
> 
> In my daily work, I mainly use Oracle DB to build data warehouses with 
> star-schema data modeling for financial and marketing analysis.
> Now I am trying to use HBase to do it. 
> 
> I have two questions:
> 1) Many tables from our ERP system must be loaded incrementally every day, 
> including some inserts and some updates. Is HBase appropriate for building a 
> data warehouse in this scenario?
> 2) Are there any case studies of enterprise BI solutions built on HBase? 
> 
> thanks.
> 
> 
> Regards,
> Ben Liang
> 
>> On Apr 6, 2015, at 20:27, Michael Segel <[email protected]> wrote:
>> 
>> Yeah. Jean-Marc is right. 
>> 
>> You have to think more in terms of a hierarchical model where you’re 
>> modeling records not relationships. 
>> 
>> Your model would look like a single ER box per record type. 
>> 
>> The HBase schema is very simple.  Tables, column families and that’s it for 
>> static structures.  Even then, column families tend to get misused. 
>> 
>> If you’re looking at a relational model… Phoenix or Splice Machine would 
>> allow you to do something… although Phoenix is still VERY primitive. 
>> (Do they take advantage of cell versioning like Splice Machine yet?) 
>> 
>> 
>> There are a couple of interesting points to consider if you wanted to 
>> create your own modeling tool / syntax (relationships)… 
>> 
>> 1) HBase is more 3D than the 2D RDBMS model, and in that respect similar 
>> to ORDBMSs. 
>> 2) You can join entities either on a foreign key or on a weaker type of 
>> relationship. 
>> 
>> HBase stores CLOBs/BLOBs in each cell. It’s all just byte arrays with a 
>> finite length, not to exceed the size of a region. So you could store an 
>> entire record as a CLOB within a cell. It’s because a single cell can 
>> represent multiple attributes of your object/record that you gain an 
>> additional dimension, and that is why you only need a single data type. 
>> 
>> HBase and Hadoop in general allow one to join orthogonal data sets that have 
>> a weak relationship.  So while you can still join sets against a FK which 
>> implies a relationship, you don’t have to do it. 
>> 
>> Imagine you wanted to find the average cost of a front-end collision, by 
>> car, for college-aged drivers, broken down by major. 
>> You would be joining insurance records against registrations for all of the 
>> universities in the US for students between the ages of 17 and 25. 
>> 
>> How would you model this when in fact neither defining attribute is a FK? 
>> (This is why you need a good secondary indexing implementation and not 
>> something brain-dead that wasn’t alcohol-induced. ;-) 
>> 
>> Does that make sense? 
>> 
>> Note: I don’t know whether companies like CCCis, Allstate, State Farm, or 
>> Progressive Insurance are doing anything like this. But they could.
>> 
>>> On Apr 5, 2015, at 7:54 PM, Jean-Marc Spaggiari <[email protected]> 
>>> wrote:
>>> 
>>> Not sure you want to ever do that... Designing an HBase application is far
>>> different from designing an RDBMS one. Not sure those tools fit well here.
>>> 
>>> What's your goal? Designing your HBase schema somewhere and then having the
>>> tool generate your HBase tables?
>>> 
>>> 2015-04-05 18:26 GMT-04:00 Ben Liang <[email protected]>:
>>> 
>>>> Hi all,
>>>>      Do you have any tools to manage Data Architecture & Modeling for
>>>> HBase (or Phoenix)? Can we use PowerDesigner or ERwin to do it?
>>>> 
>>>>      Please give me some advice.
>>>> 
>>>> Regards,
>>>> Ben Liang
>>>> 
>>>> 
>> 
>> The opinions expressed here are mine; while they may reflect a cognitive 
>> thought, that is purely accidental. 
>> Use at your own risk. 
>> Michael Segel
>> michael_segel (AT) hotmail.com
>> 
> 
