structured data knowledge store in HBase

Phillip Nelson Wed, 03 Nov 2010 22:17:18 -0700

Hi Guys,

Thanks in advance for reading through this and for any feedback you can give. 
I'm relatively new to HBase, but I'm really enjoying working with it.


I'm working on a project to store a large amount of simple structured data in 
HBase. Basically, it's a subset of OWL+RDF: each object has a set of types and 
a set of predicate (property) => value mappings.

My first design is a single table, public_objects, with two column families: 
t: for types and p: for properties.
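To make the layout concrete, here is a minimal sketch (plain Python, no HBase client involved) of how one object could be turned into the cells of a public_objects row. The helper names `to_cells` and `from_value` are hypothetical. The o:/l: prefixes, the b"1" type marker, and the repr()-serialized list mirror the example row in the shell output. How classes would be marked is not stated, so only objects and literals are modeled.

```python
import ast

# Prefixes that tell value kinds apart, mirroring the "o:" (object) and
# "l:" (literal) namespacing in the example row. Hypothetical constants.
OBJECT = "o:"
LITERAL = "l:"


def to_cells(type_uris, properties):
    """Build the {column: value} cells of one public_objects row.

    type_uris:  type URIs; each becomes a t:o:<uri> column holding b"1".
    properties: property URI -> list of (prefix, value) pairs. A single
                value is stored as-is; several values are stored as the
                repr() of a Python list, like the musicBy cell.
    """
    cells = {}
    for uri in type_uris:
        cells["t:" + OBJECT + uri] = b"1"
    for prop_uri, values in properties.items():
        encoded = [prefix + value for prefix, value in values]
        if not encoded:
            continue  # nothing to store for this property
        text = encoded[0] if len(encoded) == 1 else repr(encoded)
        cells["p:" + prop_uri] = text.encode("utf-8")
    return cells


def from_value(raw):
    """Decode a p: cell back into a list of prefixed values.

    Every stored value starts with a prefix such as "o:" or "l:", so a
    cell that starts with "[" must be a serialized list.
    """
    text = raw.decode("utf-8")
    return ast.literal_eval(text) if text.startswith("[") else [text]


# The !Hero row from the shell output, rebuilt with the helpers above.
hero = to_cells(
    ["http://dbpedia.org/ontology/Musical", "http://dbpedia.org/ontology/Work"],
    {
        "http://dbpedia.org/ontology/basedOn": [
            (OBJECT, "http://dbpedia.org/resource/Bible")],
        "http://dbpedia.org/ontology/musicBy": [
            (OBJECT, "http://dbpedia.org/resource/Eddie_DeGarmo"),
            (OBJECT, "http://dbpedia.org/resource/Farrell_and_Farrell")],
        "http://xmlns.com/foaf/0.1/name": [(LITERAL, "!Hero")],
    },
)
```

Decoding the musicBy cell with `from_value` requires parsing the whole list, which is the cost question 1 asks about.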

In the current setup, here's an example of a Wikipedia object (with some 
formatting):

hbase(main):005:0> get 'public_objects', 'http://dbpedia.org/resource/%21Hero'
COLUMN                                    CELL
 p:http://dbpedia.org/ontology/basedOn    value=o:http://dbpedia.org/resource/Bible
 p:http://dbpedia.org/ontology/musicBy    value=['o:http://dbpedia.org/resource/Eddie_DeGarmo', 'o:http://dbpedia.org/resource/Farrell_and_Farrell']
 p:http://xmlns.com/foaf/0.1/name         value=l:!Hero
 t:o:http://dbpedia.org/ontology/Musical  value=1
 t:o:http://dbpedia.org/ontology/Work     value=1

This is great because I can quickly scan for all musicals 
(scan 'public_objects', {COLUMNS => 't:o:http://dbpedia.org/ontology/Musical'}). 
But it's definitely not good enough yet, so I have a few questions:
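The COLUMNS restriction in that scan returns only the rows that have a cell in the given column, which is what lets the t: marker columns act as a type lookup. Below is a rough in-memory model of that behavior. The table dict, its second (invented) row, and `scan_column` are all hypothetical.

```python
# Hypothetical in-memory stand-in for the table: row key -> {column: value}.
# Only the !Hero row comes from the post; the second row is invented.
TABLE = {
    "http://dbpedia.org/resource/%21Hero": {
        "t:o:http://dbpedia.org/ontology/Musical": b"1",
        "t:o:http://dbpedia.org/ontology/Work": b"1",
        "p:http://xmlns.com/foaf/0.1/name": b"l:!Hero",
    },
    "http://example.org/resource/Some_Book": {
        "t:o:http://dbpedia.org/ontology/Work": b"1",
    },
}


def scan_column(table, column):
    """Yield (row_key, value) for rows that have a cell in `column`.

    Like a scan restricted with COLUMNS, rows lacking the column are
    skipped entirely and results come back in row-key order (plain string
    order here, which matches HBase's byte order for ASCII keys).
    """
    for row_key in sorted(table):
        row = table[row_key]
        if column in row:
            yield row_key, row[column]
```

Note that every row is still visited. The column restriction narrows what is returned, not which rows are walked. In HBase, likewise, such a scan covers the whole key range (only the t: family needs to be read), so it behaves like a filtered full scan rather than an index lookup.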

1. When a property has several values (like musicBy above), I serialize the 
Python list and put it into the value. I don't think this will be too expensive 
to match in my mappers, especially if I don't have to deserialize it, but is 
there a better way to model this type of relationship? I also need to 
differentiate between objects, classes, and literals (hence the hack-ish o:/l: 
namespacing of URIs).

2. Ideally I want to be able to scan on types AND properties and feed the 
values into my MapReduce job. Is there a good way to do this? I was thinking of 
concatenating the type and the property into the p: column qualifier (i.e. 
p:http://dbpedia.org/ontology/Work_http://xmlns.com/foaf/0.1/name), but that 
would have to be repeated for each property, once per type.

3. Somewhat unrelated to schema design: how do secondary table indexes fit into 
this? I don't see how they are accessed via the Thrift interface.
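For questions 1 and 2, one alternative worth comparing (an assumption, not the layout used above) is to move each value into the column qualifier itself, as p:<property>|<prefixed value> with an empty cell. No list then has to be serialized or parsed, and the o:/l: prefix still marks the value kind. The "|" separator and the helper names are hypothetical. The type-AND-property test runs per row, which in a MapReduce job would happen in the mapper. In HBase versions that provide it, a server-side filter such as ColumnPrefixFilter could narrow this further.

```python
# Hypothetical separator between the property URI and the value inside the
# qualifier. "|" is not a legal unescaped URI character, so it should not
# collide with property URIs (the value after it may contain anything).
SEP = "|"


def value_cells(prop_uri, prefixed_values):
    """One p: qualifier per value; the cell value itself is left empty."""
    return {"p:" + prop_uri + SEP + v: b"" for v in prefixed_values}


def values_of(row, prop_uri):
    """Recover the prefixed values of prop_uri from a row's qualifiers."""
    prefix = "p:" + prop_uri + SEP
    return sorted(col[len(prefix):] for col in row if col.startswith(prefix))


def rows_with_type_and_property(table, type_column, prop_uri):
    """Yield (row_key, values) for rows that have the t: column AND at
    least one value for prop_uri, in row-key order."""
    for row_key in sorted(table):
        row = table[row_key]
        if type_column not in row:
            continue
        values = values_of(row, prop_uri)
        if values:
            yield row_key, values


# Hypothetical two-row table: the !Hero row re-encoded in this layout,
# plus an invented musical with no musicBy values.
MUSICAL = "t:o:http://dbpedia.org/ontology/Musical"
MUSIC_BY = "http://dbpedia.org/ontology/musicBy"
hero = {MUSICAL: b"1"}
hero.update(value_cells(MUSIC_BY, [
    "o:http://dbpedia.org/resource/Eddie_DeGarmo",
    "o:http://dbpedia.org/resource/Farrell_and_Farrell",
]))
TABLE = {
    "http://dbpedia.org/resource/%21Hero": hero,
    "http://example.org/resource/Untitled_Musical": {MUSICAL: b"1"},
}
```

The trade-off is wider rows and longer qualifiers in exchange for not having to deserialize values. The type and property stay in separate columns instead of being concatenated for every combination.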

Thanks again,
Phillip Nelson
