Thanks Sean. We have some internal requirements that will most likely require us to stick with the native HBase APIs. But the suggestion is still appreciated - I was not aware of that project.

2014-09-10 12:09 GMT-07:00 Sean Busbey <[email protected]>:

> Hi Stephen!
>
> Have you taken a look at Apache Gora? It uses Avro for its data model,
> which supports nested data structures, and can store in a variety of
> backing stores, including HBase.
>
> -Sean
>
> On Tue, Sep 9, 2014 at 4:20 PM, Stephen Boesch <[email protected]> wrote:
>
> > Thanks Michael, yes cells are byte[]; therefore, storing JSON or other
> > document structures is always possible. Our use cases include querying
> > individual elements in the structure - so that would require reconstituting
> > the documents and then parsing them for every row. We probably are not
> > headed in the direction of HBase for those use cases: but we are trying to
> > make that determination after having carefully considered the extent of the
> > mismatch.
> >
> > 2014-09-09 13:37 GMT-07:00 Michael Segel <[email protected]>:
> >
> > > You do realize that everything you store in Hbase are byte arrays, right?
> > > That is each cell is a blob.
> > >
> > > So you have the ability to create nested structures like… JSON records? ;-)
> > >
> > > So to your point. You can have a column A which represents a set of values.
> > >
> > > This is one reason why you shouldn’t think of HBase in terms of being
> > > relational. In fact for Hadoop, you really don’t want to think in terms of
> > > relational structures.
> > > Think more of Hierarchical.
> > >
> > > So yes, you can do what you want to do…
> > >
> > > HTH
> > >
> > > -Mike
> > >
> > > On Sep 8, 2014, at 10:06 PM, Stephen Boesch <[email protected]> wrote:
> > >
> > > > While I am aware that HBase does not have native support for nested
> > > > structures, surely there are some of you that have thought through this
> > > > use case carefully.
> > > >
> > > > Our particular use case is likely having single digit nested layers with
> > > > tens to hundreds of items in the lists at each level.
> > > >
> > > > An example would be a
> > > >
> > > > top Level 300 items
> > > > middle level : 1 to 100 items ("1 value" may indicate a single value as
> > > > opposed to a list)
> > > > third level: 1 to 50 items
> > > > fourth level 1 to 20 items
> > > >
> > > > The column names are likely known ahead of time- which may or may not
> > > > matter for hbase. We could model the above structure in a Parquet File or
> > > > in Hive (with nested struct's)- but we would like to consider whether
> > > > HBase.might also be an option.
>
> --
> Sean
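
For anyone reading this thread later: one alternative to storing a whole JSON blob per cell is to flatten each nested level into its own column qualifier, so a single element can be read without re-parsing the full document. Below is a minimal sketch of that idea. It is an illustration only, not something proposed in the thread. It assumes a row can be modeled as a plain dict of bytes to bytes (an in-memory stand-in, not the HBase client API), that keys contain no "." characters, and that list positions fit in four digits. The names flatten, to_row, read_element and read_subtree are hypothetical. Note that empty lists and dicts produce no cells in this scheme.

```python
import json


def flatten(doc, prefix=""):
    """Yield (qualifier, value_bytes) pairs, one per leaf of a nested dict/list."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from flatten(value, f"{prefix}{key}.")
    elif isinstance(doc, list):
        for index, value in enumerate(doc):
            # Zero-pad the index so qualifiers sort in list order (assumes < 10000 items).
            yield from flatten(value, f"{prefix}{index:04d}.")
    else:
        # Leaf: drop the trailing separator and store the scalar as JSON bytes.
        yield prefix[:-1], json.dumps(doc).encode("utf-8")


def to_row(doc):
    """Build a stand-in for one row: qualifier bytes -> cell bytes."""
    return {qualifier.encode("utf-8"): value for qualifier, value in flatten(doc)}


def read_element(row, path):
    """Read one nested element by its path without touching other cells."""
    return json.loads(row[path.encode("utf-8")])


def read_subtree(row, path_prefix):
    """Return every leaf under a path, in qualifier order."""
    wanted = (path_prefix + ".").encode("utf-8")
    return {
        qualifier.decode("utf-8"): json.loads(value)
        for qualifier, value in sorted(row.items())
        if qualifier.startswith(wanted)
    }


doc = {
    "top": [
        {"name": "a", "middle": [{"third": [1, 2]}]},
        {"name": "b", "middle": []},
    ]
}
row = to_row(doc)
print(read_element(row, "top.0000.middle.0000.third.0001"))
print(read_subtree(row, "top.0001"))
```

Against a real table, the subtree read would presumably map to a column-prefix lookup (HBase ships a ColumnPrefixFilter for this); whether that performs acceptably at 300 x 100 x 50 x 20 potential leaves per row is exactly the kind of mismatch the thread is trying to size up.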
