Deploy filter on per-table basis
Hi, According to the HBase Definitive Guide, I need to change hbase-env.sh to put my jars on HBase's classpath, and then restart the HBase daemons to make my customized filters effective. The Coprocessor loading section also mentions that coprocessors can be set up and loaded on a per-table basis. Is the same possible for filters? The main problem is that I don't have HBase admin permissions to make the change. -- Jianshi Huang LinkedIn: jianshi Twitter: @jshuang Github Blog: http://huangjs.github.com/
Re: Deploy filter on per-table basis
Please take a look at HBASE-1936. Cheers On Mon, Sep 8, 2014 at 11:26 PM, Jianshi Huang jianshi.hu...@gmail.com wrote: ...
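For context, HBASE-1936 adds dynamic jar loading: region servers pick up filter/coprocessor jars from a configurable filesystem directory without a restart. A minimal sketch, assuming a release that includes HBASE-1936; the jar name and path below are hypothetical:

  # copy the custom filter jar into the dynamic jar directory
  # (by default ${hbase.rootdir}/lib); region servers load it on demand,
  # no daemon restart required
  hadoop fs -put my-custom-filters.jar /hbase/lib

  <!-- hbase-site.xml: the directory is controlled by this property -->
  <property>
    <name>hbase.dynamic.jars.dir</name>
    <value>${hbase.rootdir}/lib</value>
  </property>

Note that writing to that directory may still require help from whoever administers the cluster.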
Updating an HBase KeyValue using bulk upload
Hi, I have a MapReduce job which creates StoreFiles that I can load using LoadIncrementalHFiles in HBase. I am also using the timestamp component of the KeyValue in my mapper to maintain versions in a custom manner. But when I try to overwrite the same version using bulk import, it does not work. When I perform a get, it returns the old version. However, if I update a KeyValue by overwriting the timestamp in the hbase shell, I can see that the value gets updated, e.g. put 't1', 'r1', 'c1', 'value', ts1 Can someone help explain why the updates are not reflected when using bulk import?
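For reference, a minimal sketch of the kind of mapper described above - emitting KeyValues whose timestamp carries a custom version, for use with HFileOutputFormat - with hypothetical class, family, and input names (not the poster's actual code):

  import java.io.IOException;
  import org.apache.hadoop.hbase.KeyValue;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  public class TimestampedKVMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {

    private static final byte[] FAMILY = Bytes.toBytes("f"); // hypothetical family

    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      // hypothetical CSV input: row,qualifier,value,customVersion
      String[] parts = line.toString().split(",");
      byte[] row = Bytes.toBytes(parts[0]);
      long customVersion = Long.parseLong(parts[3]); // becomes the KV timestamp
      KeyValue kv = new KeyValue(row, FAMILY, Bytes.toBytes(parts[1]),
          customVersion, Bytes.toBytes(parts[2]));
      context.write(new ImmutableBytesWritable(row), kv);
    }
  }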
Re: Deploy filter on per-table basis
Thanks Ted! Jianshi On Tue, Sep 9, 2014 at 10:39 PM, Ted Yu yuzhih...@gmail.com wrote: ... -- Jianshi Huang LinkedIn: jianshi Twitter: @jshuang Github Blog: http://huangjs.github.com/
Re: Deploy filter on per-table basis
Kudos go to Jimmy, not me. Cheers On Tue, Sep 9, 2014 at 8:17 AM, Jianshi Huang jianshi.hu...@gmail.com wrote: ...
HBase custom filter protocol buffers
Hi, I'm making the switch from 0.92.1 to 0.98.1, and I'm in the process of updating all my custom filters to conform to the new HBase Filter API. I have quite a few custom filters, so my question is: must I create a custom protocol buffer for each of my filters, or can I reuse the custom logic that I had in writeFields() and readFields() in toByteArray() and parseFrom(byte[]), respectively? I did post this same question on Cloudera's CDH User Google group, but I figured it was better suited to the official HBase mailing list. (Sorry for posting in multiple locations.) Thanks, Kevin
Re: HBase custom filter protocol buffers
For each of your filters that carries custom information (limit, range, etc.), you need to create a corresponding protobuf entity. See hbase-protocol/src/main/protobuf/Filter.proto for examples. Cheers On Tue, Sep 9, 2014 at 12:55 PM, Kevin kevin.macksa...@gmail.com wrote: ...
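As a rough illustration of the migration pattern (MyFilterProtos.LimitFilter stands in for a generated protobuf class compiled from a .proto file modeled on Filter.proto; the filter itself is hypothetical):

  import com.google.protobuf.InvalidProtocolBufferException;
  import org.apache.hadoop.hbase.exceptions.DeserializationException;
  import org.apache.hadoop.hbase.filter.FilterBase;

  public class LimitFilter extends FilterBase {
    private int limit;

    public LimitFilter(int limit) {
      this.limit = limit;
    }

    // Replaces the old writeFields(): serialize via the generated protobuf class.
    @Override
    public byte[] toByteArray() {
      return MyFilterProtos.LimitFilter.newBuilder()
          .setLimit(limit)
          .build()
          .toByteArray();
    }

    // Replaces the old readFields(): a static parseFrom(byte[]) that HBase
    // invokes reflectively on the server side.
    public static LimitFilter parseFrom(final byte[] pbBytes)
        throws DeserializationException {
      try {
        MyFilterProtos.LimitFilter proto =
            MyFilterProtos.LimitFilter.parseFrom(pbBytes);
        return new LimitFilter(proto.getLimit());
      } catch (InvalidProtocolBufferException e) {
        throw new DeserializationException(e);
      }
    }
  }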
Re: Nested data structures examples for HBase
You do realize that everything you store in HBase is byte arrays, right? That is, each cell is a blob. So you have the ability to create nested structures like… JSON records? ;-) So to your point: you can have a column A which represents a set of values. This is one reason why you shouldn’t think of HBase in terms of being relational. In fact, for Hadoop you really don’t want to think in terms of relational structures. Think more in terms of hierarchical ones. So yes, you can do what you want to do… HTH -Mike On Sep 8, 2014, at 10:06 PM, Stephen Boesch java...@gmail.com wrote: While I am aware that HBase does not have native support for nested structures, surely there are some of you who have thought through this use case carefully. Our particular use case likely has a single-digit number of nested layers with tens to hundreds of items in the lists at each level. An example would be: top level: 300 items; middle level: 1 to 100 items (1 may indicate a single value as opposed to a list); third level: 1 to 50 items; fourth level: 1 to 20 items. The column names are likely known ahead of time - which may or may not matter for HBase. We could model the above structure in a Parquet file or in Hive (with nested structs) - but we would like to consider whether HBase might also be an option.
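A trivial sketch of the blob approach Michael describes - the client serializes the nested structure (JSON here) and HBase stores opaque bytes; the table, family, and row names are hypothetical:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class NestedBlobPut {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "docs");            // hypothetical table
      String json = "{\"top\":{\"middle\":[{\"third\":[1,2,3]}]}}";
      Put put = new Put(Bytes.toBytes("row1"));
      put.add(Bytes.toBytes("d"), Bytes.toBytes("doc"),   // family, qualifier
          Bytes.toBytes(json));
      table.put(put);                                     // one cell holds the whole document
      table.close();
    }
  }

The trade-off Stephen raises below is real: querying an individual element means reading back and parsing the whole blob per row.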
Re: HBase - Performance issue
So you have large RSs and you have large regions. Your regions are huge relative to your RS memory heap. (Not ideal.) You have slow drives (5400rpm) and you have a 1GbE network. You didn’t say how many drives per server. Under load, you will saturate your network with just 4 drives. (Give or take. Never tried 5400 RPM drives.) So you hit one bandwidth bottleneck there. The other is the ratio of spindles to CPU. So if you have 4 drives and 8 cores… again under load, you’ll start to see an I/O bottleneck … On average, how many regions do you have per table per server? I’d consider shrinking your regions. Sometimes you need to dial back from 11 to a more reasonable listening level… ;-) HTH -Mike On Sep 8, 2014, at 8:23 AM, kiran kiran.sarvabho...@gmail.com wrote: Hi Lars, Ours is a problem of I/O wait and network bandwidth increasing around the same time. Lars, sorry to say this... ours is a production cluster and we ideally never want downtime... Also Lars, we had a very miserable experience while upgrading from 0.92 to 0.94... There was never a mention of the change in split policy in the release notes... the policy was not ideal for our cluster, and it took us at least a week to figure that out. Our cluster runs on commodity hardware with big regions (5-10gb)... Region server mem is 10gb... 2TB SATA hard disks (5400 - 7200 rpm)... Internal network bandwidth is 1 gig. So please suggest any workaround with 0.94.1 On Sun, Sep 7, 2014 at 8:42 AM, lars hofhansl la...@apache.org wrote: Thinking about it again, if you ran into HBASE-7336 you'd see high CPU load, but *not* IOWAIT. 0.94 is at 0.94.23, you should upgrade. A lot of fixes, improvements, and performance enhancements went in since 0.94.4. You can do a rolling upgrade straight to 0.94.23. With that out of the way, can you post a jstack of the processes that experience high wait times? -- Lars -- From: kiran kiran.sarvabho...@gmail.com To: user@hbase.apache.org; lars hofhansl la...@apache.org Sent: Saturday, September 6, 2014 11:30 AM Subject: Re: HBase - Performance issue Lars, We are facing a similar situation on a similar cluster configuration... We are seeing high I/O wait percentages on some machines in our cluster... We have short circuit reads enabled, but still we are facing the same problem... the CPU wait goes up to 50% in some cases while issuing scan commands with multiple threads. Is there a workaround other than applying the patch for 0.94.4? Thanks Kiran On Thu, Apr 25, 2013 at 12:12 AM, lars hofhansl la...@apache.org wrote: You may have run into https://issues.apache.org/jira/browse/HBASE-7336 (which is in 0.94.4) (Although I had not observed this effect as much when short circuit reads are enabled) - Original Message - From: kzurek kzu...@proximetry.pl To: user@hbase.apache.org Sent: Wednesday, April 24, 2013 3:12 AM Subject: HBase - Performance issue The problem is that when I'm putting my data (multithreaded client, ~30MB/s traffic outgoing) into the cluster, the load is equally spread over all RegionServers with 3.5% average CPU wait time (average CPU user: 51%). When I've added a similar, multithreaded client that scans for, let's say, the 100 last samples of a randomly generated key from a chosen time range, I'm getting high CPU wait time (20% and up) on two (or more, if there is a higher number of threads, default 10) random RegionServers. Therefore, the machines that hold those RSs are getting very hot - one of the consequences is that the number of store files is constantly increasing, up to the maximum limit.
The rest of the RSs have 10-12% CPU wait time and everything seems to be OK (the number of store files varies, so they are being compacted and not increasing over time). Any ideas? Maybe I could prioritize writes over reads somehow? Is it possible? If so, what would be the best way to do that, and where should it be placed - on the client or the cluster side? Cluster specification: HBase Version: 0.94.2-cdh4.2.0, Hadoop Version: 2.0.0-cdh4.2.0. There are 6x DataNodes (5x HDD for storing data), 1x MasterNode. Other settings: - Bloom filters (ROWCOL) set - Short circuit turned on - HDFS Block Size: 128MB - Java Heap Size of Namenode/Secondary Namenode in Bytes: 8 GiB - Java Heap Size of HBase RegionServer in Bytes: 12 GiB - Java Heap Size of HBase Master in Bytes: 4 GiB - Java Heap Size of DataNode in Bytes: 1 GiB (default) Number of regions per RegionServer: 19 (total 114 regions on 6 RS) Key design: UUID+TIMESTAMP - UUID: 1-10M, TIMESTAMP: 1-N Table design: 1 column family with 20 columns of 8 bytes Get client: Multiple threads. Each thread has its own table instance with its own Scanner. Each thread has its own range of UUIDs and randomly draws the beginning of a time range to build the rowkey properly (see above). Each time Scan requests same
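For readers weighing Michael's suggestion above to shrink regions: the maximum region size can be lowered per table from the shell. The table name and size below are illustrative only:

  alter 'mytable', MAX_FILESIZE => '2147483648'   # ~2 GB per region before splits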
Re: One-table w/ multi-CF or multi-table w/ one-CF?
Locality? Then the data should be in the same column family. That’s as local as you can get. I would suggest that you think of the following: What’s the predominant use case? How are you querying the data? If you’re always hitting multiple CFs to get the data… then you should have it in the same table. I think more people would benefit if they took more time thinking about their design and how the data is being used and stored. Also know that there really isn’t a single ‘right’ answer. Just a lot of wrong ones. ;-) Most people still try to think of HBase in terms of relational modeling and not in terms of records and more of a hierarchical system. Things like CFs and Versioning are often misused because people see them as shortcuts. Also, people tend not to think of their data in HBase in terms of 3D but in terms of 2D. (CFs would be 2+D.) The one question which really hasn’t been answered is how fat is fat in terms of a row’s width, and when is it too fat? This may seem like a simple thing, but it can impact a couple of things in your design. (I never got a good answer, and it's one of those questions that if your wife were to ask if the pants she’s wearing make her fat, it's time to run for the hills because you can’t win no matter how you answer!) Seriously though, the optimal width of a row is not that easy to answer, and sometimes you have to just guess as to which would be the better design. One of the problems with CFs is that if there’s an imbalance in terms of the size of data being stored in each CF, you can run into issues. CFs are stored in separate files and split when the base CF splits. (Assuming you have a base CF and then multiple CFs that are related but store smaller records per row.) And then there’s the issue that each CF is stored separately. (If memory serves it's a separate file per CF, but right now my last living brain cell decided to call it quits and went on strike for more beer.) [Damn you last brain cell!!!] :-) Again the idea is to follow KISS. HTH -Mike On Sep 8, 2014, at 7:17 AM, Jianshi Huang jianshi.hu...@gmail.com wrote: Locality is important, that's why I chose CFs to put related data into one group. I could surely put the CF part at the head of the rowkey to achieve a similar result, but since the number of types is fixed, I don't see any benefit in doing that. With setLoadColumnFamiliesOnDemand, which I learned from Ted, it looks like the performance should be similar (see the sketch after this message). Am I missing something? Please enlighten me. Jianshi On Mon, Sep 8, 2014 at 3:41 AM, Michael Segel michael_se...@hotmail.com wrote: I would suggest rethinking column families and looking at your potential for a slightly different row key. Going with column families doesn’t really make sense. Also, how wide are the rows? (worst case?) One idea is to make type part of the RK… HTH -Mike On Sep 7, 2014, at 2:40 AM, Jianshi Huang jianshi.hu...@gmail.com wrote: Hi Michael, Thanks for the questions. I'm modeling dynamic graphs in HBase; all elements (vertices, edges) have a timestamp, and I can query things like events between A and B for the last 7 days. CFs are used for grouping different types of data for the same account. However, I have lots of skew in the data; to avoid having too much in the same row, I had to put what was in CQs into RKs. So a CF now acts more like a table. There's one CF containing a sequence of events ordered by timestamp, and this CF is quite different, as the use case is mostly in mapreduce jobs.
Jianshi On Sun, Sep 7, 2014 at 4:52 AM, Michael Segel michael_se...@hotmail.com wrote: Again, a silly question. Why are you using column families? Just to play devil’s advocate in terms of design, why are you not treating your row as a record? Think hierarchical, not relational. This really gets into some design theory. Think of a Column Family as a way to group data that has the same row key and references the same thing, yet the data in each column family is used separately. The example I always turn to when teaching is an order entry system at a retailer. You generate data which is segmented by business process (order entry, pick slips, shipping, invoicing). All reflect a single order, yet the data in each process tends to be accessed separately. (You don’t need the order entry when using the pick slip to pull orders from the warehouse.) So here, the data access pattern is that each column family is used separately, except in generating the data (the order entry is used to generate the pick slip(s) and set up things like backorders, and then the pick process generates the shipping slip(s), etc.). And since they are all focused on the same order, they have the same row key. So it's reasonable to ask how you are accessing the data and how you are designing your HBase model. Many times, developers create a model using
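For readers following the setLoadColumnFamiliesOnDemand exchange above, a minimal sketch of the client-side call (available in recent 0.94 releases and 0.98; the family and qualifier names are hypothetical):

  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
  import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
  import org.apache.hadoop.hbase.util.Bytes;

  public class OnDemandScan {
    public static Scan build() {
      Scan scan = new Scan();
      // The family consulted by the filter is "essential" and read eagerly;
      // other families are loaded per row only if the filter accepts the row.
      scan.setFilter(new SingleColumnValueFilter(
          Bytes.toBytes("meta"), Bytes.toBytes("type"),
          CompareOp.EQUAL, Bytes.toBytes("edge")));
      scan.setLoadColumnFamiliesOnDemand(true);
      return scan;
    }
  }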
Re: Nested data structures examples for HBase
Thanks Michael, yes cells are byte[]; therefore, storing JSON or other document structures is always possible. Our use cases include querying individual elements in the structure - that would require reconstituting the documents and then parsing them for every row. We probably are not headed in the direction of HBase for those use cases, but we are trying to make that determination after having carefully considered the extent of the mismatch. 2014-09-09 13:37 GMT-07:00 Michael Segel michael_se...@hotmail.com: ...
SKIP_FLUSH
Hi, does anybody know why I can't skip the flush when taking a snapshot? snapshot 'aaa', 'aaa_snapshot', {SKIP_FLUSH => true} NameError: uninitialized constant SKIP_FLUSH Without {SKIP_FLUSH => true}, the command works fine. Regards, Guangle
Re: SKIP_FLUSH
Which version are you using? Matteo On Tue, Sep 9, 2014 at 5:34 PM, Guangle Fan fanguan...@gmail.com wrote: ...
Re: SKIP_FLUSH
Matteo is so fast :-) HBASE-10935 went into 0.98.4 FYI On Tue, Sep 9, 2014 at 5:35 PM, Matteo Bertozzi theo.berto...@gmail.com wrote: ...
Re: SKIP_FLUSH
That explains it. I'm on 0.96. On Tue, Sep 9, 2014 at 5:37 PM, Ted Yu yuzhih...@gmail.com wrote: ...
Re: need help understanding log output
Out of curiosity, did you see the messages below in the RS log? LOG.warn("Snapshot called again without clearing previous. " + "Doing nothing. Another ongoing flush or did we fail last attempt?"); thanks. On Tue, Sep 9, 2014 at 2:15 AM, Brian Jeltema brian.jelt...@digitalenvoy.net wrote: I’ve resolved these problems by restarting the region server that owned the region in question. I don’t know what the underlying issue was, but at this point it’s not worth pursuing. Thanks for responding. Brian On Sep 8, 2014, at 11:06 AM, Brian Jeltema brian.jelt...@digitalenvoy.net wrote: I realized today that the region server logs for the region being updated (startKey=\x00DDD@) contain the following:

2014-09-08 06:25:50,223 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region Host,\x00DDD@,1400624237999.5bb6bd41597ddd8dd7ca03e78f3a3e65. after a delay of 11302
2014-09-08 06:26:00,222 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region Host,\x00DDD@,1400624237999.5bb6bd41597ddd8dd7ca03e78f3a3e65. after a delay of 21682
2014-09-08 06:26:10,223 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region Host,\x00DDD@,1400624237999.5bb6bd41597ddd8dd7ca03e78f3a3e65. after a delay of 5724
2014-09-08 06:26:20,223 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region Host,\x00DDD@,1400624237999.5bb6bd41597ddd8dd7ca03e78f3a3e65. after a delay of 11962
2014-09-08 06:26:30,223 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region Host,\x00DDD@,1400624237999.5bb6bd41597ddd8dd7ca03e78f3a3e65. after a delay of 7693
2014-09-08 06:26:40,224 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region Host,\x00DDD@,1400624237999.5bb6bd41597ddd8dd7ca03e78f3a3e65. after a delay of 5578
2014-09-08 06:26:50,223 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region Host,\x00DDD@,1400624237999.5bb6bd41597ddd8dd7ca03e78f3a3e65. after a delay of 12420

A log entry has been generated every 10 seconds starting about 4 days ago. I presume these problems are related. On Sep 8, 2014, at 7:10 AM, Brian Jeltema brian.jelt...@digitalenvoy.net wrote: When the number of attempts is greater than the value of hbase.client.start.log.errors.counter (default 9), AsyncProcess produces the logs cited below. The interval following 'retrying after' is the backoff time. Which release of HBase are you using? HBase Version 0.98.0.2.1.1.0-385-hadoop2 The MR job is reading from an HBase snapshot, if that’s relevant. Cheers On Sun, Sep 7, 2014 at 8:50 AM, Brian Jeltema brian.jelt...@digitalenvoy.net wrote: I have a map/reduce job that is consistently failing with timeouts. The failing mapper log files contain a series of records similar to those below. When I look at the hbase and hdfs logs (on foo.net in this case) I don’t see anything obvious at these timestamps. The mapper task times out at/near attempt=25/35. Can anyone shed light on what these log entries mean?
Thanks - Brian

2014-09-07 09:36:51,421 INFO [htable-pool1-t1] org.apache.hadoop.hbase.client.AsyncProcess: #3, table=Host, primary, attempt=10/35 failed 1062 ops, last exception: null on foo.net,60020,1406043467187, tracking started null, retrying after 10029 ms, replay 1062 ops
2014-09-07 09:37:01,642 INFO [htable-pool1-t1] org.apache.hadoop.hbase.client.AsyncProcess: #3, table=Host, primary, attempt=11/35 failed 1062 ops, last exception: null on foo.net,60020,1406043467187, tracking started null, retrying after 10023 ms, replay 1062 ops
2014-09-07 09:37:12,064 INFO [htable-pool1-t1] org.apache.hadoop.hbase.client.AsyncProcess: #3, table=Host, primary, attempt=12/35 failed 1062 ops, last exception: null on foo.net,60020,1406043467187, tracking started null, retrying after 20182 ms, replay 1062 ops
2014-09-07 09:37:32,708 INFO [htable-pool1-t1] org.apache.hadoop.hbase.client.AsyncProcess: #3, table=Host, primary, attempt=13/35 failed 1062 ops, last exception: null on foo.net,60020,1406043467187, tracking started null, retrying after 20140 ms, replay 1062 ops
2014-09-07 09:37:52,940 INFO [htable-pool1-t1] org.apache.hadoop.hbase.client.AsyncProcess: #3, table=Host, primary, attempt=14/35 failed 1062 ops, last exception: null on foo.net,60020,1406043467187, tracking started null, retrying after 20041 ms, replay 1062 ops
2014-09-07 09:38:13,324 INFO [htable-pool1-t1]
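For anyone tuning this behavior: the attempt=N/35 count and the growing backoff come from standard client retry settings, and the logging threshold Ted mentions above is itself configurable. A hedged sketch on the client Configuration; the values are illustrative only:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;

  public class ClientRetryTuning {
    public static Configuration build() {
      Configuration conf = HBaseConfiguration.create();
      conf.setInt("hbase.client.retries.number", 10);           // the "35" in attempt=N/35
      conf.setLong("hbase.client.pause", 1000);                 // base backoff in ms
      conf.setInt("hbase.client.start.log.errors.counter", 5);  // when to start logging retries
      return conf;
    }
  }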
Re: Nested data structures examples for HBase
Are you just kicking the tires, or do you want to roll up your sleeves and do some work? You have options: secondary indexes. I don’t mean an inverted table, but things like SOLR, Lucene, Elasticsearch… The only downside is that depending on what you index, you can see an explosion in the data being stored in HBase. But that may be beyond you. It's a non-trivial task, and to be honest… a bit of ‘rocket science’. It's still doable… On Sep 9, 2014, at 10:20 PM, Stephen Boesch java...@gmail.com wrote: ...