On Sat, Jan 21, 2012 at 5:34 AM, Wayne <[email protected]> wrote:
> Sorry but it would be too hard for us to be able to provide enough info
> in a Jira to accurately reproduce. Our read problem is through thrift
> and has everything to do with the row just being too big to bring back
> in its entirety (13 million col row times out 1/3 of the time). Filters
> in .92 and thrift should help us there.

I just closed https://issues.apache.org/jira/browse/HBASE-4187 as
filters now support offset, limit patterns for the get.

> Of course we would all prefer a streaming model to avoid any of these
> issues and having to build our own pseudo streaming model. Is Thrift
> still the best option for high performance python based reads? From
> Hadoop World it seems some people are pushing thrift and others are
> pushing Avro. Does .92 bundle/work with Thrift .8 and are the memory
> leaks fixed in .8?

For what you are doing, python client, I'd say yes.
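To make the offset/limit pattern concrete, below is a rough, untested
python sketch of paging through one of those wide rows over the Thrift
gateway. It leans on 0.92's filter-string support; the 'hbase' module
names depend on how your bindings were generated, and the host, table,
row, and page size are all made up:

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from hbase import Hbase          # thrift-generated bindings
    from hbase.ttypes import TScan

    sock = TSocket.TSocket('localhost', 9090)
    transport = TTransport.TBufferedTransport(sock)
    client = Hbase.Client(TBinaryProtocol.TBinaryProtocol(transport))
    transport.open()

    page, offset = 10000, 0
    while True:
        # ColumnPaginationFilter(limit, offset) trims the row server-side,
        # so each round trip carries one page instead of the whole row.
        # The stop row appends \x00 because stop rows are exclusive; that
        # pins the scan to the single row 'bigrow'.
        scan = TScan(startRow='bigrow', stopRow='bigrow\x00',
                     filterString='ColumnPaginationFilter(%d, %d)'
                                  % (page, offset))
        scanner = client.scannerOpenWithScan('widetable', scan)
        rows = client.scannerGetList(scanner, 1)
        client.scannerClose(scanner)
        if not rows or not rows[0].columns:
            break
        # ... process rows[0].columns here ...
        offset += page
    transport.close()

It is not true streaming, but it keeps any single RPC small enough to
stay under the timeout.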
> As far as the write bottleneck, it has a lot to do with memory and
> other low level config issues. I would hope that the automated tests
> of hbase can eventually include patterns for large col counts. In order
> for hbase to truly be a col based storage system it needs to scale cols
> into the 100s of millions and beyond. This is the pattern we have the
> hardest time modeling in hbase because there is an unknown "limit" here
> we have to watch out for. There is a known limit that a row must be
> stored within one and only one region, but that should not be a
> problem. One single large region storing one large row should still
> "work".

We don't have such a test in our suite currently. It would be a good
idea to add it. I made HBASE-5244.

St.Ack
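P.S. For anyone who wants to poke at the wide-row write pattern Wayne
describes, here is a rough sketch (untested; it reuses the client from
the read sketch above and assumes a column family 'f' on a made-up
table) that builds a very wide row in bounded batches so no single
Thrift call has to carry millions of cells:

    from hbase.ttypes import Mutation

    BATCH = 10000
    TOTAL = 1000000
    for start in range(0, TOTAL, BATCH):
        muts = [Mutation(column='f:c%09d' % i, value='v%d' % i)
                for i in range(start, min(start + BATCH, TOTAL))]
        # One bounded mutateRow call per batch; every cell lands in the
        # same row, so it all stays in one region regardless of size.
        client.mutateRow('widetable', 'bigrow', muts)

A test along these lines, scaled up, is roughly the shape of what
HBASE-5244 asks for.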
