On Mon, Jan 25, 2016 at 10:29 AM, Henning Blohm <henning.bl...@zfabrik.de> wrote:
> Hi,
>
> I am looking for advice on an HBase mass data access optimization problem.
>
> In our application all data records stored in HBase have a time dimension
> (as inverted time) and a GUID in the row key. Retrieving a record requires
> issuing a scan with the GUID as prefix.
>

So the GUID precedes the inverted timestamp?

> In order to get to an entry (there are various access paths) we use a simple
> secondary index that also has a time dimension in the row and so needs a
> scan as well.
>
> For mass updates I am currently seeking ways to improve lookup performance.
>
> I found various discussions and issues on multi-scans (as in multi-Get,
> multi-Delete) but none of it was really helpful in sorting out the most
> promising direction.
>

The multi-Get does not help? The downside is that one slow server slows the
whole query. Or is it not parallel enough in its querying?

> Currently I am experimenting with simply parallelizing lookups in chunks
> from the client. That reduces elapsed wait time a bit. It seems though
> that avoiding roundtrips altogether by "scanning in parallel server-side"
> should show much better improvements.
>

How would this work? You'd pass over a list of GUIDs you knew were on a
particular server, and then, in a coprocessor, we'd do whatever is needed per GUID?

St.Ack

> Thanks,
> Henning
>
> --
> Henning Blohm
>
> *ZFabrik Software GmbH & Co. KG*
>
> T: +49 6227 3984255
> F: +49 6227 3984254
> M: +49 1781891820
>
> Lammstrasse 2 69190 Walldorf
>
> henning.bl...@zfabrik.de <mailto:henning.bl...@zfabrik.de>
> Linkedin <http://www.linkedin.com/pub/henning-blohm/0/7b5/628>
> ZFabrik <http://www.zfabrik.de>
> Blog <http://www.z2-environment.net/blog>
> Z2-Environment <http://www.z2-environment.eu>
> Z2 Wiki <http://redmine.z2-environment.net>
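
For illustration only, the row key layout discussed above (GUID first, then an inverted timestamp) and the key range that a GUID-prefix scan would cover might look roughly like the sketch below. Everything in it is an assumption rather than the application's actual scheme: a 16-byte GUID, an 8-byte big-endian inverted time computed as Long.MAX_VALUE minus the millisecond timestamp, and the hypothetical helper names rowKey, prefix and prefixStopRow. It uses no HBase classes; it only mimics the unsigned lexicographic byte order in which HBase sorts row keys.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.UUID;

class RowKeySketch {

    // Assumed layout: 16 bytes of GUID followed by 8 bytes of inverted time.
    static byte[] rowKey(UUID guid, long timestampMillis) {
        return ByteBuffer.allocate(24)
                .putLong(guid.getMostSignificantBits())
                .putLong(guid.getLeastSignificantBits())
                // Inverting the time makes newer entries sort before older ones.
                .putLong(Long.MAX_VALUE - timestampMillis)
                .array();
    }

    // The scan start row: just the 16 GUID bytes.
    static byte[] prefix(UUID guid) {
        return ByteBuffer.allocate(16)
                .putLong(guid.getMostSignificantBits())
                .putLong(guid.getLeastSignificantBits())
                .array();
    }

    // Exclusive stop row for a prefix scan: increment the last byte that is
    // not 0xFF and drop everything after it. An empty result means "no upper
    // bound" (the prefix consisted only of 0xFF bytes).
    static byte[] prefixStopRow(byte[] prefix) {
        byte[] stop = Arrays.copyOf(prefix, prefix.length);
        for (int i = stop.length - 1; i >= 0; i--) {
            if (stop[i] != (byte) 0xFF) {
                stop[i]++;
                return Arrays.copyOf(stop, i + 1);
            }
        }
        return new byte[0];
    }

    // Unsigned lexicographic comparison, matching HBase row key ordering.
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        UUID guid = UUID.randomUUID();
        byte[] older = rowKey(guid, 1_000L);
        byte[] newer = rowKey(guid, 2_000L);
        byte[] start = prefix(guid);
        byte[] stop = prefixStopRow(start);

        System.out.println("newer sorts before older: " + (compare(newer, older) < 0));
        System.out.println("both keys inside [start, stop): "
                + (compare(start, newer) <= 0 && compare(older, stop) < 0));
    }
}
```

With such a layout, the first row returned by a scan over [start, stop) is the most recent entry for that GUID, which is why every lookup costs a short scan rather than a Get.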