Looking at HBASE-1537, it seems that it only limits the number of columns (or column families?) that the scanner returns. This is not useful to us; we have only one column in one column family. One workaround might be to add a dummy column family and then use HBASE-1537 with a limit of 1. We'd do it if it's the only option, but we'd rather not take our tables offline and bring them back online (we are talking about many terabytes in a production cluster).
Is it somehow possible to use KeyValue.getRow() or KeyValue.getKey() in a scanner without returning KeyValue's? Basically, any code that tries to return or make copies of entire KeyValue's is incredibly inefficient, since (a) we only need the key and (b) sizeof(key) is many orders of magnitude smaller than sizeof(value).

On Wed, Dec 22, 2010 at 1:53 PM, Ted Yu <[email protected]> wrote:

> How about HBASE-1537 ?
>
> On Wed, Dec 22, 2010 at 1:38 PM, Leo Alekseyev <[email protected]> wrote:
>
>> I need to retrieve row keys from several big tables. Is it possible
>> to do so by just reading the key and truncating the value? I see that
>> HBASE-1481 implements FirstKeyOnlyFilter, but it doesn't help in our
>> case, since we have only one KeyValue per row, and it stores binary
>> data.
>>
>> Is there an easy way to accomplish fast key-only retrieval?
>>
>> --Leo
>>
>
