Hey Wade!
It's great that you're taking the time to write a blog post about this;
I'm sure it will be useful to others too!
The rest of my answer is inline.
J-D
> I am playing with htable.batch for multi get to see if I can remove my
> external hbase indexes. This is what I am trying to do.
>
> #1 What is the best model for a column family that is just used as an index?
>
> Currently I am using a column family _idx_ with:
> column:<row>
> value:<timestamp>
>
> This allows me to have a new column for each index with a value of when it
> was added. That allows me to purge the column family by the value greater
> than some time in Map reduce.
So I guess you're not using a TTL because you don't want to remove the
most recent cell in a qualifier?
Recently I started recommending short family names, since the family
name is stored alongside every value in memory and on disk. In your case
you could save a few bytes per value by using a name shorter than _idx_.
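To put a rough number on that, here is a minimal sketch of the arithmetic. It assumes a one-letter replacement name ("i") and a cell count of 10 million, both of which are made up for illustration, and it only counts raw bytes before any compression:

```java
import java.nio.charset.StandardCharsets;

public class FamilyNameOverhead {
    // Bytes saved on each stored cell when a shorter family name replaces the current one.
    static long savedPerCell(String current, String shorter) {
        return current.getBytes(StandardCharsets.UTF_8).length
             - shorter.getBytes(StandardCharsets.UTF_8).length;
    }

    // Raw bytes saved across a number of cells, ignoring compression.
    static long savedTotal(String current, String shorter, long cells) {
        return savedPerCell(current, shorter) * cells;
    }

    public static void main(String[] args) {
        long cells = 10_000_000L; // hypothetical number of index cells
        System.out.println(savedPerCell("_idx_", "i"));      // bytes saved per cell
        System.out.println(savedTotal("_idx_", "i", cells)); // bytes saved overall
    }
}
```

The saving is small per cell, but an index family with one column per indexed row multiplies it quickly.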
>
> #2 What is the most performant way to get this back into a get object? This
> is what I am doing so far but want to validate my thoughts.
- I guess that in your real code base, unlike this standalone snippet,
you reuse the config object? If not, do reuse it.
- Same comment for the HTables: they should be reused. Not kidding.
- Why are you creating an HTD (HTableDescriptor) each time? Can't you
just create the HTable directly from the table name?
- I wonder why you're calling getFamilyMap(). If all you want is
those keys, call result.raw() and iterate through that. As the
javadoc says: "This API is faster than using getFamilyMap() and
getMap()".
- You should delay creating the list of Gets until you know how many
objects you need, so that you can create the list directly with the
right size.
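
Putting those points together, something along these lines is what I have in mind. Treat it as a rough sketch, not tested code: it assumes the same HTable-based client API as your snippet, the class name IndexLookup and the method amountsFor are made up, and I'm guessing that your index family is _idx_ and that the index table name is passed in from outside.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Row;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical helper: create it once and reuse it for every lookup.
public class IndexLookup {
    private static final byte[] INDEX_FAMILY = Bytes.toBytes("_idx_"); // assumed family name
    private static final byte[] DETAILS = Bytes.toBytes("details");
    private static final byte[] AMOUNT = Bytes.toBytes("amount");

    private final HTable indexTable;
    private final HTable transactionsTable;

    // The Configuration is created once by the caller and shared; no HTableDescriptor needed.
    public IndexLookup(Configuration conf, String indexTableName) throws IOException {
        this.indexTable = new HTable(conf, indexTableName);
        this.transactionsTable = new HTable(conf, "transactions");
    }

    // Reads the index row for `key`, then fetches details:amount for each indexed row.
    public Object[] amountsFor(String key) throws IOException, InterruptedException {
        Get indexGet = new Get(Bytes.toBytes(key));
        indexGet.addFamily(INDEX_FAMILY); // only pull the index family
        Result indexRow = indexTable.get(indexGet);
        if (indexRow.isEmpty()) {
            return new Object[0];
        }

        // raw() exposes the cells directly, avoiding the map built by getFamilyMap().
        KeyValue[] cells = indexRow.raw();

        // The number of Gets is known now, so size the list exactly.
        List<Row> gets = new ArrayList<Row>(cells.length);
        for (KeyValue kv : cells) {
            Get get = new Get(kv.getQualifier()); // qualifier holds the target row key
            get.addColumn(DETAILS, AMOUNT);
            gets.add(get);
        }

        // One slot per Get; a slot that is not a Result means that action did not succeed.
        Object[] results = new Object[gets.size()];
        transactionsTable.batch(gets, results);
        return results;
    }
}
```

Keep in mind that HTable is not thread-safe, so a helper like this should be reused within a thread and not shared across threads.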
>
> Configuration config = HBaseConfiguration.create();
> HTableDescriptor transactionsbycompany_descriptor =
>     new HTableDescriptor(table);
> HTableDescriptor transactions_descriptor =
>     new HTableDescriptor("transactions");
> HTable transactionsbycompany_table = new HTable(config,
>     transactionsbycompany_descriptor.getName());
> HTable transactions_table = new HTable(config,
>     transactions_descriptor.getName());
> List<Row> gets = new ArrayList<Row>();
> Get g = new Get(Bytes.toBytes(key));
> Result result = transactionsbycompany_table.get(g);
> NavigableMap<byte[], byte[]> nmap =
>     result.getFamilyMap(Bytes.toBytes(colfam_index));
>
> Set<byte[]> keySet = nmap.keySet();
> Iterator<byte[]> iter = keySet.iterator();
>
> while (iter.hasNext()) {
> byte[] idx_key = iter.next();
> Get get = new Get(idx_key);
> get.addColumn(Bytes.toBytes("details"), Bytes.toBytes("amount"));
> gets.add(get);
>
> }
> Result[] multiRes = new Result[gets.size()];
> try {
> transactions_table.batch(gets, multiRes);
> } catch (InterruptedException e) {
> // TODO Auto-generated catch block
>
> e.printStackTrace();
> }
>
>
>
> With appreciation;
> Wade Arnold
>
>