Hi,
We store the output of a clustering algorithm in an HBase table with the
following structure:
{NAME => 'clusters', FAMILIES => [{NAME => 'products',
COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647',
BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
Row ids are cluster ids.
Column qualifiers in the products column family are the product ids.
An example row is:
1-1000936175-1879240683-185 column=products:21840054, timestamp=1291817353183, value=\x00\x00\x00\x01
1-1000936175-1879240683-185 column=products:23194179, timestamp=1291817353183, value=\x00\x00\x00\x01
1-1000936175-1879240683-185 column=products:23585765, timestamp=1291817353183, value=\x00\x00\x00\x01
1-1000936175-1879240683-185 column=products:24544087, timestamp=1291817353183, value=\x00\x00\x00\x01
When we want to determine which clusters a product belongs to, we scan the
whole table, restricted to that product's column,
e.g.
Scan s = new Scan();
s.addColumn(Bytes.toBytes("products"), Bytes.toBytes("24659517"));
ResultScanner scanner = table.getScanner(s);
I am not sure this is the best way, and it is slow. Could you suggest a
faster way to find such rows?
Is there a secondary index implementation that we can add to a column family
after the data has already been loaded into the table?
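To make the question concrete, here is a rough sketch of the lookup we are
after. It uses plain Java collections as a stand-in for the table and does not
touch the HBase API. The class name, the method names, and the second row
("cluster-b", product 99999999) are made up for illustration. The first method
mimics what our column scan has to do (visit every row). The second builds a
product id -> cluster ids mapping in one pass over existing data, which is
roughly what we would hope a secondary index could give us.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory stand-in for the 'clusters' table; this is NOT the HBase API.
final class ClusterLookupSketch {

    // Row key (cluster id) -> column qualifiers (product ids) in the products family.
    static NavigableMap<String, List<String>> sampleTable() {
        NavigableMap<String, List<String>> table = new TreeMap<>();
        table.put("1-1000936175-1879240683-185",
                List.of("21840054", "23194179", "23585765", "24544087"));
        // Hypothetical second cluster, added only so the example has two rows.
        table.put("cluster-b", List.of("24544087", "99999999"));
        return table;
    }

    // Mimics the column scan: every row is visited on every lookup.
    static List<String> clustersByScan(NavigableMap<String, List<String>> table,
                                       String productId) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> row : table.entrySet()) {
            if (row.getValue().contains(productId)) {
                hits.add(row.getKey());
            }
        }
        return hits;
    }

    // One pass over the existing rows builds product id -> cluster ids.
    static Map<String, List<String>> buildReverseIndex(
            NavigableMap<String, List<String>> table) {
        Map<String, List<String>> index = new TreeMap<>();
        for (Map.Entry<String, List<String>> row : table.entrySet()) {
            for (String productId : row.getValue()) {
                index.computeIfAbsent(productId, k -> new ArrayList<>())
                     .add(row.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        NavigableMap<String, List<String>> table = sampleTable();
        Map<String, List<String>> index = buildReverseIndex(table);
        // Both lookups should return the same cluster ids for a product.
        System.out.println(clustersByScan(table, "24544087"));
        System.out.println(index.getOrDefault("24544087", List.of()));
    }
}
```

In this model the scan cost grows with the number of clusters, while the
reverse mapping answers with a single keyed lookup once it has been built.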
--
Gokhan