Hi,

We store the output of a clustering algorithm in an HBase table with the
following structure:

{NAME => 'clusters', FAMILIES => [{NAME => 'products',
 COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647',
 BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
ENABLED: true

Row keys are cluster ids.
Column qualifiers in the products column family are product ids.

An example row is:
 1-1000936175-1879240683-185 column=products:21840054, timestamp=1291817353183, value=\x00\x00\x00\x01
 1-1000936175-1879240683-185 column=products:23194179, timestamp=1291817353183, value=\x00\x00\x00\x01
 1-1000936175-1879240683-185 column=products:23585765, timestamp=1291817353183, value=\x00\x00\x00\x01
 1-1000936175-1879240683-185 column=products:24544087, timestamp=1291817353183, value=\x00\x00\x00\x01



When we want to determine which clusters a product belongs to, we scan the
table, restricting the scan to that product's column, e.g.:

Scan s = new Scan();
s.addColumn(Bytes.toBytes("products"), Bytes.toBytes("24659517"));
ResultScanner scanner = table.getScanner(s);

I am not sure this is the best way, because it is slow. Could you suggest a
faster way to find such rows?
Is there a secondary index implementation that can be added to a column
family after the data has already been loaded into the table?
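
To make the question concrete, here is a rough plain-Java sketch of what I mean. It does not use the HBase API at all; the table is modelled as an in-memory map, and the class name, the method names and the second cluster id are made up for illustration. findByScan mimics what our column scan has to do (visit every cluster row), while buildIndex shows the inverted layout (product id -> cluster ids) that I imagine a secondary index or a manually maintained second table would hold.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class ClusterIndexSketch {

    // Mimics the column scan: every row (cluster) is visited and kept
    // only if it has the requested product as a column qualifier.
    static Set<String> findByScan(Map<String, Set<String>> clusters, String productId) {
        Set<String> hits = new TreeSet<>();
        for (Map.Entry<String, Set<String>> row : clusters.entrySet()) {
            if (row.getValue().contains(productId)) {
                hits.add(row.getKey());
            }
        }
        return hits;
    }

    // Builds the inverted layout: product id -> ids of the clusters containing it.
    static Map<String, Set<String>> buildIndex(Map<String, Set<String>> clusters) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, Set<String>> row : clusters.entrySet()) {
            for (String productId : row.getValue()) {
                index.computeIfAbsent(productId, k -> new TreeSet<>()).add(row.getKey());
            }
        }
        return index;
    }

    // With the inverted layout, the lookup is a single keyed read.
    static Set<String> findByIndex(Map<String, Set<String>> index, String productId) {
        return index.getOrDefault(productId, Collections.<String>emptySet());
    }

    // Sample data: the first row is the example row from this mail;
    // the second cluster id is hypothetical.
    static Map<String, Set<String>> sampleClusters() {
        Map<String, Set<String>> clusters = new TreeMap<>();
        Set<String> first = new TreeSet<>();
        Collections.addAll(first, "21840054", "23194179", "23585765", "24544087");
        clusters.put("1-1000936175-1879240683-185", first);
        Set<String> second = new TreeSet<>();
        Collections.addAll(second, "24544087", "24659517");
        clusters.put("hypothetical-cluster-2", second);
        return clusters;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> clusters = sampleClusters();
        Map<String, Set<String>> index = buildIndex(clusters);
        System.out.println("scan : " + findByScan(clusters, "24544087"));
        System.out.println("index: " + findByIndex(index, "24544087"));
    }
}
```

Both lookups return the same cluster ids; the difference is that the first one touches every row, while the second one reads a single entry. The open question is how best to get the second layout inside HBase for data that is already loaded.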

-- 
Gokhan
