Hi all, I have the following HBase use case: one HBase table, with a row key (built from a combination of MD5 hashes) and two column families. Logically, the table stores sentences. The table has hundreds of millions of records.
I have a webapp that connects to this HBase table and needs to export random sentences that meet certain conditions. Currently, all of these conditions can be checked using the rowkey alone. A typical export contains just a couple of hundred sentences. The important restriction is that once some sentences (segments) have been exported, they must not appear in any subsequent export. So my question is: how should I make sure the same segments do not get exported again? Should I 'mark' the exported segments by updating a flag after each export? The drawback is that, when looking for segments that meet my conditions, I could no longer identify those records by rowkey alone; I would also have to check the flag. Hence I would need to use filters, which I understand are much slower. Is there a better approach for this?
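To make the flag-based option concrete, here is a minimal in-memory sketch in Java. It does not use the HBase client API: a `TreeMap` stands in for a table sorted by rowkey, and the names `ExportSketch`, `Row`, `exportBatch`, and the `exported` flag are all hypothetical. It also takes the first unexported matches in key order rather than a random sample, which is only an approximation of the requirement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ExportSketch {
    // Hypothetical stand-in for one row: the sentence plus an "exported" flag.
    static final class Row {
        final String sentence;
        boolean exported;
        Row(String sentence) { this.sentence = sentence; }
    }

    // TreeMap keeps keys sorted, loosely mimicking rowkey ordering in the table.
    private final TreeMap<String, Row> table = new TreeMap<>();

    public void put(String rowKey, String sentence) {
        table.put(rowKey, new Row(sentence));
    }

    // Scan rows whose key starts with the prefix, skip rows already flagged,
    // and flag each row as it is added to the export.
    public synchronized List<String> exportBatch(String prefix, int limit) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Row> e : table.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix) || out.size() >= limit) {
                break; // left the prefix range, or the batch is full
            }
            Row row = e.getValue();
            if (row.exported) {
                continue; // this is the extra per-row check the flag introduces
            }
            row.exported = true;
            out.add(row.sentence);
        }
        return out;
    }

    public static void main(String[] args) {
        ExportSketch t = new ExportSketch();
        t.put("a1", "s1");
        t.put("a2", "s2");
        t.put("a3", "s3");
        t.put("b1", "s4");
        System.out.println(t.exportBatch("a", 2)); // [s1, s2]
        System.out.println(t.exportBatch("a", 2)); // [s3]
        System.out.println(t.exportBatch("a", 2)); // []
    }
}
```

The sketch shows where the cost comes from: the rowkey prefix still bounds the scan, but every row inside that range has to be inspected for the flag. In a real deployment the check and the update would also need to be atomic across concurrent exports, which the `synchronized` method only imitates here.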
