Suppose the number in S(X) has a known maximum length, e.g. 10 digits. Then you can set the start key to S(0000000000) (ten 0s) and the end key to S(99999999999) (eleven 9s).
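To see why these two bounds cover every S(X) row, here is a minimal sketch in plain Python that imitates a range scan over byte-sorted row keys. It does not use the HBase client API. It assumes that X is zero-padded to 10 digits and that the end key is exclusive; the names row_key and range_scan are hypothetical helpers made up for this illustration.

```python
from bisect import bisect_left

WIDTH = 10  # assumed maximum number of digits in X


def row_key(x: int) -> bytes:
    """Build a fixed-width row key, e.g. b'S(0000000042)' (assumed layout)."""
    return f"S({x:0{WIDTH}d})".encode("ascii")


def range_scan(sorted_keys, start, stop):
    """Return keys k with start <= k < stop, like a scan with an exclusive end key."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


START = b"S(" + b"0" * WIDTH + b")"        # S(0000000000), ten 0s
STOP = b"S(" + b"9" * (WIDTH + 1) + b")"   # S(99999999999), eleven 9s

# A toy "table": some S(X) rows plus unrelated rows before and after them.
table = sorted(
    [row_key(x) for x in (0, 7, 42, 9999999999)]
    + [b"R(0000000001)", b"T(0000000001)"]
)

s_rows = range_scan(table, START, STOP)
```

The eleventh 9 matters because the end key is excluded: with only ten 9s, the row S(9999999999) itself would be left out. Zero padding matters because keys compare byte by byte, so an unpadded S(10) would sort before S(9).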

On Mon, May 17, 2010 at 8:43 AM, Raghava Mutharaju <[email protected]> wrote:
> Hi all,
>
> Let a set, S(X) = {a, b, c, d, e, f, .....}. I compute the values of the
> set in multiple MR job iterations i.e. multiple MR jobs would be run one
> after another several times. In each iteration, a subset of the values would
> be computed i.e. the value of the set would be computed incrementally. I am
> using HBase to store the result. In this scenario, my design is as follows
>
> Schema Design:
>
> - S(X) is the row key.
> - Each element would be a column in the column family. The label of the
>   column would be the iteration number followed by a number indicating the
>   position of the element in the subset.
>
> Eg: In iteration 1, subset {a,b} has been computed. Then the row would be
> S(X) = {contains: {{1.1: a}, {1.2: b}}}. Here, contains is the name of the
> column family.
>
> I can add the results of subsequent iterations (other subsets) to S(X) by
> adding more columns.
> Would this design be appropriate for the above scenario?
>
> There would be many S(X) - X can be X1, X2, X3, .... and many elements in
> the set, S(X).
>
> Filtering:
>
> To retrieve all the sets, S(X), a range fetch should be performed. I
> wouldn't know the startkey and endkey because number of S(X) sets is not
> known before hand. Can I use PrefixFilter for this, by setting prefix as
> 'S'?
>
> Thank you in advance.
>
> Regards,
> Raghava.
>

--
Regards
Angus
