Interesting. I'm actually building something similar. A full-blown SQL implementation is a bit overkill for my particular use case, and the query API is the final piece of the puzzle. But I'll definitely have a look for some inspiration.
Thanks!

On Fri, Jun 28, 2013 at 3:55 AM, James Taylor <[email protected]> wrote:

> Hi Kristoffer,
> Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)?
> You could model your schema much like an O/R mapper and issue SQL queries
> through Phoenix for your filtering.
>
> James
> @JamesPlusPlus
> http://phoenix-hbase.blogspot.com
>
> On Jun 27, 2013, at 4:39 PM, "Kristoffer Sjögren" <[email protected]>
> wrote:
>
> > Thanks for your help Mike. Much appreciated.
> >
> > I don't store rows/columns in JSON format. The schema is exactly that of a
> > specific Java class, where the rowkey is a unique object identifier with
> > the class type encoded into it. Columns are the field names of the class
> > and the values are those of the object instance.
> >
> > I did think about coprocessors, but the schema is discovered at runtime
> > and I can't hard-code it.
> >
> > However, I still believe that filters might work. I had a look
> > at SingleColumnValueFilter, and this filter is able to target specific
> > column qualifiers with specific WritableByteArrayComparables.
> >
> > But list comparators are still missing... So I guess the only way is to
> > write these comparators?
> >
> > Do you follow my reasoning? Will it work?
> >
> > On Fri, Jun 28, 2013 at 12:58 AM, Michael Segel
> > <[email protected]> wrote:
> >
> >> Ok...
> >>
> >> If you want to do type checking and schema enforcement...
> >>
> >> You will need to do this as a coprocessor.
> >>
> >> The quick and dirty way (not recommended) would be to hard-code the
> >> schema into the coprocessor code.
> >>
> >> A better way... at startup, load up ZK to manage the set of known table
> >> schemas, which would be a map of column qualifier to data type.
> >> (If JSON, then you need to do a separate lookup to get the record's schema.)
> >>
> >> Then a single Java class that does the lookup and then handles the known
> >> data type comparators.
> >>
> >> Does this make sense?
> >> (Sorry, I was kind of thinking this out as I typed the response. But it
> >> should work.)
> >>
> >> At least it would be a design approach I would take. YMMV
> >>
> >> Having said that, I expect someone to say it's a bad idea and that they
> >> have a better solution.
> >>
> >> HTH
> >>
> >> -Mike
> >>
> >> On Jun 27, 2013, at 5:13 PM, Kristoffer Sjögren <[email protected]> wrote:
> >>
> >>> I see your point. Everything is just bytes.
> >>>
> >>> However, the schema is known and every row is formatted according to this
> >>> schema, although some columns may not exist, that is, no value exists for
> >>> this property on this row.
> >>>
> >>> So if I'm able to apply these "typed comparators" to the right cell values,
> >>> it may be possible? But I can't find a filter that targets specific columns.
> >>>
> >>> It seems like all filters scan every column/qualifier and there is no way of
> >>> knowing which column is currently being evaluated?
> >>>
> >>> On Thu, Jun 27, 2013 at 11:51 PM, Michael Segel
> >>> <[email protected]> wrote:
> >>>
> >>>> You have to remember that HBase doesn't enforce any sort of typing.
> >>>> That's why this can be difficult.
> >>>>
> >>>> You'd have to write a coprocessor to enforce a schema on a table.
> >>>> Even then YMMV if you're writing JSON structures to a column, because while
> >>>> the contents of the structures could be the same, the actual strings could
> >>>> differ.
> >>>>
> >>>> HTH
> >>>>
> >>>> -Mike
> >>>>
> >>>> On Jun 27, 2013, at 4:41 PM, Kristoffer Sjögren <[email protected]> wrote:
> >>>>
> >>>>> I realize standard comparators cannot solve this.
> >>>>>
> >>>>> However, I do know the type of each column, so writing custom list
> >>>>> comparators for boolean, char, byte, short, int, long, float, double seems
> >>>>> quite straightforward.
> >>>>>
> >>>>> Long arrays, for example, are stored as a byte array with 8 bytes per item,
> >>>>> so a comparator might look like this.
> >>>>>
> >>>>> public class LongsComparator extends WritableByteArrayComparable {
> >>>>>   // val (the long to look for) is assumed to be a field set elsewhere
> >>>>>   public int compareTo(byte[] value, int offset, int length) {
> >>>>>     long[] values = BytesUtils.toLongs(value, offset, length);
> >>>>>     for (long longValue : values) {
> >>>>>       if (longValue == val) {
> >>>>>         return 0;
> >>>>>       }
> >>>>>     }
> >>>>>     return 1;
> >>>>>   }
> >>>>> }
> >>>>>
> >>>>> public static long[] toLongs(byte[] value, int offset, int length) {
> >>>>>   // length is already the byte count of the value, so do not subtract offset
> >>>>>   int num = length / 8;
> >>>>>   long[] values = new long[num];
> >>>>>   for (int i = 0; i < num; i++) {
> >>>>>     // index the result from 0, but read the bytes starting at offset
> >>>>>     values[i] = getLong(value, offset + i * 8);
> >>>>>   }
> >>>>>   return values;
> >>>>> }
> >>>>>
> >>>>> Strings are similar but would require charset and length for each string.
> >>>>>
> >>>>> public class StringsComparator extends WritableByteArrayComparable {
> >>>>>   // val (the string to look for) is assumed to be a field set elsewhere
> >>>>>   public int compareTo(byte[] value, int offset, int length) {
> >>>>>     String[] values = BytesUtils.toStrings(value, offset, length);
> >>>>>     for (String stringValue : values) {
> >>>>>       if (val.equals(stringValue)) {
> >>>>>         return 0;
> >>>>>       }
> >>>>>     }
> >>>>>     return 1;
> >>>>>   }
> >>>>> }
> >>>>>
> >>>>> public static String[] toStrings(byte[] value, int offset, int length) {
> >>>>>   ArrayList<String> values = new ArrayList<String>();
> >>>>>   int idx = 0;
> >>>>>   ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
> >>>>>   while (idx < length) {
> >>>>>     // each string is stored as a 4-byte length followed by its bytes
> >>>>>     int size = buffer.getInt();
> >>>>>     byte[] bytes = new byte[size];
> >>>>>     buffer.get(bytes);
> >>>>>     // decodes with the platform default charset; an explicit one is safer
> >>>>>     values.add(new String(bytes));
> >>>>>     idx += 4 + size;
> >>>>>   }
> >>>>>   return values.toArray(new String[values.size()]);
> >>>>> }
> >>>>>
> >>>>> Am I on the right track or maybe overlooking some implementation details?
> >>>>> Not really sure how to target each comparator to a specific column value?
> >>>>>
> >>>>> On Thu, Jun 27, 2013 at 9:21 PM, Michael Segel
> >>>>> <[email protected]> wrote:
> >>>>>
> >>>>>> Not an easy task.
> >>>>>>
> >>>>>> You first need to determine how you want to store the data within a column
> >>>>>> and/or apply a type constraint to a column.
> >>>>>>
> >>>>>> Even if you use JSON records to store your data within a column, does an
> >>>>>> equality comparator exist? If not, you would have to write one.
> >>>>>> (I kinda think that one may already exist...)
> >>>>>>
> >>>>>> On Jun 27, 2013, at 12:59 PM, Kristoffer Sjögren <[email protected]> wrote:
> >>>>>>
> >>>>>>> Hi
> >>>>>>>
> >>>>>>> I'm working with the standard filtering mechanism to scan rows that have
> >>>>>>> columns matching certain criteria.
> >>>>>>>
> >>>>>>> There are columns of numeric (integer and decimal) and string types. These
> >>>>>>> columns are single or multi-valued like "1", "2", "1,2,3", "a", "b" or
> >>>>>>> "a,b,c" - not sure what the separator would be in the case of list types.
> >>>>>>> Maybe none?
> >>>>>>>
> >>>>>>> I would like to compose the following queries to filter out rows that do
> >>>>>>> not match.
> >>>>>>>
> >>>>>>> - contains(String column, String value)
> >>>>>>>   Single-valued column that String.contains() the provided value.
> >>>>>>>
> >>>>>>> - equal(String column, Object value)
> >>>>>>>   Single-valued column that Object.equals() the provided value.
> >>>>>>>   Value is either a string or a numeric type.
> >>>>>>>
> >>>>>>> - greaterThan(String column, java.lang.Number value)
> >>>>>>>   Single-valued column that is > the provided numeric value.
> >>>>>>>
> >>>>>>> - in(String column, Object... values)
> >>>>>>>   Multi-valued column whose values Object.equals() all provided values.
> >>>>>>>   Values are of string or numeric type.
> >>>>>>>
> >>>>>>> How would I design a schema that can take advantage of the already existing
> >>>>>>> filters and comparators to accomplish this?
> >>>>>>>
> >>>>>>> I already looked at the string and binary comparators but fail to see how
> >>>>>>> to solve this in a clean way for multi-valued column values.
> >>>>>>>
> >>>>>>> I'm aware of custom filters but would like to avoid them if possible.
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> -Kristoffer
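
The two list encodings discussed in the quoted thread (8 bytes per long, and a 4-byte length in front of each string) can be sketched in plain Java with no HBase dependency. This is only an illustration under stated assumptions: the class name ListCodec and the method containsLong are hypothetical, and big-endian byte order and UTF-8 are assumed because the thread does not say which byte order or charset is used.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase or of the code posted in the thread.
final class ListCodec {
    private ListCodec() {}

    // Decodes `length` bytes starting at `offset` as consecutive 8-byte longs.
    // Assumes big-endian order, which is ByteBuffer's default.
    static long[] toLongs(byte[] value, int offset, int length) {
        if (length % Long.BYTES != 0) {
            throw new IllegalArgumentException("length must be a multiple of 8");
        }
        ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
        long[] values = new long[length / Long.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buffer.getLong();
        }
        return values;
    }

    // Decodes a sequence of (4-byte length, bytes) records as UTF-8 strings.
    // UTF-8 is an assumption; use whatever charset the writer used.
    static String[] toStrings(byte[] value, int offset, int length) {
        ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
        List<String> values = new ArrayList<>();
        while (buffer.hasRemaining()) {
            byte[] bytes = new byte[buffer.getInt()];
            buffer.get(bytes);
            values.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return values.toArray(new String[0]);
    }

    // Mirrors the comparator convention from the thread:
    // 0 if any stored long equals target, 1 otherwise.
    static int containsLong(long target, byte[] value, int offset, int length) {
        for (long candidate : toLongs(value, offset, length)) {
            if (candidate == target) {
                return 0;
            }
        }
        return 1;
    }
}
```

Wrapping the array with its offset and length lets ByteBuffer track the read position, so the decoding stays correct when the cell value does not start at index 0 of the backing array.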
