Why is it that if all you have is a hammer, everything looks like a nail? ;-)
On Jun 27, 2013, at 8:55 PM, James Taylor <jtay...@salesforce.com> wrote:

> Hi Kristoffer,
> Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)? You
> could model your schema much like an O/R mapper and issue SQL queries through
> Phoenix for your filtering.
>
> James
> @JamesPlusPlus
> http://phoenix-hbase.blogspot.com
>
> On Jun 27, 2013, at 4:39 PM, "Kristoffer Sjögren" <sto...@gmail.com> wrote:
>
>> Thanks for your help Mike. Much appreciated.
>>
>> I don't store rows/columns in JSON format. The schema is exactly that of a
>> specific Java class, where the rowkey is a unique object identifier with
>> the class type encoded into it. Columns are the field names of the class
>> and the values are those of the object instance.
>>
>> Did think about coprocessors but the schema is discovered at runtime and I
>> can't hard code it.
>>
>> However, I still believe that filters might work. Had a look
>> at SingleColumnValueFilter and this filter is able to target specific
>> column qualifiers with specific WritableByteArrayComparables.
>>
>> But list comparators are still missing... So I guess the only way is to
>> write these comparators?
>>
>> Do you follow my reasoning? Will it work?
>>
>> On Fri, Jun 28, 2013 at 12:58 AM, Michael Segel
>> <michael_se...@hotmail.com> wrote:
>>
>>> Ok...
>>>
>>> If you want to do type checking and schema enforcement...
>>>
>>> You will need to do this as a coprocessor.
>>>
>>> The quick and dirty way... (Not recommended) would be to hard code the
>>> schema into the coprocessor code.
>>>
>>> A better way... at start up, load up ZK to manage the set of known table
>>> schemas which would be a map of column qualifier to data type.
>>> (If JSON then you need to do a separate lookup to get the record's schema)
>>>
>>> Then a single Java class that does the look up and then handles the known
>>> data type comparators.
>>>
>>> Does this make sense?
>>> (Sorry, kinda was thinking this out as I typed the response.
>>> But it should work )
>>>
>>> At least it would be a design approach I would take. YMMV
>>>
>>> Having said that, I expect someone to say it's a bad idea and that they
>>> have a better solution.
>>>
>>> HTH
>>>
>>> -Mike
>>>
>>> On Jun 27, 2013, at 5:13 PM, Kristoffer Sjögren <sto...@gmail.com> wrote:
>>>
>>>> I see your point. Everything is just bytes.
>>>>
>>>> However, the schema is known and every row is formatted according to this
>>>> schema, although some columns may not exist, that is, no value exists for
>>>> this property on this row.
>>>>
>>>> So if I'm able to apply these "typed comparators" to the right cell values
>>>> it may be possible? But I can't find a filter that targets specific columns?
>>>>
>>>> Seems like all filters scan every column/qualifier and there is no way of
>>>> knowing what column is currently being evaluated?
>>>>
>>>> On Thu, Jun 27, 2013 at 11:51 PM, Michael Segel
>>>> <michael_se...@hotmail.com> wrote:
>>>>
>>>>> You have to remember that HBase doesn't enforce any sort of typing.
>>>>> That's why this can be difficult.
>>>>>
>>>>> You'd have to write a coprocessor to enforce a schema on a table.
>>>>> Even then YMMV if you're writing JSON structures to a column because while
>>>>> the contents of the structures could be the same, the actual strings could
>>>>> differ.
>>>>>
>>>>> HTH
>>>>>
>>>>> -Mike
>>>>>
>>>>> On Jun 27, 2013, at 4:41 PM, Kristoffer Sjögren <sto...@gmail.com> wrote:
>>>>>
>>>>>> I realize standard comparators cannot solve this.
>>>>>>
>>>>>> However I do know the type of each column so writing custom list
>>>>>> comparators for boolean, char, byte, short, int, long, float, double seems
>>>>>> quite straightforward.
>>>>>>
>>>>>> Long arrays, for example, are stored as a byte array with 8 bytes per item
>>>>>> so a comparator might look like this.
>>>>>>
>>>>>> public class LongsComparator extends WritableByteArrayComparable {
>>>>>>     public int compareTo(byte[] value, int offset, int length) {
>>>>>>         long[] values = BytesUtils.toLongs(value, offset, length);
>>>>>>         // 0 means "match": the cell contains the target value val
>>>>>>         for (long longValue : values) {
>>>>>>             if (longValue == val) {
>>>>>>                 return 0;
>>>>>>             }
>>>>>>         }
>>>>>>         return 1;
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> public static long[] toLongs(byte[] value, int offset, int length) {
>>>>>>     // length is already the byte count of the value, so offset is not subtracted
>>>>>>     int num = length / 8;
>>>>>>     long[] values = new long[num];
>>>>>>     for (int i = 0; i < num; i++) {
>>>>>>         // read relative to offset, where the value starts in the backing array
>>>>>>         values[i] = getLong(value, offset + i * 8);
>>>>>>     }
>>>>>>     return values;
>>>>>> }
>>>>>>
>>>>>> Strings are similar but would require charset and length for each string.
>>>>>>
>>>>>> public class StringsComparator extends WritableByteArrayComparable {
>>>>>>     public int compareTo(byte[] value, int offset, int length) {
>>>>>>         String[] values = BytesUtils.toStrings(value, offset, length);
>>>>>>         for (String stringValue : values) {
>>>>>>             if (val.equals(stringValue)) {
>>>>>>                 return 0;
>>>>>>             }
>>>>>>         }
>>>>>>         return 1;
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> public static String[] toStrings(byte[] value, int offset, int length) {
>>>>>>     ArrayList<String> values = new ArrayList<String>();
>>>>>>     int idx = 0;
>>>>>>     ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
>>>>>>     while (idx < length) {
>>>>>>         // each record is a 4-byte length followed by the string bytes
>>>>>>         int size = buffer.getInt();
>>>>>>         byte[] bytes = new byte[size];
>>>>>>         buffer.get(bytes);
>>>>>>         values.add(new String(bytes));
>>>>>>         idx += 4 + size;
>>>>>>     }
>>>>>>     return values.toArray(new String[values.size()]);
>>>>>> }
>>>>>>
>>>>>> Am I on the right track or maybe overlooking some implementation details?
>>>>>> Not really sure how to target each comparator to a specific column value?
>>>>>>
>>>>>> On Thu, Jun 27, 2013 at 9:21 PM, Michael Segel
>>>>>> <michael_se...@hotmail.com> wrote:
>>>>>>
>>>>>>> Not an easy task.
>>>>>>>
>>>>>>> You first need to determine how you want to store the data within a column
>>>>>>> and/or apply a type constraint to a column.
>>>>>>>
>>>>>>> Even if you use JSON records to store your data within a column, does an
>>>>>>> equality comparator exist? If not, you would have to write one.
>>>>>>> (I kinda think that one may already exist...)
>>>>>>>
>>>>>>> On Jun 27, 2013, at 12:59 PM, Kristoffer Sjögren <sto...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> Working with the standard filtering mechanism to scan rows that have
>>>>>>>> columns matching certain criteria.
>>>>>>>>
>>>>>>>> There are columns of numeric (integer and decimal) and string types. These
>>>>>>>> columns are single or multi-valued like "1", "2", "1,2,3", "a", "b" or
>>>>>>>> "a,b,c" - not sure what the separator would be in the case of list types.
>>>>>>>> Maybe none?
>>>>>>>>
>>>>>>>> I would like to compose the following queries to filter out rows that do
>>>>>>>> not match.
>>>>>>>>
>>>>>>>> - contains(String column, String value)
>>>>>>>>   Single valued column that String.contains() provided value.
>>>>>>>>
>>>>>>>> - equal(String column, Object value)
>>>>>>>>   Single valued column that Object.equals() provided value.
>>>>>>>>   Value is either string or numeric type.
>>>>>>>>
>>>>>>>> - greaterThan(String column, java.lang.Number value)
>>>>>>>>   Single valued column that > provided numeric value.
>>>>>>>>
>>>>>>>> - in(String column, Object value...)
>>>>>>>>   Multi-valued column has values that Object.equals() all provided values.
>>>>>>>>   Values are of string or numeric type.
>>>>>>>>
>>>>>>>> How would I design a schema that can take advantage of the already existing
>>>>>>>> filters and comparators to accomplish this?
>>>>>>>>
>>>>>>>> Already looked at the string and binary comparators but fail to see how to
>>>>>>>> solve this in a clean way for multi-valued column values.
>>>>>>>>
>>>>>>>> I'm aware of custom filters but would like to avoid them if possible.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> -Kristoffer
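
For anyone following the thread, here is a minimal sketch of the byte layouts Kristoffer describes (8 bytes per long, and a 4-byte length prefix per string), written without any HBase dependency so it can be run on its own. The class name ListValueCodec and all of its methods are hypothetical and only for illustration; UTF-8 and big-endian encoding are assumptions, since the thread does not fix a charset or byte order. The compare methods follow the thread's convention of returning 0 on a match and 1 otherwise.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the HBase API. */
final class ListValueCodec {

    /** Packs longs as consecutive 8-byte big-endian values. */
    static byte[] packLongs(long... values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * 8);
        for (long v : values) {
            buffer.putLong(v);
        }
        return buffer.array();
    }

    /** Decodes `length` bytes starting at `offset` into longs. */
    static long[] toLongs(byte[] value, int offset, int length) {
        ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
        long[] values = new long[length / 8];
        for (int i = 0; i < values.length; i++) {
            values[i] = buffer.getLong();
        }
        return values;
    }

    /** Packs strings as [4-byte length][UTF-8 bytes] records (assumed layout). */
    static byte[] packStrings(String... values) {
        List<byte[]> encoded = new ArrayList<>();
        int total = 0;
        for (String s : values) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            encoded.add(bytes);
            total += 4 + bytes.length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(total);
        for (byte[] bytes : encoded) {
            buffer.putInt(bytes.length).put(bytes);
        }
        return buffer.array();
    }

    /** Decodes length-prefixed UTF-8 strings from the given slice. */
    static String[] toStrings(byte[] value, int offset, int length) {
        ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
        List<String> values = new ArrayList<>();
        while (buffer.hasRemaining()) {
            byte[] bytes = new byte[buffer.getInt()];
            buffer.get(bytes);
            values.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return values.toArray(new String[0]);
    }

    /** Returns 0 if the slice contains target, 1 otherwise. */
    static int compareLongs(byte[] value, int offset, int length, long target) {
        for (long v : toLongs(value, offset, length)) {
            if (v == target) {
                return 0;
            }
        }
        return 1;
    }

    /** Returns 0 if the slice contains target, 1 otherwise. */
    static int compareStrings(byte[] value, int offset, int length, String target) {
        for (String s : toStrings(value, offset, length)) {
            if (target.equals(s)) {
                return 0;
            }
        }
        return 1;
    }
}
```

In an actual comparator the two compare methods would presumably sit behind compareTo(byte[], int, int), with the target value held as a field. As far as I know, the comparator classes of that HBase generation are Writable and run on the region servers, so a custom one would also need to serialize its target value and be deployed on the server classpath; treat that as something to verify against the HBase version in use.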