Re: Schema design for filters

Michael Segel Fri, 28 Jun 2013 05:47:07 -0700

Why is it that if all you have is a hammer, everything looks like a nail? ;-)



On Jun 27, 2013, at 8:55 PM, James Taylor <jtay...@salesforce.com> wrote:

> Hi Kristoffer,
> Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)? You 
> could model your schema much like an O/R mapper and issue SQL queries through 
> Phoenix for your filtering.
> 
> James
> @JamesPlusPlus
> http://phoenix-hbase.blogspot.com
> 
> On Jun 27, 2013, at 4:39 PM, "Kristoffer Sjögren" <sto...@gmail.com> wrote:
> 
>> Thanks for your help Mike. Much appreciated.
>> 
>> I don't store rows/columns in JSON format. The schema is exactly that of a
>> specific java class, where the rowkey is a unique object identifier with
>> the class type encoded into it. Columns are the field names of the class
>> and the values are that of the object instance.
>> 
>> I did think about coprocessors, but the schema is discovered at runtime and
>> I can't hard-code it.
>> 
>> However, I still believe that filters might work. I had a look at
>> SingleColumnValueFilter, and this filter is able to target specific
>> column qualifiers with specific WritableByteArrayComparables.
>> 
>> But list comparators are still missing... So I guess the only way is to
>> write these comparators?
>> 
>> Do you follow my reasoning? Will it work?
>> 
>> 
>> 
>> 
>> On Fri, Jun 28, 2013 at 12:58 AM, Michael Segel
>> <michael_se...@hotmail.com> wrote:
>> 
>>> Ok...
>>> 
>>> If you want to do type checking and schema enforcement...
>>> 
>>> You will need to do this as a coprocessor.
>>> 
>>> The quick and dirty way (not recommended) would be to hard-code the
>>> schema into the coprocessor code.
>>> 
>>> A better way: at startup, use ZK to manage and load the set of known table
>>> schemas, each of which would be a map of column qualifier to data type.
>>> (If JSON, then you need to do a separate lookup to get the record's schema.)
>>> 
>>> Then write a single Java class that does the lookup and then applies the
>>> comparator for the known data type.
>>> 
>>> Does this make sense?
>>> (Sorry, kinda was thinking this out as I typed the response. But it should
>>> work )
>>> 
>>> At least it would be a design approach I would take. YMMV
>>> 
>>> Having said that, I expect someone to say it's a bad idea and that they
>>> have a better solution.
>>> 
>>> HTH
>>> 
>>> -Mike
>>> 
>>> On Jun 27, 2013, at 5:13 PM, Kristoffer Sjögren <sto...@gmail.com> wrote:
>>> 
>>>> I see your point. Everything is just bytes.
>>>> 
>>>> However, the schema is known and every row is formatted according to this
>>>> schema, although some columns may not exist, that is, no value exists for
>>>> this property on this row.
>>>> 
>>>> So if I'm able to apply these "typed comparators" to the right cell values
>>>> it may be possible? But I can't find a filter that targets specific
>>>> columns?
>>>> 
>>>> Seems like all filters scan every column/qualifier and there is no way of
>>>> knowing what column is currently being evaluated?
>>>> 
>>>> 
>>>> On Thu, Jun 27, 2013 at 11:51 PM, Michael Segel
>>>> <michael_se...@hotmail.com> wrote:
>>>> 
>>>>> You have to remember that HBase doesn't enforce any sort of typing.
>>>>> That's why this can be difficult.
>>>>> 
>>>>> You'd have to write a coprocessor to enforce a schema on a table.
>>>>> Even then YMMV if you're writing JSON structures to a column, because
>>>>> while the contents of the structures could be the same, the actual
>>>>> strings could differ.
>>>>> 
>>>>> HTH
>>>>> 
>>>>> -Mike
>>>>> 
>>>>> On Jun 27, 2013, at 4:41 PM, Kristoffer Sjögren <sto...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> I realize standard comparators cannot solve this.
>>>>>> 
>>>>>> However I do know the type of each column so writing custom list
>>>>>> comparators for boolean, char, byte, short, int, long, float, double
>>>>>> seems quite straightforward.
>>>>>> 
>>>>>> Long arrays, for example, are stored as a byte array with 8 bytes per
>>>>>> item so a comparator might look like this.
>>>>>> 
>>>>>> public class LongsComparator extends WritableByteArrayComparable {
>>>>>> public int compareTo(byte[] value, int offset, int length) {
>>>>>>     long[] values = BytesUtils.toLongs(value, offset, length);
>>>>>>     for (long longValue : values) {
>>>>>>         if (longValue == val) {
>>>>>>             return 0;
>>>>>>         }
>>>>>>     }
>>>>>>     return 1;
>>>>>> }
>>>>>> }
>>>>>> 
>>>>>> public static long[] toLongs(byte[] value, int offset, int length) {
>>>>>>     // length is a byte count starting at offset, not an end index
>>>>>>     int num = length / 8;
>>>>>>     long[] values = new long[num];
>>>>>>     for (int i = 0; i < num; i++) {
>>>>>>         // read the i-th 8-byte item relative to offset
>>>>>>         values[i] = getLong(value, offset + i * 8);
>>>>>>     }
>>>>>>     return values;
>>>>>> }
>>>>>> 
>>>>>> 
>>>>>> Strings are similar but would require charset and length for each
>>>>>> string.
>>>>>> 
>>>>>> public class StringsComparator extends WritableByteArrayComparable  {
>>>>>> public int compareTo(byte[] value, int offset, int length) {
>>>>>>     String[] values = BytesUtils.toStrings(value, offset, length);
>>>>>>     for (String stringValue : values) {
>>>>>>         if (val.equals(stringValue)) {
>>>>>>             return 0;
>>>>>>         }
>>>>>>     }
>>>>>>     return 1;
>>>>>> }
>>>>>> }
>>>>>> 
>>>>>> public static String[] toStrings(byte[] value, int offset, int length) {
>>>>>> ArrayList<String> values = new ArrayList<String>();
>>>>>> int idx = 0;
>>>>>> ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
>>>>>> while (idx < length) {
>>>>>>     int size = buffer.getInt();
>>>>>>     byte[] bytes = new byte[size];
>>>>>>     buffer.get(bytes);
>>>>>>     values.add(new String(bytes));
>>>>>>     idx += 4 + size;
>>>>>> }
>>>>>> return values.toArray(new String[values.size()]);
>>>>>> }
>>>>>> 
>>>>>> 
>>>>>> Am I on the right track or maybe overlooking some implementation
>>>>>> details? Not really sure how to target each comparator to a specific
>>>>>> column value?
>>>>>> 
>>>>>> 
>>>>>> On Thu, Jun 27, 2013 at 9:21 PM, Michael Segel
>>>>>> <michael_se...@hotmail.com> wrote:
>>>>>> 
>>>>>>> Not an easy task.
>>>>>>> 
>>>>>>> You first need to determine how you want to store the data within a
>>>>>>> column and/or apply a type constraint to a column.
>>>>>>> 
>>>>>>> Even if you use JSON records to store your data within a column, does
>>>>>>> an equality comparator exist? If not, you would have to write one.
>>>>>>> (I kinda think that one may already exist...)
>>>>>>> 
>>>>>>> 
>>>>>>> On Jun 27, 2013, at 12:59 PM, Kristoffer Sjögren <sto...@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi
>>>>>>>> 
>>>>>>>> I am working with the standard filtering mechanism to scan rows that
>>>>>>>> have columns matching certain criteria.
>>>>>>>> 
>>>>>>>> There are columns of numeric (integer and decimal) and string types.
>>>>>>>> These columns are single or multi-valued like "1", "2", "1,2,3", "a",
>>>>>>>> "b" or "a,b,c" - not sure what the separator would be in the case of
>>>>>>>> list types. Maybe none?
>>>>>>>> 
>>>>>>>> I would like to compose the following queries to filter out rows that
>>>>>>>> do not match.
>>>>>>>> 
>>>>>>>> - contains(String column, String value)
>>>>>>>> Single-valued column whose value String.contains() the provided value.
>>>>>>>> 
>>>>>>>> - equal(String column, Object value)
>>>>>>>> Single-valued column whose value Object.equals() the provided value.
>>>>>>>> Value is either string or numeric type.
>>>>>>>> 
>>>>>>>> - greaterThan(String column, java.lang.Number value)
>>>>>>>> Single-valued column whose value is > the provided numeric value.
>>>>>>>> 
>>>>>>>> - in(String column, Object... values)
>>>>>>>> Multi-valued column whose values Object.equals() all provided values.
>>>>>>>> Values are of string or numeric type.
>>>>>>>> 
>>>>>>>> How would I design a schema that can take advantage of the already
>>>>>>>> existing filters and comparators to accomplish this?
>>>>>>>> 
>>>>>>>> I already looked at the string and binary comparators but fail to see
>>>>>>>> how to solve this in a clean way for multi-valued column values.
>>>>>>>> 
>>>>>>>> I'm aware of custom filters but would like to avoid them if possible.
>>>>>>>> 
>>>>>>>> Cheers,
>>>>>>>> -Kristoffer
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>>>
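
The membership test that the proposed LongsComparator performs (does a cell
holding packed 8-byte longs contain a given value?) can be tried out without
any HBase dependency. What follows is only a minimal sketch under stated
assumptions: the class and method names (LongListMatcher, encode, contains) are
hypothetical and not part of HBase, and the cell is assumed to hold consecutive
big-endian longs with no separator, as described in the thread.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not an HBase class: checks whether a packed list of
// 8-byte big-endian longs contains a wanted value.
public final class LongListMatcher {
    private LongListMatcher() {}

    /** Packs the values as consecutive 8-byte big-endian longs. */
    public static byte[] encode(long... values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buffer.putLong(v);
        }
        return buffer.array();
    }

    /**
     * Returns true if any long stored in value[offset, offset + length)
     * equals wanted. Here length is a byte count, not an end index.
     */
    public static boolean contains(byte[] value, int offset, int length, long wanted) {
        // wrap() positions the buffer at offset and limits it to offset + length
        ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
        while (buffer.remaining() >= Long.BYTES) {
            if (buffer.getLong() == wanted) {
                return true;
            }
        }
        return false;
    }
}
```

A custom comparator could delegate to contains() from its
compareTo(byte[], int, int) method and return 0 on a match, as in the code
quoted above. Whether that fits a particular HBase version's comparator
contract is not something this sketch verifies.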

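Mike's suggestion of a map of column qualifier to data type, plus a single
class that does the lookup and applies the right typed comparison, can be
sketched in the same spirit. This is an illustration only: TypedCellMatcher,
ColumnType, register, matches and encodeStrings are hypothetical names, the
schema lives in a plain in-memory map rather than in ZK, and strings are
assumed to be stored as a 4-byte length followed by UTF-8 bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: looks up a column's type, then runs the matching
// list-membership check on the raw cell bytes.
public final class TypedCellMatcher {
    /** Assumed storage types; a real schema would have more. */
    public enum ColumnType { LONG_LIST, STRING_LIST }

    private final Map<String, ColumnType> schema = new HashMap<>();

    public TypedCellMatcher register(String qualifier, ColumnType type) {
        schema.put(qualifier, type);
        return this;
    }

    /** Returns true if the cell of the given column contains wanted. */
    public boolean matches(String qualifier, byte[] cell, Object wanted) {
        ColumnType type = schema.get(qualifier);
        if (type == null || cell == null) {
            return false; // unknown column or missing value never matches
        }
        ByteBuffer buffer = ByteBuffer.wrap(cell);
        if (type == ColumnType.LONG_LIST) {
            if (!(wanted instanceof Long)) {
                return false;
            }
            long target = (Long) wanted;
            while (buffer.remaining() >= Long.BYTES) {
                if (buffer.getLong() == target) {
                    return true;
                }
            }
            return false;
        }
        // STRING_LIST: repeated [4-byte length][UTF-8 bytes] records
        while (buffer.remaining() >= Integer.BYTES) {
            int size = buffer.getInt();
            if (size < 0 || size > buffer.remaining()) {
                return false; // malformed cell
            }
            byte[] bytes = new byte[size];
            buffer.get(bytes);
            if (new String(bytes, StandardCharsets.UTF_8).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    /** Encodes strings in the length-prefixed UTF-8 layout assumed above. */
    public static byte[] encodeStrings(String... values) {
        byte[][] encoded = new byte[values.length][];
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
            total += Integer.BYTES + encoded[i].length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(total);
        for (byte[] bytes : encoded) {
            buffer.putInt(bytes.length);
            buffer.put(bytes);
        }
        return buffer.array();
    }
}
```

Naming the charset explicitly avoids depending on the platform default, which
is the charset concern raised for string lists earlier in the thread.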