@Otis HBase is a natural fit for my use case because it's schemaless. I'm building a configuration management system, and there is no need for advanced filtering/querying capabilities, just basic predicate logic and pagination that scales to fewer than 1 million rows with reasonable performance.
Thanks for the tip!

On Fri, Jun 28, 2013 at 8:34 PM, Otis Gospodnetic <[email protected]> wrote:
> Kristoffer,
>
> You could also consider using something other than HBase, something
> that supports "secondary indices", like anything that is Lucene based
> - Solr and ElasticSearch for example. We recently compared how we
> aggregate data in HBase (see my signature) and how we would do it if
> we were to use Solr (or ElasticSearch), and so far things look better
> in Solr for our use case. And our use case involves a lot of
> filtering, slicing and dicing... something to consider...
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
>
> On Fri, Jun 28, 2013 at 5:24 AM, Kristoffer Sjögren <[email protected]> wrote:
> > Interesting. I'm actually building something similar.
> >
> > A full-blown SQL implementation is a bit overkill for my particular use case
> > and the query API is the final piece of the puzzle. But I'll definitely have
> > a look for some inspiration.
> >
> > Thanks!
> >
> > On Fri, Jun 28, 2013 at 3:55 AM, James Taylor <[email protected]> wrote:
> >
> >> Hi Kristoffer,
> >> Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)?
> >> You could model your schema much like an O/R mapper and issue SQL queries
> >> through Phoenix for your filtering.
> >>
> >> James
> >> @JamesPlusPlus
> >> http://phoenix-hbase.blogspot.com
> >>
> >> On Jun 27, 2013, at 4:39 PM, "Kristoffer Sjögren" <[email protected]> wrote:
> >>
> >> > Thanks for your help Mike. Much appreciated.
> >> >
> >> > I don't store rows/columns in JSON format. The schema is exactly that of a
> >> > specific Java class, where the rowkey is a unique object identifier with
> >> > the class type encoded into it. Columns are the field names of the class
> >> > and the values are those of the object instance.
> >> >
> >> > Did think about coprocessors but the schema is discovered at runtime and I
> >> > can't hard code it.
> >> >
> >> > However, I still believe that filters might work. Had a look
> >> > at SingleColumnValueFilter and this filter is able to target specific
> >> > column qualifiers with specific WritableByteArrayComparables.
> >> >
> >> > But list comparators are still missing... So I guess the only way is to
> >> > write these comparators?
> >> >
> >> > Do you follow my reasoning? Will it work?
> >> >
> >> > On Fri, Jun 28, 2013 at 12:58 AM, Michael Segel <[email protected]> wrote:
> >> >
> >> >> Ok...
> >> >>
> >> >> If you want to do type checking and schema enforcement...
> >> >>
> >> >> You will need to do this as a coprocessor.
> >> >>
> >> >> The quick and dirty way (not recommended) would be to hard code the
> >> >> schema into the coprocessor code.
> >> >>
> >> >> A better way... at start up, load up ZK to manage the set of known table
> >> >> schemas, which would be a map of column qualifier to data type.
> >> >> (If JSON then you need to do a separate lookup to get the record's schema.)
> >> >>
> >> >> Then a single Java class that does the lookup and then handles the known
> >> >> data type comparators.
> >> >>
> >> >> Does this make sense?
> >> >> (Sorry, kinda was thinking this out as I typed the response. But it should
> >> >> work.)
> >> >>
> >> >> At least it would be a design approach I would take. YMMV
> >> >>
> >> >> Having said that, I expect someone to say it's a bad idea and that they
> >> >> have a better solution.
> >> >>
> >> >> HTH
> >> >>
> >> >> -Mike
> >> >>
> >> >> On Jun 27, 2013, at 5:13 PM, Kristoffer Sjögren <[email protected]> wrote:
> >> >>
> >> >>> I see your point. Everything is just bytes.
> >> >>>
> >> >>> However, the schema is known and every row is formatted according to this
> >> >>> schema, although some columns may not exist, that is, no value exists for
> >> >>> this property on this row.
> >> >>>
> >> >>> So if I'm able to apply these "typed comparators" to the right cell values
> >> >>> it may be possible? But I can't find a filter that targets specific columns?
> >> >>>
> >> >>> Seems like all filters scan every column/qualifier and there is no way of
> >> >>> knowing what column is currently being evaluated?
> >> >>>
> >> >>> On Thu, Jun 27, 2013 at 11:51 PM, Michael Segel <[email protected]> wrote:
> >> >>>
> >> >>>> You have to remember that HBase doesn't enforce any sort of typing.
> >> >>>> That's why this can be difficult.
> >> >>>>
> >> >>>> You'd have to write a coprocessor to enforce a schema on a table.
> >> >>>> Even then YMMV if you're writing JSON structures to a column, because while
> >> >>>> the contents of the structures could be the same, the actual strings could
> >> >>>> differ.
> >> >>>>
> >> >>>> HTH
> >> >>>>
> >> >>>> -Mike
> >> >>>>
> >> >>>> On Jun 27, 2013, at 4:41 PM, Kristoffer Sjögren <[email protected]> wrote:
> >> >>>>
> >> >>>>> I realize standard comparators cannot solve this.
> >> >>>>>
> >> >>>>> However, I do know the type of each column, so writing custom list
> >> >>>>> comparators for boolean, char, byte, short, int, long, float, double seems
> >> >>>>> quite straightforward.
> >> >>>>>
> >> >>>>> Long arrays, for example, are stored as a byte array with 8 bytes per item
> >> >>>>> so a comparator might look like this.
> >> >>>>>
> >> >>>>> public class LongsComparator extends WritableByteArrayComparable {
> >> >>>>>     // Value to look for in the stored list. Serialization of this
> >> >>>>>     // field (write/readFields) is omitted here.
> >> >>>>>     private long val;
> >> >>>>>
> >> >>>>>     public LongsComparator(long val) {
> >> >>>>>         this.val = val;
> >> >>>>>     }
> >> >>>>>
> >> >>>>>     // Returns 0 if the list contains val, 1 otherwise.
> >> >>>>>     public int compareTo(byte[] value, int offset, int length) {
> >> >>>>>         long[] values = BytesUtils.toLongs(value, offset, length);
> >> >>>>>         for (long longValue : values) {
> >> >>>>>             if (longValue == val) {
> >> >>>>>                 return 0;
> >> >>>>>             }
> >> >>>>>         }
> >> >>>>>         return 1;
> >> >>>>>     }
> >> >>>>> }
> >> >>>>>
> >> >>>>> public static long[] toLongs(byte[] value, int offset, int length) {
> >> >>>>>     // length is the byte count of the slice, so it alone gives the item count.
> >> >>>>>     int num = length / 8;
> >> >>>>>     long[] values = new long[num];
> >> >>>>>     for (int i = 0; i < num; i++) {
> >> >>>>>         // Read item i relative to the start of the slice.
> >> >>>>>         values[i] = getLong(value, offset + i * 8);
> >> >>>>>     }
> >> >>>>>     return values;
> >> >>>>> }
> >> >>>>>
> >> >>>>> Strings are similar but would require charset and length for each string.
> >> >>>>>
> >> >>>>> public class StringsComparator extends WritableByteArrayComparable {
> >> >>>>>     // Value to look for in the stored list (serialization omitted).
> >> >>>>>     private String val;
> >> >>>>>
> >> >>>>>     public StringsComparator(String val) {
> >> >>>>>         this.val = val;
> >> >>>>>     }
> >> >>>>>
> >> >>>>>     // Returns 0 if the list contains val, 1 otherwise.
> >> >>>>>     public int compareTo(byte[] value, int offset, int length) {
> >> >>>>>         String[] values = BytesUtils.toStrings(value, offset, length);
> >> >>>>>         for (String stringValue : values) {
> >> >>>>>             if (val.equals(stringValue)) {
> >> >>>>>                 return 0;
> >> >>>>>             }
> >> >>>>>         }
> >> >>>>>         return 1;
> >> >>>>>     }
> >> >>>>> }
> >> >>>>>
> >> >>>>> public static String[] toStrings(byte[] value, int offset, int length) {
> >> >>>>>     ArrayList<String> values = new ArrayList<String>();
> >> >>>>>     ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
> >> >>>>>     // Each entry is a 4-byte length followed by that many bytes.
> >> >>>>>     while (buffer.hasRemaining()) {
> >> >>>>>         int size = buffer.getInt();
> >> >>>>>         byte[] bytes = new byte[size];
> >> >>>>>         buffer.get(bytes);
> >> >>>>>         // Decodes with the platform default charset; an explicit
> >> >>>>>         // charset would be safer.
> >> >>>>>         values.add(new String(bytes));
> >> >>>>>     }
> >> >>>>>     return values.toArray(new String[values.size()]);
> >> >>>>> }
> >> >>>>>
> >> >>>>> Am I on the right track or maybe overlooking some implementation details?
> >> >>>>> Not really sure how to target each comparator to a specific column value?
> >> >>>>>
> >> >>>>> On Thu, Jun 27, 2013 at 9:21 PM, Michael Segel <[email protected]> wrote:
> >> >>>>>
> >> >>>>>> Not an easy task.
> >> >>>>>>
> >> >>>>>> You first need to determine how you want to store the data within a column
> >> >>>>>> and/or apply a type constraint to a column.
> >> >>>>>>
> >> >>>>>> Even if you use JSON records to store your data within a column, does an
> >> >>>>>> equality comparator exist? If not, you would have to write one.
> >> >>>>>> (I kinda think that one may already exist...)
> >> >>>>>>
> >> >>>>>> On Jun 27, 2013, at 12:59 PM, Kristoffer Sjögren <[email protected]> wrote:
> >> >>>>>>
> >> >>>>>>> Hi
> >> >>>>>>>
> >> >>>>>>> Working with the standard filtering mechanism to scan rows that have
> >> >>>>>>> columns matching certain criteria.
> >> >>>>>>>
> >> >>>>>>> There are columns of numeric (integer and decimal) and string types. These
> >> >>>>>>> columns are single or multi-valued like "1", "2", "1,2,3", "a", "b" or
> >> >>>>>>> "a,b,c" - not sure what the separator would be in the case of list types.
> >> >>>>>>> Maybe none?
> >> >>>>>>>
> >> >>>>>>> I would like to compose the following queries to filter out rows that do
> >> >>>>>>> not match.
> >> >>>>>>>
> >> >>>>>>> - contains(String column, String value)
> >> >>>>>>>   Single-valued column whose value String.contains() the provided value.
> >> >>>>>>>
> >> >>>>>>> - equal(String column, Object value)
> >> >>>>>>>   Single-valued column whose value Object.equals() the provided value.
> >> >>>>>>>   Value is either string or numeric type.
> >> >>>>>>>
> >> >>>>>>> - greaterThan(String column, java.lang.Number value)
> >> >>>>>>>   Single-valued column whose value is > the provided numeric value.
> >> >>>>>>>
> >> >>>>>>> - in(String column, Object value...)
> >> >>>>>>>   Multi-valued column whose values Object.equals() all provided values.
> >> >>>>>>>   Values are of string or numeric type.
> >> >>>>>>>
> >> >>>>>>> How would I design a schema that can take advantage of the already existing
> >> >>>>>>> filters and comparators to accomplish this?
> >> >>>>>>>
> >> >>>>>>> Already looked at the string and binary comparators but fail to see how to
> >> >>>>>>> solve this in a clean way for multi-valued column values.
> >> >>>>>>>
> >> >>>>>>> I'm aware of custom filters but would like to avoid them if possible.
> >> >>>>>>>
> >> >>>>>>> Cheers,
> >> >>>>>>> -Kristoffer
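
For anyone following the thread, here is a minimal sketch of the list encoding and membership check discussed above, written without any HBase dependency so it can be run on its own. The class name ListColumnCodec and all of its methods are made up for illustration. It assumes lists of longs are stored as consecutive 8-byte big-endian values and lists of strings as a 4-byte length followed by UTF-8 bytes; neither layout is confirmed by the thread beyond "8 bytes per item" and "length for each string". In a real setup the two compare methods would presumably live inside a WritableByteArrayComparable subclass handed to SingleColumnValueFilter, which is not shown here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: encodes and decodes multi-valued cells.
final class ListColumnCodec {

    // Encode longs as consecutive 8-byte big-endian values.
    static byte[] encodeLongs(long... values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * 8);
        for (long v : values) {
            buffer.putLong(v);
        }
        return buffer.array();
    }

    // Decode the slice [offset, offset + length) of cell into longs.
    static long[] decodeLongs(byte[] cell, int offset, int length) {
        if (length % 8 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 8");
        }
        ByteBuffer buffer = ByteBuffer.wrap(cell, offset, length);
        long[] values = new long[length / 8];
        for (int i = 0; i < values.length; i++) {
            values[i] = buffer.getLong();
        }
        return values;
    }

    // Encode strings as [4-byte length][UTF-8 bytes] records (assumed layout).
    static byte[] encodeStrings(String... values) {
        List<byte[]> encoded = new ArrayList<>();
        int total = 0;
        for (String s : values) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            encoded.add(bytes);
            total += 4 + bytes.length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(total);
        for (byte[] bytes : encoded) {
            buffer.putInt(bytes.length);
            buffer.put(bytes);
        }
        return buffer.array();
    }

    // Decode the slice [offset, offset + length) of cell into strings.
    static List<String> decodeStrings(byte[] cell, int offset, int length) {
        ByteBuffer buffer = ByteBuffer.wrap(cell, offset, length);
        List<String> values = new ArrayList<>();
        while (buffer.hasRemaining()) {
            byte[] bytes = new byte[buffer.getInt()];
            buffer.get(bytes);
            values.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return values;
    }

    // Comparator-style result: 0 if wanted is in the stored list, 1 otherwise.
    static int compareLongs(long wanted, byte[] cell, int offset, int length) {
        for (long v : decodeLongs(cell, offset, length)) {
            if (v == wanted) {
                return 0;
            }
        }
        return 1;
    }

    // Comparator-style result: 0 if wanted is in the stored list, 1 otherwise.
    static int compareStrings(String wanted, byte[] cell, int offset, int length) {
        return decodeStrings(cell, offset, length).contains(wanted) ? 0 : 1;
    }
}
```

Reading through a ByteBuffer wrapped over the slice keeps the offset handling in one place, which matters because a cell value generally sits somewhere inside a larger backing array rather than at index 0.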
