Hi, I see. By the way, isn't HBase overkill for fewer than 1M rows? Note that Lucene is schemaless, and both Solr and Elasticsearch can detect field types, so in a way they are schemaless too.

Otis
--
Performance Monitoring -- http://sematext.com/spm


On Fri, Jun 28, 2013 at 2:53 PM, Kristoffer Sjögren <[email protected]> wrote:
> @Otis
>
> HBase is a natural fit for my use case because it's schemaless. I'm building a
> configuration management system and there is no need for advanced
> filtering/querying capabilities, just basic predicate logic and pagination
> that scales to < 1 million rows with reasonable performance.
>
> Thanks for the tip!
>
>
> On Fri, Jun 28, 2013 at 8:34 PM, Otis Gospodnetic <[email protected]> wrote:
>
>> Kristoffer,
>>
>> You could also consider using something other than HBase, something
>> that supports "secondary indices", like anything that is Lucene based
>> - Solr and ElasticSearch for example. We recently compared how we
>> aggregate data in HBase (see my signature) and how we would do it if
>> we were to use Solr (or ElasticSearch), and so far things look better
>> in Solr for our use case. And our use case involves a lot of
>> filtering, slicing and dicing..... something to consider...
>>
>> Otis
>> --
>> Solr & ElasticSearch Support -- http://sematext.com/
>> Performance Monitoring -- http://sematext.com/spm
>>
>>
>>
>> On Fri, Jun 28, 2013 at 5:24 AM, Kristoffer Sjögren <[email protected]> wrote:
>> > Interesting. I'm actually building something similar.
>> >
>> > A full-blown SQL implementation is a bit overkill for my particular use case
>> > and the query API is the final piece of the puzzle. But I'll definitely have
>> > a look for some inspiration.
>> >
>> > Thanks!
>> >
>> >
>> >
>> > On Fri, Jun 28, 2013 at 3:55 AM, James Taylor <[email protected]> wrote:
>> >
>> >> Hi Kristoffer,
>> >> Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)?
>> >> You could model your schema much like an O/R mapper and issue SQL queries
>> >> through Phoenix for your filtering.
>> >>
>> >> James
>> >> @JamesPlusPlus
>> >> http://phoenix-hbase.blogspot.com
>> >>
>> >> On Jun 27, 2013, at 4:39 PM, "Kristoffer Sjögren" <[email protected]> wrote:
>> >>
>> >> > Thanks for your help Mike. Much appreciated.
>> >> >
>> >> > I don't store rows/columns in JSON format. The schema is exactly that of a
>> >> > specific Java class, where the rowkey is a unique object identifier with
>> >> > the class type encoded into it. Columns are the field names of the class
>> >> > and the values are those of the object instance.
>> >> >
>> >> > Did think about coprocessors but the schema is discovered at runtime and I
>> >> > can't hard code it.
>> >> >
>> >> > However, I still believe that filters might work. Had a look
>> >> > at SingleColumnValueFilter and this filter is able to target specific
>> >> > column qualifiers with specific WritableByteArrayComparables.
>> >> >
>> >> > But list comparators are still missing... So I guess the only way is to
>> >> > write these comparators?
>> >> >
>> >> > Do you follow my reasoning? Will it work?
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On Fri, Jun 28, 2013 at 12:58 AM, Michael Segel <[email protected]> wrote:
>> >> >
>> >> >> Ok...
>> >> >>
>> >> >> If you want to do type checking and schema enforcement...
>> >> >>
>> >> >> You will need to do this as a coprocessor.
>> >> >>
>> >> >> The quick and dirty way... (not recommended) would be to hard code the
>> >> >> schema into the coprocessor code.
>> >> >>
>> >> >> A better way... at start up, load up ZK to manage the set of known table
>> >> >> schemas which would be a map of column qualifier to data type.
>> >> >> (If JSON then you need to do a separate lookup to get the record's schema)
>> >> >>
>> >> >> Then a single Java class that does the lookup and then handles the known
>> >> >> data type comparators.
>> >> >>
>> >> >> Does this make sense?
>> >> >> (Sorry, kinda was thinking this out as I typed the response.
>> >> >> But it should work.)
>> >> >>
>> >> >> At least it would be a design approach I would take. YMMV
>> >> >>
>> >> >> Having said that, I expect someone to say it's a bad idea and that they
>> >> >> have a better solution.
>> >> >>
>> >> >> HTH
>> >> >>
>> >> >> -Mike
>> >> >>
>> >> >> On Jun 27, 2013, at 5:13 PM, Kristoffer Sjögren <[email protected]> wrote:
>> >> >>
>> >> >>> I see your point. Everything is just bytes.
>> >> >>>
>> >> >>> However, the schema is known and every row is formatted according to this
>> >> >>> schema, although some columns may not exist, that is, no value exists for
>> >> >>> this property on this row.
>> >> >>>
>> >> >>> So if I'm able to apply these "typed comparators" to the right cell values
>> >> >>> it may be possible? But I can't find a filter that targets specific columns?
>> >> >>>
>> >> >>> Seems like all filters scan every column/qualifier and there is no way of
>> >> >>> knowing what column is currently being evaluated?
>> >> >>>
>> >> >>>
>> >> >>> On Thu, Jun 27, 2013 at 11:51 PM, Michael Segel <[email protected]> wrote:
>> >> >>>
>> >> >>>> You have to remember that HBase doesn't enforce any sort of typing.
>> >> >>>> That's why this can be difficult.
>> >> >>>>
>> >> >>>> You'd have to write a coprocessor to enforce a schema on a table.
>> >> >>>> Even then YMMV if you're writing JSON structures to a column because while
>> >> >>>> the contents of the structures could be the same, the actual strings could
>> >> >>>> differ.
>> >> >>>>
>> >> >>>> HTH
>> >> >>>>
>> >> >>>> -Mike
>> >> >>>>
>> >> >>>> On Jun 27, 2013, at 4:41 PM, Kristoffer Sjögren <[email protected]> wrote:
>> >> >>>>
>> >> >>>>> I realize standard comparators cannot solve this.
>> >> >>>>>
>> >> >>>>> However I do know the type of each column so writing custom list
>> >> >>>>> comparators for boolean, char, byte, short, int, long, float, double seems
>> >> >>>>> quite straightforward.
>> >> >>>>>
>> >> >>>>> Long arrays, for example, are stored as a byte array with 8 bytes per item
>> >> >>>>> so a comparator might look like this.
>> >> >>>>>
>> >> >>>>> public class LongsComparator extends WritableByteArrayComparable {
>> >> >>>>>   private long val; // value to look for; constructor and Writable methods omitted
>> >> >>>>>
>> >> >>>>>   public int compareTo(byte[] value, int offset, int length) {
>> >> >>>>>     long[] values = BytesUtils.toLongs(value, offset, length);
>> >> >>>>>     for (long longValue : values) {
>> >> >>>>>       if (longValue == val) {
>> >> >>>>>         return 0; // match
>> >> >>>>>       }
>> >> >>>>>     }
>> >> >>>>>     return 1; // no match
>> >> >>>>>   }
>> >> >>>>> }
>> >> >>>>>
>> >> >>>>> public static long[] toLongs(byte[] value, int offset, int length) {
>> >> >>>>>   int num = length / 8; // length already counts only this value's bytes
>> >> >>>>>   long[] values = new long[num];
>> >> >>>>>   for (int i = 0; i < num; i++) {
>> >> >>>>>     values[i] = getLong(value, offset + i * 8);
>> >> >>>>>   }
>> >> >>>>>   return values;
>> >> >>>>> }
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> Strings are similar but would require charset and length for each string.
>> >> >>>>>
>> >> >>>>> public class StringsComparator extends WritableByteArrayComparable {
>> >> >>>>>   private String val; // value to look for; constructor and Writable methods omitted
>> >> >>>>>
>> >> >>>>>   public int compareTo(byte[] value, int offset, int length) {
>> >> >>>>>     String[] values = BytesUtils.toStrings(value, offset, length);
>> >> >>>>>     for (String stringValue : values) {
>> >> >>>>>       if (val.equals(stringValue)) {
>> >> >>>>>         return 0; // match
>> >> >>>>>       }
>> >> >>>>>     }
>> >> >>>>>     return 1; // no match
>> >> >>>>>   }
>> >> >>>>> }
>> >> >>>>>
>> >> >>>>> public static String[] toStrings(byte[] value, int offset, int length) {
>> >> >>>>>   ArrayList<String> values = new ArrayList<String>();
>> >> >>>>>   int idx = 0;
>> >> >>>>>   ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
>> >> >>>>>   while (idx < length) {
>> >> >>>>>     int size = buffer.getInt(); // 4-byte length prefix
>> >> >>>>>     byte[] bytes = new byte[size];
>> >> >>>>>     buffer.get(bytes);
>> >> >>>>>     values.add(new String(bytes)); // platform default charset
>> >> >>>>>     idx += 4 + size;
>> >> >>>>>   }
>> >> >>>>>   return values.toArray(new String[values.size()]);
>> >> >>>>> }
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> Am I on the right track or maybe overlooking some implementation details?
>> >> >>>>> Not really sure how to target each comparator to a specific column value?
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> On Thu, Jun 27, 2013 at 9:21 PM, Michael Segel <[email protected]> wrote:
>> >> >>>>>
>> >> >>>>>> Not an easy task.
>> >> >>>>>>
>> >> >>>>>> You first need to determine how you want to store the data within a column
>> >> >>>>>> and/or apply a type constraint to a column.
>> >> >>>>>>
>> >> >>>>>> Even if you use JSON records to store your data within a column, does an
>> >> >>>>>> equality comparator exist? If not, you would have to write one.
>> >> >>>>>> (I kinda think that one may already exist...)
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>> On Jun 27, 2013, at 12:59 PM, Kristoffer Sjögren <[email protected]> wrote:
>> >> >>>>>>
>> >> >>>>>>> Hi
>> >> >>>>>>>
>> >> >>>>>>> Working with the standard filtering mechanism to scan rows that have
>> >> >>>>>>> columns matching certain criteria.
>> >> >>>>>>>
>> >> >>>>>>> There are columns of numeric (integer and decimal) and string types. These
>> >> >>>>>>> columns are single or multi-valued like "1", "2", "1,2,3", "a", "b" or
>> >> >>>>>>> "a,b,c" - not sure what the separator would be in the case of list types.
>> >> >>>>>>> Maybe none?
>> >> >>>>>>>
>> >> >>>>>>> I would like to compose the following queries to filter out rows that do
>> >> >>>>>>> not match.
>> >> >>>>>>>
>> >> >>>>>>> - contains(String column, String value)
>> >> >>>>>>>   Single valued column that String.contains() provided value.
>> >> >>>>>>>
>> >> >>>>>>> - equal(String column, Object value)
>> >> >>>>>>>   Single valued column that Object.equals() provided value.
>> >> >>>>>>>   Value is either string or numeric type.
>> >> >>>>>>>
>> >> >>>>>>> - greaterThan(String column, java.lang.Number value)
>> >> >>>>>>>   Single valued column that > provided numeric value.
>> >> >>>>>>>
>> >> >>>>>>> - in(String column, Object value...)
>> >> >>>>>>>   Multi-valued column have values that Object.equals() all provided values.
>> >> >>>>>>>   Values are of string or numeric type.
>> >> >>>>>>>
>> >> >>>>>>> How would I design a schema that can take advantage of the already existing
>> >> >>>>>>> filters and comparators to accomplish this?
>> >> >>>>>>>
>> >> >>>>>>> Already looked at the string and binary comparators but fail to see how to
>> >> >>>>>>> solve this in a clean way for multi-valued column values.
>> >> >>>>>>>
>> >> >>>>>>> I'm aware of custom filters but would like to avoid them if possible.
>> >> >>>>>>>
>> >> >>>>>>> Cheers,
>> >> >>>>>>> -Kristoffer
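
For reference, the list encoding discussed in the thread (8 bytes per long, a 4-byte length prefix per string) can be sketched without any HBase dependency. The class name MultiValueCodec, its method names, and the choice of UTF-8 are assumptions made for illustration, not code from the thread or from HBase. The compare methods only mirror the "0 on match, 1 otherwise" convention of the comparators quoted above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: packs and unpacks multi-valued cells.
final class MultiValueCodec {
    private MultiValueCodec() {}

    // Packs longs as consecutive 8-byte big-endian values.
    static byte[] encodeLongs(long... values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buf.putLong(v);
        }
        return buf.array();
    }

    // Reads length / 8 longs starting at offset; length counts bytes, not items.
    static long[] toLongs(byte[] value, int offset, int length) {
        if (length % Long.BYTES != 0) {
            throw new IllegalArgumentException("length must be a multiple of 8");
        }
        ByteBuffer buf = ByteBuffer.wrap(value, offset, length);
        long[] out = new long[length / Long.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getLong();
        }
        return out;
    }

    // Packs each string as a 4-byte length followed by its UTF-8 bytes.
    static byte[] encodeStrings(String... values) {
        List<byte[]> encoded = new ArrayList<>();
        int total = 0;
        for (String s : values) {
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            encoded.add(b);
            total += Integer.BYTES + b.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        for (byte[] b : encoded) {
            buf.putInt(b.length);
            buf.put(b);
        }
        return buf.array();
    }

    // Reads length-prefixed UTF-8 strings until the given range is exhausted.
    static String[] toStrings(byte[] value, int offset, int length) {
        ByteBuffer buf = ByteBuffer.wrap(value, offset, length);
        List<String> out = new ArrayList<>();
        while (buf.hasRemaining()) {
            byte[] b = new byte[buf.getInt()];
            buf.get(b);
            out.add(new String(b, StandardCharsets.UTF_8));
        }
        return out.toArray(new String[0]);
    }

    // Returns 0 if the packed list contains wanted, 1 otherwise.
    static int compareLongs(byte[] value, int offset, int length, long wanted) {
        for (long v : toLongs(value, offset, length)) {
            if (v == wanted) {
                return 0;
            }
        }
        return 1;
    }

    // Returns 0 if the packed list contains wanted, 1 otherwise.
    static int compareStrings(byte[] value, int offset, int length, String wanted) {
        for (String s : toStrings(value, offset, length)) {
            if (wanted.equals(s)) {
                return 0;
            }
        }
        return 1;
    }
}
```

Fixing the charset at write time means the decode side never depends on the platform default, and the length prefix avoids having to pick a separator character that could also occur inside a value.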
