Re: Schema design for filters

Michel Segel Fri, 28 Jun 2013 16:45:47 -0700

This doesn't make sense, in that the OP wants a schema-less structure yet wants
filtering on columns. The issue is that you do have a limited schema, so
schema-less is a misnomer.


In order to do filtering, you need to enforce the object type within a column,
which requires a schema to be enforced.

Again, this can be done in HBase.
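Mike's point about enforcing a type per column can be illustrated outside HBase. Cells are stored as raw bytes, and byte comparators (HBase's BinaryComparator, for example) compare values lexicographically as unsigned byte sequences, so a range filter on a column only behaves numerically if every value in that column uses the same order-preserving encoding. This is a minimal stdlib-only sketch, assuming 32-bit int values; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch: a byte-wise comparator only filters a column correctly when every cell
// uses the same order-preserving encoding. Assumption: values are 32-bit ints.
final class OrderedIntEncoding {

    // Plain big-endian two's complement, as ByteBuffer.putInt writes it.
    static byte[] plain(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Flipping the sign bit makes unsigned byte order match numeric order.
    static byte[] ordered(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
    }

    // Inverse of ordered(): read the int back and flip the sign bit again.
    static int decodeOrdered(byte[] b) {
        return ByteBuffer.wrap(b).getInt() ^ Integer.MIN_VALUE;
    }

    // Unsigned lexicographic comparison, the way raw-byte comparators order values.
    static int compareBytes(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        // Numerically -1 < 1, but the plain bytes of -1 (0xFF...) sort after those of 1.
        System.out.println(compareBytes(plain(-1), plain(1)) > 0);     // true: wrong order
        System.out.println(compareBytes(ordered(-1), ordered(1)) < 0); // true: numeric order
    }
}
```

A column that mixed both encodings (or mixed ints with strings) could not be range-filtered meaningfully, which is the sense in which filtering implies an enforced schema.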




Mike Segel

On Jun 28, 2013, at 4:30 PM, Asaf Mesika <[email protected]> wrote:

> Yep. Other DBs like Mongo may have the stuff you need out of the box.
> Another option is to encode the whole class using Avro and write a
> filter on top of that.
> You basically use one column and store it there.
> Yes, you pay the penalty of loading your entire class and extracting the
> fields you need to compare against, but I'm really not sure the other way
> is faster, taking into account the hint mechanism in Filter, which is
> pinpointed and thus grabs more bytes than it needs to.
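Asaf's single-column suggestion can be sketched as follows. This is not Avro: to stay self-contained it uses java.io.DataOutputStream as a stand-in for an Avro encoder, and a plain static method instead of a custom HBase Filter; all names are hypothetical. It shows the trade-off he mentions: testing one field means decoding the whole record stored in the cell.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Sketch: the whole object lives in ONE cell; a filter-like check decodes it to test a field.
// Assumption: DataOutputStream stands in for Avro; this is not an HBase Filter subclass.
final class SingleColumnRecords {

    // Serialize all fields into a single byte[] (the value of the one column).
    static byte[] encode(Map<String, String> fields) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(fields.size());
            for (Map.Entry<String, String> e : fields.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decoding materializes the entire record, even when only one field is needed.
    static Map<String, String> decode(byte[] cell) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(cell))) {
            int n = in.readInt();
            Map<String, String> fields = new LinkedHashMap<>();
            for (int i = 0; i < n; i++) {
                String key = in.readUTF();
                fields.put(key, in.readUTF());
            }
            return fields;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Filter-style decision: keep the row only if the named field passes the predicate.
    static boolean matches(byte[] cell, String field, Predicate<String> test) {
        String value = decode(cell).get(field);
        return value != null && test.test(value);
    }
}
```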
> 
> Back to what was said earlier: with 1M rows, why not MySQL?
> 
> On Friday, June 28, 2013, Otis Gospodnetic wrote:
> 
>> Hi,
>> 
>> I see. By the way, isn't HBase overkill for < 1M rows?
>> Note that Lucene is schemaless and both Solr and Elasticsearch can
>> detect field types, so in a way they are schemaless, too.
>> 
>> Otis
>> --
>> Performance Monitoring -- http://sematext.com/spm
>> 
>> 
>> 
>> On Fri, Jun 28, 2013 at 2:53 PM, Kristoffer Sjögren <[email protected]> wrote:
>>> @Otis
>>> 
>>> HBase is a natural fit for my use case because it's schemaless. I'm
>>> building a configuration management system and there is no need for
>>> advanced filtering/querying capabilities, just basic predicate logic and
>>> pagination that scales to < 1 million rows with reasonable performance.
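The requirements listed here, simple predicates plus pagination, can be sketched without any secondary index by scanning rows in row-key order and resuming each page just after the last key returned (keyset pagination) instead of skipping an offset. In this stdlib-only sketch a TreeMap is a hypothetical stand-in for the table; in HBase the rough analogue is starting the next scan at the row after the last one returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.function.Predicate;

// Sketch: predicate filtering plus keyset pagination over rows kept in row-key order.
// Assumption: a NavigableMap (e.g. TreeMap) stands in for an HBase table.
final class KeysetPaging {

    // Returns up to pageSize matching row keys that sort strictly after lastKeyOrNull.
    static List<String> page(NavigableMap<String, Map<String, String>> table,
                             Predicate<Map<String, String>> predicate,
                             String lastKeyOrNull, int pageSize) {
        // Resume strictly after the previous page's last key, or from the first row.
        NavigableMap<String, Map<String, String>> rest =
            lastKeyOrNull == null ? table : table.tailMap(lastKeyOrNull, false);
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : rest.entrySet()) {
            if (keys.size() == pageSize) {
                break; // page is full
            }
            if (predicate.test(row.getValue())) {
                keys.add(row.getKey());
            }
        }
        return keys;
    }
}
```

The caller passes the last key of one page as `lastKeyOrNull` for the next. Each page still reads every non-matching row it passes, which is acceptable at this scale but is exactly the cost a secondary index avoids.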
>>> 
>>> Thanks for the tip!
>>> 
>>> 
>>> On Fri, Jun 28, 2013 at 8:34 PM, Otis Gospodnetic <[email protected]> wrote:
>>> 
>>>> Kristoffer,
>>>> 
>>>> You could also consider using something other than HBase, something
>>>> that supports "secondary indices", like anything that is Lucene-based,
>>>> e.g. Solr and ElasticSearch. We recently compared how we aggregate data
>>>> in HBase (see my signature) with how we would do it if we were to use
>>>> Solr (or ElasticSearch), and so far things look better in Solr for our
>>>> use case. And our use case involves a lot of filtering, slicing and
>>>> dicing... something to consider.
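A "secondary index" in the sense used here maps each field value back to the rows that contain it, so a predicate becomes a lookup instead of a scan. The in-memory sketch below only illustrates that idea; it is not how Lucene, Solr or ElasticSearch actually store their indices, and the class name and methods are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Sketch of a secondary index: for each (field, value) pair, the sorted set of row keys.
// Assumption: an in-memory map stands in for a real index structure or index table.
final class SecondaryIndex {

    private final Map<String, Map<String, SortedSet<String>>> index = new HashMap<>();

    // Register every field/value of a row under its row key.
    void add(String rowKey, Map<String, String> fields) {
        for (Map.Entry<String, String> e : fields.entrySet()) {
            index.computeIfAbsent(e.getKey(), k -> new HashMap<>())
                 .computeIfAbsent(e.getValue(), v -> new TreeSet<>())
                 .add(rowKey);
        }
    }

    // Direct lookup of matching rows instead of scanning and filtering every row.
    SortedSet<String> lookup(String field, String value) {
        SortedSet<String> rows = index.getOrDefault(field, Collections.emptyMap())
                                      .getOrDefault(value, Collections.emptySortedSet());
        return Collections.unmodifiableSortedSet(rows);
    }

    // AND of two equality predicates = intersection of their row-key sets.
    SortedSet<String> and(String f1, String v1, String f2, String v2) {
        SortedSet<String> result = new TreeSet<>(lookup(f1, v1));
        result.retainAll(lookup(f2, v2));
        return result;
    }
}
```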
>>>> 
>>>> Otis
>>>> --
>>>> Solr & ElasticSearch Support -- http://sematext.com/
>>>> Performance Monitoring -- http://sematext.com/spm
>>>> 
>>>> 
>>>> 
>>>> On Fri, Jun 28, 2013 at 5:24 AM, Kristoffer Sjögren <[email protected]> wrote:
>>>>> Interesting. I'm actually building something similar.
>>>>> 
>>>>> A full-blown SQL implementation is a bit overkill for my particular use
>>>>> case, and the query API is the final piece of the puzzle. But I'll
>>>>> definitely have a look for some inspiration.
>>>>> 
>>>>> Thanks!
>>>>> 
>>>>> 
>>>>> 
>>>>> On Fri, Jun 28, 2013 at 3:55 AM, James Taylor <[email protected]> wrote:
>>>>> 
>>>>>> Hi Kristoffer,
>>>>>> Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)?
>>>>>> You could model your schema much like an O/R mapper and issue SQL
>>>>>> queries through Phoenix for your filtering.
>>>>>> 
>>>>>> James
>>>>>> @JamesPlusPlus
>>>>>> http://phoenix-hbase.blogspot.com
>>>>>> 
>>>>>> On Jun 27, 2013, at 4:39 PM, "Kristoffer Sjögren" <[email protected]> wrote:
>>>>>> 
>>>>>>> Thanks for your help Mike. Much appreciated.
>>>>>>> 
>>>>>>> I don't store rows/columns in JSON format. The schema is exactly that
>>>>>>> of a specific Java class, where the row key is a unique object
>>>>>>> identifier with the class type encoded into it. Columns are the field
>>>>>>> names of the class and the values are those of the object instance.
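The row layout described here (row key = unique object id with the class type encoded into it, one column per field name, cell value = the field's value on the instance) can be sketched with reflection, which also fits the constraint that the schema is only discovered at runtime. The `<class>:<id>` key format, the `String.valueOf` value encoding and the `ExampleConfig` class are hypothetical choices for illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

// Sketch: map an object to (row key, columns) without a hard-coded schema.
// Assumptions: "<class>:<id>" key format and String.valueOf encoding are illustrative only.
final class ObjectRowMapper {

    // Hypothetical example type standing in for a configuration class.
    static final class ExampleConfig {
        String env = "prod";
        int port = 8080;
    }

    // Row key = class type + unique object id.
    static String rowKey(Object obj, String id) {
        return obj.getClass().getName() + ":" + id;
    }

    // One column per declared instance field, discovered via reflection at runtime.
    static Map<String, String> columns(Object obj) {
        Map<String, String> cols = new TreeMap<>();
        for (Field f : obj.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue; // skip class-level and compiler-generated fields
            }
            f.setAccessible(true);
            try {
                cols.put(f.getName(), String.valueOf(f.get(obj)));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return cols;
    }
}
```

Encoding every value as a string keeps the sketch short but loses type information; a byte-level filter on such columns would compare strings, not numbers.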
>>>>>>> 
>>>>>>> I did think about coprocessors, but the schema is discovered at
>>>>>>> runtime and I can't hard-code it.
>>>>>>> 
>>>>>>> However, I still believe that filters might work. I had a look at
>>>>>>> SingleColumnValueFilter, and this filter is able to target specific
>>>>>>> column qualifiers with specific WritableByteArrayComparables.
>>>>>>> 
>>>>>>> But list comparators are still missing... So I guess the only way is
>>>>>>> to write these comparators?
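The missing list comparator essentially reduces to a containment check over an encoded list stored in one cell. The sketch below shows only that logic. The list encoding (an element count, then length-prefixed UTF-8 elements) is an assumption; the class deliberately does not extend WritableByteArrayComparable or any other HBase type; and returning 0 for a match assumes it would be used with an EQUAL comparison. A real comparator would also need the serialization HBase requires to send it to the region servers.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Sketch of "list contains element" comparison logic for a single cell value.
// Assumptions: hypothetical list encoding; not an HBase comparator subclass.
final class ListContainsComparator {

    private final byte[] wanted;

    ListContainsComparator(String wanted) {
        this.wanted = wanted.getBytes(StandardCharsets.UTF_8);
    }

    // Encode a list as: int count, then for each element an int length plus UTF-8 bytes.
    static byte[] encodeList(List<String> items) {
        byte[][] parts = new byte[items.size()][];
        int size = 4;
        for (int i = 0; i < parts.length; i++) {
            parts[i] = items.get(i).getBytes(StandardCharsets.UTF_8);
            size += 4 + parts[i].length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size).putInt(parts.length);
        for (byte[] p : parts) {
            buf.putInt(p.length).put(p);
        }
        return buf.array();
    }

    // Returns 0 if the encoded list in value[offset, offset + length) contains the
    // wanted element, 1 otherwise (0 = "match", as an EQUAL comparison would expect).
    int compareTo(byte[] value, int offset, int length) {
        ByteBuffer buf = ByteBuffer.wrap(value, offset, length);
        int count = buf.getInt();
        for (int i = 0; i < count; i++) {
            int len = buf.getInt();
            int start = buf.position(); // absolute index into value
            if (Arrays.equals(value, start, start + len, wanted, 0, wanted.length)) {
                return 0;
            }
            buf.position(start + len); // skip to the next element
        }
        return 1;
    }
}
```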
>>>>>>> 
>>>>>>> Do you follow my reasoning? Will it work?
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, Jun 28, 2013 at 12:58 AM, Michael Segel
>>>>>>> <
