Two cents below...

On Mon, Sep 10, 2012 at 7:24 AM, Shengjie Min <[email protected]> wrote:
> In my case, I have all the log events stored in HDFS/hbase in this format:
>
> timestamp | priority | category | message body
>
> Given I have only 4 fields here, that limits my queries to only against
> these four. I am thinking about more advanced search like full text search
> the message body. well, mainly substring query against message body.
>
> 1. Has anybody tried to use Hbase SubstringComparator? How does it perform,
>    with reasonable huge amount of data, can it still provide us the real
>    time response capability?

Probably not, if "huge" is sufficiently large. HBase indexes data only by row key, so a search on any other criterion, such as a substring of the message body, requires a full scan of all the data. The comparator is applied to every row the scan reads, so response time grows with the size of the table.

> 2. In my case, does it make more sense to use a proper full text search
>    engine(lucene/solr/elasticsearch) to index the message body, does that
>    sound like a better idea?

Often yes. For big data especially, this is where ElasticSearch excels.

> would be great someone experienced can share some stories here.
>
> -Shengjie Min
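
To make the trade-off concrete, here is a rough Python sketch. It uses neither HBase nor a real search engine; the sample rows and the function names are made up for illustration. It contrasts checking every message body for a substring with looking a term up in a small inverted index, which is roughly the idea behind Lucene-style engines.

```python
from collections import defaultdict

# Hypothetical sample rows: (timestamp, priority, category, message body)
LOGS = [
    (1347258000, "ERROR", "db", "connection timeout while writing batch"),
    (1347258001, "INFO", "web", "request served in 12 ms"),
    (1347258002, "WARN", "db", "slow query detected on table events"),
    (1347258003, "ERROR", "web", "upstream timeout talking to auth service"),
]


def scan_for_substring(rows, needle):
    """Full scan: inspect every message body, so cost grows with the row count."""
    return [i for i, row in enumerate(rows) if needle in row[3]]


def build_inverted_index(rows):
    """Map each lowercased, whitespace-separated token to the rows containing it."""
    index = defaultdict(set)
    for i, row in enumerate(rows):
        for token in row[3].lower().split():
            index[token].add(i)
    return index


def lookup_term(index, term):
    """Term lookup: a single dictionary access, however many rows were indexed."""
    return sorted(index.get(term.lower(), set()))


# The index is built once, up front; queries then avoid touching the rows.
INDEX = build_inverted_index(LOGS)
```

One caveat: this toy index matches whole tokens only. Searching for "timeout" finds the same rows either way, but "time" matches only in the scan. If you really need arbitrary substring matches from a search engine, you would probably have to look at something like n-gram analysis or wildcard queries, which have their own costs.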
