Hi David,

I am currently working with an HBase table that has 100 columns. My requirement
is to perform real-time search on HBase using the row keys and these columns
(all within a single column family in the schema). A typical query is SQL-like,
combining these columns with AND, OR, and NOT operators. I have ruled out batch
processing such as Hive. The two options I see are:

- HBase + Solr: this will probably give better query speed, but you need to
maintain both clusters, push data from HBase to Solr, and probably update the
Solr index fairly frequently.
- HBase only: since search has to cover all of these columns, you need to build
a secondary index for each column. If the master table has 1 million rows, you
end up with 100 million index rows on top of the 1 million rows of the master
table, which uses quite a lot of space. I suppose search could be fairly fast
with this approach as well?
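To make the HBase-only option concrete, here is a minimal in-memory sketch, not the HBase client API. Assumptions: a Java TreeMap stands in for one HBase index table (HBase also keeps row keys sorted), the composite key layout column|value|masterRowKey and all class and method names are hypothetical, and values are assumed never to contain the "|" separator. Each master row with 100 indexed columns produces 100 index rows, which is where 1 million master rows become 100 million index rows. An equality lookup is a prefix range scan (in real HBase, a Scan bounded by start and stop rows), AND is a set intersection, OR a union, and NOT a difference against all master row keys.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

class SecondaryIndexSketch {
    // Hypothetical stand-in for an HBase index table: keys are kept sorted.
    // Key layout (assumption): column + "|" + value + "|" + masterRowKey
    private final NavigableMap<String, String> indexTable = new TreeMap<>();
    private final Set<String> allRowKeys = new HashSet<>();

    // Writing one master row adds one index row per indexed column.
    void put(String rowKey, Map<String, String> columns) {
        allRowKeys.add(rowKey);
        for (Map.Entry<String, String> e : columns.entrySet()) {
            indexTable.put(e.getKey() + "|" + e.getValue() + "|" + rowKey, rowKey);
        }
    }

    // Equality lookup as a prefix range scan over the sorted index keys.
    // '\uffff' sorts after ordinary characters, so it bounds the prefix range.
    Set<String> eq(String column, String value) {
        String prefix = column + "|" + value + "|";
        return new HashSet<>(indexTable.subMap(prefix, true, prefix + '\uffff', false).values());
    }

    static Set<String> and(Set<String> a, Set<String> b) {
        Set<String> r = new HashSet<>(a);
        r.retainAll(b); // intersection
        return r;
    }

    static Set<String> or(Set<String> a, Set<String> b) {
        Set<String> r = new HashSet<>(a);
        r.addAll(b); // union
        return r;
    }

    // NOT is relative to every master row, so it needs the full row key set.
    Set<String> not(Set<String> a) {
        Set<String> r = new HashSet<>(allRowKeys);
        r.removeAll(a); // difference
        return r;
    }

    int indexRowCount() {
        return indexTable.size();
    }

    public static void main(String[] args) {
        SecondaryIndexSketch idx = new SecondaryIndexSketch();
        idx.put("row1", Map.of("city", "Paris", "status", "active"));
        idx.put("row2", Map.of("city", "Paris", "status", "closed"));
        idx.put("row3", Map.of("city", "Rome", "status", "active"));

        // city = 'Paris' AND NOT status = 'closed'
        Set<String> hits = and(idx.eq("city", "Paris"), idx.not(idx.eq("status", "closed")));
        System.out.println(hits);                // [row1]
        System.out.println(idx.indexRowCount()); // 6 (3 rows x 2 indexed columns)
    }
}
```

The sketch also shows the hidden costs: every write touches one index row per column, and NOT needs the set of all row keys. Since HBase only guarantees atomicity within a single row, a real master table and its index tables can drift apart unless the application handles that.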

I'm not sure which approach is best. Any suggestions?


Thanks

-Andrew

> From: [email protected]
> To: [email protected]
> Date: Thu, 29 Sep 2011 08:38:12 -0700
> Subject: RE: Hbase - Solr Integration
> 
> It sounds like you should investigate the Lily Project.  They have already 
> done a lot of work to integrate Solr and HBase into a single solution.  I did 
> something similar before they released their project -- I like my use of 
> dynamic schemas, but their overall approach is probably more solid.  In 
> particular they have given careful consideration as to what to do with large 
> objects, and how to integrate them into the system.  And most importantly, 
> their project is open.
> 
> There was also some talk earlier of integrating HBase and Solr -- you might 
> want to search the list for some of Jason's posts.  I think that is still a 
> work in progress.
> 
> Otherwise you will have to roll your own solution.  It is actually not too 
> difficult to set up a system to publish HBase contents to Solr.  The 
> difficulty is in maintaining a consistent view of the data between the two.  
> I believe Lily uses queues to keep updates in sync.  If you can tolerate some 
> delay, you could simply update your indexes on a regular basis, or set up 
> your application to populate HBase and Solr simultaneously.  The biggest 
> challenge is resharding.  HBase will automatically split regions when they 
> become too large.  Solr doesn't have that capability yet, so you will have to 
> manage the shards yourself.
> 
> Another approach is to look at Elastic Search, a Lucene-based system that 
> does automatic sharding.
> 
> Direct search on HBase requires a clever key encoding (as in OpenTSDB) 
> and/or multiple copies of the data to imitate secondary indexes.
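A minimal sketch of the key-encoding idea, loosely modeled on OpenTSDB's practice of leading the row key with a metric identifier followed by a timestamp. The 3-byte id plus 4-byte timestamp layout and the names below are hypothetical, not OpenTSDB's actual format, and a TreeMap with an unsigned byte comparator stands in for an HBase table (HBase sorts row keys as unsigned bytes). Because the most commonly filtered field comes first and numbers are fixed-width big-endian, "metric X between t1 and t2" becomes a single contiguous range scan.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

class CompositeKeySketch {
    // Hypothetical layout: 3-byte metric id, then 4-byte timestamp in seconds.
    // Big-endian, fixed width, so unsigned byte order matches numeric order
    // (for metric ids below 2^24 and non-negative timestamps).
    static byte[] rowKey(int metricId, int timestamp) {
        ByteBuffer buf = ByteBuffer.allocate(7); // ByteBuffer is big-endian by default
        buf.put((byte) (metricId >>> 16));
        buf.put((byte) (metricId >>> 8));
        buf.put((byte) metricId);
        buf.putInt(timestamp);
        return buf.array();
    }

    public static void main(String[] args) {
        // Stand-in for an HBase table: row keys ordered as unsigned bytes.
        Comparator<byte[]> unsigned = Arrays::compareUnsigned;
        TreeMap<byte[], String> table = new TreeMap<>(unsigned);
        table.put(rowKey(1, 1000), "m1@1000");
        table.put(rowKey(1, 2000), "m1@2000");
        table.put(rowKey(2, 1500), "m2@1500");
        table.put(rowKey(1, 3000), "m1@3000");

        // Metric 1, timestamps in [1000, 3000): one range scan, start inclusive, stop exclusive.
        Map<byte[], String> hits = table.subMap(rowKey(1, 1000), true, rowKey(1, 3000), false);
        System.out.println(hits.values()); // [m1@1000, m1@2000]
    }
}
```

The trade-off is that only queries matching the key's leading fields are fast; filtering on any other column still needs a full scan or a separate copy of the data keyed differently.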
> 
> Dave
> 
> 
> 
> -----Original Message-----
> From: Stuti Awasthi [mailto:[email protected]] 
> Sent: Thursday, September 29, 2011 2:52 AM
> To: [email protected]
> Subject: Hbase - Solr Integration
> 
> Hi Friends,
> 
> I am storing my data in HBase. I want to do search using Solr. I can't find 
> much documentation about the integration. Is there any documentation on 
> integrating the two?
> 
> Please suggest.
> 
> Regards,
> Stuti Awasthi