Hi David, I am currently working with an HBase table that has 100 columns. My requirement is to perform real-time search on HBase using row keys and these columns (all within a single column family in the schema). A typical query is SQL-like, combining these columns with AND, OR, and NOT operators. I have ruled out batch processing such as Hive. My question is:
- HBase + Solr will probably give better query speed, but you need to maintain both clusters, push data from HBase to Solr, and perhaps update the Solr index fairly frequently.
- Using HBase only, where the search needs to run against all of these columns, you need to build a secondary index for each column. If the master table has 1 million rows, you end up with 100 million index rows plus the 1 million rows of the original master table, which will use quite a lot of space, but I suppose the search can be done pretty fast as well?

Not sure what the best approach is. Any suggestions?

Thanks
-Andrew

> From: [email protected]
> To: [email protected]
> Date: Thu, 29 Sep 2011 08:38:12 -0700
> Subject: RE: Hbase - Solr Integration
>
> It sounds like you should investigate the Lily Project. They have already
> done a lot of work to integrate Solr and HBase into a single solution. I did
> something similar before they released their project -- I like my use of
> dynamic schemas, but their overall approach is probably more solid. In
> particular they have given careful consideration as to what to do with large
> objects, and how to integrate them into the system. And most importantly,
> their project is open.
>
> There was also some talk earlier of integrating HBase and Solr -- you might
> want to search the list for some of Jason's posts. I think that is a work in
> progress still.
>
> Otherwise you will have to roll your own solution. It is actually not too
> difficult to set up a system to publish HBase contents to Solr. The
> difficulty is in maintaining a consistent view of the data between the two.
> I believe Lily uses queues to keep updates in sync. If you can tolerate some
> delay, you could simply update your indexes on a regular basis, or set up
> your application to populate HBase and Solr simultaneously. The biggest
> challenge is resharding. HBase will automatically split regions when they
> become too large.
> Solr doesn't have that capability yet, so you will have to
> manage the shards yourself.
>
> Another approach is to look at Elastic Search. That is a Lucene based system
> that does do automatic sharding.
>
> Direct search on HBase requires either a clever key encoding (like OpenTSDB),
> and/or multiple copies of the data to imitate secondary indexes.
>
> Dave
>
> -----Original Message-----
> From: Stuti Awasthi [mailto:[email protected]]
> Sent: Thursday, September 29, 2011 2:52 AM
> To: [email protected]
> Subject: Hbase - Solr Integration
>
> Hi Friends,
>
> I am storing my data in Hbase. I want to do search using Solr. I can't find
> much documentation about the integration. Is there any documentation to
> integrate these two.
>
> Please Suggest
>
> Regards,
> Stuti Awasthi
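
For the HBase-only option raised at the top of this thread (one secondary index entry per column value, queried with AND, OR, and NOT), the idea can be sketched roughly as follows. This is an in-memory Python simulation, not HBase client code: a sorted list stands in for an index table ordered by row key, and the key layout (column, value, master row key), the separator byte, and all function names are assumptions made for illustration only.

```python
from bisect import bisect_left

SEP = "\x00"  # assumed separator; must not occur in column names or values


def index_key(column, value, rowkey):
    """Compose an index row key: column, then value, then the master row key."""
    return SEP.join((column, value, rowkey))


def build_index(master):
    """Emit one index row per (row, column) cell, sorted like HBase row keys."""
    return sorted(
        index_key(col, val, rowkey)
        for rowkey, cols in master.items()
        for col, val in cols.items()
    )


def lookup(index, column, value):
    """Prefix scan: return the set of master row keys where column == value."""
    prefix = column + SEP + value + SEP
    hits = set()
    i = bisect_left(index, prefix)  # jump to the first key >= prefix
    while i < len(index) and index[i].startswith(prefix):
        hits.add(index[i][len(prefix):])
        i += 1
    return hits


# Tiny hypothetical master table: row key -> {column: value}
master = {
    "r1": {"color": "red", "size": "L"},
    "r2": {"color": "red", "size": "M"},
    "r3": {"color": "blue", "size": "L"},
}
index = build_index(master)

red = lookup(index, "color", "red")
large = lookup(index, "size", "L")

red_and_large = red & large   # color = 'red' AND size = 'L'
red_or_large = red | large    # color = 'red' OR size = 'L'
red_not_large = red - large   # color = 'red' AND NOT size = 'L'

# Storage estimate from the thread: 1 million master rows x 100 indexed columns
index_rows_for_thread_example = 1_000_000 * 100
```

Each predicate becomes one prefix scan over the index, and the boolean operators reduce to set operations on the returned row keys. The cost is the one noted in the thread: the index holds one row per cell, so 1 million master rows with 100 indexed columns give 100 million index rows on top of the master table.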
