HBasene is dead. Watch HBASE-3529.

Otis

We're hiring HBase / Hadoop / Hive / Mahout engineers with interest in Big Data Mining and Analytics
http://blog.sematext.com/2011/04/18/hiring-data-mining-analytics-machine-learning-hackers/

From: "Hiller, Dean x66079" <[email protected]>
To: "[email protected]" <[email protected]>
>Sent: Friday, June 17, 2011 4:21 PM
>Subject: RE: What's the best approach to search in HBase?
>
>What about using Hbasene....is it pretty good....looks just like a distributed
>Lucene and the same api and everything?
>
>Later,
>Dean
>
>-----Original Message-----
>From: Mark Kerzner [mailto:[email protected]]
>Sent: Wednesday, June 15, 2011 10:10 PM
>To: [email protected]
>Subject: Re: What's the best approach to search in HBase?
>
>Thank you, everybody. I summarized your advice here,
>http://shmsoft.blogspot.com/2011/06/search-in-ediscovery.html, because I
>need it for my open source eDiscovery, and now just need to try it all :)
>
>Sincerely,
>Mark
>
>On Mon, Jun 6, 2011 at 11:18 AM, Buttler, David <[email protected]> wrote:
>
>> I store over 500M documents in HBase, and index using Solr with dynamic
>> fields. This gives you tremendous flexibility to do the type of queries you
>> are looking for -- and to make them simple and intuitive via a faceted
>> interface.
>>
>> However, there was quite a bit of software that we had to write to get
>> things going, and I can neither release all of it open source, or support
>> other people using it. If I had to start again, I would seriously look at
>> solutions like elastic search and lily.
>>
>> Dave
>>
>> -----Original Message-----
>> From: Mark Kerzner [mailto:[email protected]]
>> Sent: Friday, June 03, 2011 5:57 PM
>> To: HBase Discussion Group
>> Subject: What's the best approach to search in HBase?
>>
>> Hi,
>>
>> I need to store, say, 10M-100M documents, with each document having say 100
>> fields, like author, creation date, access date, etc., and then I want to
>> ask questions like
>>
>> give me all documents whose author is like abc**, and creation date any time
>> in 2010 and access date in 2010-2011, and so on, perhaps 10-20 conditions,
>> matching a list of some keywords.
>>
>> What's best, Lucene, Katta, HBase CF with secondary indices, or plain scan
>> and compare of every record?
>>
>> Thanks a bunch!
>>
>> Mark
>>
