Hi, thanks for the answer. I will not use HBase for free-text searching; Lucene is far more mature and scalable for that.
What I want to use HBase for is a more familiar and cleaner way of storing data than large sequential files spread out on HDFS.

Typical use-cases:
* Search with Lucene in some way: Solr, NutchBean etc.
* Get the actual data from HBase or some other clustered db, based on a primary key which is stored in Lucene.
* Applications get an easier integration point than using CrawlDb.get(...) or dump.
* This way we don't store the same data in two (or more) places, wasting disk.

Were the "yes" answers in your mail referring to actual implementations?

Kindly
//Marcus

On Tue, Jun 17, 2008 at 9:07 PM, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:

> Marcus Herou wrote:
>
>> Hi.
>>
>> Anyone tried to implement HBase as storage for:
>
> Not yet. We are waiting for HBase to reach certain stability and
> efficiency.
>
>> * CrawlDB
>
> Yes.
>
>> * LinkDB
>
> Yes.
>
>> * Fetched and parsed url data
>
> I don't think so, for performance reasons - the page storage needs to offer
> high-performance search and retrieve operations, and I don't think HBase is
> able to provide this level of performance. The current segment format (or
> the future shard format) is for now the best option.
>
>> It would certainly be cool I think to be able to search in all these three
>> db's. Currently it is a little bit hard to use the data crawled without
>> actually indexing it.
>
> That's true - on the other hand, the current set of features is optimized
> (read: minimized ;) ) to support the primary functionality, and to do it
> well.
>
> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com

--
Marcus Herou
CTO and co-founder Tailsweep AB
+46702561312
[EMAIL PROTECTED]
http://www.tailsweep.com/
http://blogg.tailsweep.com/
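
P.S. To make the search-then-fetch use-case concrete, here is a minimal sketch in Python. It assumes plain in-memory dicts as stand-ins for the Lucene index and for the HBase table; the names KeyedStore, TermIndex and search_and_fetch are hypothetical and are not part of Nutch, Lucene or HBase. The only point is the shape of the flow: the index holds terms and primary keys, and the full data lives once, in the keyed store.

```python
from collections import defaultdict


class KeyedStore:
    """Hypothetical stand-in for HBase or another clustered db: one row per key."""

    def __init__(self):
        self._rows = {}

    def put(self, key, row):
        self._rows[key] = row

    def get(self, key):
        # Returns None when the key is unknown.
        return self._rows.get(key)


class TermIndex:
    """Hypothetical stand-in for the Lucene index: terms and primary keys only."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, key, text):
        # Naive whitespace tokenisation, good enough for the illustration.
        for term in text.lower().split():
            self._postings[term].add(key)

    def search(self, term):
        return sorted(self._postings.get(term.lower(), set()))


def search_and_fetch(index, store, term):
    """Search the index first, then fetch each hit's full row by primary key."""
    return [(key, store.get(key)) for key in index.search(term)]


store = KeyedStore()
index = TermIndex()
pages = {
    "http://example.com/a": "HBase as storage for crawl data",
    "http://example.com/b": "Lucene handles free-text search",
}
for url, content in pages.items():
    store.put(url, {"content": content})  # the data is stored once, here
    index.add(url, content)               # the index keeps only terms and the key

hits = search_and_fetch(index, store, "storage")
```

Here the URL plays the role of the primary key. In a real setup the key would be a stored field in the Lucene document and the lookup would be a get against the clustered db, but that wiring is left out on purpose.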
