Hi, thanks for the answer.

I will not use HBase for free-text searching; for that, Lucene is far more
mature and scalable.

What I want to use HBase for is a somewhat more familiar and clean concept
of storing data than large sequential files spread out on HDFS.
Typical use-cases:

* Search with Lucene in some way: Solr, NutchBean etc.
* Get the actual data from HBase or some other clustered db based on a
primary key which is stored in Lucene.
* Applications get an easier integration point than using CrawlDb.get(...)
or dump.
* This way we don't store the same data in two (or more) places,
wasting disk.
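
To make the flow above concrete, here is a rough sketch in Java. It is only an
illustration: plain in-memory maps stand in for the Lucene index and for the
HBase table, and the class and method names (KeyedLookupSketch, add, search,
get) are made up for this mail; they are not Nutch, Lucene or HBase APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: the index maps terms to primary keys only,
// while the full record is stored once in a keyed store.
class KeyedLookupSketch {
    // Stand-in for the Lucene index: term -> primary keys of matching documents.
    private final Map<String, List<String>> index = new HashMap<>();
    // Stand-in for the HBase table (or another clustered db): primary key -> content.
    private final Map<String, String> store = new HashMap<>();

    // Store the content once and index its terms under the primary key.
    void add(String primaryKey, String content) {
        store.put(primaryKey, content);
        for (String term : content.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) {
                continue; // split() can yield a leading empty token
            }
            List<String> keys = index.computeIfAbsent(term, t -> new ArrayList<>());
            if (!keys.contains(primaryKey)) {
                keys.add(primaryKey);
            }
        }
    }

    // Step 1: the search returns primary keys, not the data itself.
    List<String> search(String term) {
        List<String> keys = index.getOrDefault(
                term.toLowerCase(Locale.ROOT), Collections.emptyList());
        return new ArrayList<>(keys); // defensive copy
    }

    // Step 2: fetch the actual data by primary key from the single store.
    Optional<String> get(String primaryKey) {
        return Optional.ofNullable(store.get(primaryKey));
    }
}
```

An application would call search("lucene") to get the matching keys and then
get(key) for each hit, so the content itself lives in one place only.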

Were the "Yes" answers in your mail referring to actual implementations?

Kindly

//Marcus






On Tue, Jun 17, 2008 at 9:07 PM, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:

> Marcus Herou wrote:
>
>> Hi.
>>
>> Anyone tried to implement HBase as storage for:
>>
>
>
> Not yet. We are waiting for HBase to reach certain stability and
> efficiency.
>
>
>> * CrawlDB
>>
>
> Yes.
>
>> * LinkDB
>>
>
> Yes.
>
>> * Fetched and parsed url data
>>
>
> I don't think so, for performance reasons - the page storage needs to offer
> high-performance search and retrieve operations, and I don't think HBase is
> able to provide this level of performance. The current segment format (or
> the future shard format) is for now the best option.
>
>
>> It would certainly be cool I think to be able to search in all these three
>> db's. Currently it is a little bit hard to use the data crawled without
>> actually indexing it.
>>
>
> That's true - on the other hand, the current set of features is optimized
> (read: minimized ;) ) to support the primary functionality, and to do it
> well.
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>


-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
[EMAIL PROTECTED]
http://www.tailsweep.com/
http://blogg.tailsweep.com/
