Marcus Herou wrote:
Hi.

Anyone tried to implement HBase as storage for:


Not yet. We are waiting for HBase to reach a certain level of stability and efficiency.


* CrawlDB

Yes.

* LinkDB

Yes.

* Fetched and parsed URL data

I don't think so, for performance reasons: the page storage needs to offer high-performance search and retrieve operations, and I don't think HBase can provide that level of performance. For now, the current segment format (or the future shard format) is the best option.


I think it would certainly be cool to be able to search all three of
these DBs. Currently it is a little hard to use the crawled data without
actually indexing it.

That's true. On the other hand, the current set of features is optimized (read: minimized ;) ) to support the primary functionality, and to do it well.

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
