Marcus Herou wrote:
> Hi.
> Anyone tried to implement HBase as storage for:

Not yet. We are waiting for HBase to reach a certain level of stability
and efficiency.

> * CrawlDB

Yes.

> * LinkDB

Yes.

> * Fetched and parsed URL data

I don't think so, for performance reasons - the page storage needs to
offer high-performance search and retrieve operations, and I don't think
HBase is able to provide this level of performance. The current segment
format (or the future shard format) is for now the best option.

> It would certainly be cool, I think, to be able to search in all three of
> these DBs. Currently it is a little bit hard to use the crawled data
> without actually indexing it.

That's true - on the other hand, the current set of features is
optimized (read: minimized ;) ) to support the primary functionality,
and to do it well.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com