On 07/05/13 23:51, Arash Shahkar wrote:
Hi,

Is it possible / advisable to use TDB directly over HDFS in order to store
a large dataset? By directly, I mean specifying the directory address in
the form of an HDFS address like hdfs://...


TDB does not support hdfs:// locations.

It could work, but code changes would be needed. They would be limited to new implementations of the disk abstractions that TDB uses internally. Obviously, memory-mapped files will not work over HDFS, but direct mode (as used on 32-bit systems) with large local caches would get somewhere.
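
To illustrate the "direct mode with large local caches" idea, here is a rough sketch of a local LRU block cache sitting in front of a slow remote block store. All names here (BlockStore, CountingStore, CachedBlockStore) are hypothetical and are not TDB's actual internal interfaces; the remote store is an in-memory fake that only counts how many reads reach it.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class BlockCacheSketch {

    /** Hypothetical block-level storage abstraction (NOT TDB's real interface). */
    public interface BlockStore {
        byte[] read(long blockId);
        void write(long blockId, byte[] data);
    }

    /** In-memory stand-in for a slow remote store; counts reads that reach it. */
    public static class CountingStore implements BlockStore {
        private final Map<Long, byte[]> blocks = new HashMap<>();
        public int remoteReads = 0;

        @Override
        public byte[] read(long blockId) {
            remoteReads++;
            byte[] b = blocks.get(blockId);
            return b == null ? new byte[0] : b;
        }

        @Override
        public void write(long blockId, byte[] data) {
            blocks.put(blockId, data.clone());
        }
    }

    /** Write-through LRU cache in front of another BlockStore. */
    public static class CachedBlockStore implements BlockStore {
        private final BlockStore backing;
        private final Map<Long, byte[]> cache;

        public CachedBlockStore(BlockStore backing, final int maxBlocks) {
            this.backing = backing;
            // accessOrder = true keeps entries ordered least-recently-used first,
            // so removeEldestEntry evicts the LRU block once the cache is full.
            this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                    return size() > maxBlocks;
                }
            };
        }

        @Override
        public byte[] read(long blockId) {
            byte[] hit = cache.get(blockId);
            if (hit != null) {
                return hit;                       // served locally, no remote round trip
            }
            byte[] data = backing.read(blockId);  // cache miss: pay the remote latency
            cache.put(blockId, data);
            return data;
        }

        @Override
        public void write(long blockId, byte[] data) {
            backing.write(blockId, data);         // write-through to the remote store
            cache.put(blockId, data.clone());
        }
    }

    public static void main(String[] args) {
        CountingStore remote = new CountingStore();
        remote.write(1L, new byte[] {1});
        BlockStore store = new CachedBlockStore(new CountingStore(), 1024);
        store.write(7L, new byte[] {42});
        store.read(7L);
        System.out.println("sketch ran; remote reads on separate store: " + remote.remoteReads);
    }
}
```

Repeated reads of a hot block only reach the remote store once, which is the point of the cache. Every miss still pays the full remote latency, so this only helps when the working set fits locally.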

As a general observation, HDFS is not designed for the access patterns of a general-purpose database, which are smallish, random reads and writes. Latency matters.

HDFS is designed around streaming access (= high throughput) to large amounts of the stored data (and latency is not a consideration).

        Andy
