On 07/05/13 23:51, Arash Shahkar wrote:
> Hi,
> Is it possible / advisable to use TDB directly over HDFS in order to store
> a large dataset? By directly, I mean specifying the directory address in
> the form of an HDFS address like hdfs://...
TDB does not support hdfs:// locations.
It could be made to work: code changes would be needed, but they would
be limited to new implementations of the disk abstractions used
internally. Obviously, memory-mapped files will not work, but direct
mode (as used on 32-bit) with large local caches would get somewhere.
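To make the "direct mode with large local caches" idea concrete, here is a minimal sketch of a block store with an LRU cache in front of a slow backend. All names here (BlockStore, SlowRemoteStore, CachedBlockStore) are hypothetical and are not TDB's internal interfaces; the in-memory "remote" store only stands in for something HDFS-backed so that the number of remote reads can be counted.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical disk abstraction; NOT TDB's actual internal API.
interface BlockStore {
    byte[] read(long blockId);
    void write(long blockId, byte[] data);
}

// Stand-in for a high-latency remote store (assumption: something HDFS-backed).
// It is in-memory here and only counts reads, so the cache effect is visible.
class SlowRemoteStore implements BlockStore {
    private final Map<Long, byte[]> blocks = new HashMap<>();
    int remoteReads = 0;

    public byte[] read(long blockId) {
        remoteReads++;
        byte[] b = blocks.get(blockId);
        return b == null ? null : b.clone();
    }

    public void write(long blockId, byte[] data) {
        blocks.put(blockId, data.clone());
    }
}

// Write-through LRU cache in front of any BlockStore.
class CachedBlockStore implements BlockStore {
    private final BlockStore backing;
    private final Map<Long, byte[]> cache;

    CachedBlockStore(BlockStore backing, final int capacity) {
        this.backing = backing;
        // accessOrder=true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > capacity;
            }
        };
    }

    public byte[] read(long blockId) {
        byte[] b = cache.get(blockId);
        if (b == null) {
            // Cache miss: pay the remote round trip once, then keep the block.
            b = backing.read(blockId);
            if (b != null) cache.put(blockId, b);
        }
        return b == null ? null : b.clone();
    }

    public void write(long blockId, byte[] data) {
        // Write through to the backing store, then refresh the cached copy.
        backing.write(blockId, data);
        cache.put(blockId, data.clone());
    }
}

class CachedBlockStoreDemo {
    public static void main(String[] args) {
        SlowRemoteStore remote = new SlowRemoteStore();
        remote.write(1L, new byte[] {42});
        BlockStore store = new CachedBlockStore(remote, 1024);
        for (int i = 0; i < 1000; i++) store.read(1L);
        System.out.println("remote reads: " + remote.remoteReads);
    }
}
```

Repeated reads of a hot block only reach the backend on the first miss. A cache like this helps only as far as the working set fits locally; cold random reads still pay the full remote latency.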
As a general observation, HDFS is not designed for the access patterns
of a general-purpose database, which are smallish, random reads and
writes. For those, latency matters.
HDFS is designed around streaming access (i.e. high throughput) to large
amounts of the stored data; latency is not a design consideration.
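A back-of-envelope sketch of why the access pattern matters. The figures used in main (8 KiB blocks, 10 ms per remote random read, 100 MiB/s streaming) are assumptions chosen purely for illustration, not measurements of HDFS or TDB, and the class and method names are hypothetical.

```java
class AccessCostSketch {
    // Seconds to fetch totalBytes as independent random block reads, each
    // paying one full round trip. Transfer time per block is ignored.
    static double randomReadSeconds(long totalBytes, int blockBytes, double latencySeconds) {
        long blocks = (totalBytes + blockBytes - 1) / blockBytes; // round up
        return blocks * latencySeconds;
    }

    // Seconds to fetch totalBytes as one sequential stream at a fixed rate.
    static double streamingSeconds(long totalBytes, double bytesPerSecond) {
        return totalBytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;
        // Assumed, illustrative figures only.
        double random = randomReadSeconds(oneGiB, 8 * 1024, 0.010);
        double stream = streamingSeconds(oneGiB, 100.0 * (1 << 20));
        System.out.printf("random block reads: %.2f s%n", random);
        System.out.printf("streaming:          %.2f s%n", stream);
    }
}
```

Under these assumptions, per-request latency dominates the random-read case, which is the pattern a database produces and the one HDFS is not optimised for.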
Andy