On 24/09/13 19:36, Zhiyun Qian wrote:
Hi all,
Currently when I want to update an existing TDB, I simply open it using
"memory-mapped file" mode (I'm using 64-bit) and then call
"model.createResource()" repeatedly which will get reflected onto the TDB
as the program runs.
The bulk loaders are faster. They work directly on the index files and
order things efficiently for bulk loads. They only work on an empty
dataset; otherwise the Java-based one falls back to incremental update.
I'm quite curious about the details behind the scenes.
1. According to my understanding: when I open the existing TDB, it does not
load any data from disk just yet. It only loads on demand, whenever an
existing node needs to be referenced (for instance, let's say the existing
TDB has the triple "A p B" and I'm trying to add "A p C". This requires A
to be loaded in memory first). In this case, if I'm not referencing any
existing nodes, there's no need to load anything from the existing TDB at
all.
It has to have A, not "A p B" - there is a separate node table - and you
are accessing an existing node (two, in fact: A and p) when adding
"A p C". The node table has a big cache in front of it.
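
As a rough illustration of what "a cache in front of the node table"
means, here is a minimal sketch in plain Java. It is not TDB's actual
code: the class NodeCacheSketch, the method idFor and the diskLookup
function are hypothetical names standing in for the real node table,
and the LRU policy is just an assumption made for the example.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: an LRU cache sitting in front of a slower node-to-id lookup.
public class NodeCacheSketch {
    private final Map<String, Long> cache;
    private final Function<String, Long> diskLookup; // stands in for the on-disk node table
    public int diskReads = 0;

    public NodeCacheSketch(int capacity, Function<String, Long> diskLookup) {
        this.diskLookup = diskLookup;
        // accessOrder=true makes LinkedHashMap iterate least-recently-used first
        this.cache = new LinkedHashMap<String, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
                return size() > capacity; // evict once the cache is over capacity
            }
        };
    }

    // Returns the id for a node, touching the slow lookup only on a cache miss.
    public long idFor(String node) {
        Long id = cache.get(node);
        if (id == null) {
            diskReads++;
            id = diskLookup.apply(node);
            cache.put(node, id);
        }
        return id;
    }

    public static void main(String[] args) {
        // Pretend the existing dataset already holds the triple "A p B".
        Map<String, Long> table = new HashMap<>();
        table.put("A", 1L);
        table.put("p", 2L);
        table.put("B", 3L);
        NodeCacheSketch nodes = new NodeCacheSketch(100,
                n -> table.computeIfAbsent(n, k -> (long) table.size() + 1));

        // Adding "A p C" needs ids for A and p (existing) and C (new).
        nodes.idFor("A");
        nodes.idFor("p");
        nodes.idFor("C");
        nodes.idFor("A"); // repeated reference is served from the cache
        System.out.println("slow lookups: " + nodes.diskReads);
    }
}
```

The point is only that adding "A p C" resolves the individual nodes,
not the triple "A p B", and that repeated references to the same node
do not go back to disk.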
2. Even though the TDB is loaded in "memory-mapped file", does the program
really have to periodically write to disk (assuming there's still enough
physical memory)? Can the program somehow write only when it runs out of
physical memory? Additionally, after writing to disk, can the
corresponding data in memory be freed (or maybe kept as a cache of a
much smaller set)?
The file is written to disk in parts and it's under OS control, not the
program. Memory mapped files are like swap. The OS manages what is
in-memory and what is not.
A memory-mapped file appears as a very large virtual memory area, and it
is accessed as a very large area of bytes (ByteBuffer). The OS controls
what is really in-memory and what is left on disk. Writing to a mmap
file does not cause the OS to write it out immediately. The OS writes
dirty pages only when it wants to free up real memory for some other
use, like another part of the file that is now accessed.
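
To make that concrete, here is a small standalone sketch using only the
standard java.nio API (no Jena involved). The class and method names
(MmapSketch, writeThenRead) and the 4096-byte mapping size are
assumptions made for the example, not anything TDB does internally.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: writing to and reading from a memory-mapped file.
public class MmapSketch {
    // Writes a long through one mapping, then reads it back through a fresh mapping.
    public static long writeThenRead(Path file, long value) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Mapping in READ_WRITE mode grows the file to 4096 bytes if it is shorter.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            // This only dirties a page in memory; the OS decides when it reaches disk.
            buf.putLong(0, value);
            // Optional: explicitly ask for the dirty pages to be written out now.
            buf.force();
        }
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, 4096);
            return buf.getLong(0);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-sketch", ".dat");
        file.toFile().deleteOnExit();
        System.out.println(writeThenRead(file, 42L));
    }
}
```

Without the force() call the program has no say in when the write hits
the disk, and there is no call to release the in-memory pages either;
both are left to the OS.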
Andy
Any comments are welcome. Thanks!
-Zhiyun