On 26/08/16 08:59, Laurent Rucquoy wrote:
Hello Andy,

Thank you for your help.

The params I'm mainly interested in changing are those of the profile
returned by StoreParams.getSmallStoreParams() to be able to reduce the
dataset size.

That is best done when creating the dataset in the first place.

It reduces the in-memory cache footprint. It uses direct mode, which uses an in-JVM file cache, so it does not swamp the machine with memory-mapped files.

For small datasets, it also makes the reported file sizes smaller. The memory-mapped files on Linux are sparse files: space is allocated but not used. An empty dataset on disk takes 150K on Linux even though many files report a size of 8M. (Some other OSs may allocate the whole space, or they may misreport sparse files.)
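To illustrate the difference between the size a file reports and the data it holds, here is a minimal stdlib-only Java sketch. It is not TDB code: the class and method names are made up for this example. It extends a temporary file to 8M, writes a single byte, and prints the reported size. Whether the unwritten space is really left unallocated depends on the OS and filesystem.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of Jena/TDB.
public class SparseFileDemo {

    /**
     * Creates a temporary file, extends it to reservedBytes without writing
     * data, writes one byte at the start, and returns the size the file reports.
     */
    public static long reportedSizeAfterReserve(long reservedBytes) {
        try {
            Path file = Files.createTempFile("sparse-demo", ".dat");
            try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
                raf.setLength(reservedBytes); // extend the file; no content is written
                raf.seek(0);
                raf.write(1);                 // the only byte of real content
            }
            long reported = Files.size(file); // logical size, as a directory listing would show
            Files.delete(file);
            return reported;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long eightMiB = 8L * 1024 * 1024;
        System.out.println("reported size: " + reportedSizeAfterReserve(eightMiB) + " bytes");
    }
}
```

On a filesystem with sparse-file support, comparing the reported size with the blocks actually in use (for example `ls -l` against `du` on Linux) should show the kind of gap described above.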

Apart from testing a change of the fileMode from mapped to direct, I have not
done any finer tuning of the other parameters, which is why
StoreParams.getSmallStoreParams()
seems convenient for our needs.

I've another question about this case:

What effect on size will changing from the default store params to the small
store params have on an existing TDB dataset?

Not much. The files reporting 8M will report 8k, but the actual size is the same, because all databases are compatible unless you change the block size or the indexing.

I think this will have an effect on future writes (i.e. the existing data
on disk will not be compacted). Is there a direct way or an existing tool
to compact an existing dataset?

Correct.


Regards,
Laurent

What OS are you using?

        Andy



On 26 August 2016 at 00:22, Andy Seaborne <[email protected]> wrote:

On 25/08/16 16:16, Laurent Rucquoy wrote:

Hello,

I'm implementing a TDB-backed dataset (Jena 3.1) and I wish to provide a
method to change the StoreParams of this dataset.

Because changing the StoreParams requires releasing the corresponding
dataset location, I'd like to identify the StoreParams currently in use, so
that I can avoid releasing the location if the StoreParams we want to apply
are the same as those already in use.


Release is not so bad unless you are doing it frequently.


What is the right way to do this (if possible) ?


This may work:

DatasetGraphTDB x = TDBInternal.getBaseDatasetGraphTDB(myDatasetGraph) ;
StoreParams sp = x.getConfig().params ;
System.out.println(sp);

(the "may" is because I only think it works on a live dataset; I have not tested it)

Obviously the name "TDBInternal" is a warning!
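The check Laurent describes (release the location only when the requested params differ from those in use) could be sketched as below. This is a library-independent illustration, not Jena code: `ReleaseGuard`, `applyIfChanged` and the supplier/consumer arguments are hypothetical. With TDB, the supplier would wrap the lookup shown above and the consumer would do the release and reopen. It also assumes the params type has value-based `equals`, which should be checked for StoreParams before relying on it.

```java
import java.util.Objects;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper, not part of Jena/TDB.
public class ReleaseGuard {

    /**
     * Calls releaseAndReopen only if the requested params differ from the
     * params currently in use. Returns true if a release was triggered.
     * Equality is decided by equals(), so P must compare by value.
     */
    public static <P> boolean applyIfChanged(Supplier<P> currentParams,
                                             P requested,
                                             Consumer<P> releaseAndReopen) {
        P current = currentParams.get();
        if (Objects.equals(current, requested)) {
            return false; // same params: keep the location open
        }
        releaseAndReopen.accept(requested);
        return true;
    }

    public static void main(String[] args) {
        // Strings stand in for real parameter objects in this demo.
        boolean changed = applyIfChanged(
                () -> "mapped",
                "direct",
                p -> System.out.println("release and reopen with " + p));
        System.out.println("released: " + changed);
    }
}
```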

Which params are you interested in changing?

    Andy

Defaults:

fileMode               dft:mapped
blockSize              dft:8192
readCacheSize          dft:10000
writeCacheSize         dft:2000
Node2NodeIdCacheSize   dft:100000
NodeId2NodeCacheSize   dft:500000
NodeMissCacheSize      dft:100
indexNode2Id           dft:node2id
indexId2Node           dft:nodes
primaryIndexTriples    dft:SPO
tripleIndexes          dft:[SPO, POS, OSP]
primaryIndexQuads      dft:GSPO
quadIndexes            dft:[GSPO, GPOS, GOSP, POSG, OSPG, SPOG]
primaryIndexPrefix     dft:GPU
prefixIndexes          dft:[GPU]
indexPrefix            dft:prefixIdx
prefixNode2Id          dft:prefix2id
prefixId2Node          dft:prefixes



Thank you in advance for your help.

Sincerely,
Laurent




