Re: TDB store parameters

2016-08-28 Thread Andy Seaborne

On 26/08/16 16:31, Laurent Rucquoy wrote:

> We use Microsoft Windows servers.


Mapped vs direct seems to make less difference on Windows.  That said, I
have had more experience deploying on Linux, but there have been some
reports from Windows users that suggest this is the case.  Also, on
Windows you can't delete mapped datasets even after releasing the
dataset.  It's a Java issue: a mapped file stays mapped until its buffer
is garbage collected.


For small datasets, it makes little performance difference anyway.

Andy
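
To make the mapped/direct distinction concrete, here is a minimal sketch
using only the Java standard library. It is not TDB code: the class and
method names (FileModes, readMapped, readDirect) are made up for
illustration. It reads the same file once through a memory mapping and once
through explicit reads into a JVM-owned buffer. The mapped variant also
shows where the Windows deletion problem comes from: closing the channel
does not unmap the buffer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FileModes {

    // "Mapped" style: the OS pages the file into memory outside the Java heap.
    static byte[] readMapped(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            // Closing the channel does NOT unmap 'buf'; the mapping stays
            // alive until the buffer object is garbage collected.
            return out;
        }
    }

    // "Direct" style: explicit reads into a buffer that the JVM owns.
    static byte[] readDirect(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate((int) ch.size());
            while (buf.hasRemaining() && ch.read(buf) != -1) {
                // keep reading until the buffer is full or EOF is reached
            }
            return buf.array();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("filemode-demo", ".dat");
        Files.write(file, new byte[8192]);

        System.out.println("mapped read: " + readMapped(file).length + " bytes");
        System.out.println("direct read: " + readDirect(file).length + " bytes");

        try {
            Files.delete(file);
        } catch (IOException e) {
            // Expected on some platforms while a mapping is still alive.
            System.out.println("delete failed: " + e);
            file.toFile().deleteOnExit();
        }
    }
}
```

On Linux the final delete normally succeeds. On Windows it may fail while
the mapping is still alive, which is why the sketch falls back to
deleteOnExit().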






Re: TDB store parameters

2016-08-26 Thread Laurent Rucquoy
We use Microsoft Windows servers.



On 26 August 2016 at 13:01, Andy Seaborne wrote:

> On 26/08/16 08:59, Laurent Rucquoy wrote:
>
>> Hello Andy,
>>
>> Thank you for your help.
>>
>> The params I'm mainly interested in changing are those of the profile
>> returned by StoreParams.getSmallStoreParams() to be able to reduce the
>> dataset size.
>>
>
> That is best done when creating the dataset in the first place.
>
> It reduces the in-memory cache footprint; it uses direct mode, which uses
> an in-JVM file cache, but it does not swamp the machine with memory-mapped
> files.
>
> For small datasets, it makes the file size seem smaller. The memory-mapped
> files on Linux are sparse files: the full length is reserved, but disk
> blocks are only consumed once written. The empty dataset on disk is 150K
> on Linux even though many file sizes are 8M. (Some other OSs may allocate
> the whole space, or they may misreport sparse files.)
>
>> Apart from testing a change of fileMode from mapped to direct, I've not
>> done finer tuning of the other parameters; this is why
>> StoreParams.getSmallStoreParams() seems convenient for our needs.
>>
>> I've another question about this case:
>>
>> What will be the effect on size of changing from default store params to
>> small store params on an existing TDB dataset?
>>
>
> Not much.  The files reporting 8M will report 8k but the actual size is
> the same because all databases are compatible unless you change the block
> size or indexing.
>
>> I think this will have an effect on future writing (i.e. the existing
>> size on disk will not be compacted). Is there a direct way or an existing
>> tool able to compact the size of an existing dataset?
>>
>
> Correct.
>
>
>> Regards,
>> Laurent
>>
>
> What OS are you using?
>
> Andy
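
The remark above about files reporting 8M while holding almost no data can
be reproduced with a small sketch. It uses only the Java standard library
and is not TDB code; the class and method names (SparseFileDemo,
createPreallocated) are hypothetical. It extends a file to 8M and writes a
single byte. Whether the unwritten range actually consumes disk blocks
depends on the OS and filesystem, and the standard library reports only the
logical length.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class SparseFileDemo {

    // Creates a temp file whose reported length is 'length' bytes,
    // although only one byte is ever written to it.
    static Path createPreallocated(long length) throws IOException {
        Path file = Files.createTempFile("sparse-demo", ".dat");
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            // Extends the file. On many Linux filesystems no disk blocks
            // are allocated for the unwritten range (a sparse file).
            raf.setLength(length);
            raf.seek(0);
            raf.write(42);
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path file = createPreallocated(8L * 1024 * 1024);
        // Logical size only; allocated size needs an OS tool.
        System.out.println("reported size: " + Files.size(file) + " bytes");
        Files.delete(file);
    }
}
```

On Linux, comparing "ls -l" (logical size) with "du" (allocated blocks) on
such a file shows the difference between the two numbers.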


Re: TDB store parameters

2016-08-25 Thread Andy Seaborne

On 25/08/16 16:16, Laurent Rucquoy wrote:

> Hello,
>
> I'm implementing a TDB-backed dataset (Jena 3.1) and I wish to provide a
> method to change the StoreParams of this dataset.
>
> Because changing the StoreParams implies releasing the corresponding
> dataset location, I'd like to identify the current StoreParams in use so
> that we can avoid releasing the location if the StoreParams we want to
> apply now are the same as those currently used.


Release is not so bad unless you are doing it frequently.



> What is the right way to do this (if possible)?


This may work:

DatasetGraphTDB x = TDBInternal.getBaseDatasetGraphTDB(myDatasetGraph);
StoreParams sp = x.getConfig().params;
System.out.println(sp);

(the "may" is because I only think it works on a live dataset; I have not
tested it)

Obviously the name "TDBInternal" is a warning!

Which params are you interested in changing?

Andy

Defaults:

fileMode               dft:mapped
blockSize              dft:8192
readCacheSize          dft:1
writeCacheSize         dft:2000
Node2NodeIdCacheSize   dft:10
NodeId2NodeCacheSize   dft:50
NodeMissCacheSize      dft:100
indexNode2Id           dft:node2id
indexId2Node           dft:nodes
primaryIndexTriples    dft:SPO
tripleIndexes          dft:[SPO, POS, OSP]
primaryIndexQuads      dft:GSPO
quadIndexes            dft:[GSPO, GPOS, GOSP, POSG, OSPG, SPOG]
primaryIndexPrefix     dft:GPU
prefixIndexes          dft:[GPU]
indexPrefix            dft:prefixIdx
prefixNode2Id          dft:prefix2id
prefixId2Node          dft:prefixes
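
To give a feel for what parameters like blockSize and readCacheSize
control, here is a rough sketch of a fixed-size block reader with a small
least-recently-used cache. It uses only the Java standard library and is
an illustration of the general idea, not TDB's implementation; every name
in it (BlockCacheSketch, readBlock, diskReads) is hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

public class BlockCacheSketch implements AutoCloseable {
    private final FileChannel channel;
    private final int blockSize;
    private final Map<Long, byte[]> cache;
    int diskReads = 0; // number of cache misses, kept for illustration

    BlockCacheSketch(Path file, int blockSize, int cacheSize) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.READ);
        this.blockSize = blockSize;
        // An access-ordered LinkedHashMap drops the least recently used
        // block once more than 'cacheSize' blocks are held.
        this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > cacheSize;
            }
        };
    }

    // Returns the block with the given id, reading from disk only on a miss.
    byte[] readBlock(long blockId) throws IOException {
        byte[] block = cache.get(blockId);
        if (block == null) {
            ByteBuffer buf = ByteBuffer.allocate(blockSize);
            long start = blockId * blockSize;
            while (buf.hasRemaining()) {
                int n = channel.read(buf, start + buf.position());
                if (n < 0) {
                    break; // end of file: the rest of the block stays zeroed
                }
            }
            block = buf.array();
            diskReads++;
            cache.put(blockId, block);
        }
        return block;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("blocks-demo", ".dat");
        Files.write(file, new byte[4 * 8192]);
        try (BlockCacheSketch c = new BlockCacheSketch(file, 8192, 2)) {
            c.readBlock(0);
            c.readBlock(1);
            c.readBlock(0); // served from the cache
            c.readBlock(2); // evicts block 1, the least recently used
            c.readBlock(1); // has to be read from disk again
            System.out.println("disk reads: " + c.diskReads);
        }
        Files.delete(file);
    }
}
```

A larger cache trades JVM heap for fewer disk reads, which is the kind of
trade-off a smaller parameter profile shifts towards lower memory use.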



> Thank you in advance for your help.
>
> Sincerely,
> Laurent