Re: TDB store parameters
On 26/08/16 16:31, Laurent Rucquoy wrote:
> We use Microsoft Windows servers.

Mapped vs direct seems to make less difference on Windows. That said, I have more experience deploying on Linux, but there have been some reports from Windows users that suggest this is the case.

Also, on Windows you can't delete mapped datasets even after releasing the dataset. It's a Java issue.

For small datasets, it makes little performance difference anyway.

    Andy

> On 26 August 2016 at 13:01, Andy Seaborne wrote:
>
>> On 26/08/16 08:59, Laurent Rucquoy wrote:
>>
>>> Hello Andy,
>>>
>>> Thank you for your help.
>>>
>>> The params I'm mainly interested in changing are those of the profile
>>> returned by StoreParams.getSmallStoreParams(), in order to reduce the
>>> dataset size.
>>
>> That is best done when creating the dataset in the first place.
>>
>> It reduces the in-memory cache footprint; it uses direct mode, which uses
>> an in-JVM file cache, but it does not swamp the machine with memory-mapped
>> files.
>>
>> For small datasets, it makes the file size seem smaller. The memory-mapped
>> files on Linux are sparse files: space allocated but not used. The empty
>> dataset on disk is 150K on Linux even though many file sizes are 8M. (Some
>> other OSs may allocate the whole space, or they may misreport sparse files.)
>>
>>> Apart from testing a change of fileMode from mapped to direct, I have not
>>> done any finer tuning of the other parameters; this is why
>>> StoreParams.getSmallStoreParams() seems convenient for our needs.
>>>
>>> I have another question about this case:
>>>
>>> What will be the effect on size of changing from default store params to
>>> small store params on an existing TDB dataset?
>>
>> Not much. The files reporting 8M will report 8k but the actual size is
>> the same because all databases are compatible unless you change the block
>> size or indexing.
>>
>>> I think this will have an effect on future writing (i.e. the existing size
>>> on disk will not be compacted). Is there a direct way, or an existing tool,
>>> to compact an existing dataset?
>>
>> Correct.
>>
>>> Regards,
>>> Laurent
>>
>> What OS are you using?
>>
>>     Andy
>>
>>> On 26 August 2016 at 00:22, Andy Seaborne wrote:
>>>
>>>> On 25/08/16 16:16, Laurent Rucquoy wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I'm implementing a TDB-backed dataset (Jena 3.1) and I wish to provide
>>>>> a method to change the StoreParams of this dataset.
>>>>>
>>>>> Because changing the StoreParams requires releasing the corresponding
>>>>> dataset location, I'd like to identify the StoreParams currently in use,
>>>>> so that I can avoid releasing the location if the StoreParams we want to
>>>>> apply are the same as those currently used.
>>>>
>>>> Release is not so bad unless you are doing it frequently.
>>>>
>>>>> What is the right way to do this (if possible)?
>>>>
>>>> This may work:
>>>>
>>>>     DatasetGraphTDB x = TDBInternal.getBaseDatasetGraphTDB(myDatasetGraph);
>>>>     StoreParams sp = x.getConfig().params;
>>>>     System.out.println(sp);
>>>>
>>>> (the "may" is because I only think it works on a live dataset; I have
>>>> not tested it)
>>>>
>>>> Obviously the name "TDBInternal" is a warning!
>>>>
>>>> Which params are you interested in changing?
>>>>
>>>>     Andy
>>>>
>>>> Defaults:
>>>>
>>>> fileMode              dft:mapped
>>>> blockSize             dft:8192
>>>> readCacheSize         dft:1
>>>> writeCacheSize        dft:2000
>>>> Node2NodeIdCacheSize  dft:10
>>>> NodeId2NodeCacheSize  dft:50
>>>> NodeMissCacheSize     dft:100
>>>> indexNode2Id          dft:node2id
>>>> indexId2Node          dft:nodes
>>>> primaryIndexTriples   dft:SPO
>>>> tripleIndexes         dft:[SPO, POS, OSP]
>>>> primaryIndexQuads     dft:GSPO
>>>> quadIndexes           dft:[GSPO, GPOS, GOSP, POSG, OSPG, SPOG]
>>>> primaryIndexPrefix    dft:GPU
>>>> prefixIndexes         dft:[GPU]
>>>> indexPrefix           dft:prefixIdx
>>>> prefixNode2Id         dft:prefix2id
>>>> prefixId2Node         dft:prefixes
>>>>
>>>>> Thank you in advance for your help.
>>>>>
>>>>> Sincerely,
>>>>> Laurent
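
To make the mapped vs direct distinction in the message above more concrete, here is a minimal sketch in plain Java NIO. It is not TDB's implementation: the class name MappedVsDirect, its methods, and the temporary file are all made up for illustration, and only the 8192-byte block size is taken from the defaults quoted in this thread. "Direct" copies a block into a heap buffer through an ordinary read; "mapped" asks the OS to map the same block into the process address space.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedVsDirect {
    // Block size quoted in the thread's defaults; everything else here is hypothetical.
    static final int BLOCK_SIZE = 8192;

    /** "Direct" style: copy one block from the file into a buffer on the JVM heap. */
    static ByteBuffer readBlockDirect(Path file, long blockId) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(BLOCK_SIZE);
            long start = blockId * BLOCK_SIZE;
            // Absolute reads may return fewer bytes than requested, so loop until full or EOF.
            while (buf.hasRemaining()) {
                int n = ch.read(buf, start + buf.position());
                if (n < 0) break;
            }
            buf.flip();
            return buf;
        }
    }

    /** "Mapped" style: let the OS map one block of the file into memory. */
    static MappedByteBuffer mapBlock(Path file, long blockId) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, blockId * BLOCK_SIZE, BLOCK_SIZE);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("blocks", ".dat");
        byte[] data = new byte[2 * BLOCK_SIZE];
        data[BLOCK_SIZE] = 42;               // first byte of block 1
        Files.write(file, data);

        ByteBuffer direct = readBlockDirect(file, 1);
        MappedByteBuffer mapped = mapBlock(file, 1);
        System.out.println(direct.get(0) + " " + mapped.get(0));

        // On Windows this delete may fail while the mapping is still reachable,
        // because Java has no supported way to unmap a buffer explicitly.
        try {
            Files.delete(file);
        } catch (IOException e) {
            System.out.println("delete failed: " + e);
        }
    }
}
```

The delete at the end is where the Windows behaviour mentioned above would show up: the mapping is only released when the buffer is garbage collected, so closing the channel (or, by analogy, releasing a dataset) does not by itself free the file.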
Re: TDB store parameters
We use Microsoft Windows servers.

On 26 August 2016 at 13:01, Andy Seaborne wrote:

> On 26/08/16 08:59, Laurent Rucquoy wrote:
>
>> Hello Andy,
>>
>> Thank you for your help.
>>
>> The params I'm mainly interested in changing are those of the profile
>> returned by StoreParams.getSmallStoreParams(), in order to reduce the
>> dataset size.
>
> That is best done when creating the dataset in the first place.
>
> It reduces the in-memory cache footprint; it uses direct mode, which uses
> an in-JVM file cache, but it does not swamp the machine with memory-mapped
> files.
>
> For small datasets, it makes the file size seem smaller. The memory-mapped
> files on Linux are sparse files: space allocated but not used. The empty
> dataset on disk is 150K on Linux even though many file sizes are 8M. (Some
> other OSs may allocate the whole space, or they may misreport sparse files.)
>
>> Apart from testing a change of fileMode from mapped to direct, I have not
>> done any finer tuning of the other parameters; this is why
>> StoreParams.getSmallStoreParams() seems convenient for our needs.
>>
>> I have another question about this case:
>>
>> What will be the effect on size of changing from default store params to
>> small store params on an existing TDB dataset?
>
> Not much. The files reporting 8M will report 8k but the actual size is
> the same because all databases are compatible unless you change the block
> size or indexing.
>
>> I think this will have an effect on future writing (i.e. the existing size
>> on disk will not be compacted). Is there a direct way, or an existing tool,
>> to compact an existing dataset?
>
> Correct.
>
>> Regards,
>> Laurent
>
> What OS are you using?
>
>     Andy
>
>> On 26 August 2016 at 00:22, Andy Seaborne wrote:
>>
>>> On 25/08/16 16:16, Laurent Rucquoy wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm implementing a TDB-backed dataset (Jena 3.1) and I wish to provide
>>>> a method to change the StoreParams of this dataset.
>>>>
>>>> Because changing the StoreParams requires releasing the corresponding
>>>> dataset location, I'd like to identify the StoreParams currently in use,
>>>> so that I can avoid releasing the location if the StoreParams we want to
>>>> apply are the same as those currently used.
>>>
>>> Release is not so bad unless you are doing it frequently.
>>>
>>>> What is the right way to do this (if possible)?
>>>
>>> This may work:
>>>
>>>     DatasetGraphTDB x = TDBInternal.getBaseDatasetGraphTDB(myDatasetGraph);
>>>     StoreParams sp = x.getConfig().params;
>>>     System.out.println(sp);
>>>
>>> (the "may" is because I only think it works on a live dataset; I have
>>> not tested it)
>>>
>>> Obviously the name "TDBInternal" is a warning!
>>>
>>> Which params are you interested in changing?
>>>
>>>     Andy
>>>
>>> Defaults:
>>>
>>> fileMode              dft:mapped
>>> blockSize             dft:8192
>>> readCacheSize         dft:1
>>> writeCacheSize        dft:2000
>>> Node2NodeIdCacheSize  dft:10
>>> NodeId2NodeCacheSize  dft:50
>>> NodeMissCacheSize     dft:100
>>> indexNode2Id          dft:node2id
>>> indexId2Node          dft:nodes
>>> primaryIndexTriples   dft:SPO
>>> tripleIndexes         dft:[SPO, POS, OSP]
>>> primaryIndexQuads     dft:GSPO
>>> quadIndexes           dft:[GSPO, GPOS, GOSP, POSG, OSPG, SPOG]
>>> primaryIndexPrefix    dft:GPU
>>> prefixIndexes         dft:[GPU]
>>> indexPrefix           dft:prefixIdx
>>> prefixNode2Id         dft:prefix2id
>>> prefixId2Node         dft:prefixes
>>>
>>>> Thank you in advance for your help.
>>>>
>>>> Sincerely,
>>>> Laurent
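
The remark quoted above about files "reporting 8M" while the empty dataset takes only about 150K on disk can be reproduced with a small stand-alone sketch. This is not TDB code: the class SparseFileDemo and its method are hypothetical, and the 8M length is simply the figure mentioned in the thread. The file is extended to 8M without writing that much data, so the reported length is 8M; on file systems that support sparse files the space actually allocated is much smaller, which plain Java cannot report and which has to be checked with an OS tool such as `du`.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class SparseFileDemo {
    // 8M, the reported file size mentioned in the thread (used here only as an example value).
    static final long SEGMENT = 8L * 1024 * 1024;

    /**
     * Extends the file to the given length without writing the bulk of it,
     * writes a small payload at the start, and returns the size the file reports.
     */
    static long preallocate(Path file, long length, byte[] payload) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.setLength(length);   // grows the file; the OS may leave the extent unallocated
            raf.seek(0);
            raf.write(payload);      // only these bytes are real data
        }
        return Files.size(file);     // logical (reported) size, not allocated disk space
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("segment", ".dat");
        long reported = preallocate(file, SEGMENT, new byte[] {1, 2, 3});
        System.out.println("reported size: " + reported + " bytes");
        System.out.println("compare with the allocated size, e.g. `du -k " + file + "`");
        Files.delete(file);
    }
}
```

Whether the unwritten part of the file really stays unallocated depends on the OS and file system, which matches the caveat in the quoted reply that some systems allocate the whole space or misreport sparse files.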
Re: TDB store parameters
On 25/08/16 16:16, Laurent Rucquoy wrote:
> Hello,
>
> I'm implementing a TDB-backed dataset (Jena 3.1) and I wish to provide
> a method to change the StoreParams of this dataset.
>
> Because changing the StoreParams requires releasing the corresponding
> dataset location, I'd like to identify the StoreParams currently in use,
> so that I can avoid releasing the location if the StoreParams we want to
> apply are the same as those currently used.

Release is not so bad unless you are doing it frequently.

> What is the right way to do this (if possible)?

This may work:

    DatasetGraphTDB x = TDBInternal.getBaseDatasetGraphTDB(myDatasetGraph);
    StoreParams sp = x.getConfig().params;
    System.out.println(sp);

(the "may" is because I only think it works on a live dataset; I have not tested it)

Obviously the name "TDBInternal" is a warning!

Which params are you interested in changing?

    Andy

Defaults:

fileMode              dft:mapped
blockSize             dft:8192
readCacheSize         dft:1
writeCacheSize        dft:2000
Node2NodeIdCacheSize  dft:10
NodeId2NodeCacheSize  dft:50
NodeMissCacheSize     dft:100
indexNode2Id          dft:node2id
indexId2Node          dft:nodes
primaryIndexTriples   dft:SPO
tripleIndexes         dft:[SPO, POS, OSP]
primaryIndexQuads     dft:GSPO
quadIndexes           dft:[GSPO, GPOS, GOSP, POSG, OSPG, SPOG]
primaryIndexPrefix    dft:GPU
prefixIndexes         dft:[GPU]
indexPrefix           dft:prefixIdx
prefixNode2Id         dft:prefix2id
prefixId2Node         dft:prefixes

> Thank you in advance for your help.
>
> Sincerely,
> Laurent