I agree, it would be desirable to have funding for these requests and more. Too bad there isn't currently a commercial entity that actively helps drive this valuable project.
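
On the sparse-file point in the quoted message below, here is a minimal sketch (not from the thread) of how a file's apparent size can differ from the space actually allocated for it. It assumes a POSIX system where `os.stat` exposes `st_blocks`, and a filesystem with sparse-file support such as ext4. The name GOSP.dat is reused purely for illustration; the file is created in a temporary directory, not in a real TDB2 database.

```python
import os
import tempfile


def make_sparse_file(path, size):
    """Create a file of the given apparent size without writing any data."""
    with open(path, "wb") as f:
        # truncate() extends the file; on sparse-capable filesystems
        # no data blocks are allocated for the resulting hole.
        f.truncate(size)


def sizes(path):
    """Return (apparent_bytes, allocated_bytes) for a file."""
    st = os.stat(path)
    # On POSIX, st_blocks is counted in 512-byte units.
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for one of the 8M index files in the listing.
    path = os.path.join(tmp, "GOSP.dat")
    make_sparse_file(path, 8 * 1024 * 1024)
    apparent, allocated = sizes(path)
    print(f"apparent size : {apparent} bytes (what 'ls -l' reports)")
    print(f"allocated size: {allocated} bytes (what 'du' counts)")
```

On a filesystem with sparse-file support the allocated figure should come out far smaller than the apparent one, which is the gap between the 8M entries in the directory listing and the 216K that `du -sh` reports.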

On Tue, Jun 18, 2019 at 2:37 PM Andy Seaborne <[email protected]> wrote:

>
>
> On 18/06/2019 13:44, Marco Neumann wrote:
> > Andy, just one observation. there seems to be quite some data replication
> > going on in the respective tdb / tdb2 folder.
> >
> > Is it possible to instruct tdb/tdb2 only to create a database with one
> > default graph?
>
> In theory you can set the indexes you want via StoreParams - it works
> for choices but I would not be surprised if the code assumed at least
> one quads index. Fixable.
>
> > It seems to be quite safe to remove files from disk that
> > contain G-indexes manually and maintain query consistency in the default
> > graph and it would reduce the tdb database footprint on disk by 1/3.
> >
>
> They aren't as big as you think they are :-)
>
> Try this:
>
> No DB2.
>
> tdb2.tdbquery --loc DB2 'ASK{}'
> Ask => Yes
>
> du -sh DB2
> 216K DB2
>
> so it is 216K bytes on disk empty.
>
> (this is Linux/ext4 filesystem)
>
> ~ >> ll DB2/Data-0001/
>
> loads of 8M files.
>
> How come there are files that are 8M but the entire thing is 216K?
>
> They are sparse files.
> The space is not allocated.
>
> Some systems (Mac for example) report the size of the files added up,
> not the space used.
>
> total 204
> -rw-r--r-- 1 afs afs 24 Jun 18 14:26 GOSP.bpt
> -rw-r--r-- 1 afs afs 8388608 Jun 18 14:26 GOSP.dat
> -rw-r--r-- 1 afs afs 8388608 Jun 18 14:26 GOSP.idn
>
> > not to speak of an option for LZW compression a la HDT.
>
> That would be good if I had time. Anyone got any spare funding?!
>
> I'm not sure how the HDT (java) project is doing.
> Like all open source projects, it needs time and energy, and executing a
> steady state still requires backing.
>
> I currently think RocksDB is a possible choice. Initial experiments showed
> it works but needs tuning work. The new storage architecture
> (jena-dboe-storage) would make it even easier to build.
>
> Andy
>
> >
> > On Fri, Jun 14, 2019 at 8:03 PM Andy Seaborne <[email protected]> wrote:
> >
> >>
> >>
> >> On 14/06/2019 18:13, Marco Neumann wrote:
> >>> I am collecting jena loader benchmarks. if you have results please post
> >>> them directly.
> >>>
> >>> http://www.lotico.com/index.php/JENA_Loader_Benchmarks
> >>
> >> tdb2.tdbloader has variations controlled by --loader.
> >>
> >> --loader=
> >> Loader to use: 'basic', 'phased' (default), 'sequential', 'parallel' or
> >> 'light'
> >>
> >> "basic" is a super naive parser-add triple loop - it is used if a loader
> >> can't cope with an already loaded database.
> >>
> >> "phased" is a balanced, does not saturate the machine loader. Some
> >> parallelism.
> >>
> >> "sequential" is the tdbloader algorithm for TDB2, more for reference.
> >>
> >> "parallel" is as much parallelism as it wants. (5 for triples, more for
> >> quads)
> >>
> >> "light" is two threaded. Slightly lighter than "phased".
> >>
> >> See LoaderPlans.
> >>
> >>> On a linux machine I am using "time" to collect data.
> >>>
> >>> Is there a flag on tdb2.tdbloader to report time and triples per second?
> >>>
> >>> I have noticed that storage space use for tdbloader2 is significantly
> >>> smaller on disk compared to tdbloader and tdb2.tdbloader. Is there a
> >>> straight forward explanation here?
> >>>
> >>
> >
>

--

---
Marco Neumann
KONA
