Hi,

TLDR: When importing the Freebase RDF dump, Virtuoso seems to consume way more
RAM than configured.

I'm trying to load the Freebase RDF dump (
https://developers.google.com/freebase/data ) into a clean Virtuoso OpenSource
7.1.0 instance running on a VM with 4 cores, 32 GB of RAM and 300+ GB of free
disk space.
The dump file contains 2,656,580,382 rows (even though the page claims 1.9
billion triples; that figure may be outdated, or the dump may contain
duplicates).
Before attempting to load the whole Freebase dump, I loaded the basekb.com dump,
which contained 1,205,456,739 triples, into a store that was already filled
with DBpedia, without any noticeable problem.

The Freebase dump import via rdf_loader_run() starts with rapid IO rates (read
and write bursts of several hundred MB/s) and quickly consumes ~25 GB of RAM,
as configured. It then slowly continues to consume more RAM, at ~1 MB/minute.
As this goes on, the IO rates gradually drop to a few KB/s of reads and almost
no writes. At this point htop shows the process spending nearly all of its
time in IO wait. After a couple of days Virtuoso is finally killed by the
kernel, once it has consumed all of the machine's RAM and asks for even more.
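
A minimal sketch of how the slow growth described above could be logged, so
that the ~1 MB/minute figure can be checked against actual samples. It
assumes a Linux host where /proc/<pid>/status exposes a VmRSS line; the
function names and the 60 second sampling interval are illustrative choices,
not part of Virtuoso or of my setup.

```python
import re
import time
from pathlib import Path


def parse_vm_rss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def growth_mib_per_min(samples):
    """Average RSS growth between the first and last (seconds, KiB) sample."""
    (t0, rss0), (t1, rss1) = samples[0], samples[-1]
    if t1 <= t0:
        return 0.0
    return (rss1 - rss0) / 1024 / ((t1 - t0) / 60)


def days_until_limit(limit_gib, current_gib, rate_mib_per_min):
    """Days until RSS reaches limit_gib if growth stays constant."""
    return (limit_gib - current_gib) * 1024 / rate_mib_per_min / (60 * 24)


def watch(pid, interval_s=60, count=5):
    """Sample the resident set size of process `pid` a few times."""
    samples = []
    for _ in range(count):
        text = Path(f"/proc/{pid}/status").read_text()
        rss_kib = parse_vm_rss_kib(text)
        if rss_kib is not None:  # kernel threads have no VmRSS line
            samples.append((time.time(), rss_kib))
            print(f"{time.strftime('%H:%M:%S')} rss={rss_kib / 1024:.1f} MiB")
        time.sleep(interval_s)
    return samples
```

Calling watch() with the PID of the Virtuoso server process and passing the
result to growth_mib_per_min() gives the observed rate. With the numbers from
this mail, days_until_limit(32, 25, 1.0) comes out at just under 5 days,
which roughly matches the time until the kernel kills the process (the kernel
itself needs some of the 32 GB, so the real limit is somewhat lower).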

I already tried adding 16 GB of swap. This didn't help, but made the machine
completely unresponsive after 4 days (sshd seems to have been swapped out and
never came back, despite a couple of hours of retrying to ssh into the VM).

Switching between Ubuntu 12.04 LTS and 14.04.1 LTS doesn't seem to make a
difference.

A colleague reports that the import works fine on a machine with 256 GB RAM
and 8 cores, using the settings for 64 GB: it takes about 1 day to import and
the final DB is ~130 GB. Mine never grows beyond 100 GB before Virtuoso is
killed.


The instance is set up following my tutorial
http://joernhees.de/blog/2014/04/23/setting-up-a-local-dbpedia-3-9-mirror-with-virtuoso-7/
(with the DBpedia datasets replaced by the Freebase triple dump and Wikidata
links).

Nothing else is running on the VM, and the virtuoso.ini values are set as
suggested for 32 GB of RAM:
[Database]
;; also tried with 62500, so ~1/4th of NumberOfBuffers as in the blogpost
MaxCheckpointRemap              = 2000
[Parameters]
;; Uncomment next two lines if there is 32 GB system memory free
NumberOfBuffers          = 2720000
MaxDirtyBuffers          = 2000000
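
For reference, a rough back-of-the-envelope check of what these two values
should amount to. This is only a sketch: it assumes the figure from the
Virtuoso tuning documentation that each buffer caches one 8K page and
occupies approximately 8700 bytes of memory, and the function name is
made up for illustration.

```python
# Assumption: ~8700 bytes of process memory per buffer (one 8K page plus
# bookkeeping overhead), as stated in the Virtuoso performance tuning notes.
BYTES_PER_BUFFER = 8700

NUMBER_OF_BUFFERS = 2_720_000   # from the [Parameters] section above
MAX_DIRTY_BUFFERS = 2_000_000   # from the [Parameters] section above


def buffer_pool_gib(number_of_buffers, bytes_per_buffer=BYTES_PER_BUFFER):
    """Approximate memory taken by the buffer pool, in GiB."""
    return number_of_buffers * bytes_per_buffer / 2**30


pool_gib = buffer_pool_gib(NUMBER_OF_BUFFERS)
dirty_fraction = MAX_DIRTY_BUFFERS / NUMBER_OF_BUFFERS

print(f"buffer pool: ~{pool_gib:.1f} GiB")
print(f"dirty buffers allowed: {dirty_fraction:.0%} of the pool")
```

Under that assumption the buffer pool alone accounts for about 22 GiB, so the
~25 GB seen shortly after the import starts is in the expected range; the
steady growth beyond it is what the configuration does not explain.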


As I have already tried a lot of things but can't get this to work, I'd be
thankful for feedback, or for someone looking into why Virtuoso is consuming
all of the RAM.

Cheers,
Jörn


_______________________________________________
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users