Hi,

TL;DR: When importing the Freebase RDF dump, Virtuoso seems to consume far more RAM than configured.

I'm trying to load the Freebase RDF dump ( https://developers.google.com/freebase/data ) into a clean Virtuoso OpenSource 7.1.0 instance running on a VM with 4 cores, 32 GB of RAM and 300+ GB of free disk space. The dump file contains 2,656,580,382 rows (even though the page claims 1.9 billion triples; maybe that number is outdated or the dump contains duplicates).

Before attempting to load the whole Freebase dump, I loaded the basekb.com dump, which contained 1,205,456,739 triples, into a store which was already filled with DBpedia, without any noticeable problem.

The Freebase import via rdf_loader_run() starts with rapid IO rates (read and write bursts of several 100 MB/s) and quickly consumes ~ 25 GB of RAM as configured. It then continues to slowly consume more and more RAM, ~ 1 MB/minute. As this goes on, the IO rates slowly drop to a few KB/s of reads and no or only very rare writes. At this point htop shows that the process spends nearly all its time in IO wait. After a couple of days Virtuoso is finally killed by the kernel, once it has consumed all RAM of the machine and wants even more.

I already tried adding 16 GB of swap. This didn't help, but made the machine completely unresponsive after 4 days (sshd seems to have been swapped out and never came back, even though I retried to ssh into the VM for a couple of hours). Whether I use Ubuntu 12.04 LTS or 14.04.1 LTS doesn't seem to make a difference.

A colleague reports that the import works fine on a machine with 256 GB RAM and 8 cores, with the settings for 64 GB. There it takes about 1 day to import and the final DB is ~ 130 GB. Mine never grows beyond 100 GB before Virtuoso is killed.

The instance is set up following my tutorial http://joernhees.de/blog/2014/04/23/setting-up-a-local-dbpedia-3-9-mirror-with-virtuoso-7/ ; just substitute the DBpedia datasets with the Freebase triple dump and Wikidata links.

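In case someone wants to reproduce the slow memory growth described above, here is a minimal sketch of how the resident memory of the server process could be sampled on Linux. It assumes that /proc/<pid>/status contains a VmRSS line; the function names are made up for illustration and are not part of Virtuoso.

```python
def parse_vmrss_kb(status_text):
    """Return the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:    123456 kB".
            return int(line.split()[1])
    return None  # no VmRSS line found (e.g. kernel thread)


def read_rss_kb(pid):
    """Read the current resident set size (in kB) of a process. Linux only."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kb(f.read())


def growth_kb_per_min(samples):
    """Average growth in kB/minute between the first and the last sample.

    samples is a list of (timestamp_in_seconds, rss_in_kb) tuples.
    """
    (t0, rss0), (t1, rss1) = samples[0], samples[-1]
    return (rss1 - rss0) / ((t1 - t0) / 60.0)
```

Calling read_rss_kb() once a minute with the PID of the Virtuoso server process and passing the collected samples to growth_kb_per_min() should give roughly 1000 if the ~ 1 MB/minute estimate holds.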
The virtuoso.ini values are set as suggested for 32 GB of RAM, and there's nothing else running on the VM:

    [Database]
    MaxCheckpointRemap = 2000

    [Parameters]
    ;; Uncomment next two lines if there is 32 GB system memory free
    NumberOfBuffers = 2720000
    MaxDirtyBuffers = 2000000

(For MaxCheckpointRemap I also tried 62500, so ~1/4th of NumberOfBuffers as in the blog post.)

As I have already tried a lot of things but can't get this to work, I'd be thankful for feedback or for someone looking into why Virtuoso is consuming all of the RAM.

Cheers,
Jörn
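
P.S. For reference, a rough sketch of what the buffer settings quoted above should amount to in memory. It assumes that each buffer caches one 8 KiB database page and ignores any per-buffer overhead or other allocations of the server, so the real footprint will be somewhat higher. The function names are only illustrative.

```python
PAGE_SIZE_BYTES = 8 * 1024  # assumption: one buffer caches one 8 KiB page


def buffer_pool_bytes(number_of_buffers, bytes_per_buffer=PAGE_SIZE_BYTES):
    """Estimated size of the buffer pool in bytes."""
    return number_of_buffers * bytes_per_buffer


def to_gib(num_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


def dirty_fraction(max_dirty_buffers, number_of_buffers):
    """Share of the buffer pool that may be dirty before flushing to disk."""
    return max_dirty_buffers / number_of_buffers


# Values from the virtuoso.ini excerpt in this mail.
pool_gib = to_gib(buffer_pool_bytes(2720000))
dirty = dirty_fraction(2000000, 2720000)
```

Under that assumption the buffer pool alone stays below 21 GiB, so the buffers by themselves should not be able to exhaust the 32 GB of the VM.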