Erick, I've loaded UniProt (both the reviewed and unreviewed parts, 14.2 GB in total) into a database that was about 5 GB before loading. I used DB.DBA.RDF_LOAD_RDFXML_MT without splitting the data into parts. It was a bit slower than I had expected, but there were no hangs. I set checkpoint_interval (6000) before starting, just to get a better idea of how many disk pages I really need for that amount of data, but I'm sure that made no significant difference.
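For anyone who wants to reproduce this, the sequence described above might look roughly like the following ISQL session. It is only a sketch: DB.DBA.RDF_LOAD_RDFXML_MT, file_to_string_output and checkpoint_interval (6000) come from this thread, while the file path and graph IRI are copied from Erick's original call quoted below purely for illustration (the 14.2 GB mentioned above came from both the reviewed and unreviewed dumps, so in practice there may be more than one file). The final explicit checkpoint and the reset to a 60-minute interval are assumptions added here, not steps stated in the message.

```sql
-- Sketch only: path and graph IRI are illustrative, taken from the question.

-- Set the automatic checkpoint interval to 6000 minutes (about 100 hours)
-- so that no automatic checkpoint should fall inside the load.
checkpoint_interval (6000);

-- Multithreaded RDF/XML loader, fed the whole file as a string output,
-- with no manual splitting into parts.
DB.DBA.RDF_LOAD_RDFXML_MT (
  file_to_string_output ('/virtuoso/data/rdf/uniprot.rdf'),
  'http://www.cellcycleontology.org/ontology/rdf/uniprot',   -- base IRI
  'http://www.cellcycleontology.org/ontology/rdf/uniprot');  -- target graph IRI

-- Assumed follow-up (not from this thread): write everything to disk,
-- then restore a normal interval (60 minutes is a placeholder value).
checkpoint;
checkpoint_interval (60);
```

Running the whole load with automatic checkpoints pushed out is what lets you see how many disk pages the data really needs, which was the stated reason for the 6000-minute setting.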
I used a 2 x quad-core Xeon box with 16 GB of RAM and 6 cheap SATA disks. During loading I used only 1000000 buffers (i.e., only 8 GB of RAM was used for the database), because my Fedora kernel (2.6.21-6.fc7xen) behaves strangely if I try to allocate 12 GB: the sum of the resident sizes of all processes + free RAM + buffer RAM suddenly becomes much less than 16 GB, which results in unexpected swapping. I don't know the exact reason. This memory allocation problem seems to be specific to my machine, because other Linux boxes work fine, and it happens only during intensive data loading; for all other activities I use 1500000 buffers (i.e., 12 GB out of 16 GB usually goes to buffers).

Best Regards,
Ivan Mikhailov,
OpenLink Software.

On Tue, 2008-05-06 at 14:02 +0200, Erick Antezana wrote:
> Hello,
>
> I am trying to upload a very huge file (uniprot.rdf). Its size is
> about 45GB!! (the compressed file (3.5GB) can be found in:
> ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/rdf/). I
> have adapted the virtuoso.ini file a bit, setting the striping options
> (about 100GB reserved). I have also played with NumberOfBuffers and
> MaxCheckPointRemap as suggested in
> http://docs.openlinksw.com/virtuoso/rdfperformancetuning.html#rdfperfloading
> . The ISQL statement I am using is:
>
> DB.DBA.RDF_LOAD_RDFXML(file_to_string_output('/virtuoso/data/rdf/uniprot.rdf'),
> 'http://www.cellcycleontology.org/ontology/rdf/uniprot',
> 'http://www.cellcycleontology.org/ontology/rdf/uniprot');
>
> however, after initiating the loading process, virtuoso freezes the OS
> (the load average of the system rises to 40!!)
> then after some time I get an error message:
>
> SQL> SET AUTOCOMMIT ON;
> SQL>
> DB.DBA.RDF_LOAD_RDFXML(file_to_string_output('/virtuoso/data/rdf/uniprot.rdf'),
> 'http://www.cellcycleontology.org/ontology/rdf/uniprot',
> 'http://www.cellcycleontology.org/ontology/rdf/uniprot');
>
> *** Error 08S01: [Virtuoso Driver]CL065: Lost connection to server
> at line 2 of Top-Level:
> DB.DBA.RDF_LOAD_RDFXML(file_to_string_output('/virtuoso/data/rdf/uniprot.rdf'),
> 'http://www.cellcycleontology.org/ontology/rdf/uniprot',
> 'http://www.cellcycleontology.org/ontology/rdf/uniprot')
>
> It seems that the function DB.DBA.RDF_LOAD_RDFXML_MT
> <http://docs.openlinksw.com/virtuoso/fn_rdf_load_rdfxml_mt.html>
> (http://docs.openlinksw.com/virtuoso/functionidx.html) could help me
> deal with large RDF files, perhaps by loading split files
> (file_to_string_output) as suggested in
> http://docs.openlinksw.com/virtuoso/rdfperformancetuning.html#rdfperfloading
> If that is the case, how would you recommend splitting such a
> large file? In
> http://docs.openlinksw.com/virtuoso/fn_file_to_string_output.html it is
> mentioned that the initial and final segments should be defined (how
> long should they be?). Once loaded, will Virtuoso be able to cope with
> such a DB? Would some tuning of the INI parameters still be needed/suggested?
>
> thanks in advance for any hints,
> Erick