Erick,

I've loaded Uniprot (both the reviewed and unreviewed parts, 14.2Gb in
total) into a database that was about 5Gb before the load. I used
DB.DBA.RDF_LOAD_RDFXML_MT without splitting the input into parts. It was
a bit slower than I expected, but there were no hangs. I set
checkpoint_interval (6000) before the start, just to get a better idea
of how many disk pages I really need for that amount of data, but I'm
sure that made no important difference.
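
A minimal sketch of such a load in ISQL, assuming the file path and
graph IRI from the quoted message below (the exact statements used for
the load described above are not shown in this mail, so treat this as
an illustration only):

```sql
-- Assumption: path and graph IRI are taken from the quoted message,
-- not from the load described above.

-- Make automatic checkpoints rare, so that disk page usage can be
-- observed over the whole load.
checkpoint_interval (6000);

-- Load the whole file in one call, without splitting it into parts.
-- file_to_string_output() returns a string output stream, so the file
-- is not held in memory as one huge string.
DB.DBA.RDF_LOAD_RDFXML_MT (
  file_to_string_output ('/virtuoso/data/rdf/uniprot.rdf'),
  'http://www.cellcycleontology.org/ontology/rdf/uniprot',
  'http://www.cellcycleontology.org/ontology/rdf/uniprot');

-- Make the loaded data durable once the load has finished.
checkpoint;
```

The only change from the statement in the quoted message is the
function name: RDF_LOAD_RDFXML_MT instead of RDF_LOAD_RDFXML.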

I used a 2 x Quad Xeon box with 16Gb RAM and 6 cheap SATA disks. During
loading I used only 1000000 buffers (i.e. only 8Gb of RAM was used for
the database) because my Fedora 2.6.21-6.fc7xen does weird things if I
try to allocate 12Gb: the sum of the resident sizes of all processes +
free RAM + buffer RAM suddenly becomes much less than 16Gb, which
results in weird swapping; I don't know the exact reason. This memory
allocation problem seems to be specific to my box, because other Linux
boxes work fine. It happens only during intensive data loading; I use
1500000 buffers for all other activities (i.e. 12Gb out of 16Gb usually
goes to buffers).

Best Regards,

Ivan Mikhailov,
OpenLink Software.

On Tue, 2008-05-06 at 14:02 +0200, Erick Antezana wrote:
> Hello,
> 
>    I am trying to upload a very huge file (uniprot.rdf) Its size is 
> about 45GB!! (the compressed file (3.5GB) can be found in: 
> ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/rdf/). I 
> have adapted a bit the virtuoso.ini file setting the striping options 
> (about 100GB reserved). I have also played with the NumberOfBuffers, and 
> MaxCheckPointRemap as suggested in 
> http://docs.openlinksw.com/virtuoso/rdfperformancetuning.html#rdfperfloading 
> . The ISQL sentence I am using is:
> 
> DB.DBA.RDF_LOAD_RDFXML(file_to_string_output('/virtuoso/data/rdf/uniprot.rdf'),
>  
> 'http://www.cellcycleontology.org/ontology/rdf/uniprot', 
> 'http://www.cellcycleontology.org/ontology/rdf/uniprot');
> 
> however, after initiating the loading process, virtuoso freezes the OS 
> (the load average of the system rises to 40!!) then after some time I 
> get an error message:
> 
> SQL> SET AUTOCOMMIT ON;
> SQL> 
> DB.DBA.RDF_LOAD_RDFXML(file_to_string_output('/virtuoso/data/rdf/uniprot.rdf'),
>  
> 'http://www.cellcycleontology.org/ontology/rdf/uniprot', 
> 'http://www.cellcycleontology.org/ontology/rdf/uniprot');
> 
> *** Error 08S01: [Virtuoso Driver]CL065: Lost connection to server
> at line 2 of Top-Level:
> DB.DBA.RDF_LOAD_RDFXML(file_to_string_output('/virtuoso/data/rdf/uniprot.rdf'),
>  
> 'http://www.cellcycleontology.org/ontology/rdf/uniprot', 
> 'http://www.cellcycleontology.org/ontology/rdf/uniprot')
> 
> It seems that the function DB.DBA.RDF_LOAD_RDFXML_MT 
> <http://docs.openlinksw.com/virtuoso/fn_rdf_load_rdfxml_mt.html> 
> (http://docs.openlinksw.com/virtuoso/functionidx.html) could help me 
> dealing with large RDF files perhaps by loading split files 
> (file_to_string_output) as suggested in 
> http://docs.openlinksw.com/virtuoso/rdfperformancetuning.html#rdfperfloading 
> If it would be the case, how would it be recommended to split such as 
> large file? In the  
> http://docs.openlinksw.com/virtuoso/fn_file_to_string_output.html it is 
> mentioned that the initial and final segments should be defined (how 
> long should they be?). Once loaded, will virtuoso be able to cope with 
> such DB? Were some tuning  in the INI parameters be still needed/suggested?
> 
> thanks in advance for any hints,
> Erick
> 

