Andrejs,

On 8 Apr 2014, at 22:36, Andrejs Abele <andr...@sindicetech.com> wrote:

> Hi Hugh,
> Thank you for your comments.
> When the loader fails, I continue from the point where it failed.
> I want to clarify that the Virtuoso server was not crashing, only the loader.

[Hugh] So it is your client-side loader program that is crashing?

> Our custom loader uses only one thread. We submit one file at a time, each
> containing 10 000 000 triples.

[Hugh] Why do you run only one loader, i.e. rdf_loader_run()? You have multiple datasets (120+), so why not run multiple loaders to load data in parallel on a multi-core machine?

> When I increased the waiting time in our Virtuoso loader, it loads the batch
> file, but every batch takes more and more time, which we know is
> expected.
> But it still doesn't explain why Virtuoso is working so slowly.

[Hugh] The load is probably slow because you have consumed all the buffers, so the load is swapping in and out of slow disks and constantly waiting on them, hence the low CPU utilisation, as I said in my previous email. That is why I wanted to see the status() command output, which you have not provided.

> I tested the read speed of the disks that I have attached and where the
> database files are located (using the "hdparm -Tt /dev/sdb" command), and
> each of them has a read speed of 46.19 MB/sec.
> When I run the loader and observe Virtuoso's disk utilization, it doesn't go above
> 4 MB/sec.
> And in the Google developers console, I can see that disk IO and CPU utilization
> are close to zero.
> Using SSD disks is not an option, as Google cloud doesn't provide such
> a service.

[Hugh] Then you need more memory; otherwise, once the buffers are consumed, you are swapping to normal "slow" disk, which will "kill" load rates.

> In the Virtuoso logs I see these messages:
> ...
> 14:33:50 * Monitor: CPU% is low while there are large numbers of runnable threads
> 14:34:06 * Monitor: High disk read (2)
> 14:35:50 * Monitor: CPU% is low while there are large numbers of runnable threads
> 14:36:06 * Monitor: High disk read (2)
> 14:38:53 * Monitor: High disk read (2)
> 14:40:31 * Monitor: CPU% is low while there are large numbers of runnable threads
> 14:40:53 * Monitor: High disk read (2)
> 14:42:43 * Monitor: CPU% is low while there are large numbers of runnable threads
> 14:42:53 * Monitor: High disk read (2)
> 14:44:53 * Monitor: High disk read (2)
> 14:45:31 * Monitor: CPU% is low while there are large numbers of runnable threads
> 14:46:59 * Monitor: High disk read (2)
> 14:47:37 * Monitor: CPU% is low while there are large numbers of runnable threads
> 14:49:11 * Monitor: High disk read (2)
> 14:51:08 * Monitor: CPU% is low while there are large numbers of runnable threads
> 14:51:12 * Monitor: High disk read (2)
>
> ...
>
> Is it because it is cloud that Virtuoso has problems fully utilizing the
> disks?

[Hugh] It has nothing to do with cloud really; it is more about the resources available on the machines in question. You need sufficient memory for fast loading, or failing that, fast disks (i.e. SSD type) as a tradeoff. Normal disks, as available in Google cloud, are going to make things slow, but it would be the same on your own local server.

> Are there any parameters I could configure?

[Hugh] The main thing you probably need is more memory, i.e. buffers. I assume from your INI file that you have followed the Virtuoso RDF Performance Tuning guide, but have you configured things like swappiness, given that there is much swapping to disk when the buffers run out, as detailed at:

http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFPerformanceTuning#"swappiness"

> As length of mail is limited, below is link to output of LDMeter.
> http://pastebin.com/ASNSamtf

[Hugh] The LDMeter results are from the continued load, i.e. the point where it is already slow. I was hoping you would have started from scratch, so you could see the load rates from the start until it begins slowing down, as the rates in the output you provide are about 5000 bytes per sec. Note also that, as indicated in the doc, the main columns of interest are lm_n_row and lm_rows_per_s, i.e.

    SELECT lm_n_row, lm_rows_per_s FROM ld_metric;

Regards
Hugh

>
> Regards,
> Andrejs
>
>
> ------------------------------------------------------------------------------
> Put Bad Developers to Shame
> Dominate Development with Jenkins Continuous Integration
> Continuously Automate Build, Test & Deployment
> Start a new project now. Try Jenkins in the cloud.
> http://p.sf.net/sfu/13600_Cloudbees
> _______________________________________________
> Virtuoso-users mailing list
> Virtuoso-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/virtuoso-users
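
A rough sizing sketch for the "more memory, i.e. buffers" advice in this thread. It estimates candidate values for the NumberOfBuffers and MaxDirtyBuffers INI settings from the RAM on the machine. The three constants are assumptions taken as commonly quoted rules of thumb (roughly two thirds of RAM for Virtuoso, about 8700 bytes per buffer, dirty buffers at about three quarters of the total), not figures from this thread, and `suggest_buffers` is a hypothetical helper name. Check the values against the Virtuoso RDF Performance Tuning guide before using them.

```python
from typing import Tuple

# All three constants are assumptions (rules of thumb), not values from this thread.
BYTES_PER_BUFFER = 8700   # assumed: one 8K page plus per-buffer overhead
RAM_FRACTION = 2 / 3      # assumed share of system RAM given to Virtuoso
DIRTY_FRACTION = 3 / 4    # assumed MaxDirtyBuffers / NumberOfBuffers ratio


def suggest_buffers(ram_gib: float) -> Tuple[int, int]:
    """Return (NumberOfBuffers, MaxDirtyBuffers) estimates for ram_gib GiB of RAM."""
    if ram_gib <= 0:
        raise ValueError("ram_gib must be positive")
    ram_bytes = int(ram_gib * 1024 ** 3)
    # Memory budget for buffers, divided by the assumed cost of one buffer.
    number_of_buffers = int(ram_bytes * RAM_FRACTION) // BYTES_PER_BUFFER
    max_dirty_buffers = int(number_of_buffers * DIRTY_FRACTION)
    return number_of_buffers, max_dirty_buffers


if __name__ == "__main__":
    for gib in (8, 16, 32):
        buffers, dirty = suggest_buffers(gib)
        print(f"{gib} GiB RAM: NumberOfBuffers = {buffers}, MaxDirtyBuffers = {dirty}")
```

If the working set of the load is larger than what these buffers can hold, the load will still fall back to disk reads, so the estimate only indicates whether adding memory is likely to help.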