Re: [Virtuoso-users] problems loading Virtuos on Google cloud

Hugh Williams Tue, 08 Apr 2014 20:41:16 -0700

Andrejs,

On 8 Apr 2014, at 22:36, Andrejs Abele <andr...@sindicetech.com> wrote:


> Hi Hugh,
> Thank you for your comments.
> When the loader fails, I continue from the point it failed.
> I want to clarify that the Virtuoso server was not crashing, only the loader.

[Hugh] So it is your client-side loader program that is crashing?

> Our custom loader uses only one thread. We submit one file at a time, each
> containing 10 000 000 triples.

[Hugh] Why do you run only one loader, i.e. rdf_loader_run()? You have multiple
datasets (120+), so why not run multiple loaders in parallel to load the data
on a multi-core machine?
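
As a rough sketch of what running several loaders could look like: once the
files are registered in the load list, each isql session can call
rdf_loader_run() on its own. The Python below only builds the command lines.
The port 1111, the dba/dba credentials, and the "cores / 2.5" starting point
for the loader count are assumptions for illustration, not values taken from
this thread.

```python
import os

def suggested_loader_count(cores=None):
    """Rule-of-thumb loader count (assumption: roughly cores / 2.5)."""
    if cores is None:
        cores = os.cpu_count() or 1
    return max(1, int(cores / 2.5))

def loader_commands(n_loaders, port=1111, user="dba", password="dba"):
    """Build one isql command line per parallel rdf_loader_run() session.

    port, user and password are placeholder defaults (assumptions).
    """
    if n_loaders < 1:
        raise ValueError("need at least one loader")
    cmd = ["isql", str(port), user, password, "exec=rdf_loader_run();"]
    return [list(cmd) for _ in range(n_loaders)]

if __name__ == "__main__":
    # Print the commands; each could be started with subprocess.Popen.
    for command in loader_commands(suggested_loader_count()):
        print(" ".join(command))
```

Each printed command would be started as its own background process, so the
loaders pick up files from the load list concurrently.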

> When I increased the waiting time in our Virtuoso loader, it loaded the batch
> file, but every batch takes more and more time, which we know is
> expected.
> But that still doesn't explain why Virtuoso is working so slowly.

[Hugh] The load is probably slow because you have consumed all the buffers, so
the load is swapping in and out of slow disks and constantly waiting on that
slow process, hence the low CPU utilisation, as I said in my previous email.
That is why I wanted to see the status() command output, which you have not
provided.

> I tested the read speed of the disks that I have attached and where the
> database files are located (using the "hdparm -Tt /dev/sdb" command); each
> of them has a read speed of 46.19 MB/sec.
> When I run the loader and observe Virtuoso disk utilization, it doesn't go
> above 4 MB/sec.
> And in the Google developers console, I can see that disk IO and CPU
> utilization are close to zero.
> Using SSD disks is not an option, as Google cloud doesn't provide such a
> service.

[Hugh] Then you need more memory; otherwise, once the buffers are consumed, you
are swapping to normal "slow" disk, which will "kill" load rates.
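
To make "more memory, i.e. buffers" concrete, here is a minimal sketch of a
sizing calculation. It assumes the commonly quoted rule of thumb of giving
about two thirds of RAM to buffers at roughly 8000 bytes each, with
MaxDirtyBuffers at about three quarters of NumberOfBuffers. Those ratios are
assumptions to check against the tuning guide, and the 12 GB figure is only
an example, not the size of the machine in this thread.

```python
def buffer_settings(ram_bytes, bytes_per_buffer=8000):
    """Estimate NumberOfBuffers and MaxDirtyBuffers for a given RAM size.

    Assumed rule of thumb: 2/3 of RAM for buffers, 3/4 of those dirty.
    Integer arithmetic keeps the result exact.
    """
    if ram_bytes <= 0:
        raise ValueError("ram_bytes must be positive")
    number_of_buffers = ram_bytes * 2 // 3 // bytes_per_buffer
    max_dirty_buffers = number_of_buffers * 3 // 4
    return number_of_buffers, max_dirty_buffers

if __name__ == "__main__":
    buffers, dirty = buffer_settings(12_000_000_000)  # example: 12 GB of RAM
    print(f"NumberOfBuffers = {buffers}")
    print(f"MaxDirtyBuffers = {dirty}")
```

The two values would go in the [Parameters] section of the INI file.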

> In the Virtuoso logs I see these messages:
> ...
> 14:33:50 * Monitor: CPU% is low while there are large numbers of runnable 
> threads
> 14:34:06 * Monitor: High disk read (2)
> 14:35:50 * Monitor: CPU% is low while there are large numbers of runnable 
> threads
> 14:36:06 * Monitor: High disk read (2)
> 14:38:53 * Monitor: High disk read (2)
> 14:40:31 * Monitor: CPU% is low while there are large numbers of runnable 
> threads
> 14:40:53 * Monitor: High disk read (2)
> 14:42:43 * Monitor: CPU% is low while there are large numbers of runnable 
> threads
> 14:42:53 * Monitor: High disk read (2)
> 14:44:53 * Monitor: High disk read (2)
> 14:45:31 * Monitor: CPU% is low while there are large numbers of runnable 
> threads
> 14:46:59 * Monitor: High disk read (2)
> 14:47:37 * Monitor: CPU% is low while there are large numbers of runnable 
> threads
> 14:49:11 * Monitor: High disk read (2)
> 14:51:08 * Monitor: CPU% is low while there are large numbers of runnable 
> threads
> 14:51:12 * Monitor: High disk read (2)
> 
> ...
> 
> Is it because it is cloud that Virtuoso has problems fully utilizing the
> disks?

[Hugh] It has nothing to do with cloud really; it is more about the resources
available on the machines in question. You need sufficient memory for fast
loading and, failing that, fast disks, i.e. SSD type, as a tradeoff. Normal
disks, as available in Google cloud, are going to make things slow, but it
would be the same if it were your own server locally.

> Are there any parameters I could configure? 

[Hugh] The main thing you probably need is more memory, i.e. buffers. I assume
you have followed the Virtuoso RDF Performance Tuning guide, as seen in your
INI file, but have you configured things like swappiness? There is much
swapping to disk when the buffers run out, as detailed at:

        
http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFPerformanceTuning#"swappiness";
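
A quick way to check the current swappiness on a Linux host is sketched below.
The threshold of 10 is an assumed target for a database server, not a value
stated in this thread; the path is the standard Linux location for the
setting.

```python
from pathlib import Path

def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the kernel's vm.swappiness value as an integer."""
    return int(Path(path).read_text().strip())

def swappiness_too_high(value, threshold=10):
    """True if swappiness exceeds the assumed target threshold."""
    return value > threshold

if __name__ == "__main__":
    current = read_swappiness()
    if swappiness_too_high(current):
        print(f"vm.swappiness is {current}; consider lowering it")
    else:
        print(f"vm.swappiness is {current}")
```

Lowering the value is done with sysctl (vm.swappiness) and needs root.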

> As the length of mail is limited, below is a link to the output of LDMeter.
> http://pastebin.com/ASNSamtf

[Hugh] The LDMeter results are from the continued load, i.e. the point where it
is slow. I was hoping you would have started from scratch so you could see the
load rates from the start until it starts slowing down, as the rates in the
output you provided are about 5000 bytes per sec. Note also that, as indicated
in the doc, the main columns of interest are lm_n_row and lm_rows_per_s, i.e.:

        SELECT  lm_n_row,  lm_rows_per_s FROM ld_metric;
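
If you sample that query periodically from a fresh load, a small script can
show where the rate drops. The sketch below assumes lm_n_row is a cumulative
row count and that you have collected (seconds since start, lm_n_row) pairs
yourself; the sample numbers are made up for illustration and the 50% drop
threshold is an arbitrary choice.

```python
def interval_rates(samples):
    """Rows per second between consecutive (seconds, cumulative_rows) samples."""
    rates = []
    for (t0, n0), (t1, n1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("samples must be ordered by increasing time")
        rates.append((n1 - n0) / (t1 - t0))
    return rates

def first_slowdown(rates, drop=0.5):
    """Index of the first interval below drop * peak rate so far, else None."""
    peak = 0.0
    for i, rate in enumerate(rates):
        if peak > 0 and rate < drop * peak:
            return i
        peak = max(peak, rate)
    return None

if __name__ == "__main__":
    # Made-up samples: (seconds since load start, lm_n_row)
    samples = [(0, 0), (60, 600000), (120, 1200000), (180, 1320000)]
    rates = interval_rates(samples)
    print(rates, first_slowdown(rates))
```

Comparing the row count at the slowdown point with the configured number of
buffers would show whether the drop coincides with the buffers filling up.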

Regards
Hugh

> 
> Regards,
> Andrejs


------------------------------------------------------------------------------
Put Bad Developers to Shame
Dominate Development with Jenkins Continuous Integration
Continuously Automate Build, Test & Deployment 
Start a new project now. Try Jenkins in the cloud.
http://p.sf.net/sfu/13600_Cloudbees
_______________________________________________
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users
