Re: [GRASS-user] v.net parallelisation issues

2015-02-13 Thread Moritz Lennert

On 13/02/15 08:39, Mark Wynter wrote:

I’ve encountered a bottleneck somewhere with v.net when
scaling out with GNU Parallel… not sure if it’s an underlying issue with
v.net or the way I’m calling the batch jobs?

I’ve got 32 CPUs and commensurate RAM.   What I’m observing is v.net
CPU utilisation dropping off in accordance with the number of
jobs running.



And this means that you don't get any gain in duration? Could it be 
that as you divide into more batches, each batch is smaller and thus 
needs less CPU?


Moritz

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-user] v.net parallelisation issues

2015-02-13 Thread Moritz Lennert

On 13/02/15 13:40, Mark Wynter wrote:

Hi Moritz

With the second approach (the code I shared in my post), I have 3500 discrete 
jobs, and I set the number of batches equal to the number of CPUs.  Each batch 
job is dispatched to a CPU, where it then pulls from a queue of job IDs that 
are processed serially within each batch job.  The thinking behind this 
approach was to allocate jobs across available CPUs as separate batch processes.

The other, and preferred, approach is to launch one batch job, and then GNU 
Parallel draws down from the list of 3500 jobs, assigning jobs to worker 
functions as CPUs become available.  I’ve had much success with this code 
pattern when parallelising PostGIS queries, etc.
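The second (preferred) pattern — one batch job whose workers draw IDs off a shared list — can be sketched roughly as below. This is only an illustration: `run_job` is a hypothetical stand-in for the real per-ID GRASS work (a v.net / v.net.distance invocation in the actual script), and `xargs -P` plays the role GNU Parallel plays in the real run, since the queue-draining behaviour is the same.

```shell
#!/usr/bin/env bash
# Sketch of the "one batch job, workers pull from a queue" pattern.
# run_job is a hypothetical stand-in for the real per-job-id work.
run_job() {
    echo "processed job $1"
}
export -f run_job      # bash-specific: expose the function to child shells

# 3500 job ids in the real run; 8 here.  -P 4 keeps 4 workers busy,
# handing out the next id as soon as a worker frees up -- the same
# draw-down behaviour as GNU Parallel with -j 4.
seq 1 8 | xargs -P 4 -I{} bash -c 'run_job "$@"' _ {}
```

With GNU Parallel installed, the last line would instead be something like `seq 1 3500 | parallel -j "$(nproc)" run_job {}`.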

As you have suspected, I get no benefit from additional CPUs.


Are you sure the problem is CPU-bound?



Unfortunately I don’t have time on my side, and parallelisation is critical.  A 
fallback is to spin up a cluster of 16 x 2-CPU machines, pre-allocate 
job IDs to machines, and then write the results back to the master node - but 
this is not ideal, and a path I am reluctant to go down.

Do you know anyone who may have attempted to parallelise v.net?


No. Personally I don't have any experience with this.
Are you speaking specifically about v.net.distance here?



I guess the most important question right now is: is it possible to do poor 
man’s parallelisation with v.net?   Anyone?


The one who knows the insides of these modules best is Markus Metz.

Moritz
___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-user] v.net parallelisation issues

2015-02-13 Thread Mark Wynter
Hi Moritz, Stefan (and Markus, if you’re around?)

Every day is a new day.

Not IOPS, not memory, not CPU…   Hmm, how about a reboot...

 
 As you have suspected, I get no benefit from additional CPUs.
 
 Are you sure the problem is CPU-bound ?


I started by testing v.net (the maintenance module) in parallel - and observed 
pretty much linear scaling in performance beyond 2 parallel jobs. 

So I then proceeded to test v.net.distance… and sure enough, it is now scaling. 
The CPU performance profiles look really different to yesterday, when it wasn’t 
scaling.  In top, all the v.net CPUs are humming at 50%, with the other 50% used 
by pg processes.

Below are some iostat profiles of the different parallel tests.

So v.net.distance does parallelise nicely.   The neatest way is to launch GNU 
Parallel from within a single grass_batch_job.  I will write this up on the 
GRASS wiki in the coming two weeks.

Mark
:-)





---V.NET ---

Single job
v.net
iostat 1
TOTAL SCRIPT TIME: 284

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          15.92    0.00   35.32    0.50    0.00   48.26

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
xvde             18.00         8.00       744.00          8        744
xvdj              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          20.00    0.00   34.00    0.50    0.00   45.50

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
xvde              9.90         0.00       633.66          0        640
xvdj              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          22.50    0.00   35.50    0.50    0.00   41.50

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
xvde              9.00         0.00       592.00          0        592
xvdj              0.00         0.00         0.00          0          0



Two parallel jobs
v.net
iostat 1
TOTAL SCRIPT TIME: 397

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          32.16    0.00   67.84    0.00    0.00    0.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
xvde             19.00         0.00      1024.00          0       1024
xvdj              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          39.50    0.00   60.50    0.00    0.00    0.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
xvde             19.00         0.00       912.00          0        912
xvdj              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          35.50    0.00   64.50    0.00    0.00    0.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
xvde             16.00         0.00      1024.00          0       1024
xvdj              0.00         0.00         0.00          0          0


Four parallel jobs
v.net
iostat 1
TOTAL SCRIPT TIME: 388

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          32.92    0.00   67.08    0.00    0.00    0.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
xvde             31.00         0.00      1952.00          0       1952
xvdj              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          33.83    0.00   66.17    0.00    0.00    0.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
xvde             35.00         0.00      2016.00          0       2016
xvdj              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          32.92    0.00   67.08    0.00    0.00    0.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
xvde             61.00         0.00      4032.00          0       4032
xvdj              0.00         0.00         0.00          0          0

---V.NET.DISTANCE ---
Single job
v.net.distance
iostat 1
TOTAL SCRIPT TIME: 88

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.00    0.00   21.00    0.00    0.00   75.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
xvde              0.00         0.00         0.00          0          0
xvdj              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.75    0.00   21.75    0.00    0.00   70.50

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
xvde              0.00         0.00         0.00          0          0
xvdj              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.27    0.00   22.56    0.00    0.00   71.18

Device:            tps   Blk_read/s   Blk_wrtn/s   

Re: [GRASS-user] v.net parallelisation issues

2015-02-13 Thread Mark Wynter
Hi Moritz

With the second approach (the code I shared in my post), I have 3500 discrete 
jobs, and I set the number of batches equal to the number of CPUs.  Each batch 
job is dispatched to a CPU, where it then pulls from a queue of job IDs that 
are processed serially within each batch job.  The thinking behind this 
approach was to allocate jobs across available CPUs as separate batch processes.

The other, and preferred, approach is to launch one batch job, and then GNU 
Parallel draws down from the list of 3500 jobs, assigning jobs to worker 
functions as CPUs become available.  I’ve had much success with this code 
pattern when parallelising PostGIS queries, etc.

As you have suspected, I get no benefit from additional CPUs.  

Unfortunately I don’t have time on my side, and parallelisation is critical.  A 
fallback is to spin up a cluster of 16 x 2-CPU machines, pre-allocate 
job IDs to machines, and then write the results back to the master node - but 
this is not ideal, and a path I am reluctant to go down.

Do you know anyone who may have attempted to parallelise v.net?

I guess the most important question right now is: is it possible to do poor 
man’s parallelisation with v.net?   Anyone?

Mark
 
On 13 Feb 2015, at 7:56 pm, Moritz Lennert mlenn...@club.worldonline.be wrote:

 On 13/02/15 08:39, Mark Wynter wrote:
 I’ve encountered a bottleneck somewhere with v.net when
 scaling out with GNU Parallel… not sure if it’s an underlying issue with
 v.net or the way I’m calling the batch jobs?
 
 I’ve got 32 CPUs and commensurate RAM.   What I’m observing is v.net
 CPU utilisation dropping off in accordance with the number of
 jobs running.
 
 
 And this means that you don't get any gain in duration? Could it be that as 
 you divide into more batches, each batch is smaller and thus needs less CPU?
 
 Moritz
 

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-user] v.net parallelisation issues

2015-02-13 Thread Mark Wynter
Getting closer to the bottom of this.

On AWS, problems occur once I scale beyond 16 CPUs.

c3.4xlarge (16 CPUs, 30GB memory): v.net scales linearly.
c3.8xlarge (32 CPUs, 60GB memory): v.net doesn't scale AT ALL - the same as 
having a single CPU.

I don’t believe it’s a pg issue, as I run loads of 32-CPU parallel scripts with 
PostgreSQL as a backend.
Also, I noticed the same drop-off issue with 32 CPUs when I had SQLite as the 
db driver in GRASS.

For now, I can work with 16 CPUs.

Time to enlist the help of a sysadmin to find the bottleneck.

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


[GRASS-user] v.net parallelisation issues

2015-02-12 Thread Mark Wynter
I’ve encountered a bottleneck somewhere with v.net when scaling out with GNU 
Parallel… not sure if it’s an underlying issue with v.net or the way I’m calling 
the batch jobs?

I’ve got 32 CPUs and commensurate RAM.   What I’m observing is v.net CPU 
utilisation dropping off in accordance with the number of jobs running.

I’ve tried launching a single batch job with a single mapset, as well as multiple 
batch jobs each with their own mapset (and database).  I’ve tried both PG and 
SQLite backends.   Same issue.

The script at the bottom describes the approach of launching multiple batch 
jobs, each with their own mapset.    Executing a single batch job, and then 
launching parallel jobs within the batch script, is much cleaner code - but the 
results are no different.

I feel I’m so close, yet so far at such a critical stage of project delivery.

Hope someone can help

Kind regards
Mark

RESULTS

ONE JOB
TOTAL SCRIPT TIME: 70

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
31313 root  20   0 28876 4080 1284 S 76.5  0.0   0:20.25 sqlite
31293 root  20   0  276m 134m 8320 S 68.5  0.2   0:20.22 v.net.distance
—

TWO JOBS
TOTAL SCRIPT TIME: 96

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
21391 root  20   0 28876 4080 1284 R 53.0  0.0   0:01.90 sqlite
21392 root  20   0 28876 4080 1284 R 52.6  0.0   0:01.86 sqlite
21380 root  20   0  276m 128m 8320 R 49.3  0.2   0:04.02 v.net.distance
21381 root  20   0  276m 128m 8320 S 48.3  0.2   0:03.97 v.net.distance
—

FOUR JOBS
TOTAL SCRIPT TIME: 187

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6953 mark  20   0  180m 100m 9520 S 63.6  0.2   1:47.39 x2goagent
23025 root  20   0 28876 4080 1284 S 21.5  0.0   0:02.03 sqlite
23026 root  20   0 28876 4080 1284 R 19.9  0.0   0:02.08 sqlite
23027 root  20   0 28876 4080 1284 S 19.5  0.0   0:01.87 sqlite
23028 root  20   0 28876 4080 1284 S 19.5  0.0   0:01.84 sqlite
23014 root  20   0  276m 128m 8320 R 18.5  0.2   0:04.06 v.net.distance
23012 root  20   0  276m 128m 8320 R 17.5  0.2   0:03.91 v.net.distance
23011 root  20   0  276m 128m 8320 S 16.9  0.2   0:04.13 v.net.distance
23015 root  20   0  276m 128m 8320 R 16.9  0.2   0:03.80 v.net.distance
—

EIGHT JOBS
TOTAL SCRIPT TIME: 373

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
27157 root  20   0 28876 4088 1284 S 19.5  0.0   0:42.39 sqlite
27162 root  20   0 28876 4088 1284 R 16.9  0.0   0:40.60 sqlite
 6953 mark  20   0  181m 101m 9520 S 16.5  0.2   2:18.86 x2goagent
27154 root  20   0 28876 4088 1284 S 16.5  0.0   0:39.38 sqlite
27153 root  20   0 28876 4088 1284 S 16.2  0.0   0:35.60 sqlite
27156 root  20   0 28876 4088 1284 R 16.2  0.0   0:38.18 sqlite
27161 root  20   0 28876 4088 1284 S 15.9  0.0   0:40.96 sqlite
27155 root  20   0 28876 4088 1284 S 15.6  0.0   0:38.41 sqlite
27104 root  20   0  284m 139m 8332 S 14.9  0.2   0:39.94 v.net.distance
27158 root  20   0 28876 4088 1284 R 14.6  0.0   0:37.49 sqlite
27095 root  20   0  284m 138m 8332 S 14.2  0.2   0:34.48 v.net.distance
27099 root  20   0  284m 138m 8332 S 14.2  0.2   0:38.27 v.net.distance
27101 root  20   0  284m 139m 8332 R 14.2  0.2   0:38.80 v.net.distance
27105 root  20   0  284m 139m 8332 R 14.2  0.2   0:37.95 v.net.distance
27093 root  20   0  284m 138m 8332 R 13.9  0.2   0:32.64 v.net.distance
27102 root  20   0  284m 140m 8332 R 13.6  0.2   0:40.90 v.net.distance
27094 root  20   0  284m 138m 8332 R 13.2  0.2   0:35.78 v.net.distance
—
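The TOTAL SCRIPT TIME figures above come from timing the same workload at different levels of parallelism. A minimal harness for that kind of scaling test might look like the sketch below; `worker` is a short sleep standing in for a real v.net.distance batch, and `xargs -P` stands in for GNU Parallel.

```shell
#!/usr/bin/env bash
# Time a fixed set of jobs at increasing parallelism.  With perfect
# scaling, wall time should halve each time the worker count doubles;
# a flat line points at a shared bottleneck instead.
worker() { sleep 0.1; }    # stand-in for one v.net.distance job
export -f worker           # bash-specific: expose to child shells

for n in 1 2 4; do
    start=$(date +%s%N)    # GNU date: nanoseconds
    seq 1 4 | xargs -P "$n" -I{} bash -c 'worker' _
    end=$(date +%s%N)
    echo "jobs=4 workers=$n time_ms=$(( (end - start) / 1000000 ))"
done
```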


   WORKER FUNCTION# 

Re: [GRASS-user] v.net parallelisation issues

2015-02-12 Thread Blumentrath, Stefan
Hi Mark,

Don't know if that is of any help, but:
Have you tried the igraph package for very customized / sophisticated network 
analysis (http://igraph.org/redirect.html)?
It plays nicely with R, Python, and C, and therefore also with GRASS.

What I did (now in several cases) is use v.net and v.db.select / v.to.db 
(from within R) to collect attributes of nodes and edges into R 
objects (Python arrays would work equally well, I guess) and then continue 
working without the geometries. After all operations are done, the results are 
written back to GRASS / SQLite. For parallelization I used the doMC package in 
R, and I was quite satisfied with the performance.

Kind regards,
Stefan

P.S.: I also used that kind of approach in the r.connectivity.network addon 
(GRASS 6).



From: grass-user-boun...@lists.osgeo.org 
[mailto:grass-user-boun...@lists.osgeo.org] On Behalf Of Mark Wynter
Sent: 13. februar 2015 08:40
To: grass-user@lists.osgeo.org
Subject: [GRASS-user] v.net parallelisation issues

   WORKER FUNCTION#

# CREATE MAPSETS AND BASH SCRIPTS FOR EACH CPU
fn_worker (){

###
# copy mapset
###
cp -R /var/tmp/jtw/PERMANENT
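The quoted worker function is cut off by the archive. For completeness, here is one hypothetical shape the rest of such a setup could take — every name and path below is illustrative rather than taken from the original script, and the actual GRASS batch invocation is reduced to an echo so the skeleton runs anywhere. The key idea is simply that each parallel job gets its own private mapset copied from PERMANENT, so concurrent GRASS processes never write into the same mapset.

```shell
#!/usr/bin/env bash
# Hypothetical per-worker mapset setup (illustrative names/paths only).
GRASSDATA=$(mktemp -d)              # stand-in for the real GRASS database dir
mkdir -p "$GRASSDATA/PERMANENT"     # stand-in for the real PERMANENT mapset
export GRASSDATA

fn_worker() {
    local id=$1
    local mapset="$GRASSDATA/worker_$id"
    cp -R "$GRASSDATA/PERMANENT" "$mapset"   # private mapset per worker
    # the real script would now run its GRASS batch job against $mapset,
    # e.g. via the GRASS_BATCH_JOB environment variable
    echo "worker $id using $mapset"
}
export -f fn_worker

# fan the workers out, one per CPU slot
seq 1 4 | xargs -P 4 -I{} bash -c 'fn_worker "$@"' _ {}
```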