Hello Remi,

I have never used MPI before. I can run the LSMS smoothing from the CLI. The 
system has 2 CPU sockets with 14 cores per socket (2 threads per core, 56 
logical CPUs):

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                56
On-line CPU(s) list:   0-55
Thread(s) per core:    2
Core(s) per socket:    14
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
Stepping:              1
CPU MHz:               1497.888
CPU max MHz:           2600.0000
CPU min MHz:           1200.0000
BogoMIPS:              5207.83
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              35840K
NUMA node0 CPU(s):     0-13,28-41
NUMA node1 CPU(s):     14-27,42-55

So I launch something like:

mpirun -n 4 --bind-to socket otbcli_MeanShiftSmoothing -in maur_rgb.png 
-fout smooth.tif -foutpos position.tif -spatialr 16 -ranger 16 -thres 0.1 
-maxiter 100

So if I understand correctly, this launches 4 copies of the application, but 
how do they know which instance is working on which part of the job? Is that 
just the magic of MPI?
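
My rough guess at the mechanism, sketched with mpi4py just to illustrate the 
general pattern (not OTB's actual internals): every copy runs the same 
program, asks MPI for its own rank and the total process count, and derives 
its share of the work from those two numbers.

    from mpi4py import MPI   # assumes mpi4py is installed alongside OpenMPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()   # this copy's id, 0 .. size-1
    size = comm.Get_size()   # how many copies mpirun started

    n_rows = 19763           # e.g. the image height
    rows_per_rank = (n_rows + size - 1) // size
    first = rank * rows_per_rank
    last = min(first + rows_per_rank, n_rows)

    # each copy would work only on its own stripe of rows
    print("rank %d of %d handles rows [%d, %d)" % (rank, size, first, last))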

-Steve

On Thursday, May 18, 2017 at 12:07:52 PM UTC-4, remicres wrote:
>
> Hello Stephen,
> I am really interested in your results. 
> A few years ago I failed to get good benchmarks of OTB apps (that is, 
> good scalability with the number of CPUs used) on the same kind of machine 
> as yours. The speedup collapsed around 10-30 CPUs (depending on the app). I 
> suspected that missing fine tuning was the cause, and I did not have the 
> time to persevere. This poor speedup might be related to thread placement 
> and cache issues: the current framework scales well across CPUs when 
> processing images in a shared-memory context, particularly when the threads 
> are on the same CPU socket. Depending on the algorithm used, I suspect one 
> might also need to fine-tune the environment settings. 
> Could you provide the number of sockets of your machine (with the number 
> of CPUs on each one)?
>
> If this machine has many sockets, one quick workaround to get a good 
> speedup could be to use the MPI support and force the binding of the MPI 
> processes to the sockets (e.g. with OpenMPI: "mpirun -n <nb of sockets of 
> your machine> --bind-to socket ..."). However, I am not sure how to use it 
> from Python; a rough guess is sketched below.
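>
> (Purely a guess on the Python side, assuming an MPI-enabled OTB build and 
> that the Python script can simply be started by mpirun; the script name is 
> made up:)
>
>     mpirun -n <nb of sockets of your machine> --bind-to socket python run_smoothing.py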
>
> Keep us updated!
>
> Rémi
>
> On Wednesday, May 17, 2017 at 21:34:48 UTC+2, Stephen Woodbridge wrote:
>>
>> I started watching this with htop and all the CPUs are getting action. 
>> There is a pattern where the number of threads spikes from about 162 up to 
>> 215 and the number of running threads spikes to about 50-ish for a few 
>> seconds, then the number of running threads drops to 2 for 5-10 seconds, 
>> and the pattern repeats. I think the parent thread is spinning up a bunch 
>> of workers, they finish, then the parent thread cycles through each of the 
>> finished workers collecting the results and presumably writes them to disk 
>> or something. If it is writing to disk, there could be a huge potential 
>> performance improvement in writing the output to memory, if enough memory 
>> is available (which is clearly the case on this machine), and then flushing 
>> the memory to disk. The current process is only using 3 GB of memory when 
>> it has 100 GB available to it and the system has 120 GB.
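>>
>> To make the "write to memory, then flush" idea concrete, something like 
>> this (assuming /dev/shm is a RAM-backed tmpfs on this box; the paths are 
>> just examples):
>>
>>     import shutil
>>
>>     # hypothetical: the smoothing was run with fout='/dev/shm/smooth.tif'
>>     # and foutpos='/dev/shm/position.tif'; flush the finished files to disk
>>     shutil.move('/dev/shm/smooth.tif', 'smooth.tif')
>>     shutil.move('/dev/shm/position.tif', 'position.tif')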
>>
>> On Wednesday, May 17, 2017 at 12:13:04 PM UTC-4, Stephen Woodbridge wrote:
>>>
>>> Hi, first I want to say the LSMS segmentation is very cool and works 
>>> nicely. I recently got access to a server with 56 cores and 128 GB of 
>>> memory, but I can't seem to get it to use more than 10-15 cores. I'm 
>>> running the smoothing on an image approx. 20000x20000 in size. The image 
>>> is a GDAL VRT file that combines 8 DOQQ images into a mosaic. It has 4 
>>> bands (R, G, B, IR), each with Mask Flags: PER_DATASET (see below). I'm 
>>> running this from a Python script like:
>>>
>>> import otbApplication
>>>
>>> def smoothing(fin, fout, foutpos, spatialr, ranger, rangeramp,
>>>               thres, maxiter, ram):
>>>     app = otbApplication.Registry.CreateApplication('MeanShiftSmoothing')
>>>     app.SetParameterString('in', fin)
>>>     app.SetParameterString('fout', fout)
>>>     app.SetParameterString('foutpos', foutpos)
>>>     app.SetParameterInt('spatialr', spatialr)
>>>     app.SetParameterFloat('ranger', ranger)
>>>     app.SetParameterFloat('rangeramp', rangeramp)
>>>     app.SetParameterFloat('thres', thres)
>>>     app.SetParameterInt('maxiter', maxiter)
>>>     app.SetParameterInt('ram', ram)        # available RAM in MB
>>>     app.SetParameterInt('modesearch', 0)   # mode search disabled
>>>     app.ExecuteAndWriteOutput()
>>>
>>> Where:
>>> spatialr: 24
>>> ranger: 36
>>> rangeramp: 0
>>> thres: 0.1
>>> maxiter: 100
>>> ram: 102400
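>>>
>>> i.e. the call looks something like this (output filenames just for 
>>> illustration):
>>>
>>>     smoothing('tmp-23081-areaofinterest.vrt', 'smooth.tif', 'position.tif',
>>>               spatialr=24, ranger=36, rangeramp=0, thres=0.1,
>>>               maxiter=100, ram=102400)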
>>>
>>> Any thoughts on how I can get this to utilize more of the processing 
>>> power of this machine?
>>>
>>> -Steve
>>>
>>> woodbri@optane28:/u/ror/buildings/tmp$ otbcli_ReadImageInfo -in tmp-23081-areaofinterest.vrt
>>> 2017 May 17 15:36:04  :  Application.logger  (INFO)
>>> Image general information:
>>>         Number of bands : 4
>>>         No data flags : Not found
>>>         Start index :  [0,0]
>>>         Size :  [19933,19763]
>>>         Origin :  [-118.442,34.0035]
>>>         Spacing :  [9.83578e-06,-9.83578e-06]
>>>         Estimated ground spacing (in meters): [0.90856,1.09369]
>>>
>>> Image acquisition information:
>>>         Sensor :
>>>         Image identification number:
>>>         Image projection : GEOGCS["WGS 84",
>>>     DATUM["WGS_1984",
>>>         SPHEROID["WGS 84",6378137,298.257223563,
>>>             AUTHORITY["EPSG","7030"]],
>>>         AUTHORITY["EPSG","6326"]],
>>>     PRIMEM["Greenwich",0],
>>>     UNIT["degree",0.0174532925199433],
>>>     AUTHORITY["EPSG","4326"]]
>>>
>>> Image default RGB composition:
>>>         [R, G, B] = [0,1,2]
>>>
>>> Ground control points information:
>>>         Number of GCPs = 0
>>>         GCPs projection =
>>>
>>> Output parameters value:
>>> indexx: 0
>>> indexy: 0
>>> sizex: 19933
>>> sizey: 19763
>>> spacingx: 9.835776837e-06
>>> spacingy: -9.835776837e-06
>>> originx: -118.4418488
>>> originy: 34.00345612
>>> estimatedgroundspacingx: 0.9085595012
>>> estimatedgroundspacingy: 1.093693733
>>> numberbands: 4
>>> sensor:
>>> id:
>>> time:
>>> ullat: 0
>>> ullon: 0
>>> urlat: 0
>>> urlon: 0
>>> lrlat: 0
>>> lrlon: 0
>>> lllat: 0
>>> lllon: 0
>>> town:
>>> country:
>>> rgb.r: 0
>>> rgb.g: 1
>>> rgb.b: 2
>>> projectionref: GEOGCS["WGS 84",
>>>     DATUM["WGS_1984",
>>>         SPHEROID["WGS 84",6378137,298.257223563,
>>>             AUTHORITY["EPSG","7030"]],
>>>         AUTHORITY["EPSG","6326"]],
>>>     PRIMEM["Greenwich",0],
>>>     UNIT["degree",0.0174532925199433],
>>>     AUTHORITY["EPSG","4326"]]
>>> keyword:
>>> gcp.count: 0
>>> gcp.proj:
>>> gcp.ids:
>>> gcp.info:
>>> gcp.imcoord:
>>> gcp.geocoord:
>>>
>>> woodbri@optane28:/u/ror/buildings/tmp$ gdalinfo tmp-23081-areaofinterest.vrt
>>> Driver: VRT/Virtual Raster
>>> Files: tmp-23081-areaofinterest.vrt
>>>        /u/ror/buildings/tmp/tmp-23081-areaofinterest.vrt.vrt
>>> Size is 19933, 19763
>>> Coordinate System is:
>>> GEOGCS["WGS 84",
>>>     DATUM["WGS_1984",
>>>         SPHEROID["WGS 84",6378137,298.257223563,
>>>             AUTHORITY["EPSG","7030"]],
>>>         AUTHORITY["EPSG","6326"]],
>>>     PRIMEM["Greenwich",0],
>>>     UNIT["degree",0.0174532925199433],
>>>     AUTHORITY["EPSG","4326"]]
>>> Origin = (-118.441851318576212,34.003461706049677)
>>> Pixel Size = (0.000009835776490,-0.000009835776490)
>>> Corner Coordinates:
>>> Upper Left  (-118.4418513,  34.0034617) (118d26'30.66"W, 34d 0'12.46"N)
>>> Lower Left  (-118.4418513,  33.8090773) (118d26'30.66"W, 33d48'32.68"N)
>>> Upper Right (-118.2457948,  34.0034617) (118d14'44.86"W, 34d 0'12.46"N)
>>> Lower Right (-118.2457948,  33.8090773) (118d14'44.86"W, 33d48'32.68"N)
>>> Center      (-118.3438231,  33.9062695) (118d20'37.76"W, 33d54'22.57"N)
>>> Band 1 Block=128x128 Type=Byte, ColorInterp=Red
>>>   Mask Flags: PER_DATASET
>>> Band 2 Block=128x128 Type=Byte, ColorInterp=Green
>>>   Mask Flags: PER_DATASET
>>> Band 3 Block=128x128 Type=Byte, ColorInterp=Blue
>>>   Mask Flags: PER_DATASET
>>> Band 4 Block=128x128 Type=Byte, ColorInterp=Gray
>>>   Mask Flags: PER_DATASET
>>>
>>>
>>>
>>>
