Hello Stephen,

To let the MPI magic happen, you must compile OTB with the option
OTB_USE_MPI=ON.
You may also set the option OTB_USE_SPTW=ON, which enables writing
.tif files in parallel.
After this, you set the environment variable
ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS to 14 (the number of physical cores
per socket), then you deploy the app over 4 MPI processes, each one bound
to a socket:
mpirun -n 4 --bind-to socket otbcli_MeanShiftSmoothing -...
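
For completeness, the whole sequence might look like this (just a sketch:
OpenMPI is assumed, and the paths, input file and application parameters
below are placeholders to adapt to your setup):

# configure and build OTB with MPI support and parallel .tif writing
cmake -DOTB_USE_MPI=ON -DOTB_USE_SPTW=ON /path/to/OTB/source
make -j

# give each MPI process a thread pool matching one socket's physical cores
export ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=14

# deploy the app over 4 MPI processes, bound to sockets (OpenMPI syntax)
mpirun -n 4 --bind-to socket otbcli_MeanShiftSmoothing -in input.tif \
    -fout smooth.tif -foutpos position.tif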

I just realized that we need more material about MPI on the wiki / cookbook /
blog; I will take care of this soon...

On Thursday, May 18, 2017 at 11:52:23 PM UTC+2, Stephen Woodbridge wrote:
>
> Hello Remi,
>
> I have never used MPI before. I can run the LSMS smoothing from the CLI. The 
> system has 2 CPU sockets with 14 cores per socket (2 threads per core, 56 
> logical CPUs):
>
> $ lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                56
> On-line CPU(s) list:   0-55
> Thread(s) per core:    2
> Core(s) per socket:    14
> Socket(s):             2
> NUMA node(s):          2
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 79
> Model name:            Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
> Stepping:              1
> CPU MHz:               1497.888
> CPU max MHz:           2600.0000
> CPU min MHz:           1200.0000
> BogoMIPS:              5207.83
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              35840K
> NUMA node0 CPU(s):     0-13,28-41
> NUMA node1 CPU(s):     14-27,42-55
>
> So I launch something like:
>
> mpirun -n 4 --bind-to socket otbcli_MeanShiftSmoothing -in maur_rgb.png 
> -fout smooth.tif -foutpos position.tif -spatialr 16 -ranger 16 -thres 0.1 
> -maxiter 100
>
> So if I understand correctly, this launches 4 copies of the application, but 
> how do they know which instance is working on what? Is that just the magic 
> of MPI?
>
> -Steve
>
> On Thursday, May 18, 2017 at 12:07:52 PM UTC-4, remicres wrote:
>>
>> Hello Stephen,
>> I am really interested in your results. 
>> A few years ago, I failed to get good benchmarks of OTB apps (that is, 
>> good scalability with CPU usage) on the same kind of machine as yours. 
>> The speedup was collapsing near 10-30 CPUs (depending on the app). I 
>> suspected that fine tuning was needed, and I did not have the time to 
>> persevere. This poor speedup might be related to thread placement and 
>> cache issues: the current framework scales well across CPUs when 
>> processing images in a shared-memory context, particularly when the 
>> threads are on the same CPU socket. Depending on the algorithm used, I 
>> suspect one might also need to fine tune the environment settings. 
>> Could you provide the number of sockets of your machine, with the number 
>> of CPUs on each one?
>>
>> If this machine has many sockets, one quick workaround to get a good 
>> speedup could be to use the MPI support and force the binding of MPI 
>> processes to the sockets (e.g. with OpenMPI: "mpirun -n <nb of sockets of 
>> your machine> --bind-to socket ..."). However, I am not sure how to use 
>> it from Python.
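>>
>> For instance, on a 2-socket machine that would be something like (the 
>> input file and the other application arguments here are only placeholders):
>>
>> mpirun -n 2 --bind-to socket otbcli_MeanShiftSmoothing -in input.tif ...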
>>
>> Keep us updated!
>>
>> Rémi
>>
>> On Wednesday, May 17, 2017 at 9:34:48 PM UTC+2, Stephen Woodbridge wrote:
>>>
>>> I started watching this with htop, and all the CPUs are getting action. 
>>> There is a pattern where the total number of threads spikes from about 
>>> 162 up to 215 and the number of running threads spikes to about 50 for a 
>>> few seconds; then the number of running threads drops to 2 for 5-10 
>>> seconds, and the pattern repeats. I'm thinking that the parent thread is 
>>> spinning up a bunch of workers; they finish, then the parent thread 
>>> cycles through each of the finished workers, collecting the results and 
>>> presumably writing them to disk or something. If it is writing to disk, 
>>> there could be a huge potential performance improvement in writing the 
>>> output to memory when enough memory is available (which is clearly the 
>>> case on this machine), then flushing the memory to disk. The current 
>>> process is only using 3 GB of memory when it has 100 GB available to it 
>>> and the system has 120 GB.
>>>
>>> On Wednesday, May 17, 2017 at 12:13:04 PM UTC-4, Stephen Woodbridge 
>>> wrote:
>>>>
>>>> Hi, first I want to say the LSMS Segmentation is very cool and works 
>>>> nicely. I recently got access to a server with 56 cores and 128 GB of 
>>>> memory, but I can't seem to get it to use more than 10-15 cores. I'm 
>>>> running the smoothing on an image approx 20000x20000 in size. The image 
>>>> is a GDAL VRT file that combines 8 DOQQ images into a mosaic. It has 4 
>>>> bands (R, G, B, IR), each having Mask Flags: PER_DATASET (see below). 
>>>> I'm running this from a Python script like:
>>>>
>>>> import otbApplication
>>>>
>>>> def smoothing(fin, fout, foutpos, spatialr, ranger, rangeramp, thres, 
>>>>               maxiter, ram):
>>>>     # configure and run the MeanShiftSmoothing application
>>>>     app = otbApplication.Registry.CreateApplication('MeanShiftSmoothing')
>>>>     app.SetParameterString('in', fin)
>>>>     app.SetParameterString('fout', fout)
>>>>     app.SetParameterString('foutpos', foutpos)
>>>>     app.SetParameterInt('spatialr', spatialr)
>>>>     app.SetParameterFloat('ranger', ranger)
>>>>     app.SetParameterFloat('rangeramp', rangeramp)
>>>>     app.SetParameterFloat('thres', thres)
>>>>     app.SetParameterInt('maxiter', maxiter)
>>>>     app.SetParameterInt('ram', ram)
>>>>     app.SetParameterInt('modesearch', 0)
>>>>     app.ExecuteAndWriteOutput()
>>>>
>>>> Where:
>>>> spatialr: 24
>>>> ranger: 36
>>>> rangeramp: 0
>>>> thres: 0.1
>>>> maxiter: 100
>>>> ram: 102400
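>>>>
>>>> For reference, with these values the call is effectively (the output 
>>>> file names here are just examples):
>>>>
>>>> smoothing('tmp-23081-areaofinterest.vrt', 'smooth.tif', 'position.tif',
>>>>           spatialr=24, ranger=36, rangeramp=0, thres=0.1, maxiter=100,
>>>>           ram=102400)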
>>>>
>>>> Any thoughts on how I can get this to utilize more of the processing 
>>>> power of this machine?
>>>>
>>>> -Steve
>>>>
>>>> woodbri@optane28:/u/ror/buildings/tmp$ otbcli_ReadImageInfo -in tmp-23081-areaofinterest.vrt
>>>> 2017 May 17 15:36:04  :  Application.logger  (INFO)
>>>> Image general information:
>>>>         Number of bands : 4
>>>>         No data flags : Not found
>>>>         Start index :  [0,0]
>>>>         Size :  [19933,19763]
>>>>         Origin :  [-118.442,34.0035]
>>>>         Spacing :  [9.83578e-06,-9.83578e-06]
>>>>         Estimated ground spacing (in meters): [0.90856,1.09369]
>>>>
>>>> Image acquisition information:
>>>>         Sensor :
>>>>         Image identification number:
>>>>         Image projection : GEOGCS["WGS 84",
>>>>     DATUM["WGS_1984",
>>>>         SPHEROID["WGS 84",6378137,298.257223563,
>>>>             AUTHORITY["EPSG","7030"]],
>>>>         AUTHORITY["EPSG","6326"]],
>>>>     PRIMEM["Greenwich",0],
>>>>     UNIT["degree",0.0174532925199433],
>>>>     AUTHORITY["EPSG","4326"]]
>>>>
>>>> Image default RGB composition:
>>>>         [R, G, B] = [0,1,2]
>>>>
>>>> Ground control points information:
>>>>         Number of GCPs = 0
>>>>         GCPs projection =
>>>>
>>>> Output parameters value:
>>>> indexx: 0
>>>> indexy: 0
>>>> sizex: 19933
>>>> sizey: 19763
>>>> spacingx: 9.835776837e-06
>>>> spacingy: -9.835776837e-06
>>>> originx: -118.4418488
>>>> originy: 34.00345612
>>>> estimatedgroundspacingx: 0.9085595012
>>>> estimatedgroundspacingy: 1.093693733
>>>> numberbands: 4
>>>> sensor:
>>>> id:
>>>> time:
>>>> ullat: 0
>>>> ullon: 0
>>>> urlat: 0
>>>> urlon: 0
>>>> lrlat: 0
>>>> lrlon: 0
>>>> lllat: 0
>>>> lllon: 0
>>>> town:
>>>> country:
>>>> rgb.r: 0
>>>> rgb.g: 1
>>>> rgb.b: 2
>>>> projectionref: GEOGCS["WGS 84",
>>>>     DATUM["WGS_1984",
>>>>         SPHEROID["WGS 84",6378137,298.257223563,
>>>>             AUTHORITY["EPSG","7030"]],
>>>>         AUTHORITY["EPSG","6326"]],
>>>>     PRIMEM["Greenwich",0],
>>>>     UNIT["degree",0.0174532925199433],
>>>>     AUTHORITY["EPSG","4326"]]
>>>> keyword:
>>>> gcp.count: 0
>>>> gcp.proj:
>>>> gcp.ids:
>>>> gcp.info:
>>>> gcp.imcoord:
>>>> gcp.geocoord:
>>>>
>>>> woodbri@optane28:/u/ror/buildings/tmp$ gdalinfo tmp-23081-areaofinterest.vrt
>>>> Driver: VRT/Virtual Raster
>>>> Files: tmp-23081-areaofinterest.vrt
>>>>        /u/ror/buildings/tmp/tmp-23081-areaofinterest.vrt.vrt
>>>> Size is 19933, 19763
>>>> Coordinate System is:
>>>> GEOGCS["WGS 84",
>>>>     DATUM["WGS_1984",
>>>>         SPHEROID["WGS 84",6378137,298.257223563,
>>>>             AUTHORITY["EPSG","7030"]],
>>>>         AUTHORITY["EPSG","6326"]],
>>>>     PRIMEM["Greenwich",0],
>>>>     UNIT["degree",0.0174532925199433],
>>>>     AUTHORITY["EPSG","4326"]]
>>>> Origin = (-118.441851318576212,34.003461706049677)
>>>> Pixel Size = (0.000009835776490,-0.000009835776490)
>>>> Corner Coordinates:
>>>> Upper Left  (-118.4418513,  34.0034617) (118d26'30.66"W, 34d 0'12.46"N)
>>>> Lower Left  (-118.4418513,  33.8090773) (118d26'30.66"W, 33d48'32.68"N)
>>>> Upper Right (-118.2457948,  34.0034617) (118d14'44.86"W, 34d 0'12.46"N)
>>>> Lower Right (-118.2457948,  33.8090773) (118d14'44.86"W, 33d48'32.68"N)
>>>> Center      (-118.3438231,  33.9062695) (118d20'37.76"W, 33d54'22.57"N)
>>>> Band 1 Block=128x128 Type=Byte, ColorInterp=Red
>>>>   Mask Flags: PER_DATASET
>>>> Band 2 Block=128x128 Type=Byte, ColorInterp=Green
>>>>   Mask Flags: PER_DATASET
>>>> Band 3 Block=128x128 Type=Byte, ColorInterp=Blue
>>>>   Mask Flags: PER_DATASET
>>>> Band 4 Block=128x128 Type=Byte, ColorInterp=Gray
>>>>   Mask Flags: PER_DATASET
