Hi Remi,
Thanks, I have this more or less working. I have not yet set the env variable
ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS but I will try that; at the moment I seem
to be getting about 4 times that many threads running.
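If I understand the Open MPI docs correctly, the -x option should forward that
variable to each rank, so I plan to try something along these lines (14 being
the threads-per-socket value you suggested; the file names here are just
placeholders):

mpirun -np 4 --bind-to socket -x ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=14 otbcli_MeanShiftSmoothing -in input.tif -fout smooth.tif -foutpos smoothpos.tif -spatialr 24 -ranger 36 -ram 102400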
Below are various problems I've run into. Some of these might be code bugs,
or config issues, or who knows what :)
I get a warning about failing to bind memory when I use --bind-to socket:
mpirun -np 4 --bind-to socket otbcli_MeanShiftSmoothing -in /u/ror/buildings/data/naip/doqqs/2014/33118/m_3311805_se_11_1_20140513.tif -fout /u/ror/buildings/tmp/test1-smooth.tif -foutpos /u/ror/buildings/tmp/test1-smoothpos.tif -spatialr 24 -ranger 36 -ram 102400
Unexpected end of /proc/mounts line `overlay / overlay
rw,seclabel,relatime,lowerdir=/var/lib/docker/overlay2/l/JPC7E5F4RB77LOK22ETL5FMEPN:/var/lib/docker/overlay2/l/DM3Q73J52BCAIEZVAQZGAMXLCX:/var/lib/docker/overlay2/l/WC5LQTPG4RBGOUEZ7KBJZLUB2R:/var/lib/docker/overlay2/l/BESSO2WOBICH2P4GSVX7VSCGG6:/var/lib/docker/overlay2/l/FMSJDZMFK67RHOIIZOLKOICAHI:/var/lib/docker/overlay2/l/U7AFHXIVI6KAKUO2VJMZWLQOHH:/var/lib/docker/overlay2/l/EIRHWP2GOK3F2PH7SHY4FK6J6P,upperdir=/var/lib/docker/overlay2/73d138b0a2dadf534a9d9c7d2ed894484515bfe3d2f1807a2b8'
--------------------------------------------------------------------------
WARNING: Open MPI tried to bind a process but failed. This is a
warning only; your job will continue, though performance may
be degraded.
Local host: optane30
Application name: /usr/bin/otbcli_MeanShiftSmoothing
Error message: failed to bind memory
Location: odls_default_module.c:639
--------------------------------------------------------------------------
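I suspect the memory-bind failure is just a side effect of running inside a
Docker container, where the process may not be allowed to set a memory-binding
policy. If it ever becomes more than a warning I may fall back to something
like the line below, though that gives up the per-socket placement you
recommended:

mpirun -np 4 --bind-to none otbcli_MeanShiftSmoothing <same arguments as above>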
In any case, the job runs to completion. However, when I try to run
otbcli_LSMSVectorization under mpi it fails, while the same command runs fine
without mpi. If this application isn't meant to run under mpi, you might want
to add a check and report that to the user, or just disable mpi internally.
mpirun -np 4 --bind-to socket otbcli_LSMSVectorization -in /u/ror/buildings/tmp/test1-smooth.tif -inseg /u/ror/buildings/tmp/test1-segs.tif -out /u/ror/buildings/tmp/test1-segments.shp -tilesizex 1025 -tilesizey 1025
Unexpected end of /proc/mounts line `overlay / overlay
rw,seclabel,relatime,lowerdir=/var/lib/docker/overlay2/l/JPC7E5F4RB77LOK22ETL5FMEPN:/var/lib/docker/overlay2/l/DM3Q73J52BCAIEZVAQZGAMXLCX:/var/lib/docker/overlay2/l/WC5LQTPG4RBGOUEZ7KBJZLUB2R:/var/lib/docker/overlay2/l/BESSO2WOBICH2P4GSVX7VSCGG6:/var/lib/docker/overlay2/l/FMSJDZMFK67RHOIIZOLKOICAHI:/var/lib/docker/overlay2/l/U7AFHXIVI6KAKUO2VJMZWLQOHH:/var/lib/docker/overlay2/l/EIRHWP2GOK3F2PH7SHY4FK6J6P,upperdir=/var/lib/docker/overlay2/73d138b0a2dadf534a9d9c7d2ed894484515bfe3d2f1807a2b8'
--------------------------------------------------------------------------
WARNING: Open MPI tried to bind a process but failed. This is a
warning only; your job will continue, though performance may
be degraded.
Local host: optane30
Application name: /usr/bin/otbcli_LSMSVectorization
Error message: failed to bind memory
Location: odls_default_module.c:639
--------------------------------------------------------------------------
2017 May 22 16:21:20 : Application.logger (CRITICAL) Invalid image
filename /u/ror/buildings/tmp/test1-segs.tif.
2017 May 22 16:21:20 : Application.logger (CRITICAL) Invalid image
filename /u/ror/buildings/tmp/test1-segs.tif.
2017 May 22 16:21:20 : Application.logger (FATAL) The following error
occurred during application execution :
/build/otb-KxFZzD/otb-5.4.0+dfsg/Modules/Wrappers/ApplicationEngine/include/otbWrapperInputImageParameter.txx:76:
itk::ERROR: InputImageParameter(0x560f0bbf6ae0): No input image or filename
detected...
2017 May 22 16:21:20 : Application.logger (FATAL) The following error
occurred during application execution :
/build/otb-KxFZzD/otb-5.4.0+dfsg/Modules/Wrappers/ApplicationEngine/include/otbWrapperInputImageParameter.txx:76:
itk::ERROR: InputImageParameter(0x55b6d1af6ae0): No input image or filename
detected...
2017 May 22 16:21:20 : Application.logger (CRITICAL) Invalid image
filename /u/ror/buildings/tmp/test1-segs.tif.
2017 May 22 16:21:20 : Application.logger (CRITICAL) Invalid image
filename /u/ror/buildings/tmp/test1-segs.tif.
2017 May 22 16:21:20 : Application.logger (FATAL) The following error
occurred during application execution :
/build/otb-KxFZzD/otb-5.4.0+dfsg/Modules/Wrappers/ApplicationEngine/include/otbWrapperInputImageParameter.txx:76:
itk::ERROR: InputImageParameter(0x55f536fbaae0): No input image or filename
detected...
2017 May 22 16:21:20 : Application.logger (FATAL) The following error
occurred during application execution :
/build/otb-KxFZzD/otb-5.4.0+dfsg/Modules/Wrappers/ApplicationEngine/include/otbWrapperInputImageParameter.txx:76:
itk::ERROR: InputImageParameter(0x562285e5dae0): No input image or filename
detected...
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status,
thus causing
the job to be terminated. The first process to do so was:
Process name: [[64617,1],0]
Exit code: 1
--------------------------------------------------------------------------
[optane30.softlayer.com:32749] 3 more processes have sent help message
help-orte-odls-default.txt / memory not bound
[optane30.softlayer.com:32749] Set MCA parameter "orte_base_help_aggregate"
to 0 to see all help / error messages
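As a stopgap on my side, I could wrap the apps that don't like mpi in a tiny
launcher that only does the real work on rank 0 and exits cleanly on the other
ranks. This is just a sketch (the script name is made up), relying on the
OMPI_COMM_WORLD_RANK variable that Open MPI sets for each launched process:

#!/bin/bash
# run_on_rank0.sh -- run the wrapped command on MPI rank 0 only;
# all other ranks exit immediately with status 0.
if [ "${OMPI_COMM_WORLD_RANK:-0}" -ne 0 ]; then
    exit 0
fi
exec "$@"

which would be invoked as, e.g., mpirun -np 4 ./run_on_rank0.sh otbcli_LSMSVectorization <arguments>.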
I also timed otbcli_LSMSVectorization with and without mpi, and it actually
took longer under mpi (judging from the logs below, each of the four ranks
seems to run the full 8 x 7 tiling on its own):
$ time otbcli_LSMSVectorization -in /u/ror/buildings/tmp/test1-smooth.tif -inseg /u/ror/buildings/tmp/test1-segs.tif -out /u/ror/buildings/tmp/test1-segments.shp -tilesizex 1025 -tilesizey 1025
2017 May 22 18:23:43 : Application.logger (INFO) Number of tiles: 8 x 7
2017 May 22 18:23:45 : Application.logger (INFO) Vectorization ...
2017 May 22 18:24:21 : Application.logger (INFO) Merging polygons across
tiles ...
2017 May 22 18:29:01 : Application.logger (INFO) Elapsed time: 380.383
seconds
real 5m18.121s
user 6m17.994s
sys 0m2.704s
$ time mpirun -np 4 --bind-to socket otbcli_LSMSVectorization -in /u/ror/buildings/tmp/test1-smooth.tif -inseg /u/ror/buildings/tmp/test1-segs.tif -out /u/ror/buildings/tmp/test1-segments-mpi.shp -tilesizex 1025 -tilesizey 1025
Unexpected end of /proc/mounts line `overlay
/ overlay
rw,seclabel,relatime,lowerdir=/var/lib/docker/overlay2/l/JPC7E5F4RB77LOK22ETL5FMEPN:/var/lib/docker/overlay2/l/DM3Q73J52BCAIEZVAQZGAMXLCX:/var/lib/docker/overlay2/l/WC5LQTPG4RBGOUEZ7KBJZLUB2R:/var/lib/docker/overlay2/l/BESSO2WOBICH2P4GSVX7VSCGG6:/var/lib/docker/overlay2/l/FMSJDZMFK67RHOIIZOLKOICAHI:/var/lib/docker/overlay2/l/U7AFHXIVI6KAKUO2VJMZWLQOHH:/var/lib/docker/overlay2/l/EIRHWP2GOK3F2PH7SHY4FK6J6P,upperdir=/var/lib/docker/overlay2/73d138b0a2dadf534a9d9c7d2ed894484515bfe3d2f1807a2b8'
--------------------------------------------------------------------------
WARNING: Open MPI tried to bind a process but failed. This is a
warning only; your job will continue, though performance may
be degraded.
Local host: optane30
Application name: /usr/bin/otbcli_LSMSVectorization
Error message: failed to bind memory
Location: odls_default_module.c:639
--------------------------------------------------------------------------
2017 May 22 18:30:46 : Application.logger (INFO) Number of tiles: 8 x 7
2017 May 22 18:30:46 : Application.logger (INFO) Number of tiles: 8 x 7
2017 May 22 18:30:46 : Application.logger (INFO) Number of tiles: 8 x 7
2017 May 22 18:30:46 : Application.logger (INFO) Number of tiles: 8 x 7
2017 May 22 18:30:47 : Application.logger (INFO) Vectorization ...
2017 May 22 18:30:47 : Application.logger (INFO) Vectorization ...
2017 May 22 18:30:47 : Application.logger (INFO) Vectorization ...
2017 May 22 18:30:47 : Application.logger (INFO) Vectorization ...
[optane30.softlayer.com:14449] 3 more processes have sent help message
help-orte-odls-default.txt / memory not bound
[optane30.softlayer.com:14449] Set MCA parameter "orte_base_help_aggregate"
to 0 to see all help / error messages
2017 May 22 18:31:22 : Application.logger (INFO) Merging polygons across
tiles ...
2017 May 22 18:31:22 : Application.logger (INFO) Merging polygons across
tiles ...
2017 May 22 18:31:22 : Application.logger (INFO) Merging polygons across
tiles ...
2017 May 22 18:31:22 : Application.logger (INFO) Merging polygons across
tiles ...
2017 May 22 18:37:17 : Application.logger (INFO) Elapsed time: 403.66
seconds
2017 May 22 18:37:20 : Application.logger (INFO) Elapsed time: 406.363
seconds
2017 May 22 18:37:22 : Application.logger (INFO) Elapsed time: 409.377
seconds
2017 May 22 18:38:22 : Application.logger (INFO) Elapsed time: 468.458
seconds
real 7m36.737s
user 27m59.170s
sys 0m9.361s
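One thing I still need to check is whether the four ranks really ended up on
separate sockets, and how many ITK threads each one spawned. If I read the
Open MPI manual correctly, adding --report-bindings should print the actual
binding chosen for each rank, e.g.:

mpirun -np 4 --bind-to socket --report-bindings -x ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=14 otbcli_LSMSVectorization <same arguments as above>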
All in all, this is awesome software and I love it. I'll be putting more time
into learning more of its features.
Thanks,
-Steve
On Sunday, May 21, 2017 at 7:08:12 AM UTC-4, remicres wrote:
>
> Hello Stephen,
>
> To let the mpi magic happen, you must compile otb with the option
> OTB_USE_MPI=ON.
> You may also set the option OTB_USE_SPTW=ON which enables writing
> .tif files in parallel.
> After this, you set ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS to 14 (which
> is the number of threads per socket), then you deploy the app over 4 mpi
> processes (each one bound to a socket)
> mpirun -n 4 --bind-to socket otbcli_MeanShiftSmoothing -...
>
> I just realized that we need more material about mpi on the wiki / cookbook
> / blog; I will take care of this soon...
>
> On Thursday, May 18, 2017 at 23:52:23 UTC+2, Stephen Woodbridge wrote:
>>
>> Hello Remi,
>>
>> I have never used MPI before. I can run the LSMS Smooth from the cli. The
>> system has 4 cpu sockets with 14 cores per socket:
>>
>> $ lscpu
>> Architecture: x86_64
>> CPU op-mode(s): 32-bit, 64-bit
>> Byte Order: Little Endian
>> CPU(s): 56
>> On-line CPU(s) list: 0-55
>> Thread(s) per core: 2
>> Core(s) per socket: 14
>> Socket(s): 2
>> NUMA node(s): 2
>> Vendor ID: GenuineIntel
>> CPU family: 6
>> Model: 79
>> Model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
>> Stepping: 1
>> CPU MHz: 1497.888
>> CPU max MHz: 2600.0000
>> CPU min MHz: 1200.0000
>> BogoMIPS: 5207.83
>> Virtualization: VT-x
>> L1d cache: 32K
>> L1i cache: 32K
>> L2 cache: 256K
>> L3 cache: 35840K
>> NUMA node0 CPU(s): 0-13,28-41
>> NUMA node1 CPU(s): 14-27,42-55
>>
>> So I launch something like:
>>
>> mpirun -n 4 --bind-to socket otbcli_MeanShiftSmoothing -in maur_rgb.png
>> -fout smooth.tif -foutpos position.tif -spatialr 16 -ranger 16 -thres 0.1
>> -maxiter 100
>>
>> So if I understand correctly, this launches 4 copies of the application, but
>> how do they know which instance is working on what? Is that just the magic of
>> MPI?
>>
>> -Steve
>>
>> On Thursday, May 18, 2017 at 12:07:52 PM UTC-4, remicres wrote:
>>>
>>> Hello Stephen,
>>> I am really interested in your results.
>>> A few years ago I failed to get good benchmarks of otb apps (that is, good
>>> scalability with cpu usage) on the same kind of machine as yours.
>>> The speedup was collapsing near 10-30 cpus (depending on the app). I
>>> suspected fine tuning to be the cause, and I did not have the time to
>>> persevere. This poor speedup might be related to thread placement or cache
>>> issues: the current framework scales well across cpus when processing images
>>> in a shared-memory context, particularly when threads are on the same cpu
>>> socket. Depending on the algorithm used, I suspect one might also need to
>>> fine-tune the settings of the environment.
>>> Could you provide the number of sockets in your machine? (with the
>>> number of cpus on each one)
>>>
>>> If this machine has many sockets, one quick workaround to get good
>>> speedup could be to use the MPI support and force the binding of mpi
>>> processes to the sockets (e.g. with openmpi: "mpirun -n <nb of sockets of
>>> your machine> --bind-to socket ..."). However, I am not sure how to use it
>>> from python.
>>>
>>> Keep us updated!
>>>
>>> Rémi
>>>
>>> On Wednesday, May 17, 2017 at 21:34:48 UTC+2, Stephen Woodbridge wrote:
>>>>
>>>> I started watching this with htop and all the cpus are getting action.
>>>> There is a pattern where the total number of threads spikes from about 162
>>>> up to 215 and the number of running threads spikes to about 50 or so for a
>>>> few seconds, then the running threads drop to 2 for 5-10 seconds, and the
>>>> pattern repeats. I'm thinking that the parent thread is spinning up a bunch
>>>> of workers, they finish, then the parent thread cycles through each of the
>>>> finished workers collecting the results and presumably writes them to disk
>>>> or something. If it is writing to disk, there could be a huge potential
>>>> performance improvement in writing the output to memory, if enough memory
>>>> is available (which is clearly the case on this machine), and then flushing
>>>> the memory to disk. The current process is only using 3 GB of memory when
>>>> it has 100 GB available to it and the system has 120 GB.
>>>>
>>>> On Wednesday, May 17, 2017 at 12:13:04 PM UTC-4, Stephen Woodbridge
>>>> wrote:
>>>>>
>>>>> Hi, first I want to say the LSMS Segmentation is very cool and works
>>>>> nicely. I recently got access to a server with 56 cores and 128GB of
>>>>> memory, but I can't seem to get it to use more than 10-15 cores. I'm
>>>>> running the smoothing on an image approx 20000x20000 in size. The image is
>>>>> a gdal VRT file that combines 8 DOQQ images into a mosaic. It has 4 bands
>>>>> R, G, B, IR, each with Mask Flags: PER_DATASET (see below). I'm running
>>>>> this from a Python script like:
>>>>>
>>>>> def smoothing(fin, fout, foutpos, spatialr, ranger, rangeramp, thres,
>>>>>               maxiter, ram):
>>>>>     app = otbApplication.Registry.CreateApplication('MeanShiftSmoothing')
>>>>>     app.SetParameterString('in', fin)
>>>>>     app.SetParameterString('fout', fout)
>>>>>     app.SetParameterString('foutpos', foutpos)
>>>>>     app.SetParameterInt('spatialr', spatialr)
>>>>>     app.SetParameterFloat('ranger', ranger)
>>>>>     app.SetParameterFloat('rangeramp', rangeramp)
>>>>>     app.SetParameterFloat('thres', thres)
>>>>>     app.SetParameterInt('maxiter', maxiter)
>>>>>     app.SetParameterInt('ram', ram)
>>>>>     app.SetParameterInt('modesearch', 0)
>>>>>     app.ExecuteAndWriteOutput()
>>>>>
>>>>> Where:
>>>>> spatialr: 24
>>>>> ranger: 36
>>>>> rangeramp: 0
>>>>> thres: 0.1
>>>>> maxiter: 100
>>>>> ram: 102400
>>>>>
>>>>> Any thoughts on how I can get this to utilize more of the processing
>>>>> power of this machine?
>>>>>
>>>>> -Steve
>>>>>
>>>>> woodbri@optane28:/u/ror/buildings/tmp$ otbcli_ReadImageInfo -in tmp-23081-areaofinterest.vrt
>>>>> 2017 May 17 15:36:04 : Application.logger (INFO)
>>>>> Image general information:
>>>>> Number of bands : 4
>>>>> No data flags : Not found
>>>>> Start index : [0,0]
>>>>> Size : [19933,19763]
>>>>> Origin : [-118.442,34.0035]
>>>>> Spacing : [9.83578e-06,-9.83578e-06]
>>>>> Estimated ground spacing (in meters): [0.90856,1.09369]
>>>>>
>>>>> Image acquisition information:
>>>>> Sensor :
>>>>> Image identification number:
>>>>> Image projection : GEOGCS["WGS 84",
>>>>> DATUM["WGS_1984",
>>>>> SPHEROID["WGS 84",6378137,298.257223563,
>>>>> AUTHORITY["EPSG","7030"]],
>>>>> AUTHORITY["EPSG","6326"]],
>>>>> PRIMEM["Greenwich",0],
>>>>> UNIT["degree",0.0174532925199433],
>>>>> AUTHORITY["EPSG","4326"]]
>>>>>
>>>>> Image default RGB composition:
>>>>> [R, G, B] = [0,1,2]
>>>>>
>>>>> Ground control points information:
>>>>> Number of GCPs = 0
>>>>> GCPs projection =
>>>>>
>>>>> Output parameters value:
>>>>> indexx: 0
>>>>> indexy: 0
>>>>> sizex: 19933
>>>>> sizey: 19763
>>>>> spacingx: 9.835776837e-06
>>>>> spacingy: -9.835776837e-06
>>>>> originx: -118.4418488
>>>>> originy: 34.00345612
>>>>> estimatedgroundspacingx: 0.9085595012
>>>>> estimatedgroundspacingy: 1.093693733
>>>>> numberbands: 4
>>>>> sensor:
>>>>> id:
>>>>> time:
>>>>> ullat: 0
>>>>> ullon: 0
>>>>> urlat: 0
>>>>> urlon: 0
>>>>> lrlat: 0
>>>>> lrlon: 0
>>>>> lllat: 0
>>>>> lllon: 0
>>>>> town:
>>>>> country:
>>>>> rgb.r: 0
>>>>> rgb.g: 1
>>>>> rgb.b: 2
>>>>> projectionref: GEOGCS["WGS 84",
>>>>> DATUM["WGS_1984",
>>>>> SPHEROID["WGS 84",6378137,298.257223563,
>>>>> AUTHORITY["EPSG","7030"]],
>>>>> AUTHORITY["EPSG","6326"]],
>>>>> PRIMEM["Greenwich",0],
>>>>> UNIT["degree",0.0174532925199433],
>>>>> AUTHORITY["EPSG","4326"]]
>>>>> keyword:
>>>>> gcp.count: 0
>>>>> gcp.proj:
>>>>> gcp.ids:
>>>>> gcp.info:
>>>>> gcp.imcoord:
>>>>> gcp.geocoord:
>>>>>
>>>>> woodbri@optane28:/u/ror/buildings/tmp$ gdalinfo tmp-23081-areaofinterest.vrt
>>>>> Driver: VRT/Virtual Raster
>>>>> Files: tmp-23081-areaofinterest.vrt
>>>>> /u/ror/buildings/tmp/tmp-23081-areaofinterest.vrt.vrt
>>>>> Size is 19933, 19763
>>>>> Coordinate System is:
>>>>> GEOGCS["WGS 84",
>>>>> DATUM["WGS_1984",
>>>>> SPHEROID["WGS 84",6378137,298.257223563,
>>>>> AUTHORITY["EPSG","7030"]],
>>>>> AUTHORITY["EPSG","6326"]],
>>>>> PRIMEM["Greenwich",0],
>>>>> UNIT["degree",0.0174532925199433],
>>>>> AUTHORITY["EPSG","4326"]]
>>>>> Origin = (-118.441851318576212,34.003461706049677)
>>>>> Pixel Size = (0.000009835776490,-0.000009835776490)
>>>>> Corner Coordinates:
>>>>> Upper Left  (-118.4418513,  34.0034617) (118d26'30.66"W, 34d 0'12.46"N)
>>>>> Lower Left  (-118.4418513,  33.8090773) (118d26'30.66"W, 33d48'32.68"N)
>>>>> Upper Right (-118.2457948,  34.0034617) (118d14'44.86"W, 34d 0'12.46"N)
>>>>> Lower Right (-118.2457948,  33.8090773) (118d14'44.86"W, 33d48'32.68"N)
>>>>> Center      (-118.3438231,  33.9062695) (118d20'37.76"W, 33d54'22.57"N)
>>>>> Band 1 Block=128x128 Type=Byte, ColorInterp=Red
>>>>> Mask Flags: PER_DATASET
>>>>> Band 2 Block=128x128 Type=Byte, ColorInterp=Green
>>>>> Mask Flags: PER_DATASET
>>>>> Band 3 Block=128x128 Type=Byte, ColorInterp=Blue
>>>>> Mask Flags: PER_DATASET
>>>>> Band 4 Block=128x128 Type=Byte, ColorInterp=Gray
>>>>> Mask Flags: PER_DATASET
>>>>>
>>>>>
>>>>>
>>>>>