Hi Stephen,

I just want to add that there is perhaps something else to try: the ITK mechanism that allows the use of a pool of threads:
https://github.com/InsightSoftwareConsortium/ITK/blob/master/Modules/Core/Common/include/itkMultiThreader.h#L210

You can easily test this by setting the environment variable ITK_USE_THREADPOOL (to 'ON' for instance). I have never personally tried this configuration, and I was not able to find much documentation about it so far.

Best regards,
Manuel

2017-05-18 23:52 GMT+02:00 Stephen Woodbridge <[email protected]>:

Hello Remi,

I have never used MPI before. I can run the LSMS smoothing from the CLI. The system has 2 CPU sockets with 14 cores per socket (2 threads per core):

    $ lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                56
    On-line CPU(s) list:   0-55
    Thread(s) per core:    2
    Core(s) per socket:    14
    Socket(s):             2
    NUMA node(s):          2
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 79
    Model name:            Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
    Stepping:              1
    CPU MHz:               1497.888
    CPU max MHz:           2600.0000
    CPU min MHz:           1200.0000
    BogoMIPS:              5207.83
    Virtualization:        VT-x
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              256K
    L3 cache:              35840K
    NUMA node0 CPU(s):     0-13,28-41
    NUMA node1 CPU(s):     14-27,42-55

So I launch something like:

    mpirun -n 4 --bind-to socket otbcli_MeanShiftSmoothing -in maur_rgb.png -fout smooth.tif -foutpos position.tif -spatialr 16 -ranger 16 -thres 0.1 -maxiter 100

So if I understand correctly, this launches 4 copies of the application, but how do they know which instance is working on what? Is that just the magic of MPI?

-Steve

On Thursday, May 18, 2017 at 12:07:52 PM UTC-4, remicres wrote:

Hello Stephen,

I am really interested in your results. A few years ago I failed to get good benchmarks of OTB apps (that is, good scalability of CPU usage) on the same kind of machine as yours. The speedup collapsed near 10-30 CPUs (depending on the app). I suspected fine tuning to be the cause, and I did not have the time to persevere.
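One low-effort experiment in that direction is to cap the thread count and pin the whole process to a single socket. This is a hypothetical invocation, not an OTB-documented recipe — adjust the thread count and node numbers to your own topology (ITK reads ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS to cap its thread count):

```shell
# Hypothetical experiment: cap ITK's thread count and pin the process to
# socket 0, so every worker thread shares the same L3 cache and NUMA node.
export ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=14   # physical cores on one socket
numactl --cpunodebind=0 --membind=0 \
  otbcli_MeanShiftSmoothing -in input.tif -fout smooth.tif \
  -spatialr 16 -ranger 16 -thres 0.1 -maxiter 100
```

If per-core efficiency recovers when confined to one socket, that points at NUMA/cache effects rather than the algorithm itself.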
This bad speedup might be related to thread placement and cache issues: the current framework scales well across CPUs when processing images in a shared-memory context, particularly when the threads are all on the same CPU socket. Depending on the algorithm used, I suspect one might also need to fine-tune the environment settings.

Could you provide the number of sockets of your machine (with the number of CPUs on each one)?

If this machine has many sockets, one quick workaround to get a good speedup could be to use the MPI support and force the binding of MPI processes to sockets (e.g. with OpenMPI: "mpirun -n <number of sockets of your machine> --bind-to socket ..."). However, I am not sure how to use it from Python.

Keep us updated!

Rémi

On Wednesday, May 17, 2017 at 21:34:48 UTC+2, Stephen Woodbridge wrote:

I started watching this with htop, and all the CPUs are getting action. There is a pattern where the number of threads spikes from about 162 up to 215 and the number of running threads spikes to about 50 for a few seconds; then the number of running threads drops to 2 for 5-10 seconds, and the pattern repeats. I think the parent thread is spinning up a bunch of workers; they finish, then the parent thread cycles through each of the finished workers collecting the results and presumably writes them to disk or something. If it is writing to disk, there could be a huge potential performance improvement in writing the output to memory — if enough memory is available, which is clearly the case on this machine — and then flushing the memory to disk. The current process is only using 3 GB of memory when it has 100 GB available to it and the system has 120 GB.

On Wednesday, May 17, 2017 at 12:13:04 PM UTC-4, Stephen Woodbridge wrote:

Hi, first I want to say that the LSMS Segmentation is very cool and works nicely.
I recently got access to a server with 56 cores and 128 GB of memory, but I can't seem to get it to use more than 10-15 cores. I'm running the smoothing on an image approximately 20000x20000 in size. The image is a GDAL VRT file that combines 8 DOQQ images into a mosaic. It has 4 bands (R, G, B, IR), each with Mask Flags: PER_DATASET (see below). I'm running this from a Python script like:

    import otbApplication

    def smoothing(fin, fout, foutpos, spatialr, ranger, rangeramp, thres, maxiter, ram):
        app = otbApplication.Registry.CreateApplication('MeanShiftSmoothing')
        app.SetParameterString('in', fin)
        app.SetParameterString('fout', fout)
        app.SetParameterString('foutpos', foutpos)
        app.SetParameterInt('spatialr', spatialr)
        app.SetParameterFloat('ranger', ranger)
        app.SetParameterFloat('rangeramp', rangeramp)
        app.SetParameterFloat('thres', thres)
        app.SetParameterInt('maxiter', maxiter)
        app.SetParameterInt('ram', ram)
        app.SetParameterInt('modesearch', 0)
        app.ExecuteAndWriteOutput()

Where:

    spatialr:  24
    ranger:    36
    rangeramp: 0
    thres:     0.1
    maxiter:   100
    ram:       102400

Any thoughts on how I can get this to utilize more of the processing power of this machine?
-Steve

    woodbri@optane28:/u/ror/buildings/tmp$ otbcli_ReadImageInfo -in tmp-23081-areaofinterest.vrt
    2017 May 17 15:36:04 : Application.logger (INFO)
    Image general information:
      Number of bands : 4
      No data flags : Not found
      Start index : [0,0]
      Size : [19933,19763]
      Origin : [-118.442,34.0035]
      Spacing : [9.83578e-06,-9.83578e-06]
      Estimated ground spacing (in meters): [0.90856,1.09369]

    Image acquisition information:
      Sensor :
      Image identification number:
      Image projection : GEOGCS["WGS 84",
          DATUM["WGS_1984",
              SPHEROID["WGS 84",6378137,298.257223563,
                  AUTHORITY["EPSG","7030"]],
              AUTHORITY["EPSG","6326"]],
          PRIMEM["Greenwich",0],
          UNIT["degree",0.0174532925199433],
          AUTHORITY["EPSG","4326"]]

    Image default RGB composition:
      [R, G, B] = [0,1,2]

    Ground control points information:
      Number of GCPs = 0
      GCPs projection =

    Output parameters value:
      indexx: 0
      indexy: 0
      sizex: 19933
      sizey: 19763
      spacingx: 9.835776837e-06
      spacingy: -9.835776837e-06
      originx: -118.4418488
      originy: 34.00345612
      estimatedgroundspacingx: 0.9085595012
      estimatedgroundspacingy: 1.093693733
      numberbands: 4
      sensor:
      id:
      time:
      ullat: 0
      ullon: 0
      urlat: 0
      urlon: 0
      lrlat: 0
      lrlon: 0
      lllat: 0
      lllon: 0
      town:
      country:
      rgb.r: 0
      rgb.g: 1
      rgb.b: 2
      projectionref: GEOGCS["WGS 84",
          DATUM["WGS_1984",
              SPHEROID["WGS 84",6378137,298.257223563,
                  AUTHORITY["EPSG","7030"]],
              AUTHORITY["EPSG","6326"]],
          PRIMEM["Greenwich",0],
          UNIT["degree",0.0174532925199433],
          AUTHORITY["EPSG","4326"]]
      keyword:
      gcp.count: 0
      gcp.proj:
      gcp.ids:
      gcp.info:
      gcp.imcoord:
      gcp.geocoord:

    woodbri@optane28:/u/ror/buildings/tmp$ gdalinfo tmp-23081-areaofinterest.vrt
    Driver: VRT/Virtual Raster
    Files: tmp-23081-areaofinterest.vrt
           /u/ror/buildings/tmp/tmp-23081-areaofinterest.vrt.vrt
    Size is 19933, 19763
    Coordinate System is:
    GEOGCS["WGS 84",
        DATUM["WGS_1984",
            SPHEROID["WGS 84",6378137,298.257223563,
                AUTHORITY["EPSG","7030"]],
            AUTHORITY["EPSG","6326"]],
        PRIMEM["Greenwich",0],
        UNIT["degree",0.0174532925199433],
        AUTHORITY["EPSG","4326"]]
    Origin = (-118.441851318576212,34.003461706049677)
    Pixel Size = (0.000009835776490,-0.000009835776490)
    Corner Coordinates:
    Upper Left  (-118.4418513,  34.0034617) (118d26'30.66"W, 34d 0'12.46"N)
    Lower Left  (-118.4418513,  33.8090773) (118d26'30.66"W, 33d48'32.68"N)
    Upper Right (-118.2457948,  34.0034617) (118d14'44.86"W, 34d 0'12.46"N)
    Lower Right (-118.2457948,  33.8090773) (118d14'44.86"W, 33d48'32.68"N)
    Center      (-118.3438231,  33.9062695) (118d20'37.76"W, 33d54'22.57"N)
    Band 1 Block=128x128 Type=Byte, ColorInterp=Red
      Mask Flags: PER_DATASET
    Band 2 Block=128x128 Type=Byte, ColorInterp=Green
      Mask Flags: PER_DATASET
    Band 3 Block=128x128 Type=Byte, ColorInterp=Blue
      Mask Flags: PER_DATASET
    Band 4 Block=128x128 Type=Byte, ColorInterp=Gray
      Mask Flags: PER_DATASET
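P.S. Stephen, about your earlier question on how the four mpirun copies know which instance is working on what: there is no magic. MPI gives every process the total process count and its own rank (0 to n-1), and the application derives its share of the work from that rank, typically a contiguous block of output rows. A toy illustration of that kind of split in plain Python (the function name and the row-block scheme are only for illustration, not OTB's actual partitioning):

```python
def rows_for_rank(total_rows, nranks, rank):
    """Contiguous block of rows a given MPI rank would process under a
    simple row-block split (illustrative only, not OTB's real strategy)."""
    base, extra = divmod(total_rows, nranks)
    # the first `extra` ranks each take one additional row
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

# your 19763-row image across the 4 processes of "mpirun -n 4":
for rank in range(4):
    print(rank, rows_for_rank(19763, 4, rank))
# 0 (0, 4941)
# 1 (4941, 9882)
# 2 (9882, 14823)
# 3 (14823, 19763)
```

Every process runs the same program; the rank is the only thing that differs between them, and "--bind-to socket" additionally keeps each process's threads on a single socket.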
--
Manuel Grizonnet

--
Check the OTB FAQ at http://www.orfeo-toolbox.org/FAQ.html

You received this message because you are subscribed to the Google Groups "otb-users" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/otb-users?hl=en
---
You received this message because you are subscribed to the Google Groups "otb-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
