Here is an interim summary of what I have found out so far (load average as reported by htop):

- base starting case: 10-15 load average
- converting the input file from .vrt to .tif improves performance a little: 12-17 load average
- ITK_USE_THREADPOOL=ON with the .tif input improves performance a little more: 17-28 load average
- using the .tif input and MPI improves performance a lot: 80-85 load average
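For reference, a minimal sketch of one way to set up the .tif and thread-pool variants above from Python (the output file name is an example; as far as I can tell ITK reads ITK_USE_THREADPOOL from the environment at run time, so it should be set before otbApplication is imported):

    import os
    import subprocess

    # Enable the ITK thread pool before OTB/ITK is loaded.
    os.environ['ITK_USE_THREADPOOL'] = 'ON'
    import otbApplication

    # Convert the VRT mosaic to a single tiled GeoTIFF
    # (the output name is an example).
    subprocess.run(['gdal_translate', '-co', 'TILED=YES',
                    'tmp-23081-areaofinterest.vrt',
                    'areaofinterest.tif'], check=True)

The MPI variant was launched with mpirun as in the commands quoted below.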
On Friday, May 19, 2017 at 7:11:40 AM UTC-4, Manuel Grizonnet wrote:
>
> Hi Stephen,
>
> I just want to add that there is perhaps something else to try with the
> ITK mechanism that allows using a pool of threads:
>
> https://github.com/InsightSoftwareConsortium/ITK/blob/master/Modules/Core/Common/include/itkMultiThreader.h#L210
>
> You can easily test this by setting the environment variable
> ITK_USE_THREADPOOL (to 'ON', for instance).
>
> I have never personally tried this configuration and have not been able
> to find much documentation about it so far.
>
> Best regards,
>
> Manuel
>
>
> 2017-05-18 23:52 GMT+02:00 Stephen Woodbridge <[email protected]>:
>
>> Hello Remi,
>>
>> I have never used MPI before. I can run the LSMS smoothing from the
>> CLI. The system has 2 CPU sockets with 14 cores per socket and 2
>> threads per core, for 56 logical CPUs:
>>
>> $ lscpu
>> Architecture:          x86_64
>> CPU op-mode(s):        32-bit, 64-bit
>> Byte Order:            Little Endian
>> CPU(s):                56
>> On-line CPU(s) list:   0-55
>> Thread(s) per core:    2
>> Core(s) per socket:    14
>> Socket(s):             2
>> NUMA node(s):          2
>> Vendor ID:             GenuineIntel
>> CPU family:            6
>> Model:                 79
>> Model name:            Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
>> Stepping:              1
>> CPU MHz:               1497.888
>> CPU max MHz:           2600.0000
>> CPU min MHz:           1200.0000
>> BogoMIPS:              5207.83
>> Virtualization:        VT-x
>> L1d cache:             32K
>> L1i cache:             32K
>> L2 cache:              256K
>> L3 cache:              35840K
>> NUMA node0 CPU(s):     0-13,28-41
>> NUMA node1 CPU(s):     14-27,42-55
>>
>> So I launch something like:
>>
>> mpirun -n 4 --bind-to socket otbcli_MeanShiftSmoothing -in maur_rgb.png \
>>     -fout smooth.tif -foutpos position.tif -spatialr 16 -ranger 16 \
>>     -thres 0.1 -maxiter 100
>>
>> If I understand correctly, this launches 4 copies of the application,
>> but how do they know which instance is working on what? Is that just
>> the magic of MPI?
>>
>> -Steve
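On the "magic of MPI" question: each process launched by mpirun gets its own rank (0..N-1) and the total process count from the MPI runtime, and an MPI-aware writer can derive from those a distinct region of the output image for each process to handle. A conceptual sketch of the idea, assuming mpi4py is installed (this is an illustration, not OTB's actual implementation):

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()    # this process's id, 0..size-1
    size = comm.Get_size()    # number of processes started by mpirun

    rows = 19763              # image height, from the gdalinfo output below
    chunk = (rows + size - 1) // size   # rows per process, rounded up
    start = rank * chunk
    stop = min(start + chunk, rows)
    print('rank %d of %d would process rows %d-%d'
          % (rank, size, start, stop - 1))

Run with, e.g., "mpirun -n 4 python split_demo.py" (the script name is arbitrary).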
>> On Thursday, May 18, 2017 at 12:07:52 PM UTC-4, remicres wrote:
>>>
>>> Hello Stephen,
>>>
>>> I am really interested in your results. A few years ago I failed to
>>> get good benchmarks of OTB apps (that is, good scalability of CPU
>>> usage) on the same kind of machine as yours. The speedup collapsed
>>> near 10-30 CPUs (depending on the app). I suspected fine tuning to be
>>> the cause, and I did not have the time to persevere. This bad speedup
>>> might be related to thread placement or cache issues: the current
>>> framework is well CPU-scalable when processing images in a
>>> shared-memory context, particularly when the threads are on the same
>>> CPU socket. Depending on the algorithm used, I suspect one might also
>>> need to fine tune the environment settings. Could you provide the
>>> number of sockets of your machine (with the number of CPUs on each
>>> one)?
>>>
>>> If this machine has many sockets, one quick workaround to get good
>>> speedup could be to use the MPI support and force the binding of MPI
>>> processes to the sockets (e.g. with Open MPI: "mpirun -n <number of
>>> sockets of your machine> --bind-to socket ..."). However, I am not
>>> sure how to use it from Python.
>>>
>>> Keep us updated!
>>>
>>> Rémi
>>>
>>> On Wednesday, May 17, 2017 at 21:34:48 UTC+2, Stephen Woodbridge wrote:
>>>>
>>>> I started watching this with htop and all the CPUs are getting
>>>> action. There is a pattern where the number of threads spikes from
>>>> about 162 up to 215 and the number of running threads spikes to
>>>> about 50 for a few seconds, then the running threads drop to 2 for
>>>> 5-10 seconds, and the pattern repeats. I'm thinking that the parent
>>>> thread is spinning up a bunch of workers, they finish, then the
>>>> parent thread cycles through each of the finished workers collecting
>>>> the results and presumably writes them to disk or something. If it
>>>> is writing to disk, there could be a huge potential performance
>>>> improvement in writing the output to memory, if enough memory is
>>>> available (which is clearly the case on this machine), and then
>>>> flushing the memory to disk. The current process is only using 3 GB
>>>> of memory when it has 100 GB available to it and the system has
>>>> 120 GB.
>>>>
>>>> On Wednesday, May 17, 2017 at 12:13:04 PM UTC-4, Stephen Woodbridge
>>>> wrote:
>>>>>
>>>>> Hi, first I want to say the LSMS segmentation is very cool and
>>>>> works nicely. I recently got access to a server with 56 cores and
>>>>> 128 GB of memory, but I can't seem to get it to use more than 10-15
>>>>> cores. I'm running the smoothing on an image approx. 20000x20000 in
>>>>> size. The image is a GDAL VRT file that combines 8 DOQQ images into
>>>>> a mosaic. It has 4 bands (R, G, B, IR), each having Mask Flags:
>>>>> PER_DATASET (see below). I'm running this from a Python script like:
>>>>>
>>>>> def smoothing(fin, fout, foutpos, spatialr, ranger, rangeramp,
>>>>>               thres, maxiter, ram):
>>>>>     app = otbApplication.Registry.CreateApplication(
>>>>>         'MeanShiftSmoothing')
>>>>>     app.SetParameterString('in', fin)
>>>>>     app.SetParameterString('fout', fout)
>>>>>     app.SetParameterString('foutpos', foutpos)
>>>>>     app.SetParameterInt('spatialr', spatialr)
>>>>>     app.SetParameterFloat('ranger', ranger)
>>>>>     app.SetParameterFloat('rangeramp', rangeramp)
>>>>>     app.SetParameterFloat('thres', thres)
>>>>>     app.SetParameterInt('maxiter', maxiter)
>>>>>     app.SetParameterInt('ram', ram)
>>>>>     app.SetParameterInt('modesearch', 0)
>>>>>     app.ExecuteAndWriteOutput()
>>>>>
>>>>> Where:
>>>>>     spatialr:  24
>>>>>     ranger:    36
>>>>>     rangeramp: 0
>>>>>     thres:     0.1
>>>>>     maxiter:   100
>>>>>     ram:       102400
>>>>>
>>>>> Any thoughts on how I can get this to utilize more of the
>>>>> processing power of this machine?
>>>>>
>>>>> -Steve
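For completeness, a call to the smoothing() function above with the parameter values just listed would look like this (the input name is the VRT from the listings below; the output names follow the mpirun example earlier in the thread):

    # Example invocation of the smoothing() helper defined above.
    smoothing(fin='tmp-23081-areaofinterest.vrt',
              fout='smooth.tif',
              foutpos='position.tif',
              spatialr=24, ranger=36, rangeramp=0.0,
              thres=0.1, maxiter=100, ram=102400)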
>>>>> woodbri@optane28:/u/ror/buildings/tmp$ otbcli_ReadImageInfo -in tmp-23081-areaofinterest.vrt
>>>>> 2017 May 17 15:36:04 : Application.logger (INFO)
>>>>> Image general information:
>>>>>     Number of bands : 4
>>>>>     No data flags : Not found
>>>>>     Start index : [0,0]
>>>>>     Size : [19933,19763]
>>>>>     Origin : [-118.442,34.0035]
>>>>>     Spacing : [9.83578e-06,-9.83578e-06]
>>>>>     Estimated ground spacing (in meters): [0.90856,1.09369]
>>>>>
>>>>> Image acquisition information:
>>>>>     Sensor :
>>>>>     Image identification number:
>>>>>     Image projection : GEOGCS["WGS 84",
>>>>>         DATUM["WGS_1984",
>>>>>             SPHEROID["WGS 84",6378137,298.257223563,
>>>>>                 AUTHORITY["EPSG","7030"]],
>>>>>             AUTHORITY["EPSG","6326"]],
>>>>>         PRIMEM["Greenwich",0],
>>>>>         UNIT["degree",0.0174532925199433],
>>>>>         AUTHORITY["EPSG","4326"]]
>>>>>
>>>>> Image default RGB composition:
>>>>>     [R, G, B] = [0,1,2]
>>>>>
>>>>> Ground control points information:
>>>>>     Number of GCPs = 0
>>>>>     GCPs projection =
>>>>>
>>>>> Output parameters value:
>>>>>     indexx: 0
>>>>>     indexy: 0
>>>>>     sizex: 19933
>>>>>     sizey: 19763
>>>>>     spacingx: 9.835776837e-06
>>>>>     spacingy: -9.835776837e-06
>>>>>     originx: -118.4418488
>>>>>     originy: 34.00345612
>>>>>     estimatedgroundspacingx: 0.9085595012
>>>>>     estimatedgroundspacingy: 1.093693733
>>>>>     numberbands: 4
>>>>>     sensor:
>>>>>     id:
>>>>>     time:
>>>>>     ullat: 0
>>>>>     ullon: 0
>>>>>     urlat: 0
>>>>>     urlon: 0
>>>>>     lrlat: 0
>>>>>     lrlon: 0
>>>>>     lllat: 0
>>>>>     lllon: 0
>>>>>     town:
>>>>>     country:
>>>>>     rgb.r: 0
>>>>>     rgb.g: 1
>>>>>     rgb.b: 2
>>>>>     projectionref: GEOGCS["WGS 84",
>>>>>         DATUM["WGS_1984",
>>>>>             SPHEROID["WGS 84",6378137,298.257223563,
>>>>>                 AUTHORITY["EPSG","7030"]],
>>>>>             AUTHORITY["EPSG","6326"]],
>>>>>         PRIMEM["Greenwich",0],
>>>>>         UNIT["degree",0.0174532925199433],
>>>>>         AUTHORITY["EPSG","4326"]]
>>>>>     keyword:
>>>>>     gcp.count: 0
>>>>>     gcp.proj:
>>>>>     gcp.ids:
>>>>>     gcp.info:
>>>>>     gcp.imcoord:
>>>>>     gcp.geocoord:
>>>>>
>>>>> woodbri@optane28:/u/ror/buildings/tmp$ gdalinfo tmp-23081-areaofinterest.vrt
>>>>> Driver: VRT/Virtual Raster
>>>>> Files: tmp-23081-areaofinterest.vrt
>>>>>        /u/ror/buildings/tmp/tmp-23081-areaofinterest.vrt.vrt
>>>>> Size is 19933, 19763
>>>>> Coordinate System is:
>>>>> GEOGCS["WGS 84",
>>>>>     DATUM["WGS_1984",
>>>>>         SPHEROID["WGS 84",6378137,298.257223563,
>>>>>             AUTHORITY["EPSG","7030"]],
>>>>>         AUTHORITY["EPSG","6326"]],
>>>>>     PRIMEM["Greenwich",0],
>>>>>     UNIT["degree",0.0174532925199433],
>>>>>     AUTHORITY["EPSG","4326"]]
>>>>> Origin = (-118.441851318576212,34.003461706049677)
>>>>> Pixel Size = (0.000009835776490,-0.000009835776490)
>>>>> Corner Coordinates:
>>>>> Upper Left  (-118.4418513,  34.0034617) (118d26'30.66"W, 34d 0'12.46"N)
>>>>> Lower Left  (-118.4418513,  33.8090773) (118d26'30.66"W, 33d48'32.68"N)
>>>>> Upper Right (-118.2457948,  34.0034617) (118d14'44.86"W, 34d 0'12.46"N)
>>>>> Lower Right (-118.2457948,  33.8090773) (118d14'44.86"W, 33d48'32.68"N)
>>>>> Center      (-118.3438231,  33.9062695) (118d20'37.76"W, 33d54'22.57"N)
>>>>> Band 1 Block=128x128 Type=Byte, ColorInterp=Red
>>>>>     Mask Flags: PER_DATASET
>>>>> Band 2 Block=128x128 Type=Byte, ColorInterp=Green
>>>>>     Mask Flags: PER_DATASET
>>>>> Band 3 Block=128x128 Type=Byte, ColorInterp=Blue
>>>>>     Mask Flags: PER_DATASET
>>>>> Band 4 Block=128x128 Type=Byte, ColorInterp=Gray
>>>>>     Mask Flags: PER_DATASET
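Since gdalinfo reports 128x128 blocks on the VRT, one quick check when comparing the .vrt and .tif inputs is the block layout GDAL exposes; small blocks on a mosaic VRT mean many small reads per processed region. A short sketch, assuming the GDAL Python bindings are available:

    from osgeo import gdal

    ds = gdal.Open('tmp-23081-areaofinterest.vrt')
    band = ds.GetRasterBand(1)
    # [128, 128] per the gdalinfo output above; a tiled GeoTIFF may differ.
    print('block size:', band.GetBlockSize())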
