Re: [gdal-dev] VRT derived band pixel functions written in Python
I agree... could be slightly inelegant to figure out which version of Python is used, though. A system call to `where python` on Windows would return the interpreter on the PATH, and you could replace the '.exe' with '.dll' (this assumes the python DLL is always alongside the exe, but I think that is the case on all Windows distributions), or Even's suggestion of calling python. I can't think of an (easy) way to do it without a system call, though.

On 15 September 2016 at 14:56, Sean Gillies wrote:
> Hi Even, James:
>
> I suspect users will find it very surprising if `which python` and the
> python interpreter used by VRT are not the same. Imagine starting python in
> an environment created by virtualenv or conda that contains extension
> modules like scikit-image, scikit-learn, &c. You import osgeo.gdal, open
> your super duper VRT with `gdal.Open()`, and get an error because your
> environment's extension modules aren't available in the VRT script. This
> could be tough to debug for a beginning to intermediate level Python
> programmer.
>
> On Thu, Sep 15, 2016 at 10:29 AM, James Ramm wrote:
>
>> Excellent, it's working after setting the PYTHONSO variable. Trying to
>> match the version of python on the path would be smart, but it is probably
>> 'good enough' to give PYTHONSO some prominence in the documentation - some
>> kind of warning on how python is discovered perhaps?
>>
>> On 14 September 2016 at 20:20, James Ramm wrote:
>>
>>> Ah, that makes sense.
I'll have to try the config option in the morning,
>>> but it sounds like that could be it
>>>
>>> On 14 Sep 2016 8:06 p.m., "Even Rouault" wrote:
>>>
>>> Le mercredi 14 septembre 2016 20:50:04, James Ramm a écrit :
>>> > Yes, it is loading a different python. The path printed by sys.path is
>>> > different to if I open the command window and type:
>>> >
>>> > python
>>> >
>>> > >>> import sys
>>> > >>> print(sys.path)
>>> >
>>> > Gdal and the python bindings were compiled with vs2015 and python 3.5,
>>> > and I can correctly import python in the 3.5 interpreter, yet somehow a
>>> > different python DLL (2.7) is being loaded at runtime.
>>>
>>> Might be related to the default try order in
>>>
>>>     const char* const apszPythonSO[] = { "python27.dll",
>>>                                          "python26.dll",
>>>                                          "python34.dll",
>>>                                          "python35.dll",
>>>                                          "python36.dll",
>>>                                          "python33.dll",
>>>                                          "python32.dll" };
>>>
>>> First found, first served.
>>>
>>> Hum, maybe we should try to match the version that issuing python on the
>>> command line would start. We could potentially look at the PATH to see if
>>> there's something like "bla:\pythonXX" and try the related .dll... Or more
>>> costly, but more reliable, try issuing a
>>> 'python -c "import sys; print(str(sys.version_info[0]) + str(sys.version_info[1]))"'
>>> command.
>>>
>>> That's one of the downsides of requiring no dependency at build time.
>>>
>>> Anyway, you can override the default guess by setting the PYTHONSO config
>>> option to point to the desired python dll.
>>>
>>> By the way, I've committed the doc in
>>> https://trac.osgeo.org/gdal/changeset/35441 . Should be reflected online
>>> in a few hours.
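The `where python` idea above can be sketched in a few lines. This is a hypothetical illustration, not GDAL code: `shutil.which` stands in for the `where python` system call, and the '.exe' to '.dll' swap rests on the assumption stated above that the DLL sits alongside the exe.

```python
import os
import shutil

def candidate_python_dll(exe_path):
    """Swap the '.exe' suffix for '.dll', per the suggestion above
    (assumes the python DLL sits alongside the exe, as on standard
    Windows Python distributions)."""
    root, _ext = os.path.splitext(exe_path)
    return root + ".dll"

# shutil.which is a portable stand-in for a `where python` system call.
exe = shutil.which("python")
if exe:
    print(candidate_python_dll(exe))
```

Note that a system call is still needed to learn which interpreter is first on the PATH; this only avoids re-parsing its output.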
Re: [gdal-dev] VRT derived band pixel functions written in Python
Sean,

> I suspect users will find it very surprising if `which python` and the
> python interpreter used by VRT are not the same

Yes, we should probably make some effort to use the same python version as the one that comes as default given the PATH. I've just opened https://trac.osgeo.org/gdal/ticket/6652 regarding this.

> . Imagine starting python in
> an environment created by virtualenv or conda that contains extension
> modules like scikit-image, scikit-learn, &c. You import osgeo.gdal,

or rasterio ;-)

> open
> your super duper VRT with `gdal.Open()`, and get an error because your
> environment's extension modules aren't available in the VRT script. This
> could be tough to debug for a beginning to intermediate level Python
> programmer.

In the situation you describe here, things will work as expected. An external "libpythonXX.so", or the one specified by PYTHONSO, is only tried *after* checking whether the process already has the Python symbols loaded. Quoting http://gdal.org/gdal_vrttut.html#gdal_vrttut_derived_python :

"When GDAL will need to run Python code, it will first determine if the Python interpreter is loaded in the current process (which is the case if the program is a Python interpreter itself, or if another program, e.g. QGIS, has already loaded the CPython library). Otherwise it will look if the PYTHONSO configuration option is defined. [...] If the PYTHONSO configuration option is not defined, then a predefined list of shared objects will be tried [...]"

This order is strictly needed and was not what I implemented first. My first attempt honoured PYTHONSO first, but then if you did things like "PYTHONSO=libpython3.1.so python2.7 read_my_vrt.py", this crashed due to conflicting symbols being imported.

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev
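The resolution order quoted above from the docs can be condensed into a small decision function. This is an illustrative sketch only; the names are made up and the real logic lives in GDAL's C++ code.

```python
# Sketch of the python-library resolution order quoted above:
# 1. a CPython already loaded in the current process always wins,
# 2. otherwise the PYTHONSO config option is honoured,
# 3. otherwise a predefined list of shared objects is tried in order.
def resolve_python_lib(process_has_python, pythonso, predefined):
    if process_has_python:
        return "already-loaded interpreter"
    if pythonso:
        return pythonso
    return next(iter(predefined), None)
```

Checking the in-process interpreter first is what avoids the crash from "PYTHONSO=libpython3.1.so python2.7 read_my_vrt.py": python2.7's symbols are already loaded, so PYTHONSO is ignored rather than conflicting.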
Re: [gdal-dev] VRT derived band pixel functions written in Python
Hi Even, James:

I suspect users will find it very surprising if `which python` and the python interpreter used by VRT are not the same. Imagine starting python in an environment created by virtualenv or conda that contains extension modules like scikit-image, scikit-learn, &c. You import osgeo.gdal, open your super duper VRT with `gdal.Open()`, and get an error because your environment's extension modules aren't available in the VRT script. This could be tough to debug for a beginning to intermediate level Python programmer.

On Thu, Sep 15, 2016 at 10:29 AM, James Ramm wrote:
> Excellent, it's working after setting the PYTHONSO variable. Trying to
> match the version of python on the path would be smart, but it is probably
> 'good enough' to give PYTHONSO some prominence in the documentation - some
> kind of warning on how python is discovered perhaps?
>
> On 14 September 2016 at 20:20, James Ramm wrote:
>
>> Ah, that makes sense. I'll have to try the config option in the morning,
>> but it sounds like that could be it
>>
>> On 14 Sep 2016 8:06 p.m., "Even Rouault" wrote:
>>
>>> Le mercredi 14 septembre 2016 20:50:04, James Ramm a écrit :
>>> > Yes, it is loading a different python. The path printed by sys.path is
>>> > different to if I open the command window and type:
>>> >
>>> > python
>>> >
>>> > >>> import sys
>>> > >>> print(sys.path)
>>> >
>>> > Gdal and the python bindings were compiled with vs2015 and python 3.5,
>>> > and I can correctly import python in the 3.5 interpreter, yet somehow a
>>> > different python DLL (2.7) is being loaded at runtime.
>>>
>>> Might be related to the default try order in
>>>
>>>     const char* const apszPythonSO[] = { "python27.dll",
>>>                                          "python26.dll",
>>>                                          "python34.dll",
>>>                                          "python35.dll",
>>>                                          "python36.dll",
>>>                                          "python33.dll",
>>>                                          "python32.dll" };
>>>
>>> First found, first served.
>>>
>>> Hum, maybe we should try to match the version that issuing python on the
>>> command line would start.
We could potentially look at the PATH to see if
>>> there's something like "bla:\pythonXX" and try the related .dll... Or more
>>> costly, but more reliable, try issuing a
>>> 'python -c "import sys; print(str(sys.version_info[0]) + str(sys.version_info[1]))"'
>>> command.
>>>
>>> That's one of the downsides of requiring no dependency at build time.
>>>
>>> Anyway, you can override the default guess by setting the PYTHONSO config
>>> option to point to the desired python dll.
>>>
>>> By the way, I've committed the doc in
>>> https://trac.osgeo.org/gdal/changeset/35441 . Should be reflected online
>>> in a few hours.
Re: [gdal-dev] VRT derived band pixel functions written in Python
Excellent, it's working after setting the PYTHONSO variable. Trying to match the version of python on the path would be smart, but it is probably 'good enough' to give PYTHONSO some prominence in the documentation - some kind of warning on how python is discovered, perhaps?

On 14 September 2016 at 20:20, James Ramm wrote:
> Ah, that makes sense. I'll have to try the config option in the morning,
> but it sounds like that could be it
>
> On 14 Sep 2016 8:06 p.m., "Even Rouault" wrote:
>
>> Le mercredi 14 septembre 2016 20:50:04, James Ramm a écrit :
>> > Yes, it is loading a different python. The path printed by sys.path is
>> > different to if I open the command window and type:
>> >
>> > python
>> >
>> > >>> import sys
>> > >>> print(sys.path)
>> >
>> > Gdal and the python bindings were compiled with vs2015 and python 3.5,
>> > and I can correctly import python in the 3.5 interpreter, yet somehow a
>> > different python DLL (2.7) is being loaded at runtime.
>>
>> Might be related to the default try order in
>>
>>     const char* const apszPythonSO[] = { "python27.dll",
>>                                          "python26.dll",
>>                                          "python34.dll",
>>                                          "python35.dll",
>>                                          "python36.dll",
>>                                          "python33.dll",
>>                                          "python32.dll" };
>>
>> First found, first served.
>>
>> Hum, maybe we should try to match the version that issuing python on the
>> command line would start. We could potentially look at the PATH to see if
>> there's something like "bla:\pythonXX" and try the related .dll... Or more
>> costly, but more reliable, try issuing a
>> 'python -c "import sys; print(str(sys.version_info[0]) + str(sys.version_info[1]))"'
>> command.
>>
>> That's one of the downsides of requiring no dependency at build time.
>>
>> Anyway, you can override the default guess by setting the PYTHONSO config
>> option to point to the desired python dll.
>>
>> By the way, I've committed the doc in
>> https://trac.osgeo.org/gdal/changeset/35441 . Should be reflected online
>> in a few hours.
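Two equivalent ways of passing the PYTHONSO config option to a GDAL tool, sketched for illustration. The DLL path is an assumed example, not taken from the thread, and the actual invocation is left commented out.

```python
import os
import subprocess

# Assumed example path; adjust to your installation.
dll = r"C:\Python35\python35.dll"

# 1) As an environment variable for a child gdal_translate process:
env = dict(os.environ, PYTHONSO=dll)
cmd = ["gdal_translate", "in.vrt", "out.tif"]

# 2) As a --config switch on the gdal_translate command line:
cmd_config = ["gdal_translate", "in.vrt", "out.tif",
              "--config", "PYTHONSO", dll]

# subprocess.run(cmd, env=env)  # uncomment to actually run
```

Both forms set the same config option; --config keeps the setting scoped to a single command.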
Re: [gdal-dev] VRT derived band pixel functions written in Python
Ah, that makes sense. I'll have to try the config option in the morning, but it sounds like that could be it.

On 14 Sep 2016 8:06 p.m., "Even Rouault" wrote:
> Le mercredi 14 septembre 2016 20:50:04, James Ramm a écrit :
> > Yes, it is loading a different python. The path printed by sys.path is
> > different to if I open the command window and type:
> >
> > python
> >
> > >>> import sys
> > >>> print(sys.path)
> >
> > Gdal and the python bindings were compiled with vs2015 and python 3.5,
> > and I can correctly import python in the 3.5 interpreter, yet somehow a
> > different python DLL (2.7) is being loaded at runtime.
>
> Might be related to the default try order in
>
>     const char* const apszPythonSO[] = { "python27.dll",
>                                          "python26.dll",
>                                          "python34.dll",
>                                          "python35.dll",
>                                          "python36.dll",
>                                          "python33.dll",
>                                          "python32.dll" };
>
> First found, first served.
>
> Hum, maybe we should try to match the version that issuing python on the
> command line would start. We could potentially look at the PATH to see if
> there's something like "bla:\pythonXX" and try the related .dll... Or more
> costly, but more reliable, try issuing a
> 'python -c "import sys; print(str(sys.version_info[0]) + str(sys.version_info[1]))"'
> command.
>
> That's one of the downsides of requiring no dependency at build time.
>
> Anyway, you can override the default guess by setting the PYTHONSO config
> option to point to the desired python dll.
>
> By the way, I've committed the doc in
> https://trac.osgeo.org/gdal/changeset/35441 . Should be reflected online
> in a few hours.
>
> > I am on an
> > 'inherited' PC right now, so final thing to do is to ensure that the
> > gdal_translate I am running is the one I compiled and there isn't another
> > version lurking somewhere
> >
> > On 14 September 2016 at 16:50, Even Rouault wrote:
> > > Le mercredi 14 septembre 2016 17:24:53, jramm a écrit :
> > > > Trying to run this using a function relying on scipy.ndimage...
> > > >
> > > > When running gdal_translate on the VRT, I get ImportError: No module
> > > > named scipy.ndimage
> > > > This comes after successfully importing numpy. scipy.ndimage will
> > > > happily import within the python interpreter.
> > >
> > > Works for me for both inline or offline functions.
> > >
> > > Are you sure GDAL loads the same python lib as the python version used
> > > in the python interpreter? (check the debug traces with CPL_DEBUG=ON)
> > >
> > > You can also add at the top of your script
> > >
> > >     import sys
> > >     print(sys.path)
> > >
> > > and check if the output points to a location where your scipy package
> > > can be found.
> > >
> > > > Any tips on how to track this down/debug?
> > >
> > > > The entire VRT file is as follows:
> > >
> > > I guess this is not the entire VRT since it refers to an inline
> > > definition of the script but has empty content.
> > >
> > > [quoted VRT mangled by the archive; recoverable content: OSGB 1936 /
> > > British National Grid SRS (EPSG:27700), geotransform
> > > 100180.0,5.0,0.0,1215730.0,0.0,-5.0, a VRTDerivedRasterBand reading
> > > band 4 of F:\tif_data\large_sparse.tif (RasterXSize="111090",
> > > RasterYSize="259376"), with PixelFunctionType extract_blobs and
> > > PixelFunctionLanguage Python]
Re: [gdal-dev] VRT derived band pixel functions written in Python
Le mercredi 14 septembre 2016 20:50:04, James Ramm a écrit :
> Yes, it is loading a different python. The path printed by sys.path is
> different to if I open the command window and type:
>
> python
>
> >>> import sys
> >>> print(sys.path)
>
> Gdal and the python bindings were compiled with vs2015 and python 3.5, and
> I can correctly import python in the 3.5 interpreter, yet somehow a
> different python DLL (2.7) is being loaded at runtime.

Might be related to the default try order in

    const char* const apszPythonSO[] = { "python27.dll",
                                         "python26.dll",
                                         "python34.dll",
                                         "python35.dll",
                                         "python36.dll",
                                         "python33.dll",
                                         "python32.dll" };

First found, first served.

Hum, maybe we should try to match the version that issuing python on the command line would start. We could potentially look at the PATH to see if there's something like "bla:\pythonXX" and try the related .dll... Or more costly, but more reliable, try issuing a 'python -c "import sys; print(str(sys.version_info[0]) + str(sys.version_info[1]))"' command.

That's one of the downsides of requiring no dependency at build time.

Anyway, you can override the default guess by setting the PYTHONSO config option to point to the desired python dll.

By the way, I've committed the doc in https://trac.osgeo.org/gdal/changeset/35441 . Should be reflected online in a few hours.

> I am on an
> 'inherited' PC right now, so final thing to do is to ensure that the
> gdal_translate I am running is the one I compiled and there isn't another
> version lurking somewhere
>
> On 14 September 2016 at 16:50, Even Rouault wrote:
> > Le mercredi 14 septembre 2016 17:24:53, jramm a écrit :
> > > Trying to run this using a function relying on scipy.ndimage...
> > >
> > > When running gdal_translate on the VRT, I get ImportError: No module
> > > named scipy.ndimage
> > > This comes after successfully importing numpy. scipy.ndimage will
> > > happily import within the python interpreter.
> >
> > Works for me for both inline or offline functions.
> >
> > Are you sure GDAL loads the same python lib as the python version used in
> > the python interpreter? (check the debug traces with CPL_DEBUG=ON)
> >
> > You can also add at the top of your script
> >
> >     import sys
> >     print(sys.path)
> >
> > and check if the output points to a location where your scipy package can
> > be found.
> >
> > > Any tips on how to track this down/debug?
> >
> > > The entire VRT file is as follows:
> >
> > I guess this is not the entire VRT since it refers to an inline
> > definition of the script but has empty content.
> >
> > [quoted VRT mangled by the archive; recoverable content: OSGB 1936 /
> > British National Grid SRS (EPSG:27700), geotransform
> > 100180.0,5.0,0.0,1215730.0,0.0,-5.0, a VRTDerivedRasterBand reading
> > band 4 of F:\tif_data\large_sparse.tif with PixelFunctionType
> > extract_blobs and PixelFunctionLanguage Python]
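The "more costly, but more reliable" probe Even suggests can be exercised from Python itself. In this sketch, `sys.executable` stands in for the bare `python` found on the PATH, so it runs portably; the DLL name it prints is only illustrative of the matching logic.

```python
import subprocess
import sys

# Ask an interpreter which version it is, exactly as the command in the
# thread does, then derive the matching pythonXY.dll name.
out = subprocess.check_output(
    [sys.executable, "-c",
     "import sys; print(str(sys.version_info[0]) + str(sys.version_info[1]))"])
digits = out.decode().strip()
print("python" + digits + ".dll")
```

For example, a Python 3.5 interpreter would answer "35", giving "python35.dll".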
Re: [gdal-dev] VRT derived band pixel functions written in Python
Le mercredi 14 septembre 2016 17:24:53, jramm a écrit :
> Trying to run this using a function relying on scipy.ndimage...
>
> When running gdal_translate on the VRT, I get ImportError: No module named
> scipy.ndimage
> This comes after successfully importing numpy. scipy.ndimage will happily
> import within the python interpreter.

Works for me for both inline or offline functions.

Are you sure GDAL loads the same python lib as the python version used in the python interpreter? (check the debug traces with CPL_DEBUG=ON)

You can also add at the top of your script

    import sys
    print(sys.path)

and check if the output points to a location where your scipy package can be found.

> Any tips on how to track this down/debug?

> The entire VRT file is as follows:

I guess this is not the entire VRT since it refers to an inline definition of the script but has empty content.

> [quoted VRT mangled by the archive; recoverable content: OSGB 1936 /
> British National Grid SRS (EPSG:27700), geotransform
> 100180.0,5.0,0.0,1215730.0,0.0,-5.0, a VRTDerivedRasterBand reading
> band 4 of F:\tif_data\large_sparse.tif with PixelFunctionType
> extract_blobs and PixelFunctionLanguage Python]
Re: [gdal-dev] VRT derived band pixel functions written in Python
Trying to run this using a function relying on scipy.ndimage...

When running gdal_translate on the VRT, I get ImportError: No module named scipy.ndimage
This comes after successfully importing numpy. scipy.ndimage will happily import within the python interpreter.

Any tips on how to track this down/debug?

The entire VRT file is as follows:

[VRT markup stripped by the archive; recoverable content: OSGB 1936 / British National Grid SRS (EPSG:27700), geotransform 100180.0,5.0,0.0,1215730.0,0.0,-5.0, a VRTDerivedRasterBand reading band 4 of F:\tif_data\large_sparse.tif, with PixelFunctionType extract_blobs, PixelFunctionLanguage Python, and values 4, TRUE and 5 whose element tags were lost]

--
View this message in context: http://osgeo-org.1560.x6.nabble.com/gdal-dev-VRT-derived-band-pixel-functions-written-in-Python-tp5285323p5285883.html
Sent from the GDAL - Dev mailing list archive at Nabble.com.
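For reference, a minimal skeleton for the `extract_blobs` function named in the VRT, using the argument list GDAL passes to Python pixel functions per gdal_vrttut.html. The body is a placeholder (it just copies the first source band through), and printing sys.path on entry is the debugging trick suggested elsewhere in the thread to see which environment GDAL's embedded Python actually uses.

```python
import sys

def extract_blobs(in_ar, out_ar, xoff, yoff, xsize, ysize,
                  raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
    # Debug aid: confirm which module search path the embedded Python sees.
    print(sys.path)
    # Placeholder body: copy the (single) source band through unchanged.
    out_ar[:] = in_ar[0]
```

In real use, in_ar and out_ar are numpy arrays; the scipy.ndimage call that fails to import would go in the body.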
Re: [gdal-dev] VRT derived band pixel functions written in Python
Le mercredi 14 septembre 2016 09:02:07, Rutger a écrit :
> Then I guess I'm mistaken in thinking that Python would become a slowdown
> in such a case. It's been a while since I tested it. Anyway, thanks for
> your comments and efforts to improve the performance.
>
> I couldn't find any Conda channels* which build from trunk, so I probably
> have to wait a while before I can try it out, something to look forward
> to. :)
>
> * https://anaconda.org/search?q=gdal

You can also grab a nightly build at http://www.gisinternals.com/development.php

> Even Rouault-2 wrote
> > Le mardi 13 septembre 2016 11:07:39, Rutger a écrit :
> >
> > With all that the cost of the Python layer becomes negligible (except
> > loading the Python environment the first time, if not already loaded,
> > but for a computation that will be longer than a few seconds, that's
> > not really a big deal)
Re: [gdal-dev] VRT derived band pixel functions written in Python
Then I guess I'm mistaken in thinking that Python would become a slowdown in such a case. It's been a while since I tested it. Anyway, thanks for your comments and efforts to improve the performance.

I couldn't find any Conda channels* which build from trunk, so I probably have to wait a while before I can try it out, something to look forward to. :)

* https://anaconda.org/search?q=gdal

Regards,
Rutger

Even Rouault-2 wrote
> Le mardi 13 septembre 2016 11:07:39, Rutger a écrit :
>
> With all that the cost of the Python layer becomes negligible (except
> loading the Python environment the first time, if not already loaded, but
> for a computation that will be longer than a few seconds, that's not
> really a big deal)
Re: [gdal-dev] VRT derived band pixel functions written in Python
Le mardi 13 septembre 2016 21:22:20, James Ramm a écrit :
> I think you can call SWIG with the -threads argument on the command line so
> it will always release the GIL. Could be an easy option if it works

That's mostly what I've done. See my other message:
https://lists.osgeo.org/pipermail/gdal-dev/2016-September/045155.html

> On Tuesday, 13 September 2016, Even Rouault wrote:
> > Le mardi 13 septembre 2016 11:07:39, Rutger a écrit :
> > > I overlooked the fact that it still moves through Python, is that the
> > > 'only' hurdle preventing parallel IO?
> >
> > I'm not sure I understand your question. But if you have several sources,
> > you could potentially do parallelized reading of them from the Python
> > code by using Python threads and the GDAL Python API. But looking at the
> > SWIG-generated code, it doesn't seem that SWIG releases the GIL
> > automatically before calling native code. Hum... So that should probably
> > be added manually, at least around GDALRasterIO() calls, otherwise you'll
> > get zero perf improvement.
> >
> > > Since gdalwarp for example has the -multi flag, it seems as if GDAL is
> > > capable of it, or is that a specific/specialized implementation?
> >
> > Parallelized I/O doesn't mean much by itself without more context. You
> > may want to parallelize reading of different regions of the same dataset,
> > or parallelize reading of different datasets. Since GDAL objects are not
> > thread-safe, the first case (reading different regions of the same
> > dataset) can be reduced to the second one by opening several datasets for
> > the same filename.
> >
> > Regarding gdalwarp -multi, here's how it works. When you warp a dataset,
> > a list of all the chunks (windows) to be processed is generated.
> > gdalwarp -multi does the following:
> >
> >     Thread I/O                   Thread computation
> >     Read data for chunk 1
> >     Read data for chunk 2        Do calculations for chunk 1
> >     Write output of chunk 1      Do calculations for chunk 2
> >     Read data for chunk 3
> >     Write output of chunk 2      Do calculations for chunk 3
> >
> > > Numba has several options which might eliminate using Python during
> > > execution. There are C callbacks:
> > > http://numba.pydata.org/numba-doc/dev/user/cfunc.html
> >
> > You can also use @jit(nopython=True, nogil=True) and your Python method
> > will end up being pure native code (provided that you don't use too
> > high-level stuff, otherwise the jit'ification will fail with an
> > exception).
> >
> > And for code that is not inlined in the VRT, you can also add cache=True
> > so that the jit'ification can be reused.
> >
> > With all that, the cost of the Python layer becomes negligible (except
> > loading the Python environment the first time, if not already loaded, but
> > for a computation that will take longer than a few seconds, that's not
> > really a big deal)
> >
> > --
> > Spatialys - Geospatial professional services
> > http://www.spatialys.com

--
Spatialys - Geospatial professional services
http://www.spatialys.com
Re: [gdal-dev] VRT derived band pixel functions written in Python
I think you can call SWIG with the -threads argument on the command line so it will always release the GIL. Could be an easy option if it works

On Tuesday, 13 September 2016, Even Rouault wrote:
> Le mardi 13 septembre 2016 11:07:39, Rutger a écrit :
> > I overlooked the fact that it still moves through Python, is that the
> > 'only' hurdle preventing parallel IO?
>
> I'm not sure I understand your question. But if you have several sources,
> you could potentially do parallelized reading of them from the Python code
> by using Python threads and the GDAL Python API. But looking at the
> SWIG-generated code, it doesn't seem that SWIG releases the GIL
> automatically before calling native code. Hum... So that should probably
> be added manually, at least around GDALRasterIO() calls, otherwise you'll
> get zero perf improvement.
>
> > Since gdalwarp for example has the -multi flag, it seems as if GDAL is
> > capable of it, or is that a specific/specialized implementation?
>
> Parallelized I/O doesn't mean much by itself without more context. You may
> want to parallelize reading of different regions of the same dataset, or
> parallelize reading of different datasets. Since GDAL objects are not
> thread-safe, the first case (reading different regions of the same
> dataset) can be reduced to the second one by opening several datasets for
> the same filename.
>
> Regarding gdalwarp -multi, here's how it works. When you warp a dataset, a
> list of all the chunks (windows) to be processed is generated.
> gdalwarp -multi does the following:
>
>     Thread I/O                   Thread computation
>     Read data for chunk 1
>     Read data for chunk 2        Do calculations for chunk 1
>     Write output of chunk 1      Do calculations for chunk 2
>     Read data for chunk 3
>     Write output of chunk 2      Do calculations for chunk 3
>
> > Numba has several options which might eliminate using Python during
> > execution. There are C callbacks:
> > http://numba.pydata.org/numba-doc/dev/user/cfunc.html
>
> You can also use @jit(nopython=True, nogil=True) and your Python method
> will end up being pure native code (provided that you don't use too
> high-level stuff, otherwise the jit'ification will fail with an exception).
>
> And for code that is not inlined in the VRT, you can also add cache=True
> so that the jit'ification can be reused.
>
> With all that, the cost of the Python layer becomes negligible (except
> loading the Python environment the first time, if not already loaded, but
> for a computation that will take longer than a few seconds, that's not
> really a big deal)
>
> --
> Spatialys - Geospatial professional services
> http://www.spatialys.com
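Even's suggestion above - work around non-thread-safe dataset objects by opening one dataset handle per thread on the same file - can be sketched as follows. This is only an illustration: the filename, band number and block size are placeholders, and the osgeo bindings are assumed to be installed when `read_worker` actually runs.

```python
import threading

def split_windows(xsize, ysize, block=256):
    """Tile a raster into (xoff, yoff, win_xsize, win_ysize) read windows."""
    windows = []
    for yoff in range(0, ysize, block):
        for xoff in range(0, xsize, block):
            windows.append((xoff, yoff,
                            min(block, xsize - xoff),
                            min(block, ysize - yoff)))
    return windows

def read_worker(filename, windows, results):
    # One dataset handle *per thread*: GDAL objects are not thread-safe,
    # but separate handles on the same filename are fine (per Even's note).
    from osgeo import gdal
    band = gdal.Open(filename).GetRasterBand(1)
    for xoff, yoff, xs, ys in windows:
        results[(xoff, yoff)] = band.ReadAsArray(xoff, yoff, xs, ys)

def parallel_read(filename, xsize, ysize, nthreads=4):
    windows = split_windows(xsize, ysize)
    results = {}
    chunks = [windows[i::nthreads] for i in range(nthreads)]
    threads = [threading.Thread(target=read_worker, args=(filename, c, results))
               for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Note that, as discussed above, the reads only truly overlap where the GIL is released (inside GDAL's I/O once the bindings do release it); the windowing logic itself is independent of that.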
Re: [gdal-dev] VRT derived band pixel functions written in Python
Le mardi 13 septembre 2016 11:07:39, Rutger a écrit :
> I overlooked the fact that it still moves through Python, is that the
> 'only' hurdle preventing parallel IO?

I'm not sure I understand your question. But if you have several sources, you could potentially do parallelized reading of them from the Python code by using Python threads and the GDAL Python API. But looking at the SWIG-generated code, it doesn't seem that SWIG releases the GIL automatically before calling native code. Hum... So that should probably be added manually, at least around GDALRasterIO() calls, otherwise you'll get zero perf improvement.

> Since gdalwarp for example has the -multi flag, it seems as if GDAL is
> capable of it, or is that a specific/specialized implementation?

Parallelized I/O doesn't mean much by itself without more context. You may want to parallelize reading of different regions of the same dataset, or parallelize reading of different datasets. Since GDAL objects are not thread-safe, the first case (reading different regions of the same dataset) can be reduced to the second one by opening several datasets for the same filename.

Regarding gdalwarp -multi, here's how it works. When you warp a dataset, a list of all the chunks (windows) to be processed is generated. gdalwarp -multi does the following:

    Thread I/O                   Thread computation
    Read data for chunk 1
    Read data for chunk 2        Do calculations for chunk 1
    Write output of chunk 1      Do calculations for chunk 2
    Read data for chunk 3
    Write output of chunk 2      Do calculations for chunk 3

> Numba has several options which might eliminate using Python during
> execution. There are C callbacks:
> http://numba.pydata.org/numba-doc/dev/user/cfunc.html

You can also use @jit(nopython=True, nogil=True) and your Python method will end up being pure native code (provided that you don't use too high-level stuff, otherwise the jit'ification will fail with an exception).

And for code that is not inlined in the VRT, you can also add cache=True so that the jit'ification can be reused.

With all that, the cost of the Python layer becomes negligible (except loading the Python environment the first time, if not already loaded, but for a computation that will take longer than a few seconds, that's not really a big deal)

--
Spatialys - Geospatial professional services
http://www.spatialys.com
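As a concrete illustration of the @jit(nopython=True, nogil=True) advice above, here is a sketch of a pixel function that delegates the hot loop to a numba-compiled kernel. The function and argument names are illustrative (not GDAL API), and the block falls back to plain Python when numba is not installed; for code kept in a real .py file you could additionally pass cache=True, as mentioned above, so the compilation is reused across runs.

```python
import numpy as np

try:
    # numba JIT-compiles the kernel to native code and can release the GIL
    from numba import jit
except ImportError:
    # graceful fallback: a no-op decorator so the sketch still runs
    def jit(**_kwargs):
        def wrap(func):
            return func
        return wrap

@jit(nopython=True, nogil=True)
def _scale_kernel(src, dst, factor):
    # plain element-wise loops: numba turns these into native code;
    # without numba they still work, just slowly
    for i in range(src.shape[0]):
        for j in range(src.shape[1]):
            dst[i, j] = src[i, j] * factor

def multiply(in_ar, out_ar, xoff, yoff, xsize, ysize,
             raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
    # GDAL passes PixelFunctionArguments values as strings, hence float();
    # the "factor" argument name is an assumption for this sketch
    _scale_kernel(in_ar[0], out_ar, float(kwargs.get("factor", "1.5")))
```

Keeping the numba-decorated kernel separate from the entry point matters: nopython mode rejects `**kwargs` and other high-level constructs, which is exactly the "too high-level stuff" failure mode Even mentions.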
Re: [gdal-dev] VRT derived band pixel functions written in Python
I overlooked the fact that it still moves through Python, is that the 'only' hurdle preventing parallel IO? Since gdalwarp for example has the -multi flag, it seems as if GDAL is capable of it, or is that a specific/specialized implementation?

Numba has several options which might eliminate using Python during execution. There are C callbacks: http://numba.pydata.org/numba-doc/dev/user/cfunc.html It probably works by using the Numpy C-API (which I have zero experience with). I don't know if it's possible for other programs, like GDAL, to also use those compiled functions without moving through Python first.

There is also ahead-of-time compilation (AOT): http://numba.pydata.org/numba-doc/dev/user/pycc.html AOT has the benefit that users only need Numpy as a dependency, and don't need Numba/llvm. There are some drawbacks as well, like no longer being able to compile optimizations for the hardware it's running on.

Regards,
Rutger

Even Rouault-2 wrote
> Le mardi 13 septembre 2016 09:02:09, Rutger a écrit :
>
> I've only scratched the surface of Numba (didn't know it 2 days ago). I
> guess this might work (might only be interesting if
> VRTDerivedRasterBand::IRasterIO() is called with a big enough region,
> which depends on the pixel access pattern of the upper layers). The VRT
> driver just calls a Python function that takes numpy arrays and a few
> extra args.
>
> > And would that give parallel processing for both IO and calculations?
>
> Only calculations. I/O is done before going to Python and after returning
> from it.
>
> Actually... if you specify zero sources in the VRT, then it is up to you
> to do the read operations the way you like in Python, so you could
> possibly parallelize them there.
>
> Even
>
> --
> Spatialys - Geospatial professional services
> http://www.spatialys.com

--
View this message in context: http://osgeo-org.1560.x6.nabble.com/gdal-dev-VRT-derived-band-pixel-functions-written-in-Python-tp5285323p5285482.html
Sent from the GDAL - Dev mailing list archive at Nabble.com.
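Rutger's pointer to numba's C callbacks can be sketched as follows: @cfunc compiles a Python function into a C-callable with a raw function pointer (`.address`) that native code could invoke without entering the interpreter. The double(double) signature is illustrative, and the block degrades to the pure-Python function when numba is absent.

```python
try:
    from numba import cfunc, types
except ImportError:
    cfunc = None  # numba not installed: only the pure-Python path runs

def square(x):
    # the kernel we would like callable from C without the interpreter
    return x * x

if cfunc is not None:
    # compile to a C callback with signature double(double);
    # c_square.address is a raw function pointer usable from C code,
    # c_square.ctypes a ctypes wrapper for checking it from Python
    c_square = cfunc(types.float64(types.float64))(square)
    assert c_square.ctypes(3.0) == 9.0

print(square(4.0))  # prints 16.0
```

Whether GDAL could consume such a pointer directly is exactly the open question Rutger raises; nothing in the thread says it can today.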
Re: [gdal-dev] VRT derived band pixel functions written in Python
Le mardi 13 septembre 2016 09:02:09, Rutger a écrit :
> Hi Even,
>
> This is really amazing. I'm becoming more and more a fan of Numba for
> number crunching, so this certainly makes my day. As soon as I can find a
> Win x64 dev version on a Conda channel I'll give it a try.
>
> A use case that comes to mind, and which I run into regularly, is when I
> want to do some simple aggregation before using something like gdalwarp.
> For example when you have a file containing 24 hourly temperature values,
> and you are only interested in the daily mean. Currently I either
> aggregate before warping and write the intermediates to disk, or aggregate
> after warping, which is computationally inefficient. Neither is optimal.
>
> A few questions: can you access a file's metadata from within a pixel
> function? This would perhaps allow, for example, interpolating atmospheric
> data to the overpass time of a satellite image.

This is indeed a question I've asked myself: does the prototype of the pixel function contain enough information? I had not initially put the geotransform and whole raster dimensions, for example, but found it was necessary to get correct behaviour for hillshading at different zoom scales. And I also added afterwards the xoff, yoff to be able to do Mandelbrot generation (I guess there might be some more valid use cases ;-)). And then the user-provided dictionary to allow some parametrization of the algorithm instead of hardcoding constants in it, so that you can off-load it into a general lib. And I stopped at that point.

For your use case, there are currently two possibilities:
- either you get the metadata items you need at VRT generation time and pass them as arguments,
- or you pass only the names of the sources as arguments, and use the GDAL Python bindings themselves inside the pixel function to get access to everything needed. But I'm just thinking it might be a bit inefficient to do that for each RasterIO() request.

Perhaps a compromise would be to allow, in addition to simple functions, specifying a class name, where the constructor would receive the values that don't change from one call to another, and a calc() method would receive the ones that change at each RasterIO() request:

class MyProcessingClass:
    def __init__(self, raster_xsize, raster_ysize, buf_radius, gt,
                 source_filenames, **kwargs):
        # save the above parameters that you may need in calc()
        # do one-time things like gdal.Open()'ing the sources to get metadata
        pass

    def calc(self, in_ar, out_ar, xoff, yoff, xsize, ysize):
        # do your computation
        pass

We could possibly pass the sources as datasets themselves, since they are actually already opened, but that would make the availability of the osgeo.gdal bindings a requirement (well, we could fall back to passing the filenames if the import fails).

> Do the pixel functions also work with @numba.vectorize(), in particular
> when targeting 'parallel' or 'cuda'?

I've only scratched the surface of Numba (didn't know it 2 days ago). I guess this might work (might only be interesting if VRTDerivedRasterBand::IRasterIO() is called with a big enough region, which depends on the pixel access pattern of the upper layers). The VRT driver just calls a Python function that takes numpy arrays and a few extra args.

> And would that give parallel processing for both IO and calculations?

Only calculations. I/O is done before going to Python and after returning from it.

Actually... if you specify zero sources in the VRT, then it is up to you to do the read operations the way you like in Python, so you could possibly parallelize them there.

> You should give the folks at Continuum a heads up, I'm sure they'd
> appreciate seeing Numba used like this.

I have no connection with them, but yes, their project rocks.

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com
Re: [gdal-dev] VRT derived band pixel functions written in Python
Le lundi 12 septembre 2016 23:38:41, James Ramm a écrit :
> Incredible. Really cool initiative.
>
> I had a play around and wanted a way to be able to write the pixel
> function in a 'stand alone' Python module and generate the VRT from it.
> This would allow independent testing and easier maintenance of the Python
> code. It is fairly easy using lxml to build up the VRT dynamically:
>
> https://github.com/JamesRamm/gdal_pixel_functions
>
> In the example I more or less recreate the hillshade example Even posted,
> although there is a lot of functionality missing from the VRT builder
> still, so some nodes are not present. It would be quite easy, therefore,
> to expand this into a sort of GDAL 'algorithms library' (I think it can
> wholesale replace what I had tried here:
> https://github.com/JamesRamm/GeoAlg).

A "-of VRT" option in the current gdal_calc.py could also be cool (at least for simple one-line operations)

Programmatic creation of a VRT derived band, with the previous scope (i.e. calling a natively registered pixel function), already existed. See https://github.com/OSGeo/gdal/blob/trunk/autotest/gdrivers/vrtderived.py#L175 I didn't extend the VRTDataset::AddBand() implementation for now to set the new elements needed for the Python stuff, but that would certainly be doable.

> One thing I would note (if I understand correctly) is that you have
> specified any "import" statement as unsafe except numpy/math and numba? It
> might be better to either allow imports or not allow them at all (unless
> the config option is set), as I don't see why those libraries would be any
> more or any less 'safe' than another library... e.g. another standard
> library module.

Well, the idea was that those modules should only do math stuff and not interact with the rest of the system (contrary to 'os', 'system' and the like). But in the numpy case, that is actually not true, since it has I/O methods, hence my naive attempt to blacklist those by name.

> I would agree with Sean and remove 'IF_SAFE' and leave it entirely to the
> user's discretion.

That's what I've done (see previous message)

> Many users who are working with their own local data will likely have no
> problems about just setting it to 'YES', and I imagine users for whom it
> would be a problem might already have their own rules on what constitutes
> 'safe'?
>
> On a side note - does this new functionality overlap considerably with the
> gdal map algebra project?

Indeed. I think there might be several incarnations of such a capability, and I'm not sure we can come up with a single one that will satisfy all needs and constraints. The VRT way has some use cases where it shines (on-the-fly computation of a region of interest; easy integration with upper levels in the stack: QGIS, MapServer, etc...), whereas you could imagine something else efficient at doing whole-raster processing with parallelism etc...

--
Spatialys - Geospatial professional services
http://www.spatialys.com
Re: [gdal-dev] VRT derived band pixel functions written in Python
Hi Even,

This is really amazing. I'm becoming more and more a fan of Numba for number crunching, so this certainly makes my day. As soon as I can find a Win x64 dev version on a Conda channel I'll give it a try.

A use case that comes to mind, and which I run into regularly, is when I want to do some simple aggregation before using something like gdalwarp. For example when you have a file containing 24 hourly temperature values, and you are only interested in the daily mean. Currently I either aggregate before warping and write the intermediates to disk, or aggregate after warping, which is computationally inefficient. Neither is optimal.

A few questions: can you access a file's metadata from within a pixel function? This would perhaps allow, for example, interpolating atmospheric data to the overpass time of a satellite image. Do the pixel functions also work with @numba.vectorize(), in particular when targeting 'parallel' or 'cuda'? And would that give parallel processing for both IO and calculations?

You should give the folks at Continuum a heads up, I'm sure they'd appreciate seeing Numba used like this.

Regards,
Rutger

--
View this message in context: http://osgeo-org.1560.x6.nabble.com/gdal-dev-VRT-derived-band-pixel-functions-written-in-Python-tp5285323p5285450.html
Sent from the GDAL - Dev mailing list archive at Nabble.com.
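Rutger's 24-hourly-values use case maps naturally onto the new mechanism: list each hourly band as a source, and average across in_ar. This is only a sketch - the function name is illustrative, and the signature follows the pixel-function argument list from Even's announcement:

```python
import numpy as np

def daily_mean(in_ar, out_ar, xoff, yoff, xsize, ysize,
               raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
    # in_ar holds one numpy array per source (24 hourly slices here);
    # stacking and averaging over the source axis yields the per-pixel
    # daily mean, written straight into the VRT's output buffer
    np.mean(np.stack(in_ar), axis=0, out=out_ar)
```

Pointed at by PixelFunctionType in a derived band with 24 SimpleSource elements, this would do the aggregation on the fly, avoiding both the intermediate file and the aggregate-after-warp inefficiency Rutger describes.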
Re: [gdal-dev] VRT derived band pixel functions written in Python
Incredible. Really cool initiative.

I had a play around and wanted a way to be able to write the pixel function in a 'stand alone' Python module and generate the VRT from it. This would allow independent testing and easier maintenance of the Python code. It is fairly easy using lxml to build up the VRT dynamically:

https://github.com/JamesRamm/gdal_pixel_functions

In the example I more or less recreate the hillshade example Even posted, although there is a lot of functionality missing from the VRT builder still, so some nodes are not present. It would be quite easy, therefore, to expand this into a sort of GDAL 'algorithms library' (I think it can wholesale replace what I had tried here: https://github.com/JamesRamm/GeoAlg).

One thing I would note (if I understand correctly) is that you have specified any "import" statement as unsafe except numpy/math and numba? It might be better to either allow imports or not allow them at all (unless the config option is set), as I don't see why those libraries would be any more or any less 'safe' than another library... e.g. another standard library module.

I would agree with Sean and remove 'IF_SAFE' and leave it entirely to the user's discretion. Many users who are working with their own local data will likely have no problems about just setting it to 'YES', and I imagine users for whom it would be a problem might already have their own rules on what constitutes 'safe'?

On a side note - does this new functionality overlap considerably with the gdal map algebra project?

On 12 September 2016 at 19:25, Even Rouault wrote:
> > I found http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html
> > to be a good intro to the risks of eval'ing untrusted Python code.
> > Mentioned in there is a notable attempt to make a secure subset of Python
> > called "pysandbox", but its developer has since declared it "broken by
> > design": https://lwn.net/Articles/574215/.
> > I'm not knowledgeable enough about sandboxing (OS or otherwise) to say
> > if that's right.
>
> Those links are fascinating. I wouldn't have imagined that there was valid
> Python code that could crash the Python interpreter, almost by design! (so
> the osgeo.gdal gotchas are not so bad after all :-) )
> OK, so I've followed your suggestion:
> * IF_SAFE mode is removed (actually between #if 0 / #endif, if someone
> wants to pursue the sandboxing effort),
> * and the default of GDAL_VRT_ENABLE_PYTHON is now NO.
>
> > I see that in GDAL 2.0+ we can set options in the VRT XML itself. Is it
> > possible to set GDAL_VRT_ENABLE_PYTHON=YES in a VRT and thus override
> > the reader's own trust policies?
>
> No, that's not possible. GDAL_VRT_ENABLE_PYTHON is a config option only,
> so it can only be set as an env variable or through CPLSetConfigOption().
>
> We could imagine providing it as an option of the VRT indeed, but then
> there would be interesting situations like the following. Imagine
> something called "my.tif" with the following content :
>
> Gray
>
> YES
>
> The internalized VRT would then be allowed with ENABLE_PYTHON=YES.
>
> So it's probably a good idea to not provide an open option for now. One
> could perhaps imagine introducing one, provided we make it illegal as an
> item in , so it can only be provided by "top-level" GDALOpenEx() calls.
>
> > My ignorance of how GDAL separates "open" options from "config" options
> > might be on display in this question.
>
> They are just two different mechanisms for providing options. There's no
> automatic bridge between both concepts. Sometimes an option may exist in
> both worlds (because historically there were only global config options,
> but in some use cases an option mostly makes sense per dataset) or in
> just one of them.
> --
> Spatialys - Geospatial professional services
> http://www.spatialys.com
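James's generate-the-VRT-from-a-module approach can be prototyped with just the standard library: build the derived-band elements programmatically and serialize them. Element names follow the VRT schema described in this thread; the function name, sizes and source filenames are illustrative, and lxml would work the same way as the stdlib ElementTree used here.

```python
import xml.etree.ElementTree as ET

def build_derived_vrt(xsize, ysize, func_name, code, sources,
                      dtype="Float32"):
    """Serialize a minimal Python-derived-band VRT to an XML string."""
    root = ET.Element("VRTDataset",
                      rasterXSize=str(xsize), rasterYSize=str(ysize))
    band = ET.SubElement(root, "VRTRasterBand", dataType=dtype, band="1",
                         subClass="VRTDerivedRasterBand")
    ET.SubElement(band, "PixelFunctionType").text = func_name
    ET.SubElement(band, "PixelFunctionLanguage").text = "Python"
    # the inlined module code; ElementTree escapes it on serialization
    ET.SubElement(band, "PixelFunctionCode").text = code
    for filename in sources:
        src = ET.SubElement(band, "SimpleSource")
        ET.SubElement(src, "SourceFilename",
                      relativeToVRT="1").text = filename
    return ET.tostring(root, encoding="unicode")

CODE = (
    "import numpy as np\n"
    "def add(in_ar, out_ar, *args, **kwargs):\n"
    "    out_ar[:] = np.sum(in_ar, axis=0)\n"
)
vrt_xml = build_derived_vrt(20, 20, "add", CODE, ["byte.tif", "byte2.tif"])
```

Keeping the pixel function in a real module (and emitting only `my_module.add` in PixelFunctionType, with no PixelFunctionCode element) gives the independent testability James is after.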
Re: [gdal-dev] VRT derived band pixel functions written in Python
> I found http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html
> to be a good intro to the risks of eval'ing untrusted Python code.
> Mentioned in there is a notable attempt to make a secure subset of Python
> called "pysandbox", but its developer has since declared it "broken by
> design": https://lwn.net/Articles/574215/. I'm not knowledgeable enough
> about sandboxing (OS or otherwise) to say if that's right.

Those links are fascinating. I wouldn't have imagined that there was valid Python code that could crash the Python interpreter, almost by design! (so the osgeo.gdal gotchas are not so bad after all :-) )

OK, so I've followed your suggestion:
* IF_SAFE mode is removed (actually between #if 0 / #endif, if someone wants to pursue the sandboxing effort),
* and the default of GDAL_VRT_ENABLE_PYTHON is now NO.

> I see that in GDAL 2.0+ we can set options in the VRT XML itself. Is it
> possible to set GDAL_VRT_ENABLE_PYTHON=YES in a VRT and thus override the
> reader's own trust policies?

No, that's not possible. GDAL_VRT_ENABLE_PYTHON is a config option only, so it can only be set as an env variable or through CPLSetConfigOption().

We could imagine providing it as an option of the VRT indeed, but then there would be interesting situations like the following. Imagine something called "my.tif" with the following content :

Gray

YES

The internalized VRT would then be allowed with ENABLE_PYTHON=YES.

So it's probably a good idea to not provide an open option for now. One could perhaps imagine introducing one, provided we make it illegal as an item in , so it can only be provided by "top-level" GDALOpenEx() calls.

> My ignorance of how GDAL separates "open" options from "config" options
> might be on display in this question.

They are just two different mechanisms for providing options. There's no automatic bridge between both concepts. Sometimes an option may exist in both worlds (because historically there were only global config options, but in some use cases an option mostly makes sense per dataset) or in just one of them.

--
Spatialys - Geospatial professional services
http://www.spatialys.com
Re: [gdal-dev] VRT derived band pixel functions written in Python
Le lundi 12 septembre 2016 16:59:34, Poughon Victor a écrit :
> Hi Even,
>
> This is a really cool and really impressive feature!
> Calling Python code from C++ without development packages as a dependency
> sounds like black magic to me.

No black magic. Just setting function pointers using dlopen():
https://github.com/OSGeo/gdal/blob/trunk/gdal/frmts/vrt/vrtderivedrasterband.cpp#L390

> Obviously Python symbols have to be there at some point to execute Python
> code, so this is only usable from a binary that happens to load them
> already (CPython, QGIS, etc.), correct?

No, there are two modes:
- GDAL is loaded after something else has already loaded Python (so your CPython, QGIS)
- GDAL cannot find any Python symbols among the already available symbols of the process, in which case it will try to load a few reasonable shared objects like "libpython2.7.so", etc. See https://github.com/OSGeo/gdal/blob/trunk/gdal/frmts/vrt/vrtderivedrasterband.cpp#L238 and the lines below.

> But I'm confused because later you mention incompatibility issues between
> the CPython and PyPy API. GDAL's secret sauce, I guess...?

Nothing GDAL-specific here. It is just that the way to embed PyPy from C is completely different from CPython's: http://doc.pypy.org/en/latest/embedding.html vs https://docs.python.org/2/extending/embedding.html

> I'm also curious why it's possible to use numba but not pypy, which AFAIK
> are both Python JITs?

numba is "just" a CPython extension (a non-trivial one, dragging in llvmlite etc...), so once you support CPython, using it comes for free, provided you use the right imports and function decorations in your Python code.

> And finally did you consider using Cython (which claims pypy
> compatibility)?

No, I didn't investigate that.

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com
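Related to the shared-object probing Even describes (and to the PYTHONSO discussion elsewhere in the thread): from inside an interpreter, the name of the matching libpython can be derived rather than guessed. `candidate_pythonso` is a hypothetical helper, not GDAL API; INSTSONAME is a standard sysconfig variable on Unix CPython builds, and the Windows fallback is just the conventional installer naming.

```python
import sys
import sysconfig

def candidate_pythonso():
    # Shared-library name of the *running* interpreter, e.g.
    # "libpython2.7.so.1.0" on Linux - a value one could feed to the
    # PYTHONSO config option so GDAL loads a matching Python.
    soname = sysconfig.get_config_var("INSTSONAME")
    if soname:
        return soname
    # Windows builds generally don't define INSTSONAME; fall back to
    # the conventional "pythonXY.dll" naming used by CPython installers.
    return "python%d%d.dll" % (sys.version_info[0], sys.version_info[1])

print(candidate_pythonso())
```

This is essentially a scripted version of Even's `python -c "import sys; print(...)"` idea from the top of the thread, with the advantage that it reports the actual library name rather than reconstructing it from the version number.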
Re: [gdal-dev] VRT derived band pixel functions written in Python
Hi Even,

This is a really cool and really impressive feature! Calling Python code from C++ without development packages as a dependency sounds like black magic to me. Obviously Python symbols have to be there at some point to execute Python code, so this is only usable from a binary that happens to load them already (CPython, QGIS, etc.), correct? In particular you say that it is "done at run-time by dynamically loading Python symbols". But I'm confused because later you mention incompatibility issues between the CPython and PyPy API. GDAL's secret sauce, I guess...? I'm also curious why it's possible to use numba but not pypy, which AFAIK are both Python JITs? And finally did you consider using Cython (which claims pypy compatibility)?

Cheers,
Victor Poughon

From: gdal-dev [gdal-dev-boun...@lists.osgeo.org] on behalf of Even Rouault [even.roua...@spatialys.com]
Sent: Monday, 12 September 2016 14:31
To: gdal-dev@lists.osgeo.org
Subject: [gdal-dev] VRT derived band pixel functions written in Python

Hi,

I wanted to mention a new (and I think pretty cool) feature I've added in trunk: the possibility to define pixel functions in Python in a VRT derived band.
Let's start with a simple example to multiply a raster by 1.5, or to add 2 (or more) rasters. (The VRT XML snippets were stripped by the mail archive; only fragments survive: "EPSG:26711", "440720,60,0,3751320,0,-60", "multiply", "Python", "byte.tif" / "byte2.tif".)

You can put any Python module code inside PixelFunctionCode, with at least one function with the following arguments :
- in_ar: list of input numpy arrays (one numpy array for each source)
- out_ar: output numpy array to fill (initialized at the right dimensions and with the VRTRasterBand.dataType)
- xoff: pixel offset to the top left corner of the accessed region of the band
- yoff: line offset to the top left corner of the accessed region of the band
- xsize: width of the accessed region of the band
- ysize: height of the accessed region of the band
- raster_xsize: total width of the raster band
- raster_ysize: total height of the raster band
- buf_radius: radius of the buffer (in pixels) added to the left, right, top and bottom of in_ar / out_ar
- gt: geotransform
- kwargs: dictionary with user arguments defined in PixelFunctionArguments

For basic operations, you just need to care about in_ar and out_ar.

With all that, you can do interesting stuff like hillshading (code ported from gdaldem). (Again, the VRT XML was stripped by the archive; the surviving fragments are "EPSG:4326", "-80.0041663,0.008,0,44.0041663,0,-0.008", "Gray", "n43.dt0", "Python", "hillshade", "1", "Int16".)

You can completely offload the Python code itself into a proper my_lib.py file and just specify my_lib.hillshade

Technically, the interfacing with Python is done at run-time by dynamically loading Python symbols with dlopen()/GetProcAddress(), when they are already available in the process - for example if libgdal is loaded from a Python interpreter, or from a binary like QGIS which has already loaded the Python lib. Otherwise a few shared objects ("libpython2.7.so", "python27.dll", "libpython3.4m.so", etc.) are tried, unless the PYTHONSO config option is specified to point to a precise filename.
The advantage of this approach is that the same GDAL library binary is compatible with all Python 2.x (tested: 2.6, 2.7) and 3.x (tested: 3.1, 3.4) versions, and any numpy version that comes with the Python environment used (at compilation time you don't need any Python/numpy development package). The numpy dependency is not a critical one: one could imagine a fallback mode where Python arrays would be used instead, but this would likely be of little interest.

Successfully tested on Linux, MacOSX, FreeBSD and Windows. Some extra tweaking of the predefined set of shared object names - the ones that are probed when no already-loaded Python environment is found - might be needed.

The implementation should be thread-safe regarding use of the Python Global Interpreter Lock (GIL).

I've also tested with the numba Just-In-Time compiler (http://numba.pydata.org/) and it provides major performance improvements for highly computational code (example given below). With numba enabled, I found that my above Python hillshading on a 10Kx10K float32 raster was even faster than gdaldem (the reason is that the Python version is a bit simplified, as it doesn't take into account input nodata values), whereas the non-jit'ed one is 100x slower. When removing the nodata flag, gdaldem is only twice as fast as the jit'ed Python code. So this is a good sign that such an approach isn't only a toy or just for prototyping.

Speaking of JIT, there's no provision (yet?) for interfacing with PyPy. It would require a new backend, as the PyPy C API has nothing to do with the CPython one.

There are obvious security concerns in allowing Python code to be run when getting the content of a VRT file. The GDAL_VRT_ENABLE_PYTHON config option = IF_SAFE / NO / YES can be set to control the behaviour.
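Since the VRT snippets in the announcement did not survive the archive, here is a hedged reconstruction of what the "multiply by 1.5" example plausibly looked like, assembled from the argument list above and the surviving fragments (the SRS, geotransform and byte.tif filename). The element layout follows the VRT derived-band schema named in the thread (PixelFunctionType, PixelFunctionLanguage, PixelFunctionCode), but the exact original markup is not recoverable; it is validated here with the stdlib XML parser.

```python
import xml.etree.ElementTree as ET

# Reconstructed sketch, NOT the verbatim original mail content; the
# raster size and the pixel function body are illustrative.
VRT = """\
<VRTDataset rasterXSize="20" rasterYSize="20">
  <SRS>EPSG:26711</SRS>
  <GeoTransform>440720, 60, 0, 3751320, 0, -60</GeoTransform>
  <VRTRasterBand dataType="Byte" band="1" subClass="VRTDerivedRasterBand">
    <PixelFunctionType>multiply</PixelFunctionType>
    <PixelFunctionLanguage>Python</PixelFunctionLanguage>
    <PixelFunctionCode><![CDATA[
import numpy as np
def multiply(in_ar, out_ar, xoff, yoff, xsize, ysize,
             raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
    np.multiply(in_ar[0], 1.5, out_ar, casting='unsafe')
]]></PixelFunctionCode>
    <SimpleSource>
      <SourceFilename relativeToVRT="1">byte.tif</SourceFilename>
    </SimpleSource>
  </VRTRasterBand>
</VRTDataset>
"""

root = ET.fromstring(VRT)
band = root.find("VRTRasterBand")
print(band.get("subClass"), band.find("PixelFunctionLanguage").text)
# prints: VRTDerivedRasterBand Python
```

For the "add" variant one would list a second SimpleSource (byte2.tif) and sum over in_ar instead of scaling in_ar[0].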
Re: [gdal-dev] VRT derived band pixel functions written in Python
Hi Even,

On Mon, Sep 12, 2016 at 2:31 PM, Even Rouault wrote:

> Hi,
>
> I wanted to mention a new (and I think pretty cool) feature I've added in
> trunk: the possibility to define pixel functions in Python in a VRT derived
> band.
>
> ...
>
> There are obvious security concerns in allowing Python code to be run when
> getting the content of a VRT file. The GDAL_VRT_ENABLE_PYTHON config option
> = IF_SAFE / NO / YES can be set to control the behaviour. The default is
> IF_SAFE (this can be changed at compilation time by defining
> -DGDAL_VRT_ENABLE_PYTHON_DEFAULT="NO", e.g.), and Python code execution can
> be completely disabled with -DGDAL_VRT_DISABLE_PYTHON. Safe must be
> understood as: the code will not read, write, delete... files, spawn
> external code, do network activity, etc. In other words, the code will only
> do "maths". But infinite looping is definitely possible in the safe mode.
> The heuristics of the IF_SAFE mode are rather basic and I'd be grateful if
> people could point out ways of breaking them. If any of the following
> strings - "import" (unless it is "import numpy" / "from numpy import ...",
> "import math" / "from math import ..." or "from numba import jit"), "eval",
> "compile", "open", "load", "file", "input", "save", "memmap", "DataSource",
> "genfromtxt", "getattr", "ctypeslib", "testing", "dump", "fromregex" - is
> found anywhere in the code, then the code is considered unsafe (there are,
> interestingly, a lot of methods in numpy that do file I/O; hopefully I've
> captured them with the previous filters). Another 'unsafe' pattern is when
> the pixel function references an external module, like my my_lib.hillshade
> example above (who knows if there will not some day be a
> shutil.reformat_your_hard_drive function with the right prototype...)
>
> This new capability isn't yet documented in the VRT doc, although this
> message will be a start.
>
> I'm interested in feedback you may have.
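The IF_SAFE heuristic quoted above boils down to substring scanning with a small import whitelist. A toy sketch of that kind of filter (purely illustrative, not GDAL's actual implementation, and - as the thread itself notes - easy to break):

```python
# Import forms explicitly allowed by the heuristic described above
ALLOWED_IMPORTS = ("import numpy", "from numpy import", "import math",
                   "from math import", "from numba import jit")
# Substrings that mark the code as unsafe if found anywhere
FORBIDDEN = ("import", "eval", "compile", "open", "load", "file", "input",
             "save", "memmap", "DataSource", "genfromtxt", "getattr",
             "ctypeslib", "testing", "dump", "fromregex")

def looks_safe(code: str) -> bool:
    # Strip whitelisted import forms first, then flag the code if any
    # remaining forbidden substring appears
    for ok in ALLOWED_IMPORTS:
        code = code.replace(ok, "")
    return not any(bad in code for bad in FORBIDDEN)

safe_code = "import numpy\ndef f(in_ar, out_ar, *a, **k):\n    out_ar[:] = in_ar[0]"
unsafe_code = "import os\nos.remove('x')"
```

A substring scan like this is exactly the kind of filter the thread's participants doubt can ever be made watertight, which motivates the suggestion below to drop IF_SAFE entirely.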
I found http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html to be a good intro to the risks of eval'ing untrusted Python code. Mentioned in there is a notable attempt to make a secure subset of Python called "pysandbox", whose developer has since declared it "broken by design": https://lwn.net/Articles/574215/. I'm not knowledgeable enough about sandboxing (OS or otherwise) to say if that's right.

I see that in GDAL 2.0+ we can set options in the VRT XML itself. Is it possible to set GDAL_VRT_ENABLE_PYTHON=YES in a VRT and thus override the reader's own trust policies? My ignorance of how GDAL separates "open" options from "config" options might be on display in this question.

My $.02 is that since "is safe" will be hard to guarantee (it's an outstanding unsolved problem in the Python community), removing "IF_SAFE" from the options would be a good thing, and the default for GDAL_VRT_ENABLE_PYTHON should be "NO".

--
Sean Gillies

_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev