Re: [gdal-dev] VRT derived band pixel functions written in Python

2016-09-15 Thread James Ramm
I agree...could be slightly inelegant to figure out what version of python
is used though...A system call to `where python` on windows would return
the interpreter on the path, and you could replace the '.exe' with '.dll'
(this is assuming python dll is always alongside the exe, but i think this
is the case on all windows distributions), or Evens' suggest by calling
python.
I cant think of an (easy) way to do it without a system call though.

On 15 September 2016 at 14:56, Sean Gillies  wrote:

> Hi Even, James:
>
> I suspect users will be find it very surprising if `which python` and the
> python interpreter used by VRT are not the same. Imagine starting python in
> an environment created by virtualenv or conda that contains extension
> modules like scikit-image, scikit-learn, &c. You import osgeo.gdal, open
> your super duper VRT with `gdal.Open()`, and get an error because your
> environment's extension modules aren't available in the VRT script. This
> could be tough to debug for a beginning to intermediate level Python
> programmer.
>
>
> On Thu, Sep 15, 2016 at 10:29 AM, James Ramm  wrote:
>
>> excellent, working setting the PYTHONSO variable. Trying to match the
>> version of python on the path would be smart, but it is probably 'good
>> enough' to give PYTHONSO some prominence in the documentation - some kind
>> of warning on how python is discovered perhaps?
>>
>> On 14 September 2016 at 20:20, James Ramm  wrote:
>>
>>> Ah, that makes sense. I'll have to try the config option in the morning,
>>> but it sounds like that could be it
>>>
>>> On 14 Sep 2016 8:06 p.m., "Even Rouault" 
>>> wrote:
>>>
 Le mercredi 14 septembre 2016 20:50:04, James Ramm a écrit :
 > Yes, it is loading a different pythonThe path printed by sys.path
 is
 > different to if I open the command window and type:
 >
 > python
 >
 > >>> import sys
 > >>> print(sys.path)
 >
 > Gdal and the python bindings were compiled with vs2015 and python
 3.5, and
 > I can correctly import python in the 3.5 interpreter, yet somehow a
 > different python DLL (2.7) is being loaded at runtime.

 Might be related to the default try order in

 const char* const apszPythonSO[] = { "python27.dll",
 "python26.dll",
 "python34.dll",
 "python35.dll",
 "python36.dll",
 "python33.dll",
 "python32.dll" };

 First found, first served.

 Hum maybe we should try to match the version that issuing python on the
 command line would start. We could potentially look at the PATH to see
 if
 there's something like "bla:\pythonXX" and try the related .dll... Or
 more
 costly, but more reliable, try issuing a 'python -c "import sys;
 print(str(sys.version_info[0]) + str(sys.version_info[1]))"' command

 That's one of the downside of requiring no dependency at build time.

 Anyway you can override the default guess by setting the PYTHONSO
 config option
 to point to the desired python dll.

 By the way I've committed the doc in
 https://trac.osgeo.org/gdal/changeset/35441 . Should be reflected
 online in a
 few hours.


>> On 14 September 2016 at 20:20, James Ramm  wrote:
>>
>>> Ah, that makes sense. I'll have to try the config option in the morning,
>>> but it sounds like that could be it
>>>
>>> On 14 Sep 2016 8:06 p.m., "Even Rouault" 
>>> wrote:
>>>
 Le mercredi 14 septembre 2016 20:50:04, James Ramm a écrit :
 > Yes, it is loading a different pythonThe path printed by sys.path
 is
 > different to if I open the command window and type:
 >
 > python
 >
 > >>> import sys
 > >>> print(sys.path)
 >
 > Gdal and the python bindings were compiled with vs2015 and python
 3.5, and
 > I can correctly import python in the 3.5 interpreter, yet somehow a
 > different python DLL (2.7) is being loaded at runtime.

 Might be related to the default try order in

 const char* const apszPythonSO[] = { "python27.dll",
 "python26.dll",
 "python34.dll",
 "python35.dll",
 "python36.dll",
 "python33.dll",
 "python32.dll" };

 First found, first served.

 Hum maybe we should try to match the version that issuing python on the
 command line would start. We could potentially look at the PATH to see
 if
 there's something like "bla:\pythonXX" and

Re: [gdal-dev] VRT derived band pixel functions written in Python

2016-09-15 Thread Even Rouault
Sean,

> I suspect users will be find it very surprising if `which python` and the
> python interpreter used by VRT are not the same

Yes, we should probably make some effort to use the same python version as the 
one that comes as default given the PATH. I've just opened 
https://trac.osgeo.org/gdal/ticket/6652 regarding this.

> . Imagine starting python in
> an environment created by virtualenv or conda that contains extension
> modules like scikit-image, scikit-learn, &c. You import osgeo.gdal,

or rasterio ;-)

> open
> your super duper VRT with `gdal.Open()`, and get an error because your
> environment's extension modules aren't available in the VRT script. This
> could be tough to debug for a beginning to intermediate level Python
> programmer.

In the situation you describe here, things will work as expected. External 
"libpythonXX.so" or the one specified by PYTHONSO are only tried *after* 
checking if the process has already the Python symbols loaded.

Quoting
http://gdal.org/gdal_vrttut.html#gdal_vrttut_derived_python :

"When GDAL will need to run Python code, it will first determine if the Python 
interpreter is loaded in the current process (which is the case if the program 
is a Python interpreter itself, or if another program, e.g. QGIS, has already 
loaded the CPython library). Otherwise it will look if the PYTHONSO 
configuration option is defined. [...] If the PYTHONSO configuration option is 
not defined, then a predefined list of shared objects will be tried [...]"

This order is strictly needed and was not what I implemented first. My first 
attempt honoured PYTHONSO first, but then if you did things like 
"PYTHONSO=libpython3.1.so python2.7 read_my_vrt.py", this crashed due to 
conflicting symbols being imported.

Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: [gdal-dev] VRT derived band pixel functions written in Python

2016-09-15 Thread Sean Gillies
Hi Even, James:

I suspect users will be find it very surprising if `which python` and the
python interpreter used by VRT are not the same. Imagine starting python in
an environment created by virtualenv or conda that contains extension
modules like scikit-image, scikit-learn, &c. You import osgeo.gdal, open
your super duper VRT with `gdal.Open()`, and get an error because your
environment's extension modules aren't available in the VRT script. This
could be tough to debug for a beginning to intermediate level Python
programmer.


On Thu, Sep 15, 2016 at 10:29 AM, James Ramm  wrote:

> excellent, working setting the PYTHONSO variable. Trying to match the
> version of python on the path would be smart, but it is probably 'good
> enough' to give PYTHONSO some prominence in the documentation - some kind
> of warning on how python is discovered perhaps?
>
> On 14 September 2016 at 20:20, James Ramm  wrote:
>
>> Ah, that makes sense. I'll have to try the config option in the morning,
>> but it sounds like that could be it
>>
>> On 14 Sep 2016 8:06 p.m., "Even Rouault" 
>> wrote:
>>
>>> Le mercredi 14 septembre 2016 20:50:04, James Ramm a écrit :
>>> > Yes, it is loading a different pythonThe path printed by sys.path
>>> is
>>> > different to if I open the command window and type:
>>> >
>>> > python
>>> >
>>> > >>> import sys
>>> > >>> print(sys.path)
>>> >
>>> > Gdal and the python bindings were compiled with vs2015 and python 3.5,
>>> and
>>> > I can correctly import python in the 3.5 interpreter, yet somehow a
>>> > different python DLL (2.7) is being loaded at runtime.
>>>
>>> Might be related to the default try order in
>>>
>>> const char* const apszPythonSO[] = { "python27.dll",
>>> "python26.dll",
>>> "python34.dll",
>>> "python35.dll",
>>> "python36.dll",
>>> "python33.dll",
>>> "python32.dll" };
>>>
>>> First found, first served.
>>>
>>> Hum maybe we should try to match the version that issuing python on the
>>> command line would start. We could potentially look at the PATH to see if
>>> there's something like "bla:\pythonXX" and try the related .dll... Or
>>> more
>>> costly, but more reliable, try issuing a 'python -c "import sys;
>>> print(str(sys.version_info[0]) + str(sys.version_info[1]))"' command
>>>
>>> That's one of the downside of requiring no dependency at build time.
>>>
>>> Anyway you can override the default guess by setting the PYTHONSO config
>>> option
>>> to point to the desired python dll.
>>>
>>> By the way I've committed the doc in
>>> https://trac.osgeo.org/gdal/changeset/35441 . Should be reflected
>>> online in a
>>> few hours.
>>>
>>>
> On 14 September 2016 at 20:20, James Ramm  wrote:
>
>> Ah, that makes sense. I'll have to try the config option in the morning,
>> but it sounds like that could be it
>>
>> On 14 Sep 2016 8:06 p.m., "Even Rouault" 
>> wrote:
>>
>>> Le mercredi 14 septembre 2016 20:50:04, James Ramm a écrit :
>>> > Yes, it is loading a different pythonThe path printed by sys.path
>>> is
>>> > different to if I open the command window and type:
>>> >
>>> > python
>>> >
>>> > >>> import sys
>>> > >>> print(sys.path)
>>> >
>>> > Gdal and the python bindings were compiled with vs2015 and python 3.5,
>>> and
>>> > I can correctly import python in the 3.5 interpreter, yet somehow a
>>> > different python DLL (2.7) is being loaded at runtime.
>>>
>>> Might be related to the default try order in
>>>
>>> const char* const apszPythonSO[] = { "python27.dll",
>>> "python26.dll",
>>> "python34.dll",
>>> "python35.dll",
>>> "python36.dll",
>>> "python33.dll",
>>> "python32.dll" };
>>>
>>> First found, first served.
>>>
>>> Hum maybe we should try to match the version that issuing python on the
>>> command line would start. We could potentially look at the PATH to see if
>>> there's something like "bla:\pythonXX" and try the related .dll... Or
>>> more
>>> costly, but more reliable, try issuing a 'python -c "import sys;
>>> print(str(sys.version_info[0]) + str(sys.version_info[1]))"' command
>>>
>>> That's one of the downside of requiring no dependency at build time.
>>>
>>> Anyway you can override the default guess by setting the PYTHONSO config
>>> option
>>> to point to the desired python dll.
>>>
>>> By the way I've committed the doc in
>>> https://trac.osgeo.org/gdal/changeset/35441 . Should be reflected
>>> online in a
>>> few hours.
>>>
>>>
>>> > I am on a
>>> > 'inherited' PC right now, so final thing to do is to ensure that th

Re: [gdal-dev] VRT derived band pixel functions written in Python

2016-09-15 Thread James Ramm
excellent, working setting the PYTHONSO variable. Trying to match the
version of python on the path would be smart, but it is probably 'good
enough' to give PYTHONSO some prominence in the documentation - some kind
of warning on how python is discovered perhaps?

On 14 September 2016 at 20:20, James Ramm  wrote:

> Ah, that makes sense. I'll have to try the config option in the morning,
> but it sounds like that could be it
>
> On 14 Sep 2016 8:06 p.m., "Even Rouault" 
> wrote:
>
>> Le mercredi 14 septembre 2016 20:50:04, James Ramm a écrit :
>> > Yes, it is loading a different pythonThe path printed by sys.path is
>> > different to if I open the command window and type:
>> >
>> > python
>> >
>> > >>> import sys
>> > >>> print(sys.path)
>> >
>> > Gdal and the python bindings were compiled with vs2015 and python 3.5,
>> and
>> > I can correctly import python in the 3.5 interpreter, yet somehow a
>> > different python DLL (2.7) is being loaded at runtime.
>>
>> Might be related to the default try order in
>>
>> const char* const apszPythonSO[] = { "python27.dll",
>> "python26.dll",
>> "python34.dll",
>> "python35.dll",
>> "python36.dll",
>> "python33.dll",
>> "python32.dll" };
>>
>> First found, first served.
>>
>> Hum maybe we should try to match the version that issuing python on the
>> command line would start. We could potentially look at the PATH to see if
>> there's something like "bla:\pythonXX" and try the related .dll... Or more
>> costly, but more reliable, try issuing a 'python -c "import sys;
>> print(str(sys.version_info[0]) + str(sys.version_info[1]))"' command
>>
>> That's one of the downside of requiring no dependency at build time.
>>
>> Anyway you can override the default guess by setting the PYTHONSO config
>> option
>> to point to the desired python dll.
>>
>> By the way I've committed the doc in
>> https://trac.osgeo.org/gdal/changeset/35441 . Should be reflected online
>> in a
>> few hours.
>>
>>
On 14 September 2016 at 20:20, James Ramm  wrote:

> Ah, that makes sense. I'll have to try the config option in the morning,
> but it sounds like that could be it
>
> On 14 Sep 2016 8:06 p.m., "Even Rouault" 
> wrote:
>
>> Le mercredi 14 septembre 2016 20:50:04, James Ramm a écrit :
>> > Yes, it is loading a different pythonThe path printed by sys.path is
>> > different to if I open the command window and type:
>> >
>> > python
>> >
>> > >>> import sys
>> > >>> print(sys.path)
>> >
>> > Gdal and the python bindings were compiled with vs2015 and python 3.5,
>> and
>> > I can correctly import python in the 3.5 interpreter, yet somehow a
>> > different python DLL (2.7) is being loaded at runtime.
>>
>> Might be related to the default try order in
>>
>> const char* const apszPythonSO[] = { "python27.dll",
>> "python26.dll",
>> "python34.dll",
>> "python35.dll",
>> "python36.dll",
>> "python33.dll",
>> "python32.dll" };
>>
>> First found, first served.
>>
>> Hum maybe we should try to match the version that issuing python on the
>> command line would start. We could potentially look at the PATH to see if
>> there's something like "bla:\pythonXX" and try the related .dll... Or more
>> costly, but more reliable, try issuing a 'python -c "import sys;
>> print(str(sys.version_info[0]) + str(sys.version_info[1]))"' command
>>
>> That's one of the downside of requiring no dependency at build time.
>>
>> Anyway you can override the default guess by setting the PYTHONSO config
>> option
>> to point to the desired python dll.
>>
>> By the way I've committed the doc in
>> https://trac.osgeo.org/gdal/changeset/35441 . Should be reflected online
>> in a
>> few hours.
>>
>>
>> > I am on a
>> > 'inherited' PC right now, so final thing to do is to ensure that the
>> > gdal_translate I am running is the one I compiled and there isn't
>> another
>> > version lurking somewhere
>> >
>> >
>> > On 14 September 2016 at 16:50, Even Rouault > >
>> >
>> > wrote:
>> > > Le mercredi 14 septembre 2016 17:24:53, jramm a écrit :
>> > > > Trying to run this using a function relying on scipy.ndimage...
>> > > >
>> > > > When running gdal_translate on the VRT, I get ImportError: No module
>> > >
>> > > named
>> > >
>> > > > scipy.ndimage
>> > > > This comes after successfully import numpy. scipy.ndimage will
>> happily
>> > > > import within the python interpreter.
>> > >
>> > > Works for me for both inline or offline functions.
>> > >
>> > > Are you sure GDAL loads the same python lib

Re: [gdal-dev] VRT derived band pixel functions written in Python

2016-09-14 Thread James Ramm
Ah, that makes sense. I'll have to try the config option in the morning,
but it sounds like that could be it

On 14 Sep 2016 8:06 p.m., "Even Rouault"  wrote:

> Le mercredi 14 septembre 2016 20:50:04, James Ramm a écrit :
> > Yes, it is loading a different pythonThe path printed by sys.path is
> > different to if I open the command window and type:
> >
> > python
> >
> > >>> import sys
> > >>> print(sys.path)
> >
> > Gdal and the python bindings were compiled with vs2015 and python 3.5,
> and
> > I can correctly import python in the 3.5 interpreter, yet somehow a
> > different python DLL (2.7) is being loaded at runtime.
>
> Might be related to the default try order in
>
> const char* const apszPythonSO[] = { "python27.dll",
> "python26.dll",
> "python34.dll",
> "python35.dll",
> "python36.dll",
> "python33.dll",
> "python32.dll" };
>
> First found, first served.
>
> Hum maybe we should try to match the version that issuing python on the
> command line would start. We could potentially look at the PATH to see if
> there's something like "bla:\pythonXX" and try the related .dll... Or more
> costly, but more reliable, try issuing a 'python -c "import sys;
> print(str(sys.version_info[0]) + str(sys.version_info[1]))"' command
>
> That's one of the downside of requiring no dependency at build time.
>
> Anyway you can override the default guess by setting the PYTHONSO config
> option
> to point to the desired python dll.
>
> By the way I've committed the doc in
> https://trac.osgeo.org/gdal/changeset/35441 . Should be reflected online
> in a
> few hours.
>
>
> > I am on a
> > 'inherited' PC right now, so final thing to do is to ensure that the
> > gdal_translate I am running is the one I compiled and there isn't another
> > version lurking somewhere
> >
> >
> > On 14 September 2016 at 16:50, Even Rouault 
> >
> > wrote:
> > > Le mercredi 14 septembre 2016 17:24:53, jramm a écrit :
> > > > Trying to run this using a function relying on scipy.ndimage...
> > > >
> > > > When running gdal_translate on the VRT, I get ImportError: No module
> > >
> > > named
> > >
> > > > scipy.ndimage
> > > > This comes after successfully import numpy. scipy.ndimage will
> happily
> > > > import within the python interpreter.
> > >
> > > Works for me for both inline or offline functions.
> > >
> > > Are you sure GDAL loads the same python lib as the python version used
> in
> > > the
> > > python interpreter ? (check the debug traces with CPL_DEBUG=ON)
> > >
> > > You can also add at the top of your script
> > >
> > > import sys
> > > print(sys.path)
> > >
> > > and check if the output points to a location where your scipy package
> can
> > > be
> > > found.
> > >
> > > > Any tips on how to track this down/debug?
> > >
> > > > The entire VRT file is as follows:
> > > I guess this is not the entire VRT since it refers to an inline
> > > definition of
> > > the script but  has empty content.
> > >
> > > > 
> > > >
> > > >   PROJCS["OSGB 1936 / British National Grid",GEOGCS["OSGB
> > > >
> > > > 1936",DATUM["OSGB_1936",SPHEROID["Airy
> > > > 1830",6377563.396,299.3249646,AUTHORITY["EPSG","7001"]],TOWG
> > >
> > > S84[446.448,-12
> > >
> > > > 5.157,542.06,0.15,0.247,0.842,-20.489],AUTHORITY["EPSG","627
> > >
> > > 7"]],PRIMEM["Gr
> > >
> > > > eenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532
> > >
> > > 925199433,AUTHO
> > >
> > > > RITY["EPSG","9122"]],AUTHORITY["EPSG","4277"]],PROJECTION["
> > >
> > > Transverse_Merca
> > >
> > > > tor"],PARAMETER["latitude_of_origin",49],PARAMETER["central_
> > >
> > > meridian",-2],P
> > >
> > > > ARAMETER["scale_factor",0.9996012717],PARAMETER["false_easti
> > >
> > > ng",40],PAR
> > >
> > > > AMETER["false_northing",-10],UNIT["metre",1,AUTHORITY["
> > >
> > > EPSG","9001"]],A
> > >
> > > > XIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG",
> > >
> > > "27700"]]
> > >
> > > > 100180.0,5.0,0.0,1215730.0,0.0,-5.0
> > > >  > > > subClass="VRTDerivedRasterBand">
> > > >
> > > > 
> > > >
> > > >> > >
> > > > relativeToVrt="0">F:\tif_data\large_sparse.tif
> > > >
> > > >> > >
> > > > RasterXSize="111090" RasterYSize="259376"/>
> > > >
> > > > 
> > > >
> > > >   4
> > > >   TRUE
> > > >
> > > > 
> > > >
> > > > 
> > > > extract_blobs
> > > > Python
> > > > 
> > > >
> > > >   
> > > >
> > > > 5
> > > > 
> > > >
> > > >   
> > > >
> > > > 
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > View this message in context:
> > > > http://osgeo-org.1560.x6.nabble.com/gdal-dev-VRT-derived-
> > >
> > > band-pixel-functi
> > >
> > > > ons-written-in-Python-tp5285323p5285882.html Sent from the GDAL -
> Dev
> > > > mailing l

Re: [gdal-dev] VRT derived band pixel functions written in Python

2016-09-14 Thread Even Rouault
Le mercredi 14 septembre 2016 20:50:04, James Ramm a écrit :
> Yes, it is loading a different pythonThe path printed by sys.path is
> different to if I open the command window and type:
> 
> python
> 
> >>> import sys
> >>> print(sys.path)
> 
> Gdal and the python bindings were compiled with vs2015 and python 3.5, and
> I can correctly import python in the 3.5 interpreter, yet somehow a
> different python DLL (2.7) is being loaded at runtime. 

Might be related to the default try order in

const char* const apszPythonSO[] = { "python27.dll",
"python26.dll",
"python34.dll",
"python35.dll",
"python36.dll",
"python33.dll",
"python32.dll" };

First found, first served.

Hum maybe we should try to match the version that issuing python on the 
command line would start. We could potentially look at the PATH to see if 
there's something like "bla:\pythonXX" and try the related .dll... Or more 
costly, but more reliable, try issuing a 'python -c "import sys; 
print(str(sys.version_info[0]) + str(sys.version_info[1]))"' command

That's one of the downside of requiring no dependency at build time.

Anyway you can override the default guess by setting the PYTHONSO config option 
to point to the desired python dll.

By the way I've committed the doc in 
https://trac.osgeo.org/gdal/changeset/35441 . Should be reflected online in a 
few hours.


> I am on a
> 'inherited' PC right now, so final thing to do is to ensure that the
> gdal_translate I am running is the one I compiled and there isn't another
> version lurking somewhere
> 
> 
> On 14 September 2016 at 16:50, Even Rouault 
> 
> wrote:
> > Le mercredi 14 septembre 2016 17:24:53, jramm a écrit :
> > > Trying to run this using a function relying on scipy.ndimage...
> > > 
> > > When running gdal_translate on the VRT, I get ImportError: No module
> > 
> > named
> > 
> > > scipy.ndimage
> > > This comes after successfully import numpy. scipy.ndimage will happily
> > > import within the python interpreter.
> > 
> > Works for me for both inline or offline functions.
> > 
> > Are you sure GDAL loads the same python lib as the python version used in
> > the
> > python interpreter ? (check the debug traces with CPL_DEBUG=ON)
> > 
> > You can also add at the top of your script
> > 
> > import sys
> > print(sys.path)
> > 
> > and check if the output points to a location where your scipy package can
> > be
> > found.
> > 
> > > Any tips on how to track this down/debug?
> > 
> > > The entire VRT file is as follows:
> > I guess this is not the entire VRT since it refers to an inline
> > definition of
> > the script but  has empty content.
> > 
> > > 
> > > 
> > >   PROJCS["OSGB 1936 / British National Grid",GEOGCS["OSGB
> > > 
> > > 1936",DATUM["OSGB_1936",SPHEROID["Airy
> > > 1830",6377563.396,299.3249646,AUTHORITY["EPSG","7001"]],TOWG
> > 
> > S84[446.448,-12
> > 
> > > 5.157,542.06,0.15,0.247,0.842,-20.489],AUTHORITY["EPSG","627
> > 
> > 7"]],PRIMEM["Gr
> > 
> > > eenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532
> > 
> > 925199433,AUTHO
> > 
> > > RITY["EPSG","9122"]],AUTHORITY["EPSG","4277"]],PROJECTION["
> > 
> > Transverse_Merca
> > 
> > > tor"],PARAMETER["latitude_of_origin",49],PARAMETER["central_
> > 
> > meridian",-2],P
> > 
> > > ARAMETER["scale_factor",0.9996012717],PARAMETER["false_easti
> > 
> > ng",40],PAR
> > 
> > > AMETER["false_northing",-10],UNIT["metre",1,AUTHORITY["
> > 
> > EPSG","9001"]],A
> > 
> > > XIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG",
> > 
> > "27700"]]
> > 
> > > 100180.0,5.0,0.0,1215730.0,0.0,-5.0
> > >  > > subClass="VRTDerivedRasterBand">
> > > 
> > > 
> > > 
> > >> > 
> > > relativeToVrt="0">F:\tif_data\large_sparse.tif
> > > 
> > >> > 
> > > RasterXSize="111090" RasterYSize="259376"/>
> > > 
> > > 
> > > 
> > >   4
> > >   TRUE
> > > 
> > > 
> > > 
> > > 
> > > extract_blobs
> > > Python
> > > 
> > > 
> > >   
> > > 
> > > 5
> > > 
> > >   
> > >   
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > --
> > > View this message in context:
> > > http://osgeo-org.1560.x6.nabble.com/gdal-dev-VRT-derived-
> > 
> > band-pixel-functi
> > 
> > > ons-written-in-Python-tp5285323p5285882.html Sent from the GDAL - Dev
> > > mailing list archive at Nabble.com.
> > > ___
> > > gdal-dev mailing list
> > > gdal-dev@lists.osgeo.org
> > > http://lists.osgeo.org/mailman/listinfo/gdal-dev
> > 
> > --
> > Spatialys - Geospatial professional services
> > http://www.spatialys.com

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com
_

Re: [gdal-dev] VRT derived band pixel functions written in Python

2016-09-14 Thread Even Rouault
Le mercredi 14 septembre 2016 17:24:53, jramm a écrit :
> Trying to run this using a function relying on scipy.ndimage...
> 
> When running gdal_translate on the VRT, I get ImportError: No module named
> scipy.ndimage
> This comes after successfully import numpy. scipy.ndimage will happily
> import within the python interpreter.

Works for me for both inline or offline functions.

Are you sure GDAL loads the same python lib as the python version used in the 
python interpreter ? (check the debug traces with CPL_DEBUG=ON)

You can also add at the top of your script

import sys
print(sys.path)

and check if the output points to a location where your scipy package can be 
found.

> Any tips on how to track this down/debug?
> 
> The entire VRT file is as follows:

I guess this is not the entire VRT since it refers to an inline definition of 
the script but  has empty content.

> 
> 
>   PROJCS["OSGB 1936 / British National Grid",GEOGCS["OSGB
> 1936",DATUM["OSGB_1936",SPHEROID["Airy
> 1830",6377563.396,299.3249646,AUTHORITY["EPSG","7001"]],TOWGS84[446.448,-12
> 5.157,542.06,0.15,0.247,0.842,-20.489],AUTHORITY["EPSG","6277"]],PRIMEM["Gr
> eenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHO
> RITY["EPSG","9122"]],AUTHORITY["EPSG","4277"]],PROJECTION["Transverse_Merca
> tor"],PARAMETER["latitude_of_origin",49],PARAMETER["central_meridian",-2],P
> ARAMETER["scale_factor",0.9996012717],PARAMETER["false_easting",40],PAR
> AMETER["false_northing",-10],UNIT["metre",1,AUTHORITY["EPSG","9001"]],A
> XIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","27700"]]
> 100180.0,5.0,0.0,1215730.0,0.0,-5.0
>  subClass="VRTDerivedRasterBand">
> 
>relativeToVrt="0">F:\tif_data\large_sparse.tif
>RasterXSize="111090" RasterYSize="259376"/>
> 
>   4
>   TRUE
> 
> 
> extract_blobs
> Python
> 
>   
> 5
> 
>   
> 
> 
> 
> 
> 
> --
> View this message in context:
> http://osgeo-org.1560.x6.nabble.com/gdal-dev-VRT-derived-band-pixel-functi
> ons-written-in-Python-tp5285323p5285882.html Sent from the GDAL - Dev
> mailing list archive at Nabble.com.
> ___
> gdal-dev mailing list
> gdal-dev@lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/gdal-dev

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: [gdal-dev] VRT derived band pixel functions written in Python

2016-09-14 Thread jramm
Trying to run this using a function relying on scipy.ndimage...

When running gdal_translate on the VRT, I get ImportError: No module named
scipy.ndimage
This comes after successfully import numpy. scipy.ndimage will happily
import within the python interpreter.
Any tips on how to track this down/debug? 

The entire VRT file is as follows:


  PROJCS["OSGB 1936 / British National Grid",GEOGCS["OSGB
1936",DATUM["OSGB_1936",SPHEROID["Airy
1830",6377563.396,299.3249646,AUTHORITY["EPSG","7001"]],TOWGS84[446.448,-125.157,542.06,0.15,0.247,0.842,-20.489],AUTHORITY["EPSG","6277"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4277"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",49],PARAMETER["central_meridian",-2],PARAMETER["scale_factor",0.9996012717],PARAMETER["false_easting",40],PARAMETER["false_northing",-10],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","27700"]]
  100180.0,5.0,0.0,1215730.0,0.0,-5.0
  

  F:\tif_data\large_sparse.tif
  
  
4
TRUE
  

extract_blobs
Python


5

  





--
View this message in context: 
http://osgeo-org.1560.x6.nabble.com/gdal-dev-VRT-derived-band-pixel-functions-written-in-Python-tp5285323p5285883.html
Sent from the GDAL - Dev mailing list archive at Nabble.com.
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: [gdal-dev] VRT derived band pixel functions written in Python

2016-09-14 Thread jramm
Trying to run this using a function relying on scipy.ndimage...

When running gdal_translate on the VRT, I get ImportError: No module named
scipy.ndimage
This comes after successfully import numpy. scipy.ndimage will happily
import within the python interpreter.
Any tips on how to track this down/debug? 

The entire VRT file is as follows:


  PROJCS["OSGB 1936 / British National Grid",GEOGCS["OSGB
1936",DATUM["OSGB_1936",SPHEROID["Airy
1830",6377563.396,299.3249646,AUTHORITY["EPSG","7001"]],TOWGS84[446.448,-125.157,542.06,0.15,0.247,0.842,-20.489],AUTHORITY["EPSG","6277"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4277"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",49],PARAMETER["central_meridian",-2],PARAMETER["scale_factor",0.9996012717],PARAMETER["false_easting",40],PARAMETER["false_northing",-10],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","27700"]]
  100180.0,5.0,0.0,1215730.0,0.0,-5.0
  

  F:\tif_data\large_sparse.tif
  
  
4
TRUE
  

extract_blobs
Python


5

  





--
View this message in context: 
http://osgeo-org.1560.x6.nabble.com/gdal-dev-VRT-derived-band-pixel-functions-written-in-Python-tp5285323p5285882.html
Sent from the GDAL - Dev mailing list archive at Nabble.com.
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: [gdal-dev] VRT derived band pixel functions written in Python

2016-09-14 Thread Even Rouault
Le mercredi 14 septembre 2016 09:02:07, Rutger a écrit :
> Then I guess i'm mistaken in thinking that Python would become a slowdown
> in such a case. It's been a while since i tested it. Anyway, thanks for
> your comments and efforts to improve the performance.
> 
> I couldnt find any Conda channels* which build from trunk, so i probably
> have to wait a while before i can
> try it out, something to look forward to. :)
> 
> * https://anaconda.org/search?q=gdal

You can also grab a nightly build at 
http://www.gisinternals.com/development.php

> 
> 
> Regards,
> Rutger
> 
> 
> Even Rouault-2 wrote
> 
> > Le mardi 13 septembre 2016 11:07:39, Rutger a écrit :
> > 
> > 
> > With all that the cost of the Python layer becomes neglectable (except
> > loading
> > the Python environment the first time, if not already loaded, but for a
> > computation that will be longer than a few seconds, that's not really a
> > big
> > deal)
> > 
> > 
> > gdal-dev@.osgeo
> > 
> > http://lists.osgeo.org/mailman/listinfo/gdal-dev
> 
> --
> View this message in context:
> http://osgeo-org.1560.x6.nabble.com/gdal-dev-VRT-derived-band-pixel-functi
> ons-written-in-Python-tp5285323p5285730.html Sent from the GDAL - Dev
> mailing list archive at Nabble.com.
> ___
> gdal-dev mailing list
> gdal-dev@lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/gdal-dev

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: [gdal-dev] VRT derived band pixel functions written in Python

2016-09-14 Thread Rutger
Then I guess i'm mistaken in thinking that Python would become a slowdown in
such a case. It's been a while since i tested it. Anyway, thanks for your
comments and efforts to improve the performance.

I couldnt find any Conda channels* which build from trunk, so i probably
have to wait a while before i can 
try it out, something to look forward to. :)

* https://anaconda.org/search?q=gdal


Regards,
Rutger


Even Rouault-2 wrote
> Le mardi 13 septembre 2016 11:07:39, Rutger a écrit :
> 
> 
> With all that the cost of the Python layer becomes neglectable (except
> loading 
> the Python environment the first time, if not already loaded, but for a 
> computation that will be longer than a few seconds, that's not really a
> big 
> deal)
> 
> -- 
> Spatialys - Geospatial professional services
> http://www.spatialys.com
> ___
> gdal-dev mailing list

> gdal-dev@.osgeo

> http://lists.osgeo.org/mailman/listinfo/gdal-dev





--
View this message in context: 
http://osgeo-org.1560.x6.nabble.com/gdal-dev-VRT-derived-band-pixel-functions-written-in-Python-tp5285323p5285730.html
Sent from the GDAL - Dev mailing list archive at Nabble.com.
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: [gdal-dev] VRT derived band pixel functions written in Python

2016-09-13 Thread Even Rouault
Le mardi 13 septembre 2016 21:22:20, James Ramm a écrit :
> I think you can call SWIG with the -threads argument on the command line so
> it will always release the GIL. Could be an easy option if it works

That's mostly what I've done. See my other message : 
https://lists.osgeo.org/pipermail/gdal-dev/2016-September/045155.html

> 
> On Tuesday, 13 September 2016, Even Rouault 
> 
> wrote:
> > Le mardi 13 septembre 2016 11:07:39, Rutger a écrit :
> > > I overlooked the fact that it still moves through Python, is that the
> > > 'only' hurdle preventing parallel IO?
> > 
> > Not sure to understand your question. But if you have several sources,
> > you could potentially do parallelized reading of them from the Python
> > code by using Python threads and GDAL Python API. But looking in the
> > SWIG generated code, it doesn't seem that SWIG releases the GIL
> > automatically before calling
> > native code. Hum... So that should probably added manually, at least
> > around GDALRasterIO() calls, otherwise you'll get zero perf
> > improvements.
> > 
> > > Since gdalwarp for example has the
> > > -multi flag, it seems as if GDAL is capable of it, or is that a
> > > specific/specialized implementation?
> > 
> > Parallelized I/O doesn't mean much by itself without more context. You
> > may want to parallelize reading of different regions of the same
> > dataset, or parallelize reading of different datasets. Due to GDAL
> > objects not being thread-safe, the first case (reading of different
> > regions of the same dataset)
> > can be solved with the second one by opening several datasets for the
> > same filename.
> > 
> > Regarding gdalwarp -multi, here's how that works. When you warp a
> > dataset, there's a list of all chunks (windows) to be processed that is
> > generated. gdalwarp -multi does the following
> > 
> > Thread I/O  Thread
> > computation Read data for chunk 1
> > Read data for chunk 2   Do calculations for chunk 1
> > Write output of chunk 1 Do calculations for chunk 2
> > Read data for chunk 3
> > Write output of chunk 2 Do calculations for chunk 3
> > 
> > > Numba has several options which might eliminate using Python during
> > > execution. There are c-callbacks:
> > > http://numba.pydata.org/numba-doc/dev/user/cfunc.html
> > 
> > You can also use @jit(nopython=True, nogil=True) and your Python method
> > will
> > end up being pure native code (provided that you don't use too high level
> > stuff
> > otherwise the jit'ification will fail with an exception).
> > 
> > And for code that is not inlined in the VRT, you can also add cache=True
> > so that the jit'ification can be reused.
> > 
> > With all that the cost of the Python layer becomes neglectable (except
> > loading
> > the Python environment the first time, if not already loaded, but for a
> > computation that will be longer than a few seconds, that's not really a
> > big deal)
> > 
> > --
> > Spatialys - Geospatial professional services
> > http://www.spatialys.com
> > ___
> > gdal-dev mailing list
> > gdal-dev@lists.osgeo.org 
> > http://lists.osgeo.org/mailman/listinfo/gdal-dev

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: [gdal-dev] VRT derived band pixel functions written in Python

2016-09-13 Thread James Ramm
I think you can call SWIG with the -threads argument on the command line so
it will always release the GIL. Could be an easy option if it works

On Tuesday, 13 September 2016, Even Rouault 
wrote:

> Le mardi 13 septembre 2016 11:07:39, Rutger a écrit :
> > I overlooked the fact that it still moves through Python, is that the
> > 'only' hurdle preventing parallel IO?
>
> Not sure to understand your question. But if you have several sources, you
> could potentially do parallelized reading of them from the Python code by
> using Python threads and GDAL Python API. But looking in the SWIG generated
> code, it doesn't seem that SWIG releases the GIL automatically before
> calling
> native code. Hum... So that should probably added manually, at least around
> GDALRasterIO() calls, otherwise you'll get zero perf improvements.
>
> > Since gdalwarp for example has the
> > -multi flag, it seems as if GDAL is capable of it, or is that a
> > specific/specialized implementation?
>
> Parallelized I/O doesn't mean much by itself without more context. You may
> want to parallelize reading of different regions of the same dataset, or
> parallelize reading of different datasets. Due to GDAL objects not being
> thread-safe, the first case (reading of different regions of the same
> dataset)
> can be solved with the second one by opening several datasets for the same
> filename.
>
> Regarding gdalwarp -multi, here's how that works. When you warp a dataset,
> there's a list of all chunks (windows) to be processed that is generated.
> gdalwarp -multi does the following
>
> Thread I/O  Thread computation
> Read data for chunk 1
> Read data for chunk 2   Do calculations for chunk 1
> Write output of chunk 1 Do calculations for chunk 2
> Read data for chunk 3
> Write output of chunk 2 Do calculations for chunk 3
>
>
> >
> > Numba has several options which might eliminate using Python during
> > execution. There are c-callbacks:
> > http://numba.pydata.org/numba-doc/dev/user/cfunc.html
>
> You can also use @jit(nopython=True, nogil=True) and your Python method
> will
> end up being pure native code (provided that you don't use too high level
> stuff
> otherwise the jit'ification will fail with an exception).
>
> And for code that is not inlined in the VRT, you can also add cache=True so
> that the jit'ification can be reused.
>
> With all that the cost of the Python layer becomes neglectable (except
> loading
> the Python environment the first time, if not already loaded, but for a
> computation that will be longer than a few seconds, that's not really a big
> deal)
>
> --
> Spatialys - Geospatial professional services
> http://www.spatialys.com
> ___
> gdal-dev mailing list
> gdal-dev@lists.osgeo.org 
> http://lists.osgeo.org/mailman/listinfo/gdal-dev
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: [gdal-dev] VRT derived band pixel functions written in Python

2016-09-13 Thread Even Rouault
Le mardi 13 septembre 2016 11:07:39, Rutger a écrit :
> I overlooked the fact that it still moves through Python, is that the
> 'only' hurdle preventing parallel IO?

Not sure to understand your question. But if you have several sources, you 
could potentially do parallelized reading of them from the Python code by 
using Python threads and GDAL Python API. But looking in the SWIG generated 
code, it doesn't seem that SWIG releases the GIL automatically before calling 
native code. Hum... So that should probably added manually, at least around 
GDALRasterIO() calls, otherwise you'll get zero perf improvements.

> Since gdalwarp for example has the
> -multi flag, it seems as if GDAL is capable of it, or is that a
> specific/specialized implementation?

Parallelized I/O doesn't mean much by itself without more context. You may 
want to parallelize reading of different regions of the same dataset, or 
parallelize reading of different datasets. Due to GDAL objects not being 
thread-safe, the first case (reading of different regions of the same dataset) 
can be solved with the second one by opening several datasets for the same 
filename.

Regarding gdalwarp -multi, here's how that works. When you warp a dataset, 
there's a list of all chunks (windows) to be processed that is generated. 
gdalwarp -multi does the following

Thread I/O  Thread computation
Read data for chunk 1   
Read data for chunk 2   Do calculations for chunk 1
Write output of chunk 1 Do calculations for chunk 2
Read data for chunk 3   
Write output of chunk 2 Do calculations for chunk 3


> 
> Numba has several options which might eliminate using Python during
> execution. There are c-callbacks:
> http://numba.pydata.org/numba-doc/dev/user/cfunc.html

You can also use @jit(nopython=True, nogil=True) and your Python method will 
end up being pure native code (provided that you don't use too high level stuff 
otherwise the jit'ification will fail with an exception).

And for code that is not inlined in the VRT, you can also add cache=True so 
that the jit'ification can be reused.

With all that the cost of the Python layer becomes neglectable (except loading 
the Python environment the first time, if not already loaded, but for a 
computation that will be longer than a few seconds, that's not really a big 
deal)

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: [gdal-dev] VRT derived band pixel functions written in Python

2016-09-13 Thread Rutger
I overlooked the fact that it still moves through Python, is that the 'only'
hurdle preventing parallel IO? Since gdalwarp for example has the -multi
flag, it seems as if GDAL is capable of it, or is that a
specific/specialized implementation?

Numba has several options which might eliminate using Python during
execution. There are c-callbacks:
http://numba.pydata.org/numba-doc/dev/user/cfunc.html

It probably works by using the Numpy C-API (which i have zero experience
with). I don't know if its possible that other programs, like GDAL, can also
use those compiled functions without moving to Python first.

There is also ahead-of-time compilation (AOT):
http://numba.pydata.org/numba-doc/dev/user/pycc.html

AOT has the benefit that users only need Numpy as a dependency, and don't
need Numba/llvm. There some drawbacks as well, like no longer being able to
compile optimizations for the hardware its running on.


Regards,
Rutger


Even Rouault-2 wrote
> Le mardi 13 septembre 2016 09:02:09, Rutger a écrit :
> 
> I've only scratched up the surface of Numba (didn't know it 2 days ago). I 
> guess this might work (might only be interested if the 
> VRTDerivedRasterBand::IRasterIO() is called with a big enough region.
> Which 
> depends on the pixel access pattern of upper layers). The VRT drivers just 
> calls a Python function that takes numpy arrays and a few extra args.
> 
>> And would that give parallel
>> processing for both IO and calculations?
> 
> Only calculations. I/O is done before going to Python and after returning
> from 
> it.
> 
> Actually... if you specify zero source in the VRT, then it is up to you to
> do 
> the read operations the way you like in Python, so you could possibly 
> parallelize them there.
> 
> 
> Even
> 
> -- 
> Spatialys - Geospatial professional services
> http://www.spatialys.com
> ___
> gdal-dev mailing list

> gdal-dev@.osgeo

> http://lists.osgeo.org/mailman/listinfo/gdal-dev





--
View this message in context: 
http://osgeo-org.1560.x6.nabble.com/gdal-dev-VRT-derived-band-pixel-functions-written-in-Python-tp5285323p5285482.html
Sent from the GDAL - Dev mailing list archive at Nabble.com.
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: [gdal-dev] VRT derived band pixel functions written in Python

2016-09-13 Thread Even Rouault
Le mardi 13 septembre 2016 09:02:09, Rutger a écrit :
> Hi Even,
> 
> This is really amazing. I'm becoming more and more a fan of Numba for
> number crunching, so this certainly makes my day. As soon as i can find a
> Win x64 dev version on a Conda channel I'll give it a try.
> 
> A use case that comes to mind, and which i run into regularly, is when i
> want to do some simple aggregation before using something like gdalwarp.
> For example when you have a file containing 24 hourly temperature values,
> and you are only interested in the daily mean. Currently i either
> aggregate before warping and write the intermediates to disk, or aggregate
> after warping which is computationally inefficient. Neither is optimal.
> 
> A few questions, can you access a files metadata from within a pixel
> function? This would perhaps allow for example interpolating atmospheric
> data to the overpass time of a satellite image.

This is indeed a question I've asked myself, if the prototype of the pixel 
function contained enough information. I had not put initially the 
geotransform and whole raster dimensions for example, but found it was 
necessary to have correct behaviour for hillshading at different zoom scales. 
And I also added afterwards the xoff, yoff to be able to do Mandelbrot 
generation (I guess there might be some more valid use cases ;-)). And then 
the user provided dictionnary to have some parametrization of the algorithm 
instead of hardcoding constants in it, so that you can off-load it into a 
general lib. And I stopped at that point.

For your use case, 2 possibilities currently, using the 
 capability:
- either you get the metadata items you need at VRT generation and put them as 
arguments
- either you only pass the names of the sources as arguments, and you use the 
GDAL Python bindings themselves inside the pixel function itself to get access 
to everything needed. But I'm just thinking it might be a bit inefficient to do 
that for each RasterIO() request. 



Perhaps a compromise would be to allow, in addition to simple functions, to 
specify a class name, where the constructor would receive values that don't 
change from one call to another one, and a calc() method would receive the 
ones that change at each RasterIO() request

def MyProcessingClass:
def __init__(self, raster_xsize, raster_ysize, buf_radius, gt, 
source_filenames, **kwargs):
save above parameters that you may need in calc()
do one time things like gdal.Open()'ing sources to get metadata

def calc( self, in_ar, out_ar, xoff, yoff, xsize, ysize ):
do your computation

We could possibly pass the sources as datasets themselves, since they are 
actually already opened, but that would make the osgeo.gdal bindings 
availability a requirement (well, we could as a fallback pass the filenames if 
the import fails)



> 
> Do the pixel functions also work with @numba.vectorize(), in particular
> when targeting 'parallel' or 'cuda'. 

I've only scratched up the surface of Numba (didn't know it 2 days ago). I 
guess this might work (might only be interested if the 
VRTDerivedRasterBand::IRasterIO() is called with a big enough region. Which 
depends on the pixel access pattern of upper layers). The VRT drivers just 
calls a Python function that takes numpy arrays and a few extra args.

> And would that give parallel
> processing for both IO and calculations?

Only calculations. I/O is done before going to Python and after returning from 
it.

Actually... if you specify zero source in the VRT, then it is up to you to do 
the read operations the way you like in Python, so you could possibly 
parallelize them there.

>You should give the folks at Continuum a heads up, i'm sure they appreciate
> seeing Numba used like this. 

I have no connection with them, but yes their project rocks.

Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: [gdal-dev] VRT derived band pixel functions written in Python

2016-09-13 Thread Even Rouault
Le lundi 12 septembre 2016 23:38:41, James Ramm a écrit :
> Incredible
> Really cool initiative.
> 
> I had a play around and wanted a way to be able to write the pixel function
> in a 'stand alone' python module and generate the VRT from it. This would
> allow independent testing and easier maintenance of the python code. It is
> fairly easy using lxml to build up the VRT dynamically:
> 
> https://github.com/JamesRamm/gdal_pixel_functions
> 
> In the example I more-or-less recreate the hillshade example Even posted,
> although there is a lot of functionality missing from the VRT-builder
> still, so some nodes are not present.
> It would be quite easy therefore, to expand this to a sort of GDAL
> 'algorithms library' (I think it can wholesale replace what I had tried
> here: https://github.com/JamesRamm/GeoAlg).

a "-of VRT" option in current gdal_calc.py could also be cool (at least for 
simple one line operations)

Programmatic creation of VRT derived band, with the previous scope (ie calling 
a native registered pixel function), already existed. See 
https://github.com/OSGeo/gdal/blob/trunk/autotest/gdrivers/vrtderived.py#L175
I didn't extend for now the VRTDataset::AddBand implementation to set the new 
elements needed for the Python stuff, but that would certainly be doable.

> 
> One thing I would note (if I understand correctly) is that you have
> specified any "import" statement as unsafe except numpy/math and numba? It
> might be better to either allow imports or not allow them at all (unless
> the config option is set) as I don't see why those libraries would be any
> more or any less 'safe' than another library...e.g. another standard
> library module.

Well, the idea was that those modules should only do math stuff and not 
interact with the rest of the system (contrary to 'os', 'system' and the like) 
But is in the numpy case, it is not actually true since it has I/O methods, 
hence my naive attempt to blacklist those by names.

> 
> I would agree with Sean and remove 'IF_SAFE' and leave it entirely to the
> users discretion. 

That's what I've done (see previous message)

> Many users who are working with their own local data will
> likely have no problems about just setting it to 'YES' and I imagine users
> for whom it would be a probably might already have their own rules on what
> constitutes 'safe'?
> 
> On a sidenote - does this new functionality overlap considerably with the
> gdal map algebra project?

Indeed. I think there might be several incarnations of such capability and I'm 
not sure we can come up with a single one that will satisfy all needs and 
constraints. The VRT way has some use cases where it shines (on-the-fly 
computation of a region of interest. easy integration with upper levels in the 
stack: QGIS, MapServer, etc...), whereas you could imagine something else 
efficient in doing whole raster processing with parallelism etc... 

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: [gdal-dev] VRT derived band pixel functions written in Python

2016-09-13 Thread Rutger
Hi Even,

This is really amazing. I'm becoming more and more a fan of Numba for number
crunching, so this certainly makes my day. As soon as i can find a Win x64
dev version on a Conda channel I'll give it a try.

A use case that comes to mind, and which i run into regularly, is when i
want to do some simple aggregation before using something like gdalwarp. For
example when you have a file containing 24 hourly temperature values, and
you are only interested in the daily mean. Currently i either aggregate
before warping and write the intermediates to disk, or aggregate after
warping which is computationally inefficient. Neither is optimal.

A few questions, can you access a files metadata from within a pixel
function? This would perhaps allow for example interpolating atmospheric
data to the overpass time of a satellite image.

Do the pixel functions also work with @numba.vectorize(), in particular when
targeting 'parallel' or 'cuda'. And would that give parallel processing for
both IO and calculations?

You should give the folks at Continuum a heads up, i'm sure they appreciate
seeing Numba used like this. 


Regards,
Rutger






--
View this message in context: 
http://osgeo-org.1560.x6.nabble.com/gdal-dev-VRT-derived-band-pixel-functions-written-in-Python-tp5285323p5285450.html
Sent from the GDAL - Dev mailing list archive at Nabble.com.
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: [gdal-dev] VRT derived band pixel functions written in Python

2016-09-12 Thread James Ramm
Incredible
Really cool initiative.

I had a play around and wanted a way to be able to write the pixel function
in a 'stand alone' python module and generate the VRT from it. This would
allow independent testing and easier maintenance of the python code. It is
fairly easy using lxml to build up the VRT dynamically:

https://github.com/JamesRamm/gdal_pixel_functions

In the example I more-or-less recreate the hillshade example Even posted,
although there is a lot of functionality missing from the VRT-builder
still, so some nodes are not present.
It would be quite easy therefore, to expand this to a sort of GDAL
'algorithms library' (I think it can wholesale replace what I had tried
here: https://github.com/JamesRamm/GeoAlg).

One thing I would note (if I understand correctly) is that you have
specified any "import" statement as unsafe except numpy/math and numba? It
might be better to either allow imports or not allow them at all (unless
the config option is set) as I don't see why those libraries would be any
more or any less 'safe' than another library...e.g. another standard
library module.

I would agree with Sean and remove 'IF_SAFE' and leave it entirely to the
users discretion. Many users who are working with their own local data will
likely have no problems about just setting it to 'YES' and I imagine users
for whom it would be a probably might already have their own rules on what
constitutes 'safe'?

On a sidenote - does this new functionality overlap considerably with the
gdal map algebra project?

On 12 September 2016 at 19:25, Even Rouault 
wrote:

> > I found http://nedbatchelder.com/blog/201206/eval_really_is_
> dangerous.html
> > to be a good intro to the risks of eval'ing untrusted Python code.
> > Mentioned in there is a notable attempt to make a secure subset of Python
> > called "pysandbox", but its developer has since declared it "broken by
> > design": https://lwn.net/Articles/574215/. I'm not knowledgeable enough
> > about sandboxing (OS or otherwise) to say if that's right.
>
> Those links are fascinating. I wouldn't have imagined that there was valid
> Python code that could crash the Python interpreter, almost by design ! (so
> the osgeo.gdal gotchas are not so bad after all :-) )
> OK, so I've followed your suggestion:
> * IF_SAFE mode is removed (actually between #if 0 / #endif if someone
> wants to
> pursue the sandboxing effort),
> * and the default of GDAL_VRT_ENABLE_PYTHON is now NO.
>
> >
> > I see that in GDAL 2.0+ we can set options in the VRT XML itself. Is it
> > possible to set GDAL_VRT_ENABLE_PYTHON=YES in a VRT and thus override the
> > reader's own trust policies?
>
> No, that's not possible. GDAL_VRT_ENABLE_PYTHON is a config option only,
> so can
> only be set as a env variable or through CPLSetConfigOption().
>
> We could imagine to provide it as an option of the VRT indeed, but then
> there
> would be interesting situations like the following. Imagine something
> called
> "my.tif" with the following content :
>
> 
>   
> Gray
> 
>   
>   
> YES
>   
> 
>   
> 
>
> The internalized VRT would then be allowed with ENABLE_PYTHON=YES.
>
> So it's probably a good idea to not provide an open option for now. One
> could
> perhaps imagine to introduce one, provided we make it illegal as an item in
> , so it can only be provided by "top-level" GDALOpenEx()
> calls.
>
> > My ignorance of how GDAL separates "open"
> > options from "config" options might be on display in this question.
>
> They are just 2 different mechanisms of providing options. There's no
> automatic
> bridge between both concepts. Sometimes some option may exist in both
> worlds
> (because historically there was only global config option, but in some use
> cases, some options mostly make sense per dataset) or in just one of them.
>
> --
> Spatialys - Geospatial professional services
> http://www.spatialys.com
> ___
> gdal-dev mailing list
> gdal-dev@lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/gdal-dev
>
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: [gdal-dev] VRT derived band pixel functions written in Python

2016-09-12 Thread Even Rouault
> I found http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html
> to be a good intro to the risks of eval'ing untrusted Python code.
> Mentioned in there is a notable attempt to make a secure subset of Python
> called "pysandbox", but its developer has since declared it "broken by
> design": https://lwn.net/Articles/574215/. I'm not knowledgeable enough
> about sandboxing (OS or otherwise) to say if that's right.

Those links are fascinating. I wouldn't have imagined that there was valid 
Python code that could crash the Python interpreter, almost by design ! (so 
the osgeo.gdal gotchas are not so bad after all :-) )
OK, so I've followed your suggestion:
* IF_SAFE mode is removed (actually between #if 0 / #endif if someone wants to 
pursue the sandboxing effort),
* and the default of GDAL_VRT_ENABLE_PYTHON is now NO.

> 
> I see that in GDAL 2.0+ we can set options in the VRT XML itself. Is it
> possible to set GDAL_VRT_ENABLE_PYTHON=YES in a VRT and thus override the
> reader's own trust policies?

No, that's not possible. GDAL_VRT_ENABLE_PYTHON is a config option only, so can 
only be set as a env variable or through CPLSetConfigOption().

We could imagine to provide it as an option of the VRT indeed, but then there 
would be interesting situations like the following. Imagine something called 
"my.tif" with the following content :


  
Gray

  
  
YES
  

  


The internalized VRT would then be allowed with ENABLE_PYTHON=YES.

So it's probably a good idea to not provide an open option for now. One could 
perhaps imagine to introduce one, provided we make it illegal as an item in 
, so it can only be provided by "top-level" GDALOpenEx() calls.

> My ignorance of how GDAL separates "open"
> options from "config" options might be on display in this question.

They are just 2 different mechanisms of providing options. There's no automatic 
bridge between both concepts. Sometimes some option may exist in both worlds 
(because historically there was only global config option, but in some use 
cases, some options mostly make sense per dataset) or in just one of them.

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: [gdal-dev] VRT derived band pixel functions written in Python

2016-09-12 Thread Even Rouault
Le lundi 12 septembre 2016 16:59:34, Poughon Victor a écrit :
> Hi Even,
> 
> This is a really cool and really impressive feature!
> Calling Python code from C++ without development packages as a dependency
> sounds like black magic to me.

No black magic. Just setting function pointers using dlopen() :
https://github.com/OSGeo/gdal/blob/trunk/gdal/frmts/vrt/vrtderivedrasterband.cpp#L390

> Obviously Python symbols have to be there
> at some point to execute Python code, so this is only usable from a binary
> that happens to load them already (CPython, QGIS, etc.), correct?

No, there are 2 modes:
- GDAL is loaded after something else has already loaded python (so your 
CPython, QGIS)
- GDAL cannot find any python symbols in the already available symbols of the 
process, in which case it wil try to load a few reasonable shared objects like 
"libpython2.7.so", etc...

See 
https://github.com/OSGeo/gdal/blob/trunk/gdal/frmts/vrt/vrtderivedrasterband.cpp#L238
 
and below lines

> But I'm confused because later you mention
> incompatibilities issues between the CPython and Pypy API. GDAL's secret
> sauce, I guess...?

Nothing GDAL specific here. It is just that the way to embed PyPy from C is 
completely different from CPython :

http://doc.pypy.org/en/latest/embedding.html
vs
https://docs.python.org/2/extending/embedding.html

> I'm also curious why it's possible to use numba but no
> pypy, which AFAIK are both python JITs?

numba is "just" a CPython extension (a non-trivial one dragging llvmlite 
etc...), so once you support CPython, it comes for free to use it, provided 
you use the right import and function decorations in your python code.

> And finally did you consider using
> Cython (which claims pypy compatibility)?

No, I didn't investigate that.

Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: [gdal-dev] VRT derived band pixel functions written in Python

2016-09-12 Thread Poughon Victor
Hi Even,

This is a really cool and really impressive feature!
Calling Python code from C++ without development packages as a dependency 
sounds like black magic to me. Obviously Python symbols have to be there at 
some point to execute Python code, so this is only usable from a binary that 
happens to load them already (CPython, QGIS, etc.), correct? In particular you 
say that it is "done at run-time by dynamically loading Python symbols". But 
I'm confused because later you mention incompatibilities issues between the 
CPython and Pypy API. GDAL's secret sauce, I guess...?
I'm also curious why it's possible to use numba but no pypy, which AFAIK are 
both python JITs? And finally did you consider using Cython (which claims pypy 
compatibility)?

Cheers,

Victor Poughon


De : gdal-dev [gdal-dev-boun...@lists.osgeo.org] de la part de Even Rouault 
[even.roua...@spatialys.com]
Envoyé : lundi 12 septembre 2016 14:31
À : gdal-dev@lists.osgeo.org
Objet : [gdal-dev] VRT derived band pixel functions written in Python

Hi,

I wanted to mention a new (and I think pretty cool ) feature I've added in
trunk: the possibility to define pixel functions in Python in a VRT derived
band.

Let's start with a simple example to multiply a raster by 1.5 :


  EPSG:26711
  440720,60,0,3751320,0,-60
  
multiply
Python




  byte.tif

  


or to add 2 (or more) rasters:


  EPSG:26711
  440720,60,0,3751320,0,-60
  
add
Python



  byte.tif


  byte2.tif

  



You can put any python module code inside PixelFunctionCode, with at least one
function with the following arguments :
- in_ar: list of input numpy arrays (one numpy array for each source)
- out_ar: output numpy array to fill (initialized at the right dimensions and
with the VRTRasterBand.dataType)
- xoff: pixel offset to the top left corner of the accessed region of the band
- yoff: line offset to the top left corner of the accessed region of the band
- xsize: width of the region of the accessed region of the band
- ysize: height of the region of the accessed region of the band
- raster_xsize: total with of the raster band
- raster_ysize: total height of the raster band
- buf_radius: radius of the buffer (in pixels) added to the left, right, top
and bottom of in_ar / out_ar
- gt: geotransform
- kwargs: dictionnary with user arguments defined in PixelFunctionArguments

For basic operations, you just need to care about in_ar and out_ar.

With all that, you can do interesting stuff like hillshading (code ported from
gdaldem):


  EPSG:4326
  -80.0041663,0.008,0,
44.0041663,0,-0.008
  
Gray

  n43.dt0

Python
hillshade


  

1
Int16
  


You can completely offload the python code itself into a proper my_lib.py file
and just specify
my_lib.hillshade

Technically, the interfacing with Python is done at run-time by dynamically
loading Python symbols with dlopen()/GetProcAddress(), when they are already
available in the process. For example if libgdal is loaded from a Python
interpreter, or from a binary like QGIS which has already loaded the Python
lib. Otherwise a few shared objects ("libpython2.7.so", "python27.dll",
"libpython3.4m.so", etc.) are tried, unless the PYTHONSO config option is
specified to point to a precise filename. The advantage of this approach is that
the same GDAL library binary is compatible of all Python 2.X (tested: 2.6,
2.7) and 3.X (tested: 3.1, 3.4) versions, and any numpy version that comes in
the Python environment used (at compilation time you don't need any
python/numpy development package). The numpy dependency is not a critical one:
one could imagine a fallback mode where Python arrays would be used instead,
but this has likely little interest.

Successfully tested on Linux, MacOSX, FreeBSD and Windows. Some extra tweaking
of the predefined set of shared object names - that are probed when no already
loaded Python environment is found - might be needed.
The implementation should be thread-safe regarding use of the Python Global
Interpreter Lock (GIL).

I've also tested with the numba Just-In-Time compiler
(http://numba.pydata.org/) and it provides major performance improvements for
highly computational code (example given below).
With numba enabled, I found that my above Python hillshading on a 10Kx10K
float32 raster was even faster than gdaldem (the reason is that the Python
version is a bit simplified as it doesn't take into account input nodata
values), whereas the non-jit'ed one is 100x slower. When removing the nodata
flag, gdaldem is only twice faster as the jit'ed python code. So this is a good
sign that such approach isn't only a toy or just for prototyping.
Speaking of JIT, there's no provision (yet?) for interfacing with PyPy. Would
require a new backend as PyPy C API has nothing to do with the CPython one.

There are obvious security c

Re: [gdal-dev] VRT derived band pixel functions written in Python

2016-09-12 Thread Sean Gillies
Hi Even,

On Mon, Sep 12, 2016 at 2:31 PM, Even Rouault 
wrote:

> Hi,
>
> I wanted to mention a new (and I think pretty cool ) feature I've added in
> trunk: the possibility to define pixel functions in Python in a VRT derived
> band.
>
> ...
>
> There are obvious security concerns in allowing Python code to be run when
> getting the content of a vrt file. The GDAL_VRT_ENABLE_PYTHON config
> option =
> IF_SAFE / NO / YES can be set to control the behaviour. The default is
> IF_SAFE
> (can be change at compilation time by defining
> -DGDAL_VRT_ENABLE_PYTHON_DEFAULT="NO" e.g. And Python code execution can
> be
> completely disabled with -DGDAL_VRT_DISABLE_PYTHON). Safe must be
> understood
> as: the code will not read, write, delete... files, spawn external code, do
> network activity, etc. Said otherwise, the code will only do "maths". But
> infinite looping is something definitely possible in the safe mode. The
> heuristics of the IF_SAFE mode is rather basic and I'd be grateful if
> people
> could point ways of breaking it. If any of the following strings - "import"
> (unless it is "import numpy" / "from numpy import ...", "import math" /
> "from
> math import ..." or "from numba import jit"), "eval", "compile", "open",
> "load", "file", "input", "save", "memmap", "DataSource", "genfromtxt",
> "getattr", "ctypeslib", "testing", "dump", "fromregex" - is found anywhere
> in
> the code, then the code is considered unsafe (there are interestingly a
> lot of
> methods in numpy to do file I/O. Hopefully I've captured them with the
> previous
> filters). Another 'unsafe' pattern is when the pixel function references an
> external module like my above my_lib.hillshade example (who knows if there
> will not be some day a shutil.reformat_your_hard_drive function with the
> right
> prototype...)
>
> This new capability isn't yet documented in the VRT doc, although this
> message
> will be a start.
>
> I'm interested in feedback you may have.
>

I found http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html
to be a good intro to the risks of eval'ing untrusted Python code.
Mentioned in there is a notable attempt to make a secure subset of Python
called "pysandbox", but its developer has since declared it "broken by
design": https://lwn.net/Articles/574215/. I'm not knowledgeable enough
about sandboxing (OS or otherwise) to say if that's right.

I see that in GDAL 2.0+ we can set options in the VRT XML itself. Is it
possible to set GDAL_VRT_ENABLE_PYTHON=YES in a VRT and thus override the
reader's own trust policies? My ignorance of how GDAL separates "open"
options from "config" options might be on display in this question.

My $.02 is that since "is safe" will be hard to guarantee (it's an
outstanding unsolved Python community issue), removing "IF_SAFE" from the
options would be a good thing and that the default for
GDAL_VRT_ENABLE_PYTHON should be "NO".

-- 
Sean Gillies
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev