Re: [GRASS-dev] [GRASS GIS] #3166: Parallelization with tiling for grass.script

2019-05-09 Thread GRASS GIS
#3166: Parallelization with tiling for grass.script
--+--
  Reporter:  wenzeslaus   |  Owner:  grass-dev@…
  Type:  enhancement  | Status:  new
  Priority:  normal   |  Milestone:  7.8.0
 Component:  Python   |Version:  unspecified
Resolution:   |   Keywords:  script, parallel
   CPU:  Unspecified  |   Platform:  Unspecified
--+--
Changes (by martinl):

 * milestone:  7.6.2 => 7.8.0


Comment:

 It seems that `grass.script.parallel` is not even part of trunk.

 {{{
 grass7_trunk/lib/python/script$ ls *.py
 array.py  core.py  db.py  __init__.py  raster3d.py  raster.py  setup.py
 task.py  utils.py  vector.py
 }}}

-- 
Ticket URL: 
GRASS GIS 

___
grass-dev mailing list
grass-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/grass-dev

Re: [GRASS-dev] [GRASS GIS] #3166: Parallelization with tiling for grass.script

2016-11-15 Thread GRASS GIS
#3166: Parallelization with tiling for grass.script
--+--
  Reporter:  wenzeslaus   |  Owner:  grass-dev@…
  Type:  enhancement  | Status:  new
  Priority:  normal   |  Milestone:  7.4.0
 Component:  Python   |Version:  unspecified
Resolution:   |   Keywords:  script, parallel
   CPU:  Unspecified  |   Platform:  Unspecified
--+--

Comment (by wenzeslaus):

 Replying to [comment:7 mlennert]:
 > What is the status of this? Is there any documentation outside this
 > ticket?

 For the parallel `grass.pygrass` API, there are the docstrings in the code,
 which end up in:

 * https://grass.osgeo.org/grass73/manuals/libpython/pygrass.modules.interface.html
 (`Module`, `MultiModule`, `ParallelModuleQueue`)
 * https://grass.osgeo.org/grass73/manuals/libpython/pygrass.modules.grid.html
 (`GridModule`)

 For `grass.script.parallel`, there is just this ticket, the example in the
 attached code, and the usage in
 [source:grass-addons/grass7/raster3d/r3.count.categories/r3.count.categories.py#L168 r3.count.categories].
 I haven't had a chance to finalize it or write formal tests, so I don't
 want to commit the code yet.

Re: [GRASS-dev] [GRASS GIS] #3166: Parallelization with tiling for grass.script

2016-11-15 Thread GRASS GIS
#3166: Parallelization with tiling for grass.script
--+--
  Reporter:  wenzeslaus   |  Owner:  grass-dev@…
  Type:  enhancement  | Status:  new
  Priority:  normal   |  Milestone:  7.4.0
 Component:  Python   |Version:  unspecified
Resolution:   |   Keywords:  script, parallel
   CPU:  Unspecified  |   Platform:  Unspecified
--+--

Comment (by mlennert):

 What is the status of this? Is there any documentation outside this
 ticket?

Re: [GRASS-dev] [GRASS GIS] #3166: Parallelization with tiling for grass.script

2016-09-28 Thread GRASS GIS
#3166: Parallelization with tiling for grass.script
--+--
  Reporter:  wenzeslaus   |  Owner:  grass-dev@…
  Type:  enhancement  | Status:  new
  Priority:  normal   |  Milestone:  7.4.0
 Component:  Python   |Version:  unspecified
Resolution:   |   Keywords:  script, parallel
   CPU:  Unspecified  |   Platform:  Unspecified
--+--

Comment (by mlennert):

 I think there is definitely a place for parallel processing functions in
 grass.script, and yours look really great!

 [alert: simplification] In my limited observations and own experience I
 have the feeling that grass.script caters well to the casual, generally
 functional, scientific programmer who just wants to glue together a
 specific workflow, whereas pygrass is much more pythonic and, therefore,
 caters more to those that have a pythonic, more object-oriented, way of
 thinking.[/alert: simplification]

 I guess the question is whether there might be enough common ground
 between the two implementations to avoid duplication and instead use one
 as the backend of the other, with long-term code maintainability in mind?

Re: [GRASS-dev] [GRASS GIS] #3166: Parallelization with tiling for grass.script

2016-09-27 Thread GRASS GIS
#3166: Parallelization with tiling for grass.script
--+--
  Reporter:  wenzeslaus   |  Owner:  grass-dev@…
  Type:  enhancement  | Status:  new
  Priority:  normal   |  Milestone:  7.4.0
 Component:  Python   |Version:  unspecified
Resolution:   |   Keywords:  script, parallel
   CPU:  Unspecified  |   Platform:  Unspecified
--+--

Comment (by wenzeslaus):

 Yes, I would like to reconcile the two APIs or implementations (or both).
 At this point, I still see too many differences.

 Replying to [comment:4 huhabla]:
 > IMHO, the for-loop to set up the processing commands for the
 > TiledWorkflow can be avoided when using the PyGRASS Module and
 > MultiModule approach.

 The for-loop API is actually based on the case where the user already has
 a for loop like this one:

 {{{
 #!python
 import grass.script as gs

 for i in range(0, 5):
     gs.run_command('r.module', num=i)
     gs.mapcalc(expr, num=i)
 }}}

 I had code like this and I wanted to parallelize the individual loop runs,
 which are independent. So I came up with the following API, which does not
 change much in the main part of the code:

 {{{
 #!python
 workflow = SeriesWorkflow()  # currently called ModuleCallList
 for i in range(0, 5):
     workflow.run_command('r.module', num=i)
     workflow.mapcalc(expr, num=i)
 workflow.execute()
 }}}

 The Python functions I used in the background have some problems with
 interruption and with failed subprocesses, but they handle a pool of
 subprocesses well, so that the given number of processes is always running
 (one process can be really slow while the others keep running in the
 meantime).
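 As a minimal sketch of the pool behavior described above (not the code in
 parallel.py), the following uses `concurrent.futures.ThreadPoolExecutor`
 from the Python standard library to keep at most `nprocs` subprocesses
 running: each worker thread starts the next command as soon as its current
 one finishes. The function names are hypothetical, and the commands are
 Python one-liners standing in for GRASS module calls.

{{{
#!python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_call(cmd):
    """Run one command as a subprocess and return its standard output."""
    # check=True turns a non-zero exit code into CalledProcessError
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout


def run_in_pool(commands, nprocs=4):
    """Run commands with at most nprocs subprocesses alive at a time.

    A worker picks up the next command as soon as its current one ends,
    so one slow command does not hold back the rest. Results come back
    in input order, and a failed command re-raises its error here.
    """
    with ThreadPoolExecutor(max_workers=nprocs) as executor:
        return list(executor.map(run_call, commands))


# Stand-ins for independent loop runs such as r.mapcalc calls
commands = [[sys.executable, "-c", f"print({i} * {i})"] for i in range(5)]
print(run_in_pool(commands, nprocs=2))
# ['0\n', '1\n', '4\n', '9\n', '16\n']
}}}

 Using threads that each wait on a subprocess avoids pickling anything, and
 because failures surface as exceptions, the caller can decide what to do
 with the remaining work.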

 Then I had a different case, where I didn't have any loop but I needed the
 tiling. The following API emerged from that:

 {{{
 #!python
 tiled_workflow = TiledWorkflow(width=100, height=100)
 for namer, workflow in tiled_workflow:
     name = namer.name('rast', 'output')
     workflow.run_command('r.module', num=name)
     workflow.mapcalc(expr, num=name)
 tiled_workflow.execute()
 }}}

 This was of course before r69507, but the reasons for a similar API still
 apply, because the non-tiled workflow has the loop anyway (if desired). One
 argument against the current `TiledWorkflow` would be that we want its API
 to differ from the case where the loop is actually desired by the user.

 > The PyGRASS Module objects allow altering the input and output settings
 > before running, so that the TiledWorkflow class could take care of the
 > tile names, altering the user pre-configured Module objects. The user
 > simply initializes the Modules that should be used with the original
 > raster names.

 Users (at least I) use variables anyway. In the `SeriesWorkflow` case, the
 user names the outputs as needed because all are preserved. With
 `TiledWorkflow`, the variables need to be assigned with the help of the
 `TiledWorkflow`, so some work is required, but not that much.

 > The PyGRASS Module allows deep copy operation to clone the existing
 > Module objects, hence the TiledWorkflow can create any number of copies
 > and replace the raster names with tile names.

 I don't think it is as simple as replacing the names (which, of course, is
 possible only with PyGRASS, not with grass.script). The naming step in
 `TiledWorkflow` simply adds the maps to the list of maps to be patched.
 This can potentially handle r.mapcalc expressions as well as ''some''
 basename usages, such as the one in r.texture. I don't have this
 implemented, but the user could also exclude some outputs from patching and
 mark them for removal instead.
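 A minimal sketch of what such a naming step could look like, assuming a
 hypothetical `TileNamer` class (the one in the attached parallel.py may
 differ): each call returns a tile-specific map name and records it under
 its map type and base name, so the tiles can be patched into the base name
 once all of them are done. Outputs that should not be patched would simply
 not be registered.

{{{
#!python
from collections import defaultdict


class TileNamer:
    """Hypothetical helper that names per-tile maps and records them."""

    def __init__(self, tile_id, registry):
        self.tile_id = tile_id
        # Shared mapping: (map type, base name) -> list of tile map names
        self.registry = registry

    def name(self, map_type, base):
        tile_name = f"{base}_tile_{self.tile_id}"
        # Remember the tile map so that it can be patched into `base` later
        self.registry[(map_type, base)].append(tile_name)
        return tile_name


registry = defaultdict(list)
for tile_id in range(3):
    namer = TileNamer(tile_id, registry)
    slope = namer.name("raster", "slope")
    # ... commands for this tile would write their output to `slope` ...

for (map_type, base), tiles in registry.items():
    print(map_type, base, tiles)
# raster slope ['slope_tile_0', 'slope_tile_1', 'slope_tile_2']
}}}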

 > > The implementation is now 300 lines. MultiModule alone has 200.
 >
 > Well it is not much "Code". The doctests and the description of
 > MultiModule are more than 100 lines. ;)

 Right. I guess my point is that parallel.py mostly relies on higher-level
 functions from Python multiprocessing and on grass.script, which is itself
 simple. Furthermore, parallel.py is more than just `TiledWorkflow`,
 although that's the longest and most complicated part. The design of
 parallel.py is to cover as many cases as possible with minimal code, and
 the cost is that the user needs to do something special from time to
 time, like the naming step for `TiledWorkflow` or the use of wrapper
 functions instead of the real ones (applies to both `SeriesWorkflow` and
 `TiledWorkflow`). However, I think that `MultiModule` and the others are
 much more robust at this point. parallel.py's only hope of becoming robust
 is that it is simple enough to get there one day.

 I hope this clarifies a little more where I'm coming from. I know I was
 not specific in that private email a week ago.

Re: [GRASS-dev] [GRASS GIS] #3166: Parallelization with tiling for grass.script

2016-09-26 Thread GRASS GIS
#3166: Parallelization with tiling for grass.script
--+--
  Reporter:  wenzeslaus   |  Owner:  grass-dev@…
  Type:  enhancement  | Status:  new
  Priority:  normal   |  Milestone:  7.4.0
 Component:  Python   |Version:  unspecified
Resolution:   |   Keywords:  script, parallel
   CPU:  Unspecified  |   Platform:  Unspecified
--+--

Comment (by huhabla):

 IMHO, the for-loop to set up the processing commands for the TiledWorkflow
 can be avoided when using the PyGRASS Module and MultiModule approach. The
 PyGRASS Module objects allow altering the input and output settings
 before running, so that the TiledWorkflow class could take care of the
 tile names, altering the user pre-configured Module objects. The user
 simply initializes the Modules that should be used with the original raster
 names. The PyGRASS Module allows deep copy operation to clone the existing
 Module objects, hence the TiledWorkflow can create any number of copies
 and replace the raster names with tile names.
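 The clone-and-rename pattern can be sketched in plain Python. The
 `ModuleSpec` class and `clone_for_tiles` function below are hypothetical
 stand-ins, not the PyGRASS `Module` API; they only show why a deep copy
 matters: each tile gets its own parameter dictionary, so renaming an output
 for one tile leaves the pre-configured template and the other clones
 untouched. Only the output is renamed here, assuming that the input is
 read as a whole map and restricted by each tile's region.

{{{
#!python
import copy


class ModuleSpec:
    """Hypothetical stand-in for a pre-configured module call."""

    def __init__(self, name, **params):
        self.name = name
        self.params = params

    def command(self):
        # Render as a command-line style list, options sorted by name
        return [self.name] + [f"{k}={v}" for k, v in sorted(self.params.items())]


def clone_for_tiles(template, map_options, tile_ids):
    """Deep-copy the template once per tile and rename the given maps."""
    clones = []
    for tile_id in tile_ids:
        module = copy.deepcopy(template)  # the template itself stays untouched
        for option in map_options:
            module.params[option] = f"{module.params[option]}_tile_{tile_id}"
        clones.append(module)
    return clones


template = ModuleSpec("r.neighbors", input="elevation", output="smoothed", size=5)
for module in clone_for_tiles(template, ["output"], [0, 1]):
    print(module.command())
# ['r.neighbors', 'input=elevation', 'output=smoothed_tile_0', 'size=5']
# ['r.neighbors', 'input=elevation', 'output=smoothed_tile_1', 'size=5']
}}}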

 Please have a look at the PyGRASS Module initialization in
 t.rast.neighbors:
 https://trac.osgeo.org/grass/browser/grass/trunk/temporal/t.rast.neighbors/t.rast.neighbors.py#L135

 Cloning and adding to the parallel queue:
 https://trac.osgeo.org/grass/browser/grass/trunk/temporal/t.rast.neighbors/t.rast.neighbors.py#L168

 Quote:
 >
 > The implementation is now 300 lines. MultiModule alone has 200
 >

 Well it is not much "Code". The doctests and the description of
 MultiModule are more than 100 lines. ;)

Re: [GRASS-dev] [GRASS GIS] #3166: Parallelization with tiling for grass.script

2016-09-26 Thread GRASS GIS
#3166: Parallelization with tiling for grass.script
--+--
  Reporter:  wenzeslaus   |  Owner:  grass-dev@…
  Type:  enhancement  | Status:  new
  Priority:  normal   |  Milestone:  7.4.0
 Component:  Python   |Version:  unspecified
Resolution:   |   Keywords:  script, parallel
   CPU:  Unspecified  |   Platform:  Unspecified
--+--

Comment (by wenzeslaus):

 Replying to [comment:1 mlennert]:
 > Wow, this looks great!

 Note also what is in PyGRASS, especially after r69507.

 > Could you just explain the relation

 Partial duplication, as in the case of `grass.script` and
 `grass.pygrass.modules`.

 > difference between this and the GridModule in pygrass ?

 * This supports a list of modules (a workflow) which are executed one
 after another on the given tile.
 * The user needs to prepare the list in a for loop (whereas `GridModule`
 needs no for loop). This is because it is in fact derived from the
 non-tiled parallelization API, which is more general, so the user loops
 over what needs to be done (with or without parallelization).
 * Related to that, the API for simple parallel processing, parallel
 processing of series of maps, and tiled parallel processing of series of
 maps is the same.
 * The user needs to "help" the functions and objects by providing them
 with some metadata, i.e. types and names of the maps to patch (for
 patching), because no module run metadata are available in `grass.script`.
 * Some of the execution details are lost, e.g. only the last command's
 textual output is preserved.
 * `GridModule` uses separate mapsets for individual tiles, while this uses
 `WIND_OVERRIDE`.
 * `GridModule` uses PyGRASS ctypes wrappers for patching, while this uses
 (potentially huge) `r.mapcalc` and `r3.mapcalc` expressions.
 * The interface is like `grass.script`, not like `grass.pygrass.modules`.
 * It is not complete.
 * The implementation is now 300 lines. `MultiModule` alone has 200.
 * It uses the `Pool.map_async` function from multiprocessing (which may be
 the cause of the problem with interrupting).
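 To illustrate why the patching expression can become huge, here is a
 minimal sketch (a hypothetical helper, not the code from parallel.py) that
 nests one `if(isnull(...))` call per tile, so the expression grows linearly
 with the number of tiles. Where tiles overlap, the first non-null tile
 wins, which is also how r.patch gives precedence to the maps listed first.

{{{
#!python
def patch_expression(output, tiles):
    """Build an r.mapcalc expression that patches tile maps into output.

    Each tile is used only where all earlier tiles are null, so earlier
    tiles take precedence in overlapping areas.
    """
    expr = "null()"
    # Wrap from the last tile inwards so that the first tile ends up outermost
    for tile in reversed(tiles):
        expr = f"if(isnull({tile}), {expr}, {tile})"
    return f"{output} = {expr}"


print(patch_expression("slope", ["slope_tile_0", "slope_tile_1"]))
# slope = if(isnull(slope_tile_0), if(isnull(slope_tile_1), null(), slope_tile_1), slope_tile_0)
}}}

 The resulting string could then be passed to `r.mapcalc`, for example
 through `grass.script.mapcalc()`.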

Re: [GRASS-dev] [GRASS GIS] #3166: Parallelization with tiling for grass.script

2016-09-26 Thread GRASS GIS
#3166: Parallelization with tiling for grass.script
--+--
  Reporter:  wenzeslaus   |  Owner:  grass-dev@…
  Type:  enhancement  | Status:  new
  Priority:  normal   |  Milestone:  7.4.0
 Component:  Python   |Version:  unspecified
Resolution:   |   Keywords:  script, parallel
   CPU:  Unspecified  |   Platform:  Unspecified
--+--

Comment (by huhabla):

 I think we can adapt your implementation to use the
 MultiModule/ParallelModuleQueue approach. The MultiModule class supports
 the execution of a stack of any GRASS modules in a temporary region
 environment. Hence instead of implementing a different module executor,
 you can use pygrass Module and MultiModule to define the processing. Use
 the ParallelModuleQueue to run the stacks in parallel. You have access to
 all executed modules and can investigate errors, stdout, stderr and
 input/output options.

 Hence the TiledWorkflow class would accept MultiModule objects and would
 use the ParallelModuleQueue internally to run the module stacks in
 parallel. What do you think?
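 A rough plain-Python analog of this design, offered as a sketch rather
 than the PyGRASS implementation: each stack of commands runs sequentially
 with its own `WIND_OVERRIDE` value (the GRASS variable that makes modules
 use a named saved region instead of the current one), stacks run in
 parallel in a bounded pool, and every `subprocess.CompletedProcess` is kept
 so that stdout, stderr and return codes can be inspected afterwards. The
 function names are hypothetical, and the commands are Python one-liners
 standing in for GRASS modules.

{{{
#!python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_stack(commands, region):
    """Run commands one after another, all with the same WIND_OVERRIDE."""
    # Only the child environment is changed; the parent's stays untouched
    env = dict(os.environ, WIND_OVERRIDE=region)
    results = []
    for cmd in commands:
        proc = subprocess.run(cmd, env=env, capture_output=True, text=True)
        results.append(proc)  # keep stdout, stderr and returncode
        if proc.returncode != 0:
            break  # later commands in the stack may depend on this one
    return results


def run_stacks(stacks, nprocs=4):
    """Run (commands, region) pairs in parallel and return all results."""
    with ThreadPoolExecutor(max_workers=nprocs) as executor:
        futures = [executor.submit(run_stack, cmds, region)
                   for cmds, region in stacks]
        return [future.result() for future in futures]


show_region = [sys.executable, "-c",
               "import os; print(os.environ['WIND_OVERRIDE'])"]
stacks = [([show_region, show_region], f"tile_{i}") for i in range(3)]
for results in run_stacks(stacks, nprocs=2):
    print([proc.stdout.strip() for proc in results])
# ['tile_0', 'tile_0']
# ['tile_1', 'tile_1']
# ['tile_2', 'tile_2']
}}}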

Re: [GRASS-dev] [GRASS GIS] #3166: Parallelization with tiling for grass.script

2016-09-26 Thread GRASS GIS
#3166: Parallelization with tiling for grass.script
--+--
  Reporter:  wenzeslaus   |  Owner:  grass-dev@…
  Type:  enhancement  | Status:  new
  Priority:  normal   |  Milestone:  7.4.0
 Component:  Python   |Version:  unspecified
Resolution:   |   Keywords:  script, parallel
   CPU:  Unspecified  |   Platform:  Unspecified
--+--

Comment (by mlennert):

 Wow, this looks great!

 Could you just explain the relation / difference between this and the
 GridModule in pygrass ?

Re: [GRASS-dev] [GRASS GIS] #3166: Parallelization with tiling for grass.script

2016-09-25 Thread GRASS GIS
#3166: Parallelization with tiling for grass.script
--+--
  Reporter:  wenzeslaus   |  Owner:  grass-dev@…
  Type:  enhancement  | Status:  new
  Priority:  normal   |  Milestone:  7.4.0
 Component:  Python   |Version:  unspecified
Resolution:   |   Keywords:  script, parallel
   CPU:  Unspecified  |   Platform:  Unspecified
--+--
Changes (by wenzeslaus):

 * Attachment "parallel.py" added.

 Prototype of parallelization with tiling for grass.script

[GRASS-dev] [GRASS GIS] #3166: Parallelization with tiling for grass.script

2016-09-25 Thread GRASS GIS
#3166: Parallelization with tiling for grass.script
--+-
 Reporter:  wenzeslaus|  Owner:  grass-dev@…
 Type:  enhancement   | Status:  new
 Priority:  normal|  Milestone:  7.4.0
Component:  Python|Version:  unspecified
 Keywords:  script, parallel  |CPU:  Unspecified
 Platform:  Unspecified   |
--+-
 At the same time as r69507, I was working on a simpler approach based on
 `grass.script`. At this point, it can do tiling and patching of rasters
 and, partially, of 3D rasters. A series of commands is executed on each
 tile. This, as well as the patching, runs in parallel. The syntax is very
 similar to `grass.script`, with a convenient function for tile naming
 (which is partially the user's responsibility).

 Here is a tiling example:

 {{{
 #!python
 # this is the control object
 tiled_workflow = TiledWorkflow(nprocs=4, width=500, height=500,
                                overlap=10)
 for namer, workflow in tiled_workflow:
     slope = namer.name('raster', 'slope')
     aspect = namer.name('raster', 'aspect')

     # now do everything as usual; workflow is the equivalent of `grass.script`
     workflow.run_command('r.slope.aspect', elevation='fractal_surface',
                          slope=slope)
     workflow.run_command('r.slope.aspect', elevation='fractal_surface',
                          aspect=aspect)
     workflow.parse_command('g.region', flags='pg')

 # nothing has actually been done until now
 # do the parallel processing and patching
 results = tiled_workflow.execute()

 # iterate over the results (here from g.region)
 for result in results:
     for key, value in result.items():
         print(key, value)
 }}}

 An example using a much smaller portion of the API. It creates a list of
 module calls which are then executed in the background. When the import of
 the `parallel` module fails, `grass.script` is used instead, without any
 change in the main part of the code.

 {{{
 #!python
 import grass.script as gs

 try:
     from grass.script.parallel import ModuleCallList, execute_by_module
     call = ModuleCallList()
     parallel = True
 except ImportError:
     call = gs  # fall back to grass.script
     parallel = False
 for i in range(map_min, map_max + 1):
     call.mapcalc(expr, num=i)
 if parallel:
     execute_by_module(call, nprocs=4)
 }}}
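 The fallback works because the recording object offers the same call
 signatures as `grass.script`. A minimal sketch of such a recorder, assuming
 a hypothetical `CallRecorder` class (not the actual `ModuleCallList`):
 calls are only stored, and `execute()` later replays them on any object
 with the same functions, here a small fake used in place of
 `grass.script`. The replay is sequential for brevity; a real
 implementation would hand the recorded calls to a process pool instead.

{{{
#!python
import string


class CallRecorder:
    """Hypothetical stand-in for ModuleCallList: records calls for later."""

    def __init__(self):
        self.calls = []

    def run_command(self, *args, **kwargs):
        self.calls.append(("run_command", args, kwargs))

    def mapcalc(self, *args, **kwargs):
        self.calls.append(("mapcalc", args, kwargs))

    def execute(self, backend):
        """Replay the recorded calls, in order, on the given backend."""
        return [getattr(backend, name)(*args, **kwargs)
                for name, args, kwargs in self.calls]


class EchoBackend:
    """Fake backend that describes each call instead of running GRASS."""

    def run_command(self, module, **kwargs):
        options = " ".join(f"{key}={value}" for key, value in kwargs.items())
        return f"{module} {options}"

    def mapcalc(self, expr, **kwargs):
        # Mimics the $name template substitution done by grass.script.mapcalc
        return string.Template(expr).substitute(**kwargs)


recorder = CallRecorder()
for i in range(1, 3):
    recorder.mapcalc("out_$num = elevation * $num", num=i)
# Nothing has been executed yet; the calls are only stored
print(recorder.execute(EchoBackend()))
# ['out_1 = elevation * 1', 'out_2 = elevation * 2']
}}}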

 The current code uses `r.mapcalc` for patching and PyGRASS code for
 computing the tiles. One of the main issues with the current code is that
 it does not finish when there is an error in the executed module.
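 The tile computation can be sketched without PyGRASS. The helper below is
 hypothetical and works in row/column cell indices only: it splits a region
 of `rows` x `cols` cells into tiles of at most `height` x `width` cells and
 extends each tile by `overlap` cells on every side, clipped to the region,
 so that neighborhood operations such as slope have data beyond the tile's
 core area. Converting the bounds to north/south/east/west coordinates for
 `g.region` is left out.

{{{
#!python
def compute_tiles(rows, cols, height, width, overlap=0):
    """Return (row_start, row_end, col_start, col_end) for each tile.

    End indices are exclusive. Each tile covers at most height x width
    cells, extended by overlap cells on each side but kept inside the
    region.
    """
    tiles = []
    for row in range(0, rows, height):
        for col in range(0, cols, width):
            tiles.append((
                max(row - overlap, 0),
                min(row + height + overlap, rows),
                max(col - overlap, 0),
                min(col + width + overlap, cols),
            ))
    return tiles


for tile in compute_tiles(rows=1000, cols=700, height=500, width=500, overlap=10):
    print(tile)
# (0, 510, 0, 510)
# (0, 510, 490, 700)
# (490, 1000, 0, 510)
# (490, 1000, 490, 700)
}}}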
