Re: advice on sub-classing multiprocessing.Process and multiprocessing.BaseManager

2014-03-25 Thread matt . newville
ChrisA -

 I wasn't really asking "is multiprocessing appropriate?" but whether
 there was a cleaner way to subclass multiprocessing.BaseManager() to
 use a subclass of Process().  I can believe the answer is "No", but
 thought I'd ask.
 
 I've never subclassed BaseManager like this. It might be simpler to
 spin off one or more workers and not have them do any network
 communication at all; that way, you don't need to worry about the
 cache. Set up a process tree with one at the top doing only networking
 and process management (so it's always fast), and then use a
 multiprocessing.Queue or somesuch to pass info to a subprocess and
 back. Then your global connection state is all stored within the top
 process, and none of the others need care about it. You might have a
 bit of extra effort to pass info back to the parent rather than simply
 writing it to the connection, but that's a common requirement in other
 areas (eg GUI handling - it's common to push all GUI manipulation onto
 the main thread), so it's a common enough model.
 
 But if subclassing and tweaking is the easiest way, and if you don't
 mind your solution being potentially fragile (which subclassing like
 that is), then you could look into monkey-patching Process. Inject
 your code into it and then use the original. It's not perfect, but it
 may turn out easier than the "subclass everything" technique.
 
 ChrisA

Thanks, I agree that restricting network communications to a parent process 
would be a good recommended solution, but it's hard to enforce and easy to 
forget such a recommendation.  It seems better to provide lightweight 
library-specific subclasses of Process (and Pool) and explain why they 
should be used.  This library (pyepics) already does similar things for 
interaction with other libraries (notably providing decorators to avoid issues 
with wxPython). 
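
For the record, the monkey-patching route Chris mentions would look roughly
like the sketch below (clear_my_cache() is the cache-clearing hook from my
original post; this is only an illustration, not code we ship):

# Rough sketch of the monkey-patching idea: wrap multiprocessing.Process.run
# so that every child process clears the connection cache before running.
import multiprocessing

_original_run = multiprocessing.Process.run

def _patched_run(self):
    clear_my_cache()
    _original_run(self)

multiprocessing.Process.run = _patched_run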

Monkey-patching multiprocessing.Process seems more fragile than subclassing it. 
 It turned out that multiprocessing.pool.Pool was also very easy to subclass.  
But cleanly subclassing the Managers in multiprocessing.managers looks much 
harder.  I'm not sure whether this is intentional, or whether it should be 
filed as an issue for multiprocessing.  For now, I'm willing to say that the 
multiprocessing managers are not yet available with the pyepics library.
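
For reference, a rough sketch of the parent-only-networking pattern Chris
describes: all network I/O stays in the parent, and the workers see only
plain data passed over queues (the worker function and payloads here are
invented for illustration, not pyepics code):

import multiprocessing as mp

def worker(inq, outq):
    # pure computation: no network connections are touched in the child
    for chunk in iter(inq.get, None):           # None is the shutdown signal
        outq.put(sum(x * x for x in chunk))     # stand-in for real processing

if __name__ == '__main__':
    inq, outq = mp.Queue(), mp.Queue()
    workers = [mp.Process(target=worker, args=(inq, outq)) for _ in range(2)]
    for w in workers:
        w.start()

    # only the parent talks to the network / control system; it hands the
    # workers plain data and collects plain results
    for chunk in ([1, 2, 3], [4, 5, 6]):
        inq.put(chunk)
    results = [outq.get() for _ in range(2)]

    for w in workers:
        inq.put(None)
    for w in workers:
        w.join()
    print(results)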

Thanks again,

--Matt

-- 
https://mail.python.org/mailman/listinfo/python-list


advice on sub-classing multiprocessing.Process and multiprocessing.BaseManager

2014-03-24 Thread Matt Newville
I'm maintaining a python interface to a C library for a distributed
control system (EPICS, sort of a SCADA system) that does a large
amount of relatively light-weight network I/O.   In order to keep many
connections open and responsive, and to provide a simple interface,
the python library keeps a global store of connection state.

This works well for single processes and threads, but not so well for
multiprocessing, where the global state causes trouble.  The issue is
not too difficult to work around (ie, completely clear any such global
cache and insist that new connections be established in each process),
but easy to forget.

To make this easier, I have a function to clear the global cache,

def clear_my_cache():
    # empty global variables caching network connections
    return

and then a subclass of multiprocessing.Process like:

class MyProcess(multiprocessing.Process):
    def __init__(self, **kws):
        multiprocessing.Process.__init__(self, **kws)

    def run(self):
        clear_my_cache()
        multiprocessing.Process.run(self)

This works fine.  I can subclass multiprocessing.pool.Pool too, as it
uses Process as a class variable (removing doc strings):

class Pool(object):
    Process = Process

    def __init__(self, processes=None, initializer=None, initargs=(),
                 maxtasksperchild=None):

and then uses self.Process in its Pool._repopulate_pool().  That makes
subclassing Pool as easy as (removing doc strings):

class MyPool(multiprocessing.pool.Pool):
    def __init__(self, **kws):
        self.Process = MyProcess
        multiprocessing.pool.Pool.__init__(self, **kws)

I'm very pleased to need so little code here!  But, I run into trouble
when I try to subclass any of the Managers().  It looks like I would
have to make a nearly-identical copy of ~30 lines of
BaseManager.start() as it calls multiprocessing.Process() to create
processes there.   In addition, it looks like subclassing
multiprocessing.managers.SyncManager would mean making a
near-identical copy of a similar amount of code.

I'd be willing to do this, but it seems like a bad idea -- I much
prefer overriding self.Process, as with Pool.

Does anyone have any advice for the best approach here?  Should
BaseManager, like Pool, also use a class variable (Process = Process)?
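
For concreteness, here is a hypothetical sketch of what that would allow,
assuming BaseManager.start() looked up a Process class attribute the way
Pool does (this is not the current stdlib API):

from multiprocessing.managers import BaseManager

class MyManager(BaseManager):
    # Hypothetical: this would only work if BaseManager.start() used
    # self.Process instead of calling multiprocessing.Process() directly.
    Process = MyProcess     # the cache-clearing subclass defined above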

Thanks in advance for any advice.

--Matt
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: advice on sub-classing multiprocessing.Process and multiprocessing.BaseManager

2014-03-24 Thread matt . newville
On Monday, March 24, 2014 7:19:56 PM UTC-5, Chris Angelico wrote:
 On Tue, Mar 25, 2014 at 7:24 AM, Matt Newville
 
  I'm maintaining a python interface to a C library for a distributed
  control system (EPICS, sort of a SCADA system) that does a large
  amount of relatively light-weight network I/O.   In order to keep many
  connections open and responsive, and to provide a simple interface,
  the python library keeps a global store of connection state.
 
  This works well for single processes and threads, but not so well for
  multiprocessing, where the global state causes trouble.
 
 
 From the sound of things, a single process is probably what you want
 here. Is there something you can't handle with one process?

Thanks for the reply.  I find that appreciation is greatly (perhaps infinitely) 
delayed whenever I reply "X is probably not what you want to do", without 
further explanation, to a question of "can I get some advice on how to do X?". 
So, I do thank you for your willingness to reply, even such a 
guaranteed-to-be-under-appreciated reply. 

There are indeed operations that can't be handled with a single process, such 
as simultaneously using multiple cores.  This is why we want to use 
multiprocessing instead of (or, in addition to) threading.  We're trying to do 
real-time collection of scientific data from a variety of data sources, 
generally within a LAN. The data can get largish and fast, and intermediate 
processing occasionally requires non-trivial computation time.  So being able 
to launch worker processes that can run independently on separate cores would 
be very helpful.  Ideally, we'd like to let sub-processes make calls to the 
control system too, say, read new data.

I wasn't really asking "is multiprocessing appropriate?" but whether there was 
a cleaner way to subclass multiprocessing.BaseManager() to use a subclass of 
Process().  I can believe the answer is "No", but thought I'd ask.

Thanks again,

--Matt
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Question about ast.literal_eval

2013-05-20 Thread matt . newville
On Monday, May 20, 2013 2:05:48 AM UTC-5, Frank Millman wrote:
 Hi all
 
 I am trying to emulate a SQL check constraint in Python. Quoting from
 the PostgreSQL docs, "A check constraint is the most generic constraint
 type. It allows you to specify that the value in a certain column must
 satisfy a Boolean (truth-value) expression."
 
 The problem is that I want to store the constraint as a string, and I
 was hoping to use ast.literal_eval to evaluate it, but it does not work.
 
 >>> x = 'abc'
 >>> x in ('abc', 'xyz')
 True
 >>> b = "x in ('abc', 'xyz')"
 >>> eval(b)
 True
 >>> from ast import literal_eval
 >>> literal_eval(b)
 ValueError: malformed node or string: <_ast.Compare object at ...>
 
 Is there a safe way to do what I want? I am using python 3.3.
 
 Thanks
 
 Frank Millman

You might find the asteval module (https://pypi.python.org/pypi/asteval) 
useful.   It provides a relatively safe eval, for example:

>>> import asteval
>>> a = asteval.Interpreter()
>>> a.eval('x = "abc"')
>>> a.eval('x in ("abc", "xyz")')
True
>>> a.eval('import os')
NotImplementedError
   import os
'Import' not supported
>>> a.eval('__import__("os")')
NameError
   __import__("os")
name '__import__' is not defined

This works by maintaining an internal namespace (a flat dictionary) and 
walking the AST generated for the expression.  It supports most Python syntax, 
including if, for, while, and try/except blocks and function definitions, with 
the notable exceptions of eval, exec, class, lambda, yield, and import.   It 
requires Python 2.6 or higher, and does work with Python 3.3.

Of course, it is not guaranteed to be completely safe, but it does disallow 
imports, which seems like the biggest vulnerability concern listed here.  
Currently, there is no explicit protection against long-running calculations, 
so denial-of-service attacks remain possible.  If you're exposing an SQL 
database to user-generated code, that may be worth considering.
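
For the check-constraint case specifically, a minimal sketch might look like
this (the column name and constraint string are made-up examples; the
Interpreter exposes its namespace as the symtable dict):

# Evaluate a stored check-constraint string against a column value.
from asteval import Interpreter

aeval = Interpreter()
constraint = "balance >= 0 and balance < 10000"   # stored as a string

def check(value):
    aeval.symtable['balance'] = value             # bind the column value
    return bool(aeval.eval(constraint))

print(check(250))    # True
print(check(-5))     # False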

Cheers,

--Matt Newville
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Matplotlib Slider Widget and changing colorbar threshold

2013-03-13 Thread matt . newville
On Tuesday, March 12, 2013 9:06:20 AM UTC-5, kevin@gmail.com wrote:
 I am currently trying to work on a program that will allow the user to 
 display their dataset in the form of a colormap and through the use of 
 sliders, it will also allow the user to adjust the threshold of the colormap 
 and thus update the colormap accordingly.  The best to describe this would be 
 through the use of a picture:  ![enter image description here][1]
 
   [1]: http://i.stack.imgur.com/1T9Qp.png
 
 This image shows how the colorbar should look before (the image on the left) 
 and after (the image on the right) the adjustment.  As the threshold values 
 of the colorbar are changed, the colormap would be updated accordingly.
 
 Now I am mainly using matplotlib and I found that matplotlib does support 
 some widgets, such as a slider.  However the area I need help in is devising 
 a piece of code which will update the colorbar and colormap (like the way 
 shown in the picture above) when the slider is adjusted.  I was wondering if 
 anyone has done this before and might have a piece of code they would be 
 willing to share and might have pointers as to how this can be achieved.

An example using wxPython and matplotlib:
   http://newville.github.com/wxmplot/imagepanel.html#examples-and-screenshots

might be close to what you're looking for.  This does allow the user to 
re-scale the intensity of the colorbar and the corresponding map, as well as 
change colormaps, apply smoothing, and explicitly set intensity ranges.  This image 
display frame might cover most of what you need -- if not, I'd be interested to 
hear what else might be useful.   

The specific code building the control is at 
https://github.com/newville/wxmplot/blob/master/lib/imageframe.py
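
If you'd rather stay with plain matplotlib widgets, a minimal sketch along
these lines (random data and slider placement chosen only for illustration)
may be a useful starting point:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider

data = np.random.rand(100, 100) * 100.0

fig, ax = plt.subplots()
fig.subplots_adjust(bottom=0.25)        # leave room for the sliders
im = ax.imshow(data, cmap='jet')
fig.colorbar(im, ax=ax)

ax_lo = fig.add_axes([0.15, 0.10, 0.65, 0.03])
ax_hi = fig.add_axes([0.15, 0.05, 0.65, 0.03])
s_lo = Slider(ax_lo, 'vmin', 0.0, 100.0, valinit=0.0)
s_hi = Slider(ax_hi, 'vmax', 0.0, 100.0, valinit=100.0)

def update(val):
    # reset the color limits; the image and its colorbar redraw together
    im.set_clim(min(s_lo.val, s_hi.val), max(s_lo.val, s_hi.val))
    fig.canvas.draw_idle()

s_lo.on_changed(update)
s_hi.on_changed(update)
plt.show()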

Hope that helps,

--Matt Newville
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Yet another attempt at a safe eval() call

2013-01-05 Thread matt . newville
On Saturday, January 5, 2013 8:17:16 AM UTC-8, Oscar Benjamin wrote:
 On 5 January 2013 16:01, Chris Angelico ros...@gmail.com wrote:
  On Sun, Jan 6, 2013 at 2:56 AM, Oscar Benjamin
  oscar.j.benja...@gmail.com wrote:
   On 4 January 2013 15:53, Grant Edwards invalid@invalid.invalid wrote:
    On 2013-01-04, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info
    wrote:
     On Thu, 03 Jan 2013 23:25:51 +0000, Grant Edwards wrote:
 
     * But frankly, you should avoid eval, and write your own mini-integer
       arithmetic evaluator which avoids even the most remote possibility
       of exploit.
 
    That's obviously the right thing to do.  I suppose I should figure
    out how to use the ast module.
 
   Someone has already created a module that does this called numexpr. Is
   there some reason why you don't want to use that?
 
   >>> import numexpr
   >>> numexpr.evaluate('2+4*5')
   array(22, dtype=int32)
   >>> numexpr.evaluate('2+a*5', {'a':4})
   array(22L)
 
  Is that from PyPI? It's not in my Python 3.3 installation. Obvious
  reason not to use it: Unaware of it. :)
 
 My apologies. I should have at least provided a link:
 http://code.google.com/p/numexpr/
 
 I installed it from the ubuntu repo under the name python-numexpr. It
 is also on PyPI:
 http://pypi.python.org/pypi/numexpr
 
 numexpr is a well established project intended primarily for memory
 and cache efficient computations over large arrays of data. Possibly
 as a side effect, it can also be used to evaluate simple algebraic
 expressions involving ordinary scalar variables.
 
 Oscar

The asteval module (http://pypi.python.org/pypi/asteval/0.9 and
http://newville.github.com/asteval/) might be another alternative.  It's not as 
fast as numexpr, but a bit more general.  It uses the ast module to compile an 
expression into an AST, then walks through that, intercepting Name nodes and 
using a flat namespace of variables.  It disallows imports and does not support 
all Python constructs, but it is fairly complete in supporting Python syntax.
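
As a rough illustration of that approach (a stripped-down sketch, not
asteval's actual code), an AST walker with a flat namespace for names can
be quite small:

import ast
import operator

_BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

def simple_eval(expr, names=None):
    # evaluate basic arithmetic, looking up Name nodes in a flat dict
    names = names or {}

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Num):           # numeric literal
            return node.n
        if isinstance(node, ast.Name):          # flat namespace lookup
            return names[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](_eval(node.left), _eval(node.right))
        raise ValueError("unsupported expression: %s" % ast.dump(node))

    return _eval(ast.parse(expr, mode='eval'))

print(simple_eval('2 + a*5', {'a': 4}))    # -> 22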

It makes no claim of actually being safe from malicious attack, but it should be 
safer than a straight eval(), and it prevents accidental problems when evaluating 
user input as code.  If anyone can find exploits within it, I'd be happy to try 
to fix them.

--Matt
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: netcdf4-python

2010-02-20 Thread Matt Newville
On Feb 20, 7:47 pm, deadpickle deadpic...@gmail.com wrote:
 I'm trying to use the python module netcdf4-python to read a netcdf
 file. So far I'm trying to do the basics and just open the script:

 from netCDF4 import Dataset
 rootgrp = Dataset('20060402-201025.netcdf', 'r',
 format='NETCDF3_CLASSIC')
 print rootgrp.file_format
 rootgrp.close()

 when I do this I get the exit code error (when I run in Scite):

 pythonw -u netcdf_test.py
 Exit code: -1073741819

  or Python stops responding (in windows cmd).

 I'm not sure what is wrong so if anyone as any ideas I would gladly
 send you the netcdf file to try. Thanks.

If the file is really of NetCDF3 format, scipy.io.netcdf should work.
Replace
  netCDF4.Dataset(filename,'r',format='NETCDF3_CLASSIC')
with
  scipy.io.netcdf.netcdf_open(filename,'r')
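
For what it's worth, a slightly fuller sketch using the netcdf_file reader
class (the name in the scipy versions I've used; the variable name below is
made up -- list f.variables to see what the file really holds):

from scipy.io import netcdf

f = netcdf.netcdf_file('20060402-201025.netcdf', 'r')
print(f.variables.keys())               # variable names in the file
data = f.variables['reflectivity'][:]   # 'reflectivity' is a hypothetical name
f.close()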

--Matt Newville
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Arrrrgh! Another module broken

2010-01-17 Thread Matt Newville
On Jan 17, 7:25 pm, Jive Dadson notonthe...@noisp.com wrote:
 I just found another module that broke when I went to 2.6.  Gnuplot.  
 Apparently one of its routines has a parameter
 named "with".  That used to be okay, and now it's not.

This was fixed in version 1.8 of Gnuplot.py

 Once I get everything to work under 2.6, I am using it
 forever or until new releases no longer break working
 code, whichever comes first.

Hey, good luck with that forever plan.

--Matt
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python and GUI

2007-05-24 Thread matt . newville

 Do others think like me here?

Yes!! I agree completely: Wax is not only a fantastic idea, but a very
good start at an implementation of that idea.

--Matt Newville

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Coding style and else statements

2006-08-31 Thread matt . newville
 To my eyes, that's less readable than, and has no benefit over, the
 following:

 def foo(thing):
     if thing:
         result = thing+1
     else:
         result = -1
     return result

I wouldn't discount:

def foo(thing):
    result = -1
    if thing:
        result = thing+1
    return result

especially if thing is the less expected case.  This
puts the more likely result first, followed by checks
that might modify that result.

But, I also try to avoid adding 1 to things that I test
for Truth:

>>> x = 8
>>> foo(x)
9
>>> foo(x > 1)
2
>>> foo(x and 1)
2
>>> foo(x or 1)
9

--Matt

-- 
http://mail.python.org/mailman/listinfo/python-list