Filtering XArray Datasets?

2022-06-06 Thread Israel Brewster
I have some large (>100GB) datasets loaded into memory in a two-dimensional (X 
and Y) NumPy-array-backed XArray dataset. At one point I want to filter the 
data using a boolean array created by performing a boolean operation on the 
dataset. That is, I want to filter the dataset for all points with a longitude 
value greater than, say, 50 and less than 60, just to give an example 
(hopefully that all makes sense?).

Currently I am doing this by creating a boolean array (data['latitude'] > 50, for 
example), and then applying that boolean array to the dataset using .where(), 
with drop=True. This appears to work, but has two issues:

1) It’s slow. On my large datasets, applying where can take several minutes 
(vs. just seconds to use a boolean array to index a similarly sized numpy array)
2) It uses large amounts of memory (which is REALLY a problem when the array is 
already using 100GB+)

What it looks like is that the values corresponding to True in the boolean array 
are copied to a new XArray object, potentially doubling memory usage until the 
operation completes, at which point the original object can be dropped and its 
memory freed.
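
For concreteness, here is a minimal sketch of the pattern described above (the 
variable names and the small random dataset are purely illustrative, not the 
actual data):

import numpy as np
import xarray as xr

# Illustrative stand-in for the real 100GB+ dataset
ds = xr.Dataset(
    {"value": (("x", "y"), np.random.rand(1000, 800))},
    coords={"latitude": (("x", "y"), np.random.uniform(40, 70, (1000, 800)))},
)

mask = ds["latitude"] > 50             # boolean DataArray
filtered = ds.where(mask, drop=True)   # slow and memory-hungry at full scale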

Is there any solution for these issues? Some way to do the filtering in place? 
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Shapely Polygon creating empty polygon

2022-01-05 Thread Israel Brewster
Found it! Apparently, it’s an import order issue. This works:

>>> from shapely.geometry import Polygon
>>> from osgeo import osr
>>> bounds = [-164.29635821669632, 54.64251856269729, -163.7631779798799, 54.845450778742546]
>>> print(Polygon.from_bounds(*bounds))
POLYGON ((-164.2963582166963 54.64251856269729, -164.2963582166963 
54.84545077874255, -163.7631779798799 54.84545077874255, -163.7631779798799 
54.64251856269729, -164.2963582166963 54.64251856269729))

But this doesn’t:

>>> from osgeo import osr
>>> from shapely.geometry import Polygon
>>> bounds = [-164.29635821669632, 54.64251856269729, -163.7631779798799, 54.845450778742546]
>>> print(Polygon.from_bounds(*bounds))
POLYGON EMPTY

…So apparently I have to make sure to import shapely *before* I import anything 
from osgeo. Why? I have no idea...
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

> On Jan 4, 2022, at 1:57 PM, Israel Brewster  wrote:
> 
> I’m running into an issue with shapely that is baffling me. Perhaps someone 
> here can help out?
> 
> When running shapely directly from a python 3.8 interpreter, it works as 
> expected:
> 
> >>> import shapely
> >>> shapely.__version__
> '1.8.0'
> >>> from shapely.geometry import Polygon
> >>> bounds = [-164.29635821669632, 54.64251856269729, -163.7631779798799, 54.845450778742546]
> >>> print(Polygon.from_bounds(*bounds))
> POLYGON ((-164.2963582166963 54.64251856269729, -164.2963582166963 
> 54.84545077874255, -163.7631779798799 54.84545077874255, -163.7631779798799 
> 54.64251856269729, -164.2963582166963 54.64251856269729))
> 
> However, if I put this exact same code into my Flask app (currently running 
> under the Flask development environment) as part of handling a request, I get 
> an empty polygon:
> 
> >>>import shapely
> >>>print(shapely.__version__)
> >>>from shapely.geometry import Polygon
> >>>print(Polygon.from_bounds(*bounds))
> 
> Output:
> 
> 1.8.0
> POLYGON EMPTY
> 
> In fact, *any* attempt to create a polygon gives the same result:
> >>> test = Polygon(((1, 1), (2, 1), (2, 2)))
> >>> print(test)
> POLYGON EMPTY
> 
> What am I missing here? Why doesn’t it work as part of a Flask request call?
> ---
> Israel Brewster
> Software Engineer
> Alaska Volcano Observatory 
> Geophysical Institute - UAF 
> 2156 Koyukuk Drive 
> Fairbanks AK 99775-7320
> Work: 907-474-5172
> cell:  907-328-9145
> 

-- 
https://mail.python.org/mailman/listinfo/python-list


Shapely Polygon creating empty polygon

2022-01-04 Thread Israel Brewster
I’m running into an issue with shapely that is baffling me. Perhaps someone 
here can help out?

When running shapely directly from a python 3.8 interpreter, it works as 
expected:

>>> import shapely
>>> shapely.__version__
'1.8.0'
>>> from shapely.geometry import Polygon
>>> bounds = [-164.29635821669632, 54.64251856269729, -163.7631779798799, 54.845450778742546]
>>> print(Polygon.from_bounds(*bounds))
POLYGON ((-164.2963582166963 54.64251856269729, -164.2963582166963 
54.84545077874255, -163.7631779798799 54.84545077874255, -163.7631779798799 
54.64251856269729, -164.2963582166963 54.64251856269729))

However, if I put this exact same code into my Flask app (currently running 
under the Flask development environment) as part of handling a request, I get 
an empty polygon:

>>>import shapely
>>>print(shapely.__version__)
>>>from shapely.geometry import Polygon
>>>print(Polygon.from_bounds(*bounds))

Output:

1.8.0
POLYGON EMPTY

In fact, *any* attempt to create a polygon gives the same result:
>>> test = Polygon(((1, 1), (2, 1), (2, 2)))
>>> print(test)
POLYGON EMPTY

What am I missing here? Why doesn’t it work as part of a Flask request call?
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python concurrent.futures.ProcessPoolExecutor

2020-12-16 Thread Israel Brewster
> On Dec 16, 2020, at 7:04 AM, Rob Rosengard  wrote:
> 
> Warning:  I am new to this group
> Warning:  I am not an expert at Python, I've written a few small programs, 
> and spend 20 hours of online classes, and maybe a book or two. 
> Warning:  I am new to trying to use concurrent.futures.ProcessPoolExecutor
> - Prior to writing this question I updated to Python 3.9 and PyCharm 2020.3.  
> And confirmed the problem still exists. 
> - Running on Windows 10 Professional
> - I've been trying to run a simple piece of code to exactly match what I have 
> seen done in various training videos.  By I am getting a different and 
> unexpected set of results.  I.e. the instructor got different results than I 
> did on my computer.  My code is very simple:
> 
> import concurrent.futures
> import time
> 
> 
> start = time.perf_counter()
> 
> 
> def task(myarg):
>     print(f'Sleeping one second...{myarg}')
>     time.sleep(1)
>     return 'Done sleeping...'
> 
> 
> if __name__ == '__main__':
>     with concurrent.futures.ProcessPoolExecutor() as executor:
>         future1 = executor.submit(task, 1)
>         future2 = executor.submit(task, 2)
> finish = time.perf_counter()
> print(f'Finished in {round(finish-start,2)} seconds')
> 
> And the output is: 
> Finished in 0.0 seconds
> Finished in 0.0 seconds
> Sleeping one second...1
> Sleeping one second...2
> Finished in 1.14 seconds
> 
> Process finished with exit code 0
> 
> --- 
> QUESTIONS and CONCERNS that I have...
> It seems that both calls to task not only runs that function, but then keeps 
> executing the rest of the main line code.  I only expected it to run the 
> function and then immediately quit/disappear.   That is, I expect the output 
> to look like (i.e. not having the three lines of "Finished in x.x seconds", 
> rather, just one line like that):
> Sleeping one second...1
> Sleeping one second...2
> Finished in 1.14 seconds
> 
> Goal:  I need the executor tasks to only run that one function, and then 
> completely go away and stop.  Not keep executing more code that doesn't 
> belong to the task function. 
> 
> I've tried many iterations of this issue, and placed PRINT statements all 
> over to try to track what is going on.  And I used if/else statements in the 
> main code, which caused even more problems.  I.e. both the IF and the ELSE 
> was executed each time through the code. Which completely blows my mind. 
> 
> Any thoughts on this?  Thanks for your time and help!  

Assuming the code above is indented exactly as you run it, you have an 
indentation error. That is, the finish and print() are not indented to be part 
of the if __name__… block. As such, they run on import. When you launch a new 
process, it imports the module, which then runs those lines, since they are not 
guarded by the if statement.

Indent those last two lines to be under the if (they don’t need to be indented 
to be under the with, just the if), and it should work as intended.
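
For reference, a sketch of the corrected layout (the start assignment is moved 
under the guard as well, which isn't strictly required but keeps the timing code 
together):

import concurrent.futures
import time


def task(myarg):
    print(f'Sleeping one second...{myarg}')
    time.sleep(1)
    return 'Done sleeping...'


if __name__ == '__main__':
    start = time.perf_counter()

    with concurrent.futures.ProcessPoolExecutor() as executor:
        future1 = executor.submit(task, 1)
        future2 = executor.submit(task, 2)

    finish = time.perf_counter()
    print(f'Finished in {round(finish-start,2)} seconds')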

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

> 
> R
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Multiprocessing queue sharing and python3.8

2020-04-06 Thread Israel Brewster
> On Apr 6, 2020, at 12:19 PM, David Raymond  wrote:
> 
> Attempting reply as much for my own understanding.
> 
> Are you on Mac? I think this is the pertinent bit for you:
> Changed in version 3.8: On macOS, the spawn start method is now the default. 
> The fork start method should be considered unsafe as it can lead to crashes 
> of the subprocess. See bpo-33725.

Ahhh, yep, that would do it! Using spawn rather than fork completely explains 
all the issues I was suddenly seeing. It didn’t even occur to me that the OS I 
was running on might make a difference. And yes, forcing it back to using fork does 
indeed “fix” the issue. Of course, as is noted there, the fork start method 
should be considered unsafe, so I guess I get to re-architect everything I do 
using multiprocessing that relies on data-sharing between processes. The Queue 
example was just a minimum working example that illustrated the behavioral 
differences I was seeing :-) Thanks for the pointer!
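
For anyone searching the archives later, the workaround I tested looks something 
like this (with the caveat above that fork is considered unsafe on macOS as of 
3.8):

import multiprocessing as mp

def main():
    ...  # pool/queue setup as in the example further down the thread

if __name__ == "__main__":
    # Restore the pre-3.8 default on macOS (see bpo-33725 for why it changed)
    mp.set_start_method("fork")
    main()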

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

> 
> When you start a new process (with the spawn method) it runs the module just 
> like it's being imported. So your global " mp_comm_queue2=mp.Queue()" creates 
> a new Queue in each process. Your initialization of mp_comm_queue is also 
> done inside the main() function, which doesn't get run in each process. So 
> each process in the Pool is going to have mp_comm_queue as None, and have its 
> own version of mp_comm_queue2. The ID being the same or different is the 
> result of one or more processes in the Pool being used repeatedly for the 
> multiple steps in imap, probably because the function that the Pool is 
> executing finishes so quickly.
> 
> Add a little extra info to the print calls (and/or set up logging to stdout 
> with the process name/id included) and you can see some of this. Here's the 
> hacked together changes I did for that.
> 
> import multiprocessing as mp
> import os
> 
> mp_comm_queue = None #Will be initalized in the main function
> mp_comm_queue2 = mp.Queue() #Test pre-initalized as well
> 
> def some_complex_function(x):
>     print("proc id", os.getpid())
>     print("mp_comm_queue", mp_comm_queue)
>     print("queue2 id", id(mp_comm_queue2))
>     mp_comm_queue2.put(x)
>     print("queue size", mp_comm_queue2.qsize())
>     print("x", x)
>     return x * 2
> 
> def main():
>     global mp_comm_queue
>     #initalize the Queue
>     mp_comm_queue = mp.Queue()
> 
>     #Set up a pool to process a bunch of stuff in parallel
>     pool = mp.Pool()
>     values = range(20)
>     data = pool.imap(some_complex_function, values)
> 
>     for val in data:
>         print(f"**{val}**")
>     print("final queue2 size", mp_comm_queue2.qsize())
> 
> if __name__ == "__main__":
>     main()
> 
> 
> 
> When making your own Process object and stating it then the Queue should be 
> passed into the function as an argument, yes. The error text seems to be part 
> of the Pool implementation, which I'm not as familiar with enough to know the 
> best way to handle it. (Probably something using the "initializer" and 
> "initargs" arguments for Pool)(maybe)
> 
> 
> 
> -----Original Message-----
> From: Python-list <python-list-bounces+david.raymond=tomtom@python.org> On Behalf Of Israel Brewster
> Sent: Monday, April 6, 2020 1:24 PM
> To: Python <python-list@python.org>
> Subject: Multiprocessing queue sharing and python3.8
> 
> Under python 3.7 (and all previous versions I have used), the following code 
> works properly, and produces the expected output:
> 
> import multiprocessing as mp
> 
> mp_comm_queue = None #Will be initalized in the main function
> mp_comm_queue2=mp.Queue() #Test pre-initalized as well
> 
> def some_complex_function(x):
>     print(id(mp_comm_queue2))
>     assert(mp_comm_queue is not None)
>     print(x)
>     return x*2
> 
> def main():
>     global mp_comm_queue
>     #initalize the Queue
>     mp_comm_queue=mp.Queue()
> 
>     #Set up a pool to process a bunch of stuff in parallel
>     pool=mp.Pool()
>     values=range(20)
>     data=pool.imap(some_complex_function,values)
> 
>     for val in data:
>         print(f"**{val}**")
> 
> if __name__=="__main__":
>     main()
> 
> - mp_comm_queue2 has the same ID for all iterations of some_complex_function, 
> and the assert passes (mp_comm_queue is not None). However, under python 3.8, 
> it fails - mp_comm_queue2 is a *different* object for each iteration, and the 
> assert fails.

Re: Multiprocessing queue sharing and python3.8

2020-04-06 Thread Israel Brewster
> On Apr 6, 2020, at 12:27 PM, David Raymond  wrote:
> 
> Looks like this will get what you need.
> 
> 
> def some_complex_function(x):
>     global q
>     #stuff using q
> 
> def pool_init(q2):
>     global q
>     q = q2
> 
> def main():
>     #initalize the Queue
>     mp_comm_queue = mp.Queue()
> 
>     #Set up a pool to process a bunch of stuff in parallel
>     pool = mp.Pool(initializer = pool_init, initargs = (mp_comm_queue,))
>     ...
> 
> 

Gotcha, thanks. I’ll look more into that initializer argument and see how I can 
leverage it to do multiprocessing using spawn rather than fork in the future. 
Looks straightforward enough. Thanks again!
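
For the archives, here is a fuller sketch of that pattern (my own reconstruction, 
not tested code from this thread). It works under both fork and spawn because the 
queue is handed to each worker once at pool start-up instead of being pickled 
with every task:

import multiprocessing as mp

q = None  # set in each worker process by pool_init()

def pool_init(q2):
    global q
    q = q2

def some_complex_function(x):
    q.put(x)      # report back through the queue inherited at start-up
    return x * 2

def main():
    mp_comm_queue = mp.Queue()
    pool = mp.Pool(initializer=pool_init, initargs=(mp_comm_queue,))
    for val in pool.imap(some_complex_function, range(20)):
        print(f"**{val}**")
    pool.close()
    pool.join()   # let the workers flush their queue feeder threads
    while not mp_comm_queue.empty():
        print("from queue:", mp_comm_queue.get())

if __name__ == "__main__":
    main()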

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

> 
> -Original Message-
> From: David Raymond 
> Sent: Monday, April 6, 2020 4:19 PM
> To: python-list@python.org
> Subject: RE: Multiprocessing queue sharing and python3.8
> 
> Attempting reply as much for my own understanding.
> 
> Are you on Mac? I think this is the pertinent bit for you:
> Changed in version 3.8: On macOS, the spawn start method is now the default. 
> The fork start method should be considered unsafe as it can lead to crashes 
> of the subprocess. See bpo-33725.
> 
> When you start a new process (with the spawn method) it runs the module just 
> like it's being imported. So your global " mp_comm_queue2=mp.Queue()" creates 
> a new Queue in each process. Your initialization of mp_comm_queue is also 
> done inside the main() function, which doesn't get run in each process. So 
> each process in the Pool is going to have mp_comm_queue as None, and have its 
> own version of mp_comm_queue2. The ID being the same or different is the 
> result of one or more processes in the Pool being used repeatedly for the 
> multiple steps in imap, probably because the function that the Pool is 
> executing finishes so quickly.
> 
> Add a little extra info to the print calls (and/or set up logging to stdout 
> with the process name/id included) and you can see some of this. Here's the 
> hacked together changes I did for that.
> 
> import multiprocessing as mp
> import os
> 
> mp_comm_queue = None #Will be initalized in the main function
> mp_comm_queue2 = mp.Queue() #Test pre-initalized as well
> 
> def some_complex_function(x):
>     print("proc id", os.getpid())
>     print("mp_comm_queue", mp_comm_queue)
>     print("queue2 id", id(mp_comm_queue2))
>     mp_comm_queue2.put(x)
>     print("queue size", mp_comm_queue2.qsize())
>     print("x", x)
>     return x * 2
> 
> def main():
>     global mp_comm_queue
>     #initalize the Queue
>     mp_comm_queue = mp.Queue()
> 
>     #Set up a pool to process a bunch of stuff in parallel
>     pool = mp.Pool()
>     values = range(20)
>     data = pool.imap(some_complex_function, values)
> 
>     for val in data:
>         print(f"**{val}**")
>     print("final queue2 size", mp_comm_queue2.qsize())
> 
> if __name__ == "__main__":
>     main()
> 
> 
> 
> When making your own Process object and stating it then the Queue should be 
> passed into the function as an argument, yes. The error text seems to be part 
> of the Pool implementation, which I'm not as familiar with enough to know the 
> best way to handle it. (Probably something using the "initializer" and 
> "initargs" arguments for Pool)(maybe)
> 
> 
> 
> -Original Message-
> From: Python-list  
> On Behalf Of Israel Brewster
> Sent: Monday, April 6, 2020 1:24 PM
> To: Python 
> Subject: Multiprocessing queue sharing and python3.8
> 
> Under python 3.7 (and all previous versions I have used), the following code 
> works properly, and produces the expected output:
> 
> import multiprocessing as mp
> 
> mp_comm_queue = None #Will be initalized in the main function
> mp_comm_queue2=mp.Queue() #Test pre-initalized as well
> 
> def some_complex_function(x):
>     print(id(mp_comm_queue2))
>     assert(mp_comm_queue is not None)
>     print(x)
>     return x*2
> 
> def main():
>     global mp_comm_queue
>     #initalize the Queue
>     mp_comm_queue=mp.Queue()
> 
>     #Set up a pool to process a bunch of stuff in parallel
>     pool=mp.Pool()
>     values=range(20)
>     data=pool.imap(some_complex_function,values)
> 
>     for val in data:
>         print(f"**{val}**")
> 
> if __name__=="__main__":
>     main()
> 
> - mp_comm_queue2 has the same ID for all iterations of some_complex_function, 
> and the assert passes (mp_comm_queue is not None). However, under python 3.8, 
> it fails - mp_comm_queue2 is a *different* object for each iteration, and the 
> assert fails.

Multiprocessing queue sharing and python3.8

2020-04-06 Thread Israel Brewster
Under python 3.7 (and all previous versions I have used), the following code 
works properly, and produces the expected output:

import multiprocessing as mp

mp_comm_queue = None #Will be initalized in the main function
mp_comm_queue2=mp.Queue() #Test pre-initalized as well

def some_complex_function(x):
    print(id(mp_comm_queue2))
    assert(mp_comm_queue is not None)
    print(x)
    return x*2

def main():
    global mp_comm_queue
    #initalize the Queue
    mp_comm_queue=mp.Queue()

    #Set up a pool to process a bunch of stuff in parallel
    pool=mp.Pool()
    values=range(20)
    data=pool.imap(some_complex_function,values)

    for val in data:
        print(f"**{val}**")

if __name__=="__main__":
    main()

- mp_comm_queue2 has the same ID for all iterations of some_complex_function, 
and the assert passes (mp_comm_queue is not None). However, under python 3.8, 
it fails - mp_comm_queue2 is a *different* object for each iteration, and the 
assert fails. 

So what am I doing wrong with the above example block? Assuming that it broke 
in 3.8 because I wasn’t sharing the Queue properly, what is the proper way to 
share a Queue object among multiple processes for the purposes of inter-process 
communication?

The documentation 
(https://docs.python.org/3.8/library/multiprocessing.html#exchanging-objects-between-processes) 
appears to indicate that I should pass the queue as an argument to the 
function to be executed in parallel; however, that fails as well (on ALL 
versions of python I have tried) with the error:

Traceback (most recent call last):
  File "test_multi.py", line 32, in <module>
    main()
  File "test_multi.py", line 28, in main
    for val in data:
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 431, in _handle_tasks
    put(task)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/queues.py", line 58, in __getstate__
    context.assert_spawning(self)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/context.py", line 356, in assert_spawning
    ' through inheritance' % type(obj).__name__
RuntimeError: Queue objects should only be shared between processes through 
inheritance

after I add the following to the code to try passing the queue rather than 
having it global:

#Try by passing queue
values=[(x,mp_comm_queue) for x in range(20)]
data=pool.imap(some_complex_function,values)
for val in data:
    print(f"**{val}**")

So if I can’t pass it as an argument, and having it global is incorrect (at 
least starting with 3.8), what is the proper method of getting multiprocessing 
queues to child processes?

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

-- 
https://mail.python.org/mailman/listinfo/python-list


Multiprocessing, join(), and crashed processes

2020-02-05 Thread Israel Brewster
In a number of places I have constructs where I launch several processes using 
the multiprocessing library, then loop through said processes calling join() on 
each one to wait until they are all complete. In general, this works well, with 
the *apparent* exception of when something causes one of the child processes to 
crash (not throw an exception, but actually crash). In that event, it appears that 
the call to join() hangs indefinitely. How can I best handle this? Should I put 
a timeout on the join, and put it in a loop, such that every 5 seconds or so it 
breaks, checks to see if the process is still actually running, and if so goes 
back and calls join again? Or is there a better option to say “wait until this 
process is done, however long that may be, unless it crashes”?
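To make the question concrete, this is roughly what I have in mind (an untested 
sketch):

import multiprocessing as mp

def wait_for_all(processes, poll_interval=5):
    """Join each process, but notice if one died abnormally rather than hanging."""
    for proc in processes:
        while True:
            proc.join(poll_interval)
            if not proc.is_alive():
                # exitcode is 0 on success, >0 for sys.exit(n), and negative
                # if the process was killed by a signal (i.e. it crashed)
                if proc.exitcode != 0:
                    print(f"{proc.name} ended abnormally (exitcode={proc.exitcode})")
                break
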
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

-- 
https://mail.python.org/mailman/listinfo/python-list


Make warning an exception?

2019-12-06 Thread Israel Brewster
I was running some code and I saw this pop up in the console:

2019-12-06 11:53:54.087 Python[85524:39651849] WARNING: nextEventMatchingMask 
should only be called from the Main Thread! This will throw an exception in the 
future.

The only problem is, I have no idea what is generating that warning - I never 
call nextEventMatchingMask directly, so it must be getting called from one of 
the libraries I’m calling. Is there some way I can force python to throw an 
exception now, so my debugger can catch it and let me know where in my code the 
originating call is? I’ve tried stepping through the obvious options, with no 
luck so far.
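
For completeness: if the message were coming through Python's own warnings 
machinery (it may well be a raw AppKit/Cocoa log line instead, in which case this 
won't help), it could be promoted to an exception with something like:

import warnings

warnings.simplefilter("error")  # turn every warning into an exception

# or, more narrowly, only the message in question:
warnings.filterwarnings("error", message=".*nextEventMatchingMask.*")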

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Proper way to pass Queue to process when using multiprocessing.imap()?

2019-09-04 Thread Israel Brewster
> 
> On Sep 3, 2019, at 11:09 AM, Israel Brewster  wrote:
> 
>> 
>> On Sep 3, 2019, at 10:49 AM, Peter Otten <__pete...@web.de> wrote:
>> 
>> Israel Brewster wrote:
>> 
>>> When using pool.imap to apply a function over a list of values, what is
>>> the proper way to pass additional arguments to the function, specifically
>>> in my case a Queue that the process can use to communicate back to the
>>> main thread (for the purpose of reporting progress)? I have seen
>>> suggestions of using starmap, but this doesn’t appear to have a “lazy”
>>> variant, which I have found to be very beneficial in my use case. The
>>> Queue is the same one for all processes, if that makes a difference.
>>> 
>>> I could just make the Queue global, but I have always been told not too.
>>> Perhaps this is an exception?
>> 
>> How about wrapping the function into another function that takes only one 
>> argument? A concise way is to do that with functools.partial():
>> 
>> def f(value, queue): ...
>> 
>> pool.imap(partial(f, queue=...), values)
> 
> That looks like exactly what I was looking for. I’ll give it a shot. Thanks!

So as it turns out, this doesn’t work after all. I get an error stating that 
“Queue objects should only be shared between processes through inheritance”. 
Still a good technique to know though!
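
One alternative I may try instead (an untested sketch, not something from this 
thread): a Manager-backed queue is a picklable proxy object, so unlike a plain 
multiprocessing.Queue it can be passed as an ordinary argument:

import multiprocessing as mp
from functools import partial

def f(value, queue):
    queue.put(value)          # report progress back to the parent
    return value * 2

def main():
    with mp.Manager() as manager:
        q = manager.Queue()   # proxy object, safe to pickle into the workers
        with mp.Pool() as pool:
            for result in pool.imap(partial(f, queue=q), range(20)):
                print(result)
        while not q.empty():
            print("progress:", q.get())

if __name__ == "__main__":
    main()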

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

> 
> ---
> Israel Brewster
> Software Engineer
> Alaska Volcano Observatory 
> Geophysical Institute - UAF 
> 2156 Koyukuk Drive 
> Fairbanks AK 99775-7320
> Work: 907-474-5172
> cell:  907-328-9145
> 
>> 
>> 
>> 
>>> 
>>> ---
>>> Israel Brewster
>>> Software Engineer
>>> Alaska Volcano Observatory
>>> Geophysical Institute - UAF
>>> 2156 Koyukuk Drive
>>> Fairbanks AK 99775-7320
>>> Work: 907-474-5172
>>> cell:  907-328-9145
>>> 
>> 
>> 
>> -- 
>> https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Proper way to pass Queue to process when using multiprocessing.imap()?

2019-09-03 Thread Israel Brewster
> 
> On Sep 3, 2019, at 9:27 AM, Rob Gaddi  
> wrote:
> 
> On 9/3/19 10:17 AM, Israel Brewster wrote:
>> When using pool.imap to apply a function over a list of values, what is the 
>> proper way to pass additional arguments to the function, specifically in my 
>> case a Queue that the process can use to communicate back to the main thread 
>> (for the purpose of reporting progress)? I have seen suggestions of using 
>> starmap, but this doesn’t appear to have a “lazy” variant, which I have 
>> found to be very beneficial in my use case. The Queue is the same one for 
>> all processes, if that makes a difference.
>> I could just make the Queue global, but I have always been told not too. 
>> Perhaps this is an exception?
>>  ---
>> Israel Brewster
>> Software Engineer
>> Alaska Volcano Observatory
>> Geophysical Institute - UAF
>> 2156 Koyukuk Drive
>> Fairbanks AK 99775-7320
>> Work: 907-474-5172
>> cell:  907-328-9145
> 
> The first rule is to never use global variables.  The second is to never put 
> too much stock in sweeping generalizations.  So long as you can keep that 
> Queue's usage pattern fairly well constrained, go ahead and make it global.
> 
> One thing to think about that might make this all easier though; have you 
> looked at the concurrent.futures module?  I find it does a fantastic job of 
> handling this sort of parallelization in a straightforward way.

I’ve only briefly looked at it in other situations. I’ll go ahead and take 
another look for this one. Thanks for the suggestion!
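
For anyone following the thread, a rough sketch of what I understand the 
suggestion to be, with progress reported from the parent as each task completes 
rather than via a shared queue (function names here are illustrative only):

from concurrent.futures import ProcessPoolExecutor, as_completed

def some_complex_function(x):
    return x * 2

def main():
    values = range(20)
    with ProcessPoolExecutor() as executor:
        futures = {executor.submit(some_complex_function, v): v for v in values}
        for done, future in enumerate(as_completed(futures), 1):
            print(f"{done}/{len(futures)} complete, result={future.result()}")

if __name__ == "__main__":
    main()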

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

> 
> -- 
> Rob Gaddi, Highland Technology -- www.highlandtechnology.com
> Email address domain is currently out of order.  See above to fix.
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Proper way to pass Queue to process when using multiprocessing.imap()?

2019-09-03 Thread Israel Brewster
> 
> On Sep 3, 2019, at 10:49 AM, Peter Otten <__pete...@web.de> wrote:
> 
> Israel Brewster wrote:
> 
>> When using pool.imap to apply a function over a list of values, what is
>> the proper way to pass additional arguments to the function, specifically
>> in my case a Queue that the process can use to communicate back to the
>> main thread (for the purpose of reporting progress)? I have seen
>> suggestions of using starmap, but this doesn’t appear to have a “lazy”
>> variant, which I have found to be very beneficial in my use case. The
>> Queue is the same one for all processes, if that makes a difference.
>> 
>> I could just make the Queue global, but I have always been told not too.
>> Perhaps this is an exception?
> 
> How about wrapping the function into another function that takes only one 
> argument? A concise way is to do that with functools.partial():
> 
> def f(value, queue): ...
> 
> pool.imap(partial(f, queue=...), values)

That looks like exactly what I was looking for. I’ll give it a shot. Thanks!

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

> 
> 
> 
>> 
>> ---
>> Israel Brewster
>> Software Engineer
>> Alaska Volcano Observatory
>> Geophysical Institute - UAF
>> 2156 Koyukuk Drive
>> Fairbanks AK 99775-7320
>> Work: 907-474-5172
>> cell:  907-328-9145
>> 
> 
> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Proper way to pass Queue to process when using multiprocessing.imap()?

2019-09-03 Thread Israel Brewster
When using pool.imap to apply a function over a list of values, what is the 
proper way to pass additional arguments to the function, specifically in my 
case a Queue that the process can use to communicate back to the main thread 
(for the purpose of reporting progress)? I have seen suggestions of using 
starmap, but this doesn’t appear to have a “lazy” variant, which I have found 
to be very beneficial in my use case. The Queue is the same one for all 
processes, if that makes a difference.

I could just make the Queue global, but I have always been told not too. 
Perhaps this is an exception?
 
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

-- 
https://mail.python.org/mailman/listinfo/python-list


Multiprocessing and memory management

2019-07-03 Thread Israel Brewster
I have a script that benefits greatly from multiprocessing (it’s generating a 
bunch of images from data). Of course, as expected each process uses a chunk of 
memory, and the more processes there are, the more memory used. The amount used 
per process can vary from around 3 GB (yes, gigabytes) to over 40 or 50 GB, 
depending on the amount of data being processed (usually closer to 10GB, the 
40/50 is fairly rare). This puts me in a position of needing to balance the 
number of processes with memory usage, such that I maximize resource 
utilization (running one process at a time would simply take WAY too long) while 
not overloading RAM (which at best would slow things down due to swap). 

Obviously this process will be run on a machine with lots of RAM, but as I 
don’t know how large the datasets that will be fed to it are, I wanted to see 
if I could build some intelligence into the program such that it doesn’t 
overload the memory. A couple of approaches I thought of:

1) Determine the total amount of RAM in the machine (how? one option is sketched 
below), assume an average of 10GB per process, and only launch as many processes 
as calculated to fit. Easy, but this would run the risk of under-utilizing the 
processing capabilities and taking longer to run if most of the processes were 
using significantly less than 10GB

2) Somehow monitor the memory usage of the various processes, and if one 
process needs a lot, pause the others until that one is complete. Of course, 
I’m not sure if this is even possible.

3) Other approaches?
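
For option 1, the "how?" part could presumably be answered with something like 
the following (a sketch only; psutil is a third-party package, and the os.sysconf 
fallback is POSIX/Linux-flavored):

import os

def total_ram_bytes():
    try:
        import psutil
        return psutil.virtual_memory().total
    except ImportError:
        # POSIX fallback; works on Linux, not guaranteed everywhere
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")

EST_BYTES_PER_PROC = 10 * 1024**3   # the ~10GB average mentioned above
max_procs = max(1, total_ram_bytes() // EST_BYTES_PER_PROC)
print(f"Launching at most {max_procs} processes")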


---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: uWISGI with Qt for Python

2019-03-13 Thread Israel Brewster
Never mind this request. I realized that for what I am doing, the web server 
was unnecessary. I could just load local HTML files directly into the 
QWebEngineView with no need of an intermediate server. Thanks anyway, and sorry 
for the noise!

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

> On Mar 13, 2019, at 1:42 PM, Israel Brewster  wrote:
> 
> I’m working on a Qt for python app that needs to run a local web server. For 
> the web server portion I’m using flask and uWISGI, and at the moment I have 
> my application launching uWISGI using subprocess before firing off the Qt 
> QApplication instance and entering the Qt event loop. Some sample code to 
> illustrate the process:
> 
> if __name__ == "__main__":
>     CUR_DIRECTORY = os.path.dirname(__file__)
> 
>     UWSGI_CONFIG = os.path.realpath(os.path.join(CUR_DIRECTORY, 'Other Files/TROPOMI.ini'))
>     UWSGI_EXE = os.path.realpath(os.path.join(CUR_DIRECTORY, 'bin/uwsgi'))
>     uwsgi_proc = subprocess.Popen([UWSGI_EXE, UWSGI_CONFIG])
> 
>     qt_app = QApplication(sys.argv)
>     ….
>     res = qt_app.exec_()
> 
> 
> Now this works, but it strikes me as kinda kludgy, as the uWISGI is 
> effectively a separate application needed. More to the point, however, it’s a 
> bit fragile, in that if the main application crashes (really, ANY sort of 
> unclean exit), you get stray uWISGI processes hanging around that prevent 
> proper functioning of the app the next time you try to launch it. 
> Unfortunately as the app is still in early days, this happens occasionally. 
> So I have two questions:
> 
> 1) Is there a “better way”? This GitHub repo: 
> https://github.com/unbit/uwsgi-qtloop 
> seems to indicate that it should be possible to run a Qt event loop from 
> within a uWSGI app, thus eliminating the extra “subprocess” spinoff, but it 
> hasn’t been updated in 5 years and I have been unable to get it to work with 
> my current Qt/Python/OS setup
> 
> 2) Barring any “better way”, is there a way to at least ensure that the 
> subprocess is killed in the event of parent death, or alternately to look for 
> and kill any such lingering processes on application startup?
> 
> P.S. The purpose of running the web server is to be able to load and use 
> Plotly charts in my app (via a QWebEngineView). So a “better way” may be 
> using a different plotting library that can essentially “cut out” the middle 
> man. I’ve tried Matplotlib, but I found its performance to be worse than 
> Plotly - given the size of my data sets, performance matters. Also I had some 
> glitches with it when using a lasso selector (plot going black). Still, with 
> some work, it may be an option.
> 
> ---
> Israel Brewster
> Software Engineer
> Alaska Volcano Observatory 
> Geophysical Institute - UAF 
> 2156 Koyukuk Drive 
> Fairbanks AK 99775-7320
> Work: 907-474-5172
> cell:  907-328-9145
> 

-- 
https://mail.python.org/mailman/listinfo/python-list


uWISGI with Qt for Python

2019-03-13 Thread Israel Brewster
I’m working on a Qt for python app that needs to run a local web server. For 
the web server portion I’m using flask and uWISGI, and at the moment I have my 
application launching uWISGI using subprocess before firing off the Qt 
QApplication instance and entering the Qt event loop. Some sample code to 
illustrate the process:

if __name__ == "__main__":
    CUR_DIRECTORY = os.path.dirname(__file__)

    UWSGI_CONFIG = os.path.realpath(os.path.join(CUR_DIRECTORY, 'Other Files/TROPOMI.ini'))
    UWSGI_EXE = os.path.realpath(os.path.join(CUR_DIRECTORY, 'bin/uwsgi'))
    uwsgi_proc = subprocess.Popen([UWSGI_EXE, UWSGI_CONFIG])

    qt_app = QApplication(sys.argv)
    ….
    res = qt_app.exec_()


Now this works, but it strikes me as kinda kludgy, as the uWISGI server is 
effectively a separate application that is needed. More to the point, however, it’s a bit fragile, 
in that if the main application crashes (really, ANY sort of unclean exit), you 
get stray uWISGI processes hanging around that prevent proper functioning of 
the app the next time you try to launch it. Unfortunately as the app is still 
in early days, this happens occasionally. So I have two questions:

1) Is there a “better way”? This GitHub repo: 
https://github.com/unbit/uwsgi-qtloop seems to indicate that it should be 
possible to run a Qt event loop from within a uWSGI app, thus eliminating the 
extra “subprocess” spinoff, but it hasn’t been updated in 5 years and I have 
been unable to get it to work with my current Qt/Python/OS setup

2) Barring any “better way”, is there a way to at least ensure that the 
subprocess is killed in the event of parent death, or alternately to look for 
and kill any such lingering processes on application startup?
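
For question 2, one partial approach (sketched below, untested) would be to 
register a cleanup handler with atexit; it covers normal interpreter shutdown and 
unhandled exceptions, but not a hard crash of the parent process itself:

import atexit
import subprocess

def launch_uwsgi(uwsgi_exe, uwsgi_config):
    proc = subprocess.Popen([uwsgi_exe, uwsgi_config])

    def _cleanup():
        if proc.poll() is None:           # child still running
            proc.terminate()
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()

    atexit.register(_cleanup)
    return proc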

P.S. The purpose of running the web server is to be able to load and use Plotly 
charts in my app (via a QWebEngineView). So a “better way” may be using a 
different plotting library that can essentially “cut out” the middle man. I’ve 
tried Matplotlib, but I found its performance to be worse than Plotly - given 
the size of my data sets, performance matters. Also I had some glitches with it 
when using a lasso selector (plot going black). Still, with some work, it may 
be an option.

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Multiprocessing performance question

2019-02-21 Thread Israel Brewster
Actually not a ’toy example’ at all. It is simply the first step in gridding 
some data I am working with - a problem that is solved by tools like SatPy, but 
unfortunately I can’t use SatPy because it doesn’t recognize my file format, 
and you can’t load data directly. Writing a custom file importer for SatPy is 
probably my next step.

That said, the entire process took around 60 seconds to run. As this step was 
taking 10, I figured it would be low-hanging fruit for speeding up the process. 
Obviously I was wrong. For what it’s worth, I did manage to re-factor the code, 
so instead of generating the entire grid up-front, I generate the boxes as 
needed to calculate the overlap with the data grid. This brought the processing 
time down to around 40 seconds, so a definite improvement there.
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

> On Feb 20, 2019, at 4:30 PM, DL Neil  wrote:
> 
> George
> 
> On 21/02/19 1:15 PM, george trojan wrote:
>> def create_box(x_y):
>>     return geometry.box(x_y[0] - 1, x_y[1],  x_y[0], x_y[1] - 1)
>> x_range = range(1, 1001)
>> y_range = range(1, 801)
>> x_y_range = list(itertools.product(x_range, y_range))
>> grid = list(map(create_box, x_y_range))
>> Which creates and populates an 800x1000 “grid” (represented as a flat list
>> at this point) of “boxes”, where a box is a shapely.geometry.box(). This
>> takes about 10 seconds to run.
>> Looking at this, I am thinking it would lend itself well to
>> parallelization. Since the box at each “coordinate" is independent of all
>> others, it seems I should be able to simply split the list up into chunks
>> and process each chunk in parallel on a separate core. To that end, I
>> created a multiprocessing pool:
> 
> 
> I recall a similar discussion when folk were being encouraged to move away 
> from monolithic and straight-line processing to modular functions - it is 
> more (CPU-time) efficient to run in a straight line; than it is to repeatedly 
> call, set-up, execute, and return-from a function or sub-routine! ie there is 
> an over-head to many/all constructs!
> 
> Isn't the 'problem' that it is a 'toy example'? That the amount of computing 
> within each parallel process is small in relation to the inherent 'overhead'.
> 
> Thus, if the code performed a reasonable analytical task within each box 
> after it had been defined (increased CPU load), would you then notice the 
> expected difference between the single- and multi-process implementations?
> 
> 
> 
> From AKL to AK
> -- 
> Regards =dn
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Multiprocessing performance question

2019-02-18 Thread Israel Brewster


> On Feb 18, 2019, at 6:37 PM, Ben Finney  wrote:
> 
> I don't have anything to add regarding your experiments with
> multiprocessing, but:
> 
> Israel Brewster  writes:
> 
>> Which creates and populates an 800x1000 “grid” (represented as a flat
>> list at this point) of “boxes”, where a box is a
>> shapely.geometry.box(). This takes about 10 seconds to run.
> 
> This seems like the kind of task NumPy <http://www.numpy.org/> is
> designed to address: Generating and manipulating large-to-huge arrays of
> numbers, especially numbers that are representable directly in the
> machine's basic number types (such as moderate-scale integers).
> 
> Have you tried using that library and timing the result?

Sort of. I am using that library, and in fact once I get the result I am 
converting it to a NumPy array for further use/processing, however I am still a 
NumPy newbie and have not been able to find a function that generates a numpy 
array from a function. There is the numpy.fromfunction() command, of course, 
but “…the function is called with … each parameter representing the coordinates 
of the array varying along a specific axis…”, which basically means (if my 
understanding/inital testing is correct) that my function would need to work 
with *arrays* of x,y coordinates. But the geometry.box() function needs 
individual x,y coordinates, not arrays, so I’d have to loop through the arrays 
and append to a new one or something to produce the output that numpy needs, 
which puts me back pretty much to the same code I already have.

There may be a way to make it work, but so far I haven’t been able to figure it 
out any better than the code I’ve got followed by converting to a numpy array. 
You do bring up a good point though: there is quite possibly a better way to do 
this, and knowing that would be just as good as knowing why multiprocessing 
doesn’t improve performance. Thanks!
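
One candidate that might be worth a try (an untested sketch): numpy.frompyfunc 
broadcasts an arbitrary Python function over coordinate arrays and returns an 
object array, which would avoid the explicit loop-and-append step:

import numpy as np
from shapely import geometry

# Wrap the box constructor so it can be applied elementwise to coordinate arrays
make_box = np.frompyfunc(lambda x, y: geometry.box(x - 1, y, x, y - 1), 2, 1)

xs, ys = np.meshgrid(np.arange(1, 1001), np.arange(1, 801), indexing="ij")
grid = make_box(xs, ys)   # 1000x800 object array of shapely boxes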
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

> 
> -- 
> \ “You don't need a book of any description to help you have some |
>  `\kind of moral awareness.” —Dr. Francesca Stavrakoloulou, bible |
> _o__)  scholar, 2011-05-08 |
> Ben Finney
> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Multiprocessing performance question

2019-02-18 Thread Israel Brewster
I have the following code running in python 3.7:

def create_box(x_y):
    return geometry.box(x_y[0] - 1, x_y[1],  x_y[0], x_y[1] - 1)

x_range = range(1, 1001)
y_range = range(1, 801)
x_y_range = list(itertools.product(x_range, y_range))

grid = list(map(create_box, x_y_range))

Which creates and populates an 800x1000 “grid” (represented as a flat list at 
this point) of “boxes”, where a box is a shapely.geometry.box(). This takes 
about 10 seconds to run.

Looking at this, I am thinking it would lend itself well to parallelization. 
Since the box at each “coordinate" is independent of all others, it seems I 
should be able to simply split the list up into chunks and process each chunk 
in parallel on a separate core. To that end, I created a multiprocessing pool:

pool = multiprocessing.Pool()

And then called pool.map() rather than just “map”. Somewhat to my surprise, the 
execution time was virtually identical. Given the simplicity of my code, and 
the presumable ease with which it should be able to be parallelized, what could 
explain why the performance did not improve at all when moving from the 
single-process map() to the multiprocess map()?

I am aware that in python3, the map function doesn’t actually produce a result 
until needed, but that’s why I wrapped everything in calls to list(), at least 
for testing.

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: AssertionError without traceback?

2019-01-17 Thread Israel Brewster
> On Jan 14, 2019, at 10:40 PM, dieter  wrote:
> 
> Israel Brewster  writes:
>> I have a flask application deployed on CentOS 7 using Python 3.6.7 and uwsgi 
>> 2.0.17.1, proxied behind nginx. uwsgi is configured to listed on a socket in 
>> /tmp. The app uses gevent and the flask_uwsgi_websockets plugin as well as 
>> various other third-party modules, all installed via pip in a virtualenv. 
>> The environment was set up using pip just a couple of days ago, so 
>> everything should be fully up-to-date. The application *appears* to be 
>> running properly (it is in moderate use and there have been no reports of 
>> issues, nor has my testing turned up any problems), however I keep getting 
>> entries like the following in the error log:
>> 
>> AssertionError
>> 2019-01-14T19:16:32Z  failed with 
>> AssertionError
> 
> I would try to find out where the log message above has been generated
> and ensure it does not only log the information above but also the
> associated traceback.
> 
> I assume that the log comes from some framework -- maybe "uwsgi"
> or "gevent". It is a weakness to log exceptions without the
> associated traceback.

After extensive debugging, it would appear the issue arrises due to a 
combination of my use of gevent.spawn to run a certain function, and the 
portion of that function that sends web socket messages. If I remove either the 
gevent.spawn and just call the function directly, or keep the gevent.spawn but 
don't try to send any messages via web sockets, the error goes away. With the 
combination, I *occasionally* get the message - most of the time it works. So I 
guess I'll just run everything synchronously for now, and as long as the 
performance isn't hurt noticeably, call it good.

I still find it strange that this never happened on CentOS 6, but whatever. The 
gevent.spawn calls were probably pre-mature optimization anyway.

> 
>> There is no additional information provided, just that. I was running the 
>> same app (checked out from a GIT repository, so exact same code) on CentOS 6 
>> for years without issue, it was only since I moved to CentOS 7 that I've 
>> seen the errors. I have not so far been able to correlate this error with 
>> any specific request. Has anyone seen anything like this before such that 
>> you can give me some pointers to fixing this? As the application *appears* 
>> to be functioning normally, it may not be a big issue, but it has locked up 
>> once since the move (no errors in the log, just not responding on the 
>> socket), so I am a bit concerned.
>> ---
>> Israel Brewster
>> Systems Analyst II
>> 5245 Airport Industrial Rd
>> Fairbanks, AK 99709
>> (907) 450-7293
>> ---
>> 
> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: AssertionError without traceback?

2019-01-17 Thread Israel Brewster


---
Israel Brewster
Systems Analyst II
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---

On Jan 14, 2019, at 10:40 PM, dieter <die...@handshake.de> wrote:

Israel Brewster <ibrews...@flyravn.com> writes:
I have a flask application deployed on CentOS 7 using Python 3.6.7 and uwsgi 
2.0.17.1, proxied behind nginx. uwsgi is configured to listed on a socket in 
/tmp. The app uses gevent and the flask_uwsgi_websockets plugin as well as 
various other third-party modules, all installed via pip in a virtualenv. The 
environment was set up using pip just a couple of days ago, so everything 
should be fully up-to-date. The application *appears* to be running properly 
(it is in moderate use and there have been no reports of issues, nor has my 
testing turned up any problems), however I keep getting entries like the 
following in the error log:

AssertionError
2019-01-14T19:16:32Z  failed with 
AssertionError

I would try to find out where the log message above has been generated
and ensure it does not only log the information above but also the
associated traceback.

Any tips as to how? I tried putting in additional logging at a couple places 
where I called gevent.spawn() to see if that additional logging lined up with 
the assertions, but no luck. I guess I could just start peppering my code with 
logging commands, and hope something pops, but this seems quite...inelegant. I 
have not been able to reproduce the error, unfortunately.


I assume that the log comes from some framework -- maybe "uwsgi"
or "gevent". It is a weakness to log exceptions without the
associated traceback.


Fully agreed on both points. The reference to the callback for some reason puts 
me in mind of C code, but of course AssertionError is python, so maybe not.

For what it's worth, the issue only seems to happen when the server is under 
relatively heavy load. During the night, when it is mostly idle, I don't get 
many (if any) errors. And this has only been happening since I upgraded to 
CentOS7 and the latest versions of all the frameworks. Hopefully it isn't a 
version incompatibility...

There is no additional information provided, just that. I was running the same 
app (checked out from a GIT repository, so exact same code) on CentOS 6 for 
years without issue, it was only since I moved to CentOS 7 that I've seen the 
errors. I have not so far been able to correlate this error with any specific 
request. Has anyone seen anything like this before such that you can give me 
some pointers to fixing this? As the application *appears* to be functioning 
normally, it may not be a big issue, but it has locked up once since the move 
(no errors in the log, just not responding on the socket), so I am a bit 
concerned.
---
Israel Brewster
Systems Analyst II
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---


--
https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


AssertionError without traceback?

2019-01-14 Thread Israel Brewster
I have a flask application deployed on CentOS 7 using Python 3.6.7 and uwsgi 
2.0.17.1, proxied behind nginx. uwsgi is configured to listed on a socket in 
/tmp. The app uses gevent and the flask_uwsgi_websockets plugin as well as 
various other third-party modules, all installed via pip in a virtualenv. The 
environment was set up using pip just a couple of days ago, so everything 
should be fully up-to-date. The application *appears* to be running properly 
(it is in moderate use and there have been no reports of issues, nor has my 
testing turned up any problems), however I keep getting entries like the 
following in the error log:

AssertionError
2019-01-14T19:16:32Z  failed with 
AssertionError

There is no additional information provided, just that. I was running the same 
app (checked out from a GIT repository, so exact same code) on CentOS 6 for 
years without issue, it was only since I moved to CentOS 7 that I've seen the 
errors. I have not so far been able to correlate this error with any specific 
request. Has anyone seen anything like this before such that you can give me 
some pointers to fixing this? As the application *appears* to be functioning 
normally, it may not be a big issue, but it has locked up once since the move 
(no errors in the log, just not responding on the socket), so I am a bit 
concerned.
---
Israel Brewster
Systems Analyst II
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Packaging uwsgi flask app for non-programmers?

2018-02-13 Thread Israel Brewster
> On Feb 13, 2018, at 10:02 AM, Dan Stromberg <drsali...@gmail.com> wrote:
> 
> On Tue, Feb 13, 2018 at 9:28 AM, Israel Brewster <isr...@ravnalaska.net> wrote:
>> As such, I'm considering three possible solutions:
>> 
>> 1) Make some sort of installer package that includes the python3 installer
>> 2) Somehow automate the download and install of Python3, or
>> 3) re-write my code to be python 2 compatible (since python 2 is included 
>> with the OS)
>> 
>> If anyone has any suggestions on how I could accomplish 1 or 2, I'd 
>> appreciate it. Thanks!
> 
> Would using homebrew help?
> 
> http://docs.python-guide.org/en/latest/starting/install3/osx/

That's a thought. I could offer the user the option of either a) automatically 
installing homebrew and then installing python3 via homebrew, or b) manually 
downloading and running the python3 installer.

> 
> BTW, you might use curl  | bash to get the ball rolling.

On that note, is there a fixed url that will always get the latest python3 
installer? Of course, I might not want to do that, for example after 3.7 or 4.0 
(I know, not for a while) is released, just in case something breaks with a 
newer release.

> 
> I wouldn't recommend moving from 3.x to 2.x, unless perhaps you use a
> common subset.

Yeah, that idea kinda left a sour taste in my mouth, but I figured I'd throw it 
out there as it would solve the python install issue.

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Packaging uwsgi flask app for non-programmers?

2018-02-13 Thread Israel Brewster
> 
> On Feb 6, 2018, at 12:12 PM, Israel Brewster <isr...@ravnalaska.net> wrote:
> 
> I have been working on writing an Alexa skill which, as part of it, requires 
> a local web server on the end users machine - the Alexa skill sends commands 
> to this server, which runs them on the local machine. I wrote this local 
> server in Flask, and run it using uwsgi, using a command like: "uwsgi 
> serverconfig.ini".
> 
> The problem is that in order for this to work, the end user must:
> 
> 1) Install python 3.6 (or thereabouts)
> 2) Install a number of python modules, and
> 3) run a command line (from the appropriate directory)
> 
> Not terribly difficult, but when I think of my target audience (Alexa users), 
> I could easily see even these steps being "too complicated". I was looking at 
> pyinstaller to create a simple double-click application, but it appears that 
> pyinstaller needs a python script as the "base" for the application, whereas 
> my "base" is uwsgi. Also, I do need to leave a config file accessible for the 
> end user to be able to edit. Is there a way to use pyinstaller in this 
> scenario, or perhaps some other option that might work better to package 
> things up?

So at the moment, since there have been no suggestions for packaging, I'm 
getting by with a bash script that:

a) Makes sure python 3 is installed, prompting the user to install it if not
b) Makes sure pip and virtualenv are installed, and installs them if needed
c) Sets up a virtualenv in the distribution directory
d) Installs all needed modules in the virtualenv - this step requires that dev 
tools are installed, a separate install.
e) modifies the configuration files to match the user and directory, and 
f) Installs a launchd script to run the uwsgi application

This actually seems to work fairly well, and by giving the script a .command 
extension, which automatically gets associated with terminal under OS X, the 
end user can simply double-click setup.command without having to go into 
terminal themselves. The main stumbling block then is the install of python3 - 
the user still has to manually download and install it in addition to my code, 
which I'd prefer to avoid - having to install my code separate from the Alexa 
skill is already an annoyance. As such, I'm considering three possible 
solutions:

1) Make some sort of installer package that includes the python3 installer
2) Somehow automate the download and install of Python3, or
3) re-write my code to be python 2 compatible (since python 2 is included with 
the OS)

If anyone has any suggestions on how I could accomplish 1 or 2, I'd appreciate 
it. Thanks!
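
For what it's worth, the closest I've come to sketching out option 2 is something 
like this, run under the stock system python (2.7) before python3 exists, with the 
installer URL pinned to a specific version purely as an example:

# Rough sketch only - assumes a pinned installer URL and admin rights via sudo
import subprocess
import urllib  # system python 2.7, hence urllib rather than urllib.request

PKG_URL = ('https://www.python.org/ftp/python/3.6.4/'
           'python-3.6.4-macosx10.6.pkg')  # example version, not "latest"
PKG_PATH = '/tmp/python3.pkg'

urllib.urlretrieve(PKG_URL, PKG_PATH)
# macOS command line installer; prompts for an admin password via sudo
subprocess.check_call(['sudo', 'installer', '-pkg', PKG_PATH, '-target', '/'])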

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---

> 
> ---
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> ---
> 
> 
> 
> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Packaging uwsgi flask app for non-programmers?

2018-02-07 Thread Israel Brewster



> On Feb 6, 2018, at 8:24 PM, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:
> 
> On Tue, 6 Feb 2018 12:12:26 -0900, Israel Brewster <isr...@ravnalaska.net>
> declaimed the following:
> 
>> I have been working on writing an Alexa skill which, as part of it, requires 
>> a local web server on the end users machine - the Alexa skill sends commands 
>> to this server, which runs them on the local machine. I wrote this local 
>> server in Flask, and run it using uwsgi, using a command like: "uwsgi 
>> serverconfig.ini".
>> 
> 
>   
> 
>> Not terribly difficult, but when I think of my target audience (Alexa 
>> users), I could easily see even these steps being "too complicated". I was 
>> looking at pyinstaller to create a simple double-click application, but it 
>> appears that pyinstaller needs a python script as the "base" for the 
>> application, whereas my "base" is uwsgi. Also, I do need to leave a config 
>> file accessible for the end user to be able to edit. Is there a way to use 
>> pyinstaller in this scenario, or perhaps some other option that might work 
>> better to package things up?
> 
>   Not mentioned is getting your end-user to possibly have to open up
> fire-wall rules to allow INBOUND connections (even if, somehow, limited to
> LAN -- don't want to leave a WAN port open).

Not mentioned because it's not needed - I establish a ngrok tunnel to provide 
external https access to the local server. I just include the ngrok binary with 
my package, and run it using subprocess.Popen. Since it doesn't even require 
you to have an account to use it, that bypasses the need to set up 
port-forwards and firewall rules quite nicely. Also solves the problem of 
dynamic IP's without having to burden the end user with dyndns or the like - I 
just "register" the URL you get when connecting. Admittedly though, that was a 
large concern of mine until I was pointed to ngrok as a solution.
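
For reference, launching it amounts to roughly this (binary path, port, and the 
two-second wait are just illustrative assumptions):

import json
import subprocess
import time
import urllib.request

# Launch the bundled ngrok binary alongside the app (path and port assumed)
ngrok = subprocess.Popen(['./ngrok', 'http', '8000'],
                         stdout=subprocess.DEVNULL,
                         stderr=subprocess.DEVNULL)
time.sleep(2)  # give the tunnel a moment to come up

# ngrok's local inspection API (default port 4040) reports the public URL,
# which is what I "register"
with urllib.request.urlopen('http://127.0.0.1:4040/api/tunnels') as resp:
    tunnels = json.loads(resp.read().decode())
public_url = tunnels['tunnels'][0]['public_url']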

Ideally, I'd just access across the local network, Alexa device to local 
machine, but that's not an option - at least, not yet.

> -- 
>   Wulfraed Dennis Lee Bieber AF6VN
>wlfr...@ix.netcom.comHTTP://wlfraed.home.netcom.com/
> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Packaging uwsgi flask app for non-programmers?

2018-02-07 Thread Israel Brewster
On Feb 6, 2018, at 12:12 PM, Israel Brewster <isr...@ravnalaska.net> wrote:
> 
> I have been working on writing an Alexa skill which, as part of it, requires 
> a local web server on the end users machine - the Alexa skill sends commands 
> to this server, which runs them on the local machine. I wrote this local 
> server in Flask, and run it using uwsgi, using a command like: "uwsgi 
> serverconfig.ini".
> 
> The problem is that in order for this to work, the end user must:
> 
> 1) Install python 3.6 (or thereabouts)
> 2) Install a number of python modules, and
> 3) run a command line (from the appropriate directory)
> 
> Not terribly difficult, but when I think of my target audience (Alexa users), 
> I could easily see even these steps being "too complicated". I was looking at 
> pyinstaller to create a simple double-click application, but it appears that 
> pyinstaller needs a python script as the "base" for the application, whereas 
> my "base" is uwsgi. Also, I do need to leave a config file accessible for the 
> end user to be able to edit. Is there a way to use pyinstaller in this 
> scenario, or perhaps some other option that might work better to package 
> things up?

A related question, should a way to create a full package not be available, 
would be: is there a way to do a "local" (as in, in the same directory) install 
of Python 3.6, and to do it in such a way that I could script it from the shell 
(or python, whatever)? The idea would then be to basically set up a fully 
self-contained virtualenv on the user's machine, such that they just have to run 
a "setup.sh" script or the like.

BTW, this would be on a Mac - my local skill server works using AppleScript, so 
it's not actually portable to other OS's :-P

> 
> ---
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> ---
> 
> 
> 
> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Packaging uwsgi flask app for non-programmers?

2018-02-06 Thread Israel Brewster
I have been working on writing an Alexa skill which, as part of it, requires a 
local web server on the end users machine - the Alexa skill sends commands to 
this server, which runs them on the local machine. I wrote this local server in 
Flask, and run it using uwsgi, using a command like: "uwsgi serverconfig.ini".

The problem is that in order for this to work, the end user must:

1) Install python 3.6 (or thereabouts)
2) Install a number of python modules, and
3) run a command line (from the appropriate directory)

Not terribly difficult, but when I think of my target audience (Alexa users), I 
could easily see even these steps being "too complicated". I was looking at 
pyinstaller to create a simple double-click application, but it appears that 
pyinstaller needs a python script as the "base" for the application, whereas my 
"base" is uwsgi. Also, I do need to leave a config file accessible for the end 
user to be able to edit. Is there a way to use pyinstaller in this scenario, or 
perhaps some other option that might work better to package things up?
 
---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---




-- 
https://mail.python.org/mailman/listinfo/python-list


String matching based on sound?

2018-01-29 Thread Israel Brewster
I am working on a python program that, at one step, takes an input (string), 
and matches it to songs/artists in a users library. I'm having some difficulty, 
however, figuring out how to match when the input/library contains 
numbers/special characters. For example, take the group "All-4-One". In my 
library it might be listed exactly like that. I need to match this to ANY of 
the following inputs:

• all-4-one (of course)
• all 4 one (no dashes)
• all 4 1 (all numbers)
• all four one (all spelled out)
• all for one

Or, really, any other combination that sounds the same. The reasoning for this 
is that the input comes from a speech recognition system, so the user speaking, 
for example, "4", could be recognized as "for", "four" or "4". I'd imagine that 
Alexa/Siri/Google all do things like this (since you can ask them to play 
songs/artists), but I want to implement this in Python.

In initial searching, I did find the "fuzzy" library, which at first glance 
appeared to be what I was looking for, but it, apparently, ignores numbers, 
with the result that "all 4 one" gave the same output as "all in", but NOT the 
same output as "all 4 1" - even though "all 4 1" sounds EXACTLY the same, while 
"all in" is only similar if you ignore the 4.

So is there something similar that works with strings containing numbers? And 
that would only give me a match if the two strings sound identical? That is, 
even ignoring the numbers, I should NOT get a match between "all one" and "all 
in" - they are similar, but not identical, while "all one" and "all 1" would be 
identical.
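
The rough shape of what I'm picturing is: spell the digits out, then compare 
phonetic codes word by word - something like this sketch, using the third-party 
jellyfish library purely as an example encoder:

import re
import jellyfish  # third-party, used here only as an example phonetic encoder

DIGIT_WORDS = {'0': 'zero', '1': 'one', '2': 'two', '3': 'three', '4': 'four',
               '5': 'five', '6': 'six', '7': 'seven', '8': 'eight', '9': 'nine'}

def phonetic_key(name):
    # split on anything that isn't a letter or digit, spell out single digits,
    # then phonetically encode each remaining word
    words = [w for w in re.split(r'[^a-z0-9]+', name.lower()) if w]
    words = [DIGIT_WORDS.get(w, w) for w in words]
    return [jellyfish.metaphone(w) for w in words]

# the first two should produce matching keys; "all in" should not
print(phonetic_key('All-4-One'), phonetic_key('all for one'), phonetic_key('all in'))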





-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread safety issue (I think) with defaultdict

2017-11-03 Thread Israel Brewster

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---




> On Nov 3, 2017, at 7:11 AM, Rhodri James <rho...@kynesim.co.uk> wrote:
> 
> On 03/11/17 14:50, Chris Angelico wrote:
>> On Fri, Nov 3, 2017 at 10:26 PM, Rhodri James <rho...@kynesim.co.uk> wrote:
>>> On 02/11/17 20:24, Chris Angelico wrote:
>>>> 
>>>> Thank you. I've had this argument with many people, smart people (like
>>>> Steven), people who haven't grokked that all concurrency has costs -
>>>> that threads aren't magically more dangerous than other options.
>>> 
>>> 
>>> I'm with Steven.  To be fair, the danger with threads is that most people
>>> don't understand thread-safety, and in particular don't understand either
>>> that they have a responsibility to ensure that shared data access is done
>>> properly or what the cost of that is.  I've seen far too much thread-based
>>> code over the years that would have been markedly less buggy and not much
>>> slower if it had been written sequentially.
>> Yes, but what you're seeing is that *concurrent* code is more
>> complicated than *sequential* code. Would the code in question have
>> been less buggy if it had used multiprocessing instead of
>> multithreading? What if it used explicit yield points?
> 
> My experience with situations where I can do a reasonable comparison is 
> limited, but the answer appears to be "Yes".
> Multiprocessing
>> brings with it a whole lot of extra complications around moving data
>> around.
> 
> People generally understand how to move data around, and the mistakes are 
> usually pretty obvious when they happen.  

I think the existence of this thread indicates otherwise :-) This mistake was 
far from obvious, and clearly I didn't understand properly how to move data 
around *between processes*. Unless you are just saying I am ignorant or 
something? :-)

> People may not understand how to move data around efficiently, but that's a 
> separate argument.
> 
> Multithreading brings with it a whole lot of extra
>> complications around NOT moving data around.
> 
> I think this involves more subtle bugs that are harder to spot.  

Again, the existence of this thread indicates otherwise. This bug was quite 
subtle and hard to spot. It was only when I started looking at how many times 
a given piece of code was called (specifically, the part that handled data 
coming in for which there wasn't an entry in the dictionary) that I spotted the 
problem. If I hadn't had logging in place in that code block, I would have 
never realized it wasn't working as intended. You don't get much more subtle 
than that. And, furthermore, it only existed because I *wasn't* using threads. 
This bug simply doesn't exist in a threaded model, only in a multiprocessing 
model. Yes, the *explanation* of the bug is simple enough - each process "sees" 
a different value, since memory isn't shared - but the bug in my code was 
neither obvious nor easy to spot, at least until you knew what was happening.

> People seem to find it harder to reason about atomicity and realising that 
> widely separated pieces of code may interact unexpectedly.
> 
> Yield points bring with
>> them the risk of locking another thread out unexpectedly (particularly
>> since certain system calls aren't async-friendly on certain OSes).
> 
> I've got to admit I find coroutines straightforward, but I did cut my teeth 
> on a cooperative OS.  It certainly makes the atomicity issues easier to deal 
> with.

I still can't claim to understand them. Threads? No problem. Obviously I'm 
still lacking some understanding of how data works in the multiprocessing 
model, however.

> 
> All
>> three models have their pitfalls.
> 
> Assuredly.  I just think threads are soggier and hard to light^W^W^W^W^W 
> prone to subtler and more mysterious-looking bugs.

And yet, this thread exists because of a subtle and mysterious-looking bug with 
multiple *processes* that doesn't exist with multiple *threads*. Thus the point 
- threads are no *worse* - just different - than any other concurrency model.

> 
> -- 
> Rhodri James *-* Kynesim Ltd
> -- 
> https://mail.python.org/mailman/listinfo/python-list 
> <https://mail.python.org/mailman/listinfo/python-list>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Share unpickleable object across processes

2017-11-02 Thread Israel Brewster
On Nov 2, 2017, at 12:36 PM, Chris Angelico <ros...@gmail.com> wrote:
> 
> On Fri, Nov 3, 2017 at 7:35 AM, Israel Brewster <isr...@ravnalaska.net 
> <mailto:isr...@ravnalaska.net>> wrote:
>> On Nov 2, 2017, at 12:30 PM, Chris Angelico <ros...@gmail.com> wrote:
>>> 
>>> On Fri, Nov 3, 2017 at 5:54 AM, Israel Brewster <isr...@ravnalaska.net> 
>>> wrote:
>>>> I have a Flask/UWSGI web app that serves up web socket connections. When a 
>>>> web socket connection is created, I want to store a reference to said web 
>>>> socket so I can do things like write messages to every connected 
>>>> socket/disconnect various sockets/etc. UWSGI, however, launches multiple 
>>>> child processes which handle incoming connections, so the data structure 
>>>> that stores the socket connections needs to be shared across all said 
>>>> processes. How can I do this?
>>>> 
>>> 
>>> You're basically going to need to have a single process that manages
>>> all the socket connections. Do you actually NEED multiple processes to
>>> do your work? If you can do it with multiple threads in a single
>>> process, you'll be able to share your socket info easily. Otherwise,
>>> you could have one process dedicated to managing the websockets, and
>>> all the others message that process saying "please send this to all
>>> processes".
>> 
>> Ok, that makes sense, but again: it's UWSGI that creates the processes, not 
>> me. I'm not creating *any* processes or threads. Aside from telling UWSGI to 
>> only use a single worker, I have no control over what happens where. But 
>> maybe that's what I need to do?
>> 
> 
> That's exactly what I mean, yeah. UWSGI should be able to be told to
> use threads instead of processes. I don't know it in detail, but a
> cursory look at the docos suggests that it's happy to use either (or
> even both).

Gotcha, thanks. The hesitation I have there is that the UWSGI config is a user 
setting. Sure, I can set up my install to only run one process, but what if 
someone else tries to use my code, and they set up UWSGI to run multiple? I 
hate the idea of my code being so fragile that a simple user setting change 
which I have no control over can break it. But it is what it is, and if that's 
the only option, I'll just put a note in the readme to NEVER, under any 
circumstances, set UWSGI to use multiple processes when running this app and 
call it good :-)
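
Something like the following serverconfig.ini sketch (module name made up, values 
untested) is probably what that note would boil down to:

[uwsgi]
# hypothetical module/app name
module = myapp:app
http-socket = :8000
master = true
# one worker process so every websocket lives in the same process;
# concurrency comes from threads instead
processes = 1
threads = 10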

> 
> ChrisA
> -- 
> https://mail.python.org/mailman/listinfo/python-list 
> <https://mail.python.org/mailman/listinfo/python-list>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Share unpickleable object across processes

2017-11-02 Thread Israel Brewster
On Nov 2, 2017, at 11:15 AM, Stefan Ram <r...@zedat.fu-berlin.de> wrote:
> 
> Israel Brewster <isr...@ravnalaska.net> writes:
>> the data structure that stores the socket connections needs
>> to be shared across all said processes.
> 
>  IIRC that's the difference between threads and
>  processes: threads share a common memory.
> 
>  You can use the standard module mmap to share
>  data between processes.
> 
>  If it's not pickleable, but if you can write code
>  to serialize it to a text format yourself, you
>  can share that text representation via, er, sockets.

If I could serialize it to a text format, then I could pickle said text format 
and store it in redis/some other third party store. :-)

> 
>> In C I might do something like store a void pointer to the
>> object, then cast it to the correct object type
> 
>  Restrictions of the OS or MMU even apply to
>  C code.

Sure, I was just talking in general "ideas". I'm not saying I tried it or it 
would work.

> 
>> , but that's not an option in python. So how can I get around
>> this issue?
> 
>  You can always write parts of a CPython program
>  in C, for example, using Cython.

True, but I just need to be able to share this piece of data - I don't want to 
reinvent the wheel just to write an app that uses web sockets!

I *must* be thinking about this wrong. Take even a basic chat app that uses 
websockets. Client a, which connected to process 1, sends a message to the 
server. There are three other clients connected, each of which needs to receive 
said message. Given that the way UWSGI works said clients could have connected 
to any one of the worker processes, how can the server push the message out to 
*all* clients? What am I missing here?
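
The only general pattern I can think of is to have every worker subscribe to a 
shared channel (redis pub/sub here, purely as an example) and relay messages to 
whatever sockets it happens to own - a rough, untested sketch:

import json
import redis  # third-party client, assumed available

r = redis.Redis()
local_sockets = set()  # the websocket objects owned by *this* worker process

def broadcast(message):
    # any worker can publish; every worker's listener sees it
    r.publish('chat', json.dumps(message))

def relay_loop():
    # run in a background thread/greenlet in each worker
    pubsub = r.pubsub()
    pubsub.subscribe('chat')
    for item in pubsub.listen():
        if item['type'] != 'message':
            continue
        data = item['data'].decode()
        for ws in list(local_sockets):
            ws.send(data)  # send() assumed from the websocket plugin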

> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Share unpickleable object across processes

2017-11-02 Thread Israel Brewster
On Nov 2, 2017, at 12:30 PM, Chris Angelico <ros...@gmail.com> wrote:
> 
> On Fri, Nov 3, 2017 at 5:54 AM, Israel Brewster <isr...@ravnalaska.net> wrote:
>> I have a Flask/UWSGI web app that serves up web socket connections. When a 
>> web socket connection is created, I want to store a reference to said web 
>> socket so I can do things like write messages to every connected 
>> socket/disconnect various sockets/etc. UWSGI, however, launches multiple 
>> child processes which handle incoming connections, so the data structure 
>> that stores the socket connections needs to be shared across all said 
>> processes. How can I do this?
>> 
> 
> You're basically going to need to have a single process that manages
> all the socket connections. Do you actually NEED multiple processes to
> do your work? If you can do it with multiple threads in a single
> process, you'll be able to share your socket info easily. Otherwise,
> you could have one process dedicated to managing the websockets, and
> all the others message that process saying "please send this to all
> processes".

Ok, that makes sense, but again: it's UWSGI that creates the processes, not me. 
I'm not creating *any* processes or threads. Aside from telling UWSGI to only 
use a single worker, I have no control over what happens where. But maybe 
that's what I need to do?

> 
> ChrisA
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Share unpickleable object across processes

2017-11-02 Thread Israel Brewster
I have a Flask/UWSGI web app that serves up web socket connections. When a web 
socket connection is created, I want to store a reference to said web socket so 
I can do things like write messages to every connected socket/disconnect 
various sockets/etc. UWSGI, however, launches multiple child processes which 
handle incoming connections, so the data structure that stores the socket 
connections needs to be shared across all said processes. How can I do this?

Tried so far:

1) using a multiprocessing Manager(), from which I have gotten a dict(). This 
just gives me "BlockingIOError: [Errno 35] Resource temporarily unavailable" 
errors whenever I try to access the dict object.
2) Using redis/some other third-party store. This fails because it requires you 
to be able to pickle the object, and the web socket objects I'm getting are not 
pickleable.

In C I might do something like store a void pointer to the object, then cast it 
to the correct object type, but that's not an option in python. So how can I 
get around this issue?



-------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---




-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread safety issue (I think) with defaultdict

2017-11-02 Thread Israel Brewster

> On Nov 1, 2017, at 4:53 PM, Steve D'Aprano <steve+pyt...@pearwood.info> wrote:
> 
> On Thu, 2 Nov 2017 05:53 am, Israel Brewster wrote:
> 
> [...]
>> So the end result is that the thread that "updates" the dictionary, and the
>> thread that initially *populates* the dictionary are actually running in
>> different processes.
> 
> If they are in different processes, that would explain why the second
> (non)thread sees an empty dict even after the first thread has populated it:
> 
> 
> # from your previous post
>> Length at get AC:  54 ID: 4524152200  Time: 2017-11-01 09:41:24.474788
>> Length At update:  1 ID: 4524152200  Time: 2017-11-01 09:41:24.784399
>> Length At update:  2 ID: 4524152200  Time: 2017-11-01 09:41:25.228853
> 
> 
> You cannot rely on IDs being unique across different processes. Its an
> unfortunately coincidence(!) that they end up with the same ID.

I think it's more than a coincidence, given that it is 100% reproducible. Plus, 
in an earlier debug test I was calling print() on the defaultdict object, which 
gives output like "defaultdict(<function <lambda> at 0x1066467f0>, {...})", where 
presumably the 0x1066467f0 is a memory address (correct me if I am wrong in 
that). In every 
case, that address was the same. So still a bit puzzling.

> 
> Or possibly there's some sort of weird side-effect or bug in Flask that, when
> it shares the dict between two processes (how?) it clears the dict.

Well, it's UWSGI that is creating the processes, not Flask, but that's 
semantics :-) The real question though is "how does python handle such 
situations?" because, really, there would be no difference (I wouldn't think) 
between what is happening here and what is happening if you were to create a 
new process using the multiprocessing library and reference a variable created 
outside that process.

In fact, I may have to try exactly that, just to see what happens.

> 
> Or... have you considered the simplest option, that your update thread clears
> the dict when it is first called? Since you haven't shared your code with us,
> I cannot rule out a simple logic error like this:
> 
> def launch_update_thread(dict):
>dict.clear()
># code to start update thread

Actually, I did share my code. It's towards the end of my original message. I 
cut stuff out for readability/length, but nothing having to do with the 
dictionary in question. So no, clear is never called, nor any other operation 
that could clear the dict.

> 
> 
>> In fact, any given request could be in yet another 
>> process, which would seem to indicate that all bets are off as to what data
>> is seen.
>> 
>> Now that I've thought through what is really happening, I think I need to
>> re-architect things a bit here. 
> 
> Indeed. I've been wondering why you are using threads at all, since there
> doesn't seem to be any benefit to initialising the dict and updating it in
> different thread. Now I learn that your architecture is even more complex. I
> guess some of that is unavailable, due to it being a web app, but still.

What it boils down to is this: I need to update this dictionary in real time as 
data flows in. Having that update take place in a separate thread enables this 
update to happen without interfering with the operation of the web app, and 
offloads the responsibility for deciding when to switch to the OS. There *are* 
other ways to do this, such as using gevent greenlets or asyncio, but simply 
spinning off a separate thread is the easiest/simplest option, and since it is 
a long-running thread the overhead of spinning off the thread (as opposed to a 
gevent style interlacing) is of no consequence.
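
Concretely, the "spinning off" amounts to nothing more than this (with a stub 
standing in for the real update loop):

import threading
import time

def watch():
    # stand-in for the real long-running update loop
    while True:
        time.sleep(1)

updater = threading.Thread(target=watch, daemon=True)
updater.start()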

As far as the initialization, that happens in response to a user request, at 
which point I am querying the data anyway (since the user asked for it). The 
idea is I already have the data, since the user asked for it, why not save it 
in this dict rather than waiting to update the dict until new data comes in? I 
could, of course, do a separate request for the data in the same thread that 
updates the dict, but there doesn't seem to be any purpose in that, since until 
someone requests the data, I don't need it for anything.

> 
> 
>> For one thing, the update thread should be 
>> launched from the main process, not an arbitrary UWSGI worker. I had
>> launched it from the client connection because there is no point in having
>> it running if there is no one connected, but I may need to launch it from
>> the __init__.py file instead. For another thing, since this dictionary will
>> need to be accessed from arbitrary worker processes, I'm thinking I may need 
>> to move it to some sort of external storage, such as a redis database
> 
> That sounds awful. What if the arbitrary worker decides to remove a bunch of
>

Re: Thread safety issue (I think) with defaultdict

2017-11-01 Thread Israel Brewster
On Nov 1, 2017, at 9:58 AM, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> 
> On Tue, Oct 31, 2017 at 11:38 AM, Israel Brewster <isr...@ravnalaska.net> 
> wrote:
>> A question that has arisen before (for example, here: 
>> https://mail.python.org/pipermail/python-list/2010-January/565497.html 
>> <https://mail.python.org/pipermail/python-list/2010-January/565497.html>) is 
>> the question of "is defaultdict thread safe", with the answer generally 
>> being a conditional "yes", with the condition being what is used as the 
>> default value: apparently default values of python types, such as list, are 
>> thread safe,
> 
> I would not rely on this. It might be true for current versions of
> CPython, but I don't think there's any general guarantee and you could
> run into trouble with other implementations.

Right, completely agreed. Kinda feels "dirty" to rely on things like this to me.

> 
>> [...]
> 
> [...] You could use a regular dict and just check if
> the key is present, perhaps with the additional argument to .get() to
> return a default value.

True. Using defaultdict simply saves having to stick the same default in 
every call to get(). DRY principle and all. That said, see below - I don't 
think the defaultdict is the issue.

> 
> Individual lookups and updates of ordinary dicts are atomic (at least
> in CPython). A lookup followed by an update is not, and this would be
> true for defaultdict as well.
> 
>> [...]
>> 1) Is this what it means to NOT be thread safe? I was thinking of race 
>> conditions where individual values may get updated wrong, but this 
>> apparently is overwriting the entire dictionary.
> 
> No, a thread-safety issue would be something like this:
> 
>account[user] = account[user] + 1
> 
> where the value of account[user] could potentially change between the
> time it is looked up and the time it is set again.

That's what I thought - changing values/different values from expected, not 
missing values.
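
(i.e. the classic case where you would wrap the read-modify-write in a lock - a 
minimal illustration, not my actual code:)

import threading

account = {'user': 0}
lock = threading.Lock()

def increment(user):
    # without the lock, two threads can both read the same old value
    with lock:
        account[user] = account[user] + 1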

All that said, I just had a bit of an epiphany: the main thread is actually a 
Flask app, running through UWSGI with multiple *processes*, and using the 
flask-uwsgi-websocket plugin, which further uses greenlets. So what I was 
thinking of as simply a separate thread was, in reality, a completely separate 
*process*. I'm sure that makes a difference. So what's actually happening here 
is the following:

1) the main python process starts, which initializes the dictionary (since it 
is at a global level)
2) uwsgi launches off a bunch of child worker processes (10 to be exact, each 
of which is set up with 10 gevent threads)
3a) a client connects (web socket connection to be exact). This connection is 
handled by an arbitrary worker, and an arbitrary green thread within that 
worker, based on UWSGI algorithms.
3b) This connection triggers launching of a *true* thread (using the python 
threading library) which, presumably, is now a child thread of that arbitrary 
uwsgi worker. <== BAD THING, I would think
4) The client makes a request for the list, which is handled by a DIFFERENT 
(presumably) arbitrary worker process and green thread.

So the end result is that the thread that "updates" the dictionary, and the 
thread that initially *populates* the dictionary are actually running in 
different processes. In fact, any given request could be in yet another 
process, which would seem to indicate that all bets are off as to what data is 
seen.

Now that I've thought through what is really happening, I think I need to 
re-architect things a bit here. For one thing, the update thread should be 
launched from the main process, not an arbitrary UWSGI worker. I had launched 
it from the client connection because there is no point in having it running if 
there is no one connected, but I may need to launch it from the __init__.py 
file instead. For another thing, since this dictionary will need to be accessed 
from arbitrary worker processes, I'm thinking I may need to move it to some 
sort of external storage, such as a redis database. Oy, I made my life 
complicated :-)
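
If I do go the redis route, the per-aircraft timestamps map fairly naturally onto 
a hash - a rough sketch, assuming the redis-py client and a fixed timestamp format:

from datetime import datetime
import redis  # third-party redis-py client, assumed available

r = redis.Redis()
FMT = '%Y-%m-%dT%H:%M:%S.%f'

def update_last_seen(aircraft_id, point_time):
    # visible to every worker process; the check-then-set race doesn't
    # matter here, for the reasons discussed in the original post
    stored = r.hget('last_points', aircraft_id)
    current = datetime.strptime(stored.decode(), FMT) if stored else datetime.min
    if point_time > current:
        r.hset('last_points', aircraft_id, point_time.strftime(FMT))

def last_seen(aircraft_id):
    stored = r.hget('last_points', aircraft_id)
    return datetime.strptime(stored.decode(), FMT) if stored else datetime.min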

> That said it's not
> obvious to me what your problem actually is.
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread safety issue (I think) with defaultdict

2017-11-01 Thread Israel Brewster
On Nov 1, 2017, at 9:04 AM, Israel Brewster <isr...@ravnalaska.net> wrote:
> 
> Let me rephrase the question, see if I can simplify it. I need to be able to 
> access a defaultdict from two different threads - one thread that responds to 
> user requests which will populate the dictionary in response to a user 
> request, and a second thread that will keep the dictionary updated as new 
> data comes in. The value of the dictionary will be a timestamp, with the 
> default value being datetime.min, provided by a lambda:
> 
> lambda: datetime.min
> 
> At the moment my code is behaving as though each thread has a *separate* 
> defaultdict, even though debugging shows the same addresses - the background 
> update thread never sees the data populated into the defaultdict by the main 
> thread. I was thinking race conditions or the like might make it so one 
> particular loop of the background thread occurs before the main thread, but 
> even so subsequent loops should pick up on the changes made by the main 
> thread.
> 
> How can I *properly* share a dictionary like object between two threads, with 
> both threads seeing the updates made by the other?

For what it's worth, if I insert a print statement in both threads (which I am 
calling "Get AC", since that is the function being called in the first thread, 
and "update", since that is the purpose of the second thread), I get the 
following output:

Length at get AC:  54 ID: 4524152200  Time: 2017-11-01 09:41:24.474788
Length At update:  1 ID: 4524152200  Time: 2017-11-01 09:41:24.784399
Length At update:  2 ID: 4524152200  Time: 2017-11-01 09:41:25.228853
Length At update:  3 ID: 4524152200  Time: 2017-11-01 09:41:25.530434
Length At update:  4 ID: 4524152200  Time: 2017-11-01 09:41:25.532073
Length At update:  5 ID: 4524152200  Time: 2017-11-01 09:41:25.682161
Length At update:  6 ID: 4524152200  Time: 2017-11-01 09:41:26.807127
...

So the object ID hasn't changed as I would expect it to if, in fact, we have 
created a separate object for the thread. And the first call that populates it 
with 54 items happens "well" before the first update call - a full .3 seconds, 
which I would think would be an eternity in code terms. So it doesn't even look 
like it's a race condition causing the issue.

It seems to me this *has* to be something to do with the use of threads, but 
I'm baffled as to what.

> ---
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> ---
> 
> 
> 
> 
>> On Oct 31, 2017, at 9:38 AM, Israel Brewster <isr...@ravnalaska.net> wrote:
>> 
>> A question that has arisen before (for example, here: 
>> https://mail.python.org/pipermail/python-list/2010-January/565497.html 
>> <https://mail.python.org/pipermail/python-list/2010-January/565497.html>) is 
>> the question of "is defaultdict thread safe", with the answer generally 
>> being a conditional "yes", with the condition being what is used as the 
>> default value: apparently default values of python types, such as list, are 
>> thread safe, whereas more complicated constructs, such as lambdas, make it 
>> not thread safe. In my situation, I'm using a lambda, specifically:
>> 
>> lambda: datetime.min
>> 
>> So presumably *not* thread safe.
>> 
>> My goal is to have a dictionary of aircraft and when they were last "seen", 
>> with datetime.min being effectively "never". When a data point comes in for 
>> a given aircraft, the data point will be compared with the value in the 
>> defaultdict for that aircraft, and if the timestamp on that data point is 
>> newer than what is in the defaultdict, the defaultdict will get updated with 
>> the value from the datapoint (not necessarily current timestamp, but rather 
>> the value from the datapoint). Note that data points do not necessarily 
>> arrive in chronological order (for various reasons not applicable here, it's 
>> just the way it is), thus the need for the comparison.
>> 
>> When the program first starts up, two things happen:
>> 
>> 1) a thread is started that watches for incoming data points and updates the 
>> dictionary as per above, and
>> 2) the dictionary should get an initial population (in the main thread) from 
>> hard storage.
>> 
>> The behavior I'm seeing, however, is that when step 2 happens (which 
>> generally happens before the thread gets any updates), the dictionary gets 
>> populated with 56 entries, as expected. However, none of those entries are 
>> visible

Re: Thread safety issue (I think) with defaultdict

2017-11-01 Thread Israel Brewster
Let me rephrase the question, see if I can simplify it. I need to be able to 
access a defaultdict from two different threads - one thread that responds to 
user requests which will populate the dictionary in response to a user request, 
and a second thread that will keep the dictionary updated as new data comes in. 
The value of the dictionary will be a timestamp, with the default value being 
datetime.min, provided by a lambda:

lambda: datetime.min

At the moment my code is behaving as though each thread has a *separate* 
defaultdict, even though debugging shows the same addresses - the background 
update thread never sees the data populated into the defaultdict by the main 
thread. I was thinking race conditions or the like might make it so one 
particular loop of the background thread occurs before the main thread, but 
even so subsequent loops should pick up on the changes made by the main thread.

How can I *properly* share a dictionary like object between two threads, with 
both threads seeing the updates made by the other?
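
For reference, the stripped-down version of what I *expected* to be sufficient 
between two genuine threads - with a lock added for good measure - is just this:

import threading
from collections import defaultdict
from datetime import datetime

last_points = defaultdict(lambda: datetime.min)
lock = threading.Lock()

def note_point(key, when):
    # called from the background update thread
    with lock:
        if last_points[key] < when:
            last_points[key] = when

def populate(records):
    # called from the request-handling thread; records is (key, timestamp) pairs
    with lock:
        for key, when in records:
            last_points[key] = when
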
---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---




> On Oct 31, 2017, at 9:38 AM, Israel Brewster <isr...@ravnalaska.net> wrote:
> 
> A question that has arisen before (for example, here: 
> https://mail.python.org/pipermail/python-list/2010-January/565497.html 
> <https://mail.python.org/pipermail/python-list/2010-January/565497.html>) is 
> the question of "is defaultdict thread safe", with the answer generally being 
> a conditional "yes", with the condition being what is used as the default 
> value: apparently default values of python types, such as list, are thread 
> safe, whereas more complicated constructs, such as lambdas, make it not 
> thread safe. In my situation, I'm using a lambda, specifically:
> 
> lambda: datetime.min
> 
> So presumably *not* thread safe.
> 
> My goal is to have a dictionary of aircraft and when they were last "seen", 
> with datetime.min being effectively "never". When a data point comes in for a 
> given aircraft, the data point will be compared with the value in the 
> defaultdict for that aircraft, and if the timestamp on that data point is 
> newer than what is in the defaultdict, the defaultdict will get updated with 
> the value from the datapoint (not necessarily current timestamp, but rather 
> the value from the datapoint). Note that data points do not necessarily 
> arrive in chronological order (for various reasons not applicable here, it's 
> just the way it is), thus the need for the comparison.
> 
> When the program first starts up, two things happen:
> 
> 1) a thread is started that watches for incoming data points and updates the 
> dictionary as per above, and
> 2) the dictionary should get an initial population (in the main thread) from 
> hard storage.
> 
> The behavior I'm seeing, however, is that when step 2 happens (which 
> generally happens before the thread gets any updates), the dictionary gets 
> populated with 56 entries, as expected. However, none of those entries are 
> visible when the thread runs. It's as though the thread is getting a separate 
> copy of the dictionary, although debugging says that is not the case - 
> printing the variable from each location shows the same address for the 
> object.
> 
> So my questions are:
> 
> 1) Is this what it means to NOT be thread safe? I was thinking of race 
> conditions where individual values may get updated wrong, but this apparently 
> is overwriting the entire dictionary.
> 2) How can I fix this?
> 
> Note: I really don't care if the "initial" update happens after the thread 
> receives a data point or two, and therefore overwrites one or two values. I 
> just need the dictionary to be fully populated at some point early in 
> execution. In usage, the dictionary is used to see if an aircraft has been 
> seen "recently", so if the most recent datapoint gets overwritten with a 
> slightly older one from disk storage, that's fine - it's just if it's still 
> showing datetime.min because we haven't gotten in any datapoint since we 
> launched the program, even though we have "recent" data in disk storage, that's 
> a problem. So I don't care about the obvious race condition between the two 
> operations, just that the end result is a populated dictionary. Note also 
> that as datapoints come in, they are being written to disk, so the disk 
> storage doesn't lag significantly anyway.
> 
> The framework of my code is below:
> 
> File: watcher.py
> 
> last_points = defaultdict(lambda:datetime.min)
> 
> # This function 

Thread safety issue (I think) with defaultdict

2017-10-31 Thread Israel Brewster
A question that has arisen before (for example, here: 
https://mail.python.org/pipermail/python-list/2010-January/565497.html 
<https://mail.python.org/pipermail/python-list/2010-January/565497.html>) is 
the question of "is defaultdict thread safe", with the answer generally being a 
conditional "yes", with the condition being what is used as the default value: 
apparently default values of python types, such as list, are thread safe, 
whereas more complicated constructs, such as lambdas, make it not thread safe. 
In my situation, I'm using a lambda, specifically:

lambda: datetime.min

So presumably *not* thread safe.

My goal is to have a dictionary of aircraft and when they were last "seen", 
with datetime.min being effectively "never". When a data point comes in for a 
given aircraft, the data point will be compared with the value in the 
defaultdict for that aircraft, and if the timestamp on that data point is newer 
than what is in the defaultdict, the defaultdict will get updated with the 
value from the datapoint (not necessarily current timestamp, but rather the 
value from the datapoint). Note that data points do not necessarily arrive in 
chronological order (for various reasons not applicable here, it's just the way 
it is), thus the need for the comparison.

When the program first starts up, two things happen:

1) a thread is started that watches for incoming data points and updates the 
dictionary as per above, and
2) the dictionary should get an initial population (in the main thread) from 
hard storage.

The behavior I'm seeing, however, is that when step 2 happens (which generally 
happens before the thread gets any updates), the dictionary gets populated with 
56 entries, as expected. However, none of those entries are visible when the 
thread runs. It's as though the thread is getting a separate copy of the 
dictionary, although debugging says that is not the case - printing the 
variable from each location shows the same address for the object.

So my questions are:

1) Is this what it means to NOT be thread safe? I was thinking of race 
conditions where individual values may get updated wrong, but this apparently 
is overwriting the entire dictionary.
2) How can I fix this?

Note: I really don't care if the "initial" update happens after the thread 
receives a data point or two, and therefore overwrites one or two values. I 
just need the dictionary to be fully populated at some point early in 
execution. In usage, the dictionary is used to see if an aircraft has been seen 
"recently", so if the most recent datapoint gets overwritten with a slightly 
older one from disk storage, that's fine - it's just if it's still showing 
datetime.min because we haven't gotten in any datapoint since we launched the 
program, even though we have "recent" data in disk storage, that's a problem. So 
I don't care about the obvious race condition between the two operations, just 
that the end result is a populated dictionary. Note also that as datapoints come 
in, they are being written to disk, so the disk storage doesn't lag 
significantly anyway.

The framework of my code is below:

File: watcher.py

from collections import defaultdict
from datetime import datetime

last_points = defaultdict(lambda: datetime.min)

# This function is launched as a thread using the threading module when the
# first client connects
def watch():
    while True:
        ...  # wait for an incoming data point
        pointtime = ...  # timestamp from the data point
        if last_points[aircraft] < pointtime:   # "aircraft" = the key for this data point
            last_points[aircraft] = pointtime
        # DEBUGGING
        print("At update:", len(last_points))


File: main.py:

from .watcher import last_points

# This function will be triggered by a web call from a client, so could happen
# at any time. Client will call this function immediately after connecting, as
# well as in response to various user actions.
def getac():
    ...  # query the data the client asked for
    for record in aclist:
        last_points[aircraft] = record_timestamp  # key/timestamp pulled from the record
        # DEBUGGING
        print("At get AC:", len(last_points))


---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---




-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Save non-pickleable variable?

2017-10-20 Thread Israel Brewster
On Oct 20, 2017, at 11:09 AM, Stefan Ram <r...@zedat.fu-berlin.de> wrote:
> 
> Israel Brewster <isr...@ravnalaska.net> writes:
>> Given that, is there any way I can write out the "raw" binary
>> data to a file
> 
>  If you can call into the Java SE library, you can try
> 
> docs.oracle.com/javase/9/docs/api/java/io/ObjectOutputStream.html#writeObject-java.lang.Object-
> 
>  , e.g.:
> 
> public static void save
> ( final java.lang.String path, final java.lang.Object object )
> { try
>  { final java.io.FileOutputStream fileOutputStream
>= new java.io.FileOutputStream( path );
> 
>final java.io.ObjectOutputStream objectOutputStream
>= new java.io.ObjectOutputStream( fileOutputStream );
> 
>objectOutputStream.writeObject( object );
> 
>objectOutputStream.close(); }
> 
>  catch( final java.io.IOException iOException )
>  { /* application-specific code */ }}
> 
> 
>> , and read it back in later?
> 
>  There's a corresponding »readObject« method in
>  »java.io.ObjectInputStream«. E.g.,
> 
> public static java.lang.Object load( final java.lang.String path )
> {
>  java.io.FileInputStream fileInputStream = null;
> 
>  java.io.ObjectInputStream objectInputStream = null;
> 
>  java.lang.Object object = null;
> 
>  try
>  { fileInputStream = new java.io.FileInputStream( path );
> 
>objectInputStream = new java.io.ObjectInputStream
>( fileInputStream );
> 
>object = objectInputStream.readObject();
> 
>objectInputStream.close(); }
> 
>  catch( final java.io.IOException iOException )
>  { java.lang.System.out.println( iOException ); }
> 
>  catch
>  ( final java.lang.ClassNotFoundException classNotFoundException )
>  { java.lang.System.out.println( classNotFoundException ); }
> 
>  return object; }
> 
>  However, it is possible that not all objects can be
>  meaningfully saved and restored in that way.

Thanks for the information. In addition to what you suggested, it may be 
possible that the Java library itself has methods for saving this object - I 
seem to recall the methods for displaying the data having options to read from 
files (rather than from the Java object directly like I'm doing), and it 
wouldn't make sense to load from a file unless you could first create said file 
by some method. I'll investigate solutions java-side.
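
In the meantime, since py4j exposes the JVM directly, I could probably drive 
Stefan's ObjectOutputStream approach from the Python side - an untested sketch, 
and it assumes the report object is actually java.io.Serializable, which I 
haven't verified:

from py4j.java_gateway import JavaGateway

gateway = JavaGateway()  # assumes the Java side is already running a GatewayServer

def save_java_object(obj, path):
    # serialize an arbitrary (Serializable) Java object to disk
    fos = gateway.jvm.java.io.FileOutputStream(path)
    oos = gateway.jvm.java.io.ObjectOutputStream(fos)
    oos.writeObject(obj)
    oos.close()

def load_java_object(path):
    fis = gateway.jvm.java.io.FileInputStream(path)
    ois = gateway.jvm.java.io.ObjectInputStream(fis)
    obj = ois.readObject()
    ois.close()
    return obj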

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---

> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Save non-pickleable variable?

2017-10-20 Thread Israel Brewster
tldr: I have an object that can't be pickled. Is there any way to do a "raw" 
dump of the binary data to a file, and re-load it later?

Details: I am using a java (I know, I know - this is a python list. I'm not 
asking about the java - honest!) library (Jasper Reports) that I access from 
python using py4j (www.py4j.org <http://www.py4j.org/>). At one point in my 
code I call a java function which, after churning on some data in a database, 
returns an object (a jasper report object populated with the final report data) 
that I can use (via another java call) to display the results in a variety of 
formats (HTML, PDF, XLS, etc). At the time I get the object back, I use it to 
display the results in HTML format for quick display, but the user may or may 
not also want to get a PDF copy in the near future. 

Since it can take some time to generate this object, and also since the data 
may change between when I do the HTML display and when the user requests a PDF 
(if they do at all), I would like to save this object for potential future 
re-use. Because it might be large, and there is actually a fairly good chance 
the user won't need it again, I'd like to save it in a temp file (that would be 
deleted when the user logs out) rather than in memory. Unfortunately, since 
this is an object created by and returned from a java function, not a native 
python object, it is not able to be pickled (as the suggestion typically is), 
at least to my knowledge.

Given that, is there any way I can write out the "raw" binary data to a file, 
and read it back in later? Or some other way to be able to save this object? It 
is theoretically possible that I could do it on the java side, i.e. the library 
may have some way of writing out the file, but obviously I wouldn't expect 
anyone here to know anything about that - I'm just asking about the python side 
:-)

-------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---




-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Efficient counting of results

2017-10-20 Thread Israel Brewster
is entire month (including the given date)? What about 
this year (again, including the given date and month)? How about arrivals - 
same questions. 

As you can hopefully see now, if a departure happened this week, it probably 
also happened this month (although that is not necessarily the case, since 
weeks can cross month boundaries), and if it happened this date or this month, 
it *definitely* happened this year. As such, a given departure *likely* will be 
counted in multiple date "groups", if you will.

The end result should be a table like the one I posted in the original 
question: time frame covered on the horizontal axis (YTD, MTD etc.), and "late" 
groups for T1 and T2 on the vertical.

> You want to process all your records, and decide "as of
> now, how late is each record", and then report *cumulative* subtotals for a
> number of arbitrary groups: not late yet, five minutes late, one day late,
> one year late, etc.

Just to clarify, as stated, the late groups are not-late, 1-5 minutes late, and 
6-15 minutes late. Also as stated in the original message, anything over 15 
minutes late is dealt with separately, and therefore ignored for the purposes 
of this report.

> 
> Suggestion:
> 
> Start with just the "activation time" and "now", and calculate the difference.
> If they are both given in seconds, you can just subtract:
> 
>lateness = now - activation_time
> 
> to determine how late that record is. If they are Datetime objects, use a
> Timedelta object.
> 
> That *single* computed field, the lateness, is enough to determine which
> subtotals need to be incremented.

Well, *two* computed fields, one for T1 and one for T2, which are counted 
separately.

> Start by dividing all of time into named
> buckets, in numeric order:
> 
> ...
> for record in records:
>     lateness = now - record.activation_date
>     for end, bucket in buckets:
>         if lateness <= end:
>             bucket.append(record)
>         else:
>             break
> 
> And you're done!
> 
> If you want the *number of records* in a particular bucket, you say:
> 
> len(bucket)
> 
> If you want the total record amount, you say:
> 
> sum(record.total for record in bucket)
> 
> 
> (assuming your records also have a "total" field, if they're invoices say).
> 
> 
> I hope that's even vaguely helpful.

In a sense, in that it supports my initial approach. 

As Stefan Ram pointed out, there is nothing wrong with the solution I have: 
simply using if statements around the calculated lateness of t1 and t2 to 
increment the appropriate counters. I was just thinking there might be tools to 
make the job easier/cleaner/more efficient. From the responses I have gotten, 
it would seem that that is likely not the case, so I'll just say "thank you all 
for your time", and let the matter rest.

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---

> 
> 
> 
> 
>> Maybe that will help clear things up. Or not. :-)
> 
> 
> Not even a tiny bit :-(
> 
> 
> 
> 
> 
> -- 
> Steve
> “Cheer up,” they said, “things could be worse.” So I cheered up, and sure
> enough, things got worse.
> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Efficient counting of results

2017-10-19 Thread Israel Brewster

> On Oct 19, 2017, at 10:02 AM, Stefan Ram <r...@zedat.fu-berlin.de> wrote:
> 
> Israel Brewster <isr...@ravnalaska.net> writes:
>> t10 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,}
>> increment the appropriate bin counts using a bunch of if statements.
> 
>  I can't really completely comprehend your requirements
>  specification, you might have perfectly described it all and
>  it's just too complicated for me to comprehend, but I just
>  would like to add that there are several ways to implement a
>  "two-dimensional" matrix. You can also imagine your
>  dictionary like this:
> 
> example =
> { 'd10': 0, 'd15': 0, 'd20': 0, 'd215': 0,
>  'w10': 0, 'w15': 0, 'w20': 0, 'w215': 0,
>  'm10': 0, 'm15': 0, 'm20': 0, 'm215': 0,
>  'y10': 0, 'y15': 0, 'y20': 0, 'y215': 0 }
> 
>  Then, when the categories are already in two variables, say,
>  »a« (»d«, »w«, »m«, or »y«) and »b« (»10«, »15«, »20«, or
>  »215«), you can address the appropriate bin as

Oh, I probably was a bit weak on the explanation somewhere. I'm still wrapping 
*my* head around some of the details. That's what makes it fun :-) If it helps, 
my data would look something like this:

[ (date, key, t1, t2),
  (date, key, t1, t2),
  .
  .
]

Where the date and the key are what is used to determine what "on-time" is for 
the record, and thus which "late" bin to put it in. So if the date of the first 
record was today, t1 was on-time, and t2 was 5 minutes late, then I would need 
to increment ALL of the following (using your data structure from above):

d10, w10, m10, y10, d25, w25, m25 AND y25

Since this record counts not just for the current day, but also for 
week-to-date, month-to-date and year-to-date. Basically, as the time categories 
get larger, the percentage of the total records included in that date group 
also gets larger. The year-to-date group will include all records, grouped by 
lateness, while the daily group will only include today's records.
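
In code form, the increment logic I have now boils down to something like this 
(field names and the Monday-start week are assumptions):

from collections import defaultdict
from datetime import date, timedelta

counts = defaultdict(int)  # keyed by (period, time field, lateness bin)

def late_bin(minutes_late):
    if minutes_late <= 0:
        return '0'
    if minutes_late <= 5:
        return '1-5'
    if minutes_late <= 15:
        return '6-15'
    return None  # anything over 15 minutes goes to the separate report

def periods(record_date, today):
    matched = []
    if record_date.year == today.year:
        matched.append('YTD')
        if record_date.month == today.month:
            matched.append('MTD')
        if record_date >= today - timedelta(days=today.weekday()):
            matched.append('WTD')
        if record_date == today:
            matched.append('daily')
    return matched

def count_record(record_date, t1_minutes_late, t2_minutes_late, today=None):
    today = today or date.today()
    for field, minutes in (('t1', t1_minutes_late), ('t2', t2_minutes_late)):
        b = late_bin(minutes)
        if b is None:
            continue
        for period in periods(record_date, today):
            counts[(period, field, b)] += 1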

Maybe that will help clear things up. Or not. :-)
> 
> example[ a + b ]+= 1

Not quite following the logic here. Sorry.

> 
>  . (And to not have to initialized the entries to zero,
>  class collections.defaultdict might come in handy.)

Yep, those are handy in many places. Thanks for the suggestion.

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---

> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Efficient counting of results

2017-10-19 Thread Israel Brewster

> On Oct 19, 2017, at 9:40 AM, Israel Brewster <isr...@ravnalaska.net> wrote:
> 
> I am working on developing a report that groups data into a two-dimensional 
> array based on date and time. More specifically, date is grouped into 
> categories:
> 
> day, week-to-date, month-to-date, and year-to-date
> 
> Then, for each of those categories, I need to get a count of records that 
> fall into the following categories:
> 
> 0 minutes late, 1-5 minutes late, and 6-15 minutes late
> 
> where minutes late will be calculated based on a known scheduled time and the 
> time in the record. To further complicate things, there are actually two 
> times in each record, so under the day, week-to-date, month-to-date etc 
> groups, there will be two sets of "late" bins, one for each time. In table 
> form it would look  something like this:
> 
>| day  |  week-to-date | month-to-date |  year-to-date  |
> 
> t1 0min| 
> t1 1-5 min| ...
> t1 6-15 min  | ...
> t2 0min| ...
> t2 1-5 min| ...
> t2 6-15 min  | ...
> 
> So in the extreme scenario of a record that is for the current day, it will 
> be counted into 8 bins: once each for day, week-to-date, month-to-date and 
> year-to-date under the proper "late" bin for the first time in the record, 
> and once each into each of the time groups under the proper "late" bin for 
> the second time in the record. An older record may only be counted twice, 
> under the year-to-date group. A record with no matching schedule is 
> discarded, as is any record that is "late" by more than 15 minutes (those are 
> gathered into a separate report)
> 
> My initial approach was to simply make dictionaries for each "row" in the 
> table, like so:
> 
> t10 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,}
> t15 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,}
> .
> .
> t25 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,}
> t215 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,}
> 
> then loop through the records, find the schedule for that record (if any, if 
> not move on as mentioned earlier), compare t1 and t2 against the schedule, 
> and increment the appropriate bin counts using a bunch of if statements. 
> Functional, if ugly. But then I got to thinking: I keep hearing about all 
> these fancy numerical analysis tools for python like pandas and numpy - could 
> something like that help? Might there be a way to simply set up a table with 
> "rules" for the columns and rows, and drop my records into the table, having 
> them automatically counted into the proper bins or something? Or am I over 
> thinking this, and the "simple", if ugly approach is best?

I suppose I should mention: my data source is the results of a psycopg2 query, 
so a "record" is a tuple or dictionary (depending on how I want to set up the 
cursor)

> 
> ---
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> ---
> 
> 
> 
> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Efficient counting of results

2017-10-19 Thread Israel Brewster
I am working on developing a report that groups data into a two-dimensional 
array based on date and time. More specifically, date is grouped into 
categories:

day, week-to-date, month-to-date, and year-to-date

Then, for each of those categories, I need to get a count of records that fall 
into the following categories:

0 minutes late, 1-5 minutes late, and 6-15 minutes late

where minutes late will be calculated based on a known scheduled time and the 
time in the record. To further complicate things, there are actually two times 
in each record, so under the day, week-to-date, month-to-date etc groups, there 
will be two sets of "late" bins, one for each time. In table form it would look 
something like this:

            | day | week-to-date | month-to-date | year-to-date |
t1 0 min    | ...
t1 1-5 min  | ...
t1 6-15 min | ...
t2 0 min    | ...
t2 1-5 min  | ...
t2 6-15 min | ...

So in the extreme scenario of a record that is for the current day, it will be 
counted into 8 bins: once each for day, week-to-date, month-to-date and 
year-to-date under the proper "late" bin for the first time in the record, and 
once each into each of the time groups under the proper "late" bin for the 
second time in the record. An older record may only be counted twice, under the 
year-to-date group. A record with no matching schedule is discarded, as is any 
record that is "late" by more than 15 minutes (those are gathered into a 
separate report)

My initial approach was to simply make dictionaries for each "row" in the 
table, like so:

t10 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,}
t15 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,}
.
.
t25 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,}
t215 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,}

then loop through the records, find the schedule for that record (if any, if 
not move on as mentioned earlier), compare t1 and t2 against the schedule, and 
increment the appropriate bin counts using a bunch of if statements. 
Functional, if ugly. But then I got to thinking: I keep hearing about all these 
fancy numerical analysis tools for python like pandas and numpy - could 
something like that help? Might there be a way to simply set up a table with 
"rules" for the columns and rows, and drop my records into the table, having 
them automatically counted into the proper bins or something? Or am I over 
thinking this, and the "simple", if ugly approach is best?
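
To make the question concrete, here is a rough sketch of what I imagine a pandas 
version might look like (untested, and the column names sched_time/t1/t2 are just 
placeholders for whatever the query actually returns):

import pandas as pd

def late_counts(records, now=None):
    # records: iterable of (sched_time, t1, t2) tuples, e.g. straight from psycopg2
    df = pd.DataFrame(records, columns=['sched_time', 't1', 't2'])
    for col in ('sched_time', 't1', 't2'):
        df[col] = pd.to_datetime(df[col])
    now = pd.Timestamp(now) if now is not None else pd.Timestamp.now()

    # Date-group membership; one record can fall into several groups at once.
    periods = {
        'daily': df['sched_time'].dt.date == now.date(),
        'WTD': df['sched_time'] >= (now - pd.Timedelta(days=now.dayofweek)).normalize(),
        'MTD': df['sched_time'] >= now.normalize().replace(day=1),
        'YTD': df['sched_time'] >= now.normalize().replace(month=1, day=1),
    }

    # Lateness bins in minutes; anything over 15 falls out (NaN) and is not counted.
    bins = [-float('inf'), 0, 5, 15]
    labels = ['0 min', '1-5 min', '6-15 min']

    result = {}
    for pname, mask in periods.items():
        col = {}
        for tcol in ('t1', 't2'):
            late = (df.loc[mask, tcol] - df.loc[mask, 'sched_time']).dt.total_seconds() / 60.0
            binned = pd.cut(late, bins=bins, labels=labels)
            counts = binned.value_counts()
            for label in labels:
                col[(tcol, label)] = int(counts.get(label, 0))
        result[pname] = col

    # Rows are (time column, late bin); columns are daily/WTD/MTD/YTD.
    return pd.DataFrame(result)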

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---




-- 
https://mail.python.org/mailman/listinfo/python-list


Re: PEP 249 Compliant error handling

2017-10-18 Thread Israel Brewster
On Oct 17, 2017, at 12:02 PM, MRAB <pyt...@mrabarnett.plus.com> wrote:
> 
> On 2017-10-17 20:25, Israel Brewster wrote:
>> 
>>> On Oct 17, 2017, at 10:35 AM, MRAB <pyt...@mrabarnett.plus.com 
>>> <mailto:pyt...@mrabarnett.plus.com>> wrote:
>>> 
>>> On 2017-10-17 18:26, Israel Brewster wrote:
>>>> I have written and maintain a PEP 249 compliant (hopefully) DB API for the 
>>>> 4D database, and I've run into a situation where corrupted string data 
>>>> from the database can cause the module to error out. Specifically, when 
>>>> decoding the string, I get a "UnicodeDecodeError: 'utf-16-le' codec can't 
>>>> decode bytes in position 86-87: illegal UTF-16 surrogate" error. This 
>>>> makes sense, given that the string data got corrupted somehow, but the 
>>>> question is "what is the proper way to deal with this in the module?" 
>>>> Should I just throw an error on bad data? Or would it be better to set the 
>>>> errors parameter to something like "replace"? The former feels a bit more 
>>>> "proper" to me (there's an error here, so we throw an error), but leaves 
>>>> the end user dead in the water, with no way to retrieve *any* of the data 
>>>> (from that row at least, and perhaps any rows after it as well). The 
>>>> latter option sort of feels like sweeping the problem under the rug, but 
>>>> does at least leave an error character in the string to let them know 
>>>> there was an error, and will allow retrieval of any good data.
>>>> Of course, if this was in my own code I could decide on a case-by-case 
>>>> basis what the proper action is, but since this a module that has to work 
>>>> in any situation, it's a bit more complicated.
>>> If a particular text field is corrupted, then raising UnicodeDecodeError 
>>> when trying to get the contents of that field as a Unicode string seems 
>>> reasonable to me.
>>> 
>>> Is there a way to get the contents as a bytestring, or to get the contents 
>>> with a different errors parameter, so that the user has the means to fix it 
>>> (if it's fixable)?
>> 
>> That's certainly a possibility, if that behavior conforms to the DB API 
>> "standards". My concern in this front is that in my experience working with 
>> other PEP 249 modules (specifically psycopg2), I'm pretty sure that columns 
>> designated as type VARCHAR or TEXT are returned as strings (unicode in 
>> python 2, although that may have been a setting I used), not bytes. The 
>> other complication here is that the 4D database doesn't use the UTF-8 
>> encoding typically found, but rather UTF-16LE, and I don't know how well 
>> this is documented. So not only is the bytes representation completely 
>> unintelligible for human consumption, I'm not sure the average end-user 
>> would know what decoding to use.
>> 
>> In the end though, the main thing in my mind is to maintain "standards" 
>> compatibility - I don't want to be returning bytes if all other DB API 
>> modules return strings, or visa-versa for that matter. There may be some 
>> flexibility there, but as much as possible I want to conform to the 
>> majority/standard/whatever
>> 
> The average end-user might not know which encoding is being used, but 
> providing a way to read the underlying bytes will give a more experienced 
> user the means to investigate and possibly fix it: get the bytes, figure out 
> what the string should be, update the field with the correctly decoded string 
> using normal DB instructions.

I agree, and if I was just writing some random module I'd probably go with it, 
or perhaps with the suggestion offered by Karsten Hilbert. However, neither 
answer addresses my actual question, which is "how does the STANDARD (PEP 249 
in this case) say to handle this, or, barring that (since the standard probably 
doesn't explicitly say), how do the MAJORITY of PEP 249 compliant modules 
handle this?" Not what is the *best* way to handle it, but rather what is the 
normal, expected behavior for a Python DB API module when presented with bad 
data? That is, how does psycopg2 behave? pyodbc? pymssql (I think)? Etc. Or is 
that portion of the behavior completely arbitrary and different for every 
module?

It may well be that one of the suggestions *IS* the normal, expected behavior, 
but it sounds more like you are suggesting what you think would be the best way 
to handle it, which is appreciated but not actually what I'm asking :-) Sorry if I 
am being difficult.

> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: PEP 249 Compliant error handling

2017-10-18 Thread Israel Brewster


> On Oct 18, 2017, at 1:46 AM, Abdur-Rahmaan Janhangeer <arj.pyt...@gmail.com> 
> wrote:
> 
> all corruption systematically ignored but data piece logged in for analysis

Thanks. Can you expound a bit on what you mean by "data piece logged in" in 
this context? I'm not aware of any logging specifications in the PEP 249, and 
would think that would be more end-user configured rather than module level.

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---

> 
> Abdur-Rahmaan Janhangeer,
> Mauritius
> abdurrahmaanjanhangeer.wordpress.com 
> <http://abdurrahmaanjanhangeer.wordpress.com/>
> 
> On 17 Oct 2017 21:43, "Israel Brewster" <isr...@ravnalaska.net 
> <mailto:isr...@ravnalaska.net>> wrote:
> I have written and maintain a PEP 249 compliant (hopefully) DB API for the 4D 
> database, and I've run into a situation where corrupted string data from the 
> database can cause the module to error out. Specifically, when decoding the 
> string, I get a "UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in 
> position 86-87: illegal UTF-16 surrogate" error. This makes sense, given that 
> the string data got corrupted somehow, but the question is "what is the 
> proper way to deal with this in the module?" Should I just throw an error on 
> bad data? Or would it be better to set the errors parameter to something like 
> "replace"? The former feels a bit more "proper" to me (there's an error here, 
> so we throw an error), but leaves the end user dead in the water, with no way 
> to retrieve *any* of the data (from that row at least, and perhaps any rows 
> after it as well). The latter option sort of feels like sweeping the problem 
> under the rug, but does at least leave an error character in the string to 
> let them know there was an error, and will allow retrieval of any good data.
> 
> Of course, if this was in my own code I could decide on a case-by-case basis 
> what the proper action is, but since this a module that has to work in any 
> situation, it's a bit more complicated.
> ---
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293 <tel:%28907%29%20450-7293>
> ---
> 
> 
> 
> 
> --
> https://mail.python.org/mailman/listinfo/python-list 
> <https://mail.python.org/mailman/listinfo/python-list>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: PEP 249 Compliant error handling

2017-10-17 Thread Israel Brewster

> On Oct 17, 2017, at 10:35 AM, MRAB <pyt...@mrabarnett.plus.com> wrote:
> 
> On 2017-10-17 18:26, Israel Brewster wrote:
>> I have written and maintain a PEP 249 compliant (hopefully) DB API for the 
>> 4D database, and I've run into a situation where corrupted string data from 
>> the database can cause the module to error out. Specifically, when decoding 
>> the string, I get a "UnicodeDecodeError: 'utf-16-le' codec can't decode 
>> bytes in position 86-87: illegal UTF-16 surrogate" error. This makes sense, 
>> given that the string data got corrupted somehow, but the question is "what 
>> is the proper way to deal with this in the module?" Should I just throw an 
>> error on bad data? Or would it be better to set the errors parameter to 
>> something like "replace"? The former feels a bit more "proper" to me 
>> (there's an error here, so we throw an error), but leaves the end user dead 
>> in the water, with no way to retrieve *any* of the data (from that row at 
>> least, and perhaps any rows after it as well). The latter option sort of 
>> feels like sweeping the problem under the rug, but does at least leave an 
>> error character in the string to let them know there was an error, and will 
>> allow retrieval of any good data.
>> Of course, if this was in my own code I could decide on a case-by-case basis 
>> what the proper action is, but since this a module that has to work in any 
>> situation, it's a bit more complicated.
> If a particular text field is corrupted, then raising UnicodeDecodeError when 
> trying to get the contents of that field as a Unicode string seems reasonable 
> to me.
> 
> Is there a way to get the contents as a bytestring, or to get the contents 
> with a different errors parameter, so that the user has the means to fix it 
> (if it's fixable)?

That's certainly a possibility, if that behavior conforms to the DB API 
"standards". My concern on this front is that in my experience working with 
other PEP 249 modules (specifically psycopg2), I'm pretty sure that columns 
designated as type VARCHAR or TEXT are returned as strings (unicode in python 
2, although that may have been a setting I used), not bytes. The other 
complication here is that the 4D database doesn't use the UTF-8 encoding 
typically found, but rather UTF-16LE, and I don't know how well this is 
documented. So not only is the bytes representation completely unintelligible 
for human consumption, I'm not sure the average end-user would know what 
decoding to use.
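
For illustration, the difference at the decode level is just the errors argument; 
the byte string below is a fabricated example of a corrupted UTF-16LE value, not 
real 4D data:

# Valid UTF-16LE text with a lone high surrogate spliced into the middle.
corrupted = u'good'.encode('utf-16-le') + b'\x00\xd8' + u'data'.encode('utf-16-le')

try:
    corrupted.decode('utf-16-le')  # strict (the current behaviour): raises
except UnicodeDecodeError as exc:
    print('strict decode failed: %s' % exc)

# errors='replace' substitutes U+FFFD for the bad sequence and keeps the rest:
print(corrupted.decode('utf-16-le', errors='replace'))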

In the end though, the main thing in my mind is to maintain "standards" 
compatibility - I don't want to be returning bytes if all other DB API modules 
return strings, or vice versa for that matter. There may be some flexibility 
there, but as much as possible I want to conform to the 
majority/standard/whatever

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


PEP 249 Compliant error handling

2017-10-17 Thread Israel Brewster
I have written and maintain a PEP 249 compliant (hopefully) DB API for the 4D 
database, and I've run into a situation where corrupted string data from the 
database can cause the module to error out. Specifically, when decoding the 
string, I get a "UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in 
position 86-87: illegal UTF-16 surrogate" error. This makes sense, given that 
the string data got corrupted somehow, but the question is "what is the proper 
way to deal with this in the module?" Should I just throw an error on bad data? 
Or would it be better to set the errors parameter to something like "replace"? 
The former feels a bit more "proper" to me (there's an error here, so we throw 
an error), but leaves the end user dead in the water, with no way to retrieve 
*any* of the data (from that row at least, and perhaps any rows after it as 
well). The latter option sort of feels like sweeping the problem under the rug, 
but does at least leave an error character in the string to let them know there 
was an error, and will allow retrieval of any good data.

Of course, if this was in my own code I could decide on a case-by-case basis 
what the proper action is, but since this is a module that has to work in any 
situation, it's a bit more complicated.
---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---




-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Psycopg2 pool clarification

2017-06-08 Thread Israel Brewster

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---




> On Jun 7, 2017, at 10:31 PM, dieter <die...@handshake.de> wrote:
> 
> israel <isr...@ravnalaska.net> writes:
>> On 2017-06-06 22:53, dieter wrote:
>> ...
>> As such, using psycopg2's pool is essentially
>> worthless for me (plenty of use for it, i'm sure, just not for me/my
>> use case).
> 
> Could you not simply adjust the value for the "min" parameter?
> If you want at least "n" open connections, then set "min" to "n".

Well, sure, if I didn't care about wasting resources (which, I guess many 
people don't). I could set "n" to some magic number that would always give 
"enough" connections, such that my application never has to open additional 
connections, then adjust that number every few months as usage changes. In 
fact, now that I know how the logic of the pool works, that's exactly what I'm 
doing until I am confident that my caching replacement is solid. 

Of course, in order to avoid having to open/close a bunch of connections during 
the times when it is most critical - that is, when the server is under heavy 
load - I have to set that number arbitrarily high. Furthermore, that means that 
much of the time many, if not most, of those connections would be idle. Each 
connection uses a certain amount of RAM on the server, not to mention using up 
limited connection slots, so now I've got to think about if my server is sized 
properly to be able to handle that load not just occasionally, but constantly - 
when reducing server load by reducing the frequency of connections being 
opened/closed was the goal in the first place. So all I've done is trade 
dynamic load for static load - increasing performance at the cost of resources, 
rather than more intelligently using the available resources. All-in-all, not 
the best solution, though it does work. Maybe if load was fairly constant it 
would make more sense though. So, like I said, it doesn't fit *my* use case, 
which is a number of web apps with varying loads - loads that also vary from 
day-to-day and hour-to-hour.

On the other hand, a pool that caches connections using the logic I laid out in 
my original post would avoid the issue. Under heavy load, it could open 
additional connections as needed - a performance penalty for the first few 
users over the min threshold, but only the first few, rather than all the users 
over a certain threshold ("n"). Those connections would then remain available 
for the duration of the load, so it doesn't need to open/close numerous 
connections. Then, during periods of lighter load, the unused connections can 
drop off, freeing up server resources for other uses. A well-written pool could 
even do something like see that the available connection pool is running low, 
and open a few more connections in the background, thus completely avoiding the 
connection overhead on requests while never having more than a few "extra" 
connections at any given time. Even if you left off the expiration logic, it 
would still be an improvement, because while unused connections wouldn't drop, 
the "n" open connections could scale up dynamically until you have 
"enough" connections, without having to figure out and hard-code that "magic 
number" of open connections.
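
For reference, a stripped-down sketch of the kind of pool I'm describing - 
illustrative only, not my actual prototype, and it just calls psycopg2.connect 
directly with no maxconn handling or error handling:

import time
import threading
import psycopg2

class CachingPool(object):
    """Hand out cached connections, open more on demand, drop idle ones."""
    def __init__(self, idle_timeout=300, **connect_kwargs):
        self._connect_kwargs = connect_kwargs
        self._idle_timeout = idle_timeout
        self._idle = []                     # list of (connection, time it went idle)
        self._lock = threading.Lock()

    def getconn(self):
        with self._lock:
            self._prune()
            if self._idle:
                return self._idle.pop()[0]  # reuse the most recently returned one
        # Nothing cached: open a new connection (the one-time penalty under load).
        return psycopg2.connect(**self._connect_kwargs)

    def putconn(self, conn):
        with self._lock:
            self._idle.append((conn, time.time()))
            self._prune()

    def _prune(self):
        """Close connections that have sat idle longer than the timeout."""
        cutoff = time.time() - self._idle_timeout
        keep = []
        for conn, since in self._idle:
            if since < cutoff:
                conn.close()
            else:
                keep.append((conn, since))
        self._idle = keep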

Why wouldn't I want something like that? It's not like it's hard to code - took 
me about an hour and a half to get to a working prototype yesterday. Still need 
to write tests and add some polish, but it works. Perhaps, though, the common 
thought is just "throw more hardware at it and keep a lot of connections open 
at all times?" Maybe I was raised too conservatively, or the company I work for 
is too poor :-D

> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Psycopg2 pool clarification

2017-06-02 Thread Israel Brewster
I've been using the psycopg2 pool class for a while now, using code similar to 
the following:

>>> pool=ThreadedConnectionPool(0,5,)
>>> conn1=pool.getconn()
>>> 
>>> pool.putconn(conn1)
 repeat later, or perhaps "simultaneously" in a different thread.

and my understanding was that the pool logic was something like the following:

- create a "pool" of connections, with an initial number of connections equal 
to the "minconn" argument
- When getconn is called, see if there is an available connection. If so, 
return it. If not, open a new connection and return that (up to "maxconn" total 
connections)
- When putconn is called, return the connection to the pool for re-use, but do 
*not* close it (unless the close argument is specified as True, documentation 
says default is False)
- On the next request to getconn, this connection is now available and so no 
new connection will be made
- perhaps (or perhaps not), after some time, unused connections would be closed 
and purged from the pool to prevent large numbers of connections that were only 
used once from lying around.

However, in some testing I just did, this doesn't appear to be the case, at 
least based on the postgresql logs. Running the following code:

>>> pool=ThreadedConnectionPool(0,5,)
>>> conn1=pool.getconn()
>>> conn2=pool.getconn()
>>> pool.putconn(conn1)
>>> pool.putconn(conn2)
>>> conn3=pool.getconn()
>>> pool.putconn(conn3)

produced the following output in the postgresql log:

2017-06-02 14:30:26 AKDT LOG:  connection received: host=::1 port=64786
2017-06-02 14:30:26 AKDT LOG:  connection authorized: user=logger 
database=flightlogs
2017-06-02 14:30:35 AKDT LOG:  connection received: host=::1 port=64788
2017-06-02 14:30:35 AKDT LOG:  connection authorized: user=logger 
database=flightlogs
2017-06-02 14:30:46 AKDT LOG:  disconnection: session time: 0:00:19.293 
user=logger database=flightlogs host=::1 port=64786
2017-06-02 14:30:53 AKDT LOG:  disconnection: session time: 0:00:17.822 
user=logger database=flightlogs host=::1 port=64788
2017-06-02 14:31:15 AKDT LOG:  connection received: host=::1 port=64790
2017-06-02 14:31:15 AKDT LOG:  connection authorized: user=logger 
database=flightlogs
2017-06-02 14:31:20 AKDT LOG:  disconnection: session time: 0:00:05.078 
user=logger database=flightlogs host=::1 port=64790

Since I set the maxconn parameter to 5, and only used 3 connections, I wasn't 
expecting to see any disconnects - and yet as soon as I do putconn, I *do* see 
a disconnection. Additionally, I would have thought that when I pulled 
connection 3, there would have been two connections available, and so it 
wouldn't have needed to connect again, yet it did. Even if I explicitly say 
close=False in the putconn call, it still closes the connection and has to open 
a new one on the next getconn.

What am I missing? From this testing, it looks like I get no benefit at all 
from having the connection pool, unless you consider an upper limit to the 
number of simultaneous connections a benefit? :-) Maybe a little code savings 
from not having to manually call connect and close after each connection, but 
that's easily gained by simply writing a context manager. I could get *some* 
limited benefit by raising the minconn value, but then I risk having 
connections that are *never* used, yet still taking resources on the DB server.
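
For example, a minimal sketch of such a context manager (where pool is whatever 
pool object is in use):

from contextlib import contextmanager

@contextmanager
def pooled_connection(pool):
    """Check a connection out of the pool and always hand it back afterwards."""
    conn = pool.getconn()
    try:
        yield conn
    finally:
        pool.putconn(conn)

# with pooled_connection(pool) as conn:
#     with conn.cursor() as cur:
#         cur.execute("SELECT 1")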

Ideally, it would open as many connections as are needed, and then leave them 
open for future requests, perhaps with an "idle" timeout. Is there any way to 
achieve this behavior?

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---




-- 
https://mail.python.org/mailman/listinfo/python-list


CherryPy Session object creation logic

2017-06-02 Thread Israel Brewster
I have a CherryPy app, for which I am using a PostgreSQL session. To be more 
exact, I modified a MySQL session class I found to work with PostgreSQL 
instead, and then I put this line in my code:

cherrypy.lib.sessions.PostgresqlSession = PostgreSQLSession

And this works fine. One thing about its behavior is bugging me, however: 
accessing a page instantiates (and deletes) *many* instances of this class, all 
for the same session. Doing some debugging, I counted 21 calls to the __init__ 
function when loading a single page. Logging in and displaying the next page 
hit it an additional 8 times. My theory is that essentially every time I try to 
read from or write to the session, CherryPy is instantiating a new 
PostgreSQLSession object, performing the request, and deleting the session 
object. In that simple test, that means 29 connections to the database, 29 
instantiations, etc - quite a bit of overhead, not to mention the load on my 
database server making/breaking those connections (although it handles it fine).

Is this "normal" behavior? Or did I mess something up with my session class? 
I'm thinking that ideally CherryPy would only create one object - and 
therefore, one DB connection - for a given session, and then simply hold on to 
that object until that session expired. But perhaps not?
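
One possible workaround (just a sketch - not tested against the real session 
class, and the DSN is a placeholder) would be to have every session instance 
borrow a connection from a single module-level pool instead of opening its own:

from psycopg2.pool import ThreadedConnectionPool

# One pool shared by every session instance in the process.
_session_pool = ThreadedConnectionPool(1, 10, "dbname=sessions user=cherrypy")

class PooledPostgreSQLSession(object):
    """Sketch of the connection handling only, not the full session interface."""
    def __init__(self, *args, **kwargs):
        self.db = _session_pool.getconn()   # reuses an already-open connection

    def _cleanup(self):
        _session_pool.putconn(self.db)      # hand it back instead of closing it
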
-------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---




-- 
https://mail.python.org/mailman/listinfo/python-list


Proper way to run CherryPy app as a daemon?

2017-03-28 Thread Israel Brewster
I am wanting to run a CherryPy app as a daemon on CentOS 6 using an init.d 
script. By subscribing to the "Daemonizer" and PIDFile cherrypy plugins, I have 
been able to write an init.d script that starts and stops my CherryPy 
application. There's only one problem: it would appear that the program 
daemonizes, thus allowing the init.d script to return a good start, as soon as 
I call cherrypy.engine.start(), but *before* the cherrypy app has actually 
started. Particularly, this occurs before cherrypy has bound to the desired 
port. The end result is that running "service  start" returns OK, 
indicating that the app is now running, even when it cannot bind to the port, 
thus preventing it from actually starting. This is turn causes issues with my 
clustering software which thinks it started just fine, when in fact it never 
*really* started.

As such, is there a way to delay the daemonization until I call 
cherrypy.engine.block()? Or some other way to prevent the init.d script from 
indicating a successful start until the process has actually bound to the 
needed port and fully started? What is the proper way of doing this? 
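
One stopgap I can think of (a sketch, independent of how CherryPy daemonizes) is 
to have the init.d script poll the port after launching the daemon and only 
report success once something is actually listening:

import socket
import time

def wait_for_port(host, port, timeout=10.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            socket.create_connection((host, port), timeout=1).close()
            return True
        except socket.error:
            time.sleep(0.25)
    return False

# In the wrapper the init script calls, exit non-zero if the port never comes up:
# if not wait_for_port('127.0.0.1', 8080):
#     sys.exit(1)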

Thanks! 
---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---




-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Error handling in context managers

2017-01-17 Thread Israel Brewster
On Jan 16, 2017, at 11:34 PM, Peter Otten <__pete...@web.de> wrote:
> 
> Gregory Ewing wrote:
> 
>> Israel Brewster wrote:
>>> The problem is that, from time to time, I can't get a connection, the
>>> result being that cursor is None,
>> 
>> That's your problem right there -- you want a better-behaved
>> version of psql_cursor().
>> 
>> def get_psql_cursor():
>>c = psql_cursor()
>>if c is None:
>>   raise CantGetAConnectionError()
>>return c
>> 
>> with get_psql_cursor() as c:
>>...
> 
> You still need to catch the error -- which leads to option (3) in my zoo, 
> the only one that is actually usable. If one contextmanager cannot achieve 
> what you want, use two:
> 
> $ cat conditional_context_raise.py
> import sys
> from contextlib import contextmanager
> 
> class NoConnection(Exception):
>pass
> 
> class Cursor:
>def execute(self, sql):
>print("EXECUTING", sql)
> 
> @contextmanager
> def cursor():
>if "--fail" in sys.argv:
>raise NoConnection
>yield Cursor()
> 
> @contextmanager
> def no_connection():
>try:
>yield
>except NoConnection:
>print("no connection")
> 
> with no_connection(), cursor() as cs:
>cs.execute("insert into...")
> $ python3 conditional_context_raise.py
> EXECUTING insert into...
> $ python3 conditional_context_raise.py --fail
> no connection
> 
> If you want to ignore the no-connection case use 
> contextlib.suppress(NoConnection) instead of the custom no_connection() 
> manager.

Fun :-) I'll have to play around with that. Thanks! :-)

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---
> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Error handling in context managers

2017-01-17 Thread Israel Brewster
On Jan 16, 2017, at 8:01 PM, Gregory Ewing <greg.ew...@canterbury.ac.nz> wrote:
> 
> Israel Brewster wrote:
>> The problem is that, from time to time, I can't get a connection, the result
>> being that cursor is None,
> 
> That's your problem right there -- you want a better-behaved
> version of psql_cursor().
> 
> def get_psql_cursor():
>   c = psql_cursor()
>   if c is None:
>  raise CantGetAConnectionError()
>   return c
> 
> with get_psql_cursor() as c:
>   ...
> 

Ok, fair enough. So I get a better exception, raised at the proper time. This 
is, in fact, better - but doesn't actually change how I would *handle* the 
exception :-)
---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---

> -- 
> Greg
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Error handling in context managers

2017-01-17 Thread Israel Brewster
On Jan 16, 2017, at 1:27 PM, Terry Reedy <tjre...@udel.edu> wrote:
> 
> On 1/16/2017 1:06 PM, Israel Brewster wrote:
>> I generally use context managers for my SQL database connections, so I can 
>> just write code like:
>> 
>> with psql_cursor() as cursor:
>>
>> 
>> And the context manager takes care of making a connection (or getting a 
>> connection from a pool, more likely), and cleaning up after the fact (such 
>> as putting the connection back in the pool), even if something goes wrong. 
>> Simple, elegant, and works well.
>> 
>> The problem is that, from time to time, I can't get a connection, the result 
>> being that cursor is None,
> 
> This would be like open('bad file') returning None instead of raising 
> FileNotFoundError.
> 
>> and attempting to use it results in an AttributeError.
> 
> Just as None.read would.
> 
> Actually, I have to wonder about your claim.  The with statement would look 
> for cursor.__enter__ and then cursor.__exit__, and None does not have those 
> methods.  In other words, the expression following 'with' must evaluate to a 
> context manager and None is not a context manager.
> 
> >>> with None: pass
> 
> Traceback (most recent call last):
>  File "<pyshell#3>", line 1, in 
>with None: pass
> AttributeError: __enter__
> 
> Is psql_cursor() returning a fake None object with __enter__ and __exit__ 
> methods?

No, the *context manager*, which I call in the with *does* have __enter__ and 
__exit__ methods. It's just that the __enter__ method returns None when it 
can't get a connection. So the expression following with *does* evaluate to a 
context manager, but the expression following as evaluates to None.

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---

> 
> -- 
> Terry Jan Reedy
> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Error handling in context managers

2017-01-16 Thread Israel Brewster
I generally use context managers for my SQL database connections, so I can just 
write code like:

with psql_cursor() as cursor:
    ...

And the context manager takes care of making a connection (or getting a 
connection from a pool, more likely), and cleaning up after the fact (such as 
putting the connection back in the pool), even if something goes wrong. Simple, 
elegant, and works well.

The problem is that, from time to time, I can't get a connection, the result 
being that cursor is None, and attempting to use it results in an 
AttributeError. So my instinctive reaction is to wrap the potentially offending 
code in a try block, such that if I get that AttributeError I can decide how I 
want to handle the "no connection" case. This, of course, results in code like:

try:
    with psql_cursor() as cursor:
        ...
except AttributeError as e:
    ...

I could also wrap the code within the context manager in an if block checking 
for if cursor is not None, but while perhaps a bit clearer as to the purpose, 
now I've got an extra check that will not be needed most of the time (albeit a 
quite inexpensive check).

The difficulty I have with either of these solutions, however, is that they 
feel ugly to me - and wrapping the context manager in a try block almost seems 
to defeat the purpose of the context manager in the first place - If I'm going 
to be catching errors anyway, why not just do the cleanup there rather than 
hiding it in the context manager?

Now don't get me wrong - neither of these issues is terribly significant to me. 
I'll happily wrap all the context manager calls in a try block and move on with 
life if that it in fact the best option. It's just my gut says "there should be 
a better way", so I figured I'd ask: *is* there a better way? Perhaps some way 
I could handle the error internally to the context manager, such that it just 
dumps me back out? Of course, that might not work, given that I may need to do 
something different *after* the context manager, depending on if I was able to 
get a connection, but it's a thought. Options?

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---




-- 
https://mail.python.org/mailman/listinfo/python-list


Testing POST in cherrypy

2016-10-04 Thread Israel Brewster
When testing CherryPy using a cherrypy.test.helper.CPWebCase subclass, I can 
test a page request by calling "self.getPage()", and in that call I can specify 
a method (GET/POST etc). When specifying a POST, how do I pass the parameters? 
I know for a POST the parameters are in the body of the request, but in what 
format? Do I just urllib.urlencode() a dictionary and pass that as the body, 
or is there some other method I should use? Thanks!
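
For what it's worth, one pattern to try (untested; it assumes getPage accepts 
headers/method/body arguments, with headers as a list of (name, value) tuples) 
would be to urlencode the parameters and send them as the body with a form 
Content-Type:

# Inside a CPWebCase test method; the URL and field names are made up.
import urllib                                   # urllib.parse.urlencode on Python 3

params = {'username': 'israel', 'password': 'secret'}
body = urllib.urlencode(params)
headers = [('Content-Type', 'application/x-www-form-urlencoded'),
           ('Content-Length', str(len(body)))]  # in case the helper doesn't add it
self.getPage('/login', headers=headers, method='POST', body=body)
self.assertStatus(200)
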
-------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---




-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Handling transactions in Python DBI module

2016-02-11 Thread Israel Brewster
On Feb 10, 2016, at 8:14 PM, Chris Angelico  wrote:
> 
> On Thu, Feb 11, 2016 at 4:06 PM, Frank Millman  wrote:
>> A connection has 2 possible states - 'in transaction', or 'not in
>> transaction'. When you create the connection it starts off as 'not'.
>> 
>> When you call cur.execute(), it checks to see what state it is in. If the
>> state is 'not', it silently issues a 'BEGIN TRANSACTION' before executing
>> your statement. This applies for SELECT as well as other statements.
>> 
>> All subsequent statements form part of the transaction, until you issue
>> either conn.commit() or conn.rollback(). This performs the required action,
>> and resets the state to 'not'.
>> 
>> I learned the hard way that it is important to use conn.commit() and not
>> cur.execute('commit'). Both succeed in committing, but the second does not
>> reset the state, therefore the next statement does not trigger a 'BEGIN',
>> with possible unfortunate side-effects.
> 
> When I advise my students on basic databasing concepts, I recommend
> this structure:
> 
> conn = psycopg2.connect(...)
> 
> with conn, conn.cursor() as cur:
>cur.execute(...)

And that is the structure I tend to use in my programs as well. I could, of 
course, roll the transaction control into that structure. However, that is a 
usage choice of the end user, whereas I am looking at the design of the 
connection/cursor itself. If I use psycopg, I get the transaction - even if I 
don't use a with block.

> 
> The transaction block should always start at the 'with' block and end
> when it exits. As long as you never nest them (including calling other
> database-using functions from inside that block), it's easy to reason
> about the database units of work - they always correspond perfectly to
> the code blocks.
> 
> Personally, I'd much rather the structure were "with
> conn.transaction() as cur:", because I've never been able to
> adequately explain what a cursor is/does. It's also a bit weird that
> "with conn:" doesn't close the connection at the end (just closes the
> transaction within that connection). But I guess we don't need a
> "Python DB API 3.0".

In my mind, cursors are simply query objects containing (potentially) result 
sets - so you could have two cursors, and loop through them something like "for 
result_1,result_2 in zip(cursor_1,cursor_2): ". Personally, I've never had a 
need for more than one cursor, but if you are working with large data sets, and 
need to work with multiple queries simultaneously without the overhead of 
loading the results into memory, I could see them being useful.

Of course, someone else might have a completely different explanation :-)
> 
> ChrisA
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Handling transactions in Python DBI module

2016-02-11 Thread Israel Brewster
On Feb 10, 2016, at 8:06 PM, Frank Millman <fr...@chagford.com> wrote:
> 
> "Israel Brewster"  wrote in message 
> news:92d3c964-0323-46ee-b770-b89e7e7e6...@ravnalaska.net...
> 
>> I am working on implementing a Python DB API module, and am hoping I can get 
>> some help with figuring out the workflow of handling transactions. In my 
>> experience (primarily with
>> psycopg2) the workflow goes like this:
>> 
>> - When you open a connection (or is it when you get a cursor? I *think* it 
>> is on opening a connection), a new transaction is started
>> - When you close a connection, an implicit ROLLBACK is performed
>> - After issuing SQL statements that modify the database, you call commit() 
>> on the CONNECTION object, not the cursor.
>> 
>> My primary confusion is that at least for the DB I am working on, to 
>> start/rollback/commit a transaction, you execute the appropriate SQL 
>> statement (the c library I'm using doesn't
>> have any transactional commands, not that it should). However, to execute 
>> the statement, you need a cursor. So how is this *typically* handled? Does 
>> the connection object keep an > internal cursor that it uses to manage 
>> transactions?
>> 
>> I'm assuming, since it is called on the connection, not the cursor, that any 
>> COMMIT/ROLLBACK commands called affect all cursors on that connection. Is 
>> that correct? Or is this DB
>> specific?
>> 
>> Finally, how do other DB API modules, like psycopg2, ensure that ROLLBACK is 
>> called if the user never explicitly calls close()?
> 
> Rather than try to answer your questions point-by-point, I will describe the 
> results of some investigations I carried out into this subject a while ago.
> 
> I currently support 3 databases, so I use 3 DB API modules - 
> PostgreSQL/psycopg2, Sql Server/pyodbc, and sqlite3/sqlite3. The following 
> applies specifically to psycopg2, but I applied the lessons learned to the 
> other 2 as well, and have had no issues.
> 
> A connection has 2 possible states - 'in transaction', or 'not in 
> transaction'. When you create the connection it starts off as 'not'.
> 
> When you call cur.execute(), it checks to see what state it is in. If the 
> state is 'not', it silently issues a 'BEGIN TRANSACTION' before executing 
> your statement. This applies for SELECT as well as other statements.
> 
> All subsequent statements form part of the transaction, until you issue 
> either conn.commit() or conn.rollback(). This performs the required action, 
> and resets the state to 'not'.
> 
> I learned the hard way that it is important to use conn.commit() and not 
> cur.execute('commit'). Both succeed in committing, but the second does not 
> reset the state, therefore the next statement does not trigger a 'BEGIN', 
> with possible unfortunate side-effects.

Thanks - that is actually quite helpful. So the way I am looking at it now is 
that the connection would have an internal cursor as I suggested. From your 
response, I'll add a "state" flag as well. If the state flag is not set when 
execute is called on a cursor, the cursor itself will start a transaction and 
set the flag (this could happen from any cursor, though, so that could 
potentially cause a race condition, correct?). In any case, there is now a 
transaction open, until such a time as commit() or rollback() is called on the 
connection, or close is called, which executes a rollback(), using the 
connection's internal cursor.

Hopefully that all sounds kosher. 
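
In skeleton form, something like this (names and the low-level execute call are 
placeholders for the real 4D driver, not working code):

import threading

class Cursor(object):
    def __init__(self, connection):
        self.connection = connection

    def execute(self, sql, params=None):
        self.connection._begin_if_needed()        # lazy BEGIN, as described above
        return self._raw_execute(sql, params)

    def _raw_execute(self, sql, params=None):
        raise NotImplementedError("placeholder for the real driver call")


class Connection(object):
    def __init__(self, low_level_conn):
        self._conn = low_level_conn               # the wrapped C-library connection
        self._in_transaction = False
        self._lock = threading.Lock()             # guards the flag across cursors
        self._control_cursor = Cursor(self)       # internal cursor for txn commands

    def cursor(self):
        return Cursor(self)

    def _begin_if_needed(self):
        with self._lock:
            if not self._in_transaction:
                self._control_cursor._raw_execute("START TRANSACTION")
                self._in_transaction = True

    def _end(self, statement):
        with self._lock:
            if self._in_transaction:
                self._control_cursor._raw_execute(statement)
                self._in_transaction = False

    def commit(self):
        self._end("COMMIT")

    def rollback(self):
        self._end("ROLLBACK")

    def close(self):
        self.rollback()                           # implicit ROLLBACK on close
        self._conn.close()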

> 
> HTH
> 
> Frank Millman
> 
> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Handling transactions in Python DBI module

2016-02-10 Thread Israel Brewster
I am working on implementing a Python DB API module, and am hoping I can get 
some help with figuring out the workflow of handling transactions. In my 
experience (primarily with psycopg2) the workflow goes like this:

- When you open a connection (or is it when you get a cursor? I *think* it is 
on opening a connection), a new transaction is started
- When you close a connection, an implicit ROLLBACK is performed
- After issuing SQL statements that modify the database, you call commit() on 
the CONNECTION object, not the cursor.

My primary confusion is that at least for the DB I am working on, to 
start/rollback/commit a transaction, you execute the appropriate SQL statement 
(the c library I'm using doesn't have any transactional commands, not that it 
should). However, to execute the statement, you need a cursor. So how is this 
*typically* handled? Does the connection object keep an internal cursor that it 
uses to manage transactions?

I'm assuming, since it is called on the connection, not the cursor, that any 
COMMIT/ROLLBACK commands called affect all cursors on that connection. Is that 
correct? Or is this DB specific?

Finally, how do other DB API modules, like psycopg2, ensure that ROLLBACK is 
called if the user never explicitly calls close()?

Thanks for any assistance that can be provided.
---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---




-- 
https://mail.python.org/mailman/listinfo/python-list


Designing DBI compliant SQL parameters for module

2015-11-23 Thread Israel Brewster
My company uses a database (4th dimension) for which there was no python DBI 
compliant driver available (I had to use ODBC, which I felt was cludgy). 
However, I did discover that the company had a C driver available, so I went 
ahead and used CFFI to wrap this driver into a DBI compliant python module 
(https://pypi.python.org/pypi/p4d). This works well (still need to make it 
python 3.x compatible), but since the underlying C library uses "qmark" style 
parameter markers, that's all I implemented in my module.

I would like to expand the module to be able to use the more-common (or at 
least easier for me) "format" and "pyformat" parameter markers, as indicated in 
the footnote to PEP-249 (https://www.python.org/dev/peps/pep-0249/#id2 at least 
for the pyformat markers). Now I am fairly confidant that I can write code to 
convert such placeholders into the qmark style markers that the underlying 
library provides, but before I go and re-invent the wheel, is there already 
code that does this which I can simply use, or modify?
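
In case nothing turnkey exists, here is a rough regex-based sketch of the 
conversion I have in mind - it only handles plain %s and %(name)s, and would 
trip over placeholders inside string literals or %% escapes:

import re

_PYFORMAT = re.compile(r'%\((\w+)\)s|%s')

def to_qmark(operation, parameters):
    """Rewrite 'format'/'pyformat' placeholders as qmark style.

    Returns the rewritten SQL plus a flat list of parameters in the order
    the ? markers appear. 'parameters' is a dict for %(name)s placeholders
    or a sequence for plain %s placeholders.
    """
    ordered = []

    def _replace(match):
        name = match.group(1)
        if name is not None:                  # %(name)s -> look up in the dict
            ordered.append(parameters[name])
        else:                                 # %s -> take the next positional value
            ordered.append(parameters[len(ordered)])
        return '?'

    return _PYFORMAT.sub(_replace, operation), ordered

# to_qmark("SELECT * FROM t WHERE a=%(a)s AND b=%(b)s", {'a': 1, 'b': 2})
# -> ('SELECT * FROM t WHERE a=? AND b=?', [1, 2])
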
-------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---




-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Bi-directional sub-process communication

2015-11-23 Thread Israel Brewster
On Nov 23, 2015, at 12:45 PM, Cameron Simpson <c...@zip.com.au> wrote:
> 
> On 23Nov2015 12:22, Israel Brewster <isr...@ravnalaska.net> wrote:
>> On Nov 23, 2015, at 11:51 AM, Ian Kelly <ian.g.ke...@gmail.com> wrote:
>>> Concurrency, ugh.
> 
> I'm a big concurrency fan myself.
> 
>>> It's probably better just to have a Condition/Event per thread and
>>> have the response thread identify the correct one to notify, rather
>>> than just notify a single shared Condition and hope the threads wake
>>> up in the right order.
>> 
>> Tell me about it :-) I've actually never worked with conditions or 
>> notifications (actually even this bi-drectional type of communication is new 
>> to me), so I'll have to look into that and figure it out. Thanks for the 
>> information!
> 
> I include a tag with every request, and have the responses include the tag; 
> the request submission function records the response hander in a mapping by 
> tag and the response handing thread looks up the mapping and passes the 
> response to the right handler.
> 
> Works just fine and avoids all the worrying about ordering etc.
> 
> Israel, do you have control over the protocol between you and your 
> subprocess?  If so, adding tags is easy and effective.

I do, and the basic concept makes sense. The one difficulty I am seeing is 
getting back to the thread that requested the data.  Let me know if this makes 
sense or if I am thinking about it wrong:

- When a thread requests some data, it sends the request as a dictionary 
containing a tag (unique to the thread) as well as the request
- When the child processes the request, it encodes the response as a dictionary 
containing the tag and the response data
- A single, separate thread on the "master" side parses out responses as they 
come in and puts them into a dictionary keyed by tag
- The requesting threads, after putting the request into the Queue, would then 
block waiting for data to appear under their key in the dictionary

Of course, that last step could be interesting - implementing the block in such 
a way as to not tie up the processor, while still getting the data "as soon" as 
it is available. Unless there is some sort of built-in notification system I 
could use for that? I.e. the thread would "subscribe" to a notification based 
on its tag, and then wait for notification. When the master processing thread 
receives data with said tag, it adds it to the dictionary and "publishes" a 
notification to that tag. Or perhaps the notification itself could contain the 
payload? 
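
To make that concrete, a minimal sketch of the master side using one 
threading.Event per outstanding request (hypothetical names; the child is 
assumed to echo the tag back with its reply):

import threading
import uuid

_pending = {}                 # tag -> (event, single-item list for the reply)
_pending_lock = threading.Lock()

def request(child_queue, payload, timeout=3):
    """Called from a request thread: send a tagged request, block for its reply."""
    tag = uuid.uuid4().hex
    event, slot = threading.Event(), []
    with _pending_lock:
        _pending[tag] = (event, slot)
    child_queue.put({'tag': tag, 'request': payload})
    try:
        if not event.wait(timeout):
            raise RuntimeError('timed out waiting for the child process')
        return slot[0]
    finally:
        with _pending_lock:
            _pending.pop(tag, None)   # also discards late replies after a timeout

def response_listener(master_queue):
    """Single thread: route each tagged reply to whichever thread is waiting on it."""
    while True:
        msg = master_queue.get()      # e.g. {'tag': ..., 'response': ...}
        with _pending_lock:
            waiter = _pending.get(msg['tag'])
        if waiter is not None:
            event, slot = waiter
            slot.append(msg['response'])
            event.set()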

Thanks for the information!

> 
> Cheers,
> Cameron Simpson <c...@zip.com.au>

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Bi-directional sub-process communication

2015-11-23 Thread Israel Brewster
On Nov 23, 2015, at 1:43 PM, Chris Kaynor <ckay...@zindagigames.com> wrote:
> 
> On Mon, Nov 23, 2015 at 2:18 PM, Israel Brewster <isr...@ravnalaska.net>
> wrote:
> 
>> Of course, that last step could be interesting - implementing the block in
>> such a way as to not tie up the processor, while still getting the data "as
>> soon" as it is available. Unless there is some sort of built-in
>> notification system I could use for that? I.e. the thread would "subscribe"
>> to a notification based on its tag, and then wait for notification. When
>> the master processing thread receives data with said tag, it adds it to the
>> dictionary and "publishes" a notification to that tag. Or perhaps the
>> notification itself could contain the payload?
> 
> 
> There are a few ways I could see handling this, without having the threads
> spinning and consuming CPU:
> 
>   1. Don't worry about having the follow-up code run in the same thread,
>   and use a simple callback. This callback could be dispatched to a thread
>   via a work queue, however you may not get the same thread as the one that
>   made the request. This is probably the most efficient method to use, as the
>   threads can continue doing other work while waiting for a reply, rather
>   than blocking. It does make it harder to maintain state between the pre-
>   and post-request functions, however.
>   2. Have a single, global, event variable that wakes all threads waiting
>   on a reply, each of which then checks to see if the reply is for it, or
>   goes back to sleep. This is good if most of the time, only a few threads
>   will be waiting for a reply, and checking if the correct reply came in is
>   cheap. This is probably good enough, unless you have a LOT of threads
>   (hundreds).
>   3. Have an event per thread. This will use less CPU than the second
>   option, however does require more memory and OS resources, and so will not
>   be viable for huge numbers of threads, though if you hit the limit, you are
>   probably using threads wrong.
>   4. Have an event per request. This is only better than #3 if a single
>   thread may make multiple requests at once, and can do useful work when any
>   of them get a reply back (if they need all, it will make no difference).
> 
> Generally, I would use option #1 or #2. Option 2 has the advantage of
> making it easy to write the functions that use the functionality, while
> option 1 will generally use fewer resources, and allows threads to continue
> to be used while waiting for replies. How much of a benefit that is depends
> on exactly what you are doing.

While I would agree with #1 in general, the threads, in this case, are CherryPy 
threads, so I need to get the data and return it to the client in the same 
function call, which of course means the thread needs to block until the data 
is ready - it can't return and let the result be processed "later".

Essentially there are times that the web client needs some information that 
only the Child process has. So the web client requests the data from the master 
process, and the master process then turns around and requests the data from 
the child, but it needs to get the data back before it can return it to the web 
client. So it has to block waiting for the data.

Thus we come to option #2 (or 3), which sounds good but I have no clue how to 
implement :-) Maybe something like http://pubsub.sourceforge.net ? I'll dig 
into that.

> 
> Option #4 would probably be better implemented using option #1 in all cases
> to avoid problems with running out of OS memory - threading features
> generally require more limited OS resources than memory. Option #3 will
> also often run into the same issues as option #4 in the cases it will
> provide any benefit over option #2.
> 
> Chris
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Bi-directional sub-process communication

2015-11-23 Thread Israel Brewster
On Nov 23, 2015, at 3:05 PM, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:
> 
> On Mon, 23 Nov 2015 08:54:38 -0900, Israel Brewster <isr...@ravnalaska.net>
> declaimed the following:
> 
>> Concern: Since the master process is multi-threaded, it seems likely enough 
>> that multiple threads on the master side would make requests at the same 
>> time. I understand that the Queue class has locks that make
> 
>   Multiple "master" threads, to me, means you do NOT have a "master
> process".

But I do: the CherryPy "application", which has multiple threads - one per 
request (and perhaps a few more) to be exact. It's these request threads that 
generate the calls to the child process.

> 
>   Let there be a Queue for EVERY LISTENER.
> 
>   Send the Queue as part of the request packet.

No luck: "RuntimeError: Queue objects should only be shared between processes 
through inheritance"

This IS a master process, with multiple threads, trying to communicate with a 
child process. That said, with some modifications this sort of approach could 
still work.

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---


> 
>   Let the subthread reply to the queue that was provided via the packet
> 
>   Voila! No intermixing of "master/slave" interaction; each slave only
> replies to the master that sent it a command; each master only receives
> replies from slaves it has commanded. Slaves can still be shared, as they
> are given the information of which master they need to speak with.
> 
>   
> 
> -- 
>   Wulfraed Dennis Lee Bieber AF6VN
>wlfr...@ix.netcom.comHTTP://wlfraed.home.netcom.com/
> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Bi-directional sub-process communication

2015-11-23 Thread Israel Brewster
I have a multi-threaded python app (CherryPy WebApp to be exact) that launches 
a child process that it then needs to communicate with bi-driectionally. To 
implement this, I have used a pair of Queues: a child_queue which I use for 
master->child communication, and a master_queue which is used for child->master 
communication.

The way I have the system set up, the child process runs a loop in a thread that 
waits for messages on child_queue, and when received responds appropriately 
depending on the message received, which sometimes involves posting a message 
to master_queue.

On the master side, when it needs to communicate with the child process, it 
posts a message to child_queue, and if the request requires a response it will 
then immediately start waiting for a message on master_queue, typically with a 
timeout.

While this process works well in testing, I do have one concern (maybe 
unfounded) and a real-world issue:

Concern: Since the master process is multi-threaded, it seems likely enough 
that multiple threads on the master side would make requests at the same time. 
I understand that the Queue class has locks that make this fine (one thread 
will complete posting the message before the next is allowed to start), and 
since the child process only has a single thread processing messages from the 
queue, it should process them in order and post the responses (if any) to the 
master_queue in order. But now I have multiple master processes all trying to 
read master_queue at the same time. Again, the locks will take care of this and 
prevent any overlapping reads, but am I guaranteed that the threads will obtain 
the lock and therefore read the responses in the right order? Or is there a 
possibility that, say, thread three will get the response that should have been 
for thread one? Is this something I need to take into consideration, and if so, 
how?

Real-world problem: While as I said this system worked well in testing, Now 
that I have gotten it out into production I've occasionally run into a problem 
where the master thread waiting for a response on master_queue times out while 
waiting. This causes a (potentially) two-fold problem, in that first off the 
master process doesn't get the information it had requested, and secondly that 
I *could* end up with an "orphaned" message on the queue that could cause 
problems the next time I try to read something from it.

I currently have the timeout set to 3 seconds. I can, of course, increase that, 
but that could lead to a bad user experience - and might not even help the 
situation if something else is going on. The actual exchange is quite simple:

On the master side, I have this code:

config.socket_queue.put('GET_PORT')
try:
    # wait up to three seconds for a response
    port = config.master_queue.get(timeout=3)
except Empty:
    port = 5000  # default. Can't hurt to try.

Which, as you might have been able to guess, tries to ask the child process (an 
instance of a tornado server, btw) what port it is listening on. The child 
process then, on getting this message from the queue, runs the following code:

elif item == 'GET_PORT':
  port = utils.config.getint('global', 'tornado.port')
  master_queue.put(port)

So nothing that should take any significant time. Of course, since this is a 
single thread handling any number of requests, it is possible that the thread 
is tied up responding to a different request (or that the GIL is preventing the 
thread from running at all, since another thread might be commandeering the 
processor), but I find it hard to believe that it could be tied up for more 
than three seconds.

So is there a better way to do sub-process bi-directional communication that 
would avoid these issues? Or do I just need to increase the timeout (or remove 
it altogether, at the risk of potentially causing the thread to hang if no 
message is posted)? And is my concern justified, or just paranoid? Thanks for 
any information that can be provided!

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---




-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Bi-directional sub-process communication

2015-11-23 Thread Israel Brewster
On Nov 23, 2015, at 11:51 AM, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> 
> On Mon, Nov 23, 2015 at 12:55 PM, Ian Kelly <ian.g.ke...@gmail.com> wrote:
>> On Mon, Nov 23, 2015 at 10:54 AM, Israel Brewster <isr...@ravnalaska.net> 
>> wrote:
>>> Concern: Since the master process is multi-threaded, it seems likely enough 
>>> that multiple threads on the master side would make requests at the same 
>>> time. I understand that the Queue class has locks that make this fine (one 
>>> thread will complete posting the message before the next is allowed to 
>>> start), and since the child process only has a single thread processing 
>>> messages from the queue, it should process them in order and post the 
>>> responses (if any) to the master_queue in order. But now I have multiple 
>>> master processes all trying to read master_queue at the same time. Again, 
>>> the locks will take care of this and prevent any overlapping reads, but am 
>>> I guaranteed that the threads will obtain the lock and therefore read the 
>>> responses in the right order? Or is there a possibility that, say, thread 
>>> three will get the response that should have been for thread one? Is this 
>>> something I need to take into consideration, and if so, how?
>> 
>> Yes, if multiple master threads are waiting on the queue, it's
>> possible that a master thread could get a response that was not
>> intended for it. As far as I know there's no guarantee that the
>> waiting threads will be woken up in the order that they called get(),
>> but even if there are, consider this case:
>> 
>> Thread A enqueues a request.
>> Thread B preempts A and enqueues a request.
>> Thread B calls get on the response queue.
>> Thread A calls get on the response queue.
>> The response from A's request arrives and is given to B.
>> 
>> Instead of having the master threads pull objects off the response
>> queue directly, you might create another thread whose sole purpose is
>> to handle the response queue. That could look like this:
>> 
>> 
>> request_condition = threading.Condition()
>> response_global = None
>> 
>> def master_thread():
>>     global response_global
>>     with request_condition:
>>         request_queue.put(request)
>>         request_condition.wait()
>>         # Note: the Condition should remain acquired until
>>         # response_global is reset.
>>         response = response_global
>>         response_global = None
>>     if wrong_response(response):
>>         raise RuntimeError("got a response for the wrong request")
>>     handle_response(response)
>> 
>> def response_thread():
>>     global response_global
>>     while True:
>>         response = response_queue.get()
>>         with request_condition:
>>             response_global = response
>>             request_condition.notify()
> 
> Actually I realized that this fails because if two threads get
> notified at about the same time, they could reacquire the Condition in
> the wrong order and so get the wrong responses.
> 
> Concurrency, ugh.
> 
> It's probably better just to have a Condition/Event per thread and
> have the response thread identify the correct one to notify, rather
> than just notify a single shared Condition and hope the threads wake
> up in the right order.

Tell me about it :-) I've actually never worked with conditions or 
notifications (actually even this bi-directional type of communication is new to 
me), so I'll have to look into that and figure it out. Thanks for the 
information!
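
Just so I'm sure I follow the per-thread idea, here is the rough, untested
sketch I came up with while reading your reply. All the names (pending,
send_request, and so on) are mine, and it assumes the child process echoes the
request id back along with each response:

import itertools
import threading
import multiprocessing

request_queue = multiprocessing.Queue()    # master -> child
response_queue = multiprocessing.Queue()   # child -> master

pending = {}                 # request id -> (Event, one-slot list for the response)
pending_lock = threading.Lock()
request_ids = itertools.count()

def send_request(payload, timeout=3):
    # Called from any of the master's worker threads.
    req_id = next(request_ids)
    event = threading.Event()
    slot = [None]
    with pending_lock:
        pending[req_id] = (event, slot)
    request_queue.put((req_id, payload))   # child must echo req_id back
    try:
        if not event.wait(timeout):
            raise RuntimeError("timed out waiting for child response")
        return slot[0]
    finally:
        with pending_lock:
            pending.pop(req_id, None)

def response_dispatcher():
    # Single dedicated thread: routes each response to the thread that asked.
    while True:
        req_id, response = response_queue.get()
        with pending_lock:
            entry = pending.get(req_id)
        if entry is not None:
            event, slot = entry
            slot[0] = response
            event.set()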

> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: CherryPy cpstats and ws4py

2015-11-04 Thread Israel Brewster
Ok, let me ask a different question: the impression I have gotten when trying 
to find help with CherryPy in general and ws4py specifically is that these 
frameworks are not widely used or well supported. Is that a fair assessment, or 
do I just have issues that are outside the realm of experience for other users? 
If it is a fair assessment, should I be looking at a different product for my 
next project? I know there are a number of options, CherryPy was simply the 
first one suggested to me, and ws4py is what is listed in their docs as the 
framework to use for Web Sockets.

Thanks for any feedback that can be provided.
---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---


 

> On Nov 3, 2015, at 8:05 AM, Israel Brewster <isr...@ravnalaska.net> wrote:
> 
> I posted this to the CherryPy and ws4py mailing lists, but in the week since 
> I did that I've only gotten two or three views on each list, and no 
> responses, so as a last-ditch effort I thought I'd post here. Maybe someone 
> with more general python knowledge than me can figure out the traceback and 
> from there a solution.
> 
> Is it possible to use ws4py in conjunction with the cpstats CherryPy tool? I 
> have a CherryPy (3.8.0) web app that uses web sockets via ws4py. Tested and 
> working. I am now trying to get a little more visibility into the functioning 
> of the server, so to that end I enabled the cpstats tool by adding the 
> following line to my '/' configuration:
> 
> tools.cpstats.on=True
> 
> Unfortunately, as soon as I do that, attempts to connect a web socket start 
> failing with the following traceback:
> 
> [28/Oct/2015:08:18:48]  
> Traceback (most recent call last):
>  File 
> "/Library/Python/2.7/site-packages/CherryPy-3.8.0-py2.7.egg/cherrypy/_cprequest.py",
>  line 104, in run
>hook()
>  File 
> "/Library/Python/2.7/site-packages/CherryPy-3.8.0-py2.7.egg/cherrypy/_cprequest.py",
>  line 63, in __call__
>return self.callback(**self.kwargs)
>  File "build/bdist.macosx-10.10-intel/egg/ws4py/server/cherrypyserver.py", 
> line 200, in upgrade
>ws_conn = get_connection(request.rfile.rfile)
>  File "build/bdist.macosx-10.10-intel/egg/ws4py/compat.py", line 43, in 
> get_connection
>return fileobj._sock
> AttributeError: 'KnownLengthRFile' object has no attribute '_sock'
> [28/Oct/2015:08:18:48] HTTP 
> Request Headers:
>  PRAGMA: no-cache
>  COOKIE: autoTabEnabled=true; fleetStatusFilterCompany=7H; 
> fleetStatusFilterLocation=ALL; fleetStatusRefreshInterval=5; inputNumLegs=5; 
> session_id=5c8303896aff419c175c79dfadbfdc9d75e6c45a
>  UPGRADE: websocket
>  HOST: flbubble.ravnalaska.net:8088
>  ORIGIN: http://flbubble.ravnalaska.net
>  CONNECTION: Upgrade
>  CACHE-CONTROL: no-cache
>  SEC-WEBSOCKET-VERSION: 13
>  SEC-WEBSOCKET-EXTENSIONS: x-webkit-deflate-frame
>  USER-AGENT: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) 
> AppleWebKit/601.2.7 (KHTML, like Gecko) Version/9.0.1 Safari/601.2.7
>  SEC-WEBSOCKET-KEY: Szh6Uoe+WzqKR1DgW8JcXA==
>  Remote-Addr: 10.9.1.59
> [28/Oct/2015:08:18:48] HTTP 
> Traceback (most recent call last):
>  File 
> "/Library/Python/2.7/site-packages/CherryPy-3.8.0-py2.7.egg/cherrypy/_cprequest.py",
>  line 661, in respond
>self.hooks.run('before_request_body')
>  File 
> "/Library/Python/2.7/site-packages/CherryPy-3.8.0-py2.7.egg/cherrypy/_cprequest.py",
>  line 114, in run
>raise exc
> AttributeError: 'KnownLengthRFile' object has no attribute '_sock'
> 
> Disable tools.cpstats.on, and the sockets start working again. Is there some 
> way I can fix this so I can use sockets as well as gather stats from my 
> application? Thanks.
> 
> ---
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> ---
> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


CherryPy cpstats and ws4py

2015-11-03 Thread Israel Brewster
I posted this to the CherryPy and ws4py mailing lists, but in the week since I 
did that I've only gotten two or three views on each list, and no responses, so 
as a last-ditch effort I thought I'd post here. Maybe someone with more general 
python knowledge than me can figure out the traceback and from there a solution.

Is it possible to use ws4py in conjunction with the cpstats CherryPy tool? I 
have a CherryPy (3.8.0) web app that uses web sockets via ws4py. Tested and 
working. I am now trying to get a little more visibility into the functioning 
of the server, so to that end I enabled the cpstats tool by adding the 
following line to my '/' configuration:

tools.cpstats.on=True

Unfortunately, as soon as I do that, attempts to connect a web socket start 
failing with the following traceback:

[28/Oct/2015:08:18:48]  
Traceback (most recent call last):
  File 
"/Library/Python/2.7/site-packages/CherryPy-3.8.0-py2.7.egg/cherrypy/_cprequest.py",
 line 104, in run
hook()
  File 
"/Library/Python/2.7/site-packages/CherryPy-3.8.0-py2.7.egg/cherrypy/_cprequest.py",
 line 63, in __call__
return self.callback(**self.kwargs)
  File "build/bdist.macosx-10.10-intel/egg/ws4py/server/cherrypyserver.py", 
line 200, in upgrade
ws_conn = get_connection(request.rfile.rfile)
  File "build/bdist.macosx-10.10-intel/egg/ws4py/compat.py", line 43, in 
get_connection
return fileobj._sock
AttributeError: 'KnownLengthRFile' object has no attribute '_sock'
[28/Oct/2015:08:18:48] HTTP 
Request Headers:
  PRAGMA: no-cache
  COOKIE: autoTabEnabled=true; fleetStatusFilterCompany=7H; 
fleetStatusFilterLocation=ALL; fleetStatusRefreshInterval=5; inputNumLegs=5; 
session_id=5c8303896aff419c175c79dfadbfdc9d75e6c45a
  UPGRADE: websocket
  HOST: flbubble.ravnalaska.net:8088
  ORIGIN: http://flbubble.ravnalaska.net
  CONNECTION: Upgrade
  CACHE-CONTROL: no-cache
  SEC-WEBSOCKET-VERSION: 13
  SEC-WEBSOCKET-EXTENSIONS: x-webkit-deflate-frame
  USER-AGENT: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) 
AppleWebKit/601.2.7 (KHTML, like Gecko) Version/9.0.1 Safari/601.2.7
  SEC-WEBSOCKET-KEY: Szh6Uoe+WzqKR1DgW8JcXA==
  Remote-Addr: 10.9.1.59
[28/Oct/2015:08:18:48] HTTP 
Traceback (most recent call last):
  File 
"/Library/Python/2.7/site-packages/CherryPy-3.8.0-py2.7.egg/cherrypy/_cprequest.py",
 line 661, in respond
self.hooks.run('before_request_body')
  File 
"/Library/Python/2.7/site-packages/CherryPy-3.8.0-py2.7.egg/cherrypy/_cprequest.py",
 line 114, in run
raise exc
AttributeError: 'KnownLengthRFile' object has no attribute '_sock'

Disable tools.cpstats.on, and the sockets start working again. Is there some 
way I can fix this so I can use sockets as well as gather stats from my 
application? Thanks.
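
One thing I plan to try, though I haven't been able to test it yet, is leaving
cpstats on everywhere except the websocket path via per-path config - roughly
like the sketch below. The '/ws' path, the Root class, and the plain WebSocket
handler are placeholders standing in for my actual setup:

import cherrypy
from ws4py.server.cherrypyserver import WebSocketPlugin, WebSocketTool
from ws4py.websocket import WebSocket

WebSocketPlugin(cherrypy.engine).subscribe()
cherrypy.tools.websocket = WebSocketTool()

class Root(object):
    @cherrypy.expose
    def index(self):
        return "hello"

    @cherrypy.expose
    def ws(self):
        # The websocket tool handles the actual upgrade before this runs.
        pass

cherrypy.quickstart(Root(), '/', {
    '/': {
        'tools.cpstats.on': True,
    },
    '/ws': {
        # Keep stats on everywhere else, but off for the upgrade requests.
        'tools.cpstats.on': False,
        'tools.websocket.on': True,
        'tools.websocket.handler_cls': WebSocket,
    },
})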

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Best way to do background calculations?

2015-10-26 Thread Israel Brewster
On Oct 25, 2015, at 4:05 PM, MRAB <pyt...@mrabarnett.plus.com> wrote:
> 
> On 2015-10-23 17:35, Israel Brewster wrote:
>> tl;dr: I've been using the multiprocessing module to run some
>> calculations in the background of my CherryPy web app, but apparently
>> this process sometimes gets stuck, causing problems with open sockets
>> piling up and blocking the app. Is there a better way?
>> 
>> The (rather wordy) details:
>> 
>> I have a moderately busy web app written in python using the CherryPy
>> framework (CherryPy v 3.8.0 with ws4py v 0.3.4 and Python 2.7.6) . One
>> of the primary purposes of this web app is to track user-entered flight
>> logs, and keep a running tally of hours/cycles/landings for each
>> aircraft. To that end, whenever a user enters or modifies a log, I
>> "recalculate" the totals for that aircraft, and update all records with
>> the new totals. There are probably ways to optimize this process, but so
>> far I haven't seen a need to spend the time.
>> 
>> Ideally, this recalculation process would happen in the background.
>> There is no need for the user to wait around while the system crunches
>> numbers - they should be able to move on with entering another log or
>> whatever else they need to do. To that end, I implemented the call to
>> the recalc function using the multiprocessing module, so it could start
>> in the background and the main process move on.
>> 
>> Lately, though, I've been running into a problem where, when looking at
>> the process list on my server (Mac OS X 10.10.5), I'll see two or more
>> "copies" of my server process running - one master and one or more child
>> processes. As the above described process is the only place I am using
>> the multiprocessing module, I am making the assumption that this is what
>> these additional processes are. If they were only there for a few
>> minutes I would think this is normal, and it wouldn't be a problem.
>> 
>> However, what I am seeing is that from time to time (once or twice every
>> couple of days) these additional processes will get "stuck", and when
>> that happens sockets opened by the web app don't get properly closed and
>> start piling up. Looking at a list of open sockets on the server when I
>> have one of these "hung" processes shows a steadily increasing number of
>> sockets in a "CLOSE_WAIT" state (normally I see none in that state).
>> Killing off the hung process(es) clears out these sockets, but if I
>> don't catch it quickly enough these sockets can build up to the point
>> that I am unable to open any more, and the server starts rejecting
>> connections.
>> 
>> I'm told this happens because the process retains a reference to all
>> open files/sockets from the parent process, thus preventing the sockets
>> from closing until the process terminates. Regardless of the reason, it
>> can cause a loss of service if I don't catch it quickly enough. As such,
>> I'm wondering if there is a better way. Should I be looking at using the
>> threading library rather than the multiprocessing library? My
>> understanding is that the GIL would prevent that approach from being of
>> any real benefit for a calculation intensive type task, but maybe since
>> the rest of the application is CherryPy threads, it would still work
>> well?. Or perhaps there is a way to not give the child process any
>> references to the parent's files/sockets - although that may not help
>> with the process hanging? Maybe there is a way to "monitor" the process,
>> and automatically kill it if it stops responding? Or am I totally
>> barking up the wrong tree here?
>> 
> It sounds like the multiprocessing module is forking the new process,
> which inherits the handles.
> 
> Python 3.4 added the ability to spawn the new process, which won't inherit 
> the handles.

Well, that might be a reason to look at moving to 3 then. It's been on my to-do 
list :-)

> 
> It's unfortunate that you're using Python 2.7.6!
> 
> Could you start the background process early, before any of those
> sockets have been opened, and then communicate with it via queues?

Possibly. Simply have the process always running, and tell it to kick off 
calculations as needed via queues. It's worth investigating for sure.
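
Something like this rough, untested sketch is what I'm picturing
(recalc_totals here is just a stand-in for my real recalculation code):

import multiprocessing

def recalc_totals(aircraft_id):
    # Placeholder for the real hours/cycles/landings recalculation.
    pass

def calc_worker(task_queue):
    # Runs in a process started before CherryPy opens any sockets,
    # so it never inherits them.
    while True:
        aircraft_id = task_queue.get()
        if aircraft_id is None:      # sentinel used to shut the worker down
            break
        recalc_totals(aircraft_id)

task_queue = multiprocessing.Queue()
worker = multiprocessing.Process(target=calc_worker, args=(task_queue,))
worker.daemon = True
worker.start()                       # before cherrypy.engine.start()

# Then any request handler just does:
#     task_queue.put(aircraft_id)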

> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Best way to do background calculations?

2015-10-26 Thread Israel Brewster
On Oct 25, 2015, at 3:40 PM, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:
> 
> On Fri, 23 Oct 2015 08:35:06 -0800, Israel Brewster <isr...@ravnalaska.net>
> declaimed the following:
> 
>> tl;dr: I've been using the multiprocessing module to run some calculations 
>> in the background of my CherryPy web app, but apparently this process 
>> sometimes gets stuck, causing problems with open sockets piling up and 
>> blocking the app. Is there a better way?
>> 
>> The (rather wordy) details:
>> 
>   The less wordy first impression...
> 
>> I have a moderately busy web app written in python using the CherryPy 
>> framework (CherryPy v 3.8.0 with ws4py v 0.3.4 and Python 2.7.6) . One of 
>> the primary purposes of this web app is to track user-entered flight logs, 
>> and keep a running tally of hours/cycles/landings for each aircraft. To that 
>> end, whenever a user enters or modifies a log, I "recalculate" the totals 
>> for that aircraft, and update all records with the new totals. There are 
>> probably ways to optimize this process, but so far I haven't seen a need to 
>> spend the time.
>> 
>   Off-hand -- this sounds like something that should be in a database...
> Unless your calculations are really nasty, rather than just aggregates, a
> database engine should be able to apply them in SQL queries or stored
> procedures.

Sounds like a potentially valid approach. Would require some significant 
re-tooling, but could work. I'll look into it.
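Just to make sure I follow what that might look like, something along these
lines? (Table and column names are invented, and sqlite3 is only for
illustration - the real schema lives elsewhere.)

import sqlite3

aircraft_id = 42                     # whichever aircraft's log was just edited
conn = sqlite3.connect("flightlogs.db")
with conn:                           # commits on success
    conn.execute("""
        UPDATE aircraft
           SET total_hours    = (SELECT COALESCE(SUM(hours), 0)
                                   FROM flight_logs
                                  WHERE flight_logs.aircraft_id = aircraft.id),
               total_cycles   = (SELECT COALESCE(SUM(cycles), 0)
                                   FROM flight_logs
                                  WHERE flight_logs.aircraft_id = aircraft.id),
               total_landings = (SELECT COALESCE(SUM(landings), 0)
                                   FROM flight_logs
                                  WHERE flight_logs.aircraft_id = aircraft.id)
         WHERE id = ?""", (aircraft_id,))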

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---


> -- 
>   Wulfraed Dennis Lee Bieber AF6VN
>wlfr...@ix.netcom.comHTTP://wlfraed.home.netcom.com/
> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Best way to do background calculations?

2015-10-26 Thread Israel Brewster
On Oct 25, 2015, at 6:48 PM, Chris Angelico <ros...@gmail.com> wrote:
> 
> On Sat, Oct 24, 2015 at 3:35 AM, Israel Brewster <isr...@ravnalaska.net> 
> wrote:
>> 
>> Ideally, this recalculation process would happen in the background. There is
>> no need for the user to wait around while the system crunches numbers - they
>> should be able to move on with entering another log or whatever else they
>> need to do. To that end, I implemented the call to the recalc function using
>> the multiprocessing module, so it could start in the background and the main
>> process move on.
> 
> One way to get around this would be to separate the processes
> completely, and simply alert the other process (maybe via a socket) to
> ask it to do the recalculation. That way, the background process would
> never have any of the main process's sockets, and can't affect them in
> any way.

Sounds similar to MRAB's suggestion of starting the process before any sockets 
have been opened. Certainly worth investigating, and I think it should be 
doable. Thanks!
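
For my own reference, the notification side of that could be as small as the
sketch below - the host, port, and one-line protocol are all made up:

import socket

def request_recalc(aircraft_id, host="127.0.0.1", port=9999):
    # Fire-and-forget note to a separately launched worker process,
    # which owns none of the web app's sockets.
    conn = socket.create_connection((host, port), timeout=1)
    try:
        conn.sendall("recalc %d\n" % aircraft_id)
    finally:
        conn.close()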

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---


> 
> ChrisA
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Best way to do background calculations?

2015-10-25 Thread Israel Brewster
tl;dr: I've been using the multiprocessing module to run some calculations in 
the background of my CherryPy web app, but apparently this process sometimes 
gets stuck, causing problems with open sockets piling up and blocking the app. 
Is there a better way?

The (rather wordy) details:

I have a moderately busy web app written in python using the CherryPy 
framework (CherryPy v 3.8.0 with ws4py v 0.3.4 and Python 2.7.6). One of the 
primary purposes of this web app is to track user-entered flight logs, and 
keep a running tally of hours/cycles/landings for each aircraft. To that end, 
whenever a user enters or modifies a log, I "recalculate" the totals for that 
aircraft, and update all records with the new totals. There are probably ways 
to optimize this process, but so far I haven't seen a need to spend the time.

Ideally, this recalculation process would happen in the background. There is 
no need for the user to wait around while the system crunches numbers - they 
should be able to move on with entering another log or whatever else they 
need to do. To that end, I implemented the call to the recalc function using 
the multiprocessing module, so it could start in the background and the main 
process move on.

Lately, though, I've been running into a problem where, when looking at the 
process list on my server (Mac OS X 10.10.5), I'll see two or more "copies" 
of my server process running - one master and one or more child processes. As 
the above described process is the only place I am using the multiprocessing 
module, I am making the assumption that this is what these additional 
processes are. If they were only there for a few minutes I would think this 
is normal, and it wouldn't be a problem.

However, what I am seeing is that from time to time (once or twice every 
couple of days) these additional processes will get "stuck", and when that 
happens sockets opened by the web app don't get properly closed and start 
piling up. Looking at a list of open sockets on the server when I have one of 
these "hung" processes shows a steadily increasing number of sockets in a 
"CLOSE_WAIT" state (normally I see none in that state). Killing off the hung 
process(es) clears out these sockets, but if I don't catch it quickly enough 
these sockets can build up to the point that I am unable to open any more, 
and the server starts rejecting connections.

I'm told this happens because the process retains a reference to all open 
files/sockets from the parent process, thus preventing the sockets from 
closing until the process terminates. Regardless of the reason, it can cause 
a loss of service if I don't catch it quickly enough. As such, I'm wondering 
if there is a better way. Should I be looking at using the threading library 
rather than the multiprocessing library? My understanding is that the GIL 
would prevent that approach from being of any real benefit for a calculation 
intensive type task, but maybe since the rest of the application is CherryPy 
threads, it would still work well? Or perhaps there is a way to not give the 
child process any references to the parent's files/sockets - although that 
may not help with the process hanging? Maybe there is a way to "monitor" the 
process, and automatically kill it if it stops responding? Or am I totally 
barking up the wrong tree here?

Thanks for any insight anyone can provide!
---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Track down SIGABRT

2015-01-13 Thread Israel Brewster
On Jan 13, 2015, at 6:27 AM, William Ray Wing w...@mac.com wrote:

 
 On Jan 9, 2015, at 12:40 PM, Israel Brewster isr...@ravnalaska.net wrote:
 
 I have a long-running python/CherryPy Web App server process that I am 
 running on Mac OS X 10.8.5. Python 2.7.2 running in 32-bit mode (for now, I 
 have the code in place to change over to 64 bit, but need to schedule the 
 downtime to do it). On the 6th of this month, during normal operation from 
 what I can tell, and after around 33 days of trouble-free uptime, the python 
 process crashed with a SIGABRT. I restarted the process, and everything 
 looked good again until yesterday, when it again crashed with a SIGABRT. The 
 crash dump the system gave me doesn't tell me much, other than that it looks 
 like python is calling some C function when it crashes. I've attached the 
 crash report, in case it can mean something more to someone else.
 
 Can anyone give me some hints as to how to track down the cause of this 
 crash? It's especially problematic since I can't mess with the live server 
 for testing, and it is quite a while between crashes, making it difficult, 
 if not impossible, to reproduce in testing. Thanks.
 ---
 Israel Brewster
 Systems Analyst II
 Ravn Alaska
 5245 Airport Industrial Rd
 Fairbanks, AK 99709
 (907) 450-7293
 ---
 
 
 Can you run the application in an IDE?

Yes - I run it through Wing during development. I don't think that would be 
such a good option for my production machine, however. If it gets really bad 
I'll consider it though - that should at least tell me where it is crashing.

 
 -Bill

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Track down SIGABRT

2015-01-13 Thread Israel Brewster
On Jan 13, 2015, at 8:26 AM, Skip Montanaro skip.montan...@gmail.com wrote:

 Assuming you have gdb available, you should be able to attach to the
 running process, then set a breakpoint in relevant functions (like
 exit() or abort()). Once there, you can pick through the C stack
 manually (kind of tedious) or use the gdbinit file which comes with
 Python to get a Python stack trace (much less tedious, once you've
 made sure any version dependencies have been eliminated). Or, with the
 latest versions of gdb (7.x I think), you get more stuff built into
 gdb itself.
 
 More details here:
 
 https://wiki.python.org/moin/DebuggingWithGdb

Thanks, I'll look into that. Hopefully running with the debugger attached won't 
slow things down too much. The main thing I think will be getting the python 
extensions installed - the instructions only talk about doing this for linux 
packages.
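
As a possibly lighter-weight first step, I may also try the faulthandler
package (my understanding is it's a backport of the Python 3.3 module to 2.x),
which should be able to dump the Python tracebacks when the fatal signal
arrives - the log path below is just an example:

import faulthandler

# Keep the file object alive for the life of the process.
crash_log = open("/var/log/webapp_faults.log", "a")
faulthandler.enable(file=crash_log, all_threads=True)   # covers SIGABRT, SIGSEGV, etc.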

 
 Skip

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Track down SIGABRT

2015-01-13 Thread Israel Brewster
On Jan 12, 2015, at 5:51 PM, Jason Friedman jsf80...@gmail.com wrote:

 I have a long-running python/CherryPy Web App server process that I am
 running on Mac OS X 10.8.5. Python 2.7.2 running in 32-bit mode (for now, I
 have the code in place to change over to 64 bit, but need to schedule the
 downtime to do it). On the 6th of this month, during normal operation from
 what I can tell, and after around 33 days of trouble-free uptime, the python
 process crashed with a SIGABRT. I restarted the process, and everything
 looked good again until yesterday, when it again crashed with a SIGABRT.
 
 Can you monitor disk and memory on the host?  Perhaps it is climbing
 towards an unacceptable value right before crashing.

Good thought. I'm pretty sure that the system monitor still showed a couple of 
gigs free memory before the last crash, but the process could still be using 
unacceptable amounts of resources.
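
One cheap thing I can do in the meantime is have the app log its own memory
high-water mark periodically - a rough sketch, with an arbitrary interval and
output:

import resource
import threading

def log_memory(interval=300):
    # ru_maxrss is reported in bytes on OS X but kilobytes on Linux.
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print("max RSS so far: %d" % rss)
    threading.Timer(interval, log_memory, kwargs={"interval": interval}).start()

log_memory()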

 
 Do you have the option of stopping and starting your process every
 night or every week?

Yes, that's an option, and as a work-around I'll consider it. Of course, I'd 
much rather not have the thing crash in the first place :-)

 -- 
 https://mail.python.org/mailman/listinfo/python-list

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---

-- 
https://mail.python.org/mailman/listinfo/python-list


Track down SIGABRT

2015-01-12 Thread Israel Brewster
I have a long-running python/CherryPy Web App server process that I am 
running on Mac OS X 10.8.5. Python 2.7.2 running in 32-bit mode (for now, I 
have the code in place to change over to 64 bit, but need to schedule the 
downtime to do it). On the 6th of this month, during normal operation from 
what I can tell, and after around 33 days of trouble-free uptime, the python 
process crashed with a SIGABRT. I restarted the process, and everything 
looked good again until yesterday, when it again crashed with a SIGABRT. The 
crash dump the system gave me doesn't tell me much, other than that it looks 
like python is calling some C function when it crashes. I've attached the 
crash report, in case it can mean something more to someone else.

Can anyone give me some hints as to how to track down the cause of this 
crash? It's especially problematic since I can't mess with the live server 
for testing, and it is quite a while between crashes, making it difficult, 
if not impossible, to reproduce in testing. Thanks.
---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---




[Attachment: Python_2015-01-08-152219_minilogger.crash (binary data)]
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Cherrypy - prevent browser prefetch?

2014-12-03 Thread Israel Brewster
Ah, I see. That makes sense. Thanks.
---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---


 

On Dec 2, 2014, at 9:17 PM, Gregory Ewing greg.ew...@canterbury.ac.nz wrote:

 Israel Brewster wrote:
 Primary because they aren’t forms, they are links. And links are, by
 definition, GET’s. That said, as I mentioned in earlier replies, if using a
 form for a simple link is the Right Way to do things like this, then I can
 change it.
 
 I'd look at it another way and say that an action with side
 effects shouldn't appear as a simple link to the user. Links
 are for requesting information; buttons are for triggering
 actions.
 
 -- 
 Greg
 -- 
 https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Cherrypy - prevent browser prefetch?

2014-12-02 Thread Israel Brewster

 On Dec 2, 2014, at 4:33 AM, random...@fastmail.us wrote:
 
 On Mon, Dec 1, 2014, at 15:28, Israel Brewster wrote:
 For example, I have a URL on my Cherrypy app that updates some local
 caches. It is accessed at http://server/admin/updatecaches So if I
 start typing http://server/a, for example, safari may auto-fill the
 dmin/updatecaches, and trigger a cache refresh on the server - even
 though I was just trying to get to the main admin page at /admin. Or, it
 might auto-fill uth/logout instead (http://server/auth/logout), and
 log me out of my session. While the former may be acceptable (after all,
 a cache update, even if not strictly needed, is at least non-harmfull),
 the latter could cause serious issues with usability. So how can cherrypy
 tell the difference between the prefetch and an actual request, and not
 respond to the prefetch?
 
 Why is your logout form - or, your update caches form, etc - a GET
 instead of a POST?

Primarily because they aren’t forms, they are links. And links are, by 
definition, GET’s. That said, as I mentioned in earlier replies, if using a 
form for a simple link is the Right Way to do things like this, then I can 
change it.

Thanks!

—
Israel Brewster

 The key problem is that a GET request is assumed by
 browser designers to not have any harmful side effects.
 -- 
 https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Cherrypy - prevent browser prefetch?

2014-12-01 Thread Israel Brewster
I don't know if this is a cherrypy specific question (although it will be 
implemented in cherrypy for sure), or more of a general http protocol 
question, but when using cherrypy to serve a web app, is there any way to 
prevent browser prefetch? I'm running into a problem, specifically from 
Safari on the Mac, where I start to type a URL, and Safari auto-fills the 
rest of a random URL matching what I started to type, and simultaneously 
sends a request for that URL to my server, occasionally causing unwanted 
effects.

For example, I have a URL on my Cherrypy app that updates some local caches. 
It is accessed at http://server/admin/updatecaches So if I start typing 
http://server/a, for example, safari may auto-fill the "dmin/updatecaches", 
and trigger a cache refresh on the server - even though I was just trying to 
get to the main admin page at /admin. Or, it might auto-fill "uth/logout" 
instead (http://server/auth/logout), and log me out of my session. While the 
former may be acceptable (after all, a cache update, even if not strictly 
needed, is at least non-harmful), the latter could cause serious issues with 
usability. So how can cherrypy tell the difference between the "prefetch" 
and an actual request, and not respond to the prefetch?
---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Cherrypy - prevent browser prefetch?

2014-12-01 Thread Israel Brewster
On Dec 1, 2014, at 12:50 PM, Ned Batchelder n...@nedbatchelder.com wrote:

 On 12/1/14 4:26 PM, Tim Chase wrote:
 On 2014-12-01 11:28, Israel Brewster wrote:
 I don't know if this is a cherrypy specific question (although it
 will be implemented in cherrypy for sure), or more of a general
 http protocol question, but when using cherrypy to serve a web app,
 is there anyway to prevent browser prefetch? I'm running to a
 problem, specifically from Safari on the Mac, where I start to type
 a URL, and Safari auto-fills the rest of a random URL matching what
 I started to type, and simultaneously sends a request for that URL
 to my server, occasionally causing unwanted effects.
 
 All this to also say that performing non-idempotent actions on a GET
 request is just begging for trouble. ;-)
 
 
 This is the key point: your web application shouldn't be doing these kinds of 
 actions in response to a GET request.  Make them POST requests, and Safari 
 won't give you any trouble.
 
 Trying to stop Safari from making the GET requests might work for Safari, but 
 then you will find another browser, or a proxy server, or an edge-caching 
 accelerator, etc, that makes the GET requests when you don't want them.
 
 The way to indicate to a browser that it shouldn't pre-fetch a URL is to make 
 it a POST request.

Ok, that makes sense. The only difficulty I have with that answer is that to 
the best of my knowledge the only way to make an HTML link do a POST is to use 
the onclick function to run JavaScript, while having the link itself point 
to nothing. It just feels a bit ugly to me, but if that's the Right Way™ to do it, 
then that's fine.

Thanks!

 
 -- 
 Ned Batchelder, http://nedbatchelder.com
 
 -- 
 https://mail.python.org/mailman/listinfo/python-list


---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Cherrypy - prevent browser prefetch?

2014-12-01 Thread Israel Brewster
On Dec 1, 2014, at 1:12 PM, Tim Chase python.l...@tim.thechases.com wrote:

 On 2014-12-01 16:50, Ned Batchelder wrote:
 On 12/1/14 4:26 PM, Tim Chase wrote:
 All this to also say that performing non-idempotent actions on a
 GET request is just begging for trouble. ;-)
 
 This is the key point: your web application shouldn't be doing
 these kinds of actions in response to a GET request.  Make them
 POST requests, and Safari won't give you any trouble.
 
 Though to be fair, based on the reading I did, Safari also pulls in
 the various JS and executes it too, meaning that merely
 (pre)viewing the page triggers any Google Analytics (or other
 analytics) code you have on that page, sending page views with a
 high bounce rate (looks like you only hit one page and never browsed
 elsewhere on the site).
 
 Additionally, if the target GET URL involves high processing load on
 the server, it might be worthwhile to put a caching proxy in front of
 it to serve (semi)stale data for any preview request rather than
 impose additional load on the server just so a preview can be updated.

Right, and there are probably some URLs in my app where this may be the case - 
I still need to go back and audit the code now that I'm aware of this going on. 
In general, though, it does sound as though changing things to POST requests, 
and disallowing GET requests for those URLs in my CherryPy app is the way to go.
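For the CherryPy side, I believe the built-in allow tool is all I need - a
quick sketch, where the handler and helper names are just placeholders:

import cherrypy

def refresh_caches():
    # Placeholder for the real cache-refresh code.
    pass

class Admin(object):
    @cherrypy.expose
    @cherrypy.tools.allow(methods=['POST'])   # anything else gets a 405
    def updatecaches(self):
        refresh_caches()
        return "caches updated"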

Thanks!

 
 So I can see at least two cases in which you might want to sniff the
 are you just previewing, or do you actually want the page
 information.  Perhaps there are more.
 
 -tkc
 
 
 
 -- 
 https://mail.python.org/mailman/listinfo/python-list

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---

-- 
https://mail.python.org/mailman/listinfo/python-list


CFFI distribution questions

2014-11-25 Thread Israel Brewster
So I have a python module that I have written which uses CFFI to link against 
a C library I have compiled. Specifically, it is a database driver for the 
4th Dimension database, using an open-source C library distributed by the 4D 
company. I have tested the module and C code on a couple of different 
platforms, but I have a few questions regarding where to go from here.

1) Currently, I am manually compiling the C library, and placing it in a 
subfolder of the module. So the top level of the module directory contains 
the python code and the __init__.py file, and then there is a sub-directory 
(lib4d_sql) containing the C code and compiled C library. I point CFFI to the 
library using a construct like this:

_CWD = os.path.dirname(os.path.realpath(__file__))
ffi.verifier(..., library_dirs=["{}/lib4d_sql/".format(_CWD)], ...)

Obviously I have left out a lot of code there. Is this sort of construct 
considered kosher? Or is there a better way to say "this directory relative 
to your location"? I can't just use relative paths, because that would be 
relative to the execution directory (I think).

2) What is the proper way to compile and distribute the C library with the 
python module? The examples I've found about distributing a CFFI module all 
assume you are using some brain-dead-simple built-in C command where you 
don't have to worry about compiling or installing a library. I found some 
documentation related to building C and C++ extensions with distutils, and 
following that managed to get the library to compile, but I can't figure out 
what it does with the compiled library, or how to get it into the proper 
location relative to my module. I also found some more "raw" distutils code 
here: https://coderwall.com/p/mjrepq/easy-static-shared-libraries-with-distutils 
that I managed to use by overriding the finalize_options function of the 
setuptools install class (using the cmdclass option of setup), and this 
allowed me to build the library in the proper location in the tmp install 
directory, but it doesn't seem to keep the library when installing the module 
in the final location. So far, the only way I have found to work around that 
is by including a "dummy" copy of the library in the package_data option to 
setup, such that when the library is built it replaces this dummy file. Is 
there a better/more canonical way to do this?

3) The majority of the setup/packaging procedure I pulled from here: 
https://caremad.io/2014/11/distributing-a-cffi-project/ and it seems to work 
- mostly. The one problem I am running into is with the implicit compile that 
CFFI does. Some of the code (most, really) on that page is designed to work 
around this by doing the compile on install so it doesn't have to be done at 
runtime, however this doesn't appear to be working for me. I see signs that 
it is doing the compile at install time, however when I try to run the module 
it still tries to compile at that time, an operation that fails due to lack 
of permissions. Might anyone have some thoughts on how I can fix this?

Finally, let me ask you guys this: is distributing this as a CFFI module even 
the right approach? Or should I be looking at something else entirely? C code 
isn't needed for speed - the database itself will be way slower than the 
interface code. It's just that the driver is distributed as C code and I 
didn't want to have to reverse engineer their code and re-write it all.

I've read over https://docs.python.org/2/extending/extending.html, but I 
think I'm missing something when it comes to actually interfacing with 
python, and dealing with the non-python types (pointers to a struct, for 
example) that the C library uses. Would I have to put 100% of my code, 
including things like the cursor and connection classes, in C code? When I 
call the connect function from python (which should return an instance of a 
connection class), what exactly should my C function return? All the examples 
on that page show basic C types, not pointers to class instances or the like. 
Maybe I just need to read over the documentation a few more times :-)

Thanks for any help anyone can provide on any of these questions! :-) Sorry 
for being so long-winded.
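
P.S. In case it helps anyone suggest something cleaner, the "dummy library"
workaround I mentioned under question 2 boils down to roughly this in
setup.py (the package name is just a stand-in):

from setuptools import setup

setup(
    name="p4d",
    packages=["p4d"],
    # lib4d_sql/lib4d_sql.so starts out as an empty placeholder file; the
    # real compiled library overwrites it during the build step described above.
    package_data={"p4d": ["lib4d_sql/*.so"]},
    zip_safe=False,
)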
---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---


-- 
https://mail.python.org/mailman/listinfo/python-list