Re: Best way to do background calculations?

2015-10-26 Thread Israel Brewster
On Oct 25, 2015, at 4:05 PM, MRAB wrote:
> 
> On 2015-10-23 17:35, Israel Brewster wrote:
>> tl;dr: I've been using the multiprocessing module to run some
>> calculations in the background of my CherryPy web app, but apparently
>> this process sometimes gets stuck, causing problems with open sockets
>> piling up and blocking the app. Is there a better way?
>> 
>> The (rather wordy) details:
>> 
>> I have a moderately busy web app written in python using the CherryPy
>> framework (CherryPy v 3.8.0 with ws4py v 0.3.4 and Python 2.7.6) . One
>> of the primary purposes of this web app is to track user-entered flight
>> logs, and keep a running tally of hours/cycles/landings for each
>> aircraft. To that end, whenever a user enters or modifies a log, I
>> "recalculate" the totals for that aircraft, and update all records with
>> the new totals. There are probably ways to optimize this process, but so
>> far I haven't seen a need to spend the time.
>> 
>> Ideally, this recalculation process would happen in the background.
>> There is no need for the user to wait around while the system crunches
>> numbers - they should be able to move on with entering another log or
>> whatever else they need to do. To that end, I implemented the call to
>> the recalc function using the multiprocessing module, so it could start
>> in the background and the main process move on.
>> 
>> Lately, though, I've been running into a problem where, when looking at
>> the process list on my server (Mac OS X 10.10.5), I'll see two or more
>> "copies" of my server process running - one master and one or more child
>> processes. As the above described process is the only place I am using
>> the multiprocessing module, I am making the assumption that this is what
>> these additional processes are. If they were only there for a few
>> minutes I would think this is normal, and it wouldn't be a problem.
>> 
>> However, what I am seeing is that from time to time (once or twice every
>> couple of days) these additional processes will get "stuck", and when
>> that happens sockets opened by the web app don't get properly closed and
>> start piling up. Looking at a list of open sockets on the server when I
>> have one of these "hung" processes shows a steadily increasing number of
>> sockets in a "CLOSE_WAIT" state (normally I see none in that state).
>> Killing off the hung process(es) clears out these sockets, but if I
>> don't catch it quickly enough these sockets can build up to the point
>> that I am unable to open any more, and the server starts rejecting
>> connections.
>> 
>> I'm told this happens because the process retains a reference to all
>> open files/sockets from the parent process, thus preventing the sockets
>> from closing until the process terminates. Regardless of the reason, it
>> can cause a loss of service if I don't catch it quickly enough. As such,
>> I'm wondering if there is a better way. Should I be looking at using the
>> threading library rather than the multiprocessing library? My
>> understanding is that the GIL would prevent that approach from being of
>> any real benefit for a calculation intensive type task, but maybe since
>> the rest of the application is CherryPy threads, it would still work
>> well?. Or perhaps there is a way to not give the child process any
>> references to the parent's files/sockets - although that may not help
>> with the process hanging? Maybe there is a way to "monitor" the process,
>> and automatically kill it if it stops responding? Or am I totally
>> barking up the wrong tree here?
>> 
> It sounds like the multiprocessing module is forking the new process,
> which inherits the handles.
> 
> Python 3.4 added the ability to spawn the new process, which won't inherit 
> the handles.

Well, that might be a reason to look at moving to 3 then. It's been on my to-do 
list :-)

> 
> It's unfortunate that you're using Python 2.7.6!
> 
> Could you start the background process early, before any of those
> sockets have been opened, and then communicate with it via queues?

Possibly. Simply have the process always running, and tell it to kick off 
calculations as needed via queues. It's worth investigating for sure.

> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Best way to do background calculations?

2015-10-26 Thread Israel Brewster
On Oct 25, 2015, at 3:40 PM, Dennis Lee Bieber wrote:
> 
> On Fri, 23 Oct 2015 08:35:06 -0800, Israel Brewster 
> declaimed the following:
> 
>> tl;dr: I've been using the multiprocessing module to run some calculations 
>> in the background of my CherryPy web app, but apparently this process 
>> sometimes gets stuck, causing problems with open sockets piling up and 
>> blocking the app. Is there a better way?
>> 
>> The (rather wordy) details:
>> 
>   The less wordy first impression...
> 
>> I have a moderately busy web app written in python using the CherryPy 
>> framework (CherryPy v 3.8.0 with ws4py v 0.3.4 and Python 2.7.6) . One of 
>> the primary purposes of this web app is to track user-entered flight logs, 
>> and keep a running tally of hours/cycles/landings for each aircraft. To that 
>> end, whenever a user enters or modifies a log, I "recalculate" the totals 
>> for that aircraft, and update all records with the new totals. There are 
>> probably ways to optimize this process, but so far I haven't seen a need to 
>> spend the time.
>> 
>   Off-hand -- this sounds like something that should be in a database...
> Unless your calculations are really nasty, rather than just aggregates, a
> database engine should be able to apply them in SQL queries or stored
> procedures.

Sounds like a potentially valid approach. Would require some significant 
re-tooling, but could work. I'll look into it.
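
As a rough idea of what the database approach could look like, here is a
small sketch using the sqlite3 module from the standard library. The
flight_logs table, its columns, and the totals_for helper are all made up
for illustration; the real schema and database engine will differ.

```python
import sqlite3

# Hypothetical schema: one row per flight log entry.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flight_logs ("
    " aircraft TEXT, hours REAL, cycles INTEGER, landings INTEGER)"
)
conn.executemany(
    "INSERT INTO flight_logs VALUES (?, ?, ?, ?)",
    [("N100", 1.5, 1, 1), ("N100", 2.0, 2, 3), ("N200", 0.5, 1, 1)],
)


def totals_for(conn, aircraft):
    # Let the database do the summing instead of recalculating in Python.
    return conn.execute(
        "SELECT SUM(hours), SUM(cycles), SUM(landings)"
        " FROM flight_logs WHERE aircraft = ?",
        (aircraft,),
    ).fetchone()


print(totals_for(conn, "N100"))  # -> (3.5, 3, 4)
```

With totals computed on demand like this, there may be nothing left to
recalculate in the background, though that depends on how heavy the real
calculations are.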

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---


> -- 
>   Wulfraed Dennis Lee Bieber AF6VN
>   wlfr...@ix.netcom.com    HTTP://wlfraed.home.netcom.com/
> 


Re: Best way to do background calculations?

2015-10-26 Thread Israel Brewster
On Oct 25, 2015, at 6:48 PM, Chris Angelico wrote:
> 
> On Sat, Oct 24, 2015 at 3:35 AM, Israel Brewster wrote:
>> 
>> Ideally, this recalculation process would happen in the background. There is
>> no need for the user to wait around while the system crunches numbers - they
>> should be able to move on with entering another log or whatever else they
>> need to do. To that end, I implemented the call to the recalc function using
>> the multiprocessing module, so it could start in the background and the main
>> process move on.
> 
> One way to get around this would be to separate the processes
> completely, and simply alert the other process (maybe via a socket) to
> ask it to do the recalculation. That way, the background process would
> never have any of the main process's sockets, and can't affect them in
> any way.

Sounds similar to MRAB's suggestion of starting the process before any sockets 
have been opened. Certainly worth investigating, and I think it should be 
doable. Thanks!


> 
> ChrisA


Re: Best way to do background calculations?

2015-10-26 Thread Chris Angelico
On Tue, Oct 27, 2015 at 4:01 AM, Israel Brewster wrote:
> Sounds similar to MRAB's suggestion of starting the process before any 
> sockets have been opened. Certainly worth investigating, and I think it 
> should be doable. Thanks!

Yep, either would work. My suggestion would be to not fork off from
your main web server process _at all_, and have your background
process managed some other way (eg a systemd service or Upstart job),
but forking earlier will have a similar effect.

ChrisA


Please post “text/plain” message body (was: Best way to do background calculations?)

2015-10-25 Thread Ben Finney
Israel Brewster writes:

[no text]

Please ensure your email message body is “text/plain” (and preferably
not HTML) when posting to unfamiliar recipients — which is always, on a
public forum like this.

-- 
 \“Pinky, are you pondering what I'm pondering?” “Umm, I think |
  `\   so, Brain, but three men in a tub? Ooh, that's unsanitary!” |
_o__)   —_Pinky and The Brain_ |
Ben Finney



Best way to do background calculations?

2015-10-25 Thread Israel Brewster
tl;dr: I've been using the multiprocessing module to run some
calculations in the background of my CherryPy web app, but apparently
this process sometimes gets stuck, causing problems with open sockets
piling up and blocking the app. Is there a better way?

The (rather wordy) details:

I have a moderately busy web app written in Python using the CherryPy
framework (CherryPy v 3.8.0 with ws4py v 0.3.4 and Python 2.7.6). One
of the primary purposes of this web app is to track user-entered flight
logs, and keep a running tally of hours/cycles/landings for each
aircraft. To that end, whenever a user enters or modifies a log, I
"recalculate" the totals for that aircraft, and update all records with
the new totals. There are probably ways to optimize this process, but so
far I haven't seen a need to spend the time.

Ideally, this recalculation process would happen in the background.
There is no need for the user to wait around while the system crunches
numbers - they should be able to move on with entering another log or
whatever else they need to do. To that end, I implemented the call to
the recalc function using the multiprocessing module, so it could start
in the background and the main process move on.

Lately, though, I've been running into a problem where, when looking at
the process list on my server (Mac OS X 10.10.5), I'll see two or more
"copies" of my server process running - one master and one or more child
processes. As the above described process is the only place I am using
the multiprocessing module, I am making the assumption that this is what
these additional processes are. If they were only there for a few
minutes I would think this is normal, and it wouldn't be a problem.

However, what I am seeing is that from time to time (once or twice every
couple of days) these additional processes will get "stuck", and when
that happens sockets opened by the web app don't get properly closed and
start piling up. Looking at a list of open sockets on the server when I
have one of these "hung" processes shows a steadily increasing number of
sockets in a "CLOSE_WAIT" state (normally I see none in that state).
Killing off the hung process(es) clears out these sockets, but if I
don't catch it quickly enough these sockets can build up to the point
that I am unable to open any more, and the server starts rejecting
connections.

I'm told this happens because the process retains a reference to all
open files/sockets from the parent process, thus preventing the sockets
from closing until the process terminates. Regardless of the reason, it
can cause a loss of service if I don't catch it quickly enough. As such,
I'm wondering if there is a better way. Should I be looking at using the
threading library rather than the multiprocessing library? My
understanding is that the GIL would prevent that approach from being of
any real benefit for a calculation intensive type task, but maybe since
the rest of the application is CherryPy threads, it would still work
well? Or perhaps there is a way to not give the child process any
references to the parent's files/sockets - although that may not help
with the process hanging? Maybe there is a way to "monitor" the process,
and automatically kill it if it stops responding? Or am I totally
barking up the wrong tree here?

Thanks for any insight anyone can provide!

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---




Re: Best way to do background calculations?

2015-10-25 Thread Chris Angelico
On Sat, Oct 24, 2015 at 3:35 AM, Israel Brewster wrote:
>
> Ideally, this recalculation process would happen in the background. There is
> no need for the user to wait around while the system crunches numbers - they
> should be able to move on with entering another log or whatever else they
> need to do. To that end, I implemented the call to the recalc function using
> the multiprocessing module, so it could start in the background and the main
> process move on.

One way to get around this would be to separate the processes
completely, and simply alert the other process (maybe via a socket) to
ask it to do the recalculation. That way, the background process would
never have any of the main process's sockets, and can't affect them in
any way.
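
A minimal sketch of the socket idea, written for Python 3 and assuming a
made-up protocol of one aircraft ID per line. The address, the function
names, and the handler argument are all hypothetical; the point is only
that the web app sends a short message and never creates the worker
itself.

```python
import socket

HOST, PORT = "127.0.0.1", 8765  # hypothetical address for the worker


def handle_connection(conn, handler):
    # Read newline-delimited aircraft IDs until the client closes.
    with conn, conn.makefile("r", encoding="utf-8") as lines:
        for line in lines:
            aircraft_id = line.strip()
            if aircraft_id:
                handler(aircraft_id)


def serve_forever(handler, address=(HOST, PORT)):
    # Runs in the separate worker process, never inside the web app.
    with socket.create_server(address) as server:  # needs Python 3.8+
        while True:
            conn, _ = server.accept()
            handle_connection(conn, handler)


def request_recalc(aircraft_id, address=(HOST, PORT)):
    # Called by the web app: send the ID and return without waiting.
    with socket.create_connection(address, timeout=5) as sock:
        sock.sendall((aircraft_id + "\n").encode("utf-8"))
```

The worker would be started on its own and run something like
serve_forever(recalc_totals), where recalc_totals stands for the existing
recalculation function, while a request handler in the web app would call
request_recalc("N12345") after saving a log.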

ChrisA


Re: Best way to do background calculations?

2015-10-25 Thread MRAB

On 2015-10-23 17:35, Israel Brewster wrote:

> tl;dr: I've been using the multiprocessing module to run some
> calculations in the background of my CherryPy web app, but apparently
> this process sometimes gets stuck, causing problems with open sockets
> piling up and blocking the app. Is there a better way?
>
> The (rather wordy) details:
>
> I have a moderately busy web app written in python using the CherryPy
> framework (CherryPy v 3.8.0 with ws4py v 0.3.4 and Python 2.7.6) . One
> of the primary purposes of this web app is to track user-entered flight
> logs, and keep a running tally of hours/cycles/landings for each
> aircraft. To that end, whenever a user enters or modifies a log, I
> "recalculate" the totals for that aircraft, and update all records with
> the new totals. There are probably ways to optimize this process, but so
> far I haven't seen a need to spend the time.
>
> Ideally, this recalculation process would happen in the background.
> There is no need for the user to wait around while the system crunches
> numbers - they should be able to move on with entering another log or
> whatever else they need to do. To that end, I implemented the call to
> the recalc function using the multiprocessing module, so it could start
> in the background and the main process move on.
>
> Lately, though, I've been running into a problem where, when looking at
> the process list on my server (Mac OS X 10.10.5), I'll see two or more
> "copies" of my server process running - one master and one or more child
> processes. As the above described process is the only place I am using
> the multiprocessing module, I am making the assumption that this is what
> these additional processes are. If they were only there for a few
> minutes I would think this is normal, and it wouldn't be a problem.
>
> However, what I am seeing is that from time to time (once or twice every
> couple of days) these additional processes will get "stuck", and when
> that happens sockets opened by the web app don't get properly closed and
> start piling up. Looking at a list of open sockets on the server when I
> have one of these "hung" processes shows a steadily increasing number of
> sockets in a "CLOSE_WAIT" state (normally I see none in that state).
> Killing off the hung process(es) clears out these sockets, but if I
> don't catch it quickly enough these sockets can build up to the point
> that I am unable to open any more, and the server starts rejecting
> connections.
>
> I'm told this happens because the process retains a reference to all
> open files/sockets from the parent process, thus preventing the sockets
> from closing until the process terminates. Regardless of the reason, it
> can cause a loss of service if I don't catch it quickly enough. As such,
> I'm wondering if there is a better way. Should I be looking at using the
> threading library rather than the multiprocessing library? My
> understanding is that the GIL would prevent that approach from being of
> any real benefit for a calculation intensive type task, but maybe since
> the rest of the application is CherryPy threads, it would still work
> well?. Or perhaps there is a way to not give the child process any
> references to the parent's files/sockets - although that may not help
> with the process hanging? Maybe there is a way to "monitor" the process,
> and automatically kill it if it stops responding? Or am I totally
> barking up the wrong tree here?


It sounds like the multiprocessing module is forking the new process,
which inherits the handles.

Python 3.4 added the ability to spawn the new process, which won't 
inherit the handles.
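
For reference, a minimal sketch of the spawn option on Python 3.4 or
later. recalc_totals and start_recalc are made-up names standing in for
the real recalculation code, so treat this as an illustration rather than
a drop-in fix.

```python
import multiprocessing as mp


def recalc_totals(aircraft_id):
    """Hypothetical stand-in for the real recalculation."""
    return "recalculated %s" % aircraft_id


def start_recalc(aircraft_id):
    # "spawn" starts a fresh interpreter instead of forking, so the child
    # does not inherit the parent's unneeded file descriptors and sockets.
    ctx = mp.get_context("spawn")
    proc = ctx.Process(target=recalc_totals, args=(aircraft_id,))
    proc.start()
    return proc


if __name__ == "__main__":
    # The guard matters with "spawn": the child re-imports this module.
    worker = start_recalc("N12345")
    worker.join(timeout=30)
    print("exit code:", worker.exitcode)
```

Spawning is slower to start than forking, and the target function and its
arguments have to be picklable, so this is a trade-off rather than a free
win.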


It's unfortunate that you're using Python 2.7.6!

Could you start the background process early, before any of those
sockets have been opened, and then communicate with it via queues?
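
Something along these lines, as a rough sketch only. It sticks to
multiprocessing calls that also exist in 2.7; recalc_totals, worker_loop,
start_worker, and the None sentinel are all made-up names for
illustration.

```python
import multiprocessing as mp

STOP = None  # sentinel telling the worker to exit


def recalc_totals(aircraft_id):
    """Hypothetical stand-in for the real recalculation."""
    print("recalculating %s" % aircraft_id)


def worker_loop(jobs, handler=recalc_totals):
    # Block until a job arrives, run it, repeat until the sentinel shows up.
    while True:
        aircraft_id = jobs.get()
        if aircraft_id is STOP:
            break
        handler(aircraft_id)


def start_worker():
    # Call this once at startup, before the web server opens any sockets,
    # so a forked child has none of the server's sockets to inherit.
    jobs = mp.Queue()
    proc = mp.Process(target=worker_loop, args=(jobs,))
    proc.daemon = True
    proc.start()
    return jobs, proc


if __name__ == "__main__":
    jobs, proc = start_worker()
    jobs.put("N12345")  # what a request handler would do after saving a log
    jobs.put(STOP)
    proc.join(timeout=30)
```

The request handlers then only ever call jobs.put(), which returns
immediately, and the one long-lived worker handles recalculations in the
order they were queued.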
