On 10/12/2016 05:49 PM, Alfred Perlstein wrote:


Mike,

Thank you, in fact thank you very, very much.   Writing out the code was
really above and beyond my expectations.  I will use this snippet in our
app.

The following problems remain for us (and also likely other users of
sqla) although they are not immediate issues:

1) I need to make sure that I can use "threading.local"; since we are
based on flask we actually use werkzeug's threads, so I now have to
investigate whether they are compatible with this snippet.  Likewise
anyone using greenlets may have to modify the snippet to fit with
greenlets.  That said, I do believe that porting this snippet to
werkzeug's thread primitives should insulate us from
greenlet/native-threads problems, but then those folks not using
werkzeug don't have that option.  This is why overall it appears
problematic to me.... however I've not confirmed this yet; it's
possible that "threading.local" works for most cases (greenlets and
werkzeug) but... I have to test.

threading.local() should work with greenlets if you're doing global monkeypatching. Otherwise there should be a similar construct in gevent that does this, or, more simply, don't use any kind of "local()" object at all; just check a dictionary for the greenlet or thread id in the filter.
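
To make that concrete, here's a minimal sketch of the "check a dictionary in the filter" idea - the names (EchoThreadFilter, echo_thread_ids, echo_on / echo_off) are made up for illustration, and for greenlets you'd key on id(gevent.getcurrent()) rather than the thread id:

import logging
import threading

echo_thread_ids = set()      # threads that have opted in to SQL echo
_echo_lock = threading.Lock()

class EchoThreadFilter(logging.Filter):
    def filter(self, record):
        # reject the record unless the current thread opted in
        return threading.current_thread().ident in echo_thread_ids

def echo_on():
    with _echo_lock:
        echo_thread_ids.add(threading.current_thread().ident)

def echo_off():
    with _echo_lock:
        echo_thread_ids.discard(threading.current_thread().ident)

# attach the filter to the handler so it applies to records coming
# from the child "sqlalchemy.engine.*" loggers as well
handler = logging.StreamHandler()
handler.addFilter(EchoThreadFilter())
sqla_log = logging.getLogger("sqlalchemy.engine")
sqla_log.addHandler(handler)
sqla_log.setLevel(logging.INFO)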


2) It looks to me like by using the engine's logger we would be
turning on logging across all threads.  What happens when we have 1000
threads?  What will happen to us performance-wise?

So I've looked in detail at how echo works, and it is true that InstanceLogger does not make use of a global log level; instead it calls upon the ._log() method of the logger, doing the level calculation itself. If the Connection had this logger directly, none of the other Connection objects would see that they need to send records to the logging calls.

However, the arguments passed to the log methods are fixed objects with little to no overhead, and the records would be blocked by the filter before being processed. The logging calls defer all string interpolation and __repr__ calls of objects until a formatter renders them, which would not take place with the filter blocking the records from getting there.
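
Here's a quick standalone illustration of that deferral, using plain stdlib logging and a made-up Expensive class, nothing SQLAlchemy-specific:

import logging

class Expensive(object):
    def __repr__(self):
        # only runs if a handler actually formats the record
        print("expensive __repr__ called")
        return "<Expensive>"

log = logging.getLogger("demo")
log.addHandler(logging.StreamHandler())
log.setLevel(logging.INFO)

class BlockAll(logging.Filter):
    def filter(self, record):
        return False

log.addFilter(BlockAll())

# the filter rejects the record before any formatter runs, so the %r
# interpolation and Expensive.__repr__ never happen
log.info("value is %r", Expensive())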

The whole way that Connection even checks these flags is not really how logging was meant to be used; you're supposed to just send messages to the log objects and let the handlers and filters work it out. The flags are just to eke out a tiny bit of extra reduction in method calls, and that wouldn't have much of an effect even on thousands of threads. But it's true, it isn't zero either.
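
The generic pattern those flags follow looks roughly like this (not SQLAlchemy's actual code, just an illustration of caching an "is enabled" check to skip the log call entirely):

import logging

log = logging.getLogger("demo")

# compute the flag once rather than having log.info() re-check
# isEnabledFor() on every statement; the tradeoff is that a later
# change to the log level isn't picked up by the cached flag
_echo = log.isEnabledFor(logging.INFO)

for n in range(3):
    if _echo:
        log.info("statement %d", n)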

>
> So as a question, if I ever get to it, would you entertain patches to do
> this without turning on global logging and/or being tied to the
> threading implementation?


So, to keep it zero, the event approach would be one way to go, since events can be added to a Connection object directly (that was a big change made some years ago; it used to be just Engine level). Adding a new attribute connection.logger that defaults to self.engine.logger would allow InstanceLogger to be local to a connection, but the _echo flag would still need to somehow work into the log system; this is of course possible, however I'm trying to keep everything to do with how logging / echo works in one place, and this would incur some kind of two-level system of "echo" between Engine and Connection that would need a lot of tests. If Connection has its own logger then we'd think that each Connection should be able to have independent logging names like Engine does, etc. It would not be a 2 line pull request.
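
For reference, a rough sketch of the event approach as it works today; the "myapp.sql_trace" logger name and handler setup are just illustrative, and this assumes a version where connection-level event listeners are available:

import logging
from sqlalchemy import create_engine, event, text

trace_log = logging.getLogger("myapp.sql_trace")
trace_log.addHandler(logging.StreamHandler())
trace_log.setLevel(logging.INFO)

engine = create_engine("sqlite://")

def log_statement(conn, cursor, statement, parameters, context, executemany):
    trace_log.info("SQL: %s parameters: %r", statement, parameters)

conn = engine.connect()
# the listener is local to this Connection; other connections on the
# same Engine are unaffected and engine-wide echo stays off
event.listen(conn, "before_cursor_execute", log_statement)
conn.execute(text("select 1"))
conn.close()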

I also worry that the precedent being set would be that anytime someone needs to do unusual things with logging, they are going to want to add new complexity to the ".echo" flag rather than going through the normal logging system, which IMO is extremely flexible and should be trusted to scale up as well as everything else.

Of course, if turning on logging.INFO and adding the filter that blocks 99% of all log messages does prove to add some significant performance impact, that changes everything and we'd have to decide that Python logging does need to be worked around at scale. But Python logging is very widely used deep inside many networking-related systems without much performance impact being noticed.




btw, if you're wondering where I'm coming from with these insane scaling
questions.... I used to be CTO of OKCupid and scaled them; I'm now at a
new place, so these things matter to me and my team.

the "1000 greenlets" model is one I'm familiar with in Openstack (in that they've used that setting, but I found that that number of greenlets was never utilized for real). Are your servers truly using 1000 database connections in a single process?



thanks again Mike and apologies for the tone of my original email!

we can all get along, no worries.


-Alfred

