Re: [sqlalchemy] Performance problem of simple orm usage
Dear Jonathan, thank you for your cool remix of recommendations. Much appreciated.

On Thursday, 26 June 2014 15:09:03, Jonathan Vanasco wrote:
> In case this helps...
>
> This reminds me slightly of some RFID work I did years ago. We had a lot
> of reads coming in from different units, several reads per unit per second. Fun.
> I found the best way to handle writing was to just access the db directly,
> but kept the ORM on the "read" side.

I'm doing this in other projects, but writing SQL feels so '80s, doesn't it? I even did my own ORM with a small 'r' (who didn't?) in about 150 lines of code years ago.

> I recall a few things from that experience. Some of this might be
> applicable to you:
>
> * doing a "Select else Insert" pattern was faster than "Insert All" and
> just letting the db constraint catch errors. There were still a lot of
> exceptions from different transactions. This is probably dependent on the
> database/version/size/etc., but it's worth profiling.

Yes, that's what I found, too, in the rollback part of it.

> * we had a low number of "devices", so we just cached their data in the
> app's memory. Losing the extra select helped a bit.
>
> * our transaction got pared down to looking like this:
>
>     begin;
>     select record;
>     insert record;
>     commit;
>
> * i think someone eventually migrated the db interaction to use a stored
> procedure.

An example of that would be interesting.

> * we kept an in-memory array of recently seen transactions. It was just a
> simple list that would have elements inserted at 0 and then truncated. We
> checked that list first, then hit the database if it wasn't seen.

I'm doing that, but using a dict with the unique index members as the key.

> Our performance greatly increased, and frustrations decreased, as we moved
> more logic out of the database and into Python.
>
> On another RFID project, the lead engineer I worked with had a similar
> approach...
> however, to keep performance better, he used an in-memory
> database to record data and then did periodic flushes to the real database
> after the data was "smoothed" out (to compensate for known hardware issues).

Oh, that case sounds pretty pathological.

After applying Mike's notes, I faced some bottlenecks on the sending side, which are plain Raspberry Pis, and in the socket interface. Disabling Nagle's algorithm helped a lot, as did avoiding datetime.datetime.strptime(). I've learned that some unavoidable datetime and timezone operations are still the most expensive ones. Oh, well.

Net result of this optimization quest: down to 1/10th of the original runtime, and all that without getting too dirty. ;)

Cheers,
Pete

--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.
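A minimal sketch of the "check the in-memory dict first, then Select else Insert" pattern described above. The table and column names (`readings`, `device_id`, `tag`) are made up for illustration, and the stdlib sqlite3 module stands in for the real MySQL backend:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (device_id INTEGER, tag TEXT, "
             "UNIQUE (device_id, tag))")

# In-memory cache of recently seen records, keyed on the unique
# index members (as described above), instead of a truncated list.
seen = {}

def store(device_id, tag):
    key = (device_id, tag)
    if key in seen:                     # cheapest check: no db round trip
        return False
    # "Select else Insert": probe first rather than relying on the
    # UNIQUE constraint to raise and force a rollback.
    cur = conn.execute(
        "SELECT 1 FROM readings WHERE device_id = ? AND tag = ?", key)
    if cur.fetchone() is None:
        conn.execute("INSERT INTO readings VALUES (?, ?)", key)
        conn.commit()
    seen[key] = True
    return True
```

The dict lookup short-circuits most duplicates without touching the database; the SELECT probe then avoids the exception/rollback cost that catching constraint violations would incur.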
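Disabling Nagle's algorithm, as mentioned above, is a one-line socket option. A minimal sketch (the sender code from the thread is not shown, so this is just the generic idiom):

```python
import socket

# Nagle's algorithm coalesces small writes; disabling it (TCP_NODELAY)
# sends them immediately, which helps chatty request/response traffic
# like the Raspberry Pi fetches described above.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Verify the option took effect.
assert sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
```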
Re: [sqlalchemy] Performance problem of simple orm usage
Hi Mike,

sorry for the late reply, but it took a considerable amount of time to sort out the non-SQLA-related issues. I will elaborate a bit in a reply to Jonathan's answer.

On Thursday, 26 June 2014 17:34:58, Mike Bayer wrote:
> On 6/26/14, 5:18 PM, Hans-Peter Jansen wrote:
> >
> > Still, attributes is highest in the profiles. I guess this is what you
> > called overkill in your first response. A helper for dealing with this
> > situation sounds interesting, though.
> >
> > Thu Jun 26 20:41:50 2014    srelay.pstats
> >
> > 55993702 function calls (54767154 primitive calls) in 533.479 seconds
> >
> > Ordered by: internal time
> > List reduced from 1277 to 30 due to restriction <30>
> >
> >  ncalls  tottime  percall  cumtime  percall filename:lineno(function)
> >   42160  292.536    0.007  292.910    0.007 attributes.py:1321(from_collection)
> >   63209   39.185    0.001   39.294    0.001 attributes.py:1255(as_state)
>
> from_collection is essentially returning a structure that describes
> which members of a collection were removed or added. If the collection
> is totally empty or is unloaded, this method should be much faster. Try
> not to do this:
>
>     enos.values.append(ev)
>
> that's what's causing that latency to occur. The one-to-many of
> enos->ev is mirrored by a many-to-one from ev->enos - set enos as a
> parent of ev instead, or even faster, just set the foreign key column on
> ev to that of the primary key of enos.

Okay, I did that already, but didn't realize that it's better to just add the record to the non-autoflushing session. With autoflush disabled, some other issues arose with related db operations in that session. I finally solved that by creating an additional session besides the scoped one, and setting that one up with autoflush=False, as you suggested. Although I haven't given up the declarative layer, almost all of the work is done in the database layer, which is a great achievement from my point of view.
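A sketch of the pattern Mike suggests, under assumed models -- the real enos/ev classes from the thread are not shown, so the columns here are invented, and SQLite stands in for MySQL:

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Enos(Base):
    __tablename__ = "enos"
    id = Column(Integer, primary_key=True)

class Ev(Base):
    __tablename__ = "ev"
    id = Column(Integer, primary_key=True)
    enos_id = Column(Integer, ForeignKey("enos.id"))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

# A separate non-autoflushing session for the write-heavy path,
# alongside whatever scoped session the rest of the app uses.
session = Session(engine, autoflush=False)

enos = Enos()
session.add(enos)
session.commit()

# Fastest variant per Mike: set the foreign key column directly
# instead of appending to a relationship collection
# (enos.values.append(ev)), which skips the from_collection
# bookkeeping that dominated the earlier profile.
ev = Ev(enos_id=enos.id)
session.add(ev)
session.commit()
```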
Sun Jun 29 17:17:16 2014    profiles/srelay_20140629_171715.pstats

9740100 function calls (9425633 primitive calls) in 141.169 seconds

Ordered by: cumulative time

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      1    0.000    0.000  141.170  141.170 cProfile.py:146(runcall)
      1    0.169    0.169  141.170  141.170 srelay.py:24(run)
  32579    0.094    0.000   93.955    0.003 fetch_srelay.py:62(fetch_srelay)
     39    0.000    0.000   91.449    2.345 fetch_srelay.py:49(recv)
     78    0.002    0.000   91.449    1.172 fetch_srelay.py:34(recvblock)
    167   91.446    0.548   91.446    0.548 {method 'recv' of '_socket.socket' objects}
     46    0.001    0.000   18.571    0.404 session.py:687(commit)
  95/51    0.001    0.000   18.570    0.364 session.py:351(commit)
    137    0.057    0.000   18.437    0.135 session.py:1790(flush)
  95/51    0.000    0.000   18.414    0.361 session.py:320(_prepare_impl)
     49    0.073    0.001   18.372    0.375 session.py:1841(_flush)
     49    0.001    0.000   17.175    0.351 unitofwork.py:350(execute)
     82    0.015    0.000   16.777    0.205 unitofwork.py:522(execute)
     82    0.001    0.000   16.762    0.204 persistence.py:25(save_obj)
     22    0.365    0.017   15.898    0.723 persistence.py:523(_emit_insert_statements)
  25139    0.102    0.000   14.937    0.001 base.py:597(execute)
  25139    0.202    0.000   14.835    0.001 base.py:727(_execute_clauseelement)
  25139    0.276    0.000   14.466    0.001 base.py:812(_execute_context)
  25139    0.056    0.000   11.485    0.000 default.py:323(do_execute)
  25184    0.310    0.000   11.444    0.000 cursors.py:164(execute)
  25184    0.046    0.000   10.213    0.000 cursors.py:353(_query)
  25184    0.117    0.000   10.117    0.000 cursors.py:315(_do_query)
  25184    9.660    0.000    9.660    0.000 {method 'query' of '_mysql.connection' objects}
  32573    0.445    0.000    3.688    0.000 srelay.py:88(store_enos_rec)
  25139    0.675    0.000    1.729    0.000 default.py:391(_init_compiled)

Without profiling, I'm down to 90 secs for fetching more than 33,000 records from a Raspberry Pi and pushing them into a MySQL database. All in pure Python. Fun!

Thanks again, Mike, for this great lib and your even GREATER support.

Cheers,
Pete
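The listings above were saved as .pstats files and sorted by cumulative time. For readers who want to reproduce that workflow, a minimal stdlib sketch (the `work` function is a stand-in; the thread's actual srelay.py is not shown):

```python
import cProfile
import os
import pstats
import tempfile

def work():
    # Hypothetical stand-in for srelay.py's run()
    return sum(i * i for i in range(10000))

prof = cProfile.Profile()
prof.runcall(work)          # shows up as cProfile.py(runcall) in listings

# Dump to a .pstats file, like profiles/srelay_*.pstats in the thread,
# then reload and print the top entries sorted by cumulative time.
path = os.path.join(tempfile.mkdtemp(), "example.pstats")
prof.dump_stats(path)

stats = pstats.Stats(path)
stats.sort_stats("cumulative").print_stats(5)
```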
Re: [sqlalchemy] Is it considered bad practice to have more than one session instance simultaneously in a web application?
Hi Mike,

Thanks for your reply. In my case, the full_name attribute is a hybrid property built from firstName and lastName. When I construct a Person instance, I need a session to query the names and build the full_name attribute on the fly. So do you think I should remove this session immediately after I have built the full_name attribute? What if my application later changes this person's firstName? If the session is still alive, it will expire the full_name attribute automatically, but if the session was removed, there won't be any automatic update of those hybrid properties, right? Doesn't this scenario justify a background thread?

On Sat, Jun 28, 2014 at 7:14 AM, Mike Bayer wrote:
>
> On 6/28/14, 7:13 AM, Bao Niu wrote:
> > My situation is like this:
> > I am developing a web application, which has a Person class with
> > *FirstName* and *LastName* attributes. Now I want to build their full
> > name attribute and make this *full_name* attribute queryable, by using
> > hybrid_property, which entails a query and hence a session. This session
> > for querying the hybrid property has a life cycle as long as that
> > particular Person instance is active in memory, since in the running
> > process the names might get changed and need to be communicated to the
> > database.
> >
> > In the meantime, in this application I also need another Session instance
> > to contain those Person instances themselves, and this Session instance
> > has quite a different life cycle than the above one. I am using
> > cherrypy.request to hold a thread-local session for this second purpose.
> >
> > Now it seems to me that both Session instances are necessary; I can't use
> > one in place of the other. But because handling two sessions at the same
> > time is inherently so confusing sometimes, I wonder if I am on the right
> > track? Is this generally considered bad? If it is, then how to deal
> > with it? Thanks in advance.
> If this is a web application, having a session that isn't lifecycled to a
> request seems like it runs in some kind of background thread or
> something. Otherwise, if it's some session that stays open in the
> cherrypy app while the app is doing nothing, and is only used by requests
> (somehow? a session shouldn't be accessed by multiple things at once), that's
> bad. If you're using a Session as some kind of offline cache, that's not
> what it's for, and it won't do a good job of that because it isn't
> threadsafe.
>
> Think more in terms of database transactions. I use two sessions all
> the time when I want to separate transactions: a background job is working
> in a transaction for several seconds, but a second short transaction is
> used to write messages to a log table, so that I can see the log table grow
> from the outside while the long transaction keeps going. But database
> transactions overall should be short, and never dormant waiting for
> something to happen; they should be burning through the work they have to
> do as fast as possible and completing. So should your sessions.
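On the question of whether full_name can go stale without a session to expire it: a derived attribute that is computed on access can never be stale, so no session is needed just to keep it current. A plain-Python sketch of the idea (using a stdlib property rather than SQLAlchemy's actual hybrid_property, whose instance-level getter behaves the same way):

```python
class Person:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    @property
    def full_name(self):
        # Recomputed on every access, so it can never go stale and
        # nothing needs to "expire" it when a name changes.
        return f"{self.first_name} {self.last_name}"

p = Person("Ada", "Lovelace")
p.first_name = "Augusta"     # full_name reflects this immediately
```

A hybrid_property's Python-side getter likewise evaluates against the instance's current attribute values on each access; a session only comes into play when the underlying columns themselves need loading or refreshing.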