Re: [sqlalchemy] Performance problem of simple orm usage

2014-06-29 Thread Hans-Peter Jansen
Dear Jonathan,

thank you for your cool remix of recommendations. Much appreciated.

On Thursday, 26 June 2014 15:09:03 Jonathan Vanasco wrote:
> In case this helps...
> 
> This reminds me slightly of some RFID work I did years ago.  We had a lot
> of reads coming in from different units, several reads per unit per second.

Fun.

> I found the best way to handle writing was to just access the db directly,
> but I kept the ORM on the "read" side.

I'm doing this in other projects, but writing SQL feels so '80s, doesn't it.. 
I even wrote my own ORM with a small 'r' (who didn't?) in about 150 lines of 
code years ago.

> I recall a few things from that experience.  Some of this might be
> applicable to you:
> 
> * Doing a "Select else Insert" pattern was faster than "Insert All" and
> just letting the db constraint catch errors.  There were still a lot of
> exceptions from different transactions.  This is probably dependent on the
> database/version/size/etc., but it's worth profiling.

Yes, that's what I found, too; the rollback handling was the expensive part of it.
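
For illustration, a minimal ORM sketch of that "select else insert" pattern; the
Reading model and its columns are hypothetical stand-ins, not from the original
project:

    from sqlalchemy import Column, Integer, String, create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    Base = declarative_base()

    class Reading(Base):  # hypothetical table standing in for the real one
        __tablename__ = 'readings'
        id = Column(Integer, primary_key=True)
        tag = Column(String(64), unique=True)
        value = Column(Integer)

    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()

    def insert_if_absent(tag, value):
        # Probe first and insert only when the row is missing, instead of
        # inserting blindly and rolling back on the unique-constraint error.
        if session.query(Reading.id).filter_by(tag=tag).first() is None:
            session.add(Reading(tag=tag, value=value))
            session.commit()

    insert_if_absent('unit7', 42)
    insert_if_absent('unit7', 42)  # second call is a no-op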

> * We had a low number of "devices", so we just cached their data in the
> app's memory.  Losing the extra select helped a bit.
>
> * Our transaction got pared down to looking like this:
> 
>  begin;
>  select record;
>  insert record;
>  commit;
>
> * I think someone eventually migrated the db interaction to use a stored
> procedure.

An example of that would be interesting.
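
A minimal sketch of how such a stored procedure might look, created and called
through SQLAlchemy's 1.x-style API; the DSN, table, and column names are all
assumptions:

    from sqlalchemy import create_engine, text

    engine = create_engine('mysql+mysqldb://user:pw@localhost/sensors')  # placeholder DSN

    with engine.connect() as conn:
        # Push the "select else insert" decision into the server.
        conn.execute(text("""
            CREATE PROCEDURE insert_if_absent(IN p_tag VARCHAR(64), IN p_value INT)
            BEGIN
                IF NOT EXISTS (SELECT 1 FROM readings WHERE tag = p_tag) THEN
                    INSERT INTO readings (tag, value) VALUES (p_tag, p_value);
                END IF;
            END
        """))
        # One round trip per record instead of a separate SELECT and INSERT.
        conn.execute(text("CALL insert_if_absent(:tag, :val)"),
                     {'tag': 'unit7', 'val': 42})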
 
> * We kept an in-memory array of recently seen transactions.  It was just a
> simple list that would have elements inserted at 0 and then truncated.  We
> checked that list first, then hit the database if it wasn't seen.

I'm doing that, but using a dict with the unique index members as the key.
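
A short sketch of that dict-based variant; Record, unit_id, and stamp are
hypothetical stand-ins for the real mapped class and its unique-index columns:

    recently_seen = {}  # keyed by the unique-index members

    def seen_before(session, Record, unit_id, stamp):
        key = (unit_id, stamp)
        if key in recently_seen:
            return True                  # in-memory hit, no db round trip
        found = session.query(Record).filter_by(
            unit_id=unit_id, stamp=stamp).first() is not None
        recently_seen[key] = True        # the row exists now, or is about to
        return found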

> Our performance greatly increased, and frustrations decreased, as we moved
> more logic out of the database and into Python.
> 
> On another RFID project, the lead engineer I worked with had a similar
> approach... however, to keep performance up, he used an in-memory
> database to record data and then did periodic flushes to the real database
> after the data was "smoothed" out (to compensate for known hardware
> issues).

Oh, that case sounds pretty pathological..

After working through Mike's notes, I faced some bottlenecks on the sending side, 
which are plain Raspberry Pis, and in the socket interface. Disabling Nagle's 
algorithm helped a lot, as did avoiding datetime.datetime.strptime(). I've learned 
that some unavoidable datetime and timezone operations are still the most 
expensive ones.. Oh, well.
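
For reference, a short sketch of both tweaks; the host, port, and timestamp
layout are assumptions:

    import socket
    from datetime import datetime

    # Disable Nagle's algorithm so small request/reply packets go out
    # immediately instead of being coalesced by the kernel.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    sock.connect(('192.168.1.50', 5000))  # hypothetical Raspberry Pi endpoint

    # Replace datetime.strptime() with manual slicing when the layout is
    # fixed, e.g. '2014-06-29 17:17:16'; this skips strptime's per-call
    # format parsing.
    def parse_ts(s):
        return datetime(int(s[0:4]), int(s[5:7]), int(s[8:10]),
                        int(s[11:13]), int(s[14:16]), int(s[17:19]))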

Net result of this optimization quest: runtime down to 1/10th of the starting 
point, and all that without getting too dirty. ;)

Cheers,
Pete



Re: [sqlalchemy] Performance problem of simple orm usage

2014-06-29 Thread Hans-Peter Jansen
Hi Mike,

sorry for the late reply, but it took a considerable amount of time to 
sort out the non-SQLA-related issues.. I'll elaborate a bit in my reply to 
Jonathan's answer.

On Thursday, 26 June 2014 17:34:58 Mike Bayer wrote:
> On 6/26/14, 5:18 PM, Hans-Peter Jansen wrote:
> > 
> > Still, attributes.py is highest in the profiles. I guess this is what you
> > called overkill in your first response. A helper for dealing with this
> > situation sounds interesting, though.
> > 
> > Thu Jun 26 20:41:50 2014    srelay.pstats
> > 
> >          55993702 function calls (54767154 primitive calls) in 533.479 seconds
> >
> >    Ordered by: internal time
> >    List reduced from 1277 to 30 due to restriction <30>
> >
> >    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
> >     42160  292.536    0.007  292.910    0.007 attributes.py:1321(from_collection)
> >     63209   39.185    0.001   39.294    0.001 attributes.py:1255(as_state)
> from_collection is essentially returning a structure that describes
> which members of a collection were removed or added.  if the collection
> is totally empty or is unloaded, this method should be much faster.  try
> not to do this:
> 
> enos.values.append(ev)
> 
> that's what's causing that latency to occur.   The one-to-many of
> enos->ev is mirrored by a many-to-one from ev->enos - set enos as a
> parent of ev instead, or even faster, just set the foreign key column on
> ev to that of the primary key of enos.
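
For illustration, a minimal sketch of those three variants, using the ev/enos
names from the thread; the model shapes and column names are assumptions:

    from sqlalchemy import Column, ForeignKey, Integer, create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import relationship, sessionmaker

    Base = declarative_base()

    class Enos(Base):  # hypothetical shape of the parent record
        __tablename__ = 'enos'
        id = Column(Integer, primary_key=True)
        values = relationship('Ev', backref='enos')

    class Ev(Base):  # hypothetical child record
        __tablename__ = 'ev'
        id = Column(Integer, primary_key=True)
        enos_id = Column(Integer, ForeignKey('enos.id'))

    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine, autoflush=False)()

    enos = Enos()
    session.add(enos)
    session.flush()  # assigns enos.id

    # Slow: appending makes the ORM track collection history
    # (attributes.from_collection shows up in the profile).
    enos.values.append(Ev())

    # Faster: set the many-to-one parent on the child instead.
    session.add(Ev(enos=enos))

    # Fastest: assign the foreign key column directly, bypassing
    # relationship mechanics altogether.
    session.add(Ev(enos_id=enos.id))
    session.commit()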

Okay, I did that already, but didn't realize that it's better to just add the
record to the non-autoflushing session. With autoflushing disabled, some  
other issues arose with related db operations in that session. I finally solved 
that by creating an additional session besides the scoped one, and setting 
that one up with autoflush = False, as you suggested.
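
That setup might look roughly like this; the engine URL is a placeholder:

    from sqlalchemy import create_engine
    from sqlalchemy.orm import scoped_session, sessionmaker

    engine = create_engine('mysql+mysqldb://user:pw@localhost/sensors')  # placeholder
    factory = sessionmaker(bind=engine)

    # The scoped session keeps serving the regular per-thread work...
    DBSession = scoped_session(factory)

    # ...while a separate plain session with autoflush disabled takes the
    # high-volume inserts, so pending rows aren't flushed mid-query.
    bulk_session = factory(autoflush=False)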

Even though I haven't given up the declarative layer, just about all of the work 
is now done in the database layer, which is a great achievement from my POV.

Sun Jun 29 17:17:16 2014    profiles/srelay_20140629_171715.pstats

         9740100 function calls (9425633 primitive calls) in 141.169 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  141.170  141.170 cProfile.py:146(runcall)
        1    0.169    0.169  141.170  141.170 srelay.py:24(run)
    32579    0.094    0.000   93.955    0.003 fetch_srelay.py:62(fetch_srelay)
       39    0.000    0.000   91.449    2.345 fetch_srelay.py:49(recv)
       78    0.002    0.000   91.449    1.172 fetch_srelay.py:34(recvblock)
      167   91.446    0.548   91.446    0.548 {method 'recv' of '_socket.socket' objects}
       46    0.001    0.000   18.571    0.404 session.py:687(commit)
    95/51    0.001    0.000   18.570    0.364 session.py:351(commit)
      137    0.057    0.000   18.437    0.135 session.py:1790(flush)
    95/51    0.000    0.000   18.414    0.361 session.py:320(_prepare_impl)
       49    0.073    0.001   18.372    0.375 session.py:1841(_flush)
       49    0.001    0.000   17.175    0.351 unitofwork.py:350(execute)
       82    0.015    0.000   16.777    0.205 unitofwork.py:522(execute)
       82    0.001    0.000   16.762    0.204 persistence.py:25(save_obj)
       22    0.365    0.017   15.898    0.723 persistence.py:523(_emit_insert_statements)
    25139    0.102    0.000   14.937    0.001 base.py:597(execute)
    25139    0.202    0.000   14.835    0.001 base.py:727(_execute_clauseelement)
    25139    0.276    0.000   14.466    0.001 base.py:812(_execute_context)
    25139    0.056    0.000   11.485    0.000 default.py:323(do_execute)
    25184    0.310    0.000   11.444    0.000 cursors.py:164(execute)
    25184    0.046    0.000   10.213    0.000 cursors.py:353(_query)
    25184    0.117    0.000   10.117    0.000 cursors.py:315(_do_query)
    25184    9.660    0.000    9.660    0.000 {method 'query' of '_mysql.connection' objects}
    32573    0.445    0.000    3.688    0.000 srelay.py:88(store_enos_rec)
    25139    0.675    0.000    1.729    0.000 default.py:391(_init_compiled)

Without profiling, I'm down to 90 secs for fetching more than 33000 records from 
a raspi and pushing them into a MySQL database. All in pure Python. Fun!

Thanks again, Mike, for this great lib and your even GREATER support.

Cheers,
Pete



Re: [sqlalchemy] Is it considered bad practice to have more than one session instance simultaneously in a web application?

2014-06-29 Thread Bao Niu
Hi Mike,
Thanks for your reply. In my case, the full_name attribute is a hybrid
property built by querying firstName and lastName. When I construct a Person
instance, I need a session to query the names and build the full_name
attribute on the fly. So do you think I should remove this session
immediately after I have built the full_name attribute? What if later on my
application changes this person's firstName? If the session is still alive
it will expire the full_name attribute automatically, but if the session was
removed, there won't be any automatic update on those hybrid properties,
right?
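
For concreteness, a minimal hybrid_property sketch; the Person model below is
an assumption based on the thread's description:

    from sqlalchemy import Column, Integer, String
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.ext.hybrid import hybrid_property

    Base = declarative_base()

    class Person(Base):  # hypothetical model matching the description above
        __tablename__ = 'person'
        id = Column(Integer, primary_key=True)
        firstName = Column(String(50))
        lastName = Column(String(50))

        @hybrid_property
        def full_name(self):
            # Instance side: computed from the already-loaded attributes,
            # so no extra session is needed at access time.
            return self.firstName + ' ' + self.lastName

        @full_name.expression
        def full_name(cls):
            # Class side: a SQL expression, usable in query filters.
            return cls.firstName + ' ' + cls.lastName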

Doesn't this scenario justify a background thread?


On Sat, Jun 28, 2014 at 7:14 AM, Mike Bayer wrote:

>
> On 6/28/14, 7:13 AM, Bao Niu wrote:
>
> My situation is like this:
>
> I am developing a web application, which has a Person class, which has
> *FirstName* and *LastName* attributes. Now I want to build their full
> name attribute and make this *full_name* attribute queryable, by using
> hybrid_property, which entails a query and hence a session. This session for
> querying the hybrid_property lives as long as that particular
> Person instance is active in memory, since during the running process the
> names might get changed and need to be communicated to the database.
>
> In the meantime, in this application I also need another Session instance
> to contain those Person instances themselves, and this Session instance has
> a quite different life cycle than the above one. I am using
> cherrypy.request to hold a thread-local session for this second purpose.
>
> Now it seems to me that both Session instances are necessary; I can't use
> one in place of the other. But because handling two sessions at the same
> time is inherently so confusing sometimes, I wonder whether I am headed in
> the right direction. Is this generally considered bad? If it is, then how
> should I deal with it? Thanks in advance.
>
>
> If this is a web application, having a session that isn't lifecycled to a
> request suggests it runs in some kind of background thread or
> something.   Otherwise, if it's some session that stays open in the
> cherrypy app while the app is doing nothing, and is only used by requests
> (somehow?  a session shouldn't be accessed by multiple things at once),
> that's bad.   If you're using a Session as some kind of
> offline cache, that's not what it's for, and it won't do a good job of that
> because it isn't threadsafe.
>
> think more in terms of database transactions.   I use two sessions all
> the time when I want to separate transactions: a background job is working
> in a transaction for several seconds, but a second short transaction is
> used to write messages to a log table, so that I can see the log table grow
> from the outside while the long transaction keeps going.   But database
> transactions overall should be short, and never dormant waiting for
> something to happen; they should be burning through the work they have to
> do as fast as possible and completing.  So should your sessions.
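
A minimal sketch of that two-session pattern; the JobLog model and the unit of
"work" are assumptions:

    from sqlalchemy import Column, Integer, String, create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    Base = declarative_base()

    class JobLog(Base):  # hypothetical progress-log table
        __tablename__ = 'job_log'
        id = Column(Integer, primary_key=True)
        message = Column(String(200))

    engine = create_engine('sqlite:///jobs.db')
    Base.metadata.create_all(engine)
    Session = sessionmaker(bind=engine)

    work_session = Session()  # long-running transaction for the job itself
    log_session = Session()   # separate session: short, frequent log commits

    for step in range(5):
        # ... do a chunk of the job's work inside work_session here ...
        log_session.add(JobLog(message='finished step %d' % step))
        log_session.commit()  # immediately visible to outside observers

    work_session.commit()     # the job's own transaction completes last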
