On Apr 4, 2008, at 11:09 AM, GK wrote:

>
> I tried initially without any additional column indexes, and (observed
> by watching debug output) the program started off quickly enough, but
> after a hundred or so records the rate of insertion started to drop
> off dramatically.  I reasoned that this was because the code was being
> forced to perform a linear search for each new entry processed, and
> hypothesized that this would be improved by judicious addition of some
> additional column indexes.  I duly did this, but the performance is
> not observably improved.  Code snippets below.
>
> I'm using Python 2.5.1, SA 0.4.4 and SQLite 3.5.7
>
> I'm not looking for blinding performance, just "some time today".  I'm
> trying to process about 1300 source records, and so far it's taken a
> couple of hours at 100% CPU utilization.  The sqlite data file is
> still zero length.  I can't tell how far it's got through the source
> data.

1300 rows, even if individually fetched and inserted one by one (which  
it seems is what youre doing), shouldn't take more than a minute.  So  
some questions to ask are, how big is this table you're selecting  
from ?   You are filtering on about ten columns.  What is important to  
note there is that if you did in fact place indexes on all ten columns  
to speed up selecting, you'd also directly impact the speed of  
insertion negatively.  So thats something to consider.

>
>
> My questions, then, are:
> 1. does the SQLAlchemy ORM/query/filter_by use column indexes to
> optimize queries?

your database is what would be taking advantage of column indexes.  If  
you tell SQLA to select from a table and filter on these 10 columns,  
theres no particular optimization to be done at the SQL construction  
layer.

>
> 2. if so, are there any particular steps I need to take to ensure this
> happens?

you should experiment with your databases "explain plan" feature to  
show what indexes are taking effect for the SQL being issued.   With  
SQLA, turn on SQL echoing to see the conversation taking place.

> As a supplementary question, are there any good reference materials
> for SA whose intended audience level lies somewhere between the
> excellent introductory tutorials and concise API reference available
> at the web site?

There is an oreilly book, which I havent seen yet, coming out in  
June.    I cant speak to its accuracy or up-to-dateness since we  
release new features very frequently.

>
> and the access code like this:
>
>                # Update occurrence count or add new message
>                emlq = session.query(Email).filter_by(
>                    sendadr  = sadr,
>                    fromadr  = fadr,
>                    toadr    = tadr,
>                    sendnam  = snam,
>                    fromnam  = fnam,
>                    tonam    = tnam,
>                    subject  = msgsubj,
>                    received = msgrcvd,
>                    spam     = msgspam,
>                    folderid = folderid)
>                try:
>                    eml = emlq.one()
>                except Exception, e:
>                    eml = Email(
>                        sadr, snam, fadr, fnam, tadr, tnam,
>                        msgsubj, msgrcvd, msgspam,
>                        mboxname, folderid)
>                eml.occurs += 1
>                session.save_or_update(eml)

One thing you definitely want to do here is flush() your session after  
several rows.  The reason for your latency might be just lots of  
pending data building up in the session unflushed.

Additionally, for a datafile import I'd probably not use the ORM at  
all and use straight SQL constructs, i.e.  
mytable.select().where(sendadr=sadr, fromadr=fadr, ...), and then  
mytable.update() or mytable.insert() depending on the results.  The  
SQL expression tutorial lays it all out how to use those.



--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to