Brian Granger wrote:
>
> Hi,
>
> I am using sqlalchemy through its orm layer.  The basic flow of my
> program is this:
>
> 1.  Download about 18000 web pages and store the raw HTML in a table
> 2.  Iterate through each web page and use pyparsing to parse it.  I
> then insert the results of that
> parsing into about a dozen different tables.
>
> The second step is taking a long time (above and beyond the fixed
> parsing time) and I want to make sure that I am handling the session
> in an optimal manner.  Here is a sketch of what I do in step 2:
>
> session = make_my_session()
> query = session.query(MyModel)
> for obj in query:    # 18,000 of these
>     try:
>         obj.parse()  # creates a bunch of orm classes in my db (in
> about 12 tables)
>         session.commit()
>     except:
>         session.rollback()
>
> Is this a good way of doing this?  Is there a better/faster way?

Upgrade to 0.5.5 for starters.  Around version 0.5.2 we killed an immense
speed issue in the session that was brought about by sessions with large
numbers of objects.

Second (this applies to either version): do the session.commit() every
100 objects or so, which reduces how many objects continue to hang
around in memory.
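A minimal sketch of the batched-commit approach, using a hypothetical Page
model and an in-memory SQLite database in place of the poster's MyModel and
make_my_session() (the parse() body here is just a placeholder for the real
pyparsing step):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import sessionmaker
try:
    from sqlalchemy.orm import declarative_base   # SQLAlchemy 1.4+
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Page(Base):
    """Hypothetical stand-in for the poster's MyModel."""
    __tablename__ = "pages"
    id = Column(Integer, primary_key=True)
    html = Column(String)
    parsed = Column(String)

    def parse(self):
        # placeholder for the real pyparsing work
        self.parsed = self.html.upper()

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

# set up some sample rows
session = Session()
session.add_all([Page(html="page %d" % i) for i in range(250)])
session.commit()

# commit once per 100 objects instead of once per object
BATCH = 100
for i, page in enumerate(session.query(Page).all(), 1):
    try:
        page.parse()
    except Exception:
        session.rollback()
        continue
    if i % BATCH == 0:
        session.commit()
session.commit()  # flush the final partial batch
```

(Loading the full result up front with .all() is itself a memory cost for
18000 rows; the windowing advice in point #3 addresses that part.)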

Third, querying 18,000 objects at once requires that they all be loaded
at once.  You may want to use yield_per(), or load many "windows" of
results using limit()/offset().
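A minimal sketch of both options, again with a hypothetical Page model and
an in-memory SQLite database standing in for the poster's setup.  The
windowed limit()/offset() loop lets you commit once per window; yield_per()
streams rows in chunks and is best kept to read-only scans, since committing
mid-iteration can invalidate the underlying cursor:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import sessionmaker
try:
    from sqlalchemy.orm import declarative_base   # SQLAlchemy 1.4+
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Page(Base):
    """Hypothetical stand-in for the poster's MyModel."""
    __tablename__ = "pages"
    id = Column(Integer, primary_key=True)
    html = Column(String)
    parsed = Column(String)

    def parse(self):
        self.parsed = self.html.upper()  # placeholder for real parsing

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

session = Session()
session.add_all([Page(html="page %d" % i) for i in range(2500)])
session.commit()

# Load "windows" of results via limit()/offset().  An ORDER BY on a
# stable column is essential so windows neither overlap nor skip rows.
WINDOW = 1000
offset = 0
while True:
    window = (session.query(Page)
              .order_by(Page.id)
              .limit(WINDOW)
              .offset(offset)
              .all())
    if not window:
        break
    for page in window:
        page.parse()
    session.commit()   # one commit per window
    offset += WINDOW

# yield_per() fetches rows 100 at a time instead of all at once;
# here it is used for a read-only pass with no commits in the loop.
total = sum(1 for page in session.query(Page).yield_per(100))
```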

For #2 and #3, using 0.5.5 is again strongly recommended, since as of
that same 0.5.2 release the Session is also able to drop references to
objects fully, so memory usage will be improved.


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~----------~----~----~----~------~----~------~--~---
