Re: [sqlalchemy] Combining yield_per and eager loading

2016-07-15 Thread Mike Bayer



On 07/15/2016 05:14 PM, Martijn van Oosterhout wrote:


On 15 July 2016 at 18:46, Mike Bayer wrote:


Here's the problem that can't be solved:

1. fetch rows 0-1000

2. start loading Foo objects:

 Foo(1) ->  eagerload Foo.related with 400 members
 Foo(2) ->  eagerload Foo.related with 500 members
 Foo(3) -> eagerload Foo.related with the first 100 of 250 members

3. yield.  That is, stop reading more rows.  Send out Foo(1),
Foo(2), Foo(3).  The application now can go nuts with these.  It can
read them, write them, iterate through .related.

Right there, we're broken.  Foo(3) is out in the world with less
than half of its ".related" collection - they are still sitting on
the cursor!


Ok, I think I see what you're getting at, but I don't think that's a
problem here. It looks like you have a query which joins all the
relationships in one go. That's not what I'm doing; I'm relying on the
fact that I can interleave the queries. At the SQL level it would look like so:

DECLARE my_cursor CURSOR FOR SELECT * FROM bigtable;
FETCH 1000 FROM my_cursor;
SELECT * FROM related_table WHERE bigtable_id IN (1, 2, 3, 4, 5, ...)
SELECT * FROM related_table2 WHERE bigtable_id IN (1, 2, 3, 4, 5, ...)
SELECT * FROM related_table JOIN second_order_relationship USING (pkey)
WHERE bigtable_id IN (1, 2, 3, 4, 5, ...)

FETCH 1000 FROM my_cursor;
SELECT * FROM related_table WHERE bigtable_id IN (11, 12, 13, 14, 15, ...)
SELECT * FROM related_table2 WHERE bigtable_id IN (11, 12, 13, 14, 15, ...)
SELECT * FROM related_table JOIN second_order_relationship USING (pkey)
WHERE bigtable_id IN (11, 12, 13, 14, 15, ...)

<... repeat ...>
(This may be a PostgreSQL-specific feature; I'm not sure.)

Yes, this may mean that some of the related objects are fetched
multiple times, but that is not a problem for me (the whole thing is
read-only anyway). What I need, however, is for the secondary queries to
populate the relationships in the original BigTable objects.

Vastly simpler is to use the recommended window functions to do
pagination of any scale with no chance of complex failures.


A bit of googling suggests you are referring to this:
https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/WindowedRangeQuery
which indeed looks very neat and much easier than what I was thinking.
It doesn't say explicitly, but it looks like it will work transparently
with eager loading. It basically does the above, but skips the cursor
and replaces it with queries on ranges of the primary key (which is
shorter and probably more efficient to boot).
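As a rough illustration of that idea, here is a keyset-pagination sketch using the stdlib sqlite3 module. This is a simplification of the linked recipe (the actual WindowedRangeQuery recipe computes ranges with window functions up front), and the table and names here are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bigtable (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO bigtable VALUES (1,'a'),(2,'b'),(5,'c'),(9,'d'),(12,'e');
""")

def windowed_chunks(conn, chunk_size=2):
    """Yield rows in chunks, keyed on the last-seen primary key
    instead of holding a server-side cursor open."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, name FROM bigtable WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size)).fetchall()
        if not rows:
            break
        yield rows
        last_id = rows[-1][0]  # next chunk starts after the last key seen

chunks = list(windowed_chunks(conn))
```

Because each chunk is an independent query, eager loading can run normally within each chunk, which is why this approach composes where yield_per does not.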


OK, but the IN loading you're doing, that's good too; I'd like to add a 
loader which uses that someday, and you're right, if we did, we could 
make it work with yield_per too due to the nature of it.  If you want to 
use that approach, take a look at 
https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/DisjointEagerLoading.
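For illustration, the interleaved fetch-then-IN pattern from the SQL above can be sketched with the stdlib sqlite3 module. The schema and names are hypothetical stand-ins for bigtable/related_table, and plain tuples and dicts stand in for mapped objects and their relationship collections; this is not actual SQLAlchemy loader code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bigtable (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE related_table (id INTEGER PRIMARY KEY,
                                bigtable_id INTEGER, value TEXT);
    INSERT INTO bigtable VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO related_table VALUES (10, 1, 'x'), (11, 1, 'y'), (12, 2, 'z');
""")

def iter_with_related(conn, chunk_size=2):
    """Fetch parent rows in chunks, then issue one IN query per chunk
    to attach each parent's complete 'related' collection."""
    cursor = conn.execute("SELECT id, name FROM bigtable ORDER BY id")
    while True:
        rows = cursor.fetchmany(chunk_size)  # stands in for FETCH n FROM my_cursor
        if not rows:
            break
        ids = [r[0] for r in rows]
        placeholders = ",".join("?" * len(ids))
        related = {}
        for _rid, bid, value in conn.execute(
                "SELECT id, bigtable_id, value FROM related_table "
                f"WHERE bigtable_id IN ({placeholders})", ids):
            related.setdefault(bid, []).append(value)
        for pid, name in rows:
            # each parent leaves the loop with its full collection attached
            yield pid, name, related.get(pid, [])

result = list(iter_with_related(conn))
```

Because the IN query is scoped to exactly the parents in the current chunk, every yielded parent has a complete collection, which is the property that makes this style compatible with chunked iteration.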




Thanks for the tip!
--
Martijn van Oosterhout
http://svana.org/kleptog/

--
You received this message because you are subscribed to the Google
Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.




Re: [sqlalchemy] Combining yield_per and eager loading

2016-07-15 Thread Martijn van Oosterhout
On 15 July 2016 at 18:46, Mike Bayer  wrote:


> Here's the problem that can't be solved:
>
> 1. fetch rows 0-1000
>
> 2. start loading Foo objects:
>
>  Foo(1) ->  eagerload Foo.related with 400 members
>  Foo(2) ->  eagerload Foo.related with 500 members
>  Foo(3) -> eagerload Foo.related with the first 100 of 250 members
>
> 3. yield.  That is, stop reading more rows.  Send out Foo(1), Foo(2),
> Foo(3).  The application now can go nuts with these.  It can read them,
> write them, iterate through .related.
>
> Right there, we're broken.  Foo(3) is out in the world with less than half
> of its ".related" collection - they are still sitting on the cursor!
>
>
Ok, I think I see what you're getting at, but I don't think that's a problem
here. It looks like you have a query which joins all the relationships in
one go. That's not what I'm doing; I'm relying on the fact that I can
interleave the queries. At the SQL level it would look like so:

DECLARE my_cursor CURSOR FOR SELECT * FROM bigtable;
FETCH 1000 FROM my_cursor;
SELECT * FROM related_table WHERE bigtable_id IN (1, 2, 3, 4, 5, ...)
SELECT * FROM related_table2 WHERE bigtable_id IN (1, 2, 3, 4, 5, ...)
SELECT * FROM related_table JOIN second_order_relationship USING (pkey)
WHERE bigtable_id IN (1, 2, 3, 4, 5, ...)

FETCH 1000 FROM my_cursor;
SELECT * FROM related_table WHERE bigtable_id IN (11, 12, 13, 14, 15, ...)
SELECT * FROM related_table2 WHERE bigtable_id IN (11, 12, 13, 14, 15, ...)
SELECT * FROM related_table JOIN second_order_relationship USING (pkey)
WHERE bigtable_id IN (11, 12, 13, 14, 15, ...)

<... repeat ...>
(This may be a PostgreSQL-specific feature; I'm not sure.)

Yes, this may mean that some of the related objects are fetched multiple
times, but that is not a problem for me (the whole thing is read-only
anyway). What I need, however, is for the secondary queries to populate the
relationships in the original BigTable objects.

Vastly simpler is to use the recommended window functions to do pagination
> of any scale with no chance of complex failures.
>
>
A bit of googling suggests you are referring to this:
https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/WindowedRangeQuery
which indeed looks very neat and much easier than what I was thinking. It
doesn't say explicitly, but it looks like it will work transparently with
eager loading. It basically does the above, but skips the cursor and
replaces it with queries on ranges of the primary key (which is shorter and
probably more efficient to boot).

Thanks for the tip!
-- 
Martijn van Oosterhout  http://svana.org/kleptog/



Re: [sqlalchemy] Combining yield_per and eager loading

2016-07-15 Thread Mike Bayer



On 07/15/2016 12:08 PM, Martijn van Oosterhout wrote:

Ok, so SQLAlchemy has this nice feature where you can eager load
relationships to significantly reduce the number of queries during
processing.

On the other hand, you can use yield_per() (on Postgres) to significantly
reduce memory usage by not loading the entire result set into memory at
once.

For very good reasons mentioned in the documentation you can't use both
of these in the same query, yet that is kind of my goal. What I'd like
to achieve, for a given query which goes over a big table:

while not end of resultset:
   take 1000 results
   eagerload all the relationships
   process them



Here's the problem that can't be solved:

1. fetch rows 0-1000

2. start loading Foo objects:

 Foo(1) ->  eagerload Foo.related with 400 members
 Foo(2) ->  eagerload Foo.related with 500 members
 Foo(3) -> eagerload Foo.related with the first 100 of 250 members

3. yield.  That is, stop reading more rows.  Send out Foo(1), Foo(2), 
Foo(3).  The application now can go nuts with these.  It can read them, 
write them, iterate through .related.


Right there, we're broken.  Foo(3) is out in the world with less than 
half of its ".related" collection - they are still sitting on the cursor!


So let's try: don't actually yield Foo(3) until we've seen Foo(4), or 
the result has ended.  That is, let's try to be clever and look at the 
next row to see that we're definitely done with Foo(3).  This is 
quite complicated to do, though I did get it working when I tried to 
make this happen.


This is more insidious, because it will actually work most of the time. 
However, it won't work when we don't actually get objects in order 
like that.  If Foo.related is a many-to-many, we could see a Bar in one 
yield batch, then see it all over again in another batch later.  Suppose 
the user application did all kinds of things to that Bar() the first 
time around; now we're seeing it again, but the identity map has already 
been garbage collected.  Now that Bar() is stale *before it ever left 
its result set*, not to mention that if that Bar() has more eagerloaded 
collections, they're all wrong too!  It's a disaster.


Vastly simpler is to use the recommended window functions to do 
pagination of any scale with no chance of complex failures.









Now, the eager loading part is posing difficulties (or I'm not reading
the documentation carefully enough). I found the
attributes.set_committed_value() function, which solves half the
problem, but I still need to generate the actual query to return the
necessary objects. So perhaps (pseudo-code):

def eagerload_for_set(object_list, relationship):
    ids = set(o.get(relationship.left_id) for o in object_list)
    lookup = Query(relationship.right_table).filter(
        relationship.right_column.in_(ids)).all()
    for o in object_list:
        set_committed_value(o, relationship.left, lookup[relationship.left_id])

Before I go diving into SQLAlchemy to make the above actually work,
does it seem reasonable? Are there any handy utils somewhere that might
help?

Thanks for any ideas,

Have a nice day,

--
Martijn van Oosterhout







[sqlalchemy] Combining yield_per and eager loading

2016-07-15 Thread Martijn van Oosterhout
Ok, so SQLAlchemy has this nice feature where you can eager load 
relationships to significantly reduce the number of queries during 
processing.

On the other hand, you can use yield_per() (on Postgres) to significantly 
reduce memory usage by not loading the entire result set into memory at 
once.

For very good reasons mentioned in the documentation you can't use both of 
these in the same query, yet that is kind of my goal. What I'd like to 
achieve, for a given query which goes over a big table:

while not end of resultset:
   take 1000 results
   eagerload all the relationships
   process them

Now, the eager loading part is posing difficulties (or I'm not reading the 
documentation carefully enough). I found the 
attributes.set_committed_value() function, which solves half the problem, 
but I still need to generate the actual query to return the necessary 
objects. So perhaps (pseudo-code):

def eagerload_for_set(object_list, relationship):
    ids = set(o.get(relationship.left_id) for o in object_list)
    lookup = Query(relationship.right_table).filter(
        relationship.right_column.in_(ids)).all()
    for o in object_list:
        set_committed_value(o, relationship.left, lookup[relationship.left_id])

Before I go diving into SQLAlchemy to make the above actually work, 
does it seem reasonable? Are there any handy utils somewhere that might 
help?

Thanks for any ideas,

Have a nice day,

-- 
Martijn van Oosterhout





Re: [sqlalchemy] Rolling back the session in a context manager

2016-07-15 Thread Mike Bayer



On 07/15/2016 07:49 AM, Alex Grönholm wrote:

The documentation provides the following example snippet for using
sessions within a context manager:


so, back when I started putting "examples" in those docs, the idea was 
like, "hey, here's an *example*.  The Python programmer is free to do 
whatever they wish with these examples, and adjust as necessary".


That is, the reason something is an example and not a feature is, "you 
don't have to do it this way!  do it however you want".


There's been a trend recently of examples being used as-is, and then, 
when the example lacks some feature, resulting in bug reports against 
the library itself (not this case, but a different case recently comes 
to mind).  That is sadly the opposite of what I had intended.  Of 
course, examples can be modified as needed.





@contextmanager
def session_scope():
    """Provide a transactional scope around a series of operations."""
    session = Session()
    try:
        yield session
        session.commit()
    except:
        session.rollback()
        raise
    finally:
        session.close()

I've been wondering why there is an except: block there. Shouldn't
session.close() be enough? At least according to the documentation, the
active transaction is rolled back by default when the connection is
returned to the pool.



that is correct.  However .close() does not reset the state of the 
objects managed by the Session to be "expired", which arguably is 
necessary because without the transaction, you now have no idea what the 
state of the object's corresponding rows in the database are (this is 
what the whole "SQLAlchemy Session: In Depth" talk is about).


In reality, the above context manager is probably not that useful 
because it bundles the lifespan of the Session and the lifespan of a 
transaction together, and IMO an application should be more thoughtful 
than that.




This snippet has a second potential problem: what if the transaction is
in a bad state when exiting the block? Shouldn't session.commit() be
skipped then?


it's assumed that if anything is in "a bad state" then an exception 
would have been raised, you'd not reach commit().


Otherwise, if the idea is, "I'm using this context manager, but I'm not 
sure I want to commit at the end even though nothing was raised", well 
then this is not the context manager for you :). The example of 
contextmanagers for things like writing files and such sets up the 
convention of, "open resource, flush out all changes at the end if no 
exceptions".   That's what people usually want.



Like, if not session.is_active: session.commit()? Let's say the user 
code catches IntegrityError but doesn't roll back.

if it doesn't re-raise, then we'd hit the commit() and that would 
probably fail also (depending on backend).  I don't see how that's 
different from:


with open("important_file.txt", "w") as handle:
    handle.write("important thing #1")
    handle.write("important thing #2")
    try:
        important_thing_number_three = calculate_special_thing()
        handle.write(important_thing_number_three)
    except TerribleException:
        log.info("oh crap! someone should fix this someday.")
    handle.write("important thing #4")





The example code will then raise an exception when it tries to commit 
the session transaction. Am I missing something?


On the better backends like Postgresql, it would.

If there's a use case you're looking for here, e.g. catch an 
IntegrityError but not leave the transaction, that's what savepoints are 
for.   There should be examples of those in the documentation.
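At the raw-SQL level the savepoint pattern looks like this, sketched with the stdlib sqlite3 module (with the ORM you would use Session.begin_nested(), which emits the equivalent SAVEPOINT / ROLLBACK TO SAVEPOINT statements; the table and savepoint name here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions explicitly
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")

conn.execute("BEGIN")
conn.execute("INSERT INTO t VALUES (1)")
conn.execute("SAVEPOINT sp1")
try:
    conn.execute("INSERT INTO t VALUES (1)")  # violates the primary key
except sqlite3.IntegrityError:
    # roll back only to the savepoint; the outer transaction stays usable
    conn.execute("ROLLBACK TO SAVEPOINT sp1")
conn.execute("INSERT INTO t VALUES (2)")
conn.execute("COMMIT")

rows = [r[0] for r in conn.execute("SELECT id FROM t ORDER BY id")]
```

The IntegrityError is caught, yet the surrounding transaction still commits cleanly, which is exactly the "catch but don't leave the transaction" case.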



Now, if someone on IRC is using savepoints with IntegrityError and the 
context manager above, and they're on Python 2 and using MySQL and 
getting deadlock errors and can't see the original cause, there's a very 
specific sad situation going on with that, which is 
https://bitbucket.org/zzzeek/sqlalchemy/issues/2696, and I can help them 
with that.  But that's using the built-in context managers.   Using your 
own context manager is a great way to get around that bug :).









Re: [sqlalchemy] How to get unambiguous column name for selectable?

2016-07-15 Thread Mike Bayer

I wasn't sure what the question was here.

To get the first ORM entity, use column_descriptions:

query = query.order_by(query.column_descriptions[0]["entity"].name)


http://docs.sqlalchemy.org/en/latest/orm/query.html?highlight=column_descriptions#sqlalchemy.orm.query.Query.column_descriptions

On 07/15/2016 04:49 AM, Andrew Pashkin wrote:

Bump.





[sqlalchemy] Rolling back the session in a context manager

2016-07-15 Thread Alex Grönholm
The documentation provides the following example snippet for using sessions 
within a context manager:

@contextmanager
def session_scope():
    """Provide a transactional scope around a series of operations."""
    session = Session()
    try:
        yield session
        session.commit()
    except:
        session.rollback()
        raise
    finally:
        session.close()

I've been wondering why there is an except: block there. Shouldn't 
session.close() be enough? At least according to the documentation, the 
active transaction is rolled back by default when the connection is 
returned to the pool.
This snippet has a second potential problem: what if the transaction is in 
a bad state when exiting the block? Shouldn't session.commit() be skipped 
then? Like, if not session.is_active: session.commit()? Let's say the user 
code catches IntegrityError but doesn't roll back. The example code will 
then raise an exception when it tries to commit the session transaction. Am 
I missing something?



Re: [sqlalchemy] How to get unambiguous column name for selectable?

2016-07-15 Thread Andrew Pashkin

Bump.
