Re: [sqlalchemy] How should I eagerly load a property from multiple levels of a self-referential table?

2018-10-26 Thread Alex Wang
Ah, ok. I thought defaultload() meant use whatever was originally specified 
in the relationship(). That helps a lot!

On Friday, October 26, 2018 at 12:34:53 PM UTC-4, Mike Bayer wrote:
>
> On Fri, Oct 26, 2018 at 12:10 PM Alex Wang  > wrote: 
> > 
> > I ended up needing one more layer of selectinload(Node.direct_children), 
> but it got rid of all the lazy loads. Thanks for the suggestion; that's 
> definitely a lot better than what I initially had. I was thinking that I 
> needed to specify eager loading separately, but now that I think more about 
> your suggestion the redundancy becomes clearer. 
> > 
> > There's one more layer of complications --- if I need to get multiple 
> entries off of each node, is there a clean-ish way of eagerly loading those 
> attributes in addition to the entries mentioned in my original post? Or 
> should I just write a function like what you suggested and use that? 
>
> if you want to keep specifying the same path and add more attributes, 
> defaultload() helps with that: 
>
> selectinload(Foo.bar).selectinload(Bar.attribute1) 
> defaultload(Foo.bar).selectinload(Bar.attribute2) 
> defaultload(Foo.bar).selectinload(Bar.attribute3) 
>
> above, the "defaultload" just means, "here is a token in the path but 
> don't change the existing loading options on this token". 
>
>
>
>
>
> > 
> > And just to make sure, selectinloads should be the right choice for this 
> kind of nested collection, right? 
> > 
> > Thanks! 
> > 
> > On Thursday, October 25, 2018 at 6:54:53 PM UTC-4, Mike Bayer wrote: 
> >> 
> >> On Thu, Oct 25, 2018 at 4:21 PM Alex Wang  wrote: 
> >> > 
> >> > Hi all! 
> >> > 
> >> > I'm trying to write a small script to interface with a database 
> controlled by a third-party application, and I'm not sure the way I set up 
> eager loading is right. 
> >> > 
> >> > The code I have looks something like the following: 
> >> > 
> >> > from sqlalchemy.ext.declarative import declarative_base 
> >> > from sqlalchemy.dialects.mssql import FLOAT, NVARCHAR, 
> UNIQUEIDENTIFIER 
> >> > from sqlalchemy.orm import foreign, relationship, remote 
> >> > from sqlalchemy.schema import Column 
> >> > Base = declarative_base() 
> >> > class Node(Base): 
> >> > NodeID = Column(NVARCHAR(length=65), primary_key=True, 
> nullable=False) 
> >> > ParentNodeID = Column(NVARCHAR(length=65), nullable=True) 
> >> > direct_children = relationship('Node', primaryjoin=(NodeID == 
> remote(foreign(ParentNodeID 
> >> > entries = relationship('Entry', primaryjoin=('Node.NodeID == 
> remote(foreign(Entry.NodeID))') 
> >> > 
> >> > def node_and_all_children(self): 
> >> > result = [self] 
> >> > for child in self.direct_children: 
> >> > result += child.node_and_all_children() 
> >> > return result 
> >> > 
> >> > def cost(self): 
> >> > entries = [e for p in self.node_and_all_children() for e in 
> p.entries] 
> >> > return sum(e.Value1 * e.Value2 for e in entries) 
> >> > 
> >> > class Entry(Base): 
> >> > TEID = Column(UNIQUEIDENTIFIER, primary_key=True, nullable=False) 
> >> > NodeID = Column(NVARCHAR(length=65), nullable=False) 
> >> > Value1 = Column(FLOAT, nullable=True) 
> >> > Value2 = Column(FLOAT, nullable=True) 
> >> > 
> >> > I want to write something like this: 
> >> > 
> >> > def get_costs(session, node_ids: List[str]): 
> >> > nodes = 
> session.query(Node).filter(Node.NodeID.in_(node_ids)).all() 
> >> > return {n.NodeID: n.cost() for n in nodes} 
> >> > 
> >> > From what I understand, this results in an N+1-ish access pattern. I 
> can eagerly load the children easily enough (I think I can guarantee <= 3 
> levels of children, so three selectinload() calls are hopefully enough?): 
> >> > 
> >> > def get_costs(session, node_ids: List[str]): 
> >> > nodes = ( 
> >> > session.query(Node) 
> >> > .options( 
> >> > selectinload(Node.direct_children) 
> >> > .selectinload(Node.direct_children) 
> >> > .selectinload(Node.direct_children) 
> >> > ) 
> >> > .filter(Node.NodeID.in_(node_ids)) 
> >> > .all() 
> >> > ) 
> >> > return {n.NodeID: n.cost() for n in nodes} 
> >> > 
> >> > It's here that I get stuck, though. This eagerly loads the children, 
> but doesn't eagerly load each node's entries, which results in a query 
> being sent to the database for each child. I can specify that entries 
> should be eagerly loaded at each level: 
> >> > 
> >> > def get_costs(session, node_ids: List[str]): 
> >> > nodes = ( 
> >> > session.query(Node) 
> >> > .options( 
> >> > selectinload(Node.direct_children) 
> >> > .selectinload(Node.direct_children) 
> >> > .selectinload(Node.direct_children), 
> >> > selectinload(Node.entries), 
> >> > 
> selectinload(Node.direct_children).selectinload(Node.entries), 
> >> > 
> selectinlo

Re: [sqlalchemy] How should I eagerly load a property from multiple levels of a self-referential table?

2018-10-26 Thread Mike Bayer
On Fri, Oct 26, 2018 at 12:10 PM Alex Wang  wrote:
>
> I ended up needing one more layer of selectinload(Node.direct_children), but 
> it got rid of all the lazy loads. Thanks for the suggestion; that's 
> definitely a lot better than what I initially had. I was thinking that I 
> needed to specify eager loading separately, but now that I think more about 
> your suggestion the redundancy becomes clearer.
>
> There's one more layer of complications --- if I need to get multiple entries 
> off of each node, is there a clean-ish way of eagerly loading those 
> attributes in addition to the entries mentioned in my original post? Or 
> should I just write a function like what you suggested and use that?

if you want to keep specifying the same path and add more attributes,
defaultload() helps with that:

selectinload(Foo.bar).selectinload(Bar.attribute1)
defaultload(Foo.bar).selectinload(Bar.attribute2)
defaultload(Foo.bar).selectinload(Bar.attribute3)

above, the "defaultload" just means, "here is a token in the path but
don't change the existing loading options on this token".





>
> And just to make sure, selectinloads should be the right choice for this kind 
> of nested collection, right?
>
> Thanks!
>
> On Thursday, October 25, 2018 at 6:54:53 PM UTC-4, Mike Bayer wrote:
>>
>> On Thu, Oct 25, 2018 at 4:21 PM Alex Wang  wrote:
>> >
>> > Hi all!
>> >
>> > I'm trying to write a small script to interface with a database controlled 
>> > by a third-party application, and I'm not sure the way I set up eager 
>> > loading is right.
>> >
>> > The code I have looks something like the following:
>> >
>> > from sqlalchemy.ext.declarative import declarative_base
>> > from sqlalchemy.dialects.mssql import FLOAT, NVARCHAR, UNIQUEIDENTIFIER
>> > from sqlalchemy.orm import foreign, relationship, remote
>> > from sqlalchemy.schema import Column
>> > Base = declarative_base()
>> > class Node(Base):
>> > NodeID = Column(NVARCHAR(length=65), primary_key=True, nullable=False)
>> > ParentNodeID = Column(NVARCHAR(length=65), nullable=True)
>> > direct_children = relationship('Node', primaryjoin=(NodeID == 
>> > remote(foreign(ParentNodeID
>> > entries = relationship('Entry', primaryjoin=('Node.NodeID == 
>> > remote(foreign(Entry.NodeID))')
>> >
>> > def node_and_all_children(self):
>> > result = [self]
>> > for child in self.direct_children:
>> > result += child.node_and_all_children()
>> > return result
>> >
>> > def cost(self):
>> > entries = [e for p in self.node_and_all_children() for e in 
>> > p.entries]
>> > return sum(e.Value1 * e.Value2 for e in entries)
>> >
>> > class Entry(Base):
>> > TEID = Column(UNIQUEIDENTIFIER, primary_key=True, nullable=False)
>> > NodeID = Column(NVARCHAR(length=65), nullable=False)
>> > Value1 = Column(FLOAT, nullable=True)
>> > Value2 = Column(FLOAT, nullable=True)
>> >
>> > I want to write something like this:
>> >
>> > def get_costs(session, node_ids: List[str]):
>> > nodes = session.query(Node).filter(Node.NodeID.in_(node_ids)).all()
>> > return {n.NodeID: n.cost() for n in nodes}
>> >
>> > From what I understand, this results in an N+1-ish access pattern. I can 
>> > eagerly load the children easily enough (I think I can guarantee <= 3 
>> > levels of children, so three selectinload() calls are hopefully enough?):
>> >
>> > def get_costs(session, node_ids: List[str]):
>> > nodes = (
>> > session.query(Node)
>> > .options(
>> > selectinload(Node.direct_children)
>> > .selectinload(Node.direct_children)
>> > .selectinload(Node.direct_children)
>> > )
>> > .filter(Node.NodeID.in_(node_ids))
>> > .all()
>> > )
>> > return {n.NodeID: n.cost() for n in nodes}
>> >
>> > It's here that I get stuck, though. This eagerly loads the children, but 
>> > doesn't eagerly load each node's entries, which results in a query being 
>> > sent to the database for each child. I can specify that entries should be 
>> > eagerly loaded at each level:
>> >
>> > def get_costs(session, node_ids: List[str]):
>> > nodes = (
>> > session.query(Node)
>> > .options(
>> > selectinload(Node.direct_children)
>> > .selectinload(Node.direct_children)
>> > .selectinload(Node.direct_children),
>> > selectinload(Node.entries),
>> > selectinload(Node.direct_children).selectinload(Node.entries),
>> > 
>> > selectinload(Node.direct_children).selectinload(Node.direct_children).selectinload(Node.entries)
>> > )
>> > .filter(Node.NodeID.in_(node_ids))
>> > .all()
>> > )
>> > return {n.NodeID: n.cost() for n in nodes}
>> >
>> > But this feels pretty gross, and I'm hoping there is a better way.
>>
>> you don't have to specify selectinload() twice like that, you can say:
>>
>> selectinload(Node.entries),
>> selectinload(N

Re: [sqlalchemy] How should I eagerly load a property from multiple levels of a self-referential table?

2018-10-26 Thread Alex Wang
I ended up needing one more layer of selectinload(Node.direct_children), 
but it got rid of all the lazy loads. Thanks for the suggestion; that's 
definitely a lot better than what I initially had. I was thinking that I 
needed to specify eager loading separately, but now that I think more about 
your suggestion the redundancy becomes clearer.

There's one more layer of complications --- if I need to get multiple 
entries off of each node, is there a clean-ish way of eagerly loading those 
attributes in addition to the entries mentioned in my original post? Or 
should I just write a function like what you suggested and use that?

And just to make sure, selectinloads should be the right choice for this 
kind of nested collection, right?

Thanks!

On Thursday, October 25, 2018 at 6:54:53 PM UTC-4, Mike Bayer wrote:
>
> On Thu, Oct 25, 2018 at 4:21 PM Alex Wang > 
> wrote: 
> > 
> > Hi all! 
> > 
> > I'm trying to write a small script to interface with a database 
> controlled by a third-party application, and I'm not sure the way I set up 
> eager loading is right. 
> > 
> > The code I have looks something like the following: 
> > 
> > from sqlalchemy.ext.declarative import declarative_base 
> > from sqlalchemy.dialects.mssql import FLOAT, NVARCHAR, UNIQUEIDENTIFIER 
> > from sqlalchemy.orm import foreign, relationship, remote 
> > from sqlalchemy.schema import Column 
> > Base = declarative_base() 
> > class Node(Base): 
> > NodeID = Column(NVARCHAR(length=65), primary_key=True, 
> nullable=False) 
> > ParentNodeID = Column(NVARCHAR(length=65), nullable=True) 
> > direct_children = relationship('Node', primaryjoin=(NodeID == 
> remote(foreign(ParentNodeID 
> > entries = relationship('Entry', primaryjoin=('Node.NodeID == 
> remote(foreign(Entry.NodeID))') 
> > 
> > def node_and_all_children(self): 
> > result = [self] 
> > for child in self.direct_children: 
> > result += child.node_and_all_children() 
> > return result 
> > 
> > def cost(self): 
> > entries = [e for p in self.node_and_all_children() for e in 
> p.entries] 
> > return sum(e.Value1 * e.Value2 for e in entries) 
> > 
> > class Entry(Base): 
> > TEID = Column(UNIQUEIDENTIFIER, primary_key=True, nullable=False) 
> > NodeID = Column(NVARCHAR(length=65), nullable=False) 
> > Value1 = Column(FLOAT, nullable=True) 
> > Value2 = Column(FLOAT, nullable=True) 
> > 
> > I want to write something like this: 
> > 
> > def get_costs(session, node_ids: List[str]): 
> > nodes = session.query(Node).filter(Node.NodeID.in_(node_ids)).all() 
> > return {n.NodeID: n.cost() for n in nodes} 
> > 
> > From what I understand, this results in an N+1-ish access pattern. I can 
> eagerly load the children easily enough (I think I can guarantee <= 3 
> levels of children, so three selectinload() calls are hopefully enough?): 
> > 
> > def get_costs(session, node_ids: List[str]): 
> > nodes = ( 
> > session.query(Node) 
> > .options( 
> > selectinload(Node.direct_children) 
> > .selectinload(Node.direct_children) 
> > .selectinload(Node.direct_children) 
> > ) 
> > .filter(Node.NodeID.in_(node_ids)) 
> > .all() 
> > ) 
> > return {n.NodeID: n.cost() for n in nodes} 
> > 
> > It's here that I get stuck, though. This eagerly loads the children, but 
> doesn't eagerly load each node's entries, which results in a query being 
> sent to the database for each child. I can specify that entries should be 
> eagerly loaded at each level: 
> > 
> > def get_costs(session, node_ids: List[str]): 
> > nodes = ( 
> > session.query(Node) 
> > .options( 
> > selectinload(Node.direct_children) 
> > .selectinload(Node.direct_children) 
> > .selectinload(Node.direct_children), 
> > selectinload(Node.entries), 
> > 
> selectinload(Node.direct_children).selectinload(Node.entries), 
> > 
> selectinload(Node.direct_children).selectinload(Node.direct_children).selectinload(Node.entries)
>  
>
> > ) 
> > .filter(Node.NodeID.in_(node_ids)) 
> > .all() 
> > ) 
> > return {n.NodeID: n.cost() for n in nodes} 
> > 
> > But this feels pretty gross, and I'm hoping there is a better way. 
>
> you don't have to specify selectinload() twice like that, you can say: 
>
> selectinload(Node.entries), 
> selectinload(Node.direct_children).selectinload(Node.entries), 
> selectinload(Node.direct_children).selectinload(Node.direct_children).selectinload(Node.entries)
>  
>
>
> if you are using query.options(), that's the API.  if you put the 
> selectinload on your relationship, you can use join_depth: 
>
> direct_children = relationship(Node, lazy="selectin", join_depth=3) 
>
> The options form of this doesn't have a "join_depth" option right now 
> but you can certainly make yourself a function, somethi

Re: [sqlalchemy] How should I eagerly load a property from multiple levels of a self-referential table?

2018-10-25 Thread Mike Bayer
On Thu, Oct 25, 2018 at 4:21 PM Alex Wang  wrote:
>
> Hi all!
>
> I'm trying to write a small script to interface with a database controlled by 
> a third-party application, and I'm not sure the way I set up eager loading is 
> right.
>
> The code I have looks something like the following:
>
> from sqlalchemy.ext.declarative import declarative_base
> from sqlalchemy.dialects.mssql import FLOAT, NVARCHAR, UNIQUEIDENTIFIER
> from sqlalchemy.orm import foreign, relationship, remote
> from sqlalchemy.schema import Column
> Base = declarative_base()
> class Node(Base):
> NodeID = Column(NVARCHAR(length=65), primary_key=True, nullable=False)
> ParentNodeID = Column(NVARCHAR(length=65), nullable=True)
> direct_children = relationship('Node', primaryjoin=(NodeID == 
> remote(foreign(ParentNodeID
> entries = relationship('Entry', primaryjoin=('Node.NodeID == 
> remote(foreign(Entry.NodeID))')
>
> def node_and_all_children(self):
> result = [self]
> for child in self.direct_children:
> result += child.node_and_all_children()
> return result
>
> def cost(self):
> entries = [e for p in self.node_and_all_children() for e in p.entries]
> return sum(e.Value1 * e.Value2 for e in entries)
>
> class Entry(Base):
> TEID = Column(UNIQUEIDENTIFIER, primary_key=True, nullable=False)
> NodeID = Column(NVARCHAR(length=65), nullable=False)
> Value1 = Column(FLOAT, nullable=True)
> Value2 = Column(FLOAT, nullable=True)
>
> I want to write something like this:
>
> def get_costs(session, node_ids: List[str]):
> nodes = session.query(Node).filter(Node.NodeID.in_(node_ids)).all()
> return {n.NodeID: n.cost() for n in nodes}
>
> From what I understand, this results in an N+1-ish access pattern. I can 
> eagerly load the children easily enough (I think I can guarantee <= 3 levels 
> of children, so three selectinload() calls are hopefully enough?):
>
> def get_costs(session, node_ids: List[str]):
> nodes = (
> session.query(Node)
> .options(
> selectinload(Node.direct_children)
> .selectinload(Node.direct_children)
> .selectinload(Node.direct_children)
> )
> .filter(Node.NodeID.in_(node_ids))
> .all()
> )
> return {n.NodeID: n.cost() for n in nodes}
>
> It's here that I get stuck, though. This eagerly loads the children, but 
> doesn't eagerly load each node's entries, which results in a query being sent 
> to the database for each child. I can specify that entries should be eagerly 
> loaded at each level:
>
> def get_costs(session, node_ids: List[str]):
> nodes = (
> session.query(Node)
> .options(
> selectinload(Node.direct_children)
> .selectinload(Node.direct_children)
> .selectinload(Node.direct_children),
> selectinload(Node.entries),
> selectinload(Node.direct_children).selectinload(Node.entries),
> 
> selectinload(Node.direct_children).selectinload(Node.direct_children).selectinload(Node.entries)
> )
> .filter(Node.NodeID.in_(node_ids))
> .all()
> )
> return {n.NodeID: n.cost() for n in nodes}
>
> But this feels pretty gross, and I'm hoping there is a better way.

you don't have to specify selectinload() twice like that, you can say:

selectinload(Node.entries),
selectinload(Node.direct_children).selectinload(Node.entries),
selectinload(Node.direct_children).selectinload(Node.direct_children).selectinload(Node.entries)

if you are using query.options(), that's the API.  if you put the
selectinload on your relationship, you can use join_depth:

direct_children = relationship(Node, lazy="selectin", join_depth=3)

The options form of this doesn't have a "join_depth" option right now
but you can certainly make yourself a function, something like this:

def selectinload_selfref(attr, depth):
opt = selectinload(attr)
for i in range(depth - 1):
   opt = opt.selectinload(attr)
   return opt




>
> Is there a cleaner way of specifying that entries should be eagerly loaded 
> for each level of children? Or is there a totally different query structure 
> that would be even better?
>
> Thanks!
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
> ---
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

-- 
SQLAlchemy - 
The Python SQL Toolk

[sqlalchemy] How should I eagerly load a property from multiple levels of a self-referential table?

2018-10-25 Thread Alex Wang
Hi all!

I'm trying to write a small script to interface with a database controlled 
by a third-party application, and I'm not sure the way I set up eager 
loading is right.

The code I have looks something like the following:

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.dialects.mssql import FLOAT, NVARCHAR, UNIQUEIDENTIFIER
from sqlalchemy.orm import foreign, relationship, remote
from sqlalchemy.schema import Column
Base = declarative_base()
class Node(Base):
NodeID = Column(NVARCHAR(length=65), primary_key=True, nullable=False)
ParentNodeID = Column(NVARCHAR(length=65), nullable=True)
direct_children = relationship('Node', primaryjoin=(NodeID == remote(
foreign(ParentNodeID
entries = relationship('Entry', primaryjoin=('Node.NodeID == 
remote(foreign(Entry.NodeID))')

def node_and_all_children(self):
result = [self]
for child in self.direct_children:
result += child.node_and_all_children()
return result

def cost(self):
entries = [e for p in self.node_and_all_children() for e in p.
entries]
return sum(e.Value1 * e.Value2 for e in entries)

class Entry(Base):
TEID = Column(UNIQUEIDENTIFIER, primary_key=True, nullable=False)
NodeID = Column(NVARCHAR(length=65), nullable=False)
Value1 = Column(FLOAT, nullable=True)
Value2 = Column(FLOAT, nullable=True)

I want to write something like this:

def get_costs(session, node_ids: List[str]):
nodes = session.query(Node).filter(Node.NodeID.in_(node_ids)).all()
return {n.NodeID: n.cost() for n in nodes}

>From what I understand, this results in an N+1-ish access pattern. I can 
eagerly load the children easily enough (I think I can guarantee <= 3 
levels of children, so three selectinload() calls are hopefully enough?):

def get_costs(session, node_ids: List[str]):
nodes = (
session.query(Node)
.options(
selectinload(Node.direct_children)
.selectinload(Node.direct_children)
.selectinload(Node.direct_children)
)
.filter(Node.NodeID.in_(node_ids))
.all()
)
return {n.NodeID: n.cost() for n in nodes}

It's here that I get stuck, though. This eagerly loads the children, but 
doesn't eagerly load each node's entries, which results in a query being 
sent to the database for each child. I can specify that entries should be 
eagerly loaded at each level:

def get_costs(session, node_ids: List[str]):
nodes = (
session.query(Node)
.options(
selectinload(Node.direct_children)
.selectinload(Node.direct_children)
.selectinload(Node.direct_children),
selectinload(Node.entries),
selectinload(Node.direct_children).selectinload(Node.entries),
selectinload(Node.direct_children).selectinload(Node.
direct_children).selectinload(Node.entries)
)
.filter(Node.NodeID.in_(node_ids))
.all()
)
return {n.NodeID: n.cost() for n in nodes}

But this feels pretty gross, and I'm hoping there is a better way.

Is there a cleaner way of specifying that entries should be eagerly loaded 
for each level of children? Or is there a totally different query structure 
that would be even better?

Thanks!

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.