Re: [sqlalchemy] Is there any way to get around the "redundancy" of coupled .join and contains_eager chains?

Mike Bayer Mon, 01 Oct 2018 12:41:57 -0700

On Mon, Oct 1, 2018 at 1:46 PM seaders <[email protected]> wrote:
>
> Actually, is there a much, much simpler option of something like "no 
> aliasing" for a query?
>
> I.e.
>
>   EventMarket.query
>   .options(
>     joinedload(EventMarket.event, with_aliases=False)
>   )
>   .filter(Event.date == '2018-10-01')


that would be a much easier option to add.   however you need to be OK
adding your criteria with "filter" and not as part of the ON clause.

feel free to provide a PR for that which includes tests.    im not
sure how hard of a change it is as the joined eager load code is
pretty intricate.  might need some flags propagated in a bunch of
places, not sure.


>
> Or
>
>   EventMarket.query
>   .options(
>     without_aliases,
>     joinedload(EventMarket.event)
>   )
>   .filter(Event.date == '2018-10-01')
>
> Which would produce
>
>   SELECT ...
>   FROM event_market
>   LEFT OUTER JOIN event AS event ON event.id = event_market_1.event_id
>   WHERE event.us_date = '2018-10-01'
>
> Instead of
>
>   SELECT ...
>   FROM event_market, event
>   LEFT OUTER JOIN event AS event_1 ON event_1.id = event_market_1.event_id
>   WHERE event.us_date = '2018-10-01'
>
> Which is my main bone of contention when you don't do the multiple references,
>
>   EventMarket.query
>   .options(
>     joinedload(EventMarket.event)
>   )
>   .filter(Event.date == '2018-10-01')
>
> And that joins the Event table as event_1, but also includes it in the FROM 
> list, as event,
>
>
> I consider that a pretty standard type of query which produces a very 
> incorrect result (all the worse when you've 1000s of event_market rows, and 
> the thing you're working on was quick, but is now really, really slow - 
> because you only meant to get them for 1 day).  Now, I consider the result of 
> that query so different from what you think would happen (you JOIN event, and 
> filter those results in WHERE), that I would at least include a warning that 
> this is probably not what you wanted to happen.  That's the thing with my 
> feeling about the redundancy factor.  It'd be one thing if the Python code 
> above didn't work, and produced an error in SQLAlchemy or with the dB engine, 
> but my problem is the fact that it works, and produces what it does.
>
> But anyway, you're the maintainer and driver of the API, so I'll say no more 
> about that.
>
> With `with_aliases=False` or `without_aliases`, that'd handle all, and even 
> with pathing, it *could* handle that, too.  I've been caught previously with 
> a very similar situation like this specifically because I didn't realise the 
> join of a pathed relationship with a secondary join and table was put in 
> brackets in the query.  It was basically the same thing, I wanted to be using 
> the relationship as the join in the query, but be able to filter based on the 
> tables in that relationship.
>
> Basically,
>
> class Source(Base):
>     id = Column(Integer, primary_key=True)
>     name = Column(String(50), unique=True)
>
> class Sport(Base):
>     id = Column(Integer, primary_key=True)
>     name = Column(String(50), unique=True)
>
> class Competition(Base):
>     id = Column(Integer, primary_key=True)
>     name = Column(String(50), unique=True)
>
>     sport_id = Column(Integer, ForeignKey(Sport.id), primary_key=True)
>     sport = relationship(Sport)
>
> class SourceCompetition(Base):
>     comp_id = Column(Integer, ForeignKey(Competition.id), primary_key=True)
>     comp = relationship(Competition)
>
>     source_id = Column(Integer, ForeignKey(Source.id), primary_key=True)
>     source = relationship(Source)
>
>     source_comp_id = Column(String(50), unique=True)
>
> class SourceSport(Base):
>     sport_id = Column(Integer, ForeignKey(Sport.id), primary_key=True)
>     sport = relationship(Sport)
>
>     source_id = Column(Integer, ForeignKey(Source.id), primary_key=True)
>     source = relationship(Source)
>
>     source_sport_id = Column(String(50), unique=True)
>
>     source_competitions = relationship(
>         SourceCompetition,
>         secondary=lambda: table_for(Competition),
>         primaryjoin=lambda: and_(
>             SourceSport.sport_id == Competition.sport_id,
>             SourceSport.source_id == SourceCompetition.source_id,
>         ),
>         secondaryjoin=lambda: and_(
>             Competition.id == SourceCompetition.comp_id,
>         ),
>         uselist=True,
>     )
>
>
> For the above, I kinda consider it a near identical issue, I've 
> "front-loaded" the logic of how SourceSport and SourceCompetition are 
> connected in the relationship, so then I could run a simple query like,
>
>   SourceSport.query
>   .options(
>     joinedload(SourceSport.source_competitions)
>    .joinedload(SourceCompetition.competition)
>   )
>   .filter(Source.id == 1,
>            Sport.name == 'American Football')
>
> I get problems just with source_competitions, and how it self-references 
> tables,
>
> SELECT ...
> FROM sport, source_sport
>   LEFT OUTER JOIN (competition AS competition_1 INNER JOIN source_competition 
> AS source_competition_1
>                    ON competition_1.id = source_competition_1.comp_id)
>   ON source_sport.sport_id = competition_1.sport_id AND 
> source_sport.source_id = source_competition.source_id
>
> WHERE team_sports_sport.name = %s
>
>
> Leads to,
>
> _mysql_exceptions.OperationalError: (1054, "Unknown column 
> 'source_competition.source_id' in 'on clause'")
>
>
> Never mind using those same tables for extra filtering outside!
>
> Like the request above, if there were an option for without_aliasing,
>
> SELECT ...
> FROM sport, source_sport
>   LEFT OUTER JOIN (competition INNER JOIN source_competition
>                    ON competition.id = source_competition.comp_id)
>   ON source_sport.sport_id = competition.sport_id AND source_sport.source_id 
> = source_competition.source_id
>
> WHERE team_sports_sport.name = %s
>
>
> It works.
>
> How about that, then?  An explicit option for no aliasing?  That way, if 
> someone (me) trips over and falls flat on their face by referencing the same 
> table in two places incorrectly, that's explicitly our problem, because 
> that's the option we've stipulated in the join.
>
>
> On Monday, October 1, 2018 at 4:36:04 PM UTC+1, Mike Bayer wrote:
>>
>>
>> Well again, there's two public API points that do what you want, you just 
>> don't like calling both, so I don't think a fork is needed, there are a lot 
>> of ways to extend Query:
>>
>> 1. you can subclass it.  Such as, override join() and add a flag 
>> "eagerload=True", then apply the contains_eager() option.
>>
>> 2. you can use the with_transformation hook I illustrated earlier:  
>> https://docs.sqlalchemy.org/en/latest/orm/query.html?highlight=with_transformation#sqlalchemy.orm.query.Query.with_transformation
>>
>> 3. you can make your own MapperOption (bug, it seems to not be linked in the 
>> docs right now, it should be, here it is: 
>> https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/interfaces.py#L572).
>>    The only glitch there is that you have to modify the Query in place to do 
>> the query.join() part which might not be that simple.
>>
>> 4. you can just use your own function (not too different from 
>> with_transformation)
>>
>>
>> right off, no API that I like is occurring to me as yet.    I dont like 
>> hardcoding an AND and assuming the existing join condtion should stay, as 
>> the extra_join_condition proposes.  That's a very specific and arbitrary use 
>> case, does not support being able to change the ON clause entirely, does not 
>> support being able to load the objects from a different entity such as a 
>> SELECT statement, does not support being able to change the ORDER BY, etc.   
>>  The Query can already generate exact joins/conditions/orderings etc. using 
>> its existing API which is here focused on the join() method.
>>
>> the query.join(..., eagerload=True) would be the best candidate but it would 
>> only make sense with a limited set of arguments that can be passed to 
>> joinI().  That is, this makes sense:
>>
>> query.join(Foo.bar, eagerload=True)
>>
>> because the relationship is apparent, "Foo.bar".
>>
>> these do not, because no relationship name is apparent:
>>
>> query.join(some_table)
>>
>> query.join(SomeClass, SomeClass.foo == OtherClass.bar)
>>
>>
>> so then the most unambiguous API is :
>>
>> query.join(<whatever>, eagerload=Foo.bar)
>>
>> But even then, what if the eagerload here is A.bs -> B.foo -> Foo.bar, and 
>> explicitly *not* A.cs -> C.foo -> Foo.bar ?  there's no pathing here and it 
>> is again ambiguous (same problem with just a boolean flag also).   The issue 
>> of "pathing" is one that users aren't generally aware of because it doesn't 
>> come into play for simple queries, but once a mapping is more complicated 
>> then they suddenly are aware of it, and it is the main reason why the APIs 
>> here have to be very deliberate in their nature.    Part of SQLAlchemy's 
>> nature is that everything has to be as generalizable as possible.   We try 
>> really hard not to add one-offs that can't be part of a larger composable 
>> vocabulary.
>>
>> It's unfortunately a bad API pattern to take a very explicit and clear API, 
>> that is, query.join().options(contains_eager(...)) and attempt to "shorten" 
>> it with another API that does the exact same thing (e.g. is redundant) but 
>> also introduces complexity and ambiguity (e.g. doesn't even work as well, 
>> introduces bugs, etc).
>>
>> So I have no ideas here as of yet, though I agree if the existing pattern is 
>> a little verbose.
>>
>>> --
>>> SQLAlchemy -
>>> The Python SQL Toolkit and Object Relational Mapper
>>>
>>> http://www.sqlalchemy.org/
>>>
>>> To post example code, please provide an MCVE: Minimal, Complete, and 
>>> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
>>> description.
>>> ---
>>> You received this message because you are subscribed to the Google Groups 
>>> "sqlalchemy" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/sqlalchemy.
>>> For more options, visit https://groups.google.com/d/optout.
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
> ---
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Is there any way to get around the "redundancy" of coupled .join and contains_eager chains?

Reply via email to