Using ORM querying what is the best practice for limiting the output to a
given number of resulting *entities*?
Consider this model:
from sqlalchemy import Column, DateTime, String, Integer, ForeignKey, func
from sqlalchemy.orm import relationship, backref
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
Base = declarative_base()
class Department(Base):
__tablename__ = 'department'
id = Column(Integer, primary_key=True)
name = Column(String)
def __repr__(self):
return f'Department({self.id}, {self.name})'
class Employee(Base):
__tablename__ = 'employee'
id = Column(Integer, primary_key=True)
name = Column(String)
department_id = Column(Integer, ForeignKey('department.id'))
# Use cascade='delete,all' to propagate the deletion of a Department
onto its Employees
department = relationship(
Department,
backref=backref('employees', uselist=True,
cascade='delete,all'))
engine = create_engine('sqlite://', echo=True)
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()
session.add(Department(id=1, name='IT'))
session.add(Department(id=2, name='Finance'))
session.add(Department(id=3, name='Sales'))
session.add(Employee(id=1, name='Victor', department_id=1))
session.add(Employee(id=2, name='Michal', department_id=1))
session.add(Employee(id=3, name='Kinga', department_id=2))
session.add(Employee(id=4, name='Andy', department_id=3))
session.commit()
Now, let's query for the list of Departments given some criteria on the
Employee:
session.query(Department).join(Employee).filter(Employee.name.in_(['Andy',
'Kinga', 'Victor', 'Michal'])).all()
As expected we get:
[Department(1, IT), Department(2, Finance), Department(3, Sales)]
Now suppose our intent is to limit the number of results, and we would
prefer to use a LIMIT clause to do the filtering on the database side:
session.query(Department).join(Employee).filter(Employee.name.in_(['Andy',
'Kinga', 'Victor', 'Michal'])).limit(2).all()
This (in my case) resulted in:
[Department(1, IT)]
The reason for this is that the limit gets applied to the resulting *rows* from
the joined tables, which happen to begin with the two employees from IT
(this is non-deterministic unless order by is also used). Since both
represent the same entity, only one instance is returned.
Other approaches (and their shortcomings) are:
1. Using a DISTINCT clause prior to LIMIT - won't work if any field is
include non-comparable types like IMAGE, BLOB
2. Using a subquery on the Department with LIMIT - this may filter out
departments, which would otherwise match
What works is wrapping the entire query in a subquery, but only selecting a
DISTINCT Department.id field, and using that in the IN clause. It seems
quite convoluted and some databases might not do great to optimize this
away.
sq = session.query(Department.id).join(Employee).filter(Employee.name.in_([
'Andy', 'Kinga', 'Victor', 'Michal'])).distinct().limit(2).subquery()
session.query(Department).filter(Department.id.in_(sq)).all()
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper
http://www.sqlalchemy.org/
To post example code, please provide an MCVE: Minimal, Complete, and Verifiable
Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sqlalchemy.
To view this discussion on the web visit
https://groups.google.com/d/msgid/sqlalchemy/d4e19b87-3442-4b5e-801a-3005fc4c433e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.