Re: [sqlalchemy] Help writing query - "where most recent is true, but any other is false"

Simon King Thu, 14 Dec 2017 02:31:06 -0800

On Wed, Dec 13, 2017 at 10:24 PM, Leo Simmons <l...@si14.io> wrote:
> I have a table `special_product_list` and a table
> `special_product_historical_details_list` that holds information of products
> over time, where there's a record in the history table every day for every
> product.
>
> I want to write a query to select all products from `special_product_list`
> that meet the two following conditions:
>     - the product's most recent `special_product_historical_details_list`
> entry has `available=true`
>     - at least one other `special_product_historical_details_list` entry for
> that product except the most recent one has `available=false`
>
> if the following were the most recent
> `special_product_historical_details_list` entries for a product, only in the
> first situation would I want that product to be returned from the query
>
> p_id | date       | is_available
> ----------------------------
> 1   | 2017-12-13 | true
> 1   | 2017-12-12 | false
> 1   | 2017-12-11 | true
> 1   | 2017-12-10 | true
>
> p_id | date      | is_available
>  -------------------------------
>  1   |2017-12-13 | false
>  1   |2017-12-12 | true
>  1   |2017-12-11 | false
>  1   |2017-12-10 | false
>
> p_id | date      | is_available
>  -------------------------------
>  1   |2017-12-13 | true
>  1   |2017-12-12 | true
>  1   |2017-12-11 | true
>  1   |2017-12-10 | true
>


I would approach this by figuring out the SQL first, and then
translate to SQLAlchemy afterwards.

To match your first condition (most recent row has 'available=true'),
you need a way to find the most recent row for each p_id. Here's one
way (all untested):

SELECT p_id, MAX(date) AS max_date
FROM special_product_historical_details_list
GROUP BY p_id

To find rows where the most recent entry has 'available=true', you
would combine the above subquery with a join back to the
special_product_historical_details_list table again, something like:

SELECT *
FROM special_product_historical_details_list sphdl
INNER JOIN (
    SELECT p_id, MAX(date) AS max_date
    FROM special_product_historical_details_list
    GROUP BY p_id
) lastrows ON (lastrows.p_id = sphdl.p_id AND lastrows.max_date = sphdl.date)
WHERE sphdl.is_available = 'true'

(This assumes that each product has unique date values - you'll get
confusing results if a product ever has more than one row with the
same date)

For your second condition, you want to know if a product has any
"false" entries. (There's no need to include the "except for the most
recent entry" part, because you're going to combine this condition
with the previous one, which already asserts that the most recent
entry is "true"). There are a few ways of writing this (eg. using
EXISTS or ANY with a correlated subquery) but I'm not sure of the
performance characteristics. Here's one way:

SELECT DISTINCT p_id
FROM special_product_historical_details_list
WHERE available = 'false'

The combined query would look something like:

SELECT *
FROM special_product_historical_details_list sphdl
INNER JOIN (
    SELECT p_id, MAX(date) AS max_date
    FROM special_product_historical_details_list
    GROUP BY p_id
) lastrows ON (lastrows.p_id = sphdl.p_id AND lastrows.max_date = sphdl.date)
INNER JOIN (
    SELECT DISTINCT p_id
    FROM special_product_historical_details_list
    WHERE available = 'false'
) havefalse ON sphdl.p_id = havefalse.p_id
WHERE sphdl.is_available = 'true'

> Is there a way I could write this in SqlAlchemy?

You ought to be able to build up the above SQL incrementally using the
Query.subquery() method. See the example at:

http://docs.sqlalchemy.org/en/latest/orm/tutorial.html#using-subqueries

If you wanted to use the EXISTS version of the second condition, you
would need something like:

http://docs.sqlalchemy.org/en/latest/orm/tutorial.html#using-exists

Hope that helps,

Simon

>
> Here are my models:
>
> class SpecialProduct(Product):
>     __tablename__ =  'special_product_list'
>     special_id = db.Column(String(64), nullable=False, index=True,
> primary_key=True)
>     history = relationship("SpecialProductHistoricalDetails",
>                             back_populates='special_product',
>
> foreign_keys='SpecialProductHistoricalDetails.special_id')
>     most_recent_historical_details_id = db.Column(Integer,
>
> ForeignKey('special_product_historical_details_list.id'))
>     most_recent_historical_details_entry =
> relationship("SpecialProductHistoricalDetails",
>                             uselist=False,
> foreign_keys=[most_recent_historical_details_id])
>
> class SpecialProductReviewsHistoricalDetails(db.Model):
>     __tablename__ = 'special_product_historical_details_list'
>     id = db.Column(Integer, primary_key=True)
>     special_id = db.Column(String(64),
> ForeignKey('special_product_list.special_id'), nullable=False)
>     special_product = relationship("SpecialProduct", uselist=False,
> foreign_keys=[special_id])
>     time_updated = db.Column(TIMESTAMP, default=func.now(),
>                              onupdate=func.now(),
> server_default=text("CURRENT_TIMESTAMP"))
>

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Help writing query - "where most recent is true, but any other is false"

Reply via email to