Re: [sqlalchemy] Transforming SQL to SQLAlchemy

Mike Bayer Mon, 27 Jul 2015 10:37:03 -0700


On 7/27/15 12:19 PM, D.S. Ljungmark wrote:

Thanks!
Going through this of how to go backwards from a query to SQLAlchemy
is the part that I was missing. I'm personally not wedded to the
Postgres method, actually the opposite. But it's also been the one
call that didn't stall massively on larger datasets.

The performance characteristics are pretty strange, as the one that
you suggested at first went up way high, but goes down again when you
disable seqscan..

Here are the explain outputs (ignoring my version, as that is even
slower and actually pages out to disk for sorting due to having too
many rows)


And frankly, I do not have the knowledge I need to understand "why"
the difference between seqscan enabled and disabled is happening.
Guessing that my indexes aren't correctly set up, but as said, I lack
the necessary DBA skills for this. I feel as if I'm in over my head.


Now, since this query is turning out to be a whole lot more annoying
to work on than I'd ever imagined, is there a better way of
representing my data store for this?

Duplicating the latest timestamp in a 1:1 relationship, or something else?

i think your schema is fine, profiling is a whole can of beans and I'drecommend googling PG documentation and blogs on that. However,correlated subqueries are always less than ideal - the query you havethat uses just JOIN and GROUP BY should perform the best. The use caseof "select all the X with a max/min/count/avg/etc. of related Y" is verycommon and the join to subquery w/ GROUP BY approach is the way to doit. Also, SQL is "declarative" but is a leaky abstraction; you alwaysneed to be savvy about things like indexes, table scans, etc. That'sjust how it is.


//D.S.




caramel=# explain analyze SELECT (c).csr_id, (c).not_before,
(c).not_after FROM (SELECT (SELECT c FROM certificate c WHERE
c.csr_id=csr.id ORDER BY c.not_after DESC LIMIT 1) AS c FROM csr
offset 0) s;

                   QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Subquery Scan on s  (cost=0.00..989627.43 rows=196332 width=32)
(actual time=4.288..6130.111 rows=196332 loops=1)
    ->  Limit  (cost=0.00..987664.11 rows=196332 width=4) (actual
time=4.286..6030.657 rows=196332 loops=1)
          ->  Index Only Scan using csr_rejected_idx4 on csr
(cost=0.00..987664.11 rows=196332 width=4) (actual
time=4.284..5968.457 rows=196332 loops=1)
                Heap Fetches: 0
                SubPlan 1
                  ->  Limit  (cost=0.00..4.96 rows=1 width=2029)
(actual time=0.029..0.029 rows=1 loops=196332)
                        ->  Index Scan Backward using
certificate_csr_id_not_after_idx on certificate c  (cost=0.00..24.80
rows=5 width=2029) (actual time=0.028..0.028 rows=1 loops=196332)
                              Index Cond: (csr_id = csr.id)
  Total runtime: 6163.357 ms



caramel=# explain analyze  select c.csr_id, c.not_before, c.not_after
from certificate as c
join csr on c.csr_id = csr.id
where c.csr_id = (
     select csr_id from certificate where c.csr_id=csr.id
     order by c.not_after desc limit 1
);

          QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=16391.49..395380.31 rows=5 width=20) (actual
time=21552.556..21574.528 rows=1 loops=1)
    Hash Cond: (c.csr_id = csr.id)
    Join Filter: ((SubPlan 1) = c.csr_id)
    Rows Removed by Join Filter: 571550
    ->  Seq Scan on certificate c  (cost=0.00..289383.98 rows=555898
width=20) (actual time=1.029..16866.441 rows=571551 loops=1)
    ->  Hash  (cost=13937.34..13937.34 rows=196332 width=4) (actual
time=88.475..88.475 rows=196332 loops=1)
          Buckets: 32768  Batches: 1  Memory Usage: 6903kB
          ->  Index Only Scan using csr_rejected_idx4 on csr
(cost=0.00..13937.34 rows=196332 width=4) (actual time=0.018..42.298
rows=196332 loops=1)
                Heap Fetches: 0
    SubPlan 1
      ->  Limit  (cost=0.00..0.14 rows=1 width=4) (actual
time=0.006..0.006 rows=1 loops=571551)
            ->  Result  (cost=0.00..75707.40 rows=555898 width=4)
(actual time=0.006..0.006 rows=1 loops=571551)
                  One-Time Filter: (c.csr_id = csr.id)
                  ->  Index Only Scan using certificate_csr_id_idx on
certificate  (cost=0.00..75707.40 rows=555898 width=4) (actual
time=0.005..0.005 rows=1 loops=571551)
                        Heap Fetches: 0
  Total runtime: 21574.638 ms
(16 rows)


caramel=# SET enable_seqscan = OFF;
SET
caramel=# explain analyze  select c.csr_id, c.not_before, c.not_after
from certificate as c
join csr on c.csr_id = csr.id
where c.csr_id = (
     select csr_id from certificate where c.csr_id=csr.id
     order by c.not_after desc limit 1
);

          QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Merge Join  (cost=1600.52..2226996.44 rows=5 width=20) (actual
time=0.026..12796.020 rows=1 loops=1)
    Merge Cond: (csr.id = c.csr_id)
    Join Filter: ((SubPlan 1) = c.csr_id)
    Rows Removed by Join Filter: 571550
    ->  Index Only Scan using csr_rejected_idx4 on csr
(cost=0.00..13937.34 rows=196332 width=4) (actual time=0.007..422.941
rows=196332 loops=1)
          Heap Fetches: 0
    ->  Index Scan using certificate_csr_id_idx on certificate c
(cost=0.00..2127137.37 rows=555898 width=20) (actual
time=0.005..8224.643 rows=571551 loops=1)
    SubPlan 1
      ->  Limit  (cost=0.00..0.14 rows=1 width=4) (actual
time=0.006..0.006 rows=1 loops=571551)
            ->  Result  (cost=0.00..75707.40 rows=555898 width=4)
(actual time=0.006..0.006 rows=1 loops=571551)
                  One-Time Filter: (c.csr_id = csr.id)
                  ->  Index Only Scan using certificate_csr_id_idx on
certificate  (cost=0.00..75707.40 rows=555898 width=4) (actual
time=0.005..0.005 rows=1 loops=571551)
                        Heap Fetches: 0
  Total runtime: 12798.066 ms
(14 rows)


caramel=# \d certificate
                                      Table "public.certificate"
    Column   |            Type             |
Modifiers
------------+-----------------------------+----------------------------------------------------------
  id         | integer                     | not null default
nextval('certificate_id_seq'::regclass)
  pem        | text                        | not null
  not_before | timestamp without time zone | not null
  not_after  | timestamp without time zone | not null
  csr_id     | integer                     | not null
Indexes:
     "certificate_pkey" PRIMARY KEY, btree (id)
     "certificate_csr_id_idx" btree (csr_id)
     "certificate_csr_id_not_after_idx" btree (csr_id, not_after)
Foreign-key constraints:
     "certificate_csr_id_fkey" FOREIGN KEY (csr_id) REFERENCES csr(id)

caramel=# \d csr
                                   Table "public.csr"
    Column   |         Type          |                    Modifiers
------------+-----------------------+--------------------------------------------------
  id         | integer               | not null default
nextval('csr_id_seq'::regclass)
  sha256sum  | character(64)         | not null
  pem        | text                  | not null
  orgunit    | character varying(64) |
  commonname | character varying(64) |
  rejected   | boolean               |
Indexes:
     "csr_pkey" PRIMARY KEY, btree (id)
     "csr_sha256sum_key" UNIQUE CONSTRAINT, btree (sha256sum)
     "csr_rejected_idx4" btree (id, rejected)
Referenced by:
     TABLE "accesslog" CONSTRAINT "accesslog_csr_id_fkey" FOREIGN KEY
(csr_id) REFERENCES csr(id)
     TABLE "certificate" CONSTRAINT "certificate_csr_id_fkey" FOREIGN
KEY (csr_id) REFERENCES csr(id)
     TABLE "icky" CONSTRAINT "icky_csr_id_fkey" FOREIGN KEY (csr_id)
REFERENCES csr(id)


On Mon, Jul 27, 2015 at 5:05 PM, Mike Bayer <[email protected]> wrote:


On 7/27/15 3:47 AM, D.S. Ljungmark wrote:

But we're not looking for one row in the Certificate table, what it
does is, for _every_ CSR, take the Certificate metadata that matches
the _oldest_ certificate.


whoops, right I looked at it for too long and didn't see us selecting all of
csr.

The equivalent using a Join would be the following (the below is the
generated code):

SELECT certificate.csr_id AS certificate_csr_id,
certificate.not_before AS certificate_not_before,
certificate.not_after AS certificate_not_after
FROM certificate JOIN (SELECT certificate.csr_id AS csr_id,
max(certificate.not_after) AS not_after
FROM certificate GROUP BY certificate.csr_id) AS latest ON
certificate.csr_id = latest.csr_id AND certificate.not_after =
latest.not_after

And while I'm pretty certain the above query isn't in any way the
optimal one,  I'm still a bit out of my depth on the SQL side of
things.

I bet if you run EXPLAIN ANALYZE on that query it will perform better (or
definitely not worse) than the correlated subquery version.

Here's another correlated subquery version that uses standard SQL, just move
the correlated subq you have to the WHERE clause so that it only need refer
to one column:

select c.csr_id, c.not_before, c.not_after from certificate as c
join csr on c.csr_id = csr.id
where c.csr_id = (
     select csr_id from certificate where c.csr_id=csr.id
     order by c.not_after desc limit 1
)


now suppose that no, you really want to use the Postgresql-specific syntax.
That's fine.   There shoudn't be much need to convert it to a SQL expression
construct - you can just run session.execute("your query").     If you need
ORM objects back from it, run
session.query(Certificate).from_statement("your query").


Otherwise, to begin supporting PG's syntaxes, you'd need to use the compiler
extension to build up constructs that do all of these special Postgresql
syntaxes (http://docs.sqlalchemy.org/en/rel_1_0/core/compiler.html).

PG's syntaxes in this area are geared towards turning SQL itself into an
"object oriented" programming language (see
http://ledgersmbdev.blogspot.com/2012/08/intro-to-postgresql-as-object.html
for a long series on this philosophy) which is *potentially* awkward in
conjunction with a tool like SQLAlchemy, which is trying to build up similar
kinds of patterns on the Python side instead.   This is why SQLAlchemy
doesn't get into this area of PG syntax; it would require an entirely new
set of constructs that work smoothly and consistently with PG's way of
thinking here, and would first off be huge; it really should be a
third-party extension that attempts to support all, or a large portion, of
PG's "object-relational" system.  But after all that,  it may very well turn
out that using another programming language like Python on top of all of
that starts to look redundant (or maybe it would be amazing, I don't know.
It's just way outside the scope of "SQL mapping" because PG is really trying
to invent their own language in this area). If your application is tailored
towards this architecture that might suggest that it prefers Postgresql's
syntaxes directly, and you only need an object marshalling layer in your
app.

SQLAlchemy has always stressed that literal SQL is always an option.
Historically it has meant that you lose flexibility with ORM features like
eager loads and such, so I've added
https://bitbucket.org/zzzeek/sqlalchemy/issues/3501/ for 1.1, which takes
advantage of some improved result-row-mapping features in 1.0 so that we can
finally map any raw SQL of any kind directly to full object graphs without
losing any flexibility.



On Sun, Jul 26, 2015 at 8:33 PM, Mike Bayer <[email protected]>
wrote:


On 7/26/15 12:56 PM, Dennis Ljungmark wrote:

Hello, all.

     I am trying to reshape the (somewhat tricky) SQL query below into
something SQLAlchemy can generate, and I just hit a wall:

This is the query:
SELECT (c).csr_id, (c).not_before, (c).not_after FROM (SELECT (SELECT c
FROM
certificate c WHERE c.csr_id=csr.id ORDER BY c.not_after DESC LIMIT 1) AS
c
FROM csr offset 0) s;

And, I just can't seem to get SQLAlchemy to emit a correlated scalar
subquery like this.

correlated scalar subqueries are illustrated in the Core tutorial at
http://docs.sqlalchemy.org/en/rel_1_0/core/tutorial.html#scalar-selects.
I'm not sure why inklesspen's example isn't making use of as_scalar() or
stmt.label(), which would be required.     Query supports the same use if
you are working with ORM queries.

However, the statement "SELECT c FROM certificate c" and the "(c).csr_id"
syntax appears to be using a very unusual postgresql-specific syntax
which
is rolling the whole inner row into a single-column tuple and then
referencing it that way, which is odd because the row itself is already a
tuple, not sure what forcing it into a tuple that contains exactly one
tuple
accomplishes here.

While we can certainly use tuple_() and probably create some extra custom
constructs around it to support this syntax, I see nothing in this
specific
statement to justify the use of these syntaxes, nor the need to produce
correlated scalar subqueries when we just are looking for one row from
the
"certificate" table.  I don't at the moment see why you can't use this
statement, which is easier to read, platform agnostic, and way more
efficient:

SELECT c.csr_id, c.not_before, c.not_after FROM certificate AS c JOIN csr
ON
csr.id = c.csr_id ORDER BY c.not_after DESC LIMIT 1










There ought to be something, but for the life of me I can't seem to make
it
happen.

InklessPen tried to help me on IRC, and we got something, but it's still
not
quite the same,
   https://gist.github.com/inklesspen/49e69e1f33f3852d348a

A reduced model is below,


class CSR(Base):
      certificates = _orm.relationship("Certificate", backref="csr")

class Certificate(Base):
      not_before = _sa.Column(_sa.DateTime, nullable=False)
      not_after = _sa.Column(_sa.DateTime, nullable=False)
      csr_id = _fkcolumn(CSR.id, nullable=False)


Regards,
    D.S.

--
You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


--
You received this message because you are subscribed to a topic in the
Google Groups "sqlalchemy" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/sqlalchemy/ooNUhATaTEU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
[email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


--
You received this message because you are subscribed to a topic in the
Google Groups "sqlalchemy" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/sqlalchemy/ooNUhATaTEU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
[email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


--
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Transforming SQL to SQLAlchemy

Reply via email to