Re: [HACKERS] Can simplify 'limit 1' with slow function?
Fascinating. On Fri, Jul 04, 2014 at 10:47:06AM +0800, gotoschool6g wrote: > slow query(8531 ms): > SELECT ST_Distance_Sphere(shape,ST_GeomFromText('POINT(116.41386186784513 > 40.12211338311868)')) FROM road order by id LIMIT 1; > > explain output: > "Limit (cost=4653.48..4653.48 rows=1 width=3612)" > " -> Sort (cost=4653.48..4683.06 rows=11832 width=3612)" > "Sort Key: id" > "-> Seq Scan on road (cost=0.00..4594.32 rows=11832 width=3612)" > > fast query(16ms): > select ST_Distance_Sphere(shape,ST_GeomFromText('POINT(116.41386186784513 > 40.12211338311868)')) from (SELECT shape FROM road order by id LIMIT 1) a > > explain output: > "Subquery Scan on a (cost=1695.48..1695.74 rows=1 width=3608)" > " -> Limit (cost=1695.48..1695.48 rows=1 width=3612)" > "-> Sort (cost=1695.48..1725.06 rows=11832 width=3612)" > " Sort Key: road.id" > " -> Seq Scan on road (cost=0.00..1636.32 rows=11832 > width=3612)" So Postgres knows perfectly well that it's expensive, it just doesn't appear to understand it has the option of moving the calculation above the limit. In this case though, it seems an index on road(id) would make it instant in any case. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > He who writes carelessly confesses thereby at the very outset that he does > not attach much importance to his own thoughts. -- Arthur Schopenhauer signature.asc Description: Digital signature
Re: [HACKERS] Can simplify 'limit 1' with slow function?
slow query(8531 ms): SELECT ST_Distance_Sphere(shape,ST_GeomFromText('POINT(116.41386186784513 40.12211338311868)')) FROM road order by id LIMIT 1; explain output: "Limit (cost=4653.48..4653.48 rows=1 width=3612)" " -> Sort (cost=4653.48..4683.06 rows=11832 width=3612)" "Sort Key: id" "-> Seq Scan on road (cost=0.00..4594.32 rows=11832 width=3612)" fast query(16ms): select ST_Distance_Sphere(shape,ST_GeomFromText('POINT(116.41386186784513 40.12211338311868)')) from (SELECT shape FROM road order by id LIMIT 1) a explain output: "Subquery Scan on a (cost=1695.48..1695.74 rows=1 width=3608)" " -> Limit (cost=1695.48..1695.48 rows=1 width=3612)" "-> Sort (cost=1695.48..1725.06 rows=11832 width=3612)" " Sort Key: road.id" " -> Seq Scan on road (cost=0.00..1636.32 rows=11832 width=3612)" CREATE TABLE road ( shape geometry, id integer ) WITH ( OIDS=FALSE ); There are redundant call when sorting? > On Tue, Jul 1, 2014 at 2:16 PM, Martijn van Oosterhout > wrote: > > On Sun, Jun 29, 2014 at 10:05:50PM +0800, gotoschool6g wrote: > >> The simplified scene: > >> select slowfunction(s) from a order by b limit 1; > >> is slow than > >> select slowfunction(s) from (select s from a order by b limit 1) as z; > >> if there are many records in table 'a'. > >> > >> > >> The real scene. Function ST_Distance_Sphere is slow, the query: > >> SELECT ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from road > >> order by c limit 1; > >> is slow than: > >> select ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from (SELECT s > >> from road order by c limit 1) as a; > >> There are about 7000 records in 'road'. > > > > I think to help here I think we need the EXPLAIN ANALYSE output for > > both queries. > > Well, I think the problem is a well understood one: there is no > guarantee that functions-in-select-list are called exactly once per > output row. This is documented -- for example see here: > http://www.postgresql.org/docs/9.1/static/explicit-locking.html#ADVISORY-LOCKS. > > In short, if you want very precise control of function evaluation use > a subquery, or, if you're really paranoid, a CTE. I'm probably dense, but I'm not sure I understand. Or it is that the slowfunction() is called prior to the sort? That seems insane. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > He who writes carelessly confesses thereby at the very outset that he does > not attach much importance to his own thoughts. -- Arthur Schopenhauer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Can simplify 'limit 1' with slow function?
Martijn van Oosterhout writes: > On Wed, Jul 02, 2014 at 04:17:13PM -0400, Tom Lane wrote: >> It's not so much the limit as that the sort has to happen before the >> limit, and yes, evaluation of the targetlist happens before the sort. > I guess I assumed the column c was indexable, and it that case I > beleive the slowfunction() would indeed only be called once. There are cases where we can avoid an explicit sort step by relying on some earlier phase of the processing pipeline to generate the rows in the right order to begin with. Evidently this wasn't one of them though :-(. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Can simplify 'limit 1' with slow function?
On Wed, Jul 02, 2014 at 04:17:13PM -0400, Tom Lane wrote: > David G Johnston writes: > > Martijn van Oosterhout wrote > >> I'm probably dense, but I'm not sure I understand. Or it is that the > >> slowfunction() is called prior to the sort? That seems insane. > > > The basic reality is that limit applies to the final set of rows that could > > be output. > > It's not so much the limit as that the sort has to happen before the > limit, and yes, evaluation of the targetlist happens before the sort. I guess I assumed the column c was indexable, and it that case I beleive the slowfunction() would indeed only be called once. > This is fundamental to the SQL conceptual model; remember that SQL92 had > "SELECT slowfunction(), ... ORDER BY 1", which certainly requires the > function to be evaluated before the sort happens. And there's nothing in > the conceptual model suggesting that different targetlist entries should > be evaluated at different times, so just ordering by something other than > the slowfunction() entry doesn't get you out of that. > > I'm not sure how much of this there is chapter and verse for in the > SQL standard, but ISTM the stage sequencing we lay out in our SELECT > reference page is pretty much forced by the standard. In the conceptual model the limit must happen after the select. But as an optimisation moving the evaluation above the limit node (when possible) should always be a win. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > He who writes carelessly confesses thereby at the very outset that he does > not attach much importance to his own thoughts. -- Arthur Schopenhauer signature.asc Description: Digital signature
Re: [HACKERS] Can simplify 'limit 1' with slow function?
David G Johnston writes: > Martijn van Oosterhout wrote >> I'm probably dense, but I'm not sure I understand. Or it is that the >> slowfunction() is called prior to the sort? That seems insane. > The basic reality is that limit applies to the final set of rows that could > be output. It's not so much the limit as that the sort has to happen before the limit, and yes, evaluation of the targetlist happens before the sort. This is fundamental to the SQL conceptual model; remember that SQL92 had "SELECT slowfunction(), ... ORDER BY 1", which certainly requires the function to be evaluated before the sort happens. And there's nothing in the conceptual model suggesting that different targetlist entries should be evaluated at different times, so just ordering by something other than the slowfunction() entry doesn't get you out of that. I'm not sure how much of this there is chapter and verse for in the SQL standard, but ISTM the stage sequencing we lay out in our SELECT reference page is pretty much forced by the standard. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Can simplify 'limit 1' with slow function?
Martijn van Oosterhout wrote > On Tue, Jul 01, 2014 at 02:36:55PM -0500, Merlin Moncure wrote: >> On Tue, Jul 1, 2014 at 2:16 PM, Martijn van Oosterhout >> < > kleptog@ > > wrote: >> > On Sun, Jun 29, 2014 at 10:05:50PM +0800, gotoschool6g wrote: >> >> The simplified scene: >> >> select slowfunction(s) from a order by b limit 1; >> >> is slow than >> >> select slowfunction(s) from (select s from a order by b limit 1) as z; >> >> if there are many records in table 'a'. >> >> >> >> >> >> The real scene. Function ST_Distance_Sphere is slow, the query: >> >> SELECT ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from road >> order by c limit 1; >> >> is slow than: >> >> select ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from >> (SELECT s from road order by c limit 1) as a; >> >> There are about 7000 records in 'road'. >> > >> > I think to help here I think we need the EXPLAIN ANALYSE output for >> > both queries. >> >> Well, I think the problem is a well understood one: there is no >> guarantee that functions-in-select-list are called exactly once per >> output row. This is documented -- for example see here: >> http://www.postgresql.org/docs/9.1/static/explicit-locking.html#ADVISORY-LOCKS. >> In short, if you want very precise control of function evaluation use >> a subquery, or, if you're really paranoid, a CTE. > > I'm probably dense, but I'm not sure I understand. Or it is that the > slowfunction() is called prior to the sort? That seems insane. The basic reality is that limit applies to the final set of rows that could be output. Since stuff like group by and distinct require knowledge of the exact values of every output column all expressions must necessarily be evaluated before limit. If you want to pick just 10 rows and then process them you need a subquery. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Can-simplify-limit-1-with-slow-function-tp5809997p5810297.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Can simplify 'limit 1' with slow function?
On Tue, Jul 01, 2014 at 02:36:55PM -0500, Merlin Moncure wrote: > On Tue, Jul 1, 2014 at 2:16 PM, Martijn van Oosterhout > wrote: > > On Sun, Jun 29, 2014 at 10:05:50PM +0800, gotoschool6g wrote: > >> The simplified scene: > >> select slowfunction(s) from a order by b limit 1; > >> is slow than > >> select slowfunction(s) from (select s from a order by b limit 1) as z; > >> if there are many records in table 'a'. > >> > >> > >> The real scene. Function ST_Distance_Sphere is slow, the query: > >> SELECT ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from road > >> order by c limit 1; > >> is slow than: > >> select ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from (SELECT s > >> from road order by c limit 1) as a; > >> There are about 7000 records in 'road'. > > > > I think to help here I think we need the EXPLAIN ANALYSE output for > > both queries. > > Well, I think the problem is a well understood one: there is no > guarantee that functions-in-select-list are called exactly once per > output row. This is documented -- for example see here: > http://www.postgresql.org/docs/9.1/static/explicit-locking.html#ADVISORY-LOCKS. > In short, if you want very precise control of function evaluation use > a subquery, or, if you're really paranoid, a CTE. I'm probably dense, but I'm not sure I understand. Or it is that the slowfunction() is called prior to the sort? That seems insane. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > He who writes carelessly confesses thereby at the very outset that he does > not attach much importance to his own thoughts. -- Arthur Schopenhauer signature.asc Description: Digital signature
Re: [HACKERS] Can simplify 'limit 1' with slow function?
On Tue, Jul 1, 2014 at 3:06 PM, David G Johnston wrote: > Merlin Moncure-2 wrote >> On Tue, Jul 1, 2014 at 2:16 PM, Martijn van Oosterhout >> < > >> kleptog@ > >> > wrote: >>> On Sun, Jun 29, 2014 at 10:05:50PM +0800, gotoschool6g wrote: The simplified scene: select slowfunction(s) from a order by b limit 1; is slow than select slowfunction(s) from (select s from a order by b limit 1) as z; if there are many records in table 'a'. The real scene. Function ST_Distance_Sphere is slow, the query: SELECT ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from road order by c limit 1; is slow than: select ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from (SELECT s from road order by c limit 1) as a; There are about 7000 records in 'road'. >>> >>> I think to help here I think we need the EXPLAIN ANALYSE output for >>> both queries. >> >> Well, I think the problem is a well understood one: there is no >> guarantee that functions-in-select-list are called exactly once per >> output row. This is documented -- for example see here: >> http://www.postgresql.org/docs/9.1/static/explicit-locking.html#ADVISORY-LOCKS. >> In short, if you want very precise control of function evaluation use >> a subquery, or, if you're really paranoid, a CTE. >> >> merlin > > I would have to disagree on the "this is documented" comment - the linked > section on advisory locks does not constitute documentation of the fact that > limit can be applied after expressions in the select-list are evaluated. > > http://www.postgresql.org/docs/9.3/static/sql-select.html > > In the select command documentation item 5 covers select-list evaluation > while item 9 covers limit thus implying what we are saying - though keep in > mind each select statement gets processed independently and possibly in a > correlated fashion (i.e. potentially multiple times). Sure, although I did not claim that..the select documentation *does* cover this behavior but I find the syntax driven doc pages to be fairly arcane and unhelpful -- they don't say (for the most part) "avoid this" or "do that". I pointed out this particular section because it proved an example that matched the OP's problem case. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Can simplify 'limit 1' with slow function?
Merlin Moncure-2 wrote > On Tue, Jul 1, 2014 at 2:16 PM, Martijn van Oosterhout > < > kleptog@ > > wrote: >> On Sun, Jun 29, 2014 at 10:05:50PM +0800, gotoschool6g wrote: >>> The simplified scene: >>> select slowfunction(s) from a order by b limit 1; >>> is slow than >>> select slowfunction(s) from (select s from a order by b limit 1) as z; >>> if there are many records in table 'a'. >>> >>> >>> The real scene. Function ST_Distance_Sphere is slow, the query: >>> SELECT ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from road >>> order by c limit 1; >>> is slow than: >>> select ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from (SELECT >>> s from road order by c limit 1) as a; >>> There are about 7000 records in 'road'. >> >> I think to help here I think we need the EXPLAIN ANALYSE output for >> both queries. > > Well, I think the problem is a well understood one: there is no > guarantee that functions-in-select-list are called exactly once per > output row. This is documented -- for example see here: > http://www.postgresql.org/docs/9.1/static/explicit-locking.html#ADVISORY-LOCKS. > In short, if you want very precise control of function evaluation use > a subquery, or, if you're really paranoid, a CTE. > > merlin I would have to disagree on the "this is documented" comment - the linked section on advisory locks does not constitute documentation of the fact that limit can be applied after expressions in the select-list are evaluated. http://www.postgresql.org/docs/9.3/static/sql-select.html In the select command documentation item 5 covers select-list evaluation while item 9 covers limit thus implying what we are saying - though keep in mind each select statement gets processed independently and possibly in a correlated fashion (i.e. potentially multiple times). David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Can-simplify-limit-1-with-slow-function-tp5809997p5810061.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Can simplify 'limit 1' with slow function?
On Tue, Jul 1, 2014 at 2:16 PM, Martijn van Oosterhout wrote: > On Sun, Jun 29, 2014 at 10:05:50PM +0800, gotoschool6g wrote: >> The simplified scene: >> select slowfunction(s) from a order by b limit 1; >> is slow than >> select slowfunction(s) from (select s from a order by b limit 1) as z; >> if there are many records in table 'a'. >> >> >> The real scene. Function ST_Distance_Sphere is slow, the query: >> SELECT ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from road order >> by c limit 1; >> is slow than: >> select ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from (SELECT s >> from road order by c limit 1) as a; >> There are about 7000 records in 'road'. > > I think to help here I think we need the EXPLAIN ANALYSE output for > both queries. Well, I think the problem is a well understood one: there is no guarantee that functions-in-select-list are called exactly once per output row. This is documented -- for example see here: http://www.postgresql.org/docs/9.1/static/explicit-locking.html#ADVISORY-LOCKS. In short, if you want very precise control of function evaluation use a subquery, or, if you're really paranoid, a CTE. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Can simplify 'limit 1' with slow function?
On Sun, Jun 29, 2014 at 10:05:50PM +0800, gotoschool6g wrote: > The simplified scene: > select slowfunction(s) from a order by b limit 1; > is slow than > select slowfunction(s) from (select s from a order by b limit 1) as z; > if there are many records in table 'a'. > > > The real scene. Function ST_Distance_Sphere is slow, the query: > SELECT ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from road order > by c limit 1; > is slow than: > select ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from (SELECT s > from road order by c limit 1) as a; > There are about 7000 records in 'road'. I think to help here I think we need the EXPLAIN ANALYSE output for both queries. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > He who writes carelessly confesses thereby at the very outset that he does > not attach much importance to his own thoughts. -- Arthur Schopenhauer signature.asc Description: Digital signature