Re: Query performance in 9.6.24 vs 14.10
Thanks for the update.

On Mon, Jan 29, 2024, 16:53 Ron Johnson wrote:
> According to my tests, sometimes JIT is a little faster, and sometimes
> it's a little slower. Mostly within the realm of statistical noise
> (especially with each query having a sample size of only 13, on a VM
> that lives on a probably-busy host).
Re: Query performance in 9.6.24 vs 14.10
According to my tests, sometimes JIT is a little faster, and sometimes it's a little slower. Mostly within the realm of statistical noise (especially with each query having a sample size of only 13, on a VM that lives on a probably-busy host).

On Mon, Jan 29, 2024 at 9:18 AM Ron Johnson wrote:
> Yes, jit=on.
>
> I'll test them with jit=off, to see the difference. (The application is
> 3rd party, so I will change it at the system level.)
Re: Query performance in 9.6.24 vs 14.10
Yes, jit=on.

I'll test them with jit=off, to see the difference. (The application is 3rd party, so I will change it at the system level.)

On Mon, Jan 29, 2024 at 7:09 AM Bob Jolliffe wrote:
> Out of curiosity, is the pg14 running with the default jit=on setting?
>
> This is obviously entirely due to the nature of the particular queries
> themselves, but we found that for our workloads, pg versions greater
> than 11 were exacting a huge cost due to the JIT compiler. Once we
> explicitly turned jit=off, we started to see improvements.
Re: Query performance in 9.6.24 vs 14.10
Out of curiosity, is the pg14 running with the default jit=on setting?

This is obviously entirely due to the nature of the particular queries themselves, but we found that for our workloads, pg versions greater than 11 were exacting a huge cost due to the JIT compiler. Once we explicitly turned jit=off, we started to see improvements.

On Mon, 29 Jan 2024 at 07:55, Ron Johnson wrote:
> The spreadsheet function, using the Median cells, is (PG9.6 - PG14) /
> PG9.6. That's essentially the same as what you wrote.
>
> 158.157 / 1191.081 = 0.13278
>
> 1191.081 / 158.157 = 7.53, so 9.6.24 on that query is 7.53x slower.
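For reference, JIT can be disabled at several scopes without touching a third-party application. A minimal sketch (the database name is a placeholder):

```sql
SET jit = off;                        -- current session only
ALTER DATABASE mydb SET jit = off;    -- all new sessions in one database
ALTER SYSTEM SET jit = off;           -- cluster-wide (postgresql.auto.conf)
SELECT pg_reload_conf();              -- make the ALTER SYSTEM change take effect
```

An alternative to turning JIT off entirely is to raise jit_above_cost so that only genuinely expensive plans are compiled.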
Re: Query performance in 9.6.24 vs 14.10
On Sun, Jan 28, 2024 at 10:44 PM David Rowley wrote:
> Your speedup per cent calculation undersells PG14 by quite a bit. I'd
> call that an increase of ~639% rather than 86.72%.
>
> I think you've done "1 - sum(<14.10 numbers>) / sum(<9.6.24 numbers>)"
> whereas I think you should have done "sum(<9.6.24 numbers>) /
> sum(<14.10 numbers>) - 1"
>
> Nonetheless, thanks for testing this out. I assume this is just a
> report giving good feedback about progress in this area...?

The spreadsheet function, using the Median cells, is (PG9.6 - PG14) / PG9.6. That's essentially the same as what you wrote.

158.157 / 1191.081 = 0.13278

1191.081 / 158.157 = 7.53, so 9.6.24 on that query is 7.53x slower.
Re: Query performance in 9.6.24 vs 14.10
On Mon, 29 Jan 2024 at 07:37, Ron Johnson wrote:
> 08  9.6.24  1,142.164  1,160.801  1,103.716  1,249.852  1,191.081
>     14.10     159.354    155.111    155.111    162.797    158.157  86.72%

Your speedup per cent calculation undersells PG14 by quite a bit. I'd call that an increase of ~639% rather than 86.72%.

I think you've done "1 - sum(<14.10 numbers>) / sum(<9.6.24 numbers>)" whereas I think you should have done "sum(<9.6.24 numbers>) / sum(<14.10 numbers>) - 1".

Nonetheless, thanks for testing this out. I assume this is just a report giving good feedback about progress in this area...?

David
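The two formulas can be checked directly. A quick sketch in Python using the query 08 Execution rows from the table below (the variable names are mine; note the thread's 86.72% figure came from the medians, while the sums give ~86.5%):

```python
# Compare the two percentage formulas from the thread, using the
# query 08 Execution rows (PG 9.6.24 vs PG 14.10).
pg96 = [1142.164, 1160.801, 1103.716, 1249.852, 1191.081]
pg14 = [159.354, 155.111, 155.111, 162.797, 158.157]

# "1 - sum(new) / sum(old)": fraction of time saved.
time_saved = 1 - sum(pg14) / sum(pg96)

# "sum(old) / sum(new) - 1": speed-up factor minus one, as David suggests.
speedup = sum(pg96) / sum(pg14) - 1

print(f"time saved: {time_saved:.2%}")  # ~86%
print(f"speed-up:   {speedup:.0%}")     # ~640%
```

Both formulas describe the same measurement; "saved 86% of the time" and "runs ~7.4x faster" are two views of the same ratio.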
Query performance in 9.6.24 vs 14.10
(I don't know how this will look in text mode. Hopefully it will be comprehensible in the archives.)

This is a summary of EXPLAIN (ANALYZE) on eight frequently-run complex queries from our application, extracted from the PostgreSQL log because either the BIND or the SELECT takes longer than 3000 ms. I ran each of them 13 times in succession, on otherwise-idle VMs with identical specifications. The 9.6 server runs RHEL 6, and the 14 server runs RHEL 8.

The Planning costs are so high, I think, because the tables are partitioned using inheritance. (This will NOT be changed, yet.)

Naturally, Your Mileage *Will* Vary.

Qry  Version   FIRST       LAST        MIN         MAX         MEDIAN     Speed-up Pct

*Planning*
01   9.6.24    17,922.488  18,160.742  16,695.140  18,580.261  17,741.328
     14.10     13,176.262  13,483.961  12,934.866  13,689.540  13,483.961  24.00%
02   9.6.24     3,673.805   4,141.206   3,470.742   4,141.206   3,673.805
     14.10      1,777.958   1,700.264   1,675.137   1,777.958   1,718.571  53.22%
03   9.6.24       239.675     268.971     225.361     348.565     271.909
     14.10        218.283     220.856     215.588     225.895     221.485  18.54%
04   9.6.24     4,394.479   4,111.673   3,632.297   4,394.479   4,006.170
     14.10      1,960.575   1,937.504   1,928.816   2,094.365   1,997.432  50.14%
05   9.6.24     3,674.314   3,876.537   3,192.784   4,105.541   3,789.318
     14.10      1,790.975   1,753.901   1,677.151   1,846.911   1,760.372  53.54%
06   9.6.24     3,431.269   4,026.270   3,431.269   4,026.270   3,702.654
     14.10      1,801.959   1,876.325   1,653.144   1,876.325   1,725.933  53.39%
07   9.6.24     4,273.917   4,268.979   3,888.995   4,553.267   4,157.208
     14.10      1,914.381   2,006.053   1,914.381   2,042.719   1,980.043  52.37%
08   9.6.24     4,403.466   4,338.042   4,062.099   4,699.233   4,360.336
     14.10      1,961.035   1,866.127   1,866.127   2,037.441   1,956.783  55.12%

*Execution*
01   9.6.24     2,245.996   2,071.423   1,774.261   2,245.996   1,984.238
     14.10        540.961     469.541     462.658     540.961     469.660  76.33%
02   9.6.24        36.875      37.047      33.718      40.299      37.513
     14.10         31.910      29.618      28.387      31.910      30.284  19.27%
03   9.6.24       587.939     553.042     538.726     594.222     556.793
     14.10        172.890     159.617     154.603     172.890     162.924  70.74%
04   9.6.24     1,068.230   1,081.044   1,022.087   1,100.884   1,068.230
     14.10        157.058     161.618     157.058     167.006     160.243  85.00%
05   9.6.24        15.608      17.949      15.171      19.329      17.211
     14.10          9.538       9.837       8.920      10.917       9.572  44.38%
06   9.6.24        54.796      53.216      53.216      67.378      57.401
     14.10         60.135      50.899      47.108      60.135      48.267  15.91%
07   9.6.24     1,041.981   1,104.688   1,031.757   1,134.455   1,090.329
     14.10        158.666     159.348     155.695     162.539     159.972  85.33%
08   9.6.24     1,142.164   1,160.801   1,103.716   1,249.852   1,191.081
     14.10        159.354     155.111     155.111     162.797     158.157  86.72%
RE: [EXT] Re: Query performance going from Oracle to Postgres
Thanks to all who replied on this issue. I overlooked the difference in the execution plan between Oracle and Postgres. Oracle in fact does an index range scan only, with no table lookup. When I changed the query to a count(*), Postgres went from 332 logical reads to 19; Oracle did 20. Oracle did 20 regardless of whether it was count(*) or count(non-indexed-column).

Regards
Steve

-----Original Message-----
From: Peter Geoghegan
Sent: Thursday, September 7, 2023 8:22 PM
To: David Rowley
Cc: Dirschel, Steve; pgsql-general@lists.postgresql.org; Wong, Kam Fook (TR Technology)
Subject: [EXT] Re: Query performance going from Oracle to Postgres
Re: Query performance going from Oracle to Postgres
On Thu, Sep 7, 2023 at 3:48 AM David Rowley wrote:
> hmm, if that were true we'd see "Rows Removed by Filter" in the
> explain analyze.

That's why I hedged, with "seems likely". The problem with using filter conditions rather than true index quals isn't limited to the problem of extra heap accesses. It happened to be convenient to make my point that way, but that isn't particularly fundamental here. I deliberately chose to make my example involve an index-only scan (that doesn't require any heap accesses) for this reason.

> I think all that's going on is that each tuple is on a different page
> and the heap accesses are just causing many buffers to be accessed.

This index is an absolute monstrosity. I find it far easier to believe that the real explanation is the one that Steve intuited: that there is an issue with the way that the physical data structures (which are more or less comparable in both systems) are accessed in Postgres.

The index in question ("history_event_display_timesta_prism_guid_display_timestamp_idx1") has certain columns that are omitted from the query. These columns nevertheless appear between other columns that the query filters on. The leading two columns ("prism_guid" and "display_timestamp") are made into index quals by Postgres, but the next index column after those two ("unique_lookup_key") is omitted by the query, and so isn't an index qual. In fact *four* columns are omitted after that one. But the very-low-order index column "product_sid" *does* appear in the query, and so also appears in Postgres's index quals.

There is every chance that the true underlying explanation is that Oracle is able to skip over significant parts of the index structure dynamically. In fact, I'm practically certain that that's the case, since the "product_sid" column appears as an "access predicate" rather than as a "filter predicate". These terms are explained here:

https://use-the-index-luke.com/sql/explain-plan/oracle/filter-predicates
https://use-the-index-luke.com/sql/explain-plan/postgresql/filter-predicates

How could "product_sid" be used as an "access predicate" given the omitted index columns? It seems very likely that parts of the index can be skipped in Oracle, but not in Postgres -- at least not yet.

Like Markus Winand, I think that it's a real problem that EXPLAIN doesn't yet expose the difference between access predicates and filter predicates. Some of the index quals shown by EXPLAIN are marked SK_BT_REQFWD/SK_BT_REQBKWD by nbtree (meaning that they're what Oracle calls "access predicates"), while others are not (meaning that they're what Oracle calls "filter predicates"). That issue makes it far harder to spot these sorts of problems.

> It seems to me that an IOS would likely fix that problem which is why
> I suggested count(*) instead of count(<column>)

Probably, but why should we have to use an index-only scan? And what about my example, which already used one, and still showed a big disparity where there was no inherent reason why there had to be one? My example didn't involve jumping to another part of the index because the problem seems to be more general than that.

--
Peter Geoghegan
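The shape being described can be sketched with a minimal, hypothetical example (table and column names are mine, not the ones from the thread): with an index on (a, b, c), a query that omits the middle column b can only use the condition on a to position the scan, while the condition on c is checked against every index tuple visited rather than used to skip ahead.

```sql
-- Hypothetical illustration of a "gap" in the index column list.
CREATE TABLE t (a int, b int, c int);
CREATE INDEX t_a_b_c_idx ON t (a, b, c);

-- b is omitted here, so only "a = 1" bounds where the scan starts and
-- stops; "c = 3" cannot skip over the non-matching parts of the index,
-- even though EXPLAIN shows both inside "Index Cond:".
EXPLAIN SELECT * FROM t WHERE a = 1 AND c = 3;
```

This is exactly why the EXPLAIN output alone makes the problem hard to spot: both conditions look like index quals, but only one narrows the scan.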
Re: Query performance going from Oracle to Postgres
On Thu, 7 Sept 2023 at 19:17, Peter Geoghegan wrote:
> It seems likely that the problem here is that some of the predicates
> appear as so-called "Filter:" conditions, as opposed to true index
> quals.

hmm, if that were true we'd see "Rows Removed by Filter" in the explain analyze.

I think all that's going on is that each tuple is on a different page and the heap accesses are just causing many buffers to be accessed. It seems to me that an IOS would likely fix that problem, which is why I suggested count(*) instead of count(<column>).

David
Re: Query performance going from Oracle to Postgres
On Wed, Sep 6, 2023 at 1:07 PM Dirschel, Steve wrote:
> Oracle will find the same 332 rows using the same index but in Oracle
> it only does 20 logical reads. I thought maybe the index was fragmented
> so I reindexed that index:

It seems likely that the problem here is that some of the predicates appear as so-called "Filter:" conditions, as opposed to true index quals.

For reasons that aren't particularly fundamental, Postgres B-Trees cannot push down != (or <>) to the index level. Strangely enough, this is possible with IS NOT NULL. I'm working to improve things in this area.

That difference is easy to see in the following example. The first query does far fewer buffer accesses than the second query, despite being very similar:

regression=# explain (analyze, buffers)
            select * from multi_test where a = 1 and b is not null;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Only Scan using multi_test_idx on multi_test  (cost=0.29..50.10 rows=2191 width=8) (actual time=0.095..0.100 rows=14 loops=1)
   Index Cond: ((a = 1) AND (b IS NOT NULL))
   Heap Fetches: 0
   Buffers: shared hit=3
 Planning Time: 0.056 ms
 Execution Time: 0.145 ms
(6 rows)

regression=# explain (analyze, buffers)
            select * from multi_test where a = 1 and b != 42;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Only Scan using multi_test_idx on multi_test  (cost=0.29..222.57 rows=2191 width=8) (actual time=0.087..2.982 rows=14 loops=1)
   Index Cond: (a = 1)
   Filter: (b <> 42)
   Rows Removed by Filter: 1
   Heap Fetches: 0
   Buffers: shared hit=11
 Planning Time: 0.076 ms
 Execution Time: 3.204 ms
(8 rows)

(There are lots of index tuples matching (a, b) = "(1, NULL)" here, you see.)

--
Peter Geoghegan
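The setup for multi_test isn't shown in the thread. A plausible reconstruction consistent with the plans above (row counts are guesses of mine):

```sql
-- Guessed setup: a two-column index where most entries are (1, NULL),
-- so "b != 42" has many index tuples to filter out.
CREATE TABLE multi_test (a int, b int);
CREATE INDEX multi_test_idx ON multi_test (a, b);
INSERT INTO multi_test SELECT 1, NULL FROM generate_series(1, 2000);
INSERT INTO multi_test SELECT 1, 7    FROM generate_series(1, 14);
VACUUM ANALYZE multi_test;  -- VACUUM sets the visibility map, enabling Heap Fetches: 0
```

With this data, "b IS NOT NULL" can end the index scan as soon as the NULL range is excluded, while "b != 42" must walk every (1, NULL) index tuple and reject it.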
Re: Query performance going from Oracle to Postgres
On Thu, 7 Sept 2023 at 11:14, Dirschel, Steve wrote:
> select count(historyeve0_.HISTORY_EVENT_SID) as col_0_0_ from
> hist28.history_event_display_timestamp_20230301 historyeve0_ where
> historyeve0_.IS_DELETED=0
>
> history_event_sid | character varying(32) | | not null |

Do you get an Index Only Scan if you use count(*) instead of count(historyeve0_.HISTORY_EVENT_SID)? It seems that column is the only one missing from the index, and that might be the only thing that's stopping PostgreSQL from using an Index Only Scan.

Perhaps Oracle realises that the column is NOT NULL, so it can do count(*) instead of counting the non-nulls from that column. That may allow it to Index Only Scan? I'm not versed in reading Oracle plans, so I don't know if it's possible to tell.

David
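The suggestion amounts to re-running the query with only the aggregate changed. A sketch using the table from the thread (the WHERE clause is abbreviated here; the full predicate list appears in the original post):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)  -- was: count(historyeve0_.HISTORY_EVENT_SID)
FROM hist28.history_event_display_timestamp_20230301 historyeve0_
WHERE historyeve0_.IS_DELETED = 0;  -- plus the remaining original predicates
```

count(*) only needs to know that a row exists, so any covering index can satisfy it; count(col) must also prove col IS NOT NULL per row, which drags the missing column into the plan.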
Re: Query performance going from Oracle to Postgres
On Wed, 2023-09-06 at 20:06 +0000, Dirschel, Steve wrote:
> We are in the process of converting from Oracle to Postgres and I have
> a query that is using the same index in Postgres as is used in Oracle,
> but in Postgres the query does 16x more buffer/logical reads. I'd like
> to understand why.
>
> [...]
>
> *** 341 logical reads to find 332 rows.
>
> Plan hash value: 3361538278
>
> -----------------------------------------------------------------------------------------------------------
> | Id | Operation                | Name                | Starts | E-Rows | A-Rows |   A-Time    | Buffers |
> -----------------------------------------------------------------------------------------------------------
> |  0 | SELECT STATEMENT         |                     |      1 |        |      1 | 00:00:00.01 |      20 |
> |  1 |  SORT AGGREGATE          |                     |      1 |      1 |      1 | 00:00:00.01 |      20 |
> |* 2 |   FILTER                 |                     |      1 |        |    332 | 00:00:00.01 |      20 |
> |  3 |    PARTITION RANGE SINGLE|                     |      1 |      1 |    332 | 00:00:00.01 |      20 |
> |* 4 |     INDEX RANGE SCAN     | HISTORY_EVENT_IDX02 |      1 |      1 |    332 | 00:00:00.01 |      20 |
> -----------------------------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
>    2 - filter(TO_TIMESTAMP(:SYS_B_16,:SYS_B_17)<=TO_TIMESTAMP(:SYS_B_18,:SYS_B_19))
>    4 - access("HISTORYEVE0_"."PRISM_GUID"=:SYS_B_01 AND
>               "HISTORYEVE0_"."DISPLAY_TIMESTAMP">=TO_TIMESTAMP(:SYS_B_16,:SYS_B_17) AND
>               "HISTORYEVE0_"."PRODUCT_SID"=:SYS_B_02 AND
>               "HISTORYEVE0_"."IS_DELETED"=:SYS_B_00 AND
>               "HISTORYEVE0_"."CLIENT_ID"=:SYS_B_15 AND
>               "HISTORYEVE0_"."DISPLAY_TIMESTAMP"<=TO_TIMESTAMP(:SYS_B_18,:SYS_B_19))
>        filter(("HISTORYEVE0_"."CLIENT_ID"=:SYS_B_15 AND
>               "HISTORYEVE0_"."PRODUCT_SID"=:SYS_B_02 AND
>               "HISTORYEVE0_"."EVENT_TYPE"<>:SYS_B_03 AND
>               "HISTORYEVE0_"."EVENT_TYPE"<>:SYS_B_04 AND
>               "HISTORYEVE0_"."EVENT_TYPE"<>:SYS_B_05 AND
>               "HISTORYEVE0_"."EVENT_TYPE"<>:SYS_B_06 AND
>               "HISTORYEVE0_"."EVENT_TYPE"<>:SYS_B_07 AND
>               "HISTORYEVE0_"."EVENT_TYPE"<>:SYS_B_08 AND
>               "HISTORYEVE0_"."IS_DELETED"=:SYS_B_00 AND
>               (INTERNAL_FUNCTION("HISTORYEVE0_"."PRODUCT_VIEW") OR
>               "HISTORYEVE0_"."PRODUCT_VIEW" IS NULL)))
>
> It finds the same 332 rows, but it does only 20 logical reads.

In PostgreSQL, the 332 matching rows seem to me to be stored mostly on different pages, while they are better clustered in your Oracle table. If it is really very important for you,
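The reply above is truncated. If better physical clustering were the goal, one option (an assumption on my part, not advice taken from the thread) is a one-time rewrite of the table in index order:

```sql
-- CLUSTER rewrites the table in index order; it takes an ACCESS
-- EXCLUSIVE lock for the duration, and the ordering is not maintained
-- for rows inserted or updated afterwards.
CLUSTER hist28.history_event_display_timestamp_20230301
    USING history_event_display_timesta_prism_guid_display_timestamp_idx1;
ANALYZE hist28.history_event_display_timestamp_20230301;
```

For a table with ongoing writes, the clustering decays over time, so this is a mitigation rather than a permanent fix.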
Query performance going from Oracle to Postgres
We are in the process of converting from Oracle to Postgres and I have a query that is using the same index in Postgres as is used in Oracle but in Postgres the query does 16x more buffer/logical reads. I'd like to understand why. The query is hitting a partitioned table but to simply things I changed it to hit a single partition. This partition has the same number of rows in Oracle as in Postgres. Here is the Postgres query, partition definition, execution plan. I will also include the Oracle execution plan below in case anyone is interested. explain (analyze, buffers) select count(historyeve0_.HISTORY_EVENT_SID) as col_0_0_ from hist28.history_event_display_timestamp_20230301 historyeve0_ where historyeve0_.IS_DELETED=0 and historyeve0_.PRISM_GUID='i0accd6a2018405f095ee669dc5b4' and historyeve0_.PRODUCT_SID='CARSWELL.WESTLAW' and (historyeve0_.EVENT_TYPE not in ('versionsSearchWithinQueryEvent','notesOfDecisionsSearchWithinQueryEvent','citingReferencesSearchWithinQueryEvent','tocSearchWithinQueryEvent','searchWithinDocumentEvent','coCitesSearchWithinQueryEvent')) and (historyeve0_.PRODUCT_VIEW in ('DefaultProductView','TNPPlus','PLCUS','Indigo','IndigoPremium','INDIGOCA') or historyeve0_.PRODUCT_VIEW is null) and historyeve0_.CLIENT_ID='WLNCE_VNJXL7' and (historyeve0_.DISPLAY_TIMESTAMP between '2022-03-01 00:00:00' and '2023-03-01 23:59:59.999'); Aggregate (cost=56.64..56.65 rows=1 width=8) (actual time=0.930..0.931 rows=1 loops=1) Buffers: shared hit=341 -> Index Scan using history_event_display_timesta_prism_guid_display_timestamp_idx1 on history_event_display_timestamp_20230301 historyeve0_ (cost=0.42..56.64 rows=1 width=33) (actual time=0.034..0.897 rows=332 loops=1) Index Cond: (((prism_guid)::text = 'i0accd6a2018405f095ee669dc5b4'::text) AND (display_timestamp >= '2022-03-01 00:00:00'::timestamp without time zone) AND (display_timestamp <= '2023-03-01 23:59:59.999'::timestamp without time zone) AND ((product_sid)::text = 'CARSWELL.WESTLAW'::text) AND 
(is_deleted = '0'::numeric) AND ((client_id)::text = 'WLNCE_VNJXL7'::text)) Filter: (((event_type)::text <> ALL ('{versionsSearchWithinQueryEvent,notesOfDecisionsSearchWithinQueryEvent,citingReferencesSearchWithinQueryEvent,tocSearchWithinQueryEvent,searchWithinDocumentEvent,coCitesSearchWithinQueryEvent}'::text[])) AND (((product_view)::text = ANY ('{DefaultProductView,TNPPlus,PLCUS,Indigo,IndigoPremium,INDIGOCA}'::text[])) OR (product_view IS NULL))) Buffers: shared hit=341 Planning: Buffers: shared hit=6 Planning Time: 0.266 ms Execution Time: 0.965 ms (10 rows) *** 341 logical reads to find 332 rows. Oracle will find the same 332 rows using the same index but in Oracle it only does 20 logical reads. I thought maybe the index was fragmented so I reindexed that index: reindex index hist28.history_event_display_timesta_prism_guid_display_timestamp_idx1; Plan after that: Aggregate (cost=40.64..40.65 rows=1 width=8) (actual time=0.707..0.708 rows=1 loops=1) Buffers: shared hit=328 -> Index Scan using history_event_display_timesta_prism_guid_display_timestamp_idx1 on history_event_display_timestamp_20230301 historyeve0_ (cost=0.42..40.64 rows=1 width=33) (actual time=0.032..0.683 rows=332 loops=1) Index Cond: (((prism_guid)::text = 'i0accd6a2018405f095ee669dc5b4'::text) AND (display_timestamp >= '2022-03-01 00:00:00'::timestamp without time zone) AND (display_timestamp <= '2023-03-01 23:59:59.999'::timestamp without time zone) AND ((product_sid)::text = 'CARSWELL.WESTLAW'::text) AND (is_deleted = '0'::numeric) AND ((client_id)::text = 'WLNCE_VNJXL7'::text)) Filter: (((event_type)::text <> ALL ('{versionsSearchWithinQueryEvent,notesOfDecisionsSearchWithinQueryEvent,citingReferencesSearchWithinQueryEvent,tocSearchWithinQueryEvent,searchWithinDocumentEvent,coCitesSearchWithinQueryEvent}'::text[])) AND (((product_view)::text = ANY ('{DefaultProductView,TNPPlus,PLCUS,Indigo,IndigoPremium,INDIGOCA}'::text[])) OR (product_view IS NULL))) Buffers: shared hit=328 Planning: 
Buffers: shared hit=27 Planning Time: 0.321 ms Execution Time: 0.741 ms (10 rows) Shared hit came down a little but is still 16x more than Oracle. The actual query will hit 12 partitions so the extra amount of work it does in Postgres adds up (note some data points will find significantly more rows than 332 so performance is much worse than shown here but this data point is used just to show the difference between Oracle and Postgres). The interesting part it is seems Postgres is doing 1 shared hit per row. I don't know anyone on this list knows Postgres's internal index implementations vs Oracle's but is Postgres's Index Scan operation significantly different than Oracle's index range scan? There is something implemented much less efficiently here vs Oracle and I don't know what I can do to get this query to perform more closer to
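One shared hit per row is consistent with the matching rows being scattered across the heap. One way to check how well the indexed columns track the physical row order (my suggestion, not from the thread) is the planner's correlation statistic:

```sql
-- correlation near +1 or -1 means the column's order closely matches
-- the heap order; near 0 means matching rows are scattered across pages.
SELECT tablename, attname, correlation
FROM pg_stats
WHERE schemaname = 'hist28'
  AND tablename = 'history_event_display_timestamp_20230301'
  AND attname IN ('prism_guid', 'display_timestamp');
```

Low correlation would explain why 332 rows cost ~332 buffer hits in Postgres while the better-clustered Oracle table needed only 20.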
Re: Query Performance
I should have given you the full query. Here it is:

Select a.locationname, a.usercode, a.itemname, a.uomname, a.batchnumber,
       a.expirydate, a.itemnamefk, a.itemuomfk, f.itemgroupname, a.locationpk,
       Sum(a.quantity) as quantity, Sum(a.freequantity) as freequantity,
       b.beforetaxrate as beforetaxrate, (b.rate / b.rateper) as purchaserate,
       b.netrate as netrate, b.netsalesrate as netsalesrate,
       b.effectiverate as effectiverate, b.rateper, a.reportformat,
       g.standardcost, g.defaultpurchaserate, g.salesrateone, g.salesratetwo,
       g.salesratethree, g.salesratefour, g.salesratefive, g.salesratesix,
       g.salesrateseven, g.salesrateeight, g.salesratenine, 'N' as negativerate
From (Select k.locationname, a.usercode, a.itemname, c.uomname, j.batchnumber,
             j.expirydate, j.itemnamefk, j.itemuomfk, k.locationpk,
             j.receivedquantity as quantity,
             j.receivedfreequantity as freequantity, c.reportformat
      From in_item_name a
      Join in_item_uom b On b.itemnamefk = a.itemnamepk
      Join gl_uom c On c.uompk = b.uomfk
      Join view_item_receipts j On j.itemnamefk = a.itemnamepk And j.itemuomfk = b.itemuompk
      Join in_location k On k.locationpk = j.locationfk
      Where j.companycode = 'SDM' And j.branchcode = '001'
        And j.accountperiodid = 1 And j.voucherdate <= '2022/09/17'
        And j.billstatus <> 'C' And j.topparentcode <> 4
        And (j.receivedquantity <> 0 Or j.receivedfreequantity <> 0)
      UNION ALL
      Select k.locationname, a.usercode, a.itemname, c.uomname, j.batchnumber,
             j.expirydate, j.itemnamefk, j.itemuomfk, k.locationpk,
             (j.issuedquantity * -1) as quantity,
             (j.issuedfreequantity * -1) as freequantity, c.reportformat
      From in_item_name a
      Join in_item_uom b On b.itemnamefk = a.itemnamepk
      Join gl_uom c On c.uompk = b.uomfk
      Join view_item_issues j On j.itemnamefk = a.itemnamepk And j.itemuomfk = b.itemuompk
      Join in_location k On k.locationpk = j.locationfk
      Where j.companycode = 'SDM' And j.branchcode = '001'
        And j.accountperiodid = 1 And j.voucherdate <= '2022/09/17'
        And j.billstatus <> 'C' And j.topparentcode <> 4
        And (j.issuedquantity <> 0 Or j.issuedfreequantity <> 0)) a
Left Outer Join view_item_receipts b   <- it's actually a view of 4 tables which tries to arrive at the last purchase rate
  On b.itemreceiptspk = (Select c.itemreceiptspk
                         From view_item_receipts c
                         Where c.companycode = 'SDM' And c.branchcode = '001'
                           And c.accountperiodid = 1 And c.voucherdate <= '2022/09/17'
                           And c.billstatus <> 'C'
                           And c.itemnamefk = a.itemnamefk And c.itemuomfk = a.itemuomfk
                           And c.batchnumber = a.batchnumber And c.expirydate = a.expirydate
                           And (c.receivedquantity <> 0 Or c.receivedfreequantity <> 0)
                         Order by c.voucherdate desc, c.vouchernumber desc, c.sequencenumber desc
                         Limit 1)
Join in_item_name c On c.itemnamepk = a.itemnamefk
Join in_item_group f On f.itemgrouppk = c.itemgroupfk
Left Outer Join in_item_rate g On g.itemuomfk = b.itemuomfk And g.itemnamefk = b.itemnamefk
  And '2022/09/17' between g.fromdate and g.todate
Group By a.locationname, a.usercode, a.itemname, a.uomname, a.batchnumber,
         a.expirydate, a.itemnamefk, a.itemuomfk, f.itemgroupname, a.locationpk,
         b.beforetaxrate, b.rate, b.netrate, b.netsalesrate, b.effectiverate,
         b.rateper, a.reportformat, g.standardcost, g.defaultpurchaserate,
         g.salesrateone, g.salesratetwo, g.salesratethree, g.salesratefour,
         g.salesratefive, g.salesratesix, g.salesrateseven, g.salesrateeight,
         g.salesratenine
Having (sum(a.quantity) + sum(a.freequantity)) <> 0
Order by 1, 3, 2, 5

Created an index for companycode, branchcode, accountperiodid, voucherdate, billstatus, itemnamefk, itemuomfk, batchnumber, expirydate, receivedquantity, and receivedfreequantity in all the 4 tables that this view uses.

Happiness Always
BKR Sivaprakash

On Saturday, 17 September, 2022 at 03:33:48 pm IST, Peter J. Holzer wrote:
> On 2022-09-17 05:28:25 +0000, sivapostg...@yahoo.com wrote:
> > My query is like this
> [...]
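The index described above, written out as DDL (the base-table name is a placeholder; per the description it would be repeated for each of the four tables behind the view):

```sql
CREATE INDEX ON item_receipts_base  -- placeholder table name
    (companycode, branchcode, accountperiodid, voucherdate, billstatus,
     itemnamefk, itemuomfk, batchnumber, expirydate,
     receivedquantity, receivedfreequantity);
```

An 11-column index is expensive to maintain; the leading equality-filtered columns (companycode, branchcode, accountperiodid) do most of the work, and the trailing columns mainly serve to make the index covering.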
Re: Query Performance
On 2022-09-17 05:28:25 +0000, sivapostg...@yahoo.com wrote:
> My query is like this
>
> Select a.field1, a.field2, a.field3
> From (Select a.field1, b.field2, c.field3
>       From table1 a
>       Join table2 b on b.something = a.something
>       Join table3 c On c.something = a.something
>       Where a.field7 = 'value'
>
>       UNION ALL
>
>       Select a.field4, a.field5, a.field6
>       From table11 a
>       Join table21 b On b.something = a.something
>       Where a.field8 = 'something' ) a
> Join table10 b
>   On b.field11 = (Select c.field11
>                   From table10 c
>                   Where c.field10 = a.field1 )  <- instead of a.field1, if I hardcode a value (eg. '100') the query runs faster
> Join table21 c On c.something = a.something
> ...

Well, you are now searching table10 for a constant value (which can be done once) instead of the output of the union (which has to be done for each line of the union), so I'm not surprised that it's faster.

What is the output of `explain (analyze)` for the two queries? Is there an obvious place where an index would help? Can you restructure the query?

BTW, it is (at least for me) very hard to give advice on a query with only completely abstract names like `table11` or `field4`: I have no idea what this is supposed to do, so it's hard to tell if there is a better way. Using `a` to refer to 3 different things doesn't help either.

hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at        |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
Re: Query Performance
On 9/17/22 00:28, sivapostg...@yahoo.com wrote:

Hello,
My query is like this

Select a.field1, a.field2, a.field3
From (Select a.field1, b.field2, c.field3
      From table1 a
      Join table2 b
        on b.something = a.something
      Join table3 c
        On c.something = a.something
      Where a.field7 = 'value'

      UNION ALL

      Select a.field4, a.field5, a.field6
      From table11 a
      Join table21 b
        On b.something = a.something
      Where a.field8 = 'something' ) a
Join table10 b
  On b.field11 = (Select c.field11
                  From table10 c
                  Where c.field10 = a.field1 ) <- instead of a.field1, if I hardcode value (eg. '100') query runs faster
Join table21 c
  On c.something = a.something
...

In the above query, if I substitute a value for a.field1, the query runs faster. Any suggestion/guidance/links to improve the query performance without substituting the value?

Is there an index on table1.field1? Have you vacuumed or analyzed table1 lately? Have you decomposed the query into the smallest possible query that demonstrates the problem?

--
Angular momentum makes the world go 'round.
Re: Query Performance
Hello,

My query is like this

Select a.field1, a.field2, a.field3
From (Select a.field1, b.field2, c.field3
      From table1 a
      Join table2 b
        on b.something = a.something
      Join table3 c
        On c.something = a.something
      Where a.field7 = 'value'

      UNION ALL

      Select a.field4, a.field5, a.field6
      From table11 a
      Join table21 b
        On b.something = a.something
      Where a.field8 = 'something' ) a
Join table10 b
  On b.field11 = (Select c.field11
                  From table10 c
                  Where c.field10 = a.field1 ) <- instead of a.field1, if I hardcode a value (eg. '100') the query runs faster
Join table21 c
  On c.something = a.something
...

In the above query, if I substitute a value for a.field1, the query runs faster. Any suggestion/guidance/links to improve the query performance without substituting the value?

Happiness Always
BKR Sivaprakash
Re: Query performance with min and filter
"Yorwerth, Adam" writes: > We seem to have found a situation where a query run using explain analyse or > pgbench is incredibly fast, but run via Java under load performs very poorly > – we’ve checked query performance metrics for our Postgres instance and can > confirm that it’s the query running slowly, and not a Java problem. Usually when people report something like this, the explanation is that it's not actually the same query being tested in both cases. In particular, if you're parameterizing the query in Java but substituting constant values when running it manually, that can make a difference. A further-afield possibility is that your application has been set up to run with custom values of work_mem, planner cost constants, etc, which you're failing to duplicate in the manual tests. If all else fails, try installing auto_explain so that you can capture the actual query plan used when the application executes this query. regards, tom lane
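[For reference, a minimal auto_explain setup along the lines Tom suggests might look like this in postgresql.conf — the threshold and settings shown are illustrative, not a recommendation for this workload:]

```ini
# postgresql.conf — load auto_explain so the plan of the slow statement
# is captured when the *application* runs it (requires a restart when
# set via shared_preload_libraries; LOAD 'auto_explain' works per-session)
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '250ms'   # log any statement slower than this
auto_explain.log_analyze = on             # include actual row counts and timings
```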
Re: Query performance with min and filter
Hi Adam,

What are the query times when you run the query directly at the psql prompt, but without explain/analyze? Can you check the cache hit rate vs. disk reads in explain analyze vs. from Java? Sometimes the data is in RAM when you run a query manually, but the live Java app might be hitting the disk, causing the slowdown.

Are you also sure that what is being measured when the query is called from the Java app is the query time itself, and not network delays etc.?

Regards,
Karthik.
--
HoSD, In-Store Platform
Ph: +91 99406 59452

From: Yorwerth, Adam
Sent: Monday, August 31, 2020 2:57 PM
To: pgsql-general@lists.postgresql.org
Cc: Cianciaruso, Lorenzo
Subject: Query performance with min and filter

Hi Postgres community,

We seem to have found a situation where a query run using explain analyse or pgbench is incredibly fast, but run via Java under load performs very poorly – we've checked query performance metrics for our Postgres instance and can confirm that it's the query running slowly, and not a Java problem. The table has ~7 million rows, with a total size of 44GB. As shown below, explain analyse gives a sub-1ms execution time, yet in practice the query takes an average of 2 seconds – it's worth mentioning that a small percentage of requests do take ~1ms, so something non-deterministic is happening here.
The query in question is below, with explain analyse:

explain analyse SELECT coalesce (
  (SELECT min(msg_offset) - 1
   FROM events
   WHERE created_utc >= CURRENT_TIMESTAMP - INTERVAL '60 SECONDS'),
  (SELECT max(msg_offset) FROM events),
  0
);

                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=2.03..2.04 rows=1 width=8) (actual time=0.183..0.183 rows=1 loops=1)
   InitPlan 1 (returns $1)
     ->  Aggregate  (cost=1.56..1.57 rows=1 width=8) (actual time=0.028..0.028 rows=1 loops=1)
           ->  Index Only Scan using test_4 on events  (cost=0.44..1.56 rows=1 width=8) (actual time=0.023..0.023 rows=0 loops=1)
                 Index Cond: (created_utc >= (now() - '00:01:00'::interval))
                 Heap Fetches: 0
   InitPlan 3 (returns $3)
     ->  Result  (cost=0.45..0.46 rows=1 width=8) (actual time=0.153..0.153 rows=1 loops=1)
           InitPlan 2 (returns $2)
             ->  Limit  (cost=0.43..0.45 rows=1 width=8) (actual time=0.152..0.152 rows=1 loops=1)
                   ->  Index Only Scan Backward using events_pkey on events events_1  (cost=0.43..156842.57 rows=7643099 width=8) (actual time=0.151..0.151 rows=1 loops=1)
                         Index Cond: (msg_offset IS NOT NULL)
                         Heap Fetches: 0
 Planning time: 2.480 ms
 Execution time: 0.242 ms
(15 rows)

Output of \d events, with a few test indexes added to try and improve performance:

                                   Table "public.events"
    Column    |            Type             | Collation | Nullable |                  Default
--------------+-----------------------------+-----------+----------+--------------------------------------------
 msg_offset   | bigint                      |           | not null | nextval('events_msg_offset_seq'::regclass)
 msg_key      | character varying           |           | not null |
 content_type | character varying           |           | not null |
 type         | character varying           |           | not null |
 created_utc  | timestamp without time zone |           | not null |
 data         | text                        |           |          |
 event_size   | integer                     |           | not null | 1048576
 cluster_id   | bigint                      |           | not null | 1
Indexes:
    "events_pkey" PRIMARY KEY, btree (msg_offset)
    "cluster_idx" btree (msg_key, cluster_id)
    "cluster_type_filter_idx" btree (type, cluster_id)
    "created_idx" btree (created_utc, msg_key)
    "key_idx" btree (msg_key)
    "test_1" btree (created_utc, msg_offset)
    "test_2" btree (msg_offset, created_utc)
    "test_3" btree (created_utc, msg_offset)
    "test_4" btree (created_utc, msg_offset DESC)
    "type_idx" btree (type)

Any help understanding what is happening here, or guidance to improve performance, would be much appreciated.

Regards,
Adam

This is a confidential email. Tesco may monitor and record all emails. The views expressed in this email are those of the sender and not Tesco. Tesco Stores Limited Company Number: 519500 Registered in England Registered Office: Tesco House, Shire Park, Kestrel Way, Welwyn Garden City, AL7 1GA VAT Registration Number: GB 220 4302 31
Query performance with min and filter
Hi Postgres community,

We seem to have found a situation where a query run using explain analyse or pgbench is incredibly fast, but run via Java under load performs very poorly – we've checked query performance metrics for our Postgres instance and can confirm that it's the query running slowly, and not a Java problem. The table has ~7 million rows, with a total size of 44GB. As shown below, explain analyse gives a sub-1ms execution time, yet in practice the query takes an average of 2 seconds – it's worth mentioning that a small percentage of requests do take ~1ms, so something non-deterministic is happening here.

The query in question is below, with explain analyse:

explain analyse SELECT coalesce (
  (SELECT min(msg_offset) - 1
   FROM events
   WHERE created_utc >= CURRENT_TIMESTAMP - INTERVAL '60 SECONDS'),
  (SELECT max(msg_offset) FROM events),
  0
);

                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=2.03..2.04 rows=1 width=8) (actual time=0.183..0.183 rows=1 loops=1)
   InitPlan 1 (returns $1)
     ->  Aggregate  (cost=1.56..1.57 rows=1 width=8) (actual time=0.028..0.028 rows=1 loops=1)
           ->  Index Only Scan using test_4 on events  (cost=0.44..1.56 rows=1 width=8) (actual time=0.023..0.023 rows=0 loops=1)
                 Index Cond: (created_utc >= (now() - '00:01:00'::interval))
                 Heap Fetches: 0
   InitPlan 3 (returns $3)
     ->  Result  (cost=0.45..0.46 rows=1 width=8) (actual time=0.153..0.153 rows=1 loops=1)
           InitPlan 2 (returns $2)
             ->  Limit  (cost=0.43..0.45 rows=1 width=8) (actual time=0.152..0.152 rows=1 loops=1)
                   ->  Index Only Scan Backward using events_pkey on events events_1  (cost=0.43..156842.57 rows=7643099 width=8) (actual time=0.151..0.151 rows=1 loops=1)
                         Index Cond: (msg_offset IS NOT NULL)
                         Heap Fetches: 0
 Planning time: 2.480 ms
 Execution time: 0.242 ms
(15 rows)

Output of \d events, with a few test indexes added to try and improve performance:

                                   Table "public.events"
    Column    |            Type             | Collation | Nullable |                  Default
--------------+-----------------------------+-----------+----------+--------------------------------------------
 msg_offset   | bigint                      |           | not null | nextval('events_msg_offset_seq'::regclass)
 msg_key      | character varying           |           | not null |
 content_type | character varying           |           | not null |
 type         | character varying           |           | not null |
 created_utc  | timestamp without time zone |           | not null |
 data         | text                        |           |          |
 event_size   | integer                     |           | not null | 1048576
 cluster_id   | bigint                      |           | not null | 1
Indexes:
    "events_pkey" PRIMARY KEY, btree (msg_offset)
    "cluster_idx" btree (msg_key, cluster_id)
    "cluster_type_filter_idx" btree (type, cluster_id)
    "created_idx" btree (created_utc, msg_key)
    "key_idx" btree (msg_key)
    "test_1" btree (created_utc, msg_offset)
    "test_2" btree (msg_offset, created_utc)
    "test_3" btree (created_utc, msg_offset)
    "test_4" btree (created_utc, msg_offset DESC)
    "type_idx" btree (type)

Any help understanding what is happening here, or guidance to improve performance, would be much appreciated.

Regards,
Adam
[HELP] General query - performance metrics
Hello,

I have set up pgpool with 2 Postgres servers behind it. I ran pgbench with and without pgpool and got the results, which I would like to showcase in a presentation. Just thought of asking if there are any tools to generate nice results with charts and graphs?

TIA
--
Regards,
K S Praveen Kumar
Re: Suggestion to improve query performance of data validation in proc.
On Fri, May 22, 2020 at 2:09 AM David G. Johnston < david.g.johns...@gmail.com> wrote:
> On Friday, May 22, 2020, postgann2020 s wrote:
>
>> We are looking for a better query than "SELECT 1 FROM
>> schema.table_name WHERE column1=structure_id1" this query for data
>> validation.
>
If potentially many rows match, then wrapping the query with select exists(old_query) would allow the execution to bail as soon as possible.
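[A sketch of the suggested EXISTS wrapping, reusing the placeholder names from the thread:]

```sql
-- EXISTS stops scanning at the first matching row, so the check
-- no longer depends on how many rows match column1 = structure_id1.
IF EXISTS (SELECT 1 FROM schema.table_name
           WHERE column1 = structure_id1) THEN
    is_exists1 := true;
END IF;

-- or, equivalently in PL/pgSQL, as a plain boolean assignment:
is_exists1 := EXISTS (SELECT 1 FROM schema.table_name
                      WHERE column1 = structure_id1);
```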
Re: Suggestion to improve query performance of data validation in proc.
On Friday, May 22, 2020, postgann2020 s wrote:
>
> We are looking for a better query than "SELECT 1 FROM schema.table_name
> WHERE column1=structure_id1" this query for data validation.
>
There is no simpler query than that for checking records in a single table.

> Please suggest is there any other ways to validate this kind of queries
> which will improve the overall performance.
>
Abandon procedural logic and embrace the declarative, set-oriented nature of SQL.

David J.
Re: Suggestion to improve query performance of data validation in proc.
Hi David,

Thanks for your feedback. We use this kind of validation throughout the proc, in multiple locations, with statements like the below.

--check whether data is available for structure_id1
IF EXISTS(SELECT 1 FROM schema.table_name WHERE column1=structure_id1) THEN
    is_exists1 := true;
END IF;

We are looking for a better query than "SELECT 1 FROM schema.table_name WHERE column1=structure_id1" for this data validation. Please suggest whether there are any other ways to validate queries like this that would improve the overall performance.

Regards,
Postgann.

On Fri, May 22, 2020 at 12:36 PM David G. Johnston < david.g.johns...@gmail.com> wrote:

> You should read through the page below and the linked FAQ - note especially
> the concept and recommendation for “cross-posting”.
>
> https://wiki.postgresql.org/wiki/Mailing_Lists#Email_etiquette_mechanics
>
> On Thursday, May 21, 2020, postgann2020 s wrote:
>
>> We have multiple long procs that have 100s of data validations, currently
>> written as below.
>>
>> ***
>> if (SELECT 1 FROM SCHEMA.TABLE WHERE column=data AND column=data) then
>>     statements
>> etc..
>> ***
>>
>> Are there any other ways to validate the data, which will help us to
>> improve the performance of the query?
>
> I have no idea what you are trying to get at here. You should try
> providing SQL that actually runs. Though at first glance it seems quite
> probable you are doing useless work anyway.
>
> David J.
Re: Suggestion to improve query performance of data validation in proc.
You should read through the page below and the linked FAQ - note especially the concept and recommendation for “cross-posting”.

https://wiki.postgresql.org/wiki/Mailing_Lists#Email_etiquette_mechanics

On Thursday, May 21, 2020, postgann2020 s wrote:
>
> We have multiple long procs that have 100s of data validations, currently
> written as below.
>
> ***
> if (SELECT 1 FROM SCHEMA.TABLE WHERE column=data AND column=data) then
>     statements
> etc..
> ***
>
> Are there any other ways to validate the data, which will help us to
> improve the performance of the query?
>
I have no idea what you are trying to get at here. You should try providing SQL that actually runs. Though at first glance it seems quite probable you are doing useless work anyway.

David J.
Suggestion to improve query performance of data validation in proc.
Hi Team,

Thanks for your support. Could you please advise on the query below.

We have multiple long procs that have 100s of data validations, currently written as below.

***
if (SELECT 1 FROM SCHEMA.TABLE WHERE column=data AND column=data) then
    statements
etc..
***

Are there any other ways to validate the data that would help us improve the performance of the query?

Thanks for your support.

Regards,
PostgAnn.
Suggestion to improve query performance.
Hi Team,

Thanks for your support. We are using the below environment:

Application:
  Programming Language: JAVA
  Geoserver

Database Stack:
  PostgreSQL: 9.5.15
  Postgis

We have 3 GeoServer queries and are seeing some performance issues after changing them. I have posted the queries and explain plans of both the old and new queries. The same types of issues were found for all 3 queries:
1. Index scan changed to bitmap scan.
2. In all new queries, the condition is checked again.

Old Query:
>> Query No: 1
1. No issue while executing the query.
2. It fetches only 38 rows.
===
EXPLAIN ANALYZE SELECT "underground_route_id","ug_route_sub_type","sw_uid22",encode(ST_AsBinary(ST_Simplify(ST_Force2D("the_geom"), 1.506687768824122E-5, true)),'base64') as "the_geom"
FROM "schema"."underground_route"
WHERE ("the_geom" && ST_GeomFromText('POLYGON ((77.20637798309326 28.627887618687176, 77.20637798309326 28.632784466413323, 77.21195697784424 28.632784466413323, 77.21195697784424 28.627887618687176, 77.20637798309326 28.627887618687176))', 4326)
AND (("ug_route_sub_type" = 'IP1-IRU-Intercity' AND "ug_route_sub_type" IS NOT NULL)
  OR ("ug_route_sub_type" = 'IP1-IRU-Intracity' AND "ug_route_sub_type" IS NOT NULL)
  OR ("ug_route_sub_type" = 'IRU-Intracity' AND "ug_route_sub_type" IS NOT NULL)
  OR ("ug_route_sub_type" = 'IRU-Intercity' AND "ug_route_sub_type" IS NOT NULL)
  OR ("ug_route_sub_type" = 'IP1-Own-Intercity' AND "ug_route_sub_type" IS NOT NULL)
  OR ("ug_route_sub_type" = 'IP1-Own-Intracity' AND "ug_route_sub_type" IS NOT NULL)
  OR ("ug_route_sub_type" = 'Own-Intracity' AND "ug_route_sub_type" IS NOT NULL)
  OR ("ug_route_sub_type" = 'Own-Intercity' AND "ug_route_sub_type" IS NOT NULL)
  OR ("ug_route_sub_type" = 'Own-Intercity-Patch-replacement' AND "ug_route_sub_type" IS NOT NULL)
  OR ("ug_route_sub_type" = 'Own-Intracity-Patch-replacement' AND "ug_route_sub_type" IS NOT NULL)
  OR ("ug_route_sub_type" = 'Clamping' AND "ug_route_sub_type" IS NOT NULL)
  OR ("ug_route_sub_type" = 'None' AND "ug_route_sub_type" IS NOT NULL)
  OR ("ug_route_sub_type" = 'On kerb' AND "ug_route_sub_type" IS NOT NULL)
  OR ("ug_route_sub_type" = 'Other' AND "ug_route_sub_type" IS NOT NULL)
  OR ("ug_route_sub_type" = 'Suspend' AND "ug_route_sub_type" IS NOT NULL)
  OR ("ug_route_sub_type" = 'In Duct Chamber' AND "ug_route_sub_type" IS NOT NULL)
  OR ("ug_route_sub_type" = '' AND "ug_route_sub_type" IS NOT NULL)
  OR "ug_route_sub_type" IS NULL
  OR ("sw_uid22" = 'Overhead' AND "sw_uid22" IS NOT NULL AND "ug_route_sub_type" = 'Own-Intercity' AND "ug_route_sub_type" IS NOT NULL)
  OR ("sw_uid22" = 'Overhead' AND "sw_uid22" IS NOT NULL AND "ug_route_sub_type" = 'Own-Intracity' AND "ug_route_sub_type" IS NOT NULL)
  OR ("sw_uid22" = 'Overhead' AND "sw_uid22" IS NOT NULL AND "ug_route_sub_type" = 'Own-Intercity-Patch-replacement' AND "ug_route_sub_type" IS NOT NULL)
  OR ("sw_uid22" = 'Overhead' AND "sw_uid22" IS NOT NULL AND "ug_route_sub_type" = 'Own-Intracity-Patch-replacement' AND "ug_route_sub_type" IS NOT NULL)));

Explain Plan:
 Index Scan using underground_route_the_geom_geo_idx on underground_route  (cost=0.41..41.20 rows=7 width=157) (actual time=0.158..1.010 rows=38 loops=1)
   Index Cond: (the_geom && '010320E6100100054C354D534022DEBDA03C40004C354D53407BA9AC29FEA13C4000B4904D53407BA9AC29FEA13C4000B4904D534022DEBDA03C40004C354D534022DEBDA03C40'::geometry)
   Filter: ((((ug_route_sub_type)::text = 'IP1-IRU-Intercity'::text) AND (ug_route_sub_type IS NOT NULL)) OR (((ug_route_sub_type)::text = 'IP1-IRU-Intracity'::text) AND (ug_route_sub_type IS NOT NULL)) OR (((ug_route_sub_type)::text = 'IRU-Intracity'::text) AND (ug_route_sub_type IS NOT NULL)) OR (((ug_route_sub_type)::text = 'IRU-Intercity'::text) AND (ug_route_sub_type IS NOT NULL)) OR (((ug_route_sub_type)::text = 'IP1-Own-Intercity'::text) AND (ug_route_sub_type IS NOT NULL)) OR (((ug_route_sub_type)::text = 'IP1-Own-Intracity'::text) AND (ug_route_sub_type IS NOT NULL)) OR (((ug_route_sub_type)::text = 'Own-Intracity'::text) AND (ug_route_sub_type IS NOT NULL)) OR (((ug_route_sub_type)::text = 'Own-Intercity'::text) AND (ug_route_sub_type IS NOT NULL)) OR (((ug_route_sub_type)::text = 'Own-Intercity-Patch-replacement'::text) AND (ug_route_sub_type IS NOT NULL)) OR (((ug_route_sub_type)::text = 'Own-Intracity-Patch-replacement'::text) AND (ug_route_sub_type IS NOT NULL)) OR (((ug_route_sub_type)::text = 'Clamping'::text) AND (ug_route_sub_type IS NOT NULL)) OR (((ug_route_sub_type)::text = 'None'::text) AND (ug_route_sub_type IS NOT NULL)) OR (((ug_route_sub_type)::text
Re: strange slow query performance
On Thu, Jan 17, 2019 at 4:13 PM David G. Johnston < david.g.johns...@gmail.com> wrote:

> On Thu, Jan 17, 2019 at 9:19 AM Ben Snaidero wrote:
>
> Any ideas as to why this is happening?
>
> Not really, I would expect roughly double execution time, not an
> exponential increase. Still not experienced enough to diagnose with
> what has been provided but I will suggest you provide the version that
> you are running these queries against and confirm that the results are
> consistent across multiple runs to discount the possibility that
> locking or other transient work is involved.
>
> Curious if increasing work_mem helps but its mostly a trial-and-error
> thing for me (I wouldn't expect it to)...
>
> David J.

The query was tested multiple times with similar results. I also tried increasing work_mem, with the same results. Memory did not increase or even come close to the 10MB setting. The query does seem to occupy one full CPU for the duration of the run, though. The query is running on Windows with Postgres 9.6.7.
Re: strange slow query performance
On Thu, Jan 17, 2019 at 9:19 AM Ben Snaidero wrote: > Any ideas as to why this is happening? Not really, I would expect roughly double execution time, not an exponential increase. Still not experienced enough to diagnose with what has been provided but I will suggest you provide the version that you are running these queries against and confirm that the results are consistent across multiple runs to discount the possibility that locking or other transient work is involved. Curious if increasing work_mem helps but its mostly a trial-and-error thing for me (I wouldn't expect it to)... David J.
Re: strange slow query performance
Ben Snaidero wrote: > The following query runs as expected. > > explain analyze SELECT MainTable.gid AS MainTable_gid,MainTable.datetime AS > MainTable_datetime,MainTable.objectid AS MainTable_objectid,MainTable.type AS > MainTable_type FROM MainTable >WHERE objectid = ANY(ARRAY(SELECT NE.objectid AS entityId FROM ( > nodeobject NE INNER JOIN ( Node N1NE INNER JOIN Node N2NE ON > ((N1NE.iLeft >= N2NE.iLeft) AND (N1NE.iLeft <= N2NE.iRight))) ON NE.inodeid > = N1NE.iId ) WHERE N2NE.iId = ANY(ARRAY[14438,10814]))) > AND objectid=3161; > > QUERY PLAN > > - > Index Scan using ix_MainTable_objectid_datetime on MainTable > (cost=3254.91..3264.39 rows=1 width=32) (actual time=33.094..33.094 rows=0 > loops=1) >Index Cond: ((objectid = ANY ($3)) AND (objectid = 3161)) >InitPlan 1 (returns $3) > -> Nested Loop (cost=1.10..3254.75 rows=16296 width=4) (actual > time=0.403..26.147 rows=19042 loops=1) > [...] > Planning time: 5.693 ms > Execution time: 33.383 ms > (15 rows) > > But when we add a second condition to the where clause it causes the query > performance to drop dramatically. Values in "any(array" do not make a > difference. In this example they are the same but even for different values > performance is still the poor. Any ideas as to why this is happening? 
> > explain analyze SELECT MainTable.gid AS MainTable_gid,MainTable.datetime AS > MainTable_datetime,MainTable.objectid AS MainTable_objectid,MainTable.type AS > MainTable_type FROM MainTable >WHERE objectid = ANY(ARRAY(SELECT NE.objectid AS entityId FROM ( > nodeobject NE INNER JOIN ( Node N1NE INNER JOIN Node N2NE ON > ((N1NE.iLeft >= N2NE.iLeft) AND (N1NE.iLeft <= N2NE.iRight))) ON NE.inodeid > = N1NE.iId ) WHERE N2NE.iId = ANY(ARRAY[14438,10814]))) > AND objectid = ANY(ARRAY(SELECT NE.objectid AS entityId FROM ( > nodeobject NE INNER JOIN ( Node N1NE INNER JOIN Node N2NE ON > ((N1NE.iLeft >= N2NE.iLeft) AND (N1NE.iLeft <= N2NE.iRight))) ON NE.inodeid > = N1NE.iId ) WHERE N2NE.iId = ANY(ARRAY[14438,10814]))) AND objectid=3161; > > > QUERY PLAN > > > -- > Index Scan using ix_MainTable_objectid_datetime on MainTable > (cost=6509.66..6534.02 rows=1 width=32) (actual time=16442.004..16442.004 > rows=0 loops=1) >Index Cond: ((objectid = ANY ($3)) AND (objectid = ANY ($7)) AND (objectid > = 3161)) >InitPlan 1 (returns $3) > -> Nested Loop (cost=1.10..3254.75 rows=16296 width=4) (actual > time=0.438..28.484 rows=19042 loops=1) > [...] >InitPlan 2 (returns $7) > -> Nested Loop (cost=1.10..3254.75 rows=16296 width=4) (actual > time=0.056..11.786 rows=19042 loops=1) > [...] > Planning time: 4.860 ms > Execution time: 16442.462 ms > (26 rows) Not sure what is going on, but why don't you put the condition "WHERE ne.objectid=3161" into the subselects? Then there should be at most one result row, and I can imagine that things will become faster. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
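[A sketch of the rewrite Laurenz suggests above, with the constant pushed into the subselect; column and table names are taken from the thread, and only one of the two identical subselects is shown since with the pushdown they should collapse to the same check:]

```sql
SELECT mt.gid, mt.datetime, mt.objectid, mt.type
FROM MainTable mt
WHERE mt.objectid = 3161
  AND mt.objectid = ANY(ARRAY(
        SELECT ne.objectid
        FROM nodeobject ne
        JOIN node n1ne ON ne.inodeid = n1ne.iid
        JOIN node n2ne ON n1ne.ileft BETWEEN n2ne.ileft AND n2ne.iright
        WHERE n2ne.iid = ANY(ARRAY[14438, 10814])
          AND ne.objectid = 3161));   -- constant pushed down, per the suggestion
```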
strange slow query performance
The following query runs as expected.

explain analyze SELECT MainTable.gid AS MainTable_gid, MainTable.datetime AS MainTable_datetime, MainTable.objectid AS MainTable_objectid, MainTable.type AS MainTable_type
FROM MainTable
WHERE objectid = ANY(ARRAY(SELECT NE.objectid AS entityId
                           FROM ( nodeobject NE INNER JOIN ( Node N1NE INNER JOIN Node N2NE ON ((N1NE.iLeft >= N2NE.iLeft) AND (N1NE.iLeft <= N2NE.iRight))) ON NE.inodeid = N1NE.iId )
                           WHERE N2NE.iId = ANY(ARRAY[14438,10814])))
  AND objectid=3161;

                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using ix_MainTable_objectid_datetime on MainTable  (cost=3254.91..3264.39 rows=1 width=32) (actual time=33.094..33.094 rows=0 loops=1)
   Index Cond: ((objectid = ANY ($3)) AND (objectid = 3161))
   InitPlan 1 (returns $3)
     ->  Nested Loop  (cost=1.10..3254.75 rows=16296 width=4) (actual time=0.403..26.147 rows=19042 loops=1)
           ->  Nested Loop  (cost=0.81..1784.15 rows=618 width=4) (actual time=0.249..10.485 rows=2082 loops=1)
                 ->  Index Scan using pk_node on node n2ne  (cost=0.41..16.80 rows=2 width=8) (actual time=0.103..0.233 rows=2 loops=1)
                       Index Cond: (iid = ANY ('{14438,10814}'::integer[]))
                 ->  Index Only Scan using ix_node on node n1ne  (cost=0.41..880.59 rows=309 width=8) (actual time=0.074..5.022 rows=1041 loops=2)
                       Index Cond: ((ileft >= n2ne.ileft) AND (ileft <= n2ne.iright))
                       Heap Fetches: 30
           ->  Index Only Scan using ix_nodeobject_inodeid_objectid on nodeobject ne  (cost=0.29..1.81 rows=57 width=8) (actual time=0.002..0.007 rows=9 loops=2082)
                 Index Cond: (inodeid = n1ne.iid)
                 Heap Fetches: 13973
 Planning time: 5.693 ms
 Execution time: 33.383 ms
(15 rows)

But when we add a second condition to the where clause, query performance drops dramatically. The values in "any(array" do not make a difference: in this example they are the same, but even for different values performance is still poor. Any ideas as to why this is happening?
explain analyze SELECT MainTable.gid AS MainTable_gid, MainTable.datetime AS MainTable_datetime, MainTable.objectid AS MainTable_objectid, MainTable.type AS MainTable_type
FROM MainTable
WHERE objectid = ANY(ARRAY(SELECT NE.objectid AS entityId
                           FROM ( nodeobject NE INNER JOIN ( Node N1NE INNER JOIN Node N2NE ON ((N1NE.iLeft >= N2NE.iLeft) AND (N1NE.iLeft <= N2NE.iRight))) ON NE.inodeid = N1NE.iId )
                           WHERE N2NE.iId = ANY(ARRAY[14438,10814])))
  AND objectid = ANY(ARRAY(SELECT NE.objectid AS entityId
                           FROM ( nodeobject NE INNER JOIN ( Node N1NE INNER JOIN Node N2NE ON ((N1NE.iLeft >= N2NE.iLeft) AND (N1NE.iLeft <= N2NE.iRight))) ON NE.inodeid = N1NE.iId )
                           WHERE N2NE.iId = ANY(ARRAY[14438,10814])))
  AND objectid=3161;

                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using ix_MainTable_objectid_datetime on MainTable  (cost=6509.66..6534.02 rows=1 width=32) (actual time=16442.004..16442.004 rows=0 loops=1)
   Index Cond: ((objectid = ANY ($3)) AND (objectid = ANY ($7)) AND (objectid = 3161))
   InitPlan 1 (returns $3)
     ->  Nested Loop  (cost=1.10..3254.75 rows=16296 width=4) (actual time=0.438..28.484 rows=19042 loops=1)
           ->  Nested Loop  (cost=0.81..1784.15 rows=618 width=4) (actual time=0.172..7.141 rows=2082 loops=1)
                 ->  Index Scan using pk_node on node n2ne  (cost=0.41..16.80 rows=2 width=8) (actual time=0.046..0.111 rows=2 loops=1)
                       Index Cond: (iid = ANY ('{14438,10814}'::integer[]))
                 ->  Index Only Scan using ix_node on node n1ne  (cost=0.41..880.59 rows=309 width=8) (actual time=0.073..3.438 rows=1041 loops=2)
                       Index Cond: ((ileft >= n2ne.ileft) AND (ileft <= n2ne.iright))
                       Heap Fetches: 30
           ->  Index Only Scan using ix_nodeobject_inodeid_objectid on nodeobject ne  (cost=0.29..1.81 rows=57 width=8) (actual time=0.003..0.010 rows=9 loops=2082)
                 Index Cond: (inodeid = n1ne.iid)
                 Heap Fetches: 13973
   InitPlan 2 (returns $7)
     ->  Nested Loop  (cost=1.10..3254.75 rows=16296 width=4) (actual time=0.056..11.786 rows=19042 loops=1)
           ->  Nested Loop  (cost=0.81..1784.15 rows=618 width=4) (actual time=0.034..1.343 rows=2082 loops=1)
                 ->  Index Scan using pk_node on node n2ne_1  (cost=0.41..16.80 rows=2 width=8) (actual time=0.007..0.016 rows=2 loops=1)
                       Index Cond: (iid = ANY ('{14438,10814}'::integer[]))
                 ->  Index Only Scan using ix_
Query performance
I want to compare the performance of some queries, so I created these tables:

CREATE TABLE reservation1 (room int, name varchar(255), during daterange);
CREATE TABLE reservation2 (room int, name varchar(255), start_date date, end_date date);

then:

INSERT INTO public.reservation1(name, room, during)
select concat('room ', a), a, '[2010-01-01, 2010-04-02)'
from generate_series(1,1000) as a;

INSERT INTO public.reservation2(name, room, start_date, end_date)
select concat('room ', a), a, '2010-01-01', '2010-04-02'
from generate_series(1,1000) as a;

I created an index on the during column:

CREATE INDEX reservation1_idx ON reservation1 USING GIST (during);

I'm using the contains (@>) and overlaps (&&) operators:

select * from reservation1 where during @> '[2010-02-15,2010-03-02)';
select * from reservation1 where during && '[2010-02-15,2010-03-02)';

And I got a run time of 25s.

However, when I use this query:

select * from reservation2
where ('2010-02-15' between start_date and end_date)
  and ('2010-03-02' between start_date and end_date);

the run time was 9s.

I understand that the index does not have any effect when the amount of fetched data is huge and the query planner uses the seq scan method. I want to know whether it is not recommended to use range types and operators to get good performance, or whether I should add something to the queries to make them faster?
Re: query performance
On Sun, 18 Feb 2018 00:35:18 +0100 hmidi slim wrote:

> Hi,
> I have two tables: establishment which contains these columns: id, name,
> longitude, latitude, geom (Geometric column)
> Product contains: id, name, establishment_id
> First of all I want to select the establishments within a radius.
> I run this query:
>
> select e.name, e1.name
> from establishment as e, establishment as e1
> where e.id <> e1.id
> and e1.id = 1
> and ST_DWithin(geom, ST_MakePoint(e1.longitude, e1.latitude)::geography, 1000)
>
> The result of this query returns all the establishments within a radius of
> 1 km from a given establishment which has an id = 1.
>
> After that I want to get the product's name for each establishment from the
> query's result.
>
> Is there another, optimized solution to make a query such as this:
>
> select * from (
>   select e.name, e1.name, e.id
>   from establishment as e, establishment as e1
>   where e.id <> e1.id
>   and e1.id = 1
>   and ST_DWithin(geom, ST_MakePoint(e1.longitude, e1.latitude)::geography, 1000)) as tmp
> inner join product as p on p.establishment_id = tmp.id

Did you not see my earlier response?

--
Bill Moran
Re: query performance
On 02/18/2018 06:37 AM, David Rowley wrote:
> On 18 February 2018 at 12:35, hmidi slim wrote:
>> Is there another, optimized solution to make a query such as this:
>> select * from (
>> select e.name, e1.name, e.id
>> from establishment as e, establishment as e1
>> where e.id <> e1.id
>> and e1.id = 1
>> and ST_DWithin(geom, ST_MakePoint(e1.longitude, e1.latitude)::geography,
>> 1000)) as tmp inner join product as p on p.establishment_id = tmp.id
>
> You really should state what is wrong with the performance of the
> above version and also include the EXPLAIN (ANALYZE, BUFFERS) of that
> query.
>
> The version of PostgreSQL that you're running it on is also a good
> thing to share.
>
> Details of the indexes which you've defined on the tables are also
> useful information. It would especially be good to know if you've
> added an index on product (establishment_id), for example.

There's actually a wiki page with an overview of what should be included in "slow query" questions:

https://wiki.postgresql.org/wiki/Slow_Query_Questions

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: query performance
On 18 February 2018 at 12:35, hmidi slim wrote:
> Is there another, optimized solution to make a query such as this:
> select * from (
> select e.name, e1.name, e.id
> from establishment as e, establishment as e1
> where e.id <> e1.id
> and e1.id = 1
> and ST_DWithin(geom, ST_MakePoint(e1.longitude, e1.latitude)::geography,
> 1000)) as tmp inner join product as p on p.establishment_id = tmp.id

You really should state what is wrong with the performance of the above version and also include the EXPLAIN (ANALYZE, BUFFERS) output of that query.

The version of PostgreSQL that you're running it on is also a good thing to share.

Details of the indexes which you've defined on the tables are also useful information. It would especially be good to know if you've added an index on product (establishment_id), for example.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
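[If the index on product (establishment_id) that David asks about is missing, creating it is a one-liner; the index name below is illustrative:]

```sql
-- supports the join condition p.establishment_id = tmp.id
CREATE INDEX product_establishment_id_idx
    ON product (establishment_id);
```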
query performance
Hi,

I have two tables: establishment, which contains these columns: id, name, longitude, latitude, geom (geometric column), and product, which contains: id, name, establishment_id.

First of all I want to select the establishments within a radius. I run this query:

select e.name, e1.name
from establishment as e, establishment as e1
where e.id <> e1.id
and e1.id = 1
and ST_DWithin(geom, ST_MakePoint(e1.longitude, e1.latitude)::geography, 1000)

The result of this query returns all the establishments within a radius of 1 km from a given establishment which has an id = 1.

After that I want to get the product's name for each establishment from the query's result.

Is there another, optimized solution to make a query such as this:

select * from (
  select e.name, e1.name, e.id
  from establishment as e, establishment as e1
  where e.id <> e1.id
  and e1.id = 1
  and ST_DWithin(geom, ST_MakePoint(e1.longitude, e1.latitude)::geography, 1000)) as tmp
inner join product as p on p.establishment_id = tmp.id