Re: How to improve the performance of my SQL query?

2023-07-29 Thread Alban Hertroys


> On 29 Jul 2023, at 10:59, Peter J. Holzer  wrote:
> 
> On 2023-07-26 15:46:16 +0800, gzh wrote:
>> SET enable_seqscan TO off;
> [...]
>>->  Parallel Bitmap Heap Scan on tbl_sha  
>> (cost=92112.45..2663789.14 rows=800650 width=18) (actual 
>> time=260.540..21442.169 rows=804500 loops=3)
>>  Recheck Cond: (ms_cd = 'MLD009'::bpchar)
>>  Rows Removed by Index Recheck: 49
>>  Filter: (etrys = '0001'::bpchar)
>>  Rows Removed by Filter: 295500
>>  Heap Blocks: exact=13788 lossy=10565
>>  ->  Bitmap Index Scan on index_search_04_mscd_cdate 
>>  (cost=0.00..91632.06 rows=3402599 width=0) (actual time=249.718..249.718 
>> rows=330 loops=1)
>>Index Cond: (ms_cd = 'MLD009'::bpchar)
> 
> So now it's using index_search_04_mscd_cdate which contains only ms_cd
> (and - judging from the name, other fields not relevant to this query),
> but it still doesn't use index_search_01 which would fit the query
> exactly. I can understand that Postgres prefers a sequential scan over
> an index scan (the number of matching rows is about 10% of the total
> table size which is a lot), but why would it prefer a less specific
> index to a more specific one?
> 
> Can you get Postgres to use that index at all?
> 
> Find a combination of ms_cd and etrys which doesn't cover millions of
> rows and try that.
> 
> Also try lowering random_page_cost.

Wasn’t this an RDS server with just 4GB of memory?

How large are those multi-column indices? Perhaps they don’t (all) fit into 
available cache memory and the server decided to use the one that it had cached?

I’m frankly not at all certain how the server would behave around such resource 
shortage situations, but I suppose loading an uncached index into cache could 
get a higher cost than using a less optimal (costlier) index that’s already 
cached.


Regarding lowering random_page_cost; If your index files are on SSD storage, 
lowering that sufficiently (to a realistic value) could then sufficiently lower 
the cost of loading that uncached index into memory, evicting the index it was 
using in above plan to make room (unless other active sessions are using it).

Alban Hertroys
--
There is always an exception to always.








Re: How to improve the performance of my SQL query?

2023-07-29 Thread Peter J. Holzer
On 2023-07-26 15:46:16 +0800, gzh wrote:
> SET enable_seqscan TO off;
[...]
> ->  Parallel Bitmap Heap Scan on tbl_sha  
> (cost=92112.45..2663789.14 rows=800650 width=18) (actual 
> time=260.540..21442.169 rows=804500 loops=3)
>   Recheck Cond: (ms_cd = 'MLD009'::bpchar)
>   Rows Removed by Index Recheck: 49
>   Filter: (etrys = '0001'::bpchar)
>   Rows Removed by Filter: 295500
>   Heap Blocks: exact=13788 lossy=10565
>   ->  Bitmap Index Scan on index_search_04_mscd_cdate 
>  (cost=0.00..91632.06 rows=3402599 width=0) (actual time=249.718..249.718 
> rows=330 loops=1)
> Index Cond: (ms_cd = 'MLD009'::bpchar)

So now it's using index_search_04_mscd_cdate which contains only ms_cd
(and - judging from the name, other fields not relevant to this query),
but it still doesn't use index_search_01 which would fit the query
exactly. I can understand that Postgres prefers a sequential scan over
an index scan (the number of matching rows is about 10% of the total
table size which is a lot), but why would it prefer a less specific
index to a more specific one?

Can you get Postgres to use that index at all?

Find a combination of ms_cd and etrys which doesn't cover millions of
rows and try that.

Also try lowering random_page_cost.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: How to improve the performance of my SQL query?

2023-07-28 Thread gzh
Thank you very much for taking the time to reply to my question. 




> The problem is none of the explains you sent match with the description 
> above. The last one when you forced the optimizer to go with index scan (SET 
> enable_seqscan TO off;) the chosen index seems to be one composed by ms_cd 
> and cd_ate (following your standard of 2 characters column name). There may 
> have a couple of explanations to this:

> - One is that the index may not be exactly the same as described above;

The primary key of the tbl_sha table consists of several fields, and ms_cd is 
just one of them. I just explained the definitions of the fields used in the 
query SQL.




> - Another one is the order in the index. Because you have a composed index 
> the order of the columns in the index matters, and it seems the order is 
> (ms_cd, et_cd, etrys). I wonder if you could recreate this index with the 
> following order: (ms_cd, etrys, et_cd) and run the same query;

The index of TBL_SHA table is defined as follows.




CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS)

CREATE INDEX index_search_02 ON mdb.TBL_SHA USING btree (ET_CD, ETRYS)

CREATE INDEX index_search_03 ON mdb.TBL_SHA USING btree (MS_CD, ET_DAY, BK_CD, 
FR_CD, RM_CD)




> You can also try to trick the optimizer, for example, what is the result (and 
> explain) of the below query?

> 

> WITH tbi (ry_cd) AS (

> SELECT tbl_inf.ry_cd

> FROM tbl_inf tbi

> WHERE tbi.ms_cd = 'MLD009'

> AND tbl_inf.ry_cd = '0001'

> ) SELECT COUNT(et_cd) FROM tbl_sha tbs

> JOIN tbi ON tbi.ry_cd = tbs .etrys

> WHERE tbs .ms_cd = 'MLD009';

The SQL execution encountered an error, so I made some modifications.

Please refer to the execution plan.

 WHERE tbi.ms_cd = 'MLD009'

→

 WHERE tbl_inf.ms_cd = 'MLD009'




QUERY PLAN

Limit  (cost=2668811.76..2668811.77 rows=1 width=8) (actual 
time=133555.074..133557.729 rows=1 loops=1)

  ->  Aggregate  (cost=2668811.76..2668811.77 rows=1 width=8) (actual 
time=133555.072..133557.726 rows=1 loops=1)

->  Nested Loop  (cost=1000.29..2664512.83 rows=1719572 width=9) 
(actual time=29657.638..133341.053 rows=2113500 loops=1)

  ->  Index Only Scan using tbl_inf_pkc on tbl_inf  
(cost=0.29..8.31 rows=1 width=9) (actual time=1.316..1.321 rows=1 loops=1)

Index Cond: ((ms_cd = 'MLD009'::bpchar) AND (ry_cd = 
'0001'::bpchar))

Heap Fetches: 1

  ->  Gather  (cost=1000.00..2647308.80 rows=1719572 width=18) 
(actual time=29656.318..132969.910 rows=2113500 loops=1)

Workers Planned: 2

Workers Launched: 2

->  Parallel Seq Scan on tbl_sha tbs  
(cost=0.00..2474351.60 rows=716488 width=18) (actual time=29654.184..132876.292 
rows=704500 loops=3)

  Filter: ((ms_cd = 'MLD009'::bpchar) AND (etrys = 
'0001'::bpchar))

  Rows Removed by Filter: 14678996

Planning Time: 0.164 ms

Execution Time: 133557.767 ms




> Well, adding more resources tends to improve performance, but it's usually 
> not linear and the improvement may not be as large as you want for the extra 
> price you are paying. I would first try to understand the performance problem 
> because using the "add more resources" approach may just delay the problem 
> and it tends to get worse with time as the dataset increases.

I strongly agree with your viewpoint, but I currently don't have a solution in 
mind for the problem.
















At 2023-07-28 04:38:39, "Charly"  wrote:

Hi "gzh",


Based on the info you provided I'm assuming you are trying to use the TBL_SHA 
primary key to do an index-only scan as in you mentioned above you have:
> TBL_SHA
> ms_cd character(6) NOT NULL   -- PRIMARY KEY
> et_cd character(8)
> etrys character(8)

Assuming a composed index here by the 3 columns.


> TBL_INF
> ms_cd character(6) NOT NULL   -- PRIMARY KEY
> ry_cd character(8) NOT NULL   -- PRIMARY KEY
Here it's more clear that there is a composed index based on those 2 columns.



The problem is none of the explains you sent match with the description above. 
The last one when you forced the optimizer to go with index scan (SET 
enable_seqscan TO off;) the chosen index seems to be one composed by ms_cd and 
cd_ate (following your standard of 2 characters column name). There may have a 
couple of explanations to this:
 - One is that the index may not be exactly the same as described above;
 - Another one is the order in the index. Because you have a composed index the 
order of the columns in the index matters, and it seems the order is (ms_cd, 
et_cd, etrys). I wonder if you could recreate this index with the following 
order: (ms_cd, etrys, et_cd) and run the same query;
 There may be other problems happening there, but those are the ones I see more 
evident from the description of the problem.


Giving a closer look to your query I 

Re: How to improve the performance of my SQL query?

2023-07-27 Thread Charly
Hi "gzh",

Based on the info you provided I'm assuming you are trying to use the
TBL_SHA primary key to do an index-only scan as in you mentioned above you
have:
> TBL_SHA
> ms_cd character(6) NOT NULL   -- PRIMARY KEY
> et_cd character(8)
> etrys character(8)
Assuming a composed index here by the 3 columns.

> TBL_INF
> ms_cd character(6) NOT NULL   -- PRIMARY KEY
> ry_cd character(8) NOT NULL   -- PRIMARY KEY
Here it's more clear that there is a composed index based on those 2
columns.

The problem is none of the explains you sent match with the description
above. The last one when you forced the optimizer to go with index scan
(SET enable_seqscan TO off;) the chosen index seems to be one composed by
*ms_cd* and cd_ate (following your standard of 2 characters column name).
There may have a couple of explanations to this:
 - One is that the index may not be exactly the same as described above;
 - Another one is the order in the index. Because you have a composed index
the order of the columns in the index matters, and it seems the order is
(ms_cd, et_cd, etrys). I wonder if you could recreate this index with the
following order: (ms_cd, etrys, et_cd) and run the same query;
 There may be other problems happening there, but those are the ones I see
more evident from the description of the problem.

Giving a closer look to your query I really didn't understand the reasoning
to have that subselect as it seems only to validate the two relations have
common items or to validate a parent relationship, in this case you can
probably use a join to get the same result:

SELECT COUNT(et_cd) FROM tbl_sha tbs
JOIN tbi ON tbi.ry_cd = tbs.etrys AND tbi.ms_cd = tbi.ms_cd
WHERE tbi.ms_cd = 'MLD009'
AND tbl_inf.ry_cd = '0001';

You can also try to trick the optimizer, for example, what is the result
(and explain) of the below query?

WITH tbi (ry_cd) AS (
SELECT tbl_inf.ry_cd
FROM tbl_inf tbi
WHERE tbi.ms_cd = 'MLD009'
AND tbl_inf.ry_cd = '0001'
) SELECT COUNT(et_cd) FROM tbl_sha tbs
JOIN tbi ON tbi.ry_cd = tbs .etrys
WHERE tbs .ms_cd = 'MLD009';

You can alternatively try the CTE using the both columns in the JOIN clause.

On Thu, 27 Jul 2023 at 05:10, gzh  wrote:

> Thank you for your suggestion.
> The database is hosted on AWS RDS, with the instance having vCPU=2 and
> RAM=4GB.
> Could the low performance be due to the low configuration of AWS RDS?
> We are considering trying a higher configuration instance.
>

Well, adding more resources tends to improve performance, but it's usually
not linear and the improvement may not be as large as you want for the
extra price you are paying. I would first try to understand the performance
problem because using the "add more resources" approach may just delay the
problem and it tends to get worse with time as the dataset increases.

I hope it helps.

-- 
Regards,

*Charly Batista*
*Shanghai, China*
*carl...@gmail.com  *
*https://www.linkedin.com/in/charlybatista/
*

*Linux user #391083*


*“If you have an apple and I have an apple and we exchange these apples
then you and I will still each have one apple. But if you have an idea and
I have an idea and we exchange these ideas, then each of us will have two
ideas."  George Bernard Shaw (1856-1950)*


Re: How to improve the performance of my SQL query?

2023-07-27 Thread gzh
Thank you for your suggestion. 
The database is hosted on AWS RDS, with the instance having vCPU=2 and RAM=4GB. 
Could the low performance be due to the low configuration of AWS RDS? 
We are considering trying a higher configuration instance.








At 2023-07-27 11:36:20, "David Rowley"  wrote:
>On Wed, 26 Jul 2023 at 19:46, gzh  wrote:
>> QUERY PLAN (enable_seqscan=on)
>
>> Execution Time: 167183.133 ms
>
>> QUERY PLAN (enable_seqscan=off)
>
>> Execution Time: 22320.153 ms
>
>effective_cache_size and random_page_cost are the settings you should
>be adjusting to coax the planner into using the index.
>
>A rule of thumb for effective_cache_size would be to set it to about
>75% of RAM. There are certainly cases where lower would make more
>sense, certainly, 75% will make more sense than the default 4GB value
>in the majority of cases.
>
>For random_page_cost, the default of 4.0 has been the default since
>HDDs were common. SSDs are common now and, comparatively to sequential
>I/O, their random I/O is faster than that of an HDD, so you may get
>better results by lowering random_page_cost.
>
>David


Re: How to improve the performance of my SQL query?

2023-07-26 Thread David Rowley
On Wed, 26 Jul 2023 at 19:46, gzh  wrote:
> QUERY PLAN (enable_seqscan=on)

> Execution Time: 167183.133 ms

> QUERY PLAN (enable_seqscan=off)

> Execution Time: 22320.153 ms

effective_cache_size and random_page_cost are the settings you should
be adjusting to coax the planner into using the index.

A rule of thumb for effective_cache_size would be to set it to about
75% of RAM. There are certainly cases where lower would make more
sense, certainly, 75% will make more sense than the default 4GB value
in the majority of cases.

For random_page_cost, the default of 4.0 has been the default since
HDDs were common. SSDs are common now and, comparatively to sequential
I/O, their random I/O is faster than that of an HDD, so you may get
better results by lowering random_page_cost.

David




Re: How to improve the performance of my SQL query?

2023-07-26 Thread gzh
Thank you very much for taking the time to reply to my question. 

>You might want to check your description of the table definitions.
>Going by the above EXPLAIN ANALYZE output, it very much does not look
>like ms_cd is the primary key of TBL_SHA. If it is then it's very
>weird that you have 320 rows for MS_CD = 'MLD009'. You have some
>data corruption if that's the case. I suspect you've just not

>accurately described the table definition, however.
The primary key of the SHA table has six fields, and ms_cd is just one of them. 
I'm sorry, I didn't make that clear.


>Try executing the query after having done:
>
>SET enable_seqscan TO off;
>
>What plan does it use now?
>

>Is that plan faster or slower than the seq scan plan?
There's improvement, but it's still quite slow.

QUERY PLAN (enable_seqscan=on)
Limit  (cost=2693516.87..2693516.88 rows=1 width=8) (actual 
time=167089.822..167183.058 rows=1 loops=1)
  ->  Aggregate  (cost=2693516.87..2693516.88 rows=1 width=8) (actual 
time=167089.820..167183.056 rows=1 loops=1)
->  Nested Loop  (cost=1000.29..2688558.85 rows=1983209 width=9) 
(actual time=43544.753..166906.304 rows=2413500 loops=1)
  ->  Index Only Scan using tbl_inf_pkc on tbl_inf  
(cost=0.29..8.31 rows=1 width=9) (actual time=1.034..1.038 rows=1 loops=1)
Index Cond: ((ms_cd = 'MLD009'::bpchar) AND (ry_cd = 
'0001'::bpchar))
Heap Fetches: 1
  ->  Gather  (cost=1000.00..2668718.45 rows=1983209 width=18) 
(actual time=43543.714..166447.333 rows=2413500 loops=1)
Workers Planned: 2
Workers Launched: 2
->  Parallel Seq Scan on tbl_sha  (cost=0.00..2469397.55 
rows=826337 width=18) (actual time=43537.056..166225.162 rows=804500 loops=3)
  Filter: ((ms_cd = 'MLD009'::bpchar) AND (etrys = 
'0001'::bpchar))
  Rows Removed by Filter: 15362328
Planning Time: 2.942 ms
Execution Time: 167183.133 ms

SET enable_seqscan TO off;

QUERY PLAN (enable_seqscan=off)
Limit  (cost=2880973.06..2880973.07 rows=1 width=8) (actual 
time=22295.419..22320.102 rows=1 loops=1)
  ->  Aggregate  (cost=2880973.06..2880973.07 rows=1 width=8) (actual 
time=22295.418..22320.100 rows=1 loops=1)
->  Nested Loop  (cost=93112.74..2876169.16 rows=1921561 width=9) 
(actual time=265.880..22000.432 rows=2413500 loops=1)
  ->  Index Only Scan using tbl_inf_pkc on tbl_inf  
(cost=0.29..8.31 rows=1 width=9) (actual time=0.013..0.020 rows=1 loops=1)
Index Cond: ((ms_cd = 'MLD009'::bpchar) AND (ry_cd = 
'0001'::bpchar))
Heap Fetches: 1
  ->  Gather  (cost=93112.45..2856945.24 rows=1921561 width=18) 
(actual time=265.864..21535.325 rows=2413500 loops=1)
Workers Planned: 2
Workers Launched: 2
->  Parallel Bitmap Heap Scan on tbl_sha  
(cost=92112.45..2663789.14 rows=800650 width=18) (actual 
time=260.540..21442.169 rows=804500 loops=3)
  Recheck Cond: (ms_cd = 'MLD009'::bpchar)
  Rows Removed by Index Recheck: 49
  Filter: (etrys = '0001'::bpchar)
  Rows Removed by Filter: 295500
  Heap Blocks: exact=13788 lossy=10565
  ->  Bitmap Index Scan on index_search_04_mscd_cdate  
(cost=0.00..91632.06 rows=3402599 width=0) (actual time=249.718..249.718 
rows=330 loops=1)
Index Cond: (ms_cd = 'MLD009'::bpchar)
Planning Time: 0.670 ms
Execution Time: 22320.153 ms





















At 2023-07-25 21:04:16, "David Rowley"  wrote:
>On Fri, 21 Jul 2023 at 13:44, gzh  wrote:
>>
>> The definitions of the columns used in SQL are as follows.
>>
>> TBL_SHA
>> ms_cd character(6) NOT NULL   -- PRIMARY KEY
>> et_cd character(8)
>> etrys character(8)
>
>> explain analyze
>> select COUNT(ET_CD)
>> from TBL_SHA
>> WHERE TBL_SHA.MS_CD = 'MLD009'
>> and TBL_SHA.ETRYS in
>>(select TBL_INF.RY_CD
>> from TBL_INF
>> WHERE TBL_INF.MS_CD = 'MLD009'
>>AND TBL_INF.RY_CD = '0001'
>>)
>> - Execution Plan -
>> Limit  (cost=2738709.57..2738709.58 rows=1 width=8) (actual 
>> time=124168.769..124168.771 rows=1 loops=1)
>>   ->  Aggregate  (cost=2738709.57..2738709.58 rows=1 width=8) (actual 
>> time=124168.767..124168.769 rows=1 loops=1)
>> ->  Nested Loop  (cost=0.29..2730702.63 rows=3202774 width=9) 
>> (actual time=97264.166..123920.769 rows=320 loops=1)
>>   ->  Index Only Scan using TBL_INF_pkc on TBL_INF  
>> (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)
>> Index Cond: ((MS_CD = 'MLD009'::bpchar) AND (RY_CD = 
>> '0001'::bpchar))
>> Heap Fetches: 1
>>   ->  Seq Scan on TBL_SHA  (cost=0.00..2698666.58 rows=3202774 
>> width=18) 

Re: How to improve the performance of my SQL query?

2023-07-25 Thread David Rowley
On Fri, 21 Jul 2023 at 13:44, gzh  wrote:
>
> The definitions of the columns used in SQL are as follows.
>
> TBL_SHA
> ms_cd character(6) NOT NULL   -- PRIMARY KEY
> et_cd character(8)
> etrys character(8)

> explain analyze
> select COUNT(ET_CD)
> from TBL_SHA
> WHERE TBL_SHA.MS_CD = 'MLD009'
> and TBL_SHA.ETRYS in
>(select TBL_INF.RY_CD
> from TBL_INF
> WHERE TBL_INF.MS_CD = 'MLD009'
>AND TBL_INF.RY_CD = '0001'
>)
> - Execution Plan -
> Limit  (cost=2738709.57..2738709.58 rows=1 width=8) (actual 
> time=124168.769..124168.771 rows=1 loops=1)
>   ->  Aggregate  (cost=2738709.57..2738709.58 rows=1 width=8) (actual 
> time=124168.767..124168.769 rows=1 loops=1)
> ->  Nested Loop  (cost=0.29..2730702.63 rows=3202774 width=9) (actual 
> time=97264.166..123920.769 rows=320 loops=1)
>   ->  Index Only Scan using TBL_INF_pkc on TBL_INF  
> (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)
> Index Cond: ((MS_CD = 'MLD009'::bpchar) AND (RY_CD = 
> '0001'::bpchar))
> Heap Fetches: 1
>   ->  Seq Scan on TBL_SHA  (cost=0.00..2698666.58 rows=3202774 
> width=18) (actual time=97264.138..123554.792 rows=320 loops=1)
> Filter: ((MS_CD = 'MLD009'::bpchar) AND (ETRYS = 
> '0001'::bpchar))
> Rows Removed by Filter: 32000325
> Planning Time: 0.162 ms
> Execution Time: 124168.838 ms

You might want to check your description of the table definitions.
Going by the above EXPLAIN ANALYZE output, it very much does not look
like ms_cd is the primary key of TBL_SHA. If it is then it's very
weird that you have 320 rows for MS_CD = 'MLD009'. You have some
data corruption if that's the case. I suspect you've just not
accurately described the table definition, however.

David




Re: Re: How to improve the performance of my SQL query?

2023-07-25 Thread David Rowley
On Thu, 20 Jul 2023 at 23:36, gzh  wrote:
>
>
> Thank you very much for taking the time to reply to my question.
>
>
> Sorry, I provided incorrect information.
>
> The index also does not work in the following query statement.
>
>
> > select COUNT(ET_CD)
> > from TBL_SHA
> > WHERE MS_CD = '009'
> > AND ETRYS = '01'
>
> QUERY PLAN
> Limit  (cost=2419643.47..2419643.48 rows=1 width=8) (actual 
> time=128667.439..128668.250 rows=1 loops=1)
>   ->  Finalize Aggregate  (cost=2419643.47..2419643.48 rows=1 width=8) 
> (actual time=128667.437..128668.246 rows=1 loops=1)
> ->  Gather  (cost=2419643.25..2419643.46 rows=2 width=8) (actual 
> time=128664.108..128668.233 rows=3 loops=1)
>   Workers Planned: 2
>   Workers Launched: 2
>   ->  Partial Aggregate  (cost=2418643.25..2418643.26 rows=1 
> width=8) (actual time=128655.256..128655.258 rows=1 loops=3)
> ->  Parallel Seq Scan on TBL_SHA  (cost=0.00..2415548.85 
> rows=1237762 width=9) (actual time=75357.455..128531.615 rows=107 loops=3)
>   Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = 
> '01'::bpchar))
>   Rows Removed by Filter: 11833442
> Planning Time: 0.118 ms
> Execution Time: 128668.290 ms
>
> The TBL_SHA table has another index, as shown below.
>
> CREATE INDEX index_search_02 ON mdb.TBL_SHA USING btree (ET_CD, ETRYS)
> CREATE INDEX index_search_03 ON mdb.TBL_SHA USING btree (MS_CD, ET_DAY, 
> BK_CD, FR_CD, RM_CD)
>
> When I take the following query statement, the result is returned quickly.
> Why does index_search_01 always not work?

The method to access the table is decided by the query planner based
on costs.  The costs are driven off the row estimates which are driven
from table statistics.  If the table statistics, for example say that
99% of rows have MS_CD = '009', then scanning an index on MS_CD is
unlikely to be a good idea as that would likely require random access
to the heap.  It's likely better to perform a table scan and then just
filter out the 1% of rows that don't match.

Try executing the query after having done:

SET enable_seqscan TO off;

What plan does it use now?

Is that plan faster or slower than the seq scan plan?

David




Re: How to improve the performance of my SQL query?

2023-07-24 Thread Laurenz Albe
On Tue, 2023-07-25 at 11:11 +0800, gzh wrote:
> > I asked for EXPLAIN (ANALYZE, BUFFERS, SETTINGS) ...
> > Or do you have a version that is too old for SETTINGS?
> Sorry. Please refer to the following execution plan.
> 
> [...]
> Settings: effective_cache_size = '1886088kB', jit = 'off', search_path = 
> '"$user", mdb'
> [...]

No unusual settings.  So that theory is dead.

> > One other idea: check if the index is INVALID (this will
> > be visible if you run "\d tablenane" in "psql").
> > Invalid indexes won't be used.

Did you check that?

After that, I am out of ideas.

Yours,
Laurenz Albe




Re: How to improve the performance of my SQL query?

2023-07-24 Thread gzh
Thank you for your reply.


>I think the whole query can just:
>select COUNT(ET_CD)
>from TBL_SHA
>WHERE TBL_SHA.MS_CD = 'MLD009' AND TBL_SHA.ETRYS = '0001';
This is related to the business logic.


>if many duplicates rows returned, then there is no point of evaluate
>something like {1 in (1,1,1,1,1,1,)}
Because the primary key of the 'tbl_inf' table only consists of 'ms_cd' and 
'ry_cd' columns, the subquery will not return duplicate rows.

















At 2023-07-24 22:42:01, "jian he"  wrote:
>On Mon, Jul 24, 2023 at 5:54 PM gzh  wrote:
>>
>> >Did you change any parameters that have an impact on query planning?
>>
>> >You can see that in the output of EXPLAIN (ANALYZE, BUFFERS, SETTINGS).
>>
>> I added some parameters and re-executed the Execution Plan.
>>
>> Except for the index not taking effect, I still don't know the reason why 
>> the index is not working.
>>
>> Is it because there is too much data that meets the conditions?
>>
>>
>> EXPLAIN (ANALYZE ON, VERBOSE ON, COSTS ON, BUFFERS ON)
>>
>> select COUNT(ET_CD)
>>
>> from TBL_SHA
>>
>> WHERE TBL_SHA.MS_CD = 'MLD009'
>>
>> and TBL_SHA.ETRYS in
>>
>>(select TBL_INF.RY_CD
>>
>> from TBL_INF
>>
>> WHERE TBL_INF.MS_CD = 'MLD009'
>>
>>AND TBL_INF.RY_CD = '0001'
>>
>>)
>>
>>
>> - Execution Plan -
>>
>> Limit  (cost=2728633.22..2728633.23 rows=1 width=8) (actual 
>> time=128691.521..128717.677 rows=1 loops=1)
>>
>>   Output: (count(tbl_sha.et_cd))
>>
>>   Buffers: shared hit=58948 read=2112758
>>
>>   I/O Timings: read=357249.120
>>
>>   ->  Aggregate  (cost=2728633.22..2728633.23 rows=1 width=8) (actual 
>> time=128691.519..128717.674 rows=1 loops=1)
>>
>> Output: count(tbl_sha.et_cd)
>>
>> Buffers: shared hit=58948 read=2112758
>>
>> I/O Timings: read=357249.120
>>
>> ->  Nested Loop  (cost=1000.29..2722556.76 rows=2430587 width=9) 
>> (actual time=2.364..128350.279 rows=2613500 loops=1)
>>
>>   Output: tbl_sha.et_cd
>>
>>   Buffers: shared hit=58948 read=2112758
>>
>>   I/O Timings: read=357249.120
>>
>>   ->  Index Only Scan using tbl_inf_pkc on mtpdb.tbl_inf  
>> (cost=0.29..8.31 rows=1 width=9) (actual time=0.046..0.051 rows=1 loops=1)
>>
>> Output: tbl_inf.ms_cd, tbl_inf.ry_cd
>>
>> Index Cond: ((tbl_inf.ms_cd = 'MLD009'::bpchar) AND 
>> (tbl_inf.ry_cd = '0001'::bpchar))
>>
>> Heap Fetches: 1
>>
>> Buffers: shared hit=4
>>
>>   ->  Gather  (cost=1000.00..2698242.58 rows=2430587 width=18) 
>> (actual time=2.315..127773.087 rows=2613500 loops=1)
>>
>> Output: tbl_sha.et_cd, tbl_sha.etrys
>>
>> Workers Planned: 2
>>
>> Workers Launched: 2
>>
>> Buffers: shared hit=58944 read=2112758
>>
>> I/O Timings: read=357249.120
>>
>> ->  Parallel Seq Scan on mtpdb.tbl_sha  
>> (cost=0.00..2454183.88 rows=1012745 width=18) (actual 
>> time=952.728..127583.089 rows=871167 loops=3)
>>
>>   Output: tbl_sha.et_cd, tbl_sha.etrys
>>
>>   Filter: ((tbl_sha.ms_cd = 'MLD009'::bpchar) AND 
>> (tbl_sha.etrys = '0001'::bpchar))
>>
>>   Rows Removed by Filter: 14062278
>>
>>   Buffers: shared hit=58944 read=2112758
>>
>>   I/O Timings: read=357249.120
>>
>>   Worker 0:  actual time=1432.292..127762.181 
>> rows=988036 loops=1
>>
>> Buffers: shared hit=17875 read=706862
>>
>> I/O Timings: read=119193.744
>>
>>   Worker 1:  actual time=1425.878..127786.777 
>> rows=992381 loops=1
>>
>> Buffers: shared hit=19813 read=706359
>>
>> I/O Timings: read=119386.899
>>
>> Planning:
>>
>>   Buffers: shared hit=42
>>
>> Planning Time: 1.024 ms
>>
>> Execution Time: 128717.731 ms
>>
>>
>
>I think the whole query can just:
>select COUNT(ET_CD)
>from TBL_SHA
>WHERE TBL_SHA.MS_CD = 'MLD009' AND TBL_SHA.ETRYS = '0001';
>
>> and TBL_SHA.ETRYS in
>>(select TBL_INF.RY_CD
>> from TBL_INF
>> WHERE TBL_INF.MS_CD = 'MLD009'
>>AND TBL_INF.RY_CD = '0001'
>>)
>
>if subquery after IN clause part, no rows returned then the whole
>query would return zero row.
>if many duplicates rows returned, then there is no point of evaluate
>something like {1 in (1,1,1,1,1,1,)}


Re: How to improve the performance of my SQL query?

2023-07-24 Thread gzh



>I asked for EXPLAIN (ANALYZE, BUFFERS, SETTINGS) ...

>Or do you have a version that is too old for SETTINGS?

Sorry. Please refer to the following execution plan.




EXPLAIN (ANALYZE, BUFFERS, SETTINGS) 

select COUNT(ET_CD)

from TBL_SHA

WHERE TBL_SHA.MS_CD = 'MLD009'

and TBL_SHA.ETRYS in

   (select TBL_INF.RY_CD

from TBL_INF

WHERE TBL_INF.MS_CD = 'MLD009'

   AND TBL_INF.RY_CD = '0001'

   )




- Execution Plan -

  Buffers: shared hit=13 read=2171693

  I/O Timings: read=365863.877

  ->  Aggregate  (cost=2728633.22..2728633.23 rows=1 width=8) (actual 
time=131595.624..131599.529 rows=1 loops=1)

Buffers: shared hit=13 read=2171693

I/O Timings: read=365863.877

->  Nested Loop  (cost=1000.29..2722556.76 rows=2430587 width=9) 
(actual time=2.341..131256.445 rows=2513500 loops=1)

  Buffers: shared hit=13 read=2171693

  I/O Timings: read=365863.877

  ->  Index Only Scan using tbl_inf_pkc on tbl_inf  
(cost=0.29..8.31 rows=1 width=9) (actual time=1.471..1.476 rows=1 loops=1)

Index Cond: ((ms_cd = 'MLD009'::bpchar) AND (ry_cd = 
'0001'::bpchar))

Heap Fetches: 1

Buffers: shared hit=2 read=2

I/O Timings: read=1.412

  ->  Gather  (cost=1000.00..2698242.58 rows=2430587 width=18) 
(actual time=0.866..130696.440 rows=2513500 loops=1)

Workers Planned: 2

Workers Launched: 2

Buffers: shared hit=11 read=2171691

I/O Timings: read=365862.464

->  Parallel Seq Scan on tbl_sha  (cost=0.00..2454183.88 
rows=1012745 width=18) (actual time=0.215..130476.981 rows=837833 loops=3)

  Filter: ((ms_cd = 'MLD009'::bpchar) AND (etrys = 
'0001'::bpchar))

  Rows Removed by Filter: 13728945

  Buffers: shared hit=11 read=2171691

  I/O Timings: read=365862.464

Settings: effective_cache_size = '1886088kB', jit = 'off', search_path = 
'"$user", mdb'

Planning:

  Buffers: shared hit=167 read=7

  I/O Timings: read=2.735

Planning Time: 3.733 ms

Execution Time: 131599.594 ms











At 2023-07-24 23:58:50, "Laurenz Albe"  wrote:
>On Mon, 2023-07-24 at 17:54 +0800, gzh wrote:
>> EXPLAIN (ANALYZE ON, VERBOSE ON, COSTS ON, BUFFERS ON) 
>
>I asked for EXPLAIN (ANALYZE, BUFFERS, SETTINGS) ...
>Or do you have a version that is too old for SETTINGS?
>
>One other idea: check if the index is INVALID (this will
>be visible if you run "\d tablenane" in "psql").
>Invalid indexes won't be used.
>
>Yours,
>Laurenz Albe


Re: How to improve the performance of my SQL query?

2023-07-24 Thread Laurenz Albe
On Mon, 2023-07-24 at 17:54 +0800, gzh wrote:
> EXPLAIN (ANALYZE ON, VERBOSE ON, COSTS ON, BUFFERS ON) 

I asked for EXPLAIN (ANALYZE, BUFFERS, SETTINGS) ...
Or do you have a version that is too old for SETTINGS?

One other idea: check if the index is INVALID (this will
be visible if you run "\d tablenane" in "psql").
Invalid indexes won't be used.

Yours,
Laurenz Albe




Re: How to improve the performance of my SQL query?

2023-07-24 Thread jian he
On Mon, Jul 24, 2023 at 5:54 PM gzh  wrote:
>
> >Did you change any parameters that have an impact on query planning?
>
> >You can see that in the output of EXPLAIN (ANALYZE, BUFFERS, SETTINGS).
>
> I added some parameters and re-executed the Execution Plan.
>
> Except for the index not taking effect, I still don't know the reason why the 
> index is not working.
>
> Is it because there is too much data that meets the conditions?
>
>
> EXPLAIN (ANALYZE ON, VERBOSE ON, COSTS ON, BUFFERS ON)
>
> select COUNT(ET_CD)
>
> from TBL_SHA
>
> WHERE TBL_SHA.MS_CD = 'MLD009'
>
> and TBL_SHA.ETRYS in
>
>(select TBL_INF.RY_CD
>
> from TBL_INF
>
> WHERE TBL_INF.MS_CD = 'MLD009'
>
>AND TBL_INF.RY_CD = '0001'
>
>)
>
>
> - Execution Plan -
>
> Limit  (cost=2728633.22..2728633.23 rows=1 width=8) (actual 
> time=128691.521..128717.677 rows=1 loops=1)
>
>   Output: (count(tbl_sha.et_cd))
>
>   Buffers: shared hit=58948 read=2112758
>
>   I/O Timings: read=357249.120
>
>   ->  Aggregate  (cost=2728633.22..2728633.23 rows=1 width=8) (actual 
> time=128691.519..128717.674 rows=1 loops=1)
>
> Output: count(tbl_sha.et_cd)
>
> Buffers: shared hit=58948 read=2112758
>
> I/O Timings: read=357249.120
>
> ->  Nested Loop  (cost=1000.29..2722556.76 rows=2430587 width=9) 
> (actual time=2.364..128350.279 rows=2613500 loops=1)
>
>   Output: tbl_sha.et_cd
>
>   Buffers: shared hit=58948 read=2112758
>
>   I/O Timings: read=357249.120
>
>   ->  Index Only Scan using tbl_inf_pkc on mtpdb.tbl_inf  
> (cost=0.29..8.31 rows=1 width=9) (actual time=0.046..0.051 rows=1 loops=1)
>
> Output: tbl_inf.ms_cd, tbl_inf.ry_cd
>
> Index Cond: ((tbl_inf.ms_cd = 'MLD009'::bpchar) AND 
> (tbl_inf.ry_cd = '0001'::bpchar))
>
> Heap Fetches: 1
>
> Buffers: shared hit=4
>
>   ->  Gather  (cost=1000.00..2698242.58 rows=2430587 width=18) 
> (actual time=2.315..127773.087 rows=2613500 loops=1)
>
> Output: tbl_sha.et_cd, tbl_sha.etrys
>
> Workers Planned: 2
>
> Workers Launched: 2
>
> Buffers: shared hit=58944 read=2112758
>
> I/O Timings: read=357249.120
>
> ->  Parallel Seq Scan on mtpdb.tbl_sha  
> (cost=0.00..2454183.88 rows=1012745 width=18) (actual 
> time=952.728..127583.089 rows=871167 loops=3)
>
>   Output: tbl_sha.et_cd, tbl_sha.etrys
>
>   Filter: ((tbl_sha.ms_cd = 'MLD009'::bpchar) AND 
> (tbl_sha.etrys = '0001'::bpchar))
>
>   Rows Removed by Filter: 14062278
>
>   Buffers: shared hit=58944 read=2112758
>
>   I/O Timings: read=357249.120
>
>   Worker 0:  actual time=1432.292..127762.181 
> rows=988036 loops=1
>
> Buffers: shared hit=17875 read=706862
>
> I/O Timings: read=119193.744
>
>   Worker 1:  actual time=1425.878..127786.777 
> rows=992381 loops=1
>
> Buffers: shared hit=19813 read=706359
>
> I/O Timings: read=119386.899
>
> Planning:
>
>   Buffers: shared hit=42
>
> Planning Time: 1.024 ms
>
> Execution Time: 128717.731 ms
>
>

I think the whole query can just:
select COUNT(ET_CD)
from TBL_SHA
WHERE TBL_SHA.MS_CD = 'MLD009' AND TBL_SHA.ETRYS = '0001';

> and TBL_SHA.ETRYS in
>(select TBL_INF.RY_CD
> from TBL_INF
> WHERE TBL_INF.MS_CD = 'MLD009'
>AND TBL_INF.RY_CD = '0001'
>)

if subquery after IN clause part, no rows returned then the whole
query would return zero row.
if many duplicates rows returned, then there is no point of evaluate
something like {1 in (1,1,1,1,1,1,)}




Re: How to improve the performance of my SQL query?

2023-07-24 Thread gzh
>Did you change any parameters that have an impact on query planning?

>You can see that in the output of EXPLAIN (ANALYZE, BUFFERS, SETTINGS).

I added some parameters and re-executed the Execution Plan.

Except for the index not taking effect, I still don't know the reason why the 
index is not working. 

Is it because there is too much data that meets the conditions?




EXPLAIN (ANALYZE ON, VERBOSE ON, COSTS ON, BUFFERS ON) 

select COUNT(ET_CD)

from TBL_SHA

WHERE TBL_SHA.MS_CD = 'MLD009'

and TBL_SHA.ETRYS in

   (select TBL_INF.RY_CD

from TBL_INF

WHERE TBL_INF.MS_CD = 'MLD009'

   AND TBL_INF.RY_CD = '0001'

   )




- Execution Plan -

Limit  (cost=2728633.22..2728633.23 rows=1 width=8) (actual 
time=128691.521..128717.677 rows=1 loops=1)

  Output: (count(tbl_sha.et_cd))

  Buffers: shared hit=58948 read=2112758

  I/O Timings: read=357249.120

  ->  Aggregate  (cost=2728633.22..2728633.23 rows=1 width=8) (actual 
time=128691.519..128717.674 rows=1 loops=1)

Output: count(tbl_sha.et_cd)

Buffers: shared hit=58948 read=2112758

I/O Timings: read=357249.120

->  Nested Loop  (cost=1000.29..2722556.76 rows=2430587 width=9) 
(actual time=2.364..128350.279 rows=2613500 loops=1)

  Output: tbl_sha.et_cd

  Buffers: shared hit=58948 read=2112758

  I/O Timings: read=357249.120

  ->  Index Only Scan using tbl_inf_pkc on mtpdb.tbl_inf  
(cost=0.29..8.31 rows=1 width=9) (actual time=0.046..0.051 rows=1 loops=1)

Output: tbl_inf.ms_cd, tbl_inf.ry_cd

Index Cond: ((tbl_inf.ms_cd = 'MLD009'::bpchar) AND 
(tbl_inf.ry_cd = '0001'::bpchar))

Heap Fetches: 1

Buffers: shared hit=4

  ->  Gather  (cost=1000.00..2698242.58 rows=2430587 width=18) 
(actual time=2.315..127773.087 rows=2613500 loops=1)

Output: tbl_sha.et_cd, tbl_sha.etrys

Workers Planned: 2

Workers Launched: 2

Buffers: shared hit=58944 read=2112758

I/O Timings: read=357249.120

->  Parallel Seq Scan on mtpdb.tbl_sha  
(cost=0.00..2454183.88 rows=1012745 width=18) (actual time=952.728..127583.089 
rows=871167 loops=3)

  Output: tbl_sha.et_cd, tbl_sha.etrys

  Filter: ((tbl_sha.ms_cd = 'MLD009'::bpchar) AND 
(tbl_sha.etrys = '0001'::bpchar))

  Rows Removed by Filter: 14062278

  Buffers: shared hit=58944 read=2112758

  I/O Timings: read=357249.120

  Worker 0:  actual time=1432.292..127762.181 
rows=988036 loops=1

Buffers: shared hit=17875 read=706862

I/O Timings: read=119193.744

  Worker 1:  actual time=1425.878..127786.777 
rows=992381 loops=1

Buffers: shared hit=19813 read=706359

I/O Timings: read=119386.899

Planning:

  Buffers: shared hit=42

Planning Time: 1.024 ms

Execution Time: 128717.731 ms






















At 2023-07-24 13:43:46, "Laurenz Albe"  wrote:
>On Fri, 2023-07-21 at 09:43 +0800, gzh wrote:
>> The definitions of the columns used in SQL are as follows.
>> 
>> TBL_SHA
>> 
>> ms_cd character(6) NOT NULL   -- PRIMARY KEY
>> et_cd character(8)
>> etrys character(8)
>> 
>> TBL_INF
>> 
>> ms_cd character(6) NOT NULL   -- PRIMARY KEY
>> ry_cd character(8) NOT NULL   -- PRIMARY KEY
>> 
>> I made some modifications to the data, and I realized that I should not 
>> change the length of the data. 
>> The actual data and its corresponding execution plan are shown below.
>> 
>> explain analyze
>> select COUNT(ET_CD)
>> from TBL_SHA
>> WHERE TBL_SHA.MS_CD = 'MLD009'
>> and TBL_SHA.ETRYS in
>>(select TBL_INF.RY_CD
>> from TBL_INF
>> WHERE TBL_INF.MS_CD = 'MLD009'
>>AND TBL_INF.RY_CD = '0001'
>>)
>> - Execution Plan -
>> Limit  (cost=2738709.57..2738709.58 rows=1 width=8) (actual 
>> time=124168.769..124168.771 rows=1 loops=1)
>>   ->  Aggregate  (cost=2738709.57..2738709.58 rows=1 width=8) (actual 
>> time=124168.767..124168.769 rows=1 loops=1)
>> ->  Nested Loop  (cost=0.29..2730702.63 rows=3202774 width=9) 
>> (actual time=97264.166..123920.769 rows=320 loops=1)
>>   ->  Index Only Scan using TBL_INF_pkc on TBL_INF  
>> (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)
>> Index Cond: ((MS_CD = 'MLD009'::bpchar) AND (RY_CD = 
>> '0001'::bpchar))
>> Heap Fetches: 1
>>   ->  Seq Scan on TBL_SHA  (cost=0.00..2698666.58 rows=3202774 
>> width=18) (actual time=97264.138..123554.792 rows=320 loops=1)
>> Filter: ((MS_CD = 'MLD009'::bpchar) AND (ETRYS = 
>> 

Re: How to improve the performance of my SQL query?

2023-07-23 Thread Laurenz Albe
On Fri, 2023-07-21 at 09:43 +0800, gzh wrote:
> The definitions of the columns used in SQL are as follows.
> 
> TBL_SHA
> 
> ms_cd character(6) NOT NULL       -- PRIMARY KEY
> et_cd character(8)
> etrys character(8)
> 
> TBL_INF
> 
> ms_cd character(6) NOT NULL       -- PRIMARY KEY
> ry_cd character(8) NOT NULL       -- PRIMARY KEY
> 
> I made some modifications to the data, and I realized that I should not 
> change the length of the data. 
> The actual data and its corresponding execution plan are shown below.
> 
> explain analyze
> select COUNT(ET_CD)
> from TBL_SHA
> WHERE TBL_SHA.MS_CD = 'MLD009'
> and TBL_SHA.ETRYS in
>    (select TBL_INF.RY_CD
>     from TBL_INF
>     WHERE TBL_INF.MS_CD = 'MLD009'
>    AND TBL_INF.RY_CD = '0001'
>    )
> - Execution Plan -
> Limit  (cost=2738709.57..2738709.58 rows=1 width=8) (actual 
> time=124168.769..124168.771 rows=1 loops=1)
>   ->  Aggregate  (cost=2738709.57..2738709.58 rows=1 width=8) (actual 
> time=124168.767..124168.769 rows=1 loops=1)
>         ->  Nested Loop  (cost=0.29..2730702.63 rows=3202774 width=9) (actual 
> time=97264.166..123920.769 rows=320 loops=1)
>               ->  Index Only Scan using TBL_INF_pkc on TBL_INF  
> (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)
>                     Index Cond: ((MS_CD = 'MLD009'::bpchar) AND (RY_CD = 
> '0001'::bpchar))
>                     Heap Fetches: 1
>               ->  Seq Scan on TBL_SHA  (cost=0.00..2698666.58 rows=3202774 
> width=18) (actual time=97264.138..123554.792 rows=320 loops=1)
>                     Filter: ((MS_CD = 'MLD009'::bpchar) AND (ETRYS = 
> '0001'::bpchar))
>                     Rows Removed by Filter: 32000325
> Planning Time: 0.162 ms
> Execution Time: 124168.838 ms

Thanks.  That should definitely use a b-tree index defined on (ms_cd, etrsys).

Did you change any parameters that have an impact on query planning?
You can see that in the output of EXPLAIN (ANALYZE, BUFFERS, SETTINGS).

Yours,
Laurenz Albe




Re: How to improve the performance of my SQL query?

2023-07-20 Thread gzh
The definitions of the columns used in SQL are as follows.




TBL_SHA




ms_cd character(6) NOT NULL   -- PRIMARY KEY

et_cd character(8)

etrys character(8)




TBL_INF




ms_cd character(6) NOT NULL   -- PRIMARY KEY

ry_cd character(8) NOT NULL   -- PRIMARY KEY




I made some modifications to the data, and I realized that I should not change 
the length of the data. 

The actual data and its corresponding execution plan are shown below.




explain analyze

select COUNT(ET_CD)

from TBL_SHA

WHERE TBL_SHA.MS_CD = 'MLD009'

and TBL_SHA.ETRYS in

   (select TBL_INF.RY_CD

from TBL_INF

WHERE TBL_INF.MS_CD = 'MLD009'

   AND TBL_INF.RY_CD = '0001'

   )

- Execution Plan -

Limit  (cost=2738709.57..2738709.58 rows=1 width=8) (actual 
time=124168.769..124168.771 rows=1 loops=1)

  ->  Aggregate  (cost=2738709.57..2738709.58 rows=1 width=8) (actual 
time=124168.767..124168.769 rows=1 loops=1)

->  Nested Loop  (cost=0.29..2730702.63 rows=3202774 width=9) (actual 
time=97264.166..123920.769 rows=320 loops=1)

  ->  Index Only Scan using TBL_INF_pkc on TBL_INF  
(cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)

Index Cond: ((MS_CD = 'MLD009'::bpchar) AND (RY_CD = 
'0001'::bpchar))

Heap Fetches: 1

  ->  Seq Scan on TBL_SHA  (cost=0.00..2698666.58 rows=3202774 
width=18) (actual time=97264.138..123554.792 rows=320 loops=1)

Filter: ((MS_CD = 'MLD009'::bpchar) AND (ETRYS = 
'0001'::bpchar))

Rows Removed by Filter: 32000325

Planning Time: 0.162 ms

Execution Time: 124168.838 ms




















At 2023-07-20 22:48:19, "Laurenz Albe"  wrote:
>On Thu, 2023-07-20 at 22:14 +0800, gzh wrote:
>> The information I provided is incorrect, please see my previous reply.
>
>My question remains: I would like to see the table definitions.
>
>Also, did you ANALYZE the tables?
>
>Yours,
>Laurenz Albe


Re: Re: How to improve the performance of my SQL query?

2023-07-20 Thread jian he
On Thu, Jul 20, 2023 at 7:36 PM gzh  wrote:
>
>
> Thank you very much for taking the time to reply to my question.
>
>
> Sorry, I provided incorrect information.
>
> The index also does not work in the following query statement.
>
>
> > select COUNT(ET_CD)
>
> > from TBL_SHA
>
> > WHERE MS_CD = '009'
>
> > AND ETRYS = '01'
>
>
> QUERY PLAN
>
> Limit  (cost=2419643.47..2419643.48 rows=1 width=8) (actual 
> time=128667.439..128668.250 rows=1 loops=1)
>
>   ->  Finalize Aggregate  (cost=2419643.47..2419643.48 rows=1 width=8) 
> (actual time=128667.437..128668.246 rows=1 loops=1)
>
> ->  Gather  (cost=2419643.25..2419643.46 rows=2 width=8) (actual 
> time=128664.108..128668.233 rows=3 loops=1)
>
>   Workers Planned: 2
>
>   Workers Launched: 2
>
>   ->  Partial Aggregate  (cost=2418643.25..2418643.26 rows=1 
> width=8) (actual time=128655.256..128655.258 rows=1 loops=3)
>
> ->  Parallel Seq Scan on TBL_SHA  (cost=0.00..2415548.85 
> rows=1237762 width=9) (actual time=75357.455..128531.615 rows=107 loops=3)
>
>   Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = 
> '01'::bpchar))
>
>   Rows Removed by Filter: 11833442
>
> Planning Time: 0.118 ms
>
> Execution Time: 128668.290 ms
>
>
> The TBL_SHA table has another index, as shown below.
>
>
> CREATE INDEX index_search_02 ON mdb.TBL_SHA USING btree (ET_CD, ETRYS)
>
> CREATE INDEX index_search_03 ON mdb.TBL_SHA USING btree (MS_CD, ET_DAY, 
> BK_CD, FR_CD, RM_CD)
>

>   Rows Removed by Filter: 11833442
select (38700325 - 11833442) /38700325.0;
is 0.69 approx.
So I think it says around 69%  of rows satisfy the query condition.

but I am not sure in the following 2 cases, whether the actual rows
are noisy or not. I can not find the doc explaining it.
> Partial Aggregate  (actual time=128655.256..128655.258 rows=1 loops=3)
> Finalize Aggregate (actual time=128667.437..128668.246 rows=1 loops=1)




Re: How to improve the performance of my SQL query?

2023-07-20 Thread Laurenz Albe
On Thu, 2023-07-20 at 22:14 +0800, gzh wrote:
> The information I provided is incorrect, please see my previous reply.

My question remains: I would like to see the table definitions.

Also, did you ANALYZE the tables?

Yours,
Laurenz Albe




Re: How to improve the performance of my SQL query?

2023-07-20 Thread gzh
Thank you for your reply.
The information I provided is incorrect, please see my previous reply.


>What I cannot see is if the columns are defined as "character" or whether you 
>bind
>the parameters as "character".  Can you show us the table definition of 
>"TBL_SHA"

>and "TBL_INF"?
For information security reasons, I can't provide the table definition, these 
columns are defined as "character".
















At 2023-07-20 19:58:59, "Laurenz Albe"  wrote:
>On Thu, 2023-07-20 at 15:09 +0800, gzh wrote:
>> I'm running into some performance issues with my SQL query.
>> The following SQL query is taking a long time to execute.
>> 
>> explain analyze
>> select COUNT(ET_CD)
>> from TBL_SHA
>> WHERE TBL_SHA.MS_CD = '009'
>> and TBL_SHA.ETRYS in
>>(select TBL_INF.RY_CD
>> from TBL_INF
>> WHERE TBL_INF.MS_CD = '009'
>>AND TBL_INF.RY_CD = '01'
>>)
>> - Execution Plan -
>> Limit  (cost=2738709.57..2738709.58 rows=1 width=8) (actual 
>> time=124168.769..124168.771 rows=1 loops=1)
>>   ->  Aggregate  (cost=2738709.57..2738709.58 rows=1 width=8) (actual 
>> time=124168.767..124168.769 rows=1 loops=1)
>> ->  Nested Loop  (cost=0.29..2730702.63 rows=3202774 width=9) 
>> (actual time=97264.166..123920.769 rows=320 loops=1)
>>   ->  Index Only Scan using TBL_INF_pkc on TBL_INF  
>> (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)
>> Index Cond: ((MS_CD = '009'::bpchar) AND (RY_CD = 
>> '01'::bpchar))
>> Heap Fetches: 1
>>   ->  Seq Scan on TBL_SHA  (cost=0.00..2698666.58 rows=3202774 
>> width=18) (actual time=97264.138..123554.792 rows=320 loops=1)
>> Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = 
>> '01'::bpchar))
>> Rows Removed by Filter: 32000325
>> Planning Time: 0.162 ms
>> Execution Time: 124168.838 ms
>> 
>> 
>> The index is defined as follows.
>> 
>> CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS);
>
>Actual rows = 320, rows removed by filter is ten times as much.
>It should use an index.
>
>> When I take the following sql statement, the index works fine and the query 
>> is fast.
>> 
>> select COUNT(ET_CD)
>> from TBL_SHA
>> WHERE MS_CD = '009'
>> AND ETRYS = '01'
>> 
>> The amount of data in the table is as follows.
>> TBL_SHA38700325
>> TBL_INF35546
>
>This looks very much like it is a problem with the data types.
>I see that you are using "character", which you shouldn't do.
>
>What I cannot see is if the columns are defined as "character" or whether you 
>bind
>the parameters as "character".  Can you show us the table definition of 
>"TBL_SHA"
>and "TBL_INF"?
>
>Yours,
>Laurenz Albe


Re: How to improve the performance of my SQL query?

2023-07-20 Thread Laurenz Albe
On Thu, 2023-07-20 at 15:09 +0800, gzh wrote:
> I'm running into some performance issues with my SQL query.
> The following SQL query is taking a long time to execute.
> 
> explain analyze
> select COUNT(ET_CD)
> from TBL_SHA
> WHERE TBL_SHA.MS_CD = '009'
> and TBL_SHA.ETRYS in
>    (select TBL_INF.RY_CD
>     from TBL_INF
>     WHERE TBL_INF.MS_CD = '009'
>    AND TBL_INF.RY_CD = '01'
>    )
> - Execution Plan -
> Limit  (cost=2738709.57..2738709.58 rows=1 width=8) (actual 
> time=124168.769..124168.771 rows=1 loops=1)
>   ->  Aggregate  (cost=2738709.57..2738709.58 rows=1 width=8) (actual 
> time=124168.767..124168.769 rows=1 loops=1)
>         ->  Nested Loop  (cost=0.29..2730702.63 rows=3202774 width=9) (actual 
> time=97264.166..123920.769 rows=320 loops=1)
>               ->  Index Only Scan using TBL_INF_pkc on TBL_INF  
> (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)
>                     Index Cond: ((MS_CD = '009'::bpchar) AND (RY_CD = 
> '01'::bpchar))
>                     Heap Fetches: 1
>               ->  Seq Scan on TBL_SHA  (cost=0.00..2698666.58 rows=3202774 
> width=18) (actual time=97264.138..123554.792 rows=320 loops=1)
>                     Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = 
> '01'::bpchar))
>                     Rows Removed by Filter: 32000325
> Planning Time: 0.162 ms
> Execution Time: 124168.838 ms
> 
> 
> The index is defined as follows.
> 
> CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS);

Actual rows = 320, rows removed by filter is ten times as much.
It should use an index.

> When I take the following sql statement, the index works fine and the query 
> is fast.
> 
> select COUNT(ET_CD)
> from TBL_SHA
> WHERE MS_CD = '009'
> AND ETRYS = '01'
> 
> The amount of data in the table is as follows.
> TBL_SHA    38700325
> TBL_INF    35546

This looks very much like it is a problem with the data types.
I see that you are using "character", which you shouldn't do.

What I cannot see is if the columns are defined as "character" or whether you 
bind
the parameters as "character".  Can you show us the table definition of 
"TBL_SHA"
and "TBL_INF"?

Yours,
Laurenz Albe




Re:Re: How to improve the performance of my SQL query?

2023-07-20 Thread gzh



Thank you very much for taking the time to reply to my question. 




Sorry, I provided incorrect information.

The index also does not work in the following query statement.




> select COUNT(ET_CD)

> from TBL_SHA

> WHERE MS_CD = '009'

> AND ETRYS = '01'




QUERY PLAN

Limit  (cost=2419643.47..2419643.48 rows=1 width=8) (actual 
time=128667.439..128668.250 rows=1 loops=1)

  ->  Finalize Aggregate  (cost=2419643.47..2419643.48 rows=1 width=8) (actual 
time=128667.437..128668.246 rows=1 loops=1)

->  Gather  (cost=2419643.25..2419643.46 rows=2 width=8) (actual 
time=128664.108..128668.233 rows=3 loops=1)

  Workers Planned: 2

  Workers Launched: 2

  ->  Partial Aggregate  (cost=2418643.25..2418643.26 rows=1 
width=8) (actual time=128655.256..128655.258 rows=1 loops=3)

->  Parallel Seq Scan on TBL_SHA  (cost=0.00..2415548.85 
rows=1237762 width=9) (actual time=75357.455..128531.615 rows=107 loops=3)

  Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = 
'01'::bpchar))

  Rows Removed by Filter: 11833442

Planning Time: 0.118 ms

Execution Time: 128668.290 ms




The TBL_SHA table has another index, as shown below.




CREATE INDEX index_search_02 ON mdb.TBL_SHA USING btree (ET_CD, ETRYS)

CREATE INDEX index_search_03 ON mdb.TBL_SHA USING btree (MS_CD, ET_DAY, BK_CD, 
FR_CD, RM_CD)




When I take the following query statement, the result is returned quickly.

Why does index_search_01 always not work?




explain analyze

select TO_CHAR(MAX(TBL_SHA.ET_DAY),'MMDD') AS ET_DAY

from TBL_SHA

WHERE MS_CD = '008'

AND ET_CD = '03'




QUERY PLAN

Limit  (cost=4.11..4.13 rows=1 width=32) (actual time=0.043..0.044 rows=1 
loops=1)

  InitPlan 1 (returns $0)

->  Limit  (cost=0.56..4.11 rows=1 width=8) (actual time=0.032..0.033 
rows=1 loops=1)

  ->  Index Scan Backward using index_search_01 on TBL_SHA  
(cost=0.56..10836962.40 rows=3054052 width=8) (actual time=0.032..0.032 rows=1 
loops=1)

Index Cond: ((MS_CD = '008'::bpchar) AND (ent_day IS NOT NULL))

Filter: (ET_CD = '03'::bpchar)

  ->  Result  (cost=4.11..4.13 rows=1 width=32) (actual time=0.042..0.042 
rows=1 loops=1)

Planning Time: 0.228 ms

Execution Time: 0.070 ms

















At 2023-07-20 16:07:15, "Erik Wienhold"  wrote:
>> On 20/07/2023 09:09 CEST gzh  wrote:
>>
>> I'm running into some performance issues with my SQL query.
>> The following SQL query is taking a long time to execute.
>>
>> explain analyze
>> select COUNT(ET_CD)
>> from TBL_SHA
>> WHERE TBL_SHA.MS_CD = '009'
>> and TBL_SHA.ETRYS in
>> (select TBL_INF.RY_CD
>> from TBL_INF
>> WHERE TBL_INF.MS_CD = '009'
>> AND TBL_INF.RY_CD = '01'
>> )
>> - Execution Plan -
>> Limit (cost=2738709.57..2738709.58 rows=1 width=8) (actual 
>> time=124168.769..124168.771 rows=1 loops=1)
>> -> Aggregate (cost=2738709.57..2738709.58 rows=1 width=8) (actual 
>> time=124168.767..124168.769 rows=1 loops=1)
>> -> Nested Loop (cost=0.29..2730702.63 rows=3202774 width=9) (actual 
>> time=97264.166..123920.769 rows=320 loops=1)
>> -> Index Only Scan using TBL_INF_pkc on TBL_INF (cost=0.29..8.31 rows=1 
>> width=9) (actual time=0.025..0.030 rows=1 loops=1)
>> Index Cond: ((MS_CD = '009'::bpchar) AND (RY_CD = '01'::bpchar))
>> Heap Fetches: 1
>> -> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) (actual 
>> time=97264.138..123554.792 rows=320 loops=1)
>> Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '01'::bpchar))
>> Rows Removed by Filter: 32000325
>> Planning Time: 0.162 ms
>> Execution Time: 124168.838 ms
>> 
>>
>> The index is defined as follows.
>>
>> CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS);
>>
>>
>> When I take the following sql statement, the index works fine and the query
>> is fast.
>>
>> select COUNT(ET_CD)
>> from TBL_SHA
>> WHERE MS_CD = '009'
>> AND ETRYS = '01'
>>
>> The amount of data in the table is as follows.
>> TBL_SHA 38700325
>> TBL_INF 35546
>>
>> Any suggestions for improving the performance of the query would be greatly
>> appreciated.
>
>You can try EXISTS instead of IN to correlate the subquery and the outer query:
>
>   SELECT count(et_cd)
>   FROM tbl_sha
>   WHERE tbl_sha.ms_cd = '009'
>   AND tbl_sha.etrys = '01'
>   AND EXISTS (
> SELECT
> FROM tbl_inf
> WHERE tbl_inf.ms_cd = tbl_sha.ms_cd
> AND tbl_inf.ry_cd = tbl_sha.etrys
>   )
>
>--
>Erik


Re: How to improve the performance of my SQL query?

2023-07-20 Thread Peter J. Holzer
On 2023-07-20 15:09:22 +0800, gzh wrote:
> Hi everyone,
> 
> 
> I'm running into some performance issues with my SQL query.
> 
> The following SQL query is taking a long time to execute.
> 
> 
> explain analyze
> select COUNT(ET_CD)
> from TBL_SHA
> WHERE TBL_SHA.MS_CD = '009'
> and TBL_SHA.ETRYS in
>(select TBL_INF.RY_CD
> from TBL_INF
> WHERE TBL_INF.MS_CD = '009'
>AND TBL_INF.RY_CD = '01'
>)
> 
> - Execution Plan -
[...]
>   ->  Seq Scan on TBL_SHA  (cost=0.00..2698666.58 rows=3202774 
> width=18)
>(actual time=97264.138..123554.792 
> rows=320 loops=1)
> Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = 
> '01'::bpchar))
> Rows Removed by Filter: 32000325
[...]
> 
> 
> 
> The index is defined as follows.
> 
> 
> CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS);
> 
> 
> 
> When I take the following sql statement, the index works fine and the query is
> fast.
> 
> 
> select COUNT(ET_CD)
> from TBL_SHA
> WHERE MS_CD = '009'
> AND ETRYS = '01'

What's the plan for that query?

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: How to improve the performance of my SQL query?

2023-07-20 Thread Erik Wienhold
> On 20/07/2023 09:09 CEST gzh  wrote:
>
> I'm running into some performance issues with my SQL query.
> The following SQL query is taking a long time to execute.
>
> explain analyze
> select COUNT(ET_CD)
> from TBL_SHA
> WHERE TBL_SHA.MS_CD = '009'
> and TBL_SHA.ETRYS in
> (select TBL_INF.RY_CD
> from TBL_INF
> WHERE TBL_INF.MS_CD = '009'
> AND TBL_INF.RY_CD = '01'
> )
> - Execution Plan -
> Limit (cost=2738709.57..2738709.58 rows=1 width=8) (actual 
> time=124168.769..124168.771 rows=1 loops=1)
> -> Aggregate (cost=2738709.57..2738709.58 rows=1 width=8) (actual 
> time=124168.767..124168.769 rows=1 loops=1)
> -> Nested Loop (cost=0.29..2730702.63 rows=3202774 width=9) (actual 
> time=97264.166..123920.769 rows=320 loops=1)
> -> Index Only Scan using TBL_INF_pkc on TBL_INF (cost=0.29..8.31 rows=1 
> width=9) (actual time=0.025..0.030 rows=1 loops=1)
> Index Cond: ((MS_CD = '009'::bpchar) AND (RY_CD = '01'::bpchar))
> Heap Fetches: 1
> -> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) (actual 
> time=97264.138..123554.792 rows=320 loops=1)
> Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '01'::bpchar))
> Rows Removed by Filter: 32000325
> Planning Time: 0.162 ms
> Execution Time: 124168.838 ms
> 
>
> The index is defined as follows.
>
> CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS);
>
>
> When I take the following sql statement, the index works fine and the query
> is fast.
>
> select COUNT(ET_CD)
> from TBL_SHA
> WHERE MS_CD = '009'
> AND ETRYS = '01'
>
> The amount of data in the table is as follows.
> TBL_SHA 38700325
> TBL_INF 35546
>
> Any suggestions for improving the performance of the query would be greatly
> appreciated.

You can try EXISTS instead of IN to correlate the subquery and the outer query:

SELECT count(et_cd)
FROM tbl_sha
WHERE tbl_sha.ms_cd = '009'
AND tbl_sha.etrys = '01'
AND EXISTS (
  SELECT
  FROM tbl_inf
  WHERE tbl_inf.ms_cd = tbl_sha.ms_cd
  AND tbl_inf.ry_cd = tbl_sha.etrys
)

--
Erik




How to improve the performance of my SQL query?

2023-07-20 Thread gzh
Hi everyone,




I'm running into some performance issues with my SQL query.

The following SQL query is taking a long time to execute.




explain analyze

select COUNT(ET_CD)

from TBL_SHA

WHERE TBL_SHA.MS_CD = '009'

and TBL_SHA.ETRYS in

   (select TBL_INF.RY_CD

from TBL_INF

WHERE TBL_INF.MS_CD = '009'

   AND TBL_INF.RY_CD = '01'

   )

- Execution Plan -

Limit  (cost=2738709.57..2738709.58 rows=1 width=8) (actual 
time=124168.769..124168.771 rows=1 loops=1)

  ->  Aggregate  (cost=2738709.57..2738709.58 rows=1 width=8) (actual 
time=124168.767..124168.769 rows=1 loops=1)

->  Nested Loop  (cost=0.29..2730702.63 rows=3202774 width=9) (actual 
time=97264.166..123920.769 rows=320 loops=1)

  ->  Index Only Scan using TBL_INF_pkc on TBL_INF  
(cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)

Index Cond: ((MS_CD = '009'::bpchar) AND (RY_CD = 
'01'::bpchar))

Heap Fetches: 1

  ->  Seq Scan on TBL_SHA  (cost=0.00..2698666.58 rows=3202774 
width=18) (actual time=97264.138..123554.792 rows=320 loops=1)

Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = 
'01'::bpchar))

Rows Removed by Filter: 32000325

Planning Time: 0.162 ms

Execution Time: 124168.838 ms






The index is defined as follows.




CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS);







When I take the following sql statement, the index works fine and the query is 
fast.




select COUNT(ET_CD)

from TBL_SHA

WHERE MS_CD = '009'

AND ETRYS = '01'




The amount of data in the table is as follows.

TBL_SHA38700325

TBL_INF35546




Any suggestions for improving the performance of the query would be greatly 
appreciated.




Thanks in advance!