[PERFORM] Performance issues with custom functions

Edward Di Geronimo Jr. Wed, 26 Oct 2005 13:37:42 -0700

I currently have an infrastructure that's based around SQL Server 2000.I'm trying to move some of the data over to Postgres, partly to reducethe load on the SQL Server machine and partly because some queries I'dlike to run are too slow to be usuable on SQL Server. Mostly likely overtime more and more data will move to Postgres. To help with thistransition, I created a Postgres plugin which queries the contents ofSQL Server tables via ODBC and returns a recordset. I then create viewsaround the function and I can then read from the SQL Server tables as ifthey were local to Postgres.

I have four tables involved in this query. The major one isprovider_location, which has about 2 million rows and is stored inPostgres. The other three are stored in SQL Server and accessed viaviews like I mentioned above. They are network, network_state, andxlat_tbl, and contain about 40, 250, and 500 rows. A simple select *from any of the views takes somewhere around 50ms.

This query in question was written for SQL Server. I have no idea why itwas written in the form it was, but it ran at a reasonable speed whenall the tables were on one machine. Running the original query (afteradjusting for syntax differences) on Postgres resulted in a query thatwould run for hours, continually allocating more RAM. I eventually hadto kill the process as it was devouring swap space. My assumption isthat Postgres is doing the ODBC query for each row of a join somewhere,even though the function is marked stable (immutable didn't make adifference).

Flattening the query made it run in a few minutes. I think the flattenedquery is easier to read, and it runs faster, so I'm not complaining thatI can't use the original query. But I'd like to know exactly what causesthe bottleneck in the original query, and if there are other approachesto solving the issue in case I need them in future queries.

Below is the original query, the explain output, the modified query, theexplain output, and the explain analyze output.


Ed

select

pl.network_id,n.name asnetwork_name,pl.state_cd,count(pl.state_cd) as provider_count

       from development.provider_location pl,development.network n
       where pl.network_id in (select ns.network_id
                       from development.network_state ns
                         where ns.from_date < current_time

and (ns.thru_date > current_time orns.thru_date is null)

                       and (ns.state_cd = pl.state_cd or ns.state_cd='')
                         )
       and pl.network_id = n.network_id
       and pl.state_cd is not null

and pl.state_cd in (select field_value from development.xlat_tblwhere field_name ='State_CD')

       group by pl.state_cd,n.name,pl.network_id
       order by pl.state_cd,network_name;

Explain:

GroupAggregate  (cost=80548547.83..80549256.80 rows=47265 width=52)
 ->  Sort  (cost=80548547.83..80548665.99 rows=47265 width=52)
       Sort Key: pl.state_cd, odbc_select.name, pl.network_id
       ->  Hash Join  (cost=30.01..80543806.14 rows=47265 width=52)

Hash Cond: (("outer".network_id)::text =("inner".network_id)::text)

             ->  Hash IN Join  (cost=15.01..80540931.61 rows=9453 width=20)

Hash Cond: (("outer".state_cd)::text =("inner".field_value)::text)-> Seq Scan on provider_location pl(cost=0.00..80535150.29 rows=1890593 width=20)

                         Filter: ((state_cd IS NOT NULL) AND (subplan))
                         SubPlan

-> Function Scan on odbc_select(cost=0.00..42.50 rows=2 width=32)Filter: (((from_date)::text <(('now'::text)::time(6) with time zone)::text) AND (((thru_date)::text >(('now'::text)::time(6) with time zone)::text) OR (thru_date IS NULL))AND (((state_cd)::text = ($0)::text) OR ((state_cd)::text = ''::text)))

                   ->  Hash  (cost=15.00..15.00 rows=5 width=32)

-> Function Scan on odbc_select(cost=0.00..15.00 rows=5 width=32)Filter: ((field_name)::text ='State_CD'::text)

             ->  Hash  (cost=12.50..12.50 rows=1000 width=64)

-> Function Scan on odbc_select (cost=0.00..12.50rows=1000 width=64)



Flattened query:

select
   pl.network_id,
   n.name as network_name,
   pl.state_cd,
   count(pl.state_cd) as provider_count
from
   development.network n,
   development.network_state ns,
   development.xlat_tbl xt,
   development.provider_location pl
where
   xt.field_name = 'State_CD'
   and n.network_id = ns.network_id
   and ns.from_date < current_timestamp
   and (ns.thru_date > current_timestamp or ns.thru_date is null)
   and (ns.state_cd = pl.state_cd or ns.state_cd='')
   and pl.network_id = n.network_id
   and pl.state_cd is not null
   and pl.state_cd = xt.field_value
group by
   pl.state_cd,
   n.name,
   pl.network_id
order by
   pl.state_cd,
   network_name;

Explain:

GroupAggregate  (cost=190089.94..190129.90 rows=2664 width=52)
 ->  Sort  (cost=190089.94..190096.60 rows=2664 width=52)
       Sort Key: pl.state_cd, odbc_select.name, pl.network_id
       ->  Merge Join  (cost=189895.73..189938.37 rows=2664 width=52)
             Merge Cond: ("outer"."?column4?" = "inner"."?column3?")
             ->  Sort  (cost=189833.40..189834.73 rows=533 width=52)
                   Sort Key: (pl.network_id)::text
                   ->  Hash Join  (cost=42.80..189809.26 rows=533 width=52)

Hash Cond: (("outer".network_id)::text =("inner".network_id)::text)Join Filter: ((("inner".state_cd)::text =("outer".state_cd)::text) OR (("inner".state_cd)::text = ''::text))-> Hash Join (cost=15.01..185908.10rows=94530 width=20)Hash Cond: (("outer".state_cd)::text =("inner".field_value)::text)-> Seq Scan on provider_location pl(cost=0.00..166041.86 rows=3781186 width=20)

                                     Filter: (state_cd IS NOT NULL)

-> Hash (cost=15.00..15.00 rows=5width=32)-> Function Scan on odbc_select(cost=0.00..15.00 rows=5 width=32)Filter: ((field_name)::text= 'State_CD'::text)

                         ->  Hash  (cost=27.50..27.50 rows=113 width=64)

-> Function Scan on odbc_select(cost=0.00..27.50 rows=113 width=64)Filter: ((from_date <('now'::text)::timestamp(6) with time zone) AND ((thru_date >('now'::text)::timestamp(6) with time zone) OR (thru_date IS NULL)))

             ->  Sort  (cost=62.33..64.83 rows=1000 width=64)
                   Sort Key: (odbc_select.network_id)::text

-> Function Scan on odbc_select (cost=0.00..12.50rows=1000 width=64)


Explain Analyze:

"GroupAggregate (cost=190089.94..190129.90 rows=2664 width=52) (actualtime=254757.742..261725.786 rows=350 loops=1)"" -> Sort (cost=190089.94..190096.60 rows=2664 width=52) (actualtime=254757.438..257267.224 rows=1316774 loops=1)"

"        Sort Key: pl.state_cd, odbc_select.name, pl.network_id"

" -> Merge Join (cost=189895.73..189938.37 rows=2664 width=52)(actual time=189325.877..203579.050 rows=1316774 loops=1)"

"              Merge Cond: ("outer"."?column4?" = "inner"."?column3?")"

" -> Sort (cost=189833.40..189834.73 rows=533 width=52)(actual time=189282.504..192284.766 rows=1316774 loops=1)"

"                    Sort Key: (pl.network_id)::text"

" -> Hash Join (cost=42.80..189809.26 rows=533width=52) (actual time=1177.758..151180.472 rows=1316774 loops=1)"" Hash Cond: (("outer".network_id)::text =("inner".network_id)::text)"" Join Filter: ((("inner".state_cd)::text =("outer".state_cd)::text) OR (("inner".state_cd)::text = ''::text))"" -> Hash Join (cost=15.01..185908.10rows=94530 width=20) (actual time=1095.949..50495.766 rows=1890457 loops=1)"" Hash Cond: (("outer".state_cd)::text =("inner".field_value)::text)"" -> Seq Scan on provider_location pl(cost=0.00..166041.86 rows=3781186 width=20) (actualtime=1071.011..36224.961 rows=1891183 loops=1)"

"                                      Filter: (state_cd IS NOT NULL)"

" -> Hash (cost=15.00..15.00 rows=5width=32) (actual time=24.832..24.832 rows=0 loops=1)"" -> Function Scan on odbc_select(cost=0.00..15.00 rows=5 width=32) (actual time=24.469..24.724 rows=51loops=1)"" Filter: ((field_name)::text= 'State_CD'::text)"" -> Hash (cost=27.50..27.50 rows=113width=64) (actual time=81.684..81.684 rows=0 loops=1)"" -> Function Scan on odbc_select(cost=0.00..27.50 rows=113 width=64) (actual time=75.288..81.200rows=211 loops=1)"" Filter: ((from_date <('now'::text)::timestamp(6) with time zone) AND ((thru_date >('now'::text)::timestamp(6) with time zone) OR (thru_date IS NULL)))"" -> Sort (cost=62.33..64.83 rows=1000 width=64) (actualtime=43.301..1258.901 rows=1289952 loops=1)"

"                    Sort Key: (odbc_select.network_id)::text"

" -> Function Scan on odbc_select (cost=0.00..12.50rows=1000 width=64) (actual time=43.010..43.109 rows=34 loops=1)"

"Total runtime: 261902.966 ms"


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
      choose an index scan if your joining column's datatypes do not
      match

[PERFORM] Performance issues with custom functions

Reply via email to