[PERFORM] Seq scan on zero-parameters function

2004-02-06 Thread Octavio Alvarez
Hi!

   I'd like to know if this is expected behavior. These are two couples of
queries. In each couple, the first one has a WHERE field = function()
condition, just like the second one, but in the form WHERE field =
(SELECT function()). In my opinion, both should have the same execution
plan, as the function has no parameters and, therefore, is constant.

   I'm concerned about this, because the second form looks like a workaround.

*** TESTED IN: PostgreSQL 7.4.1 on i686-pc-cygwin ***

pgdb=# explain analyze select count(*) from t_students where period =
(select current_period_id());
 QUERY
PLAN
---
 Aggregate  (cost=127.84..127.84 rows=1 width=0) (actual time=1.000..1.000
rows=1 loops=1)
   InitPlan
 -  Result  (cost=0.00..0.01 rows=1 width=0) (actual
time=1.000..1.000 rows=1 loops=1)
   -  Index Scan using i_t_students__period on t_students 
(cost=0.00..127.71 rows=44 width=0) (actual time=1.000..1.000 rows=21
loop=1)
 Index Cond: (period = $0)
 Total runtime: 1.000 ms
(6 rows)

pgdb=# explain analyze select count(*) from t_students where period =
(select current_period_id());
 QUERY
PLAN
---
 Aggregate  (cost=127.84..127.84 rows=1 width=0) (actual time=1.000..1.000
rows=1 loops=1)
   InitPlan
 -  Result  (cost=0.00..0.01 rows=1 width=0) (actual
time=1.000..1.000 rows=1 loops=1)
   -  Index Scan using i_t_students__period on t_students 
(cost=0.00..127.71 rows=44 width=0) (actual time=1.000..1.000 rows=21
loop=1)
 Index Cond: (period = $0)
 Total runtime: 1.000 ms
(6 rows)

pgdb=# select version();
version
---
 PostgreSQL 7.4.1 on i686-pc-cygwin, compiled by GCC gcc (GCC) 3.3.1
(cygming special)
(1 row)

pgdb=#

*** TESTED IN: PostgreSQL 7.3.4 on i386-redhat-linux-gnu ***

pgdb=# explain analyze select count(*) from t_students where period =
current_period_id();
  QUERY PLAN
---
 Aggregate  (cost=182.32..182.32 rows=1 width=0) (actual
time=49077.38..49077.38 rows=1 loops=1)
   -  Seq Scan on t_students  (cost=0.00..182.22 rows=43 width=0) (actual
time=17993.89..49077.13 rows=21 loops=1)
 Filter: (period = current_period_id())
 Total runtime: 49077.61 msec
(4 rows)

pgdb=# explain analyze select count(*) from t_students where period =
(select current_period_id());
  QUERY
PLAN
--
 Aggregate  (cost=125.19..125.19 rows=1 width=0) (actual
time=131.59..131.60 rows=1 loops=1)
   InitPlan
 -  Result  (cost=0.00..0.01 rows=1 width=0) (actual
time=41.05..41.06 rows=1 loops=1)
   -  Index Scan using i_t_students__period on t_students 
(cost=0.00..125.08 rows=43 width=0) (actual time=131.28..131.48 rows=21
loops=1)
 Index Cond: (period = $0)
 Total runtime: 131.95 msec
(6 rows)

pgdb=# select version();
 version
-
 PostgreSQL 7.3.4 on i386-redhat-linux-gnu, compiled by GCC 2.96
(1 row)




-- 
Octavio Alvarez.
E-mail: [EMAIL PROTECTED]

Agradezco que sus correos sean enviados siempre a esta dirección.

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Seq scan on zero-parameters function

2004-02-06 Thread Octavio Alvarez

Tomasz Myrta said:
 Dnia 2004-02-06 08:19, U¿ytkownik Octavio Alvarez napisa³:
 In each couple, the first one has a WHERE field = function()
 condition, just like the second one, but in the form WHERE field =
 (SELECT function()). In my opinion, both should have the same execution
 plan, as the function has no parameters and, therefore, is constant.

 Nope.

 What would you say about function without params returning timeofday()?
 Is it constant?

No... :-P ;-)

 If you are sure, that your function returns constant value - declare it
 as IMMUTABLE. (look at CREATE FUNCTION documentation)

Thanks for the hint.

In fact, my current_period_id() is based on time, but it should be
constant along the query execution. I mean, I don't want some records
filtered with some values and other with other values... I'll have an
uncongruent recordset.

Say SELECT [field-list] FROM [complex-join] WHERE sec = datepart('second',
now()); Now suppose the query takes always more than 1 second because of
the complex-join or whatever reason: I will naver have a congruent
recordset.

IMMUTABLE wouldn't help here, only wrapping the function in a subquery. Is
this expected behavior? Is this standards compliant (if it can be
qualified as such)?

Octavio.

-- 
Octavio Alvarez.
E-mail: [EMAIL PROTECTED]

Agradezco que sus correos sean enviados siempre a esta dirección.

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Increase performance of a UNION query that thakes 655.07 msec to be runned ?

2004-02-06 Thread Tom Lane
Bruno BAGUETTE [EMAIL PROTECTED] writes:
 Do you see a way to get better performances with this query which takes
 currently 655.07 msec to be done.

 levure= explain analyze SELECT distinct lower(substr(l_name, 1, 1)) AS
 initiale FROM people
 levure- UNION
 levure- SELECT distinct lower(substr(org_name, 1, 1)) AS initiale FROM
 organizations
 levure- ORDER BY initiale;

This is inherently a bit inefficient since the UNION implies a DISTINCT
step, thus partially repeating the DISTINCT work done inside each SELECT.
It would likely be a tad faster to drop the DISTINCTs from the
subselects and rely on UNION to do the filtering.  However, you're still
gonna have a big SORT/UNIQUE step.

As of PG 7.4 you could probably get a performance win by converting the
thing to use GROUP BY instead of DISTINCT or UNION:

select initiale from (
  select lower(substr(l_name,1,1)) as initiale from people
  union all
  select lower(substr(org_name,1,1)) as initiale from organizations
) ss
group by initiale order by initiale;

This should use a HashAggregate to do the unique-ification.  I think
that will be faster than Sort/Unique.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Increase performance of a UNION query that thakes

2004-02-06 Thread Stephan Szabo

On Fri, 6 Feb 2004, Bruno BAGUETTE wrote:

 I was thinking that a index on lower(substr(l_name, 1, 1)) and another
 index on lower(substr(org_name, 1, 1)) should gives better performances.
 When I've to create theses two indexes, it seems like this is not
 allowed :

 levure= CREATE INDEX firstchar_lastname_idx ON
 people(lower(substr(l_name, 1, 1)));
 ERROR:  parser: parse error at or near ( at character 59

In 7.4, I believe you would say
 on people((lower(substr(l_name,1,1
but I'm not sure that index would really help in practice.

 Do you have another idea to get better performances ?

In addition to what Tom said, the row estimates look suspiciously default.
You mention vacuuming, but do you ever analyze the tables?

Also, what do you have sort_mem set to?

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


RE : [PERFORM] Increase performance of a UNION query that thakes 655.07 msec to be runned ?

2004-02-06 Thread Bruno BAGUETTE
re-Hello,

As suggested by Tom, I've removed the distinct and tried it's query :

levure= explain analyze select initiale from (
levure(   select lower(substr(l_name,1,1)) as initiale from people
levure(   union all
levure(   select lower(substr(org_name,1,1)) as initiale from
organizations
levure( ) ss
levure- group by initiale order by initiale;
   QUERY
PLAN 


-
 Group  (cost=1018.48..1074.32 rows=1117 width=17) (actual
time=783.47..867.61 rows=39 loops=1)
   -  Sort  (cost=1018.48..1046.40 rows=11167 width=17) (actual
time=782.18..801.68 rows=11167 loops=1)
 Sort Key: initiale
 -  Subquery Scan ss  (cost=0.00..267.67 rows=11167 width=17)
(actual time=0.23..330.31 rows=11167 loops=1)
   -  Append  (cost=0.00..267.67 rows=11167 width=17)
(actual time=0.22..263.69 rows=11167 loops=1)
 -  Subquery Scan *SELECT* 1  (cost=0.00..87.93
rows=4093 width=15) (actual time=0.22..79.51 rows=4093 loops=1)
   -  Seq Scan on people  (cost=0.00..87.93
rows=4093 width=15) (actual time=0.20..53.82 rows=4093 loops=1)
 -  Subquery Scan *SELECT* 2  (cost=0.00..179.74
rows=7074 width=17) (actual time=0.24..146.12 rows=7074 loops=1)
   -  Seq Scan on organizations
(cost=0.00..179.74 rows=7074 width=17) (actual time=0.23..100.70
rows=7074 loops=1)
 Total runtime: 874.79 msec
(10 rows)


That seems to be 200 msec longer that my first query... Indeed, I've
noticed something strange : now, if I rerun my initial query, I get
worse runtime than this morning :


levure= EXPLAIN ANALYZE SELECT lower(substr(l_name, 1, 1)) AS initiale
FROM people
levure- UNION
levure- SELECT lower(substr(org_name, 1, 1)) AS initiale FROM
organizations
levure- ORDER BY initiale;
   QUERY
PLAN  


 Sort  (cost=1130.85..1133.64 rows=1117 width=17) (actual
time=802.52..802.58 rows=39 loops=1)
   Sort Key: initiale
   -  Unique  (cost=1018.48..1074.32 rows=1117 width=17) (actual
time=712.04..801.83 rows=39 loops=1)
 -  Sort  (cost=1018.48..1046.40 rows=11167 width=17) (actual
time=712.03..732.63 rows=11167 loops=1)
   Sort Key: initiale
   -  Append  (cost=0.00..267.67 rows=11167 width=17)
(actual time=0.21..263.54 rows=11167 loops=1)
 -  Subquery Scan *SELECT* 1  (cost=0.00..87.93
rows=4093 width=15) (actual time=0.20..80.47 rows=4093 loops=1)
   -  Seq Scan on people  (cost=0.00..87.93
rows=4093 width=15) (actual time=0.19..54.14 rows=4093 loops=1)
 -  Subquery Scan *SELECT* 2  (cost=0.00..179.74
rows=7074 width=17) (actual time=0.28..144.82 rows=7074 loops=1)
   -  Seq Scan on organizations
(cost=0.00..179.74 rows=7074 width=17) (actual time=0.27..99.06
rows=7074 loops=1)
 Total runtime: 806.47 msec
(11 rows)


I don't understand why this runtime has changed because no data has been
added/updated/deleted since several weeks (I'm working on a copy of the
production database. And this copy is not accessible for the users).

My PostgreSQL version is PostgreSQL 7.3.2, I have to ask to the
administrator if it can be upgraded to 7.4 in the production server.

Thanks in advance for your help.

---
Bruno BAGUETTE - [EMAIL PROTECTED] 


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


RE : [PERFORM] Increase performance of a UNION query that thakes 655.07 msec to be runned ?

2004-02-06 Thread Bruno BAGUETTE
 In addition to what Tom said, the row estimates look 
 suspiciously default. You mention vacuuming, but do you ever 
 analyze the tables?

I run VACUUM FULL ANALYZE with the postgres user on all the PostgreSQL
databases on the server, twice a day, sometimes more.

 Also, what do you have sort_mem set to?

[EMAIL PROTECTED] data]# cat postgresql.conf | grep sort_mem
sort_mem = 6144 # min 64, size in KB 

Do you think I should increase that value ?

It's not so easy to do a good setup of that postgresql.conf file, is
there any tool that suggests some values for that ?

Thanks in advance for your tips :-)

---
Bruno BAGUETTE - [EMAIL PROTECTED] 


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: RE : [PERFORM] Increase performance of a UNION query that thakes

2004-02-06 Thread Stephan Szabo
On Fri, 6 Feb 2004, Bruno BAGUETTE wrote:

  In addition to what Tom said, the row estimates look
  suspiciously default. You mention vacuuming, but do you ever
  analyze the tables?

 I run VACUUM FULL ANALYZE with the postgres user on all the PostgreSQL
 databases on the server, twice a day, sometimes more.

Wierd, because you're getting 1000 estimated on both people and
organizations.  What does pg_class have to say about those two tables?

  Also, what do you have sort_mem set to?

 [EMAIL PROTECTED] data]# cat postgresql.conf | grep sort_mem
 sort_mem = 6144 # min 64, size in KB

 Do you think I should increase that value ?

Hmm, I'd expect that the sort would fit in that space in general.  If you
want to try different values, you can set sort_mem from psql rather than
changing the configuration file.



On my machine the index does actually help, although I needed to lower
random_page_cost a little from its default of 4 to get it to use it
preferentially, but I'm also getting times about 1/3 of yours (and my
machine is pretty poor) so I think I may not have data that matches yours
very well.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] COPY with INDEXES question

2004-02-06 Thread Rod Taylor
On Thu, 2004-02-05 at 19:46, Slavisa Garic wrote:
 Hi,
 
 I have a quick question. In order to speed up insertion of large number of
 rows (100s of thousands) I replaced the INSERT with the COPY. This works
 fine but one question popped into my mind. Does copy updates indexes on
 that table if there are some defined?

Copy does nearly everything that standard inserts to. RULES are the only
thing that come to mind. Triggers, indexes, constraints, etc. are all
applied.

-- 
Rod Taylor rbt [at] rbt [dot] ca

Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/rbtpub.asc


signature.asc
Description: This is a digitally signed message part


Re: [PERFORM] [pgsql-advocacy] MySQL+InnoDB vs. PostgreSQL test?

2004-02-06 Thread Jan Wieck
Mike Nolan wrote:
Seriously, I am tired of this kind of question. You gotta get bold 
enough to stand up in a meeting like that, say guy's, you can ask me 
how this compares to Oracle ... but if you're seriously asking me how 
this compares to MySQL, call me again when you've done your homework.
Can they call you at the unemployment office?
It might not work with the words I used above, but the point I tried to 
make is that the hardest thing you can sell is a no. I mean, not 
just saying no, but selling it in a way that the customer will not go 
with the next idiot who claims we can do that.

If the customer has a stupid idea, like envisioning an enterprise 
solution based on ImSOL, there is no way you will be able to deliver it. 
Paying customer or not, you will fail if you bow to their strategic 
decisions and ignore knowing that the stuff they want to use just 
doesn't fit.

That is absolutely not ImSOL specific. If someone comes to me and asks 
for a HA scenario with zero transaction loss during failover, we can 
discuss a little if this is really what he needs or not, but if he needs 
that, the solution will be Oracle or DB2, for sure I will not claim that 
PostgreSQL can do that, because it cannot.

Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Database conversion woes...

2004-02-06 Thread Kevin Carpenter
First just wanted to say thank you all for the quick and helpful 
answers.  With all the input I know I am on the right track.  With that 
in mind I created a perl script to do my migrations and to do it based 
on moving from a db name to a schema name.  I had done alot of the 
reading on converting based on the miss match of data types that MySQL 
likes to use.  I must say it is VERY nice having a intelligent system 
that say won't let a date of '-00-00' be entered.  Luckily I didn't 
have to deal with any enumerations.

So the conversion goes on.  I will definitely be back and forth in here 
as I get the new queries written and start migrating all I can back into 
the pg backend using plpgsql or c for the stored procedures where 
required.  The mammoth replicator has been working well.  I had tried 
the pgsql-r and had limited success with it, and dbmirror was just 
taking to long having to do 4 db transactions just to mirror one 
command.  I have eserv but was never really a java kind of guy.

Alright then - back to my code.  Again thanks for the help and info.

Kevin

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Why is query selecting sequential?

2004-02-06 Thread Josh Berkus
Karl,

  SubPlan
-  Seq Scan on forumlog  (cost=0.00..1.18 rows=1 width=8)
  Filter: ((login = '%s'::text) AND (forum = '%s'::text) AND 
(number = $0))

 Why is the subplan using a sequential scan?  At minimum the index on the 
 post number (forumlog_number) should be used, no?  What would be even
 better would be a set of indices that allow at least two (or even all three)
 of the keys in the inside SELECT to be used.

It's using a seq scan because you have only 1 row in the table. Don't 
bother testing performance before your database is populated.

PostgreSQL doesn't just use an index because it's there; it uses and index 
because it's faster than not using one.

If there is more than one row in the table, then:
1) run ANALYZE forumlog;
2) Send us the EXPLAIN ANALYZE, not just the explain for the query.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Why is query selecting sequential?

2004-02-06 Thread Josh Berkus
Karl,

Well, still with only 5 rows in the forumlog table you're not going get 
realistic results compared to a loaded database.  However, you are making 
things difficult for the parser with awkward query syntax; what you currently 
have encourages a sequential loop.

If there are potentially several rows in forumlog for each row in post, then 
your query won't work either.

 akcs= explain analyze select forum, (replied  (select lastview from 
forumlog where forumlog.login='genesis' and forumlog.forum='General' and 
number=post.number)) as newflag, * from post where forum = 'General' and 
toppost = 1 order by pinned desc, replied desc;   

Instead:

if only one row in forumlog per row in post:

SELECT (replied  lastview) AS newflag, post.* 
FROM post, forumlog
WHERE post.forum = 'General' and toppost = 1 and forumlog.login = 'genesis'
and forumlog.forum='General' and forumlog.number=post.number;

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] 7.3 vs 7.4 performance

2004-02-06 Thread Orion Henry
On Fri, 2004-02-06 at 02:43, Hannu Krosing wrote:
 Orion Henry kirjutas N, 05.02.2004 kell 07:16:
  I've done some testing of 7.3.4 vs 7.4.1 and found 7.4.1 to be 20%-30%
  slower than 7.3.4.  Is this common knowledge or am I just unlucky with
  my query/data selection?
  
  Things of note that might matter: the machine is a dual Opteron 1.4GHz
  running Fedora Core 1 Test 1 for X86_64.  The 7.3.4 was from the Fedora
  distro and the 7.4.1 was the PGDG package.
 
 Are you sure that it is not the case that it is not tha case that 7.3.4
 is 64 bit and the PGDG package is 32 ?

Yes sure... I don't know if they were compiled with differing
optimizations or compilers though...

  The database is 3.5 Gigs with 10 millions rows and the machine had 1 Gig or ram.
  
  Oh... as a side note I'm happy to announce that the 2.6 Linux kernel has
  more than DOUBLED the speed of all my Postgres queries over the 2.4. =)
 
 Is this on this same hardware ?

No.  I havent gotten the 2.6 kernel working on the Opteron yet.  The 2x speedup 
was on a dual Athlon 2GHz.



signature.asc
Description: This is a digitally signed message part


Re: [PERFORM] 7.3 vs 7.4 performance

2004-02-06 Thread Orion Henry




On Wed, 2004-02-04 at 21:27, Josh Berkus wrote:

Orion,

 I've done some testing of 7.3.4 vs 7.4.1 and found 7.4.1 to be 20%-30%
 slower than 7.3.4.  Is this common knowledge or am I just unlucky with
 my query/data selection?

No, it's not common knowledge.  It should be the other way around.   Perhaps 
it's the queries you picked?   Even so .  feel free to post individual 
EXPLAIN ANALYZEs to the list.


Thank you...

Here's one good example of 7.3 beating 7.4 soundly:
Again this could me some compile option since I built the 7.4 RPM 
from source and I got the 7.3 from Fedora or something to
do with the Opteron architecture. (Yes the compiled postgres
is 64 bit)

SELECT cid,media_name,media_type,count(*) as count,sum(a_amount) 
as a,sum(case when b_amount  0 then b_amount else 0 end) as b,
sum(case when b_amount  0 then b_amount else 0 end) as c 
FROM transdata JOIN media_info ON (media = media_type) 
WHERE cid = 140100 AND demo is not null 
AND trans_date between date '2004-01-01' 
AND date_trunc('month',date '2004-01-01' + interval '32 days') 
GROUP BY cid,media_name,media_type;

Here's 7.3's time and explain

real 0m34.260s
user 0m0.010s
sys 0m0.000s

---
 Aggregate (cost=7411.88..7415.32 rows=17 width=25)
 - Group (cost=7411.88..7413.60 rows=172 width=25)
 - Sort (cost=7411.88..7412.31 rows=172 width=25)
 Sort Key: transdata.cid, media_info.media_name, transdata.media_type
 - Hash Join (cost=1.22..7405.50 rows=172 width=25)
 Hash Cond: (outer.media_type = inner.media)
 - Index Scan using transdata_date_index on transdata (cost=0.00..7401.27 rows=172 width=14)
 Index Cond: ((trans_date = ('2004-01-01'::date)::timestamp with time zone) AND (trans_date = ('2004-02-01 00:00:00'::timestamp without time zone)::timestamp with time zone))
 Filter: ((cid = 140100) AND (demo IS NOT NULL))
 - Hash (cost=1.18..1.18 rows=18 width=11)
 - Seq Scan on media_info (cost=0.00..1.18 rows=18 width=11)


Here's 7.4's time and explain

real 0m43.052s
user 0m0.000s
sys 0m0.020s

 QUERY PLAN
---
 HashAggregate (cost=8098.26..8098.29 rows=2 width=23)
 - Hash Join (cost=1.22..8095.48 rows=159 width=23)
 Hash Cond: (outer.media_type = inner.media)
 - Index Scan using transdata_date_index on transdata (cost=0.00..8091.87 rows=159 width=14)
 Index Cond: ((trans_date = ('2004-01-01'::date)::timestamp with time zone) AND (trans_date = ('2004-02-01 00:00:00'::timestamp without time zone)::timestamp with time zone))
 Filter: ((cid = 140100) AND (demo IS NOT NULL))
 - Hash (cost=1.18..1.18 rows=18 width=11)
 - Seq Scan on media_info (cost=0.00..1.18 rows=18 width=11)








signature.asc
Description: This is a digitally signed message part


Re: [PERFORM] 7.3 vs 7.4 performance

2004-02-06 Thread Josh Berkus
Orion,

 Here's one good example of 7.3 beating 7.4 soundly:
 Again this could me some compile option since I built the 7.4 RPM 
 from source and I got the 7.3 from Fedora or something to
 do with the Opteron architecture.  (Yes the compiled postgres
 is 64 bit)

Need an EXPLAIN ANALYZE, not just an EXPLAIN.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])