[PERFORM] simple case using index on windows but not on linux

2006-10-04 Thread simon godden

I have a simple case, selecting on a LIKE where clause over a single
column that has an index on it.  On windows it uses the index - on
linux it does not.  I have exactly the same scema and data in each,
and I have run the necessary analyze commands on both.

Windows is running 8.1.4
Linux is running from RPM postgresql-server-8.1.4-1.FC5.1

There are 1 million rows in the table - a number I would expect to
lower the score of a sequential scan for the planner.  There is an
index on 'c_number'.

On windows I get this:

orderstest=# explain analyze select * from t_order where c_number like '0001%';
  QUERY PLAN
-
Index Scan using t_order_c_number on t_order  (cost=0.00..26.53
rows=928 width=43) (actual time=0.029..2.857 rows=1000 loops=1)
  Index Cond: (((c_number)::text = '0001'::character varying) AND
((c_number)::text  '0002'::character varying))
  Filter: ((c_number)::text ~~ '0001%'::text)
Total runtime: 4.572 ms
(4 rows)

Great - the index is used, and the query is lightning fast.

On Linux I get this:

orderstest=# explain analyze select c_number from t_order where
c_number like '0001%';
 QUERY PLAN
--
Seq Scan on t_order  (cost=0.00..20835.00 rows=983 width=11) (actual
time=1.364..1195.064 rows=1000 loops=1)
  Filter: ((c_number)::text ~~ '0001%'::text)
Total runtime: 1197.312 ms
(3 rows)

I just can't use this level of performance in my application.

On my linux box, the only way I can get it to use the index is to use
the = operator.  If I use anything else, a seq scan is used.

Disabling sequence scans in the config has no effect.  It still does
not use the index for anything other than an = comparison.

Here is a dump of the table description:

orderstest=# \d t_order;
  Table public.t_order
   Column |  Type  | Modifiers
---++---
id| bigint | not null
c_number  | character varying(255) |
customer_id   | bigint |
origincountry_id  | bigint |
destinationcountry_id | bigint |
Indexes:
   t_order_pkey PRIMARY KEY, btree (id)
   t_order_c_number btree (c_number)
   _3 btree (destinationcountry_id)
   _4 btree (origincountry_id)
   _5 btree (customer_id)
Foreign-key constraints:
   fk9efdd3a33dbb666c FOREIGN KEY (destinationcountry_id)
REFERENCES go_country(id)
   fk9efdd3a37d3dd384 FOREIGN KEY (origincountry_id) REFERENCES
go_country(id)
   fk9efdd3a38654c9d3 FOREIGN KEY (customer_id) REFERENCES t_party(id)

That dump is exactly the same on both machines.

The only major difference between the hardware is that the windows
machine has 2gb RAM and a setting of 1 shared memory pages,
whereas the linux machine has 756Mb RAM and a setting of 3000 shared
memory pages (max. shared memory allocation of 32Mb).  I can't see any
other differences in configuration.

Disk throughput on both is reasonable (40Mb/second buffered reads)

Can anyone explain the difference in the planner behaviour on the two
systems, using what appears to be the same version of postgres?

--
Simon Godden

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


[PERFORM] Unsubscribe

2006-10-04 Thread Luc Delgado

 Please unsubscribe me!  Thank you!


 Also, it would be better to have a message foot saying how to unsubscribe.






---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] simple case using index on windows but not on linux

2006-10-04 Thread Heikki Linnakangas

simon godden wrote:

The only major difference between the hardware is that the windows
machine has 2gb RAM and a setting of 1 shared memory pages,
whereas the linux machine has 756Mb RAM and a setting of 3000 shared
memory pages (max. shared memory allocation of 32Mb).  I can't see any
other differences in configuration.

You can increase the max shared memory size if you have root access. See

http://www.postgresql.org/docs/8.1/interactive/kernel-resources.html#SYSVIPC-PARAMETERS

Scroll down for Linux-specific instructions.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] simple case using index on windows but not on linux

2006-10-04 Thread simon godden

(Sending again because I forgot to reply to all)

On 10/4/06, Heikki Linnakangas [EMAIL PROTECTED] wrote:

You can increase the max shared memory size if you have root access. See

http://www.postgresql.org/docs/8.1/interactive/kernel-resources.html#SYSVIPC-PARAMETERS

Scroll down for Linux-specific instructions.


Thanks for the link.

Are you saying that the shared memory size is the issue here?  Please
can you explain how it would cause a seq scan rather than an index
scan.

I would like to understand the issue before making changes.

--
Simon Godden

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] simple case using index on windows but not on linux

2006-10-04 Thread simon godden

On 10/4/06, Richard Huxton dev@archonet.com wrote:


Issue set enable_seqscan=false and then run your explain analyse. If
your query uses the index, what is the estimated cost? If the estimated
cost is larger than a seq-scan that would indicate your configuration
settings are badly out-of-range.


I did that and it still used seq-scan.



If the index isn't used, then we have problem #3. I think this is what
you are actually seeing. Your locale is something other than C and PG
doesn't know how to use like with indexes. Read up on operator classes
or change your locale.
http://www.postgresql.org/docs/8.1/static/indexes-opclass.html



Aha - that sounds like it - this is the output from locale

LANG=en_US.UTF-8
LC_CTYPE=en_US.UTF-8
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE=en_US.UTF-8
LC_MONETARY=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=

I guess it cannot determine the collating sequence?

I'm not too familiar with unix locale issues - does this output match
your problem description?

Can you explain how to change my locale to 'C'?  (I'm quite happy for
you to tell me to RTFM, as I know this is not a linux user mailing
list :)

--
Simon Godden

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] simple case using index on windows but not on linux

2006-10-04 Thread Richard Huxton

simon godden wrote:

(Sending again because I forgot to reply to all)

On 10/4/06, Heikki Linnakangas [EMAIL PROTECTED] wrote:

You can increase the max shared memory size if you have root access. See

http://www.postgresql.org/docs/8.1/interactive/kernel-resources.html#SYSVIPC-PARAMETERS 



Scroll down for Linux-specific instructions.


Thanks for the link.

Are you saying that the shared memory size is the issue here?  Please
can you explain how it would cause a seq scan rather than an index
scan.

I would like to understand the issue before making changes.


It *might* be shared-memory settings. It's almost certainly something to 
do with setup. If you have the same data and the same query and can 
reliably produce different results then something else must be different.


If you look at the explain output from both, PG knows the seq-scan is 
going to be expensive (cost=20835) so the Linux box either

1. Doesn't have the index (and you say it does, so it's not this).
2. Thinks the index will be even more expensive.
3. Can't use the index at all.

Issue set enable_seqscan=false and then run your explain analyse. If 
your query uses the index, what is the estimated cost? If the estimated 
cost is larger than a seq-scan that would indicate your configuration 
settings are badly out-of-range.


If the index isn't used, then we have problem #3. I think this is what 
you are actually seeing. Your locale is something other than C and PG 
doesn't know how to use like with indexes. Read up on operator classes 
or change your locale.

http://www.postgresql.org/docs/8.1/static/indexes-opclass.html

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] slow queue-like empty table

2006-10-04 Thread Tobias Brox
[Csaba Nagy - Thu at 10:45:35AM +0200]
 So you should check for idle in transaction sessions, those are bad...
 or any other long running transaction.

Thank you (and others) for pointing this out, you certainly set us on
the right track.  We did have some few unclosed transactions;
transactions not beeing ended by rollback or commit.  We've been
fixing this, beating up the programmers responsible and continued
monitoring.

I don't think it's only due to those queue-like tables, we've really
seen a significant improvement on the graphs showing load and cpu usage
on the database server after we killed all the idle in transaction.  I
can safely relax still some weeks before I need to do more optimization
work :-)

(oh, btw, we didn't really beat up the programmers ... too big
geographical distances ;-)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Performance Optimization for Dummies 2 - the SQL

2006-10-04 Thread Markus Schaber
Hi, Alex,

Alex Stapleton wrote:

 explain analyze is more helpful because it prints the times.
 
 You can always use the \timing flag in psql ;)

Have you ever tried EXPLAIN ANALYZE?

\timing gives you one total timing, but EXPLAIN ANALYZE gives you
timings for sub-plans, including real row counts etc.

HTH,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSQL Caching

2006-10-04 Thread Dave Dutcher
 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Adnan DURSUN
 i want to be can read an execution plan when 
 i look at it. 
 So, is there any doc about how it should be read ?


You are asking how to read the output from EXPLAIN?  This page is a good
place to start:

http://www.postgresql.org/docs/8.1/interactive/performance-tips.html 




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] simple case using index on windows but not on linux

2006-10-04 Thread Richard Huxton

simon godden wrote:

I did that, e.g. initdb --locale=C, re-created all my data and have
exactly the same problem.

I have two indexes, one with no options, and one with the varchar
operator options.

So the situation now is:
If I do a like query it uses the index with the varchar options;
If I do a = query, it uses the index with no options;
If I do a  or  or any other operator, it reverts back to a seq-scan!

I am on FC5 - any further ideas?  Did I need to do anything specific
about collating sequence?  I thought that the --locale=C would set
that for all options.


From psql, a show all command will list all your config settings and 
let you check the lc_xxx values are correct.


Make sure you've analysed the database after restoring, otherwise it 
will have bad stats available.


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] PostgreSQL Caching

2006-10-04 Thread Brad Nicholson
On Wed, 2006-10-04 at 07:38 -0500, Dave Dutcher wrote:
  -Original Message-
  From: [EMAIL PROTECTED] 
  [mailto:[EMAIL PROTECTED] On Behalf Of 
  Adnan DURSUN
  i want to be can read an execution plan when 
  i look at it. 
  So, is there any doc about how it should be read ?
 
 
 You are asking how to read the output from EXPLAIN?  This page is a good
 place to start:
 
 http://www.postgresql.org/docs/8.1/interactive/performance-tips.html 

Robert Treat's Explaining Explain presentation from OSCON is also very
good:

http://redivi.com/~bob/oscon2005_pgsql_pdf/OSCON_Explaining_Explain_Public.pdf#search=%22%22explaining%20explain%22%22

Brad.


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] simple case using index on windows but not on linux

2006-10-04 Thread simon godden

On 10/4/06, Richard Huxton dev@archonet.com wrote:

simon godden wrote:

 From psql, a show all command will list all your config settings and
let you check the lc_xxx values are correct.


lc_collate is C, as are all the other lc settings.

I have run the analyze commands.

Still the same.

--
Simon Godden

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSQL Caching

2006-10-04 Thread Brad Nicholson
On Tue, 2006-10-03 at 18:29 -0700, Tomeh, Husam wrote:
* When any session updates the data that already in shared
 buffer, 
 does Postgres synchronize the data both disk and shared buffers area 
  immediately ?
 
 Not necessarily true. When a block is modified in the shared buffers,
 the modified block is written to the Postgres WAL log. A periodic DB
 checkpoint is performed to flush the modified blocks in the shared
 buffers to the data files.

Postgres 8.0 and beyond have a process called bgwriter that continually
flushes dirty buffers to disk, to minimize the work that needs to be
done at checkpoint time.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] simple case using index on windows but not on linux

2006-10-04 Thread Richard Huxton

simon godden wrote:

On 10/4/06, Richard Huxton dev@archonet.com wrote:

simon godden wrote:

 From psql, a show all command will list all your config settings and
let you check the lc_xxx values are correct.


lc_collate is C, as are all the other lc settings.

I have run the analyze commands.

Still the same.


Can you post EXPLAIN ANALYSE for the LIKE and  queries that should be 
using the index? With enable_seqscan on and off please.


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] simple case using index on windows but not on linux

2006-10-04 Thread Dave Dutcher
 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 
 lc_collate is C, as are all the other lc settings.
 
 I have run the analyze commands.
 
 Still the same.


That is strange.  I figured it had to be related to the locale and the LIKE
operator.  I'm not an expert on these locale issues, but I'd be curious to
see if it would start using an index if you added an index like this:

CREATE INDEX test_index ON t_order (c_number varchar_pattern_ops);

Dave


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] simple case using index on windows but not on linux

2006-10-04 Thread simon godden

Can you post EXPLAIN ANALYSE for the LIKE and  queries that should be
using the index? With enable_seqscan on and off please.



OK - I don't know what happened, but now my linux installation is
behaving like the windows one.  I honestly don't know what changed,
which I know doesn't help people determine the cause of my issue

But I still have a problem with  and , on both environments.

Now, both LIKE and = are using the index with no options on it.

But the other operators are not.

Firstly, with enable_seqscan on:

orderstest=# explain analyze select c_number from t_order where
c_number like '1%';
QUERY PLAN
-
Index Scan using t_order_c_number on t_order  (cost=0.00..3.01 rows=1
width=11) (actual time=0.167..0.610 rows=100 loops=1)
  Index Cond: (((c_number)::text = '1'::character varying) AND
((c_number)::text  '2'::character varying))
  Filter: ((c_number)::text ~~ '1%'::text)
Total runtime: 0.921 ms
(4 rows)

orderstest=# explain analyze select c_number from t_order where
c_number  '0001';
   QUERY PLAN
---
Seq Scan on t_order  (cost=0.00..18312.50 rows=878359 width=11)
(actual time=1.102..4364.704 rows=878000 loops=1)
  Filter: ((c_number)::text  '0001'::text)
Total runtime: 6431.968 ms
(3 rows)

And now with enable_seqscan off:

orderstest=# explain analyze select c_number from t_order where
c_number like '1%';
QUERY PLAN
-
Index Scan using t_order_c_number on t_order  (cost=0.00..3.01 rows=1
width=11) (actual time=0.245..0.674 rows=100 loops=1)
  Index Cond: (((c_number)::text = '1'::character varying) AND
((c_number)::text  '2'::character varying))
  Filter: ((c_number)::text ~~ '1%'::text)
Total runtime: 0.971 ms
(4 rows)

(Just the same)

orderstest=# explain analyze select c_number from t_order where
c_number  '0001';
QUERY
PLAN

Index Scan using t_order_c_number on t_order  (cost=0.00..22087.31
rows=878912 width=11) (actual time=0.230..3504.909 rows=878000
loops=1)
  Index Cond: ((c_number)::text  '0001'::text)
Total runtime: 5425.931 ms
(3 rows)

(Now using the index but getting awful performance out of it - how's that?)

The difference seems to be whether it is treating the index condition
as 'character varying' or 'text'.

Basically, can I do   = = on a varchar without causing a seq-scan?

--
Simon Godden

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] simple case using index on windows but not on linux

2006-10-04 Thread simon godden

I think I am being stupid now.

The  query was returning so many rows (87% of the rows in the table)
that a seq-scan was of course the best way.

Sorry - all is now working and the problem was the locale issue.

Thanks so much for your help everyone.

--
Simon Godden

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Unsubscribe

2006-10-04 Thread Bruno Wolff III
On Wed, Oct 04, 2006 at 10:03:00 +0200,
  Luc Delgado [EMAIL PROTECTED] wrote:
 
  Please unsubscribe me!  Thank you!

If you really can't figure out how to unsubscribe from a list, you should
contact the list owner, not the list. The list members can't unsubscribe you
(and it isn't their job to) and the owner may not be subscribed to the
list. The convention for lists is that adding '-owner' to the local part
of the list email address will be an address for the owner.
A good place to search to find out how to unsubscribe to a list is to search
for the mailing lists using google. Usually the information on how to subscribe
and unsubscribe are in the same place and you were able to find out how
to subscribe in the first place, so you should be able to figure out how
to unsubscribe by yourself as well.

 
 
  Also, it would be better to have a message foot saying how to unsubscribe.

No, the standard is that the list information is kept in the headers so that
it can be extracted by mail clients that care to. There is an RFC describing
these headers. They are supplied by the mailing list software used for the
Postgres mailing lists. Have your mail client display full headers for one
of the list messages to get the instructions from there.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Unsubscribe

2006-10-04 Thread Joshua D. Drake
Bruno Wolff III wrote:
 On Wed, Oct 04, 2006 at 10:03:00 +0200,
   Luc Delgado [EMAIL PROTECTED] wrote:
  Please unsubscribe me!  Thank you!
 
 If you really can't figure out how to unsubscribe from a list, you should
 contact the list owner, not the list. The list members can't unsubscribe you
 (and it isn't their job to) and the owner may not be subscribed to the
 list. 

Although I 100% agree with you Bruno, it should be noted that our lists
are a closed box for most people. They don't follow what is largely
considered standard amongst lists which is to have list information at
the bottom of each e-mail.

It is ridiculous that this community expects people to read email
headers to figure out how to unsubscribe from our lists.

 The convention for lists is that adding '-owner' to the local part
 of the list email address will be an address for the owner.
 A good place to search to find out how to unsubscribe to a list is to search
 for the mailing lists using google. Usually the information on how to 
 subscribe
 and unsubscribe are in the same place and you were able to find out how
 to subscribe in the first place, so you should be able to figure out how
 to unsubscribe by yourself as well.

Nobody should have to work that hard to unsubscribe.


 

  Also, it would be better to have a message foot saying how to unsubscribe.

Yes, it definitely would.

 
 No, the standard is that the list information is kept in the headers so that
 it can be extracted by mail clients that care to. There is an RFC describing
 these headers. They are supplied by the mailing list software used for the
 Postgres mailing lists. Have your mail client display full headers for one
 of the list messages to get the instructions from there.

Who the heck cares what the RFC says. The RFC is irrelevant if the mail
clients don't support it. The clients that are most widely in use, do
not support unsubscribing from lists via the email headers.



Joshua D. Drake



 
 ---(end of broadcast)---
 TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
 


-- 

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Unsubscribe

2006-10-04 Thread Tobias Brox
To be a bit constructive, could it be an idea to add unsubscribe
information as one of the standard tailer tips?  Then unsubscribe info
wouldn't appear in every mail, but often enough for people considering
to unsubscribe.  To be totally non-constructive, let me add a bit to the
noise below:

[Bruno]
  If you really can't figure out how to unsubscribe from a list, you should
  contact the list owner, not the list. The list members can't unsubscribe you
  (and it isn't their job to) and the owner may not be subscribed to the
  list. 

If he can't find out how to unsubscribe from the list, how can he be
expected to figure out the owner address?

[Joshua]
 It is ridiculous that this community expects people to read email
 headers to figure out how to unsubscribe from our lists.

I always check the headers when I want to unsubscribe from any mailing
list, and I think most people on this list have above average knowledge
of such technical details.  Of course, on a list with this many
recepients there will always be some exceptions ...


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Unsubscribe

2006-10-04 Thread Nolan Cafferky




This seems to be the nearly unanimous response to people posting an
unsubscribe request to the postgres mailing lists. I emphatically
agree with the argument - people should know better than that, and the
information included in the e-mail headers should be more than
sufficient. Every conceivable avenue of discovering how to unsubscribe,
other than the list software attaching a footer on each e-mail, is
available to pursue.

I also don't care about that argument in this situation. People
ignorantly posting an unsubscribe to the list get this kind of response
because it's an annoyance to the list users, not necessarily because we
care about educating that particular person. The posts obviously don't
help future unsubscribers who aren't willing to track the information
down anyway. We should be addressing this from the standpoint of what
benefits long-term list users the most. The real question is: which is
more annoying to list users, the occasional unsubscribe posted to the
list (with accompanying responses), or a one-line footer on each post
providing a link to unsubscribe instructions?

Bruno Wolff III wrote:

  On Wed, Oct 04, 2006 at 10:03:00 +0200,
  Luc Delgado [EMAIL PROTECTED] wrote:
  
  
Please unsubscribe me! Thank you!

  
  
If you really can't figure out how to unsubscribe from a list, you should
contact the list owner, not the list. The list members can't unsubscribe you
(and it isn't their job to) and the owner may not be subscribed to the
list. The convention for lists is that adding '-owner' to the local part
of the list email address will be an address for the owner.
A good place to search to find out how to unsubscribe to a list is to search
for the mailing lists using google. Usually the information on how to subscribe
and unsubscribe are in the same place and you were able to find out how
to subscribe in the first place, so you should be able to figure out how
to unsubscribe by yourself as well.

  
  

Also, it would be better to have a message foot saying how to unsubscribe.

  
  
No, the standard is that the list information is kept in the headers so that
it can be extracted by mail clients that care to. There is an RFC describing
these headers. They are supplied by the mailing list software used for the
Postgres mailing lists. Have your mail client display full headers for one
of the list messages to get the instructions from there.
  


-- 
Nolan Cafferky
Software Developer
IT Department
RBS Interactive
[EMAIL PROTECTED]




Re: [PERFORM] Unsubscribe

2006-10-04 Thread Joshua D. Drake

 [Joshua]
 It is ridiculous that this community expects people to read email
 headers to figure out how to unsubscribe from our lists.
 
 I always check the headers when I want to unsubscribe from any mailing
 list, and I think most people on this list have above average knowledge
 of such technical details.  Of course, on a list with this many
 recepients there will always be some exceptions ...

I would consider myself above average knowledge of such technical
details and I didn't know the list information was in the headers until
recently (the last time all of this came up).

Now, I of course did know that there were headers, and I can use them to
diagnose problems but I was unaware of an RFC that explicitly stated how
the headers were supposed to be sent for mailing lists.

However, that is besides the point. It is still ridiculous to expect
anyone to read the headers just to unsubscribe from a list.

If we didn't want to add it for each list we could just add a link here:

http://www.postgresql.org/community/lists/subscribe

Sincerely,

Joshua D. Drake




-- 

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Unsubscribe

2006-10-04 Thread Joshua D. Drake

 I also don't care about that argument in this situation. People
 ignorantly posting an unsubscribe to the list get this kind of response
 because it's an annoyance to the list users,

Over time especially now, we will see many more users versus
developers. Most users will never know how (nor should they have to)
read email headers.

 benefits long-term list users the most. The real question is: which is
 more annoying to list users, the occasional unsubscribe posted to the
 list (with accompanying responses), or a one-line footer on each post
 providing a link to unsubscribe instructions?

Good point, because I guarantee you every time someone pulls this
elitist dung about email headers, I am going to step in and say something.

So if you want to shut me up, lets get the footer added.

Joshua D. Drake
-- 

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Unsubscribe

2006-10-04 Thread Bruno Wolff III
On Wed, Oct 04, 2006 at 08:30:03 -0700,
  Joshua D. Drake [EMAIL PROTECTED] wrote:
 
 Although I 100% agree with you Bruno, it should be noted that our lists
 are a closed box for most people. They don't follow what is largely
 considered standard amongst lists which is to have list information at
 the bottom of each e-mail.

There are reasons you don't want to do that. Footers work OK for single
part email messages. They don't make so much sense in multipart messages.
You can probably take a crap shoot and add the footer to the first
text/plain part and not break things. This won't work so well for multipart
alternative messages that have text/plain and text/html parts. You could
also try to insert a footer in to the html part, but thats a bit trickier
since you can't just put it at the end.

However, since the postgres lists are mostly just using text/plain parts
for message bodies and there are already footers being used to distribute
tips, it wouldn't make things significantly worse to add unsubscribe
information as well.

I would prefer just making the unsubscribe instructions easy to find on
the web.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Unsubscribe

2006-10-04 Thread D'Arcy J.M. Cain
On Wed, 04 Oct 2006 09:00:45 -0700
Joshua D. Drake [EMAIL PROTECTED] wrote:
 So if you want to shut me up, lets get the footer added.

Of course, that doesn't fix the problem 100%.  I am on lists that do
show that info in the footer and people still send unsubscribe messages
to the list.

By the way, mailman has a nice feature that sends messages that look
like admin requests (such as unsubscribe) to the admin.  That cuts down
on the noise quite a bit.

-- 
D'Arcy J.M. Cain darcy@druid.net |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Unsubscribe

2006-10-04 Thread Joshua D. Drake
D'Arcy J.M. Cain wrote:
 On Wed, 04 Oct 2006 09:00:45 -0700
 Joshua D. Drake [EMAIL PROTECTED] wrote:
 So if you want to shut me up, lets get the footer added.
 
 Of course, that doesn't fix the problem 100%.  I am on lists that do
 show that info in the footer and people still send unsubscribe messages
 to the list.

Sure, but what is more helpful? A reply that snips everything but the
footer that has those instructions or a replay that shows email headers
that look basically like some weird code to users?

 
 By the way, mailman has a nice feature that sends messages that look
 like admin requests (such as unsubscribe) to the admin.  That cuts down
 on the noise quite a bit.

Well you don't have to convince me to use mailman ;)... but the *ahem*
list administrators of this project would rather have their toenails
removed with a dull spoon.

Sincerely,

Joshua D. Drake


 


-- 

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Unsubscribe

2006-10-04 Thread Joshua D. Drake

 I would prefer just making the unsubscribe instructions easy to find on
 the web.

They actually reasonably are. If you go to www-community/support-lists

Sincerely,

Joshua D. Drake


 
 ---(end of broadcast)---
 TIP 5: don't forget to increase your free space map settings
 


-- 

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Unsubscribe

2006-10-04 Thread Mark Lewis
I'd prefer to have a short footer link called something like Mailing
List Page which would take you to a page where you could subscribe,
unsubscribe, or view the archives.  I think that making the link short
and also making it a quick shortcut away from the archives tips the
scales in terms of utility vs. annoyance.  One of the tips that shows up
in the footers today is just a link to the archives anyway.

-- Mark Lewis

On Wed, 2006-10-04 at 11:28 -0500, Bruno Wolff III wrote:
 On Wed, Oct 04, 2006 at 08:30:03 -0700,
   Joshua D. Drake [EMAIL PROTECTED] wrote:
  
  Although I 100% agree with you Bruno, it should be noted that our lists
  are a closed box for most people. They don't follow what is largely
  considered standard amongst lists which is to have list information at
  the bottom of each e-mail.
 
 There are reasons you don't want to do that. Footers work OK for single
 part email messages. They don't make so much sense in multipart messages.
 You can probably take a crap shoot and add the footer to the first
 text/plain part and not break things. This won't work so well for multipart
 alternative messages that have text/plain and text/html parts. You could
 also try to insert a footer in to the html part, but thats a bit trickier
 since you can't just put it at the end.
 
 However, since the postgres lists are mostly just using text/plain parts
 for message bodies and there are already footers being used to distribute
 tips, it wouldn't make things significantly worse to add unsubscribe
 information as well.
 
 I would prefer just making the unsubscribe instructions easy to find on
 the web.
 
 ---(end of broadcast)---
 TIP 5: don't forget to increase your free space map settings

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Unsubscribe

2006-10-04 Thread Csaba Nagy
On Wed, 2006-10-04 at 18:02, Csaba Nagy wrote:
  If we didn't want to add it for each list we could just add a link here:
  
  http://www.postgresql.org/community/lists/subscribe

OK, now that I had a second look on that page, it does contain
unsubscription info... but it's well hidden for the fugitive look... the
caption is a big Subscribe to Lists, you wouldn't think at a first
glance think that the form is actually used to unsubscribe too, would
you ?

So maybe it's just that the text should be more explicit about what it
actually does...

Cheers,
Csaba.



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Unsubscribe

2006-10-04 Thread Steve Atkins


On Oct 4, 2006, at 9:00 AM, Joshua D. Drake wrote:




I also don't care about that argument in this situation. People
ignorantly posting an unsubscribe to the list get this kind of  
response

because it's an annoyance to the list users,


Over time especially now, we will see many more users versus
developers. Most users will never know how (nor should they  
have to)

read email headers.


They should know how to participate in mailing lists. That's
unrelated to whether you're a developer or a user.

The same webpage where you subscribe to a mailing list,
you can also unsubscribe. This is not some weird technical
voodoo, just that some people prefer to waste a thousand
other peoples time than spend a minute or two of their own.
Fortunately, they're a tiny minority.



benefits long-term list users the most. The real question is:  
which is

more annoying to list users, the occasional unsubscribe posted to the
list (with accompanying responses), or a one-line footer on each post
providing a link to unsubscribe instructions?


Good point, because I guarantee you every time someone pulls this
elitist dung about email headers, I am going to step in and say  
something.


So if you want to shut me up, lets get the footer added.


Judging from experience on other lists, it won't help. The tiny
minority of people who are unable to unsubscribe will continue
to be unable to unsubscribe. It won't hurt, though.

Cheers,
  Steve


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Unsubscribe

2006-10-04 Thread Geoffrey

Steve Atkins wrote:


On Oct 4, 2006, at 9:00 AM, Joshua D. Drake wrote:




I also don't care about that argument in this situation. People
ignorantly posting an unsubscribe to the list get this kind of response
because it's an annoyance to the list users,


Over time especially now, we will see many more users versus
developers. Most users will never know how (nor should they have to)
read email headers.


They should know how to participate in mailing lists. That's
unrelated to whether you're a developer or a user.

The same webpage where you subscribe to a mailing list,
you can also unsubscribe. This is not some weird technical
voodoo, just that some people prefer to waste a thousand
other peoples time than spend a minute or two of their own.
Fortunately, they're a tiny minority.


I believe that if you could get an honest response, you'd find a lot of 
these folks are just plain lazy.  They don't want to recall how to 
unsubscribe and figure sending mail to the list will get the required 
result.


--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
 - Benjamin Franklin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Poor performance on very simple query ?

2006-10-04 Thread Markus Schaber
Hi, Tobias,

Tobias Brox wrote:

 How can you have a default value on a primary key?

Just declare the column with both a default value and a primary key
constraint.

It makes sense when the default value is calculated instead of a
constant, by calling a function that generates the key.

In fact, the SERIAL type does nothing but defining a sequence, and then
use nextval('sequencename') as default.

HTH,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] UPDATE becomes mired / win32

2006-10-04 Thread Steve Peterson
I'm having an interesting (perhaps anomalous) variability in UPDATE 
performance on a table in my database, and wanted to see if there was 
any interest in looking further before I destroy the evidence and move on.


The table, VOTER, contains 3,090,013 rows and each row is about 120 
bytes wide.  It's loaded via a batch process in one shot, and the 
load is followed by an VACUUM FULL ANALYZE.  Its structure is shown 
at the bottom of the message.


If I run the statement:

(1):  UPDATE voter SET gender = 'U';

on the table in this condition, the query effectively never ends -- 
I've allowed it to run for 12-14 hours before giving up.  The plan 
for that statement is:


Seq Scan on voter  (cost=0.00..145117.38 rows=3127738 width=120)

However, if I do the following:

(2):  CREATE TABLE voter_copy AS SELECT * FROM voter;
(3):  UPDATE voter_copy SET gender = 'U';

the query is much faster --

Seq Scan on voter_copy  (cost=0.00..96231.35 rows=3090635 width=120) 
(actual time=108.056..43203.696 rows=3090013 loops=1)

Total runtime: 117315.731 ms

When (1) is running, the machine is very nearly idle, with no 
postmaster taking more than 1 or 2 % of the CPU.  When (3) is 
running, about 60% CPU utilization occurs.


The same behavior occurs if the table is dumped and reloaded.

My environment is Windows XP SP2 and I'm on Postgresql 8.1.4 
installed via the msi installer.  Hardware is an Athlon 2000+ 
1.67ghx, with 1G RAM.  The database is hosted on a Seagate Barracuda 
7200.10 connected via a FastTrak 4300 without any RAID 
configuration.  dd shows a write speed of 39 MB/s and read speed of 
44 MB/s.  The server configuration deviates from the default in these 
statements:


fsync = off
shared_buffers = 25000
work_mem = 5
maintenance_work_mem = 10

CREATE TABLE voter
(
  voter_id int4,
  sos_voter_id varchar(20),
  household_id int4,
  first_name varchar(40),
  middle_name varchar(40),
  last_name varchar(40),
  name_suffix varchar(10),
  phone_number varchar(10),
  bad_phone_no bool,
  registration_date date,
  birth_year int4,
  gender char(1),
  pri_ind char(1),
  gen_1992_primary_party char(1),
  council_votes int2,
  primary_votes int2,
  council_primary_votes int2,
  special_votes int2,
  presidential_votes int2,
  votes int2,
  absentee_votes int2,
  last_voted_date date,
  first_voted_date date,
  rating char(1),
  score float4,
  general_votes int2
)
WITHOUT OIDS;


---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] Multi-key index not beeing used - bug?

2006-10-04 Thread Tobias Brox
Look at this:

NBET= explain select * from account_transaction where users_id=123456 order by 
created desc limit 10;
QUERY PLAN
---
 Limit  (cost=0.00..27.40 rows=10 width=213)
   -  Index Scan Backward using account_transaction_on_user_and_timestamp on 
account_transaction  (cost=0.00..1189.19 rows=434 width=213)
 Index Cond: (users_id = 123456)
(3 rows)

NBET= explain select * from account_transaction where users_id=123456 order by 
created desc, id desc limit 10;
  QUERY PLAN
--
 Limit  (cost=1114.02..1114.04 rows=10 width=213)
   -  Sort  (cost=1114.02..1115.10 rows=434 width=213)
 Sort Key: created, id
 -  Index Scan using account_transaction_by_users_id on 
account_transaction  (cost=0.00..1095.01 rows=434 width=213)
   Index Cond: (users_id = 123456)
(5 rows)

In case the explains doesn't explain themself good enough: we have a
transaction table with ID (primary key, serial), created (a timestamp)
and a users_id.  Some of the users have generated thousands of
transactions, and the above query is a simplified version of the query
used to show the users their last transactions.  Since we have a large
user base hammering our servers with this request, the speed is
significant.

We have indices on the users_id field and the (users_id, created)-tuple.

The timestamp is set by the application and has a resolution of 1 second
- so there may easily be several transactions sharing the same
timestamp, but this is an exception not the rule.  I suppose the
developers needed to add the ID to the sort list to come around a bug,
but still prefering to have the primary sorting by created to be able to
use the index.  One workaround here is to order only by id desc and
create a new index on (users_id, id) - but I really don't like adding
more indices to the transaction table.


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Performance Optimization for Dummies 2 - the SQL

2006-10-04 Thread Carlo Stonebanks
 can you do explain analyze on the two select queries on either side of
 the union separatly?  the subquery is correctly written and unlikely
 to be a problem (in fact, good style imo).  so lets have a look at
 both sides of facil query and see where the problem is.

Sorry for the delay, the server was down yesterday and couldn't get 
anything.

I have modified the sub-queries a little, trying to get the index scans to 
fire - all the tables involved here are large enough to benefit from index 
scans over sequential scans. I am mystified as to why PART 1 is giving me:

 Seq Scan on facility_address fa  (cost=0.00..3014.68 rows=128268 width=12) 
(actual time=0.007..99.033 rows=128268 loops=1)

which I assume is for the:

join mdx_core.facility_address as fa on fa.facility_id = f.facility_id

Then again, I am not sure how to read the EXPLAIN ANALYSE performance 
numbers.

The other part of the UNION (PART 2) I have also modified, I think it's 
working nicely. Let me know if I'm mistaken on thinking that!

The one remaining problem is that the UNION of these two sub-queries has a 
column which is a call to a custom TCL function that does a lexical analysis 
on the results, ranking the result names by their proximity to the imported 
name. his definitely eats up the performance and I hope that my decision to 
call this function on the results of the union (assuming union deletes 
redundent rows) is the correct one.

Thanks!

Carlo


/* PART 1.
  The redundant expression facility_address_id is NULL was removed because
  only an OUTER join would have made this meaningful. We use only INNER 
joins in this sub-query
  Both facility_address and address have seq scans, even though there is an 
index for
  facility_address(facility_id( and an index for address( country_code, 
postal_code, address).
  The like operator appears to be making things expensive. This is used 
because we have to take
   into account that perhaps the import row is using the 5-number US ZIP, 
not the 9-number USZIP+4
  standard (although this is not the case in this sample).
/*
explain analyse select
  f.facility_id,
  fa.facility_address_id,
  a.address_id,
  f.facility_type_code,
  f.name,
  a.address,
  a.city,
  a.state_code,
  a.postal_code,
  a.country_code
   from
  mdx_core.facility as f
   join mdx_core.facility_address as fa
  on fa.facility_id = f.facility_id
   join mdx_core.address as a
  on a.address_id = fa.address_id
   where
  a.country_code = 'US'
  and a.state_code = 'IL'
  and a.postal_code like '60640-5759'||'%'
   order by facility_id

Sort  (cost=6392.50..6392.50 rows=1 width=103) (actual 
time=189.133..189.139 rows=12 loops=1)
  Sort Key: f.facility_id
  -  Nested Loop  (cost=2732.88..6392.49 rows=1 width=103) (actual 
time=14.006..188.967 rows=12 loops=1)
-  Hash Join  (cost=2732.88..6388.91 rows=1 width=72) (actual 
time=13.979..188.748 rows=12 loops=1)
  Hash Cond: (outer.address_id = inner.address_id)
  -  Seq Scan on facility_address fa  (cost=0.00..3014.68 
rows=128268 width=12) (actual time=0.004..98.867 rows=128268 loops=1)
  -  Hash  (cost=2732.88..2732.88 rows=1 width=64) (actual 
time=6.430..6.430 rows=3 loops=1)
-  Bitmap Heap Scan on address a  (cost=62.07..2732.88 
rows=1 width=64) (actual time=2.459..6.417 rows=3 loops=1)
  Recheck Cond: ((country_code = 'US'::bpchar) AND 
((state_code)::text = 'IL'::text))
  Filter: ((postal_code)::text ~~ 
'60640-5759%'::text)
  -  Bitmap Index Scan on 
address_country_state_postal_code_address_idx  (cost=0.00..62.07 rows=3846 
width=0) (actual time=1.813..1.813 rows=3554 loops=1)
Index Cond: ((country_code = 'US'::bpchar) 
AND ((state_code)::text = 'IL'::text))
-  Index Scan using facility_pkey on facility f  (cost=0.00..3.56 
rows=1 width=35) (actual time=0.012..0.013 rows=1 loops=12)
  Index Cond: (outer.facility_id = f.facility_id)
Total runtime: 189.362 ms

/* PART 2 - can you see anything that could work faster? */

explain analyse select
  f.facility_id,
  null as facility_address_id,
  null as address_id,
  f.facility_type_code,
  f.name,
  null as address,
  f.default_city as city,
  f.default_state_code as state_code,
  f.default_postal_code as postal_code,
  f.default_country_code as country_code
   from
  mdx_core.facility as f
   left outer join mdx_core.facility_address as fa
  on fa.facility_id = f.facility_id
   where
  fa.facility_address_id is null
  and f.default_country_code = 'US'
  and f.default_state_code = 'IL'
  and '60640-5759' like f.default_postal_code||'%'

Nested Loop Left Join  (cost=0.00..6042.41 rows=32 width=73) (actual 
time=14.923..14.923 rows=0 loops=1)
  Filter: (inner.facility_address_id IS NULL)
  -  Index Scan using 

Re: [PERFORM] Multi-key index not beeing used - bug?

2006-10-04 Thread Graham Davis
Thanks Tobias.  The difference here though, is that in terms of your 
database I am doing a query to select the most recent transaction for 
EACH user at once, not just for one user.  If I do a similar query to 
yours to get the last transaction for a single user, my query is fast 
like yours.  It's when I'm doing a query to get the results for all 
users at once that it is slow.  If you try a query to get the most 
recent transaction of all useres at once you will run into the same 
problem I am having.


Graham.


Tobias Brox wrote:


Look at this:

NBET= explain select * from account_transaction where users_id=123456 order by 
created desc limit 10;
   QUERY PLAN
---
Limit  (cost=0.00..27.40 rows=10 width=213)
  -  Index Scan Backward using account_transaction_on_user_and_timestamp on 
account_transaction  (cost=0.00..1189.19 rows=434 width=213)
Index Cond: (users_id = 123456)
(3 rows)

NBET= explain select * from account_transaction where users_id=123456 order by 
created desc, id desc limit 10;
 QUERY PLAN
--
Limit  (cost=1114.02..1114.04 rows=10 width=213)
  -  Sort  (cost=1114.02..1115.10 rows=434 width=213)
Sort Key: created, id
-  Index Scan using account_transaction_by_users_id on 
account_transaction  (cost=0.00..1095.01 rows=434 width=213)
  Index Cond: (users_id = 123456)
(5 rows)

In case the explains doesn't explain themself good enough: we have a
transaction table with ID (primary key, serial), created (a timestamp)
and a users_id.  Some of the users have generated thousands of
transactions, and the above query is a simplified version of the query
used to show the users their last transactions.  Since we have a large
user base hammering our servers with this request, the speed is
significant.

We have indices on the users_id field and the (users_id, created)-tuple.

The timestamp is set by the application and has a resolution of 1 second
- so there may easily be several transactions sharing the same
timestamp, but this is an exception not the rule.  I suppose the
developers needed to add the ID to the sort list to come around a bug,
but still prefering to have the primary sorting by created to be able to
use the index.  One workaround here is to order only by id desc and
create a new index on (users_id, id) - but I really don't like adding
more indices to the transaction table.


---(end of broadcast)---
TIP 6: explain analyze is your friend
 




--
Graham Davis
Refractions Research Inc.
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Unsubscribe

2006-10-04 Thread Michael Stone

On Wed, Oct 04, 2006 at 08:30:03AM -0700, Joshua D. Drake wrote:

They don't follow what is largely
considered standard amongst lists which is to have list information at
the bottom of each e-mail.


In my experience such a footer doesn't do much to prevent people sending 
unsubscribe messages to the list.


Mike Stone

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] UPDATE becomes mired / win32

2006-10-04 Thread Tom Lane
Steve Peterson [EMAIL PROTECTED] writes:
 If I run the statement:
 (1):  UPDATE voter SET gender = 'U';
 on the table in this condition, the query effectively never ends -- 
 I've allowed it to run for 12-14 hours before giving up.
 ...
 When (1) is running, the machine is very nearly idle, with no 
 postmaster taking more than 1 or 2 % of the CPU.

Is the disk busy?  If neither CPU nor I/O are saturated, then it's a
good bet that the UPDATE isn't actually running at all, but is waiting
for a lock somewhere.  Have you looked into pg_locks to check for a
conflicting lock?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Multi-key index not beeing used - bug?

2006-10-04 Thread Tom Lane
Tobias Brox [EMAIL PROTECTED] writes:
 NBET= explain select * from account_transaction where users_id=123456 order 
 by created desc, id desc limit 10;

 We have indices on the users_id field and the (users_id, created)-tuple.

Neither of those indexes can provide the sort order the query is asking
for.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Multi-key index not beeing used - bug?

2006-10-04 Thread Tobias Brox
[Tom Lane - Wed at 04:33:54PM -0400]
  We have indices on the users_id field and the (users_id, created)-tuple.
 
 Neither of those indexes can provide the sort order the query is asking
 for.

Ah; that's understandable - the planner have two options, to do a index
traversion without any extra sorting, or to take out everything and then
sort.  What I'd like postgres to do is to traverse the index and do some
sorting for every unique value of created.  Maybe such a feature can be
found in future releases - like Postgres 56.3? ;-)

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Performance Optimization for Dummies 2 - the SQL

2006-10-04 Thread Merlin Moncure

On 10/4/06, Carlo Stonebanks [EMAIL PROTECTED] wrote:

 can you do explain analyze on the two select queries on either side of
 the union separatly?  the subquery is correctly written and unlikely
 to be a problem (in fact, good style imo).  so lets have a look at
 both sides of facil query and see where the problem is.

Sorry for the delay, the server was down yesterday and couldn't get
anything.

I have modified the sub-queries a little, trying to get the index scans to
fire - all the tables involved here are large enough to benefit from index
scans over sequential scans. I am mystified as to why PART 1 is giving me:




 Seq Scan on facility_address fa  (cost=0.00..3014.68 rows=128268 width=12)
(actual time=0.007..99.033 rows=128268 loops=1)


not sure on this, lets go back to that.


   into account that perhaps the import row is using the 5-number US ZIP,
not the 9-number USZIP+4




   where
  a.country_code = 'US'
  and a.state_code = 'IL'
  and a.postal_code like '60640-5759'||'%'
   order by facility_id


1. create a small function, sql preferred which truncates the zip code
to 5 digits or reduces to so called 'fuzzy' matching criteria.  lets
call it zip_trunc(text) and make it immutable which it is. write this
in sql, not tcl if possible (trust me).

create index address_idx on address(country_code, state_code,
zip_trunc(postal_code));

rewrite above where clause as

where (a.country_code, a.state_code, zip_trunc(postal_code)) = ('US',
'IL', zip_trunc('60640-5759'));

try it out, then lets see how it goes and then we can take a look at
any seqscan issues.

merlin

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] any hope for my big query?

2006-10-04 Thread Ben

On Fri, 29 Sep 2006, Jim C. Nasby wrote:


There's no join criteria for umdb.node... is that really what you want?



Unfortunately, yes, it is.

I've taken in all of everybody's helpful advice (thanks!) and reworked 
things a little, and now I'm left with this expensive nugget:


select aj.album from
(select seconds-1 as a,seconds+1 as b from node where node.dir = 6223) n
join public.track t
on (t.length between n.a*1000 and n.b*1000)
join public.albumjoin aj
on (aj.track = t.id)
join (select id from public.albummeta am where tracks between 3 and 7) lam
on (lam.id = aj.album)
group by aj.album having count(*) = 4;

...which comes out to be:

 HashAggregate  (cost=90.69..904909.99 rows=31020 width=4)
   Filter: (count(*) = 4)
   -  Nested Loop  (cost=428434.81..897905.17 rows=1307904 width=4)
 Join Filter: ((inner.length = ((outer.seconds - 1) * 1000)) AND 
(inner.length = ((outer.seconds + 1) * 1000)))
 -  Index Scan using node_dir on node  (cost=0.00..3.46 rows=17 
width=4)
   Index Cond: (dir = 6223)
 -  Materialize  (cost=428434.81..438740.01 rows=692420 width=8)
   -  Hash Join  (cost=210370.58..424361.39 rows=692420 width=8)
 Hash Cond: (outer.id = inner.track)
 -  Seq Scan on track t  (cost=0.00..128028.41 
rows=5123841 width=8)
 -  Hash  (cost=205258.53..205258.53 rows=692420 width=8)
   -  Hash Join  (cost=6939.10..205258.53 rows=692420 
width=8)
 Hash Cond: (outer.album = inner.id)
 -  Seq Scan on albumjoin aj  
(cost=0.00..88918.41 rows=5123841 width=8)
 -  Hash  (cost=6794.51..6794.51 rows=57834 
width=4)
   -  Bitmap Heap Scan on albummeta am  
(cost=557.00..6794.51 rows=57834 width=4)
 Recheck Cond: ((tracks = 3) AND 
(tracks = 7))
 -  Bitmap Index Scan on 
albummeta_tracks_index  (cost=0.00..557.00 rows=57834 width=0)
   Index Cond: ((tracks = 3) AND 
(tracks = 7))
(19 rows)


I'm surprised (though probably just because I'm ignorant) that it would 
have so much sequential scanning in there. For instance, because n is 
going to have at most a couple dozen rows, it seems that instead of 
scanning all of public.track, it should be able to convert my t.length 
between a and b clause to some between statements or'd together. Or at 
least, it would be nice if the planner could do that. :)



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Performance Optimization for Dummies 2 - the SQL

2006-10-04 Thread Carlo Stonebanks
Hi Merlin,

Here are the results. The query returned more rows (65 vs 12) because of the 
vague postal_code.

In reality, we would have to modify the postal_code logic to take advantage 
of full zip codes when they were avalable, not unconditionally truncate 
them.

Carlo

explain analyze select
  f.facility_id,
  fa.facility_address_id,
  a.address_id,
  f.facility_type_code,
  f.name,
  a.address,
  a.city,
  a.state_code,
  a.postal_code,
  a.country_code
   from
  mdx_core.facility as f
   join mdx_core.facility_address as fa
  on fa.facility_id = f.facility_id
   join mdx_core.address as a
  on a.address_id = fa.address_id
   where
  (a.country_code, a.state_code, mdx_core.zip_trunc(a.postal_code)) = 
('US', 'IL', mdx_core.zip_trunc('60640-5759'))
   order by facility_id

Sort  (cost=6474.78..6474.84 rows=25 width=103) (actual 
time=217.279..217.311 rows=65 loops=1)
  Sort Key: f.facility_id
  -  Nested Loop  (cost=2728.54..6474.20 rows=25 width=103) (actual 
time=35.828..217.059 rows=65 loops=1)
-  Hash Join  (cost=2728.54..6384.81 rows=25 width=72) (actual 
time=35.801..216.117 rows=65 loops=1)
  Hash Cond: (outer.address_id = inner.address_id)
  -  Seq Scan on facility_address fa  (cost=0.00..3014.68 
rows=128268 width=12) (actual time=0.007..99.072 rows=128268 loops=1)
  -  Hash  (cost=2728.50..2728.50 rows=19 width=64) (actual 
time=33.618..33.618 rows=39 loops=1)
-  Bitmap Heap Scan on address a  (cost=48.07..2728.50 
rows=19 width=64) (actual time=2.569..33.491 rows=39 loops=1)
  Recheck Cond: ((country_code = 'US'::bpchar) AND 
((state_code)::text = 'IL'::text))
  Filter: (mdx_core.zip_trunc(postal_code) = 
'60640'::text)
  -  Bitmap Index Scan on 
address_country_state_zip_trunc_idx  (cost=0.00..48.07 rows=3846 width=0) 
(actual time=1.783..1.783 rows=3554 loops=1)
Index Cond: ((country_code = 'US'::bpchar) 
AND ((state_code)::text = 'IL'::text))
-  Index Scan using facility_pkey on facility f  (cost=0.00..3.56 
rows=1 width=35) (actual time=0.009..0.010 rows=1 loops=65)
  Index Cond: (outer.facility_id = f.facility_id)
Total runtime: 217.520 ms



Merlin Moncure [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
 On 10/4/06, Carlo Stonebanks [EMAIL PROTECTED] wrote:
  can you do explain analyze on the two select queries on either side of
  the union separatly?  the subquery is correctly written and unlikely
  to be a problem (in fact, good style imo).  so lets have a look at
  both sides of facil query and see where the problem is.

 Sorry for the delay, the server was down yesterday and couldn't get
 anything.

 I have modified the sub-queries a little, trying to get the index scans 
 to
 fire - all the tables involved here are large enough to benefit from 
 index
 scans over sequential scans. I am mystified as to why PART 1 is giving 
 me:


  Seq Scan on facility_address fa  (cost=0.00..3014.68 rows=128268 
 width=12)
 (actual time=0.007..99.033 rows=128268 loops=1)

 not sure on this, lets go back to that.

into account that perhaps the import row is using the 5-number US ZIP,
 not the 9-number USZIP+4


where
   a.country_code = 'US'
   and a.state_code = 'IL'
   and a.postal_code like '60640-5759'||'%'
order by facility_id

 1. create a small function, sql preferred which truncates the zip code
 to 5 digits or reduces to so called 'fuzzy' matching criteria.  lets
 call it zip_trunc(text) and make it immutable which it is. write this
 in sql, not tcl if possible (trust me).

 create index address_idx on address(country_code, state_code,
 zip_trunc(postal_code));

 rewrite above where clause as

 where (a.country_code, a.state_code, zip_trunc(postal_code)) = ('US',
 'IL', zip_trunc('60640-5759'));

 try it out, then lets see how it goes and then we can take a look at
 any seqscan issues.

 merlin

 ---(end of broadcast)---
 TIP 6: explain analyze is your friend
 



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] pg_trgm indexes giving bad estimations?

2006-10-04 Thread Ben

I've got another query I'm trying to optimize:

select aj.album from
public.track t
join public.albumjoin aj
on (aj.track = t.id)
join (select id from public.albummeta am where tracks between 10 and  
14) lam

on (lam.id = aj.album)
where (t.name % '01New OrderEvil Dust' or t.name % '04OrbitalOpen Mind')
group by aj.album having count(*) = 9.6;

This gives an expensive (but still reasonable) plan of:

HashAggregate  (cost=76523.64..76602.25 rows=4492 width=4)
   Filter: ((count(*))::numeric = 9.6)
   -  Hash Join  (cost=63109.73..76501.18 rows=4492 width=4)
 Hash Cond: (outer.id = inner.album)
 -  Bitmap Heap Scan on albummeta am   
(cost=1810.10..9995.34 rows=187683 width=4)

   Recheck Cond: ((tracks = 10) AND (tracks = 14))
   -  Bitmap Index Scan on albummeta_tracks_index   
(cost=0.00..1810.10 rows=187683 width=0)

 Index Cond: ((tracks = 10) AND (tracks = 14))
 -  Hash  (cost=61274.03..61274.03 rows=10243 width=4)
   -  Nested Loop  (cost=163.87..61274.03 rows=10243  
width=4)
 -  Bitmap Heap Scan on track t   
(cost=163.87..28551.33 rows=10243 width=4)
   Recheck Cond: (((name)::text % '01New  
OrderEvil Dust'::text) OR ((name)::text % '04OrbitalOpen Mind'::text))
   -  BitmapOr  (cost=163.87..163.87  
rows=10248 width=0)
 -  Bitmap Index Scan on  
track_name_trgm_idx  (cost=0.00..81.93 rows=5124 width=0)
   Index Cond: ((name)::text %  
'01New OrderEvil Dust'::text)
 -  Bitmap Index Scan on  
track_name_trgm_idx  (cost=0.00..81.93 rows=5124 width=0)
   Index Cond: ((name)::text %  
'04OrbitalOpen Mind'::text)
 -  Index Scan using albumjoin_trackindex on  
albumjoin aj  (cost=0.00..3.18 rows=1 width=8)

   Index Cond: (aj.track = outer.id)
(19 rows)

Unfortunately, when I modify this example to use a more typical  
number of trigram searches or'd together (anywhere from 10 to 20),  
the planner thinks the bitmap heap scan on track t will return a lot  
of rows, and so reverts to doing a sequential scan of albumjoin for  
the next table join. That would make sense IF there were a lot of  
rows returned by the bitmap index scans. But here is where the  
planner gets it really wrong, if I'm reading it right.


It seems to think both my index scans will return 5124 rows, when, in  
reality, it's a lot less:


select count(*) from public.track where name % '01New OrderEvil Dust';
count
---
20
(1 row)

select count(*) from public.track where name % '04OrbitalOpen Mind';
count
---
   123
(1 row)


How can I get the planner to not expect so many rows to be returned?  
A possibly related question is: because pg_tgrm lets me set the  
matching threshold of the % operator, how does that affect the planner?


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] pg_trgm indexes giving bad estimations?

2006-10-04 Thread Tom Lane
Ben [EMAIL PROTECTED] writes:
 How can I get the planner to not expect so many rows to be returned?  

Write an estimation function for the pg_trgm operator(s).  (Send in a
patch if you do!)  I see that % is using contsel which is only a stub,
and would likely be wrong for % even if it weren't.

 A possibly related question is: because pg_tgrm lets me set the  
 matching threshold of the % operator, how does that affect the planner?

It hasn't a clue about that.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] UPDATE becomes mired / win32

2006-10-04 Thread Steve Peterson
Both commands seem to be saturating the disk.  There's nothing else 
running in the database, and all of the locks have 't' in the granted 
column, which I'm assuming means they're not blocked.


According to the statistics, the original table has 889 mb and 
indexes of 911mb, whereas the copy has 1021 mb and no space for indexes.


Steve

At 03:28 PM 10/4/2006, Tom Lane wrote:

Steve Peterson [EMAIL PROTECTED] writes:
 If I run the statement:
 (1):  UPDATE voter SET gender = 'U';
 on the table in this condition, the query effectively never ends --
 I've allowed it to run for 12-14 hours before giving up.
 ...
 When (1) is running, the machine is very nearly idle, with no
 postmaster taking more than 1 or 2 % of the CPU.

Is the disk busy?  If neither CPU nor I/O are saturated, then it's a
good bet that the UPDATE isn't actually running at all, but is waiting
for a lock somewhere.  Have you looked into pg_locks to check for a
conflicting lock?

regards, tom lane




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] UPDATE becomes mired / win32

2006-10-04 Thread Steve Peterson
I'm pretty sure that the table was empty before doing the load, but I 
gave this a shot.  It didn't have an impact on the results.


The behavior also persists across a dump/reload of the table into a 
new install on a different machine.  IIRC dump/reload rebuilds 
indexes from scratch.


Steve

At 01:13 PM 10/4/2006, [EMAIL PROTECTED] wrote:
The table, VOTER, contains 3,090,013 rows and each row is about 120 
bytes wide.  It's loaded via a batch process in one shot, and the 
load is followed by an VACUUM FULL ANALYZE.  Its structure is shown 
at the bottom of the message.



if the table wasn't empty before and has indices defined, try a 
REINDEX TABLE VOTER before running the update. i had a similar 
case where an often updated table was vacuumed regurarly, but the 
indices grew and grew and grew.  in my case the table - even when 
empty and analyze full'ed was 1.2gb according to pgadmin due to 
(outdated) indices. a reindex fixed all my performance issues.


- thomas





---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings