Re: [PERFORM] Slow query: table iteration (8.3)

2010-02-04 Thread Grzegorz Jaśkiewicz
isn't that possible with window functions and cte ?
rank, and limit ?
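
For reference, a minimal sketch of the shape being suggested, using 8.4's
window functions in a CTE (a hypothetical scores(id, user_id, score) table;
Glenn's actual test schema appears later in the thread):

WITH ranked AS (
    SELECT s.id,
           rank() OVER (PARTITION BY s.user_id ORDER BY s.score DESC) AS rnk
    FROM scores s
)
SELECT id FROM ranked WHERE rnk = 1;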



Re: [PERFORM] Re: Optimizing Postgresql server and FreeBSD for heavy read and writes

2010-02-04 Thread Amitabh Kant
On Wed, Feb 3, 2010 at 9:49 PM, Reid Thompson reid.thomp...@ateb.com wrote:

 On Wed, 2010-02-03 at 20:42 +0530, Amitabh Kant wrote:
  Forgot to add that I am using Postgres 8.4.2 from the default ports of
  FreeBSD.

 start with this page
 http://www.postgresql.org/docs/8.4/static/kernel-resources.html


I somehow missed the doc. Thanks Reid.

With regards

Amitabh Kant


Re: [PERFORM] Optimizing Postgresql server and FreeBSD for heavy read and writes

2010-02-04 Thread Amitabh Kant
On Wed, Feb 3, 2010 at 10:05 PM, Ivan Voras ivo...@freebsd.org wrote:

 On 02/03/10 16:10, Amitabh Kant wrote:

 Hello

 I have a server dedicated for Postgres with the following specs:

 RAM 16GB, 146GB SAS (15K) x 4 -  RAID 10 with BBU, Dual Xeon  E5345  @
 2.33GHz
 OS: FreeBSD 8.0


 If you really do have heavy read and write load on the server, nothing
 will save you from the bottleneck of having only 4 drives in the system (or
 more accurately: adding more memory will help reads but nothing helps writes
 except more drives or faster (SSD) drives). If you can, add another 2
 drives in RAID 1 and move+symlink the pg_xlog directory to the new array.


Can't do anything about this server now, but would surely keep in mind
before upgrading other servers. Would you recommend the same speed
drives (15K SAS) for RAID 1, or would a slower drive also work here (10K SAS
or even SATA II)?




  maintenance_work_mem = 960MB # pg_generate_conf wizard 2010-02-03
 checkpoint_completion_target = 0.9 # pg_generate_conf wizard 2010-02-03
 effective_cache_size = 11GB # pg_generate_conf wizard 2010-02-03
 work_mem = 160MB # pg_generate_conf wizard 2010-02-03
 wal_buffers = 8MB # pg_generate_conf wizard 2010-02-03
 checkpoint_segments = 16 # pg_generate_conf wizard 2010-02-03
 shared_buffers = 3840MB # pg_generate_conf wizard 2010-02-03
 max_connections = 100 # pg_generate_conf wizard 2010-02-03


  I would appreciate if somebody could point out the sysctl/loader.conf
 settings that I need to have in FreeBSD.


 Firstly, you need to run a 64-bit version (amd64) of FreeBSD.


Yes, its running amd64 arch.


 In /boot/loader.conf you will probably need to increase the number of sysv
 ipc semaphores:

 kern.ipc.semmni=512
 kern.ipc.semmns=1024

 This depends mostly on the number of connections allowed to the server. The
 example values I gave above are more than enough but since this is a
 boot-only tunable it is expensive to modify later.

 In /etc/sysctl.conf you will need to increase the shared memory sizes, e.g.
 for a 3900 MB shared_buffer:

 kern.ipc.shmmax=4089446400
 This is the maximum shared memory segment size, in bytes.

 kern.ipc.shmall=105
 This is the maximum amount of memory allowed to be used as sysv shared
 memory, in 4 kB pages.
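
 As a rough cross-check of those two figures (a sketch only, assuming the
 3900 MB target above and 4 kB pages):

 SELECT 3900::bigint * 1024 * 1024        AS shmmax_bytes,  -- 4089446400, the value used above
        3900::bigint * 1024 * 1024 / 4096 AS shmall_pages;  -- the same amount in 4 kB pages (998400)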

 If the database contains many objects (tables, indexes, etc.) you may need
 to increase the maximum number of open files and the amount of memory for
 the directory list cache:

 kern.maxfiles=16384
 vfs.ufs.dirhash_maxmem=4194304

 If you estimate you will have large sequential reads on the database, you
 should increase read-ahead count:

 vfs.read_max=32

 Be sure that soft-updates is enabled on the file system you are using for
 data. Ignore all Linux-centric discussions about problems with journaling
 and write barriers :)

 All settings in /etc/sysctl.conf can be changed at runtime (individually or
 by invoking /etc/rc.d/sysctl restart), settings in loader.conf are
 boot-time only.


Thanks Ivan. That's a great explanation of the variables involved.


With regards

Amitabh Kant


Re: [PERFORM] Slow query: table iteration (8.3)

2010-02-04 Thread Glenn Maynard
2010/2/4 Grzegorz Jaśkiewicz gryz...@gmail.com:
 isn't that possible with window functions and cte ?
 rank, and limit ?

It is, but again I tried that when I originally designed this and I
think it ended up using seq scans, or at least being slow for some
reason or other.

But I'll be dropping this db into 8.4 soon to see if it helps
anything, and I'll check again (and if it's still slow I'll post more
details).  It's been a while and I might just have been doing
something wrong.

-- 
Glenn Maynard



Re: [PERFORM] Optimizing Postgresql server and FreeBSD for heavy read and writes

2010-02-04 Thread Amitabh Kant
On Thu, Feb 4, 2010 at 12:11 AM, Andy Colson a...@squeakycode.net wrote:

 On 2/3/2010 9:10 AM, Amitabh Kant wrote:

 Hello

 I have a server dedicated for Postgres with the following specs:

 RAM 16GB, 146GB SAS (15K) x 4 -  RAID 10 with BBU, Dual Xeon  E5345  @
 2.33GHz
 OS: FreeBSD 8.0

 It runs multiple (approx 10) databases ranging from 500MB to over 24 GB
 in size. All of them are of the same structure, and almost all of them
 have very heavy read and writes.


 With regards

 Amitabh Kant


 What problems are you having?  Is it slow?  Is there something you are
 trying to fix, or is this just the first tune up?


This is the first tune up. The system has worked pretty fine till now, but
it does lag once in a while, and I would like to optimize it before it
becomes a bigger issue.



  memory allocations. The last time I tried, Postgres refused to start and
  I had to fall back to the default settings.

 It's probably upset about the amount of shared mem.  There is probably a way
 in BSD to set the max amount of shared memory available.  A quick Google
 turned up:

 kern.ipc.shmmax

 Dunno if that's right.  When you try to start PG, if it cannot allocate
 enough shared mem it'll spit out an error message into its log saying how
 much it tried to allocate.

 Check:
 http://archives.postgresql.org/pgsql-admin/2004-06/msg00155.php





  maintenance_work_mem = 960MB # pg_generate_conf wizard 2010-02-03
  checkpoint_completion_target = 0.9 # pg_generate_conf wizard 2010-02-03
  effective_cache_size = 11GB # pg_generate_conf wizard 2010-02-03
  work_mem = 160MB # pg_generate_conf wizard 2010-02-03
  wal_buffers = 8MB # pg_generate_conf wizard 2010-02-03
  checkpoint_segments = 16 # pg_generate_conf wizard 2010-02-03
  shared_buffers = 3840MB # pg_generate_conf wizard 2010-02-03
  max_connections = 100 # pg_generate_conf wizard 2010-02-03

 Some of these seem like too much.  I'd recommend starting with one or two
 and see how it runs.  Then increase if you're still slow.

 Start with effective_cache_size, shared_buffers and checkpoint_segments.

 Wait until very last to play with work_mem and maintenance_work_mem.
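
For what it's worth, a quick way to confirm which values the running server
has actually picked up while experimenting (plain SHOW commands, nothing
specific to this setup):

SHOW shared_buffers;
SHOW effective_cache_size;
SHOW checkpoint_segments;
SHOW work_mem;
SHOW maintenance_work_mem;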


 -Andy


I would keep that in mind. Thanks Andy

With regards

Amitabh


Re: [PERFORM] Optimizing Postgresql server and FreeBSD for heavy read and writes

2010-02-04 Thread Ivan Voras
On 4 February 2010 10:02, Amitabh Kant amitabhk...@gmail.com wrote:
 On Wed, Feb 3, 2010 at 10:05 PM, Ivan Voras ivo...@freebsd.org wrote:

 On 02/03/10 16:10, Amitabh Kant wrote:

 Hello

 I have a server dedicated for Postgres with the following specs:

 RAM 16GB, 146GB SAS (15K) x 4 -  RAID 10 with BBU, Dual Xeon  E5345  @
 2.33GHz
 OS: FreeBSD 8.0

 If you really do have heavy read and write load on the server, nothing
 will save you from the bottleneck of having only 4 drives in the system (or
 more accurately: adding more memory will help reads but nothing helps writes
  except more drives or faster (SSD) drives). If you can, add another 2
 drives in RAID 1 and move+symlink the pg_xlog directory to the new array.

 Can't do anything about this server now, but would surely keep in mind
 before upgrading other servers. Would you recommend the same speed
  drives (15K SAS) for RAID 1, or would a slower drive also work here (10K SAS
 or even SATA II)?

Again, it depends on your load. It would probably be best if they are
approximately the same speed; the location of pg_xlog will dictate
your write (UPDATE / INSERT / CREATE) speed.

Writes to your database go like this: the data is first written to the
WAL (this is the pg_xlog directory - the transaction log), then it is
read and written to the main database. If the main database is very
busy reading, transfers from WAL to the database will be slower.
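
One coarse way to watch that second step from SQL is the pg_stat_bgwriter
view (present in 8.3/8.4); its counters track buffers flushed to the main
data files at checkpoints and by backends:

SELECT checkpoints_timed, checkpoints_req,
       buffers_checkpoint, buffers_backend
FROM pg_stat_bgwriter;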



Re: [PERFORM] Optimizing Postgresql server and FreeBSD for heavy read and writes

2010-02-04 Thread Amitabh Kant
On Thu, Feb 4, 2010 at 3:10 PM, Ivan Voras ivo...@freebsd.org wrote:

 On 4 February 2010 10:02, Amitabh Kant amitabhk...@gmail.com wrote:
  On Wed, Feb 3, 2010 at 10:05 PM, Ivan Voras ivo...@freebsd.org wrote:
 
  On 02/03/10 16:10, Amitabh Kant wrote:
 
  Hello
 
  I have a server dedicated for Postgres with the following specs:
 
  RAM 16GB, 146GB SAS (15K) x 4 -  RAID 10 with BBU, Dual Xeon  E5345  @
  2.33GHz
  OS: FreeBSD 8.0
 
  If you really do have heavy read and write load on the server, nothing
  will save you from the bottleneck of having only 4 drives in the system (or
  more accurately: adding more memory will help reads but nothing helps writes
  except more drives or faster (SSD) drives). If you can, add another 2
  drives in RAID 1 and move+symlink the pg_xlog directory to the new array.

  Can't do anything about this server now, but would surely keep in mind
  before upgrading other servers. Would you recommend the same speed
  drives (15K SAS) for RAID 1, or would a slower drive also work here (10K SAS
  or even SATA II)?

 Again, it depends on your load. It would probably be best if they are
 approximately the same speed; the location of pg_xlog will dictate
 your write (UPDATE / INSERT / CREATE) speed.

 Writes to your database go like this: the data is first written to the
 WAL (this is the pg_xlog directory - the transaction log), then it is
 read and written to the main database. If the main database is very
 busy reading, transfers from WAL to the database will be slower.


Thanks Ivan. I have to go in for an upgrade of a couple more servers. I will
be going in for RAID 1 (OS + pg_xlog) and RAID 10 (Pgsql data), all of them
of the same speed.

With regards

Amitabh Kant


Re: [PERFORM] Optimizing Postgresql server and FreeBSD for heavy read and writes

2010-02-04 Thread Amitabh Kant
On Thu, Feb 4, 2010 at 3:29 AM, Greg Smith g...@2ndquadrant.com wrote:

  Robert Haas wrote:

 On Wed, Feb 3, 2010 at 10:10 AM, Amitabh Kant amitabhk...@gmail.com wrote:


  work_mem = 160MB # pg_generate_conf wizard 2010-02-03


  Overall these settings look sane, but this one looks like an
 exception.  That is an enormous value for that parameter...



 Yeah, I think I need to retune the suggestions for that parameter.  The
 idea behind the tuning profile used in the web and OLTP setups is that
 you're unlikely to have all the available connections doing something
 involving sorting at the same time with those workloads, and when it does
 happen you want it to use the fastest approach possible even if that takes
 more RAM so the client waiting for a response is more likely to get one on
 time.  That's why the work_mem figure in those situations is set very
 aggressively:  total_mem / connections, so on a 16GB server that comes out
 to the 160MB seen here.  I'm going to adjust that so that it's capped a
 little below (total_mem - shared_buffers) / connections instead.
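
As a sketch of the arithmetic being described, assuming the 16 GB RAM, 100
connections and 3840 MB shared_buffers figures from this thread:

SELECT 16384 / 100          AS work_mem_mb_from_total_ram,       -- where the 160MB ballpark comes from
       (16384 - 3840) / 100 AS work_mem_mb_after_shared_buffers; -- the more conservative cap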


Thanks Robert & Greg. From what others have suggested, I am going in for
the following changes:
/boot/loader.conf:

kern.ipc.semmni=512
kern.ipc.semmns=1024
kern.ipc.semmnu=512



/etc/sysctl.conf:

kern.ipc.shm_use_phys=1
kern.ipc.shmmax=4089446400
kern.ipc.shmall=105
kern.maxfiles=16384
kern.ipc.semmsl=1024
kern.ipc.semmap=512
vfs.ufs.dirhash_maxmem=4194304
vfs.read_max=32



/usr/local/pgsql/data/postgresql.conf:

maintenance_work_mem = 960MB         # pg_generate_conf wizard 2010-02-03
checkpoint_completion_target = 0.9   # pg_generate_conf wizard 2010-02-03
effective_cache_size = 11GB          # pg_generate_conf wizard 2010-02-03
work_mem = 110MB                     # pg_generate_conf wizard 2010-02-03, reduced as per Robert/Greg suggestions
wal_buffers = 8MB                    # pg_generate_conf wizard 2010-02-03
checkpoint_segments = 16             # pg_generate_conf wizard 2010-02-03
shared_buffers = 3840MB              # pg_generate_conf wizard 2010-02-03
max_connections = 100                # pg_generate_conf wizard 2010-02-03


Hope this works out good in my case.

With regards

Amitabh Kant


Re: [PERFORM] Air-traffic benchmark

2010-02-04 Thread Simon Riggs
On Thu, 2010-01-07 at 13:38 +0100, Lefteris wrote:
 Reported query times are (in sec):
 MonetDB 7.9s
 InfoBright 12.13s
 LucidDB 54.8s

It needs to be pointed out that those databases are specifically
optimised for Data Warehousing, whereas Postgres core is optimised for
concurrent write workloads in production systems.

If you want a best-vs-best type of comparison, you should be looking at
a version of Postgres optimised for Data Warehousing. These results show
that Postgres-related options exist that clearly beat the above numbers.
http://community.greenplum.com/showthread.php?t=111
I note also that Greenplum's Single Node Edition is now free to use, so
is a reasonable product for comparison on this list.

Also, I'm unimpressed by a Data Warehouse database that requires
everything to reside in memory, e.g. MonetDB. That severely limits
real-world usability, in my experience because it implies the queries
you're running aren't ad-hoc.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [PERFORM] Optimizing Postgresql server and FreeBSD for heavy read and writes

2010-02-04 Thread Matthew Wakeling

On Thu, 4 Feb 2010, Amitabh Kant wrote:

On Wed, Feb 3, 2010 at 10:05 PM, Ivan Voras ivo...@freebsd.org wrote:
If you can, add another 2 drives in RAID 1 and move+symlink the pg_xlog 
directory to the new array.


Can't do anything about this server now, but would surely keep in mind
before upgrading other servers. Would you recommend the same speed
drives (15K SAS) for RAID 1, or would a slower drive also work here (10K SAS
or even SATA II)?


The performance requirements for the WAL are significantly lower than for 
the main database. This is for two reasons - firstly the WAL is 
write-only, and has no other activity. The WAL only gets read again in the 
event of a crash. Secondly, writes to the WAL are sequential writes, which 
is the fastest mode of operation for a disc, whereas the main database 
discs will have to handle random access.


The main thing you need to make sure of is that the WAL is on a disc 
system that has a battery-backed up cache. That way, it will be able to 
handle the high rate of fsyncs that the WAL generates, and the cache will 
convert that into a simple sequential write. Otherwise, you will be 
limited to one fsync every 5ms (or whatever the access speed of your WAL 
discs is).


If you make sure of that, then there is no reason to get expensive fast 
discs for the WAL at all (assuming they are expensive enough to not lie 
about flushing writes properly).


Matthew

--
So, given 'D' is undeclared too, with a default of zero, C++ is equal to D.
 mnw21, commenting on the Surely the value of C++ is zero, but C is now 1
 response to No, C++ isn't equal to D. 'C' is undeclared [...] C++ should
 really be called 1 response to C++ -- shouldn't it be called D?



[PERFORM] bigint integers up to 19 digits.

2010-02-04 Thread Tory M Blue
Greetings,

I have a column that is a bigint that needs to store integers up to 19
digits long. For the most part this works but we sometimes have
numbers that are greater than 9223372036854775807.

I was thinking of changing this to a real or double precision field,
but read in the docs that the value stored is not always the value
inserted. From the docs: "Inexact means that some values cannot be
converted exactly to the internal format and are stored as
approximations, so that storing and printing back out a value may show
slight discrepancies."

Is it known what level of precision is provided by the double data
type? My number will always be 19 digits long and always an integer.

I looked into the numeric data type, but the docs say that it can be slow.


Any feedback would be appreciated.
Thanks
Tory



Re: [PERFORM] bigint integers up to 19 digits.

2010-02-04 Thread Craig James

Tory M Blue wrote:

I have a column that is a bigint that needs to store integers up to 19
digits long. For the most part this works but we sometimes have
numbers that are greater than 9223372036854775807.
...
I was thinking of changing this to a real or double precision field,
but read in the docs that the value stored is not always the value
inserted...


They're actually less precise than the same size of integer.  Real/double 
datatypes trade more range for less precision in the same number of bytes.
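
A quick way to see that in psql (just a sketch; numeric stays exact at this
magnitude, float8 does not):

SELECT (1e19::float8 + 1) - 1e19::float8 AS float8_diff,  -- 0: the +1 is below float8 precision
       (10000000000000000000::numeric + 1)
       - 10000000000000000000::numeric   AS numeric_diff; -- 1: numeric is exact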


My number will always be 19 digits long and always an integer.
I looked into the numeric data type, but the docs say that it can be slow.


If it's *always* going to be 19 digits, couldn't you make it a text or char
field?  You didn't say if this is really a number.  Do you do arithmetic with
it? Sort it numerically?  Or is it just a long identifier that happens to only
use digits?

Craig James



Re: [PERFORM] bigint integers up to 19 digits.

2010-02-04 Thread Alvaro Herrera
Tory M Blue wrote:

 I looked into the numeric data type, but the docs say that it can be slow.

It is slower than values that fit in a single CPU register, sure.  Is it
slow enough that you can't use it?  That's a different question.  I'd
give it a try -- maybe it's not all that slow.
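
Trying it could be as simple as something like this (a throwaway sketch with
hypothetical table names; compare the timings with psql's \timing):

CREATE TABLE t_bigint  AS SELECT g::bigint        AS v FROM generate_series(1, 1000000) g;
CREATE TABLE t_numeric AS SELECT g::numeric(19,0) AS v FROM generate_series(1, 1000000) g;
SELECT sum(v) FROM t_bigint;   -- time this
SELECT sum(v) FROM t_numeric;  -- against this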

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [PERFORM] bigint integers up to 19 digits.

2010-02-04 Thread Tory M Blue
On Thu, Feb 4, 2010 at 10:43 AM, Craig James craig_ja...@emolecules.com wrote:
 Tory M Blue wrote:

 I have a column that is a bigint that needs to store integers up to 19
 digits long. For the most part this works but we sometimes have
 numbers that are greater than 9223372036854775807.
 ...
 I was thinking of changing this to a real or double precision field,
 but read in the docs that the value stored is not always the value
 inserted...

 They're actually less precise than the same size of integer.  Real/double
 datatypes trade more range for less precision in the same number of bytes.

 My number will always be 19 digits long and always an integer.
 I looked into the numeric data type, but the docs say that it can be slow.

 If it's *always* going to be 19 digits, couldn't you make it a text or char
 field?  You didn't say if this is really a number.  Do you do arithmetic
 with it? Sort it numerically?  Or is it just a long identifier that happens
  to only use digits?

it is an identifier and is always a number and is used in grouping and
querying. I thought I would lose performance if it is text vs an
integer/double field.

Tory



Re: [PERFORM] bigint integers up to 19 digits.

2010-02-04 Thread Jochen Erwied
Thursday, February 4, 2010, 7:51:37 PM you wrote:

 it is an identifier and is always a number and is used in grouping and
 querying. I thought I would lose performance if it is text vs an
 integer/double field.

Maybe using 'numeric(19)' instead of bigint is an alternative. I actually
don't know how these numbers are stored internally (some kind of BCD, or as
base-100?), but IMHO they should be faster than strings, although not as
fast as 'native' types.
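
In concrete terms that just means declaring the column along these lines (a
hypothetical example table, not the poster's actual schema):

CREATE TABLE events (
    ext_id  numeric(19,0) NOT NULL,  -- the 19-digit identifier discussed above
    payload text
);
CREATE INDEX events_ext_id_idx ON events (ext_id);  -- still usable for grouping and equality lookups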

-- 
Jochen Erwied |   home: joc...@erwied.eu +49-208-38800-18, FAX: -19
Sauerbruchstr. 17 |   work: j...@mbs-software.de  +49-2151-7294-24, FAX: -50
D-45470 Muelheim  | mobile: jochen.erw...@vodafone.de   +49-173-5404164




Re: [PERFORM] bigint integers up to 19 digits.

2010-02-04 Thread Alvaro Herrera
Jochen Erwied wrote:

 Maybe using 'numeric(19)' instead of bigint is an alternative. I actually
 don't know how these numbers are stored internally (some kind of BCD, or as
 base-100?), but IMHO they should be faster than strings, although not as
 fast as 'native' types.

base 10000 in the current implementation

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] Slow query: table iteration (8.3)

2010-02-04 Thread Robert Haas
On Thu, Feb 4, 2010 at 3:24 AM, Glenn Maynard gl...@zewt.org wrote:
 On Wed, Feb 3, 2010 at 10:05 PM, Robert Haas robertmh...@gmail.com wrote:
 Rewriting it as a join will likely be faster still:

 SELECT r.id FROM stomp_steps s, stomp_round r WHERE (s.id IS NULL OR
 r.steps_id = s.id) AND ($1 IS NULL OR r.user_card_id = $1) ORDER BY
 r.score DESC LIMIT $2

 That's not the same; this SELECT will only find the N highest scores,
 since the LIMIT applies to the whole results.  Mine finds the highest
 scores for each stage (steps), since the scope of the LIMIT is each
 call of the function (eg. find the top score for each stage as
 opposed to find the top five scores for each stage).

 That's the only reason I used a function at all to begin with--I know
 no way to do this with a plain SELECT.

Oh, I get it.  Yeah, I don't think you can do that without LATERAL(),
which we don't have, unless the window-function thing works...

...Robert



Re: [PERFORM] Slow-ish Query Needs Some Love

2010-02-04 Thread Robert Haas
On Mon, Feb 1, 2010 at 7:53 PM, Matt White mattw...@gmail.com wrote:
 I have a relatively straightforward query that by itself isn't that
 slow, but we have to run it up to 40 times on one webpage load, so it
 needs to run much faster than it does. Here it is:

 SELECT COUNT(*) FROM users, user_groups
  WHERE users.user_group_id = user_groups.id AND NOT users.deleted AND
 user_groups.partner_id IN
  (partner_id_1, partner_id_2);

 The structure is partners have user groups which have users. In the
 test data there are over 200,000 user groups and users but only ~3000
 partners. Anyone have any bright ideas on how to speed this query up?

 Here's the query plan:

  Aggregate  (cost=12574.53..12574.54 rows=1 width=0) (actual time=2909.298..2909.299 rows=1 loops=1)
    ->  Hash Join  (cost=217.79..12566.08 rows=3378 width=0) (actual time=2909.284..2909.284 rows=0 loops=1)
          Hash Cond: (users.user_group_id = user_groups.id)
          ->  Seq Scan on users  (cost=0.00..11026.11 rows=206144 width=4) (actual time=0.054..517.811 rows=205350 loops=1)
                Filter: (NOT deleted)
          ->  Hash  (cost=175.97..175.97 rows=3346 width=4) (actual time=655.054..655.054 rows=22 loops=1)
                ->  Nested Loop  (cost=0.27..175.97 rows=3346 width=4) (actual time=1.327..428.406 rows=22 loops=1)
                      ->  HashAggregate  (cost=0.27..0.28 rows=1 width=4) (actual time=1.259..1.264 rows=2 loops=1)
                            ->  Result  (cost=0.00..0.26 rows=1 width=0) (actual time=1.181..1.240 rows=2 loops=1)
                      ->  Index Scan using user_groups_partner_id_idx on user_groups  (cost=0.00..133.86 rows=3346 width=8) (actual time=0.049..96.992 rows=11 loops=2)
                            Index Cond: (user_groups.partner_id = (partner_all_subpartners(3494)))


 The one obvious thing that everyone will point out is the sequential
 scan on users, but there actually is an index on users.deleted. When I
 forced sequential scanning off, it ran slower, so the planner wins
 again.

Yeah, I don't think the sequential scan is hurting you.  What is
bugging me is that it doesn't look like the plan you've posted is for
the query you've posted.  The plan shows an index condition that
references partner_all_subpartners(3494), which doesn't appear in your
original query, and also has two aggregates in it, where your posted
query only has one.
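
Purely as an illustration of that point, one query shape that would produce
an index condition like the one in the posted plan is an IN over a
set-returning function call -- a guess at what actually ran, not the query
as posted:

SELECT COUNT(*)
FROM users, user_groups
WHERE users.user_group_id = user_groups.id
  AND NOT users.deleted
  AND user_groups.partner_id IN (SELECT partner_all_subpartners(3494));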

...Robert



Re: [PERFORM] foreign key constraint lock behavour in postgresql

2010-02-04 Thread Robert Haas
On Thu, Feb 4, 2010 at 12:40 AM,  da...@lang.hm wrote:
 I could be wrong in this (if so I know I'll be corrected :-)

 but Postgres doesn't need to lock anything for what you are describing.

 instead there will be multiple versions of the 'b1' row, one version will be
 deleted, one version that will be kept around until the first transaction
 ends, after which a vacuum pass will remove the data.

Just for kicks I tried this out and the behavior is as the OP
describes: after a little poking around, it seems that the INSERT grabs
a share-lock on the referenced row so that a concurrent update can't
modify the referenced column.
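
For anyone wanting to reproduce it, a minimal sketch of the behaviour being
described (hypothetical parent/child tables, run in two separate sessions;
on the 8.x releases discussed here the RI check takes a row-level share
lock):

CREATE TABLE parent (id int PRIMARY KEY, name text);
CREATE TABLE child  (id int PRIMARY KEY,
                     parent_id int REFERENCES parent (id));
INSERT INTO parent VALUES (1, 'b1');

-- session 1: the RI check behind this INSERT share-locks parent row 1
BEGIN;
INSERT INTO child VALUES (100, 1);

-- session 2: blocks until session 1 commits or rolls back, even though
-- only the (non-referenced) name column is changed
UPDATE parent SET name = 'b1 renamed' WHERE id = 1;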

It's not really clear how to get around this.  If it were possible to
lock individual columns within a tuple, then the particular update
above could be allowed since only the name is being changed.  Does
anyone know what happens in Oracle if the update targets the id column
rather than the name column?

Another possibility is that instead of locking the row, you could
recheck that the foreign key constraint still holds at commit time.
But that seems like it could potentially be quite expensive.

...Robert



Re: [PERFORM] Slow query: table iteration (8.3)

2010-02-04 Thread Glenn Maynard
On Thu, Feb 4, 2010 at 4:09 AM, Glenn Maynard gl...@zewt.org wrote:
 But I'll be dropping this db into 8.4 soon to see if it helps
 anything, and I'll check again (and if it's still slow I'll post more
 details).  It's been a while and I might just have been doing
 something wrong.

Windowing doesn't want to scale for this case.  I'll simplify to give
an actual test case.

create table test_users (id serial primary key);
insert into test_users (id) select generate_series(1, 1000);
create table test (id serial primary key, score integer, user_id integer);
insert into test (user_id, score) select s.id, random() * 100 from
(select generate_series(1, 1000) as id) as s, generate_series(1,
1000);
create index test_1 on test (score);
create index test_2 on test (user_id, score desc);
analyze;

This generates a thousand users, with a thousand scores each.  This
finds the top score for each user (ignoring the detail of duplicate
scores; easy to deal with):

SELECT sub.id FROM (
SELECT t.id, rank() OVER (PARTITION BY t.user_id ORDER BY score
DESC) AS rank
FROM test t
) AS sub WHERE rank = 1;

This does use the test_2 index (as intended), but it's still very
slow: 11 seconds on my system.

It seems like it's doing a *complete* scan of the index, generating
ranks for every row, and then filters out all but the first of each
rank.  That means it scales linearly with the total number of rows.
All it really needs to do is jump to each user in the index and pull
out the first entry (according to the score desc part of the test_2
index), which would make it scale linearly with the number of users.
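
For the top-1-per-user case specifically, DISTINCT ON is another formulation
worth trying against the same test tables (just a sketch; whether it avoids
the full scan here would need checking with EXPLAIN ANALYZE):

SELECT DISTINCT ON (t.user_id) t.id
FROM test t
ORDER BY t.user_id, t.score DESC;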

The function version:

CREATE FUNCTION high_score_for_user(user_id int) RETURNS SETOF INTEGER
LANGUAGE SQL AS $$
   SELECT t.id FROM test t
   WHERE t.user_id = $1
   ORDER BY t.score DESC LIMIT 1
$$;
SELECT high_score_for_user(u.id) FROM test_users u;

runs in 100ms.

I think I'm stuck with either creating temporary functions with the
constants already replaced, or creating an SQL function that evaluates
a brand new query as a string as Yeb suggested.

-- 
Glenn Maynard
