Re: [PERFORM] [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-08-08 Thread Bruce Momjian
Tom Lane wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
  toruvinn wrote:
  I was always wondering, though, why PostgreSQL uses this approach and not 
  its catalogs.
 
  It does use the catalog for most things.  The flatfile is used for the
  situations where the catalogs are not yet ready to be read.
 
 Now that we have SQL-level CONNECT privilege, I wonder just how much
 functionality would be lost if we got rid of the flat files and told
 people they had to use CONNECT to do any per-user or per-database
 access control.
 
 The main point I can see offhand is that password checking would have
 to be done a lot later in the startup sequence, with correspondingly
 more cycles wasted to reject bad passwords.
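
A sketch of that SQL-level control, with placeholder names mydb and
app_role (the CONNECT privilege has been available since 8.2):

    REVOKE CONNECT ON DATABASE mydb FROM PUBLIC;
    GRANT CONNECT ON DATABASE mydb TO app_role;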

Is this a TODO?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


[PERFORM] ORDER BY ... LIMIT and JOIN

2009-08-08 Thread Fizu
Hello,

I'm trying to optimize the following query, which returns the top users
ordered by ranking. I'll show you my schema and explain analyze for
each case.

So, I'm asking two things:

1) Why is the ranking index not used in the second query when sorting?
2) Am I missing some obvious optimization, like a missing index? :)

Schemas:

# \d ranking
          Table "public.ranking"
  Column   |         Type          | Modifiers
-----------+-----------------------+-----------
 ranking   | bigint                |
 score     | double precision      |
 username  | character varying(20) | not null
 variation | bigint                |
Indexes:
    "ranking_tmp_pkey1" PRIMARY KEY, btree (username)
    "idxrank_6057" btree (ranking) CLUSTER


# \d user
                            Table "public.user"
   Column   |         Type          |                     Modifiers
------------+-----------------------+----------------------------------------------------
 id         | integer               | not null default nextval('user_id_seq'::regclass)
 username   | character varying(20) | not null
 about      | text                  |
 name       | character varying(50) |
 photo      | text                  |
 country_id | integer               |
Indexes:
    "user_pkey" PRIMARY KEY, btree (username)
    "country_ranking_user_idx" btree (country_id)


Explain:

# explain analyze SELECT * FROM ranking INNER JOIN user ON
  (ranking.username = user.username) WHERE user.country_id = 1
  ORDER BY ranking.ranking ASC LIMIT 100;

                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Limit  (cost=13.03..13.04 rows=1 width=180) (actual time=965.229..965.302 rows=100 loops=1)
   ->  Sort  (cost=13.03..13.04 rows=1 width=180) (actual time=965.227..965.256 rows=100 loops=1)
         Sort Key: ranking.ranking
         Sort Method:  top-N heapsort  Memory: 56kB
         ->  Nested Loop  (cost=0.00..13.02 rows=1 width=180) (actual time=0.049..900.847 rows=57309 loops=1)
               ->  Index Scan using country_ranking_user_idx on user  (cost=0.00..6.49 rows=1 width=145) (actual time=0.023..57.633 rows=57309 loops=1)
                     Index Cond: (country_id = 1)
               ->  Index Scan using ranking_tmp_pkey1 on ranking  (cost=0.00..6.52 rows=1 width=35) (actual time=0.013..0.013 rows=1 loops=57309)
                     Index Cond: ((ranking.username)::text = (user.username)::text)
 Total runtime: 965.412 ms
(10 rows)

# explain analyze SELECT * FROM ranking INNER JOIN user ON
  (ranking.username = user.username)
  ORDER BY ranking.ranking ASC LIMIT 100;

                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..137.02 rows=100 width=180) (actual time=0.056..1.973 rows=100 loops=1)
   ->  Nested Loop  (cost=0.00..3081316.65 rows=2248753 width=180) (actual time=0.055..1.921 rows=100 loops=1)
         ->  Index Scan using idxrank_6057 on ranking  (cost=0.00..70735.73 rows=2248753 width=35) (actual time=0.021..0.076 rows=100 loops=1)
         ->  Index Scan using user_pkey on user  (cost=0.00..1.33 rows=1 width=145) (actual time=0.016..0.017 rows=1 loops=100)
               Index Cond: ((user.username)::text = (ranking.username)::text)
 Total runtime: 2.043 ms
(6 rows)


Thanks!
Fz


Re: [PERFORM] PG-related ACM Article: The Pathologies of Big Data

2009-08-08 Thread Pierre Frédéric Caillaud


I don't see how on any recent hardware, random access to RAM is slower
than sequential from disk.  RAM access, random or not, is measured in
GB/sec...


I don't think anybody's arguing that.


http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2795&p=5

These guys mention about 50 ns memory latency; this would translate into
20 million memory seeks per second, which is in the same ballpark as the
numbers given by the article...


If you count 10GB/s bandwidth, 50 ns is the time to fetch 500 bytes.
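
A quick sanity check of those two numbers, runnable in psql:

    SELECT 1 / 50e-9    AS seeks_per_second,    -- 2e7, i.e. 20 million
           10e9 * 50e-9 AS bytes_per_latency;   -- 500 bytes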


Re: [PERFORM] [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-08-08 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Tom Lane wrote:
 Now that we have SQL-level CONNECT privilege, I wonder just how much
 functionality would be lost if we got rid of the flat files and told
 people they had to use CONNECT to do any per-user or per-database
 access control.
 
 The main point I can see offhand is that password checking would have
 to be done a lot later in the startup sequence, with correspondingly
 more cycles wasted to reject bad passwords.

 Is this a TODO?

Well, it's a TO-THINK-ABOUT anyway.  I think the appropriate next step
would not be to write code, but to do a detailed investigation of what
would be gained or lost.  I don't remember exactly what we do with the
flat-file contents.

regards, tom lane


Re: [PERFORM] ORDER BY ... LIMIT and JOIN

2009-08-08 Thread Michael Andreen
On Saturday 08 August 2009 08:02:47 Fizu wrote:
 ->  Index Scan using country_ranking_user_idx on user
       (cost=0.00..6.49 rows=1 width=145) (actual time=0.023..57.633 rows=57309 loops=1)
       Index Cond: (country_id = 1)

The planner is expecting one user with country_id = 1, but instead there
are 57309.  Have you analyzed recently?  Maybe increasing the statistics
target will help.
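
Both suggestions, as a sketch (the value 1000 is only an example; the
table name is quoted defensively here since USER is a reserved word):

    ANALYZE "user";
    ALTER TABLE "user" ALTER COLUMN country_id SET STATISTICS 1000;
    ANALYZE "user";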

/Michael


Re: [PERFORM] [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-08-08 Thread Alvaro Herrera
Tom Lane wrote:

 Well, it's a TO-THINK-ABOUT anyway.  I think the appropriate next step
 would not be to write code, but to do a detailed investigation of what
 would be gained or lost.  I don't remember exactly what we do with the
 flat-file contents.

Maybe what we need is not to get rid of the flat files, but to speed
them up.  If we're worried about speed in the pg_authid flatfile, and
come up with a solution to that problem, what will we do with the
pg_database flatfile when it grows too large?  We can't just get rid of
it, because autovacuum needs to access it.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-08-08 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Tom Lane wrote:
 ... I don't remember exactly what we do with the
 flat-file contents.

 Maybe what we need is not to get rid of the flat files, but to speed
 them up.  If we're worried about speed in the pg_authid flatfile, and
 come up with a solution to that problem, what will we do with the
 pg_database flatfile when it grows too large?  We can't just get rid of
 it, because autovacuum needs to access it.

Well, one of the components of the TODO would have to be to figure out a
way to fix autovacuum to avoid that.

Or we could consider getting rid of the pg_auth flatfile while keeping
the pg_database one, which would fix the issue for role names anyway...
but it's certainly an ugly compromise.

regards, tom lane


Re: [PERFORM] [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-08-08 Thread Tom Lane
I wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
 Tom Lane wrote:
 ... I don't remember exactly what we do with the
 flat-file contents.

 Maybe what we need is not to get rid of the flat files, but to speed
 them up.  If we're worried about speed in the pg_authid flatfile, and
 come up with a solution to that problem, what will we do with the
 pg_database flatfile when it grows too large?  We can't just get rid of
 it, because autovacuum needs to access it.

 Well, one of the components of the TODO would have to be to figure out a
 way to fix autovacuum to avoid that.

Actually, I had forgotten that we were using the pg_database flatfile
for purposes other than authentication checks.  In particular, we need
it during backend startup to map from database name to database OID,
without which it's impossible to locate the system catalogs for the
target database.  It's pretty hard to see a way around that one.
We could grovel through pg_database itself, as indeed is done to rebuild
the flatfile during system start.  But that's certainly not going to be
fast in cases where there are enough DBs to make the flatfile slow.

So on third thought, Alvaro's right: the only real solution here is to
adopt a more efficient representation of the flat files.  Maybe some
sort of simple hashtable arrangement would work.  (Rendering them not so
flat anymore...)

regards, tom lane


Re: [PERFORM] [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-08-08 Thread Alvaro Herrera
Tom Lane wrote:

 Actually, I had forgotten that we were using the pg_database flatfile
 for purposes other than authentication checks.  In particular, we need
 it during backend startup to map from database name to database OID,
 without which it's impossible to locate the system catalogs for the
 target database.  It's pretty hard to see a way around that one.
 We could grovel through pg_database itself, as indeed is done to rebuild
 the flatfile during system start.  But that's certainly not going to be
 fast in cases where there are enough DBs to make the flatfile slow.

Also, IIRC flatfiles were introduced precisely to avoid having to read
the catalogs manually.

 So on third thought, Alvaro's right: the only real solution here is to
 adopt a more efficient representation of the flat files.  Maybe some
 sort of simple hashtable arrangement would work.  (Rendering them not so
 flat anymore...)

As long as there's a simple API, there should be no problem.

(Except that it would be nice to be able to build the file incrementally
...  If we have to write out a million lines each time a millionth user
is created, there will still be a bottleneck at CREATE USER time.)

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-08-08 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Tom Lane wrote:
 So on third thought, Alvaro's right: the only real solution here is to
 adopt a more efficient representation of the flat files.  Maybe some
 sort of simple hashtable arrangement would work.  (Rendering them not so
 flat anymore...)

 As long as there's a simple API, there should be no problem.

 (Except that it would be nice to be able to build the file incrementally
 ...  If we have to write out a million lines each time a millionth user
 is created, there will still be a bottleneck at CREATE USER time.)

If we allow the requirements to creep on this, we'll soon find ourselves
either using or reinventing BDB for the flatfiles.  Ick.

[ thinks for awhile... ]

In some sense this is a bootstrap problem: what does it take to get to
the point of being able to read pg_database and its indexes?  That is
necessarily not dependent on the particular database we want to join.
Maybe we could solve it by having the relcache write a global cache
file containing only entries for the global tables, and load that before
we have identified the database we want to join (after which, we'll load
another cache file for the local entries).  It would doubtless take some
rearrangement of the backend startup sequence, but it doesn't seem
obviously impossible.

regards, tom lane


Re: [PERFORM] [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-08-08 Thread Alvaro Herrera
Tom Lane wrote:

 In some sense this is a bootstrap problem: what does it take to get to
 the point of being able to read pg_database and its indexes?  That is
 necessarily not dependent on the particular database we want to join.
 Maybe we could solve it by having the relcache write a global cache
 file containing only entries for the global tables, and load that before
 we have identified the database we want to join (after which, we'll load
 another cache file for the local entries).

This sounds good, because autovacuum could probably use this too.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-08-08 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Tom Lane wrote:
 In some sense this is a bootstrap problem: what does it take to get to
 the point of being able to read pg_database and its indexes?  That is
 necessarily not dependent on the particular database we want to join.
 Maybe we could solve it by having the relcache write a global cache
 file containing only entries for the global tables, and load that before
 we have identified the database we want to join (after which, we'll load
 another cache file for the local entries).

 This sounds good, because autovacuum could probably use this too.

Maybe I'll look at this after commitfest is over.  I haven't messed
with the bootstrap sequence in awhile, but I used to remember how
it worked ...

As far as AV is concerned, taking this approach would likely mean
turning the launcher into a full-fledged backend just like the workers.
Do you see any problem with that?

regards, tom lane


Re: [PERFORM] [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-08-08 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  Tom Lane wrote:
  Now that we have SQL-level CONNECT privilege, I wonder just how much
  functionality would be lost if we got rid of the flat files and told
  people they had to use CONNECT to do any per-user or per-database
  access control.
  
  The main point I can see offhand is that password checking would have
  to be done a lot later in the startup sequence, with correspondingly
  more cycles wasted to reject bad passwords.
 
  Is this a TODO?
 
 Well, it's a TO-THINK-ABOUT anyway.  I think the appropriate next step
 would not be to write code, but to do a detailed investigation of what
 would be gained or lost.  I don't remember exactly what we do with the
 flat-file contents.

The flat file is the username/password sorted list.  We load that info
into the postmaster in an array that we can binary search.

I wonder how big the postmaster process address space was when handling
2 billion users:

http://archives.postgresql.org/pgsql-bugs/2009-07/msg00176.php

It seems just storing many users in the postmaster could be burdensome,
but fixing that would be creating something like a database, which we
already have.  ;-)
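
A back-of-the-envelope on that, assuming (purely as an illustration)
about 64 bytes per username/password entry:

    SELECT pg_size_pretty((2e9 * 64)::bigint);  -- roughly 119 GB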

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: [PERFORM] Need suggestions on kernel settings for dedicated FreeBSD/Postgresql machine

2009-08-08 Thread Robert Haas
On Fri, Aug 7, 2009 at 5:24 PM, Culley Harrelson harrel...@gmail.com wrote:
 Hi Everyone,

 I manage a freeBSD server that is dedicated to postgresql.  The
 machine has 4 gigs of ram and there is a single database powering a
 web application that is hosted on a neighboring machine.  The web
 application is mostly reading the database but there are considerable
 writes and I don't want to tune the machine exclusively for writes.  I
 realize more information would be needed to optimally tune the machine
 but I am seeking advice on making some sane kernel settings for a
 general purpose database on a dedicated system.  Currently I have:

 $ cat /etc/sysctl.conf

 kern.ipc.shmmax=268435456
 kern.ipc.shmall=65536

 and

 $ cat /boot/loader.conf
 kern.ipc.semmni=256
 kern.ipc.semmns=512
 kern.ipc.semmnu=256

 In postgresql.conf I have:

 max_connections = 180
 shared_buffers = 28MB

 I would like to increase this to 256 connections and make sure the
 kernel settings are giving postgresql enough breathing room.  I
 suspect my settings are conservative, and since the machine is
 dedicated to postgresql I would like to give it more resources if
 they could be used.  Any suggestions?

This might be worth a look, for starters.

http://pgfoundry.org/projects/pgtune/

...Robert
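
For a quick look at what the running server actually picked up (the
values shown in comments are from the config quoted above):

    SHOW shared_buffers;   -- 28MB
    SHOW max_connections;  -- 180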
