Re: [PERFORM] Poor query performance

2009-07-15 Thread Greg Stark
On Thu, Jul 9, 2009 at 10:35 PM, Alexa...@liivid.com wrote:
 Forgot to add:

 postg...@ec2-75-101-128-4:~$ psql --version
 psql (PostgreSQL) 8.3.5


How is the index  sl_city_etc defined?

-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [PERFORM] Poor query performance

2009-07-15 Thread Greg Stark
On Wed, Jul 15, 2009 at 8:51 AM, Alexa...@liivid.com wrote:
 Also posted this to the list.  Thanks for your answer - still
 struggling.

Staying on-list is always preferred.

 How is the index  sl_city_etc defined?

          Index "public.sl_city_etc"
     Column    |            Type
 --------------+------------------------------
  city         | text
  listing_type | text
  post_time    | timestamp without time zone
  bedrooms     | integer
  region       | text
  geo_lat      | integer
  geo_lon      | integer
 btree, for table "public.source_listings"

So the presence of listing_type before post_time when it's not in your
query means that the index scan has to look at every entry for
'boston'. It skips over entries that don't match the post_time or geo
columns but it still has to go through them in the index. Think of
being asked to find every word in the dictionary starting with 'a' and
whose third letter is 'k' but with no restriction on the second
letter...

You would probably be better off starting with separate indexes on each
column, and then considering how to combine them if possible, rather than
putting them all in one index like this.

If you always have city in your query along with some collection of other
columns, then you could have separate indexes on (city, listing_type),
(city, post_time), (city, bedrooms), etc., as sketched below.
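
For example (the index names here are made up):

CREATE INDEX sl_city_listing_type ON source_listings (city, listing_type);
CREATE INDEX sl_city_post_time    ON source_listings (city, post_time);
CREATE INDEX sl_city_bedrooms     ON source_listings (city, bedrooms);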

The geometric columns are a more interesting case. You could have
separate indexes on each and hope a bitmap scan combines them, or you
could use a geometric GiST index on point(geo_lon, geo_lat). Doing so
would mean using the right operator to find points within a box rather
than simple < and > comparison operators.
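
A minimal sketch of that approach. As far as I know there is no built-in
GiST opclass for bare points on 8.3, so a common trick is to index a
zero-size box built from the point; the coordinate literals below are
placeholders (the geo columns are integers, presumably scaled):

CREATE INDEX sl_city_geo ON source_listings
    USING gist (box(point(geo_lon, geo_lat), point(geo_lon, geo_lat)));

SELECT * FROM source_listings
 WHERE city = 'boston'
   AND box(point(geo_lon, geo_lat), point(geo_lon, geo_lat))
       && box(point(-71300000, 42200000), point(-70900000, 42400000));

The && (overlaps) operator is what lets the GiST index be used.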

-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [PERFORM] Repeated Query is much slower in PostgreSQL8.2.4 than DB2 9.1

2009-07-15 Thread Craig Ringer
On Wed, 2009-07-15 at 12:10 +0900, ning wrote:
 Hi,
 
 I am porting an application to use PostgreSQL 8.2.4 instead of DB2 9.1.
 CLI was used to connect to DB2, and ODBC is used to connect to PostgreSQL.
 The query statement is as follows:

 PostgreSQL took nearly the same time on both executions, but DB2 ran 30
 times faster on the second execution.

Can you run your query with EXPLAIN ANALYZE and post the results, both
of the first and later executions?

-- 
Craig Ringer




Re: [PERFORM] Repeated Query is much slower in PostgreSQL8.2.4 than DB2 9.1

2009-07-15 Thread Craig Ringer
On Wed, 2009-07-15 at 12:10 +0900, ning wrote:

 First execution: PostgreSQL 0.006277 seconds / DB2 0.009028 seconds
 Second execution: PostgreSQL 0.005932 seconds / DB2 0.000332 seconds

Actually, on second thoughts that looks a lot like DB2 is caching the
query results and is just returning the cached results when you repeat
the query.

I'm not sure to what extent PostgreSQL is capable of result caching, but
I'd be surprised if it could do as much as DB2.

-- 
Craig Ringer




Re: [PERFORM] Repeated Query is much slower in PostgreSQL8.2.4 than DB2 9.1

2009-07-15 Thread Greg Stark
On Wed, Jul 15, 2009 at 9:27 AM, Craig Ringer cr...@postnewspapers.com.au wrote:
 On Wed, 2009-07-15 at 12:10 +0900, ning wrote:

 First execution: PostgreSQL 0.006277 seconds / DB2 0.009028 seconds
 Second execution: PostgreSQL 0.005932 seconds / DB2 0.000332 seconds

 Actually, on second thoughts that looks a lot like DB2 is caching the
 query results and is just returning the cached results when you repeat
 the query.


Yeah, is 6ms really a problematic response time for your system?

If so you might consider whether executing millions of small queries
is really the best approach instead of absorbing them all into queries
which operate on more records at a time. For example, it's a lot
faster to join two large tables than look up matches for every record
one by one in separate queries.
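
For example (hypothetical tables), instead of one round trip per record:

-- repeated once per master row:
SELECT d.* FROM detail d WHERE d.master_id = 42;

-- a single set-oriented query:
SELECT m.id, d.*
  FROM master m
  JOIN detail d ON d.master_id = m.id;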

There's no question that if you match up results from DB2 and Postgres one
to one there will be cases where DB2 is faster and hopefully cases
where Postgres is faster. It's only interesting if the differences
could cause problems; otherwise you'll be running around in circles
hunting down every difference between two fundamentally different
products.

-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [PERFORM] Repeated Query is much slower in PostgreSQL8.2.4 than DB2 9.1

2009-07-15 Thread ning
Hi Craig,

The log is really long, but I compared the EXPLAIN ANALYZE results for the
first and later executions; except for three time=XXX numbers, they are
identical.
I agree with you that PostgreSQL is doing a different level of caching; I
just wonder if there is any way to speed up PostgreSQL in this scenario,
which is perhaps the wrong approach.

Thank you.
Ning


On Wed, Jul 15, 2009 at 5:27 PM, Craig Ringer cr...@postnewspapers.com.au wrote:
 On Wed, 2009-07-15 at 12:10 +0900, ning wrote:

 First execution: PostgreSQL 0.006277 seconds / DB2 0.009028 seconds
 Second execution: PostgreSQL 0.005932 seconds / DB2 0.000332 seconds

 Actually, on second thoughts that looks a lot like DB2 is caching the
 query results and is just returning the cached results when you repeat
 the query.

 I'm not sure to what extent PostgreSQL is capable of result caching, but
 I'd be surprised if it could do as much as DB2.

 --
 Craig Ringer





Re: [PERFORM] Repeated Query is much slower in PostgreSQL8.2.4 than DB2 9.1

2009-07-15 Thread ning
Hi Greg,

I am doing a performance test by running a unit test program to compare
the time used on PostgreSQL and DB2. As you pointed out, there are cases
where PostgreSQL is faster. Actually, in the real world for my
application, repeatedly executing the same query statement will hardly
ever happen. I am investigating this because in the performance test
report automatically generated by the unit test program, DB2 is 20-30
times faster than PostgreSQL in some test cases because of the
repeatedly executed queries.

I am thinking that ignoring these test cases for the performance measure
is safe and acceptable, since PostgreSQL is quicker than DB2 for the
first execution.

Thank you.
Ning


On Wed, Jul 15, 2009 at 5:37 PM, Greg Stark gsst...@mit.edu wrote:
 On Wed, Jul 15, 2009 at 9:27 AM, Craig Ringer cr...@postnewspapers.com.au wrote:
 On Wed, 2009-07-15 at 12:10 +0900, ning wrote:

 First execution: PostgreSQL 0.006277 seconds / DB2 0.009028 seconds
 Second execution: PostgreSQL 0.005932 seconds / DB2 0.000332 seconds

 Actually, on second thoughts that looks a lot like DB2 is caching the
 query results and is just returning the cached results when you repeat
 the query.


 Yeah, is 6ms really a problematic response time for your system?

 If so you might consider whether executing millions of small queries
 is really the best approach instead of absorbing them all into queries
 which operate on more records at a time. For example, it's a lot
 faster to join two large tables than look up matches for every record
 one by one in separate queries.

 There's no question if you match up results from DB2 and Postgres one
 to one there will be cases where DB2 is faster and hopefully cases
 where Postgres is faster. It's only interesting if the differences
 could cause problems, otherwise you'll be running around in circles
 hunting down every difference between two fundamentally different
 products.

 --
 greg
 http://mit.edu/~gsstark/resume.pdf




Re: [PERFORM] Performance comparison between Postgres and Greenplum

2009-07-15 Thread Scott Marlowe
On Tue, Jul 14, 2009 at 11:33 PM, Suvankar Roy suvankar@tcs.com wrote:

 Hi Scott,

 This is what I have got -
 In Greenplum, version PostgreSQL 8.2.13 (Greenplum Database 3.3.0.1 build 4)
 on i686-pc-linux-gnu, compiled by GCC gcc (GCC)

 In Postgres, version PostgreSQL 8.3.7, compiled by Visual C++ build 1400
 (1 row)

I wouldn't expect 8.2.x to outrun 8.3.x



Re: [PERFORM] Very big insert/join performance problem (bacula)

2009-07-15 Thread Marc Cousin
This mail contains the requested plans:

I've run them with the different configurations, since I had gone to the
effort of setting up the whole thing :)
Stats were updated between all runs. Each time shown is the first run of the
query (that's what we have in production with bacula).
And I added the executor stats, in case ...

By the way, I think I should mention that the whole thing runs over DRBD, with
2 gigabit links between the master and the slave.
I also tried deactivating replication when things got really slow (despooling
in 24 hours); it changed nothing (sorts were a bit faster, around 20%). The
server has 12 GB RAM and one quad-core Xeon E5335.

PostgreSQL starts to hash filename a bit later than I said in the first mail,
because the table has grown (it was around 30-40 million rows last time I did
the tests).

This is the query (temp_mc is the table I've created to do my tests ...):

explain ANALYZE SELECT batch.FileIndex,
  batch.JobId,
  Path.PathId,
  Filename.FilenameId,
  batch.LStat,
  batch.MD5
  FROM temp_mc AS batch
  JOIN Path ON (batch.Path = Path.Path)
  JOIN Filename ON (batch.Name = Filename.Name);


Plan 1
around 1 million records to insert, seq_page_cost 1, random_page_cost 4

LOG:  EXECUTOR STATISTICS
DETAIL:  ! system usage stats:
!   380.143452 elapsed 79.000938 user 44.386774 system sec
!   [415.785985 user 155.733732 sys total]
!   15848728/12934936 [24352752/50913184] filesystem blocks in/out
!   0/44188 [86/987512] page faults/reclaims, 0 [0] swaps
!   0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!   93812/40706 [405069/184511] voluntary/involuntary context switches
! buffer usage stats:
!   Shared blocks:     877336 read,          0 written, buffer hit rate = 6.75%
!   Local  blocks:          0 read,          0 written, buffer hit rate = 0.00%
!   Direct blocks:          0 read,          0 written

 Hash Join  (cost=3923929.71..5131377.91 rows=1286440 width=91) (actual time=234021.194..380018.709 rows=1286440 loops=1)
   Hash Cond: (batch.name = filename.name)
   ->  Hash Join  (cost=880140.87..1286265.62 rows=1286440 width=102) (actual time=23184.959..102400.782 rows=1286440 loops=1)
         Hash Cond: (batch.path = path.path)
         ->  Seq Scan on temp_mc batch  (cost=0.00..49550.40 rows=1286440 width=189) (actual time=0.007..342.396 rows=1286440 loops=1)
         ->  Hash  (cost=425486.72..425486.72 rows=16746972 width=92) (actual time=23184.196..23184.196 rows=16732049 loops=1)
               ->  Seq Scan on path  (cost=0.00..425486.72 rows=16746972 width=92) (actual time=0.004..7318.850 rows=16732049 loops=1)
   ->  Hash  (cost=1436976.15..1436976.15 rows=79104615 width=35) (actual time=210831.840..210831.840 rows=79094418 loops=1)
         ->  Seq Scan on filename  (cost=0.00..1436976.15 rows=79104615 width=35) (actual time=46.324..148887.662 rows=79094418 loops=1)
 Total runtime: 380136.601 ms


Plan 2
the same insert, with seq_page_cost to 0.01 and random_page_cost to 0.02

DETAIL:  ! system usage stats:
!   42.378039 elapsed 28.277767 user 12.192762 system sec
!   [471.865489 user 180.499280 sys total]
!   0/4072368 [24792848/59059032] filesystem blocks in/out
!   0/0 [86/989858] page faults/reclaims, 0 [0] swaps
!   0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!   1061/9131 [429738/200320] voluntary/involuntary context switches
! buffer usage stats:
!   Shared blocks:     251574 read,          0 written, buffer hit rate = 96.27%
!   Local  blocks:          0 read,          0 written, buffer hit rate = 0.00%
!   Direct blocks:          0 read,          0 written
LOG:  duration: 42378.373 ms  statement: 
  QUERY PLAN
---
 Hash Join  (cost=381840.21..1012047.92 rows=1286440 width=91) (actual time=20284.387..42242.955 rows=1286440 loops=1)
   Hash Cond: (batch.path = path.path)
   ->  Nested Loop  (cost=0.00..583683.91 rows=1286440 width=178) (actual time=0.026..10333.636 rows=1286440 loops=1)
         ->  Seq Scan on temp_mc batch  (cost=0.00..13231.26 rows=1286440 width=189) (actual time=0.008..380.361 rows=1286440 loops=1)
         ->  Index Scan using filename_name_idx on filename  (cost=0.00..0.43 rows=1 width=35) (actual time=0.006..0.007 rows=1 loops=1286440)
               Index Cond: (filename.name = batch.name)
   ->  Hash  (cost=170049.89..170049.89 rows=16746972 width=92) (actual time=20280.729..20280.729 rows=16732049 loops=1)
         ->  Seq Scan on path  (cost=0.00..170049.89 rows=16746972 width=92) (actual time=0.005..4560.872 rows=16732049 loops=1)
 Total runtime: 42371.362 ms


The thing is that this query is ten 

Re: [PERFORM] Performance comparison between Postgres and Greenplum

2009-07-15 Thread Alex Goncharov
,--- You/Suvankar (Mon, 13 Jul 2009 16:53:41 +0530) *
| I have some 99,000 records in a table (OBSERVATION_ALL) in a Postgres DB 
| as well as a Greenplum DB.
| 
| The Primary key is a composite one comprising of 2 columns (so_no, 
| serial_no).
| 
| The execution of the following query takes 8214.016 ms in Greenplum but 
| only 729.134 ms in Postgres.
| select * from observation_all order by so_no, serial_no;
| 
| I believe that execution time in greenplum should be less compared to 
| postgres. Can anybody throw some light, it would be of great help.

Why do you believe so?

Is your data distributed and served by separate segment hosts?  By how
many?  Is the network connectivity not a factor?  What happens with
the times if you don't sort your result set?

-- Alex -- alex-goncha...@comcast.net --




Re: [PERFORM] Performance comparison between Postgres and Greenplum

2009-07-15 Thread Alex Goncharov
,--- You/Suvankar (Wed, 15 Jul 2009 18:32:12 +0530) *
| Yes, I have got 2 segments and a master host. So, in a way processing 
| should be faster in Greenplum.

No, it should not: it all depends on your data, SQL statements and
setup.

In my own experiments, with small amounts of stored data, PostgreSQL
beats Greenplum, which doesn't surprise me a bit.

You need to know where most of the execution time goes -- maybe to
sorting?  And sorting in Greenplum, isn't it done on one machine, the
master host?  Why would that be faster than in PostgreSQL?
|
| For other queries though, results are satisfactory or at least comparable, 
| like-
| 
| select distinct so_no, serial_no from observation_all;
| in postgres it takes - 1404.238 ms
| in gp it takes - 1217.283 ms

No surprise here: the data is picked by multiple segment hosts and
never sorted on the master.

-- Alex -- alex-goncha...@comcast.net --




Re: [PERFORM] Very big insert/join performance problem (bacula)

2009-07-15 Thread Alvaro Herrera
Marc Cousin wrote:

 There are other things I am thinking of : maybe it would be better to have 
 sort space on another (and not DBRD'ded) raid set ? we have a quite
 cheap setup right now for the database, and I think maybe this would help 
 scale better. I can get a filesystem in another volume group, which is not 
 used that much for now.

You know, that's the first thing that came to me when I read you're using
DRBD.  Have you tried setting temp_tablespaces to a non-replicated disk?
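
A sketch of what that looks like (the path is hypothetical and must be a
directory owned by the postgres user on the non-replicated volume; the
temp_tablespaces GUC exists as of 8.3):

CREATE TABLESPACE temp_local LOCATION '/mnt/local/pgtemp';
-- in postgresql.conf, or per-session:
SET temp_tablespaces = 'temp_local';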

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] Very big insert/join performance problem (bacula)

2009-07-15 Thread Marc Cousin
On Wednesday 15 July 2009 15:45:01, Alvaro Herrera wrote:
 Marc Cousin escribió:
  There are other things I am thinking of : maybe it would be better to
  have sort space on another (and not DBRD'ded) raid set ? we have a quite
  cheap setup right now for the database, and I think maybe this would help
  scale better. I can get a filesystem in another volume group, which is
  not used that much for now.

 You know, that's the first thing that came to me when I read you're using
 DRBD.  Have you tried setting temp_tablespaces to a non-replicated disk?

I wish I could easily. I'm not entitled to tune the database, only to give
directives. I've given this one, but I don't know when it will be done. I'll
keep you informed on this one, but I don't have my hopes too high.

As mentioned before, I tried to deactivate DRBD (still using the DRBD device,
but not connected to the other node, so it has almost no effect). It didn't
change much (performance was a bit better, around 20%).

Anyway, the thing is that :
- big sorts kill my machine when there are more than 5 of them. I think it is
a system problem (raid, filesystem, linux tuning; I don't really know. I'll
have to dig into this, but it will be complicated, for human reasons :) )
- the plan through nested loops is faster anyway, and I think it's because 
there is only a small fraction of filename and path that is used (most files 
backed up have the same name or path, as we save 600 machines with mostly 2 
OSes, linux and windows), so the hot parts of these 2 tables are extremely 
likely to be in the database or linux cache (buffer hit rate was 97% in the 
example provided). Moreover, the first two queries of the insert procedure fill 
the cache for us...






[PERFORM] Re: [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-07-15 Thread Alvaro Herrera
Lauris Ulmanis wrote:
 Hello again!
 
 I did test on my local test server
 
 I created 500 000 users in a function loop very quickly - within 48
 seconds. I ran the script again, going up to 1 billion users - the
 result was the same - 48 seconds. That is very quick.

 But the problem seems to be with transaction preparation, because if
 the database has 1 billion users and I want to create 1 new one - it
 takes 4 seconds!

 After that I generated up to 2 billion users on this server (the
 generation process took just 1.44 minutes - again quick).

 And then I created 1 user again - now it took 9 seconds!

 What is the reason for this slowness? Is there a workaround or
 solution to avoid it?

My bet is on the pg_auth flat file.  I doubt we have ever tested the
behavior of that code with 1 billion users ...

Do you really need 1 billion users?  Are you planning on giving accounts
to every human being on the planet or what?  I mean, what's the point of
this test?
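
For reference, a minimal sketch of the kind of generator loop being
described (PL/pgSQL; the function and user names are made up):

CREATE OR REPLACE FUNCTION make_users(n integer) RETURNS void AS $$
BEGIN
  FOR i IN 1..n LOOP
    -- EXECUTE is needed because utility commands take no parameters
    EXECUTE 'CREATE USER bulk_user_' || i::text;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

SELECT make_users(500000);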

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [PERFORM] Poor overall performance unless regular VACUUM FULL

2009-07-15 Thread Scott Marlowe
On Wed, Jul 15, 2009 at 6:51 AM, Wayne Conrad wa...@databill.com wrote:
 On Tue, 14 Jul 2009, Scott Marlowe wrote:

 Just wondering, which pgsql version, and also, do you have
 autovacuum turned on?

 Dang, I should have said in my initial message.  8.3.6, and autovacuum
 is turned on and has plenty of log activity.

Are you guys doing anything that could be deemed pathological, like
full table updates on big tables over and over?  Had an issue last
year where a dev left a where clause off an update to a field in one
of our biggest tables, and in a few weeks the database was so bloated
we had to take it offline to fix the problem (after fixing the query).



[PERFORM] Re: [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-07-15 Thread Alvaro Herrera
toruvinn wrote:
On Wed, 15 Jul 2009 16:02:09 +0200, Alvaro Herrera alvhe...@commandprompt.com wrote:
 My bet is on the pg_auth flat file.  I doubt we have ever tested the
 behavior of that code with 1 billion users ...

 I was always wondering, though, why PostgreSQL uses this approach and not 
 its catalogs.

It does use the catalog for most things.  The flat file is used for the
situations where the catalogs are not yet ready to be read.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [PERFORM] [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-07-15 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 toruvinn wrote:
 I was always wondering, though, why PostgreSQL uses this approach and not 
 its catalogs.

 It does use the catalog for most things.  The flat file is used for the
 situations where the catalogs are not yet ready to be read.

Now that we have SQL-level CONNECT privilege, I wonder just how much
functionality would be lost if we got rid of the flat files and told
people they had to use CONNECT to do any per-user or per-database
access control.

The main point I can see offhand is that password checking would have
to be done a lot later in the startup sequence, with correspondingly
more cycles wasted to reject bad passwords.
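
For reference, the SQL-level CONNECT privilege looks like this (database
and role names hypothetical):

REVOKE CONNECT ON DATABASE appdb FROM PUBLIC;
GRANT CONNECT ON DATABASE appdb TO app_role;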

regards, tom lane



Re: [PERFORM] cluster index on a table

2009-07-15 Thread Ibrahim Harrani
Hi,

thanks for your suggestion.
Is there any benefit of setting fillfactor to 70 or 80 on this table?



On Wed, Jun 24, 2009 at 8:42 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 As another poster pointed out, you cluster on ONE index and one index
 only.  However, you can cluster on a multi-column index.




Re: [PERFORM] [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-07-15 Thread Marko Kreen
On 7/15/09, Tom Lane t...@sss.pgh.pa.us wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:

  toruvinn wrote:
   I was always wondering, though, why PostgreSQL uses this approach and not
   its catalogs.

   It does use the catalog for most things.  The flat file is used for the
   situations where the catalogs are not yet ready to be read.


 Now that we have SQL-level CONNECT privilege, I wonder just how much
  functionality would be lost if we got rid of the flat files and told
  people they had to use CONNECT to do any per-user or per-database
  access control.

  The main point I can see offhand is that password checking would have
  to be done a lot later in the startup sequence, with correspondingly
  more cycles wasted to reject bad passwords.

From a security standpoint, wasting more cycles on bad passwords is good,
as it decreases the rate at which brute-force password scanning can happen.

And I cannot imagine a scenario where performance on invalid logins
can be relevant.

-- 
marko



Re: [PERFORM] [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-07-15 Thread David Wilson
On Wed, Jul 15, 2009 at 11:10 AM, Marko Kreen mark...@gmail.com wrote:

 From a security standpoint, wasting more cycles on bad passwords is good,
 as it decreases the rate at which brute-force password scanning can happen.

 And I cannot imagine a scenario where performance on invalid logins
 can be relevant.

DoS attacks. The longer it takes to reject an invalid login, the fewer
invalid login attempts it takes to DoS the server.

-- 
- David T. Wilson
david.t.wil...@gmail.com



Re: [PERFORM] [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-07-15 Thread Tom Lane
David Wilson david.t.wil...@gmail.com writes:
On Wed, Jul 15, 2009 at 11:10 AM, Marko Kreen mark...@gmail.com wrote:
 From a security standpoint, wasting more cycles on bad passwords is good,
 as it decreases the rate at which brute-force password scanning can happen.

 And I cannot imagine a scenario where performance on invalid logins
 can be relevant.

 DoS attacks. The longer it takes to reject an invalid login, the fewer
 invalid login attempts it takes to DoS the server.

Yeah, but even with the current setup, an attacker who can fire
connection request packets at your postmaster port is not going to have
any trouble DoS'ing the service.  We expend quite a lot of cycles before
getting to the password challenge already.

regards, tom lane



Re: [PERFORM] [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-07-15 Thread Marko Kreen
On 7/15/09, David Wilson david.t.wil...@gmail.com wrote:
 On Wed, Jul 15, 2009 at 11:10 AM, Marko Kreen mark...@gmail.com wrote:
   From a security standpoint, wasting more cycles on bad passwords is good,
   as it decreases the rate at which brute-force password scanning can happen.

   And I cannot imagine a scenario where performance on invalid logins
   can be relevant.


 DoS attacks. The longer it takes to reject an invalid login, the fewer
  invalid login attempts it takes to DoS the server.

No, this is not a good argument against it, especially if you consider
that DoS via hanging-connect or SSL is still there.

Compared to a minor DoS, password leakage is a much worse danger.

-- 
marko



Re: [PERFORM] Performance comparison between Postgres and Greenplum

2009-07-15 Thread Scott Mead
On Wed, Jul 15, 2009 at 9:18 AM, Alex Goncharov alex-goncha...@comcast.net wrote:

 ,--- You/Suvankar (Wed, 15 Jul 2009 18:32:12 +0530) *
 | Yes, I have got 2 segments and a master host. So, in a way processing
 | should be faster in Greenplum.

 No, it should not: it all depends on your data, SQL statements and
 setup.

 In my own experiments, with small amounts of stored data, PostgreSQL
 beats Greenplum, which doesn't surprise me a bit.


Agreed.  You're only operating on 99,000 rows.  That isn't really
enough rows to exercise the architecture of shared-nothing clusters.
Now, I don't know Greenplum very well, but I am familiar with another
warehousing product with approximately the same architecture behind
it.  From all the testing I've done, you need to get into the 50
million plus row range before the architecture starts to be really
effective.  99,000 rows probably fits completely into memory on the
machine that you're testing PG with, so your test really isn't fair.
On one PG box, you're just doing memory reads, and maybe some
high-speed disk access; on the Greenplum setup, you've got network
overhead on top of all that.  Bottom line: you need to do a test with
a number of rows that won't fit into memory, and won't be very quickly
scanned from disk into memory.  You need a LOT of data.

--Scott


Re: [PERFORM] CREATE USER command slows down when user count per server reaches up to 500 000

2009-07-15 Thread Haszlakiewicz, Eric
-Original Message-
From: pgsql-performance-ow...@postgresql.org 

When the user count in a Postgres database reaches 500 000, the
'CREATE USER' command slows down to 5-10 seconds per user.

What could be the reason for this problem, and is there any solution to
avoid it?

Each user can have up to 10 associated roles with grants to system
objects.

I have no idea about the performance issues, but I'm curious: how/why do
you have so many users accessing your database?  I'm drawing a blank on
coming up with a use case where that many users are needed.

eric



Re: [PERFORM] Repeated Query is much slower in PostgreSQL8.2.4 than DB2 9.1

2009-07-15 Thread Mike Ivanov

ning wrote:
 The log is really long,

Which usually signals a problem with the query.

 but I compared the result of explain analyze
 for first and later executions, except for 3 time=XXX numbers, they
 are identical.

They are supposed to be identical unless something is really badly broken.

 I agree with you that PostgreSQL is doing different level of caching,
 I just wonder if there is any way to speed up PostgreSQL in this
 scenario,

This is what EXPLAIN ANALYZE is for. Could you post the results please?

Cheers,
Mike




Re: [PERFORM] Repeated Query is much slower in PostgreSQL8.2.4 than DB2 9.1

2009-07-15 Thread Andres Freund
On Wednesday 15 July 2009 10:27:50 Craig Ringer wrote:
 On Wed, 2009-07-15 at 12:10 +0900, ning wrote:
  First execution: PostgreSQL 0.006277 seconds / DB2 0.009028 seconds
  Second execution: PostgreSQL 0.005932 seconds / DB2 0.000332 seconds

 Actually, on second thoughts that looks a lot like DB2 is caching the
 query results and is just returning the cached results when you repeat
 the query.
Are you sure getting the query *result* is causing the delay? If my faint
memory serves right, DB2 does plan caching - PG does not.
To test this theory you could prepare the query and execute it twice.

Prepare it (in PostgreSQL prepared statements the placeholders are
$1, $2, ... rather than ?):

PREPARE test_query (int, int, int) AS
SELECT void, nameId, tag
FROM (SELECT attributeOf, void, nameId, tag
      FROM Attr
      WHERE attributeof IN (SELECT oid_ FROM ItemView
                            WHERE ItemView.ItemId = $1)) x
RIGHT OUTER JOIN (SELECT oid_ FROM ItemView
                  WHERE ItemView.ItemId = $2
                    AND ItemView.assignedTo_ = $3) y
ON attributeof = oid_ FOR READ ONLY;

Execute it (the argument values are placeholders):
EXECUTE test_query(100, 100, 1);
EXECUTE test_query(100, 100, 1);

Greetings, 

Andres



Re: [PERFORM] Very big insert/join performance problem (bacula)

2009-07-15 Thread Devin Ben-Hur

Marc Cousin wrote:

This mail contains the asked plans :
Plan 1
around 1 million records to insert, seq_page_cost 1, random_page_cost 4



 ->  Hash  (cost=425486.72..425486.72 rows=16746972 width=92) (actual time=23184.196..23184.196 rows=16732049 loops=1)
       ->  Seq Scan on path  (cost=0.00..425486.72 rows=16746972 width=92) (actual time=0.004..7318.850 rows=16732049 loops=1)

 ->  Hash  (cost=1436976.15..1436976.15 rows=79104615 width=35) (actual time=210831.840..210831.840 rows=79094418 loops=1)
       ->  Seq Scan on filename  (cost=0.00..1436976.15 rows=79104615 width=35) (actual time=46.324..148887.662 rows=79094418 loops=1)


This doesn't address the cost-driving plan question, but I think it's a
bit puzzling that a seq scan of 17M 92-byte rows completes in 7 secs,
while a seq scan of 79M 35-byte rows takes 149 secs.  It's about a 4:1
row ratio and less than a 2:1 byte ratio, but a 20:1 time ratio.  Perhaps
there's some terrible bloat on filename that's not present on path?  If
the seq scan time on filename were proportionate to path, this plan would
complete about two minutes faster (making it only 6 times slower instead
of 9 :).


--
-Devin



Re: [PERFORM] Poor overall performance unless regular VACUUM FULL

2009-07-15 Thread Scott Carey

On 7/14/09 9:53 PM, David Wilson david.t.wil...@gmail.com wrote:

 On Mon, Jul 13, 2009 at 3:31 PM, Wayne Conrad wa...@databill.com wrote:
 Howdy.  Some months back, when advised on one of these lists that it
 should not be necessary to issue VACUUM FULL/REINDEX DATABASE, we quit
 this nightly maintenance practice.  We've been very happy to not
 have to do that, since it locked the database all night.  Since then,
 however, our database performance has decreased.  The decrease took a
 few weeks to become noticable; perhaps six weeks to become awful.
 
  snip
 
 I did a VACUUM VERBOSE and looked at the statistics at the end; they
 seem to indicated that my max_fsm_pages is large enough to keep track
 of all of the dead rows that are being created (we do a fair amount of
 deleting as well as inserting).  Postgres prints no complaint saying
 we need more slots, and we have more than the number of slots needed
 (if I recall, about twice as many).
 
 What options do I have for restoring performance other than VACUUM
 FULL/REINDEX DATABASE?
 
 
 Do you have autovacuum on, or otherwise replaced your VACUUM FULL with
 regular VACUUM? The symptoms are pretty classically those of table
 bloat. Since it's gotten so out of hand now, a VACUUM FULL/REINDEX is
 probably what you'll need to fix it.

If you go that route, do a REINDEX first.  You probably want to know whether
it is mostly index or table bloat that is the majority of the problem.

Adjusting each table and index FILLFACTOR may also help.

However, if it has bloated this much, you may have some long-lived
transactions that make it hard for postgres to recycle free space.

And as others have said, certain things can cause a lot of bloat that only
CLUSTER or VACUUM FULL will reclaim well -- especially updating all or most
rows in a table, or otherwise doing very large bulk delete or update.
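
One way to spot such long-lived transactions is a query against
pg_stat_activity, sketched here with the 8.3-era column names:

SELECT procpid, usename, xact_start, current_query
  FROM pg_stat_activity
 WHERE xact_start IS NOT NULL
 ORDER BY xact_start
 LIMIT 5;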

 
 Going forward, you need *some* vacuuming strategy. Autovacuum is
 probably best, especially if you're on 8.3. If not autovacuum for some
 reason, you *must* at least do regular vacuums.
 
 Vacuum full/reindex is for fixing the situation you're in now, but a
 regular vacuum strategy should prevent you from getting back into it.
 
 --
 - David T. Wilson
 david.t.wil...@gmail.com
 
 




Re: [PERFORM] cluster index on a table

2009-07-15 Thread Scott Carey
If you have a lot of insert/update/delete activity on a table, fillfactor
can help.
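
A minimal sketch of setting it (the names are placeholders; a lower
fillfactor only affects pages written afterwards, so existing data must be
rewritten, e.g. with CLUSTER, to benefit):

ALTER TABLE my_table SET (fillfactor = 80);
ALTER INDEX my_table_pkey SET (fillfactor = 80);
CLUSTER my_table USING my_table_pkey;  -- 8.3 syntax; rewrites the table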

I don't believe that postgres will try and maintain the table in the cluster 
order however.


On 7/15/09 8:04 AM, Ibrahim Harrani ibrahim.harr...@gmail.com wrote:

Hi,

thanks for your suggestion.
Is there any benefit of setting fillfactor to 70 or 80 on this table?



On Wed, Jun 24, 2009 at 8:42 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 As another poster pointed out, you cluster on ONE index and one index
 only.  However, you can cluster on a multi-column index.





Re: [PERFORM] Poor overall performance unless regular VACUUM FULL

2009-07-15 Thread Scott Marlowe
On Wed, Jul 15, 2009 at 4:03 PM, Wayne Conrad wa...@databill.com wrote:
 On Tue, 14 Jul 2009, Scott Marlowe wrote:

 Are you guys doing anything that could be deemed pathological, like
 full table updates on big tables over and over?  Had an issue last
 year where a dev left a where clause off an update to a field in one
 of our biggest tables and in a few weeks the database was so bloated
 we had to take it offline to fix the problem.  After fixing the
 query.

 I've just audited the source, looking for any updates without where
 clauses.  None jumped out to bite me.

 Almost everything we do happens in transactions which can occasionally
 take 10-20 minutes to complete and span thousands or tens of thousands
 of rows across multiple tables.  Are long-running transactions a
 culprit in table bloat?

 I've also used contrib/pgstattuple to try to identify which of our
 large tables and indices are experiencing bloat.  Here are the
 pgstattuple results for our largest tables:

Ouch, that hurts my eyes :)  Can you show something like table_len,
dead_tuple_percent, free_percent ordered by dead_tuple_percent desc
limit 10, or something like that maybe?

 table_len:          56639488
 tuple_count:        655501
 tuple_len:          53573112
 tuple_percent:      94.59
 dead_tuple_count:   0
 dead_tuple_len:     0
 dead_tuple_percent: 0
 free_space:         251928
 free_percent:       0.44
 table_name:         status
Lots more rows deleted.



Re: [PERFORM] Very big insert/join performance problem (bacula)

2009-07-15 Thread Scott Carey


On 7/15/09 4:56 PM, Devin Ben-Hur dben...@whitepages.com wrote:

 Marc Cousin wrote:
 This mail contains the asked plans :
 Plan 1
 around 1 million records to insert, seq_page_cost 1, random_page_cost 4
 
  ->  Hash  (cost=425486.72..425486.72 rows=16746972 width=92) (actual time=23184.196..23184.196 rows=16732049 loops=1)
        ->  Seq Scan on path  (cost=0.00..425486.72 rows=16746972 width=92) (actual time=0.004..7318.850 rows=16732049 loops=1)

  ->  Hash  (cost=1436976.15..1436976.15 rows=79104615 width=35) (actual time=210831.840..210831.840 rows=79094418 loops=1)
        ->  Seq Scan on filename  (cost=0.00..1436976.15 rows=79104615 width=35) (actual time=46.324..148887.662 rows=79094418 loops=1)
 
 This doesn't address the cost driving plan question, but I think it's a
 bit puzzling that a seq scan of 17M 92-byte rows completes in 7 secs,
 while a seqscan of 79M 35-byte rows takes 149secs.  It's about 4:1 row
 ratio, less than 2:1 byte ratio, but a 20:1 time ratio.  Perhaps there's
 some terrible bloat on filename that's not present on path?  If that seq
 scan time on filename were proportionate to path this plan would
 complete about two minutes faster (making it only 6 times slower instead
 of 9 :).


Bloat is possible.  This can be checked with VACUUM VERBOSE on the table.
Postgres has a habit of getting its table files fragmented too under certain
use cases.
Additionally, some of the table pages may have been cached in one use case
and not in another.
 
 --
 -Devin
 
 




Re: [PERFORM] Repeated Query is much slower in PostgreSQL8.2.4 than DB2 9.1

2009-07-15 Thread ning
Hi Mike,

Thank you for your explanation.
The EXPLAIN ANALYZE command used is as follows; several integers are
bound to '?'.
-
SELECT 
oid_,void,nameId,tag,intval,lowerbound,upperbound,crossfeeddir,feeddir,units,opqval,bigval,strval
FROM (SELECT 
attributeOf,void,nameId,tag,intval,lowerbound,upperbound,crossfeeddir,feeddir,units,opqval,bigval,strval
FROM DenormAttributePerf WHERE attributeof IN (SELECT oid_ FROM
JobView WHERE JobView.JobId=? and JobView.assignedTo_=?) AND nameId in
(?)) x RIGHT OUTER JOIN (SELECT oid_ FROM JobView WHERE
JobView.JobId=? and JobView.assignedTo_=?) y ON attributeof = oid_ FOR
READ ONLY
-

The result of the command is
-

 QUERY PLAN
---
 Nested Loop Left Join  (cost=575.60..1273.15 rows=81 width=568) (actual time=0.018..0.018 rows=0 loops=1)
   Join Filter: (x.attributeof = j1.oid_)
   ->  Index Scan using job_tc1 on job j1  (cost=0.00..8.27 rows=1 width=4) (actual time=0.016..0.016 rows=0 loops=1)
         Index Cond: ((assignedto_ = 888) AND (jobid = 0))
   ->  Merge Left Join  (cost=575.60..899.41 rows=16243 width=564) (never executed)
         Merge Cond: (v.void = b.void)
         ->  Merge Left Join  (cost=470.77..504.87 rows=2152 width=556) (never executed)
               Merge Cond: (v.void = res.void)
               ->  Sort  (cost=373.61..374.39 rows=310 width=544) (never executed)
                     Sort Key: v.void
                     ->  Hash Left Join  (cost=112.07..360.78 rows=310 width=544) (never executed)
                           Hash Cond: (v.void = i.void)
                           ->  Hash Left Join  (cost=65.40..303.17 rows=38 width=540) (never executed)
                                 Hash Cond: (v.void = r.void)
                                 ->  Hash Left Join  (cost=21.42..257.86 rows=5 width=532) (never executed)
                                       Hash Cond: (v.void = s.void)
                                       ->  Nested Loop Left Join  (cost=8.27..244.65 rows=5 width=16) (never executed)
                                             Join Filter: (v.containedin = a.id)
                                             ->  Nested Loop  (cost=8.27..16.57 rows=1 width=12) (never executed)
                                                   ->  HashAggregate  (cost=8.27..8.28 rows=1 width=4) (never executed)
                                                         ->  Index Scan using job_tc1 on job j1  (cost=0.00..8.27 rows=1 width=4) (never executed)
                                                               Index Cond: ((assignedto_ = 888) AND (jobid = 0))
                                                   ->  Index Scan using attribute_tc1 on attribute a  (cost=0.00..8.27 rows=1 width=12) (never executed)
                                                         Index Cond: ((a.attributeof = j1.oid_) AND (a.nameid = 6))
                                             ->  Append  (cost=0.00..137.60 rows=7239 width=12) (never executed)
                                                   ->  Index Scan using attribute_value_i on attribute_value v  (cost=0.00..5.30 rows=9 width=12) (never executed)
                                                         Index Cond: (v.containedin = a.id)
                                                   ->  Seq Scan on string_value v  (cost=0.00..11.40 rows=140 width=12) (never executed)
                                                   ->  Seq Scan on integer_value v  (cost=0.00..26.30 rows=1630 width=12) (never executed)
                                                   ->  Seq Scan on bigint_value v  (cost=0.00..25.10 rows=1510 width=12) (never executed)
                                                   ->  Seq Scan on rangeofint_value v  (cost=0.00..25.10 rows=1510 width=12) (never executed)
                                                   ->  Seq Scan on resolution_value v  (cost=0.00..24.00 rows=1400 width=12) (never executed)
                                                   ->  Seq Scan on opaque_value v  (cost=0.00..20.40 rows=1040 width=12) (never executed)
                                       ->  Hash  (cost=11.40..11.40 rows=140 width=520) (never executed)
                                             ->  Seq Scan on string_value s  (cost=0.00..11.40 rows=140 width=520) (never executed)
                                 ->  Hash  (cost=25.10..25.10 rows=1510 width=12) (never executed)
                                       ->  Seq Scan on rangeofint_value r  (cost=0.00..25.10 rows=1510 width=12) (never executed)
                           ->  Hash  (cost=26.30..26.30 rows=1630 width=8) (never executed)
                                 ->  Seq Scan on integer_value i  (cost=0.00..26.30 rows=1630 width=8) (never executed)
               ->  Sort  (cost=97.16..100.66 rows=1400 width=16) (never executed)
                     Sort Key: res.void

Re: [PERFORM] Repeated Query is much slower in PostgreSQL8.2.4 than DB2 9.1

2009-07-15 Thread ning
Hi Andres,

The log for the test you suggested is as follows on PostgreSQL 8.2.4, but
I cannot find a clue to prove or disprove that PostgreSQL is doing plan
caching.

Best regards,
Ning

-
job=# prepare test_query as SELECT
oid_,void,nameId,tag,intval,lowerbound,upperbound,crossfeeddir,feeddir,units,opqval,bigval,strval
FROM (SELECT 
attributeOf,void,nameId,tag,intval,lowerbound,upperbound,crossfeeddir,feeddir,units,opqval,bigval,strval
FROM DenormAttributePerf WHERE attributeof IN (SELECT oid_ FROM
JobView WHERE JobView.JobId=100 and JobView.assignedTo_=1) AND nameId
in (6)) x RIGHT OUTER JOIN (SELECT oid_ FROM JobView WHERE
JobView.JobId=100 and JobView.assignedTo_=1) y ON attributeof = oid_
FOR READ ONLY
;
PREPARE
job=# execute test_query;
 oid_ | void | nameid | tag | intval | lowerbound | upperbound | crossfeeddir | feeddir | units | opqval | bigval | strval
------+------+--------+-----+--------+------------+------------+--------------+---------+-------+--------+--------+--------
  101 |      |        |     |        |            |            |              |         |       |        |        |
(1 row)

job=# execute test_query;
 oid_ | void | nameid | tag | intval | lowerbound | upperbound | crossfeeddir | feeddir | units | opqval | bigval | strval
------+------+--------+-----+--------+------------+------------+--------------+---------+-------+--------+--------+--------
  101 |      |        |     |        |            |            |              |         |       |        |        |
(1 row)
-

On Thu, Jul 16, 2009 at 7:51 AM, Andres Freund and...@anarazel.de wrote:
 On Wednesday 15 July 2009 10:27:50 Craig Ringer wrote:
 On Wed, 2009-07-15 at 12:10 +0900, ning wrote:
  First execution: PostgreSQL 0.006277 seconds / DB2 0.009028 seconds
  Second execution: PostgreSQL 0.005932 seconds / DB2 0.000332 seconds

 Actually, on second thoughts that looks a lot like DB2 is caching the
 query results and is just returning the cached results when you repeat
 the query.
 Are you sure getting the query *result* is causing the delay? If my faint
 memory serves right, DB2 does plan caching - PG does not.
 To test this theory you could prepare the query and execute it twice.

 Prepare it (in PostgreSQL prepared statements the placeholders are
 $1, $2, ... rather than ?):

 PREPARE test_query (int, int, int) AS
 SELECT void, nameId, tag
 FROM (SELECT attributeOf, void, nameId, tag
       FROM Attr
       WHERE attributeof IN (SELECT oid_ FROM ItemView
                             WHERE ItemView.ItemId = $1)) x
 RIGHT OUTER JOIN (SELECT oid_ FROM ItemView
                   WHERE ItemView.ItemId = $2
                     AND ItemView.assignedTo_ = $3) y
 ON attributeof = oid_ FOR READ ONLY;

 Execute it (the argument values are placeholders):
 EXECUTE test_query(100, 100, 1);
 EXECUTE test_query(100, 100, 1);

 Greetings,

 Andres




Re: [PERFORM] Performance comparison between Postgres and Greenplum

2009-07-15 Thread Greg Smith

On Mon, 13 Jul 2009, Suvankar Roy wrote:


I believe that execution time in greenplum should be less compared to postgres.


Well, first off you don't even mention which PostgreSQL or Greenplum
version you're comparing, which leaves a lot of variables we can't account
for.  Second, you'd need to make sure that the two servers had as close to
identical server parameter configurations as possible to get a fair
comparison (the postgresql.conf file).  Next, you need to make sure the
data has been loaded and analyzed similarly on the two--try using VACUUM
ANALYZE on both systems before running your query, then EXPLAIN ANALYZE
on both setups to get an idea whether they're using the same plan to pull
data from the disk; you may discover there's a radical difference there.
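
Concretely, using the table and query from earlier in the thread, that
would be something like the following on both systems:

VACUUM ANALYZE observation_all;
EXPLAIN ANALYZE SELECT * FROM observation_all ORDER BY so_no, serial_no;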


...and even if you did all that, this still wouldn't be the right place to 
ask about Greenplum's database product.  You'll end up with everyone mad 
at you.  Nobody likes have benchmarks that show their product in a bad 
light published, particularly if they aren't completely fair.  And this 
list is dedicated to talking about the open-source PostgreSQL versions. 
Your question would be more appropriate to throw in Greenplum's direction. 
The list I gave above is by no means even comprehensive--there are plenty
of other ways you can end up doing an unfair comparison here (using
different partitions on the same disk, which usually end up with different
speeds, comes to mind).


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] Performance comparison between Postgres and Greenplum

2009-07-15 Thread Greg Smith

On Wed, 15 Jul 2009, Scott Marlowe wrote:


On Tue, Jul 14, 2009 at 11:33 PM, Suvankar Roy suvankar@tcs.com wrote:


Hi Scott,

This is what I have got -
In Greenplum, version PostgreSQL 8.2.13 (Greenplum Database 3.3.0.1 build 4) on
i686-pc-linux-gnu, compiled by GCC gcc (GCC)



In Postgres, version PostgreSQL 8.3.7, compiled by Visual C++ build 1400
(1 row)


I wouldn't expect 8.2.x to outrun 8.3.x


And you can't directly compare performance of a system running Linux with 
one running Windows, even if they're the same hardware.  Theoretically, 
Linux should have an advantage, but only if you're accounting for a whole 
stack of other variables.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD


Re: [PERFORM] cluster index on a table

2009-07-15 Thread Scott Marlowe
I'd love to see it.

On Wed, Jul 15, 2009 at 8:17 PM, Justin Pitts justinpi...@gmail.com wrote:
 Is there any interest in adding that (continual/automatic cluster
 order maintenance) to a future release?

 On Wed, Jul 15, 2009 at 8:33 PM, Scott Carey sc...@richrelevance.com wrote:
 If you have a lot of insert/update/delete activity on a table fillfactor can
 help.

 I don’t believe that postgres will try and maintain the table in the cluster
 order however.


 On 7/15/09 8:04 AM, Ibrahim Harrani ibrahim.harr...@gmail.com wrote:

 Hi,

 thanks for your suggestion.
 Is there any benefit of setting fillfactor to 70 or 80 on this table?



  On Wed, Jun 24, 2009 at 8:42 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 As another poster pointed out, you cluster on ONE index and one index
 only.  However, you can cluster on a multi-column index.








-- 
When fascism comes to America, it will be intolerance sold as diversity.



[PERFORM] Concurrency issue under very heay loads

2009-07-15 Thread Raji Sridar (raji)
Hi,
 
We use a typical counter within a transaction to generate the order sequence
number and update the next sequence number. This is a simple next counter -
nothing fancy about it.  When multiple clients are concurrently accessing this
table and updating it, under extremely heavy loads in the system (stress
testing), we find that the same order number is being generated for multiple
clients. Could this be a bug? Is there a workaround? Please let me know.
 
Thanks
Raji


Re: [PERFORM] Very big insert/join performance problem (bacula)

2009-07-15 Thread Marc Cousin
On Thursday 16 July 2009 01:56:37, Devin Ben-Hur wrote:
 Marc Cousin wrote:
  This mail contains the asked plans :
  Plan 1
  around 1 million records to insert, seq_page_cost 1, random_page_cost 4
 
   ->  Hash  (cost=425486.72..425486.72 rows=16746972 width=92) (actual time=23184.196..23184.196 rows=16732049 loops=1)
         ->  Seq Scan on path  (cost=0.00..425486.72 rows=16746972 width=92) (actual time=0.004..7318.850 rows=16732049 loops=1)

   ->  Hash  (cost=1436976.15..1436976.15 rows=79104615 width=35) (actual time=210831.840..210831.840 rows=79094418 loops=1)
         ->  Seq Scan on filename  (cost=0.00..1436976.15 rows=79104615 width=35) (actual time=46.324..148887.662 rows=79094418 loops=1)

 This doesn't address the cost driving plan question, but I think it's a
 bit puzzling that a seq scan of 17M 92-byte rows completes in 7 secs,
 while a seqscan of 79M 35-byte rows takes 149secs.  It's about 4:1 row
 ratio, less than 2:1 byte ratio, but a 20:1 time ratio.  Perhaps there's
 some terrible bloat on filename that's not present on path?  If that seq
 scan time on filename were proportionate to path this plan would
 complete about two minutes faster (making it only 6 times slower instead
 of 9 :).
Much simpler than that, I think: a bigger percentage of path is used all
the time than of filename. The database used is the production database,
so there were other insert queries running a few minutes before I got
this plan.

But I'll give it a look today and come back with bloat and cache information 
on these 2 tables.



Re: [PERFORM] Concurrency issue under very heay loads

2009-07-15 Thread ramasubramanian
Hi,
Are you using an automatic sequence increment in the table?
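If not, two standard alternatives, sketched below with hypothetical names,
are a native sequence (concurrency-safe, though it can leave gaps) or a
counter table where the UPDATE takes a row lock so concurrent transactions
serialize:

-- 1) native sequence:
CREATE SEQUENCE order_seq;
SELECT nextval('order_seq');

-- 2) counter table; RETURNING needs 8.2+:
BEGIN;
UPDATE order_counter SET next_val = next_val + 1
 WHERE name = 'order'
RETURNING next_val;
COMMIT;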
  - Original Message - 
  From: Raji Sridar (raji) 
  To: pgsql-gene...@postgresql.org ; pgsql-performance@postgresql.org 
  Sent: Thursday, July 16, 2009 10:29 AM
  Subject: [PERFORM] Concurrency issue under very heay loads


  Hi,

  We use a typical counter within a transaction to generate the order sequence
number and update the next sequence number. This is a simple next counter -
nothing fancy about it.  When multiple clients are concurrently accessing this
table and updating it, under extremely heavy loads in the system (stress
testing), we find that the same order number is being generated for multiple
clients. Could this be a bug? Is there a workaround? Please let me know.

  Thanks
  Raji