[PERFORM] Bad query plan with high-cardinality column

2013-02-28 Thread Alexander Staubo
I have a planner problem that looks like a bug, but I'm not familiar enough 
with how the planner works to say for sure.

This is my schema:

create table comments (
  id serial primary key,
  conversation_id integer,
  created_at timestamp
);
create index comments_conversation_id_index on comments (conversation_id);
create index comments_created_at_index on comments (created_at);

The table has 3.5M rows, and 650k unique values for conversation_id, where 
the histogram goes up to 54000 rows for the most frequent ID, with a long tail. 
There are only 20 values with a frequency of 1000 or higher. The created_at 
column has 3.5M distinct values.

Now, I have this query:

select comments.id from comments where
  conversation_id = 3975979 order by created_at limit 13


This filters about 5000 rows and returns the oldest 13 rows. But the query is 
consistently planned wrong:

Limit  (cost=0.00..830.53 rows=13 width=12) (actual time=3174.862..3179.525 rows=13 loops=1)
  Buffers: shared hit=2400709 read=338923 written=21
  ->  Index Scan using comments_created_at_index on comments  (cost=0.00..359938.52 rows=5634 width=12) (actual time=3174.860..3179.518 rows=13 loops=1)
        Filter: (conversation_id = 3975979)
        Rows Removed by Filter: 2817751
        Buffers: shared hit=2400709 read=338923 written=21
Total runtime: 3179.553 ms

It takes anywhere between 3 seconds and several minutes to run, depending on 
how warm the disk cache is. This is the correct plan and index usage:

Limit  (cost=6214.34..6214.38 rows=13 width=12) (actual time=25.471..25.473 rows=13 loops=1)
  Buffers: shared hit=197 read=4510
  ->  Sort  (cost=6214.34..6228.02 rows=5471 width=12) (actual time=25.469..25.470 rows=13 loops=1)
        Sort Key: created_at
        Sort Method: top-N heapsort  Memory: 25kB
        Buffers: shared hit=197 read=4510
        ->  Index Scan using comments_conversation_id_index on comments  (cost=0.00..6085.76 rows=5471 width=12) (actual time=1.163..23.955 rows=5834 loops=1)
              Index Cond: (conversation_id = 3975979)
              Buffers: shared hit=197 read=4510
Total runtime: 25.500 ms


Now, the problem for Postgres is obviously to estimate how many rows have a 
given conversation_id value, but it does have that number. I'm at a loss to 
explain why it thinks scanning a huge index that covers the entire table 
will ever beat scanning a small index that has 17% of the table's values.

It will consistently use the bad plan for higher-frequency values, and the good 
plan for lower-frequency values.

If I run ANALYZE repeatedly, the planner will sometimes, oddly enough, choose 
the correct plan. This behaviour actually seems related to 
effective_cache_size; if it's set small (128MB), the planner will sometimes 
favour the good plan, but if large (2GB) it will consistently use the bad plan.

I have bumped the statistics target up to 1, but it does not help. I have 
also tried setting n_distinct for the column manually, since Postgres guesses 
it's 285k instead of 650k, but that does not help.
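
(For reference, a minimal sketch of how that n_distinct override can be applied, assuming PostgreSQL 9.0 or later for the per-column attribute option; the 650k figure is the estimate mentioned above:)

   -- Override the planner's n_distinct estimate for the column, then
   -- re-analyze so the override is picked up.
   alter table comments alter column conversation_id set (n_distinct = 650000);
   analyze comments;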

What is odd is that the bad plan is really never correct in any situation for 
this query. It will *always* be better to branch off the 
comments_conversation_id_index index.

Our environment: 9.2.2, tweaked memory parameters (work_mem, 

[PERFORM] Bad query plan with high-cardinality column

2013-02-22 Thread Alexander Staubo
I have a problem with a query that is planned wrong. This is my schema:

   create table comments (
 id serial primary key,
 conversation_id integer,
 created_at timestamp
   );
   create index comments_conversation_id_index on comments (conversation_id);
   create index comments_created_at_index on comments (created_at);

The table has 3.5M rows, and 650k unique values for conversation_id, where 
the histogram goes up to 54000 rows for the most frequent ID, with a long tail. 
There are only 20 values with a frequency of 1000 or higher. The created_at 
column has 3.5M distinct values.

Now, I have this query:

   select comments.id from comments where
 conversation_id = 3975979 order by created_at limit 13

This filters about 5000 rows and returns the oldest 13 rows. But the query is 
consistently planned wrong:

   Limit  (cost=0.00..830.53 rows=13 width=12) (actual time=3174.862..3179.525 rows=13 loops=1)
     Buffers: shared hit=2400709 read=338923 written=21
     ->  Index Scan using comments_created_at_index on comments  (cost=0.00..359938.52 rows=5634 width=12) (actual time=3174.860..3179.518 rows=13 loops=1)
           Filter: (conversation_id = 3975979)
           Rows Removed by Filter: 2817751
           Buffers: shared hit=2400709 read=338923 written=21
   Total runtime: 3179.553 ms

It takes anywhere between 3 seconds and several minutes to run, depending on 
how warm the disk cache is. This is the correct plan and index usage:

   Limit  (cost=6214.34..6214.38 rows=13 width=12) (actual time=25.471..25.473 rows=13 loops=1)
     Buffers: shared hit=197 read=4510
     ->  Sort  (cost=6214.34..6228.02 rows=5471 width=12) (actual time=25.469..25.470 rows=13 loops=1)
           Sort Key: created_at
           Sort Method: top-N heapsort  Memory: 25kB
           Buffers: shared hit=197 read=4510
           ->  Index Scan using comments_conversation_id_index on comments  (cost=0.00..6085.76 rows=5471 width=12) (actual time=1.163..23.955 rows=5834 loops=1)
                 Index Cond: (conversation_id = 3975979)
                 Buffers: shared hit=197 read=4510
   Total runtime: 25.500 ms


The problem for Postgres is obviously to estimate how many rows have a given 
conversation_id value, but I have confirmed that the value is correctly 
tracked in the histogram.

I'm at a loss to explain why the planner thinks scanning a huge index that 
covers the entire table will ever beat scanning a small index that has 17% of 
the table's values. Even if the entire database were in RAM, this would touch 
far too many buffers unnecessarily. (I have determined that the planner will 
consistently use the bad plan for higher-frequency values, and the good plan 
for lower-frequency values.) It will *always* be better to branch off the 
comments_conversation_id_index index.

Another curious thing: If I run ANALYZE repeatedly, the planner will sometimes, 
oddly enough, choose the correct plan. This behaviour actually seems related to 
effective_cache_size; if it's set small (128MB), the planner will sometimes 
favour the good plan, but if large (>= 2GB) it will consistently use the bad 
plan. Not sure if ANALYZE is changing anything or if it's just bad timing.

Things I have tried: I have bumped the statistics target up to 1, but it 
does not help. I have also tried setting n_distinct for the column manually, 
since Postgres guesses it's 285k instead of 650k, but 

Re: [PERFORM] Bad query plan with high-cardinality column

2013-02-22 Thread Alexander Staubo
On Friday, February 22, 2013 at 21:33 , Tom Lane wrote:
 The reason is that the LIMIT may stop the query before it's scanned all
 of the index. The planner estimates on the assumption that the desired
 rows are roughly uniformly distributed within the created_at index, and
 on that assumption, it looks like this query will stop fairly soon ...
 but evidently, that's wrong. On the other hand, it knows quite well
 that the other plan will require pulling out 5000-some rows and then
 sorting them before it can return anything, so that's not going to be
 exactly instantaneous either.
 
 In this example, I'll bet that conversation_id and created_at are pretty
 strongly correlated, and that most or all of the rows with that specific
 conversation_id are quite far down the created_at ordering, so that the
 search through the index takes a long time to run. OTOH, with another
 conversation_id the same plan might run almost instantaneously.


That's right. So I created a composite index, and not only does this make the 
plan correct, but the planner now chooses a much more efficient plan than the 
previous index that indexed only on conversation_id:

Limit  (cost=0.00..30.80 rows=13 width=12) (actual time=0.042..0.058 rows=13 loops=1)
  Buffers: shared hit=8
  ->  Index Scan using index_comments_on_conversation_id_and_created_at on comments  (cost=0.00..14127.83 rows=5964 width=12) (actual time=0.039..0.054 rows=13 loops=1)
        Index Cond: (conversation_id = 3975979)
        Buffers: shared hit=8
Total runtime: 0.094 ms


Is this because it can get the value of created_at from the index, or because 
it knows that the index is pre-sorted, or both?

Very impressed that Postgres can use a multi-column index for this. I just 
assumed, wrongly, that it couldn't. I will have to go review my other tables 
now and see if they can benefit from multi-column indexes.
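
(For reference, a sketch of the composite index implied by the plan above; the index name is taken from the plan, and the column order matters: the equality column first, then the sort column, so the scan returns rows already in created_at order and no sort is needed:)

   create index index_comments_on_conversation_id_and_created_at
     on comments (conversation_id, created_at);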

Thanks!




Re: [PERFORM] Bad query plan with high-cardinality column

2013-02-22 Thread Alexander Staubo
On Friday, February 22, 2013 at 21:47 , Kevin Grittner wrote:
 I suspect you would be better off without those two indexes, and
 instead having an index on (conversation_id, created_at). Not just
 for the query you show, but in general.


Indeed, that solved it, thanks!
 


 In my experience these problems come largely from the planner not
 knowing the cost of dealing with each tuple. I see a lot less of
 this if I raise cpu_tuple_cost to something in the 0.03 to 0.05
 range.


Is this something I can just frob a bit without worrying about it adversely 
impacting database performance across the board, or should I be very careful 
and do lots of testing on a staging box first?
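
(A minimal sketch of trying that without committing to it globally: the setting can be changed per session, or per transaction with SET LOCAL, before putting it in postgresql.conf. The query is the one from this thread:)

   -- Test a higher cpu_tuple_cost for this session only.
   set cpu_tuple_cost = 0.05;
   explain analyze
   select comments.id from comments
     where conversation_id = 3975979
     order by created_at limit 13;
   reset cpu_tuple_cost;  -- back to the server default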




Re: [PERFORM] Sorting by an arbitrary criterion

2009-07-09 Thread Alexander Staubo
On Thu, Jul 9, 2009 at 6:26 PM, Craig James craig_ja...@emolecules.com wrote:
 Suppose I have a large table with a small-cardinality CATEGORY column (say,
 categories 1..5).  I need to sort by an arbitrary (i.e. user-specified)
 mapping of CATEGORY, something like this:

  1 => 'z'
  2 => 'a'
  3 => 'b'
  4 => 'w'
  5 => 'h'

 So when I get done, the sort order should be 2,3,5,4,1.

If the object is to avoid a separate table, you can do it with a
case statement:

  select ... from ...
  order by case category
             when 1 then 'z'
             when 2 then 'a'
             when 3 then 'b'
             when 4 then 'w'
             when 5 then 'h'
           end

If this sounds slow, you're right. But it might perform well
enough for your use case.
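
(A self-contained sketch of the same idea, with a hypothetical items table; only the ORDER BY expression matters here:)

   create table items (
     id serial primary key,
     category integer not null
   );

   select id, category
   from items
   order by case category
              when 1 then 'z'
              when 2 then 'a'
              when 3 then 'b'
              when 4 then 'w'
              when 5 then 'h'
            end;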

A.



Re: [PERFORM] How would you store read/unread topic status?

2009-06-23 Thread Alexander Staubo
On Tue, Jun 23, 2009 at 1:12 PM, Mathieu Nebra mate...@siteduzero.com wrote:
 This flags table has more or less the following fields:

 UserID - TopicID - LastReadAnswerID

We are doing pretty much the same thing.

 My problem is that everytime a user READS a topic, it UPDATES this flags
 table to remember he has read it. This leads to multiple updates at the
 same time on the same table, and an update can take a few seconds. This
 is not acceptable for my users.

First of all, and I'm sure you thought of this, an update isn't needed
every time a user reads a topic; only when there are new answers that
need to be marked as read. So an update ... where last_read_answer_id
< ? should avoid the need for an update.
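
(A sketch of that conditional update, using the field names from the description above; "flags" is a hypothetical table name:)

   -- Only touch the row when there is actually something new to record;
   -- if last_read_answer_id is already up to date, no row matches and no
   -- dead tuple is created.
   update flags
   set last_read_answer_id = $3
   where user_id = $1
     and topic_id = $2
     and last_read_answer_id < $3;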

(That said, I believe PostgreSQL diffs tuple updates, so in practice
PostgreSQL might not be writing anything if you run an update with
the same value. I will let someone more intimate with the internal
details of updates comment on this.)

Secondly, an update should not take a few seconds. You might want to
investigate this part before you turn to further optimizations.

In our application we defer the updates to a separate asynchronous
process using a simple queue mechanism, but in our case, we found that
the updates are fast enough (in the order of a few milliseconds) not
to warrant batching them into single transactions.

A.



Re: [PERFORM] How would you store read/unread topic status?

2009-06-23 Thread Alexander Staubo
On Tue, Jun 23, 2009 at 1:12 PM, Mathieu Nebra mate...@siteduzero.com wrote:
 The flags table keeps track of every topic a member has visited and
 remembers the last answer which was posted at this moment. It allows the
 user to come back a few days after and immediately jump to the last
 answer he has not read.

I forgot to mention that we speed up our queries by caching the last
read ID in Memcached. This is the kind of thing that Memcached is
ideal for.

For example, we show the list of the most recent posts, along with a
comment count, eg. 42 comments (6 new). We found that joining posts
against the last-read table is expensive, so instead we read from
Memcached on every post to find the number of unread comments.

We use the thread's last commented at timestamp as part of the key
so that when somebody posts a new comment, every user's cached unread
count is invalidated; it is automatically recalculated the next time
they view the post.

A.



[PERFORM] Yet another slow nested loop

2009-06-16 Thread Alexander Staubo
Here's the query:

select photos.*
from photos
inner join event_participations on
  event_participations.user_id = photos.creator_id and
  event_participations.attend = true
inner join event_instances on
  event_instances.id = event_participations.event_instance_id
where (
  (event_instances.venue_id = 1290) and
  (photos.taken_at > (event_instances.time + interval '-3600 seconds')) and
  (photos.taken_at < (event_instances.time + interval '25200 seconds'))
)
order by taken_at desc
limit 20

It occasionally takes four minutes to run:


                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..10997.65 rows=20 width=116) (actual time=262614.474..262614.474 rows=0 loops=1)
   ->  Nested Loop  (cost=0.00..5729774.95 rows=10420 width=116) (actual time=262614.470..262614.470 rows=0 loops=1)
         Join Filter: ((photos.taken_at > (event_instances.time + '-01:00:00'::interval)) AND (photos.taken_at < (event_instances.time + '07:00:00'::interval)))
         ->  Nested Loop  (cost=0.00..2055574.35 rows=11869630 width=120) (actual time=21.750..121838.012 rows=14013998 loops=1)
               ->  Index Scan Backward using photos_taken_at on photos  (cost=0.00..40924.34 rows=544171 width=116) (actual time=14.997..1357.724 rows=544171 loops=1)
               ->  Index Scan using event_participations_user_id_index on event_participations  (cost=0.00..2.95 rows=60 width=8) (actual time=0.007..0.159 rows=26 loops=544171)
                     Index Cond: (event_participations.user_id = photos.creator_id)
                     Filter: event_participations.attend
         ->  Index Scan using event_instances_pkey on event_instances  (cost=0.00..0.29 rows=1 width=12) (actual time=0.008..0.008 rows=0 loops=14013998)
               Index Cond: (event_instances.id = event_participations.event_instance_id)
               Filter: (event_instances.venue_id = 1290)
 Total runtime: 262614.585 ms

With enable_nestloop set to false, it takes about 1 second to run.

Database is freshly analyzed and vacuumed. Default statistics target
is 100. I have tried increasing the stats on
event_participations.user_id, event_participations.event_instance_id
and photos.taken_at to 1000, but no improvement.
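
(For reference, a sketch of how those per-column targets are presumably raised -- per-column SET STATISTICS followed by a fresh ANALYZE:)

   alter table event_participations alter column user_id set statistics 1000;
   alter table event_participations alter column event_instance_id set statistics 1000;
   alter table photos alter column taken_at set statistics 1000;
   analyze event_participations;
   analyze photos;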

This is PostgreSQL 8.3.3.

A.



Re: [PERFORM] Yet another slow nested loop

2009-06-16 Thread Alexander Staubo
On Tue, Jun 16, 2009 at 3:56 PM, Dave Dutcher d...@tridecap.com wrote:
 -Original Message-
 From: Alexander Staubo

    ->  Nested Loop  (cost=0.00..5729774.95 rows=10420 width=116) (actual time=262614.470..262614.470 rows=0 loops=1)
          Join Filter: ((photos.taken_at > (event_instances.time + '-01:00:00'::interval)) AND (photos.taken_at < (event_instances.time + '07:00:00'::interval)))
          ->  Nested Loop  (cost=0.00..2055574.35 rows=11869630 width=120) (actual time=21.750..121838.012 rows=14013998 loops=1)


 Do you have any of the other enable_* options set to false?

No.

 What do you
 have random_page_cost set to?  I ask because I'm surprised to see postgres
 choose to loop when it knows it will have to loop 11 million times.

The default, ie. 4.0.

A.



Re: [PERFORM] Yet another slow nested loop

2009-06-16 Thread Alexander Staubo
On Tue, Jun 16, 2009 at 4:36 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Actually the easiest way to fix that is to get rid of the LIMIT.
 (Maybe use a cursor instead, and fetch only twenty rows.)  LIMIT
 magnifies the risks from any estimation error, and you've got a lot
 of that here ...

There's no cursor support in ActiveRecord, the ORM library we use, and
I'm not going to write it. Anyway, I would prefer not to gloss over
the underlying problem with something that requires a TODO next to
it. What can be done to fix the underlying problem? Nothing?

A.



Re: [PERFORM] Storing sensor data

2009-05-28 Thread Alexander Staubo
On Thu, May 28, 2009 at 2:54 PM, Ivan Voras ivo...@freebsd.org wrote:
 The volume of sensor data is potentially huge, on the order of 500,000
 updates per hour. Sensor data is few numeric(15,5) numbers.

The size of that dataset, combined with the apparent simplicity of
your schema and the apparent requirement for most-sequential access
(I'm guessing about the latter two), all lead me to suspect you would
be happier with something other than a traditional relational
database.

I don't know how exact your historical data has to be. Could you get
by with something like RRDTool? RRDTool is a round-robin database that
stores multiple levels of historical values aggregated by function. So
you typically create an average database, a max database and so
on, with the appropriate functions to transform the data, and you
subdivide these into day, month, year and so on, by the granularity of
your choice.

When you store a value, the historical data is aggregated
appropriately -- at appropriate levels of granularity, so the current
day database is more precise than the monthly one, and so on -- and
you always have access to the exact current data. RRDTool is used by
software such as Munin and Cacti that track a huge number of readings
over time for graphing.

If you require precise data with the ability to filter, aggregate and
correlate over multiple dimensions, something like Hadoop -- or one of
the Hadoop-based column database implementations, such as HBase or
Hypertable -- might be a better option, combined with MapReduce/Pig to
execute analysis jobs.

A.



Re: [PERFORM] Storing sensor data

2009-05-28 Thread Alexander Staubo
On Thu, May 28, 2009 at 5:06 PM, Ivan Voras ivo...@freebsd.org wrote:
 If you require precise data with the ability to filter, aggregate and
 correlate over multiple dimensions, something like Hadoop -- or one of
 the Hadoop-based column database implementations, such as HBase or
 Hypertable -- might be a better option, combined with MapReduce/Pig to
 execute analysis jobs

 This looks like an interesting idea to investigate. Do you have more
 experience with such databases? How do they fare with the following
 requirements:

We might want to take this discussion off-list, since this list is
about PostgreSQL. Feel free to reply privately.

 * Storing large datasets (do they pack data well in the database? No
 wasted space like in e.g. hash tables?)

Column databases like Hypertable and HBase are designed to store data
quite efficiently. Each column is grouped in a unit called a column
family and stored together in chunks usually called SSTables, after
the Google Bigtable paper. (When you design your database you must
determine which columns are usually accessed together, in order to
avoid incurring the I/O cost of loading non-pertinent columns.) Each
SSTable is like a partition. When storing a chunk to disk, the column
is compressed, each column being stored sequentially for optimal
compression.

I have used HBase, but I don't have any feel for how much space it
wastes. In theory, though, space usage should be more optimal than
with PostgreSQL. I have used Cassandra, another column database I
would also recommend, which is very efficient. In many ways I prefer
Cassandra to HBase -- it's leaner, completely decentralized (no single
point of failure) and independent of the rather huge, monolithic
Hadoop project -- but it does not currently support MapReduce. If you
want to implement some kind of distributed analysis system, you will
need to write it yourself.

All three column stores support mapping information by a time
dimension. Each time you write a key, you also provide a timestamp. In
theory you can retain the entire history of a single key. HBase lets
you specify how many revisions to retain; not sure what Cassandra
does. However, Cassandra introduces the notion of a supercolumn
family, another grouping level which lets you use the timestamp as a
column key. To explain how this works, consider the following inserts:

  # insert(table_name, key, column, value, timestamp)
  db.insert(readings, temperature_sensor, value:1, 23, 200905281725023)
  db.insert(readings, temperature_sensor, value:2, 27, 200905281725023)
  db.insert(readings, temperature_sensor, value:3, 21, 200905281725023)

The resulting temperature_sensor row will have three column values:

  value:1   value:2   value:3
  23        27        21

You can keep adding values and the row will get bigger. Because
columns are dynamic, only that row will grow; all other rows will stay
the same size. Cassandra users usually use the column name as a kind
of value -- imagine it's like subindexing an array.

As you can see, I also passed a timestamp (the 2009.. bit), which is
used for versioning. Since anyone can write to any node in a cluster,
Cassandra needs to be able to resolve conflicts.

Note that these databases are inherently distributed. You can run them
on a single node just fine -- and that might be appropriate in your
case -- but they really shine when you run a whole cluster. Cassandra
is multi-master, so you can just boot up a number of nodes and read
from/write to any of them.

 * Retrieving specific random records based on a timestamp or record ID?

Absolutely.

 * Storing infinite datasets (i.e. whose size is not known in
 advance - cf. e.g. hash tables)

This is one area where column databases are better than relational
ones. The schema is completely dynamic, and you can treat it as a hash
table.

 On the other hand, we could periodically transfer data from PostgreSQL
 into a simpler database (e.g. BDB) for archival purposes (at the
 expense of more code). Would they be better suited?

Considering the size and sequential nature of the data, I think they
would be a better match than a simple key-value store like BDB.

A.



Re: [PERFORM] Bad plan for nested loop + limit

2009-03-30 Thread Alexander Staubo
On Sun, Mar 1, 2009 at 4:32 AM, Robert Haas robertmh...@gmail.com wrote:
 What do you have default_statistics_target set to?  If it's less than
 100, you should probably raise it to 100 and re-analyze (the default
 value for 8.4 will be 100, but for 8.3 and prior it is 10).

Changing it to 100 fixed the problem. Thanks for alerting me to the
existence of default_statistics_target.

Alexander.



Re: [PERFORM] Bad plan for nested loop + limit

2009-02-28 Thread Alexander Staubo
On Fri, Feb 27, 2009 at 11:54 PM, Robert Haas robertmh...@gmail.com wrote:
 The problem here is that the planner estimates the cost of a Limit
 plan node by adding up (1) the startup cost of the underlying plan
 node, in this case 0 for the nestjoin, and (2) a percentage of the run
 cost, based on the ratio of the number of rows expected to be returned
 to the total number of rows.  In this case, the nested loop is
 expected to return 6944 rows, so it figures it won't have to get very
 far to find the 4 you requested.
[...]
 I will think about this some more but nothing is occurring to me off
 the top of my head.

Thanks for explaining. Is there any way to rewrite the query in a way
that will avoid the nested loop join -- other than actually disabling
nested loop joins? If I do the latter, the resulting query uses a hash
join and completes in 80-100 ms, which is still pretty horrible,
especially for a query that returns nothing, but extremely auspicious
compared to the unthinkable 4-5 seconds for the current query.
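
(One way to at least contain the workaround -- hedged, since it still papers over the estimation problem -- is to scope the setting to this one query inside a transaction with SET LOCAL, e.g.:)

   begin;
   set local enable_nestloop = off;  -- reverts automatically at commit/rollback
   select event_occurrences.*
   from event_occurrences
   join section_items on section_items.subject_id = event_occurrences.event_id
     and section_items.subject_type = 'Event'
     and section_items.sandbox_id = 16399
   where event_occurrences.start_time > '2009-02-14 18:15:14.739411 +0100'
   order by event_occurrences.start_time
   limit 4;
   commit;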

Alexander.



Re: [PERFORM] Bad plan for nested loop + limit

2009-02-27 Thread Alexander Staubo
On Sun, Feb 15, 2009 at 5:45 PM, Alexander Staubo a...@bengler.no wrote:
 On Sun, Feb 15, 2009 at 5:29 AM, David Wilson david.t.wil...@gmail.com 
 wrote:
 On Sat, Feb 14, 2009 at 5:25 PM, Alexander Staubo a...@bengler.no wrote:

 Output from explain analyze:

  Limit  (cost=0.00..973.63 rows=4 width=48) (actual time=61.554..4039.704 rows=1 loops=1)
    ->  Nested Loop  (cost=0.00..70101.65 rows=288 width=48) (actual time=61.552..4039.700 rows=1 loops=1)
          ->  Nested Loop  (cost=0.00..68247.77 rows=297 width=52) (actual time=61.535..4039.682 rows=1 loops=1)

 Those estimates are pretty far off. Did you try increasing the
 statistics target? Also, is the first query repeatable (that is, is it
 already in cache when you do the test, or alternately, are all queries
 *out* of cache when you test?)

All right, this query keeps coming back to bite me. If this part of the join:

  ... and section_items.sandbox_id = 16399

yields a sufficiently large number of matches, then performance goes
'boink', like so:

 Limit  (cost=0.00..34.86 rows=4 width=48) (actual time=4348.696..4348.696 rows=0 loops=1)
   ->  Nested Loop  (cost=0.00..60521.56 rows=6944 width=48) (actual time=4348.695..4348.695 rows=0 loops=1)
         ->  Index Scan using index_event_occurrences_on_start_time on event_occurrences  (cost=0.00..11965.38 rows=145712 width=48) (actual time=0.093..138.029 rows=145108 loops=1)
               Index Cond: (start_time > '2009-02-27 18:01:14.739411+01'::timestamp with time zone)
         ->  Index Scan using index_section_items_on_subject_type_and_subject_id on section_items  (cost=0.00..0.32 rows=1 width=4) (actual time=0.029..0.029 rows=0 loops=145108)
               Index Cond: (((section_items.subject_type)::text = 'Event'::text) AND (section_items.subject_id = event_occurrences.event_id))
               Filter: (section_items.sandbox_id = 9)
 Total runtime: 4348.777 ms

In this case:

# select count(*) from section_items where sandbox_id = 9;
 count
---
  3126

If I remove the start_time > ... clause, performance is fine. Upping
the statistics setting on any of the columns involved seems to have no
effect.

Is this a pathological border case, or is there something I can do to
*generally* make this query run fast? Keep in mind that the query
itself returns no rows at all. I want to avoid doing an initial
select count(...) just to avoid the bad plan. Suffice it to say, having
a web request take 5 seconds is asking too much from our users.

Alexander.



Re: [PERFORM] I/O increase after upgrading to 8.3.5

2009-02-16 Thread Alexander Staubo
On Sun, Feb 15, 2009 at 6:35 PM, Greg Smith gsm...@gregsmith.com wrote:
 http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm goes over
 this topic, with Appendix B: pg_stat_bgwriter sample analysis covering a
 look at what to do based on a pg_stat_bgwriter snapshot.

Wonderful, thank you.

Alexander.



Re: [PERFORM] Bad plan for nested loop + limit

2009-02-15 Thread Alexander Staubo
On Sun, Feb 15, 2009 at 5:29 AM, David Wilson david.t.wil...@gmail.com wrote:
 On Sat, Feb 14, 2009 at 5:25 PM, Alexander Staubo a...@bengler.no wrote:

 Output from explain analyze:

  Limit  (cost=0.00..973.63 rows=4 width=48) (actual time=61.554..4039.704 rows=1 loops=1)
    ->  Nested Loop  (cost=0.00..70101.65 rows=288 width=48) (actual time=61.552..4039.700 rows=1 loops=1)
          ->  Nested Loop  (cost=0.00..68247.77 rows=297 width=52) (actual time=61.535..4039.682 rows=1 loops=1)

 Those estimates are pretty far off. Did you try increasing the
 statistics target? Also, is the first query repeatable (that is, is it
 already in cache when you do the test, or alternately, are all queries
 *out* of cache when you test?)

All in the cache when I do the test. Ok, so upping the statistics to
100 on section_items.subject_id fixed it:

 Limit  (cost=3530.95..3530.96 rows=4 width=48) (actual time=0.107..0.107 rows=1 loops=1)
   ->  Sort  (cost=3530.95..3531.12 rows=66 width=48) (actual time=0.106..0.106 rows=1 loops=1)
         Sort Key: event_occurrences.start_time
         Sort Method:  quicksort  Memory: 25kB
         ->  Nested Loop  (cost=0.00..3529.96 rows=66 width=48) (actual time=0.098..0.100 rows=1 loops=1)
               ->  Index Scan using index_section_items_on_sandbox_id on section_items  (cost=0.00..104.29 rows=22 width=4) (actual time=0.017..0.033 rows=7 loops=1)
                     Index Cond: (sandbox_id = 16399)
                     Filter: ((subject_type)::text = 'Event'::text)
               ->  Index Scan using index_event_occurrences_on_event_id on event_occurrences  (cost=0.00..154.79 rows=74 width=48) (actual time=0.008..0.008 rows=0 loops=7)
                     Index Cond: (event_occurrences.event_id = section_items.subject_id)
                     Filter: (event_occurrences.start_time > '2009-02-14 18:15:14.739411+01'::timestamp with time zone)
 Total runtime: 0.142 ms

Thanks.

Alexander.



Re: [PERFORM] I/O increase after upgrading to 8.3.5

2009-02-14 Thread Alexander Staubo
On Fri, Feb 13, 2009 at 6:35 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 You should definitely set effective_cache_size.

Wow -- I set this to 10GB (5GB for shared buffers + another 5GB for
cache), and today's average write frequency went from 20MB/sec to just
1MB/sec. The documentation suggests that effective_cache_size is only
used for query planning in conjunction with indexes. So how come it
affects write access?

 If you still see the problem after that, I suggest testing different
 settings for:

 bgwriter_lru_maxpages
 bgwriter_lru_multiplier
 checkpoint_segments
 checkpoint_timeout
 checkpoint_completion_target

 Both the checkpoint process and the background writer changed quite a
 bit, and these are the main knobs for tuning the new version.

We are hoping to set up a duplicate instance and play back the SQL log
against it so we can experiment with different settings. Until we have
such a setup, I'm not sure what to do with the knobs other than frob
them wildly. :-) Are there any statistics, either in PostgreSQL proper
or in the OS, that I can use as metrics to guide the tuning? For
example, is there anything in pg_stat_bgwriter that can help me tune
the bgwriter_lru_* settings?
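
(As a starting point, a sketch of the kind of snapshot that can guide this; pg_stat_bgwriter is the view mentioned above, and what matters is the delta between two snapshots taken a known interval apart:)

   -- A high buffers_backend relative to buffers_clean suggests the background
   -- writer is not keeping up (e.g. raise bgwriter_lru_maxpages); many
   -- checkpoints_req relative to checkpoints_timed suggests raising
   -- checkpoint_segments.
   select checkpoints_timed, checkpoints_req, buffers_checkpoint,
          buffers_clean, maxwritten_clean, buffers_backend, buffers_alloc,
          now() as snapshot_taken_at
   from pg_stat_bgwriter;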

Do transactions that only contain query statements end up writing
entries to the WAL when they commit? If yes, can we avoid the writes
by wrapping our queries in read only transactions, and would it be
worth the effort?

Our web application is handling 30 requests per second at peak time;
each request is performing dozens of queries in autocommit mode, ie. one
transaction per query. Only a minority of those requests actually end
up modifying the database. PostgreSQL is committing and fsyncing
600-800 transactions per second, so that's probably a good chunk of
disk/CPU usage wasted, right?

Alexander.



Re: [PERFORM] I/O increase after upgrading to 8.3.5

2009-02-14 Thread Alexander Staubo
On Sat, Feb 14, 2009 at 9:49 AM, Craig Ringer
cr...@postnewspapers.com.au wrote:
 Is there any chance you had pg_xlog stored separately on your old database,
 and I/O for it wasn't being recorded?

No, the database files have always been on a single volume.

Alexander.



Re: [PERFORM] I/O increase after upgrading to 8.3.5

2009-02-14 Thread Alexander Staubo
On Sat, Feb 14, 2009 at 8:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Alexander Staubo a...@bengler.no writes:
 Wow -- I set this to 10GB (5GB for shared buffers + another 5GB for
 cache), and today's average write frequency went from 20MB/sec to just
 1MB/sec. The documentation suggests that effective_cache_size is only
 used for query planning in conjunction with indexes. So how come it
 affects write access?

 It *is* only used for query planning.  A plausible theory is that you
 caused some join queries to change from hash or merge joining involving
 a temporary hash or sort file to an index nestloop that doesn't use any
 temporary storage.  If the required hash or sort storage exceeded
 work_mem, which you have set to just 10MB, that would have created some
 write traffic.

Interesting. Is there any statistic available that can tell me whether
work_mem is being exceeded? The tools to monitor exactly what
PostgreSQL is doing -- especially on a production box -- are fairly
limited, especially since Linux does not yet have anything close to
DTrace in functionality.
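
(One low-tech check -- hedged, since it only covers statements you run by hand -- is EXPLAIN ANALYZE: a sort or hash that fits in work_mem reports an in-memory method, while one that spills reports an external/disk method. The table and column names below are only illustrative:)

   set work_mem = '10MB';
   explain analyze
   select * from some_large_table order by some_column;
   -- "Sort Method: quicksort  Memory: ..." means the sort fit in work_mem;
   -- "Sort Method: external merge  Disk: ..." means it spilled to temp files.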

 Did you happen to notice whether your queries got faster or slower when
 you did this?  Watching only aggregate write traffic is a pretty limited
 view of what is happening in your database.

Unfortunately we don't log SQL queries at the moment. We do log
application response times, but they look roughly the same as before
the change. I could revert the effective_cache_size setting, turn on
SQL logging for a while, then reapply the change and compare.

Alexander.



[PERFORM] Bad plan for nested loop + limit

2009-02-14 Thread Alexander Staubo
This dinky little query takes about 4 seconds to run:

  select event_occurrences.*
  from event_occurrences
  join section_items on section_items.subject_id = event_occurrences.event_id
and section_items.subject_type = 'Event'
and section_items.sandbox_id = 16399
  where event_occurrences.start_time > '2009-02-14 18:15:14.739411 +0100'
  order by event_occurrences.start_time
  limit 4;

Output from explain analyze:

 Limit  (cost=0.00..973.63 rows=4 width=48) (actual time=61.554..4039.704 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..70101.65 rows=288 width=48) (actual time=61.552..4039.700 rows=1 loops=1)
         ->  Nested Loop  (cost=0.00..68247.77 rows=297 width=52) (actual time=61.535..4039.682 rows=1 loops=1)
               ->  Index Scan using index_event_occurrences_on_start_time on event_occurrences  (cost=0.00..13975.01 rows=159718 width=48) (actual time=0.024..398.152 rows=155197 loops=1)
                     Index Cond: (start_time > '2009-02-14 18:15:14.739411+01'::timestamp with time zone)
               ->  Index Scan using index_section_items_on_subject_type_and_subject_id on section_items  (cost=0.00..0.33 rows=1 width=4) (actual time=0.023..0.023 rows=0 loops=155197)
                     Index Cond: (((section_items.subject_type)::text = 'Event'::text) AND (section_items.subject_id = event_occurrences.event_id))
                     Filter: (section_items.sandbox_id = 16399)
         ->  Index Scan using event_instances_pkey on events  (cost=0.00..6.23 rows=1 width=4) (actual time=0.014..0.015 rows=1 loops=1)
               Index Cond: (events.id = event_occurrences.event_id)
               Filter: (events.read_permissions = (-1))
 Total runtime: 4039.788 ms

Now, if I use limit 50 it uses a plan that is several orders of
magnitude more efficient:

 Limit  (cost=6202.38..6202.51 rows=50 width=48) (actual time=0.170..0.171 rows=1 loops=1)
   ->  Sort  (cost=6202.38..6203.20 rows=326 width=48) (actual time=0.170..0.170 rows=1 loops=1)
         Sort Key: event_occurrences.start_time
         Sort Method:  quicksort  Memory: 25kB
         ->  Nested Loop  (cost=5.09..6191.55 rows=326 width=48) (actual time=0.160..0.161 rows=1 loops=1)
               ->  Bitmap Heap Scan on section_items  (cost=5.09..328.94 rows=96 width=4) (actual time=0.024..0.087 rows=7 loops=1)
                     Recheck Cond: (sandbox_id = 16399)
                     Filter: ((subject_type)::text = 'Event'::text)
                     ->  Bitmap Index Scan on index_section_items_on_sandbox_id  (cost=0.00..5.06 rows=107 width=0) (actual time=0.018..0.018 rows=7 loops=1)
                           Index Cond: (sandbox_id = 16399)
               ->  Index Scan using index_event_occurrences_on_event_id on event_occurrences  (cost=0.00..60.14 rows=74 width=48) (actual time=0.010..0.010 rows=0 loops=7)
                     Index Cond: (event_occurrences.event_id = section_items.subject_id)
                     Filter: (event_occurrences.start_time > '2009-02-14 18:15:14.739411+01'::timestamp with time zone)
 Total runtime: 0.210 ms

Similarly if I disable nested joins with set enable_nestloop = off:

 Limit  (cost=10900.13..10900.14 rows=4 width=48) (actual time=191.476..191.478 rows=1 loops=1)
   ->  Sort  (cost=10900.13..10900.95 rows=326 width=48) (actual time=191.474..191.475 rows=1 loops=1)
         Sort Key: event_occurrences.start_time
         Sort Method:  quicksort  Memory: 25kB
         ->  Hash Join  (cost=8944.52..10895.24 rows=326 width=48) (actual time=162.104..191.463 rows=1 loops=1)
               Hash Cond: (section_items.subject_id = event_occurrences.event_id)
               ->  Bitmap Heap Scan on section_items  (cost=5.09..328.94 rows=96 width=4) (actual time=0.026..0.050 rows=7 loops=1)
                     Recheck Cond: (sandbox_id = 16399)
                     Filter: ((subject_type)::text = 'Event'::text)
                     ->  Bitmap Index Scan on index_section_items_on_sandbox_id  (cost=0.00..5.06 rows=107 width=0) (actual time=0.019..0.019 rows=7 loops=1)
                           Index Cond: (sandbox_id = 16399)
               ->  Hash  (cost=5580.54..5580.54 rows=157752 width=48) (actual time=161.832..161.832 rows=155197 loops=1)
                     ->  Seq Scan on event_occurrences  (cost=0.00..5580.54 rows=157752 width=48) (actual time=0.030..75.406 rows=155197 loops=1)
                           Filter: (start_time > '2009-02-14 18:15:14.739411+01'::timestamp with time zone)
 Total runtime: 192.496 ms

Some statistics:

# select attname, n_distinct from pg_stats where tablename = 'event_occurrences';
  attname   | n_distinct
------------+------------
 id         |         -1
 created_at |  -0.291615
 updated_at |  -0.294081
 created_by |        715
 updated_by |        715
 event_id   |       2146
 start_time |   -0.10047
 end_time   |       5602

# select attname, n_distinct from pg_stats where tablename = 'section_items';
   attname    | n_distinct
--------------+------------
 id

[PERFORM] I/O increase after upgrading to 8.3.5

2009-02-13 Thread Alexander Staubo
After upgrading from 8.2 to 8.3.5, the write load on our database
server has increased dramatically and inexplicably -- as has the CPU
usage.

Here's a Munin graph of iostat showing the sudden increase in blocks
written/sec:

  http://purefiction.net/paste/Totals-iostatwrite1-week.png

We expected the upgrade to give us better, not worse, performance. How
do I diagnose this problem?

The application code has not changed, and neither have the usage
patterns. The number of tuple inserts/updates/deletes per second has
not deviated from the normal. However, CPU usage has increased by
around 30%, and the load average has similarly increased.

The other performance metrics I monitor, such as transaction load,
HTTP traffic, etc., show everything else as either normal or slightly
decreased, as you would expect when the database server slows down.

The upgrade was done with dump/restore using pg_dump -Fc. The old
database lived on a SAN volume, whereas the new database lives on a
local disk volume.

OS is Linux 2.6.24 on Intel (x86_64).

Alexander.



Re: [PERFORM] I/O increase after upgrading to 8.3.5

2009-02-13 Thread Alexander Staubo
On Fri, Feb 13, 2009 at 3:46 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Alexander Staubo a...@bengler.no wrote:
 After upgrading from 8.2 to 8.3.5, the write load on our database
 server has increased dramatically and inexplicably -- as has the CPU
 usage.

 Did you do a VACUUM ANALYZE of the database after loading it?  Without
 the database VACUUM, the first read of any page causes it to be
 rewritten to set hint bits.  Without an ANALYZE, it might be picking
 very inefficient plans.  I actually run a VACUUM FREEZE ANALYZE after
 loading a database (as with the upgrade), to prevent massive rewrites
 of everything in the database at some later date due to freeze
 activity.

Thanks, the lack of statistics should explain why things are a bit
slow. I ran a vacuum freeze analyze now, but the I/O level is still
quite high.

I have verified using pidstat that the I/O is all caused by
PostgreSQL. Here's some sample output from iostat, interval 1 second
(the 4th column is KB written):

sda    1173.00    68.00  149672.00     68   149672
sda      14.00     0.00    1712.00      0     1712
sda       2.00     0.00     336.00      0      336
sda     679.00   344.00  115200.00    344   115200
sda     238.00     0.00   61764.00      0    61764
sda     436.00     0.00   95004.00      0    95004
sda      14.00     0.00    1032.00      0     1032
sda    1882.00    72.00   82380.00     72    82380
sda     173.00     8.00    7936.00      8     7936

What I find odd is that PostgreSQL is only clocking at about 8 tuple
modifications per second average (on our user tables). There are
around 800 transactions per second, but most of these are only reads.
How can 8 tuples/sec result in 115MB writes per second? The database
is not large enough and the tuples not large enough to explain those
huge amounts of data. At 115MB/s you could rewrite the entire database
in 1.5 minutes.

Is there any way to determine the *sizes* of the tuple mods, not just
the frequency, that PostgreSQL performs? What kinds of tools are
available to track down the causes of these writes?

Alexander.



Re: [PERFORM] I/O increase after upgrading to 8.3.5

2009-02-13 Thread Alexander Staubo
On Fri, Feb 13, 2009 at 12:53 PM, Alexander Staubo a...@bengler.no wrote:
 The upgrade was done with dump/restore using pg_dump -Fc. The old
 database lived on a SAN volume, whereas the new database lives on a
 local disk volume.

I need to correct myself: The Munin graphs were never set to track the
SAN volume where the old database lived. So when the graph goes from
near-zero to lots, it's actually correct.

When I compare the correct graph, however, it's apparent that I/O
writes have, on average, doubled.

The new volume uses the same file system and block size as the old one.

Alexander.



Re: [PERFORM] I/O increase after upgrading to 8.3.5

2009-02-13 Thread Alexander Staubo
On Fri, Feb 13, 2009 at 5:17 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Could you show the non-commented lines from old and new
 postgresql.conf files, please?

Attached. The differences are not performance-related, as far as I can
see, aside from the addition of synchronous_commit = off.

Alexander.


82.conf
Description: Binary data


83.conf
Description: Binary data



Re: [PERFORM] Identifying the nature of blocking I/O

2008-08-25 Thread Alexander Staubo
On Mon, Aug 25, 2008 at 3:34 AM, Scott Carey [EMAIL PROTECTED] wrote:
 DTrace is available now on MacOSX, Solaris 10, OpenSolaris, and FreeBSD.
 Linux however is still in the dark ages when it comes to system monitoring,
 especially with I/O.

While that's true, newer 2.6 kernel versions at least have I/O
accounting built in, something which only used to be available through
the atop accounting kernel patch:

$ cat /proc/22785/io
rchar: 31928
wchar: 138
syscr: 272
syscw: 4
read_bytes: 0
write_bytes: 0
cancelled_write_bytes: 0

Alexander.



Re: [PERFORM] Query Optimization with Kruskal’s Algorithm

2008-05-07 Thread Alexander Staubo
On 5/7/08, Tarcizio Bini [EMAIL PROTECTED] wrote:
 I'm working on optimizing queries using the Kruskal algorithm
 (http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4318118).

That paper looks very interesting. I would love to hear what the
PostgreSQL committers think of this algorithm.

Alexander.



Re: [PERFORM] Putting files into fields in a table

2007-12-13 Thread Alexander Staubo
On 12/13/07, Campbell, Lance [EMAIL PROTECTED] wrote:
 I am looking at the possibility of storing files in some of my database
 tables.  My concern is obviously performance.  I have configured PostgreSQL
 to take advantage of Linux file caching.  So my PostgreSQL does not have a
 large setting for shared_buffers even though I have 24G of memory.

This used to be the recommended way before 8.0. In 8.0, it is
advantageous to give PostgreSQL more buffers. You should still make
some room for the kernel cache.

By storing files, I assume you mean a lot of data imported from
files. The pros and cons of storing large amounts of data as
PostgreSQL tuples have been debated before. You might want to search
the archives.

My opinion is that PostgreSQL is fine up to a point (let's say 10,000
tuples x 2KB), above which I would merely store references to
file-system objects. Managing these objects can be painful, especially
in a cluster of more than one machine, but at least it's fast and
lightweight.

 What data type should I use for fields that hold files?

PostgreSQL has two ways of storing large amounts of data in a single
tuple: variable-length columns, and blobs.

Blobs are divided into chunks and stored in separate tables, one tuple
per chunk, indexed by offset, and PostgreSQL allows random access to
the data. The downside is that they take up more room, they're slower
to create, slower to read from end to end, and I believe there are
some operations (such as function expressions) that don't work on
them. Some replication products, including (the last time I looked)
Slony, do not support replicating blobs. Blobs are not deprecated, I
think, but they feel like they are.

Variable-length columns such as bytea and text support a system called
TOAST, which allow the first few kilobytes of the data to be stored
in-place in the tuple, and the overflow to be stored elsewhere and
potentially compressed. This system is much faster and tighter than
blobs, but does not offer random I/O.
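
(A sketch of the variable-length-column approach, with illustrative names; TOAST handles the compression and out-of-line storage automatically:)

   create table stored_files (
     id serial primary key,
     filename text not null,
     content bytea not null
   );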

 Is there anything that I should be aware of when putting files into a field
 in a table?

Backup dumps will increase in size in proportion to the size of your
data. PostgreSQL is no speed demon at loading/storing data, so this
might turn out to be the Achilles heel.

 When PostgreSQL accesses a table that has fields that contain files does it
 put the fields that contain the files into the shared_buffers memory area?

I believe so.

Alexander.



Re: [PERFORM] PostgreSQL performance on various distribution stock kernels

2007-11-26 Thread Alexander Staubo
On 11/26/07, Damon Hart [EMAIL PROTECTED] wrote:
 So, what's different between these tests? I'm seeing performance
 differences of between +65% to +90% transactions per second of the
 OpenVZ kernel running on the HN over the stock Fedora 8 kernel. Is
 this reflective of different emphasis between RHEL and Fedora kernel
 builds?  Some OpenVZ optimization on top of the RHEL5 build? Something
 else? Where should I look?

A recent FreeBSD benchmark (which also tested Linux performance) found
major performance differences between recent versions of the kernel,
possibly attributable to the new so-called completely fair scheduler:

  http://archives.postgresql.org/pgsql-performance/2007-11/msg00132.php

No idea if it's relevant.

Alexander.



Re: [PERFORM] Estimation problem with a LIKE clause containing a /

2007-11-07 Thread Alexander Staubo
On 11/7/07, Guillaume Smet [EMAIL PROTECTED] wrote:
 While studying a query taking forever after an ANALYZE on a never
 analyzed database (a bad estimate causes a nested loop on a lot of
 tuples), I found the following problem:
[snip]
  Total runtime: 31.097 ms
[snip]
  Total runtime: 31.341 ms
[snip]
  Total runtime: 34.778 ms

 Which is a really good estimate.

That's a difference of less than *three milliseconds* -- a difference
probably way within the expected overhead of running explain
analyze. Furthermore, all three queries use the same basic plan: a
sequential scan with a filter. At any rate you're microbenchmarking in
a way that is not useful to real-world queries. In what way are these
timings a problem?

Have you tried using an index which supports prefix searches? The
text_pattern_ops operator class lets you do this with a plain B-tree
index:

  create index cms_items_ancestors_index on cms_items (ancestors
text_pattern_ops);
  analyze cms_items;

Now all like 'prefix%' queries should use the index.

Alexander.



Re: [PERFORM] TEXT or LONGTEXT?

2007-09-24 Thread Alexander Staubo
On 9/24/07, Fabiola Fernández [EMAIL PROTECTED] wrote:
 I have a database with an amount of tables and in several of them I have an
 attribute for a semantic definition, for which I use a field of type text. I
 am trying to decide if it would be worth using LONGTEXT instead of TEXT, as
 maybe it would slow down the data insertion and extraction. I hope that you
 could help me. Thank you.

Easy choice -- PostgreSQL does not have a data type named longtext.

Alexander.



Re: [PERFORM] Replication

2007-06-15 Thread Alexander Staubo

On 6/15/07, Craig James [EMAIL PROTECTED] wrote:

I don't think I can use PGPool as the replicator, because then it becomes a new 
single point of failure that could bring the whole system down.  If you're 
using it for INSERT/UPDATE, then there can only be one PGPool server.


Are you sure? I have been considering this possibility, too, but I
didn't find anything in the documentation. The main mechanism of the
proxy is taking received updates and playing them on multiple servers
with 2PC, and the proxies should not need to keep any state about
this, so why couldn't you install multiple proxies?

Alexander.



Re: [PERFORM] Replication

2007-06-14 Thread Alexander Staubo

On 6/15/07, Craig James [EMAIL PROTECTED] wrote:
[snip]

Is this a good summary of the status of replication?  Have I missed any 
important solutions or mischaracterized anything?


* Mammoth Replicator, commercial.

* Continuent uni/cluster, commercial
(http://www.continuent.com/index.php?option=com_content&task=view&id=212&Itemid=169).

* pgpool-II. Supports load-balancing and replication by implementing a
proxy that duplicates all updates to all slaves. It can partition data
by doing this, and it can semi-intelligently route queries to the
appropriate servers.

* Cybertec. This is a commercial packaging of PGCluster-II from an
Austrian company.

* Greenplum Database (formerly Bizgres MPP), commercial. Not so much a
replication solution as a way to parallelize queries, and targeted at
the data warehousing crowd. Similar to ExtenDB, but tightly integrated
with PostgreSQL.

* DRBD (http://www.drbd.org/), a device driver that replicates disk
blocks to other nodes. This works for failover only, not for scaling
reads. Easy migration of devices if combined with an NFS export.

* Skytools (https://developer.skype.com/SkypeGarage/DbProjects/SkyTools),
a collection of replication tools from the Skype people. Purports to
be simpler to use than Slony.

Lastly, and perhaps most promisingly, there's the Google Summer of
Code effort by Florian Pflug
(http://code.google.com/soc/postgres/appinfo.html?csaid=6545828A8197EBC6)
to implement true log-based replication, where PostgreSQL's
transaction logs are used to keep live slave servers up to date with a
master. In theory, such a system would be extremely simple to set up
and use, especially since it should, as far as I can see, also
transparently replicate the schema for you.

Alexander.



Re: [PERFORM] Some info to share: db_STRESS Benchmark results

2007-05-31 Thread Alexander Staubo

On 5/31/07, Dimitri [EMAIL PROTECTED] wrote:

just wanted to share some benchmark results from one long performance
study comparing MySQL, PostgreSQL and Oracle transactions throughput
and engine scalability on T2000 and V890 (under Solaris).


Interesting, if awfully cryptic. The lack of axis labels, the lack of
axis normalization, and the fact that you put the graphs for different
databases and parameters on separate pages make it rather hard to
compare the various results.

Alexander.



Re: [PERFORM] Feature suggestion : FAST CLUSTER

2007-05-27 Thread Alexander Staubo

On 5/27/07, PFC [EMAIL PROTECTED] wrote:

PostgreSQL humiliates InnoDB on CPU-bound workloads (about 2x faster
since I run it on dual core ; InnoDB uses only one core). However, InnoDB
can automatically cluster tables without maintenance.


How does it know what to cluster by? Does it gather statistics about
query patterns on which it can decide an optimal clustering, or does
it merely follow a clustering previously set up by the user?

Alexander.



Re: [PERFORM] LIKE search and performance

2007-05-23 Thread Alexander Staubo

On 5/23/07, Andy [EMAIL PROTECTED] wrote:

An example would be:
SELECT * FROM table
 WHERE name like '%john%' or street like '%srt%'

Anyway, the query planner always does seq scan on the whole table and that
takes some time. How can this be optimized or made in another way to be
faster?


There's no algorithm in existence that can index arbitrary
substrings the way you think. The only rational way to accomplish this
is to first break the text into substrings using some algorithm (eg.,
words delimited by whitespace and punctuation), and index the
substrings individually.

You can do this using vanilla PostgreSQL, and you can use Tsearch2
and/or its GIN indexes to help speed up the searches. The simplest
solution would be to put all the substrings in a table, one row per
substring, along with an attribute referencing the source table/row.

At this point you have effectively reduced your search space: you can
use a query to isolate the words in your dictionary that contain the
substrings. So a query might be:

 select ... from persons where id in (
   select person_id from person_words
   where word like '%john%'
 );

The like search, even though it will use a sequential scan, is bound
to be much faster on these small words than searching for substrings
through large gobs of text in the persons table.
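
For concreteness, here is a minimal sketch of such a dictionary table
-- the table and column names are purely illustrative:

  -- hypothetical dictionary table: one row per word per person
  create table person_words (
    person_id integer references persons (id),
    word text not null
  );
  -- a '%john%' pattern still scans, but only over short single words;
  -- the index helps exact-word and prefix lookups
  create index person_words_word_index on person_words (word);

You would populate it by splitting name and street into words whenever
a persons row is inserted or updated.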

Note that PostgreSQL *can* exploit the index for *prefix* matching, if
you tell it to use the right operator class:

 create index persons_name_index on persons (name text_pattern_ops);

or, if you're using varchars (though there is rarely any reason to):

 create index persons_name_index on persons (name varchar_pattern_ops);

(These two may be identical internally. Check the docs.)

Now you can do:

 select ... from persons where name like 'john%';

which will yield a query plan such as this:

Index Scan using persons_name_index on persons  (cost=0.00..8.27
rows=1 width=29) (actual time=0.184..0.373 rows=51 loops=1)
  Index Cond: ((name ~>=~ 'john'::text) AND (name ~<~ 'joho'::text))
  Filter: (title ~~ 'john%'::text)

Alexander.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Performace comparison of indexes over timestamp fields

2007-05-22 Thread Alexander Staubo

On 5/22/07, Arnau [EMAIL PROTECTED] wrote:

   On older versions of PostgreSQL, at least in my experience, queries
on timestamps fields even having indexes where performing quite bad
mainly sequential scans where performed.


PostgreSQL uses B-tree indexes for scalar values. For an expression
such as t between a and b, I believe it's going to match both sides
of the table independently (i.e., t >= a and t <= b) and intersect
these subsets. This is inefficient.

You should get better performance by mapping timestamps to a
one-dimensional plane and indexing them using GiST. GiST implements an
R-tree-like structure that supports bounding-box searches.

This involves setting up a functional index:

 create index ... on payment_transactions using gist (
   box(point(extract(epoch from time), 0),
       point(extract(epoch from time), 0)) box_ops)

I'm using box() here because GiST doesn't have a concept of points.

Then insert as usual, and then query with something like:

 select ... from payment_transactions
 where box(
   point(extract(epoch from '2006-04-01'::date), 0),
   point(extract(epoch from '2006-08-01'::date), 0)) && box(
   point(extract(epoch from time), 0),
   point(extract(epoch from time), 0));

PostgreSQL should be able to exploit the GiST index by recognizing
that the result of the box() expression is already computed in the
index.

This is less convenient and less portable -- I would love for
PostgreSQL to provide syntactic sugar and special-casing to make
this transparent -- but it is worth it if you are dealing with a lot
of range searches.


   Now I have a newer version of PostgreSQL and I've done some tests
comparing the performance of an index over a timestamp field with a
numeric field. To do so, I have the following table:

          Table "public.payment_transactions"
     Column     |            Type             |   Modifiers
----------------+-----------------------------+---------------
 transaction_id | character varying(32)       | not null
 timestamp_in   | timestamp without time zone | default now()
 credits        | integer                     |
 epoch_in       | bigint                      |
 epoch_in2      | double precision            |

[snip]

A timestamp is stored internally as an 8-byte double-precision float.
Therefore, timestamp_in and epoch_in2 should behave identically.


While doing the tests this table has about 100.000 entries.


Make sure PostgreSQL is able to keep the entire table in memory by
setting shared_buffers; you don't want to be hitting the disk.

Make sure you run analyze on the table before you execute the test.
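
In SQL terms, a quick way to check the former and do the latter
(shared_buffers itself must be changed in postgresql.conf and
requires a restart; the table name is the one from your test):

  show shared_buffers;           -- check the current setting
  analyze payment_transactions;  -- refresh the planner statistics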


To test the diferent indexes I have executed the following:


Your query plans are roughly identical. The difference in the timings
implies that you only ran the queries once. I suggest you run each
query at least 10 times, and report the individual numbers (the total
runtime parts of the output) you get. Arithmetic means are not that
interesting.

Alexander.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Performace comparison of indexes over timestamp fields

2007-05-22 Thread Alexander Staubo

On 5/22/07, Steinar H. Gunderson [EMAIL PROTECTED] wrote:

On Tue, May 22, 2007 at 02:39:33PM +0200, Alexander Staubo wrote:
 PostgreSQL uses B-tree indexes for scalar values. For an expression
 such as t between a and b, I believe it's going to match both sides
 of the table independently (i.e., t >= a and t <= b) and intersect
 these subsets. This is inefficient.

A B-tree index can satisfy range queries such as this.


You're right, and I'm wrong -- my head is not in the right place
today. B-trees are inefficient for intervals, but quite satisfactory
for range searches.
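
In other words, for the payment_transactions table from earlier in
this thread, a plain B-tree index suffices for a range predicate; a
quick sketch (the index name is made up):

  create index payment_transactions_timestamp_in_index
    on payment_transactions (timestamp_in);

  select count(*) from payment_transactions
  where timestamp_in between '2006-04-01' and '2006-08-01';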

Alexander.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Tips & Tricks for validating hardware/os

2007-05-22 Thread Alexander Staubo

On 5/22/07, Stephane Bailliez [EMAIL PROTECTED] wrote:

Out of curiosity, can anyone share his tips & tricks to validate a
machine before labelling it as 'ready to use postgres - you probably
won't trash my data today' ?
I'm looking for a way to stress test components especially kernel/disk
to have confidence > 0 that I can use postgres on top of it.

Any secret trick is welcome (beside the memtest one :)


Compile the Linux kernel -- it's a pretty decent stress test.

You could run pgbench, which comes with PostgreSQL (as part of the
contrib package). Give a database size that's larger than the amount
of physical memory in the box.

Alexander.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] bitmap index and IS NULL predicate

2007-05-15 Thread Alexander Staubo

On 5/15/07, Jason Pinnix [EMAIL PROTECTED] wrote:

Does the bitmap
index not store a bit vector for the NULL value (i.e. a bit vector that
contains a 1 for each row with a NULL value and 0 for other rows) ?


You should be able to do this with a conditional index:

 create index ... (col) where col is null;
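
As a hedged illustration (the table and column names are invented),
the partial index then serves queries that filter on IS NULL:

  create table orders (id serial primary key, shipped_at timestamp);
  -- only the rows where shipped_at is NULL are stored in the index
  create index orders_unshipped_index on orders (id) where shipped_at is null;

  select id from orders where shipped_at is null;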

Alexander.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Best OS for Postgres 8.2

2007-05-08 Thread Alexander Staubo

On 5/8/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
[snip]

I personally don't trust reiserfs, jfs seems to be a tools for
transitioning from AIX more then anything else [...]


What makes you say this? I have run JFS for years with complete
satisfaction, and I have never logged into an AIX box.

JFS has traditionally been seen as an underdog, but undeservedly so,
in my opinion; one cause might be the instability of the very early
releases, which seems to have tainted its reputation, or the alienness
of its AIX heritage. However, every benchmark I have come across puts
it on par with, and often surpassing, the more popular file systems
in performance. In particular, JFS seems to shine with respect to CPU
overhead.

Alexander.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] specific query (not all) on Pg8 MUCH slower than Pg7

2007-05-08 Thread Alexander Staubo

On 5/8/07, Tom Lane [EMAIL PROTECTED] wrote:

You're not getting the indexscan optimization of the LIKE clause, which
is most likely due to having initdb'd the 8.1 installation in something
other than C locale.  You can either redo the initdb in C locale (which
might be a good move to fix other inconsistencies from the 7.3 behavior
you're used to) or create a varchar_pattern_ops index on the column(s)
you're using LIKE with.


Given the performance implications of setting the wrong locale, and
the high probability of accidentally doing this (I run my shells with
LANG=en_US.UTF-8, so all my databases have inherited this locale), why
is there no support for changing the database locale after the fact?

# alter database test set lc_collate = 'C';
ERROR:  parameter lc_collate cannot be changed

Alexander.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] pg_stat_* collection

2007-05-03 Thread Alexander Staubo

On 5/3/07, Greg Smith [EMAIL PROTECTED] wrote:

Today's survey is:  just what are *you* doing to collect up the
information about your system made available by the various pg_stat views?
I have this hacked together script that dumps them into a file, imports
them into another database, and then queries against some of the more
interesting data.  You would thing there would be an organized project
addressing this need around to keep everyone from reinventing that wheel,
but I'm not aware of one.


I have a bunch of plugin scripts for Munin
(http://munin.projects.linpro.no/) that collect PostgreSQL statistics.
Graphs like this are useful:

 http://purefiction.net/paste/pg_munin_example.png

I have been considering tarring them up as a proper release at some
point. Anyone interested?
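
For anyone rolling their own in the meantime, a hedged example of the
kind of query such a plugin might run -- this one samples the
per-database counters:

  select datname, xact_commit, xact_rollback, blks_read, blks_hit
  from pg_stat_database;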

Alexander.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] index structure for 114-dimension vector

2007-05-01 Thread Alexander Staubo

On 5/1/07, Andrew Lazarus [EMAIL PROTECTED] wrote:

Let me just thank the list, especially for the references. (I found
similar papers myself with Google: and to think I have a university
library alumni card and barely need it any more!)

I'll write again on the sorts of results I get.


Looking forward to hearing about them. I have worked with such dataset
problems, but never attempted to apply them to a relational database
such as PostgreSQL. If you want speed, nothing beats in-memory vectors
on a modern CPU architecture.

Alexander.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] index structure for 114-dimension vector

2007-04-26 Thread Alexander Staubo

On 4/20/07, Andrew Lazarus [EMAIL PROTECTED] wrote:

I have a table with 2.5 million real[] arrays. (They are points in a
time series.) Given a new array X, I'd like to find, say, the 25
closest to X in some sense--for simplification, let's just say in the
usual vector norm. Speed is critical here, and everything I have tried
has been too slow.


Let me chime in with the observation that this is a multidimensional
nearest-neighbour problem (reverse nearest neighbour and its close
cousin, k-NN) that is well known in statistics, and particularly relevant to
statistical learning and classification. Knowing the jargon might help
you dig up efficient algorithms to mine your data; there are tons of
fascinating papers available through Citeseer.

In particular, I recommend the paper Efficient k-NN Search on
Vertically Decomposed Data by de Vries et al, SIGMOD 2002 (PDF here:
http://citeseer.ist.psu.edu/618138.html), if only for inspiration. It
proposes an algorithm called BOND to drastically reduce the search
space by probalistic means. They give an example using image
histograms, but the algorithm applies to any multidimensional data.
Briefly put, it points out that proximity comparison can be computed
vertically, a few dimensions at a time, and entire subsets can be
thrown away when it's apparent that they are below a statistically
derived lower bound. The only gotcha is that the algorithm derives
much of its performance from the assumption that your data is
vertically decomposed, one table per dimension, otherwise the search
effectively incurs a sequential scan of the entire dataset, and then
you're pretty much back to square one.
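
To make "vertically decomposed" concrete, here is a minimal sketch
under the assumption of one table per dimension (names invented, only
two of the dimensions shown):

  create table vec_dim_1 (id integer primary key, val real not null);
  create table vec_dim_2 (id integer primary key, val real not null);
  -- ...and so on, one such table per dimension of the vector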

The most common approach to nearest neighbour search is to use a
spatial data structure. The classic algorithm is the kd-tree
(http://en.wikipedia.org/wiki/Kd-tree) and there's the newer K-D-B
tree, neither of which are available in PostgreSQL. If I remember
correctly, R-trees have also been shown to be useful for high numbers
of dimensions; with PostgreSQL you have R-trees and even better
R-tree-equivalent support through GiST. I have no idea whether you can
actually munge your integer vectors into something GiST can index and
search, but it's a thought. (GiST, presumably, can also theoretically
index kd-trees and other spatial trees.)

Alexander.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Simple query, 10 million records...MySQL ten times faster

2007-04-26 Thread Alexander Staubo

On 24 Apr 2007 14:26:46 -0700, zardozrocks [EMAIL PROTECTED] wrote:

I have this table:

CREATE TABLE test_zip_assoc (
id serial NOT NULL,
f_id integer DEFAULT 0 NOT NULL,
lat_radians numeric(6,5) DEFAULT 0.0 NOT NULL,
long_radians numeric(6,5) DEFAULT 0.0 NOT NULL
);
CREATE INDEX lat_radians ON test_zip_assoc USING btree (lat_radians);
CREATE INDEX long_radians ON test_zip_assoc USING btree
(long_radians);


This is a spatial search -- B-tree indexes are much less efficient at
this than certain other data structures. The R-tree and its many
variants are based on subdividing the space into regions, allowing you
to do efficient checks on containment, intersection, etc., based on
points or bounding boxes.

PostgreSQL implements R-trees natively as well as through a mechanism
called GiST, a framework for implementing pluggable tree-like indexes.
It also provides some geometric data types. However, as far as I know,
PostgreSQL's R-tree/GiST indexes do *not* provide the operators to do
bounding box searches. For this you need PostGIS.

PostGIS implements the whole GIS stack, and it's so good at this that
it's practically the de facto tool among GIS analysts. Installing
PostGIS into a database is simple, and once you have done this, you
can augment your table with a geometry (*):

 alter table test_zip_assoc add column lonlat geometry;
 update test_zip_assoc set lonlat = makepoint(
   long_radians / (3.14159265358979 / 180),
   lat_radians / (3.14159265358979 / 180));

The division is to convert your radians into degrees; PostGIS works
with degrees, at least out of the box.

Now you can query on a bounding box (although, are you sure you got
your lons and lats in order? That's Antarctica, isn't it?):

 select * from test_zip_assoc
 where lonlat && makebox2d(
   makepoint(-77.6049721697096, 39.5425768302107),
   makepoint(-76.1592790300818,  40.9882699698386))

This is bound to be blazingly fast. Next you can order by geographic
distance if you like:

 order by distance_sphere(lonlat,
   makepoint(-77.6049721697096, 39.5425768302107))

Nobody has mentioned PostGIS so far, so I hope I'm not missing some
crucial detail, like "no spatial indexes allowed!".

(*) I cheated here. The PostGIS manual recommends that you use a
function to create the geometry column, because it will set up some
auxiliary data structures for you that are needed for certain
operations. The recommended syntax is:

 select AddGeometryColumn('', 'test_zip_assoc', 'geom', -1, 'POINT', 2);

Alexander.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] Simple query, 10 million records...MySQL ten times faster

2007-04-26 Thread Alexander Staubo

On 4/27/07, Alexander Staubo [EMAIL PROTECTED] wrote:
[snip]

PostGIS implements the whole GIS stack, and it's so good at this that
it's practically the de facto tool among GIS analysts. Installing
PostGIS into a database is simple, and once you have done this, you
can augment your table with a geometry (*):

  alter table test_zip_assoc add column lonlat geometry;


I forgot to include the crucial step, of course:

 create index test_zip_assoc_lonlat_index on test_zip_assoc
   using gist (lonlat gist_geometry_ops);
 analyze test_zip_assoc;

This creates a GiST index on the geometry and (significantly) updates
the table statistics.

Alexander.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Warm - standby system.

2007-04-24 Thread Alexander Staubo

On 4/24/07, Nimesh Satam [EMAIL PROTECTED] wrote:

Can anybody let me know the steps which are supposed to be followed to make
the standby machine for read access? and how it should be one.


Not possible at the moment. The warm standby is not hot -- it cannot
be used for queries while it's acting as a standby. Explained here:

 http://www.postgresql.org/docs/8.2/static/warm-standby.html

Alexander.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] Two hard drives --- what to do with them?

2007-02-24 Thread Alexander Staubo

On Feb 25, 2007, at 04:39 , Carlos Moreno wrote:

I do have the option to configure it in RAID-0, but I'm sort of  
reluctant;  I think
there's the possibility that having two filesystems that can be  
accessed truly
simultaneously can be more beneficial.  The question is:  does  
PostgreSQL
have separate, independent areas that require storage such that  
performance
would be noticeably boosted if the multiple storage operations  
could be done

simultaneously?


Putting the WAL (aka pg_xlog) on a separate disk will take some load
off your main database disk. See
http://www.varlena.com/GeneralBits/Tidbits/perf.html for this.


It is also possible to put individual tables and/or indexes on  
separate disks by using tablespaces: For example, an index which is  
very heavily used can be placed on a very fast, highly available  
disk, such as an expensive solid state device. At the same time a  
table storing archived data which is rarely used or not performance  
critical could be stored on a less expensive, slower disk  
system. (http://www.postgresql.org/docs/8.2/interactive/manage-ag-tablespaces.html)


In both cases, the performance benefits tend to be relative to the
amount of write activity you experience, and the latter solution
assumes you know where the hotspots are. If you have two tables that
see continuous, intense write activity, for example, putting each on
a separate disk keeps them from competing for the same spindles.
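
As a hedged sketch of the tablespace approach (the paths, table and
index names are invented):

  create tablespace fastdisk location '/mnt/fastdisk/pgdata';
  create tablespace slowdisk location '/mnt/slowdisk/pgdata';

  -- hot index on the fast disk, archive table on the slow one
  create index orders_created_index on orders (created) tablespace fastdisk;
  alter table orders_archive set tablespace slowdisk;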


Alexander.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-15 Thread Alexander Staubo

On Dec 15, 2006, at 04:09 , Ron wrote:


At 07:27 PM 12/14/2006, Alexander Staubo wrote:


Sorry, I neglected to include the pertinent graph:

  http://purefiction.net/paste/pgbench2.pdf
In fact, your graph suggests that using arch specific options in  
addition to -O3 actually =hurts= performance.


The difference is very slight. I'm going to run without
-funroll-loops and -pipe (which are not arch-related) to get better data.


Alexander.



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-15 Thread Alexander Staubo

On Dec 15, 2006, at 17:53 , Ron wrote:


At 09:50 AM 12/15/2006, Greg Smith wrote:

On Fri, 15 Dec 2006, Merlin Moncure wrote:

The slower is probably due to the unroll loops switch which can  
actually hurt code due to the larger footprint (less cache  
coherency).


The cache issues are so important with current processors that I'd  
suggest throwing -Os (optimize for size) into the mix people  
test.  That one may stack usefully with -O2, but probably not with  
-O3 (3 includes optimizations that increase code size).


-Os
Optimize for size. -Os enables all -O2 optimizations that do not  
typically increase code size. It also performs further  
optimizations designed to reduce code size.


So far I have been compiling PostgreSQL and running my pgbench script  
manually, but this makes me want to modify my script to run pgbench  
automatically using all possible permutations of a set of compiler  
flags.


Last I tried GCC to produce 32-bit code on this Opteron system,  
though, it complained about the lack of a compiler; can I persuade it  
to generate 32-bit code (or 32-bit pointers for that matter) without  
going the cross-compilation route?


Alexander.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-14 Thread Alexander Staubo

On Dec 14, 2006, at 16:00 , Greg Smith wrote:


On Wed, 13 Dec 2006, Ron wrote:

The slowest results, Michael's, are on the system with what  
appears to be the slowest CPU of the bunch; and the ranking of the  
rest of the results seem to similarly depend on relative CPU  
performance.  This is not what one would naively expect when  
benching a IO intensive app like a DBMS.


pgbench with 3000 total transactions and fsync off is barely doing  
I/O to disk; it's writing a bunch of data to the filesystem cache  
and ending the benchmark before the data even makes it to the hard  
drive.  This is why his results become completely different as soon  
as the number of transactions increases. With little or no actual  
disk writes, you should expect results to be ranked by CPU speed.


I also second your suggestion that pgbench should be run with -S to  
disable updates. As far as I can see, nobody has reported numbers for  
this setting, so here goes. I also increased the buffer size, which I  
found was needed to avoid hitting the disk for block reads, and  
increased the memory settings.


My PostgreSQL config overrides, then, are:

shared_buffers = 1024MB
work_mem = 1MB
maintenance_work_mem = 16MB
fsync = off

Environment: Linux 2.6.15-23-amd64-generic on Ubuntu. Dual-core AMD  
Opteron 280 with 4GB of RAM. LSI PCI-X Fusion-MPT SAS.


Running with: pgbench -S -v -n -t 5000 -c 5.

Results as a graph: http://purefiction.net/paste/pgbench.pdf

Stats for CFLAGS=-O0: 18440.181894 19207.882300 19894.432185  
19635.625622 19876.858884 20032.597042 19683.597973 20370.19  
19989.157881 20207.343510 19993.745956 20081.353580 20356.416424  
20047.810017 20319.834190 19417.807528 19906.788454 20536.039929  
19491.308046 20002.144230


Stats for CFLAGS=-O3 -msse2 -mfpmath=sse -funroll-loops -m64 -march=opteron -pipe:
23830.358351 26162.203569 25569.091264
26762.755665 26590.822550 26864.908197 26608.029665 26796.116921  
26323.742015 26692.576261 26878.859132 26106.770425 26328.371664  
26755.595130 25488.304946 26635.527959 26377.485023 24817.590708  
26480.245737 26223.427801


Alexander.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-14 Thread Alexander Staubo

On Dec 14, 2006, at 20:28 , Ron wrote:

Can you do runs with just CFLAGS=-O3 and just CFLAGS=-msse2
-mfpmath=sse -funroll-loops -m64 -march=opteron -pipe as well ?


All right. From my perspective, the effect of -O3 is significant,  
whereas architecture-related optimizations have no statistically  
significant effect. As far as I'm aware, though, there's no other  
arch targets on the Opteron that will make sense, there being no  
predecessor CPU instruction set to choose from; -march=pentium4  
doesn't exist.



Also, what HDs are you using?  How many in what config?


I believe the volume is a two-drive RAID 1 configuration, but I'm not  
managing these servers, so I'll ask the company's support people.


Interestingly enough I see that PostgreSQL seems to be writing around  
1MB/s during the pgbench run, even though I'm running pgbench in the
-S mode. I haven't had the chance to look at the source yet; is it
really only doing selects?


Alexander.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-14 Thread Alexander Staubo

On Dec 15, 2006, at 01:16 , Ron wrote:


At 05:39 PM 12/14/2006, Alexander Staubo wrote:

On Dec 14, 2006, at 20:28 , Ron wrote:

Can you do runs with just CFLAGS=-O3 and just CFLAGS=-msse2
-mfpmath=sse -funroll-loops -m64 -march=opteron -pipe as well ?


All right. From my perspective, the effect of -O3 is significant,  
whereas architecture-related optimizations have no statistically  
significant effect.


Is this opinion?  Or have you rerun the tests using the flags I  
suggested?  If so, can you post the results?


Sorry, I neglected to include the pertinent graph:

  http://purefiction.net/paste/pgbench2.pdf

The raw data:

CFLAGS=-msse2 -mfpmath=sse -funroll-loops -m64 -march=opteron -pipe:

18480.899621 19977.162108 19640.562003 19823.585944 19500.293284  
19964.383540 20228.664827
20515.766366 19956.431120 19740.795459 20184.551390 19984.907398  
20457.260691 19771.395220
20159.225628 19907.248149 20197.580815 19947.498185 20209.450748  
20088.501904


CFLAGS=-O3

23814.672315 26846.761905 27137.807960 26957.898233 27109.057570  
26997.227925 27291.056939
27565.553643 27422.624323 27392.397185 27757.144967 27402.365372  
27563.365421 27349.544685
27544.658154 26957.200592 27523.824623 27457.380654 27052.910082  
24452.819263


CFLAGS=-O0

18440.181894 19207.882300 19894.432185 19635.625622 19876.858884  
20032.597042 19683.597973
20370.19 19989.157881 20207.343510 19993.745956 20081.353580  
20356.416424 20047.810017
20319.834190 19417.807528 19906.788454 20536.039929 19491.308046  
20002.144230


CFLAGS=-O3 -msse2 -mfpmath=sse -funroll-loops -m64 -march=opteron -pipe


23830.358351 26162.203569 25569.091264 26762.755665 26590.822550  
26864.908197 26608.029665
26796.116921 26323.742015 26692.576261 26878.859132 26106.770425  
26328.371664 26755.595130
25488.304946 26635.527959 26377.485023 24817.590708 26480.245737  
26223.427801


If -O3 -msse2 -mfpmath=sse -funroll-loops -m64 -march=opteron -pipe
results in a 30-40% speed up over -O0, and
-msse2 -mfpmath=sse -funroll-loops -m64 -march=opteron -pipe
results in a 5-10% speedup, then ~ 1/8 - 1/3 of the total possible  
speedup is due to arch specific optimizations.


Unfortunately, I don't see a 5-10% speedup; -O0 and -msse2 ...  
are statistically identical.


Alexander.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-12 Thread Alexander Staubo

On Dec 11, 2006, at 23:22 , Daniel van Ham Colchete wrote:


I ran this test at a Gentoo test machine I have here. It's a Pentium 4
3.0GHz (I don't know witch P4)


Try "cat /proc/cpuinfo".


TESTS RESULTS
==


On a dual-core Opteron 280 with 4G RAM with an LSI PCI-X Fusion-MPT  
SAS controller, I am getting wildly uneven results:


tps = 264.775137 (excluding connections establishing)
tps = 160.365754 (excluding connections establishing)
tps = 151.967193 (excluding connections establishing)
tps = 148.010349 (excluding connections establishing)
tps = 260.973569 (excluding connections establishing)
tps = 144.693287 (excluding connections establishing)
tps = 148.147036 (excluding connections establishing)
tps = 259.485717 (excluding connections establishing)

I suspect the hardware's real maximum performance of the system is  
~150 tps, but that the LSI's write cache is buffering the writes. I  
would love to validate this hypothesis, but I'm not sure how.


Alexander.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-12 Thread Alexander Staubo

On Dec 12, 2006, at 13:32 , Michael Stone wrote:


On Tue, Dec 12, 2006 at 12:29:29PM +0100, Alexander Staubo wrote:
I suspect the hardware's real maximum performance of the system  
is  ~150 tps, but that the LSI's write cache is buffering the  
writes. I  would love to validate this hypothesis, but I'm not  
sure how.


With fsync off? The write cache shouldn't really matter in that  
case. (And for this purpose that's probably a reasonable  
configuration.)


No, fsync=on. The tps values are similarly unstable with fsync=off,  
though -- I'm seeing bursts of high tps values followed by low-tps  
valleys, a kind of staccato flow indicative of a write cache being
filled up and flushed.


Alexander.


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-11 Thread Alexander Staubo

On Dec 11, 2006, at 04:35 , Tatsuo Ishii wrote:


That's not the whole story. UTF-8 and other variable-width encodings
don't provide a 1:1 mapping of logical characters to single bytes; in
particular, combination characters opens the possibility of multiple
different byte sequences mapping to the same code point; therefore,
string comparison in such encodings generally cannot be done at the
byte level (unless, of course, you first acertain that the strings
involved are all normalized to an unambiguous subset of your  
encoding).


Can you tell me such encodings supported by PostgreSQL other
than UTF-8?


http://www.postgresql.org/docs/8.1/interactive/multibyte.html#MULTIBYTE-CHARSET-SUPPORTED


Alexander.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] File Systems Compared

2006-12-06 Thread Alexander Staubo

On Dec 6, 2006, at 16:40 , Brian Wipf wrote:


All tests are with bonnie++ 1.03a

[snip]

Care to post these numbers *without* word wrapping? Thanks.

Alexander.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] query produces 1 GB temp file

2006-10-27 Thread Alexander Staubo
While I can't explain why PostgreSQL would use that much memory, I
recommend looking into tweaking the work_mem parameter. This setting
specifies how much memory PostgreSQL will use for certain temporary
data structures (hash tables, sort buffers) before it starts spilling
to temporary files. Quoting the docs:



work_mem (integer)
Specifies the amount of memory to be used by internal sort  
operations and hash tables before switching to temporary disk  
files. The value is specified in kilobytes, and defaults to 1024  
kilobytes (1 MB). Note that for a complex query, several sort or  
hash operations might be running in parallel; each one will be  
allowed to use as much memory as this value specifies before it  
starts to put data into temporary files. Also, several running  
sessions could be doing such operations concurrently. So the total  
memory used could be many times the value of work_mem; it is  
necessary to keep this fact in mind when choosing the value. Sort  
operations are used for ORDER BY, DISTINCT, and merge joins. Hash  
tables are used in hash joins, hash-based aggregation, and hash- 
based processing of IN subqueries.
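
A hedged example of overriding it for a single session (the value is
arbitrary; note that on 7.4, which you are running, the equivalent
parameter is sort_mem):

  set work_mem = 262144;   -- in kilobytes, i.e. 256 MB, for this session only
  -- on 7.4: set sort_mem = 262144;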


Alexander.

On Feb 5, 2005, at 18:25 , Dirk Lutzebaeck wrote:


Hi,

here is a query which produces over 1G temp file in pgsql_tmp. This
is on pgsql 7.4.2, RHEL 3.0, XEON MP machine with 32GB RAM, 300MB
sort_mem and 320MB shared_mem.

Below is the query and results for EXPLAIN and EXPLAIN ANALYZE. All
tables have been analyzed before.

Can some please explain why the temp file is so huge? I understand
there are a lot of rows. All relevant indices seem to be used.

Thanks in advance,

Dirk

EXPLAIN
SELECT DISTINCT ON (ft.val_9, ft.created, ft.flatid) ft.docstart,  
ft.flatobj, bi.oid, bi.en

FROM bi, en, df AS ft, es
WHERE bi.rc=130170467
AND bi.en=ft.en
AND bi.co=117305223
AND bi.hide=FALSE
AND ft.en=en.oid
AND es.en=bi.en
AND es.co=bi.co
AND es.spec=122293729
AND (ft.val_2='DG' OR ft.val_2='SK')
AND ft.docstart=1
ORDER BY ft.val_9 ASC, ft.created DESC
LIMIT 1000 OFFSET 0;

 Limit  (cost=8346.75..8346.78 rows=3 width=1361)
   ->  Unique  (cost=8346.75..8346.78 rows=3 width=1361)
         ->  Sort  (cost=8346.75..8346.76 rows=3 width=1361)
               Sort Key: ft.val_9, ft.created, ft.flatid
               ->  Nested Loop  (cost=0.00..8346.73 rows=3 width=1361)
                     ->  Nested Loop  (cost=0.00..5757.17 rows=17 width=51)
                           ->  Nested Loop  (cost=0.00..5606.39 rows=30 width=42)
                                 ->  Index Scan using es_sc_index on es  (cost=0.00..847.71 rows=301 width=8)
                                       Index Cond: ((spec = 122293729) AND (co = 117305223::oid))
                                 ->  Index Scan using bi_env_index on bi  (cost=0.00..15.80 rows=1 width=42)
                                       Index Cond: (outer.en = bi.en)
                                       Filter: ((rc = 130170467::oid) AND (co = 117305223::oid) AND (hide = false))
                           ->  Index Scan using en_oid_index on en  (cost=0.00..5.01 rows=1 width=9)
                                 Index Cond: (outer.en = en.oid)
                     ->  Index Scan using df_en on df ft  (cost=0.00..151.71 rows=49 width=1322)
                           Index Cond: (outer.en = ft.en)
                           Filter: (((val_2 = 'DG'::text) OR (val_2 = 'SK'::text)) AND (docstart = 1))

(17 rows)


--

EXPLAIN ANALYZE gives:


 Limit  (cost=8346.75..8346.78 rows=3 width=1361) (actual time=75357.465..75679.964 rows=1000 loops=1)
   ->  Unique  (cost=8346.75..8346.78 rows=3 width=1361) (actual time=75357.459..75675.371 rows=1000 loops=1)
         ->  Sort  (cost=8346.75..8346.76 rows=3 width=1361) (actual time=75357.448..75499.263 rows=22439 loops=1)
               Sort Key: ft.val_9, ft.created, ft.flatid
               ->  Nested Loop  (cost=0.00..8346.73 rows=3 width=1361) (actual time=34.104..18016.005 rows=703677 loops=1)
                     ->  Nested Loop  (cost=0.00..5757.17 rows=17 width=51) (actual time=0.467..3216.342 rows=48563 loops=1)
                           ->  Nested Loop  (cost=0.00..5606.39 rows=30 width=42) (actual time=0.381..1677.014 rows=48563 loops=1)
                                 ->  Index Scan using es_sc_index on es  (cost=0.00..847.71 rows=301 width=8) (actual time=0.184..46.519 rows=5863 loops=1)
                                       Index Cond: ((spec = 122293729) AND (co = 117305223::oid))
                                 ->  Index Scan using bi_env_index on bi  (cost=0.00..15.80 rows=1 width=42) (actual time=0.052..0.218 rows=8 loops=5863)
                                       Index Cond: (outer.en = bi.en)
                                       Filter: ((rc = 130170467::oid) AND (co = 117305223::oid) AND (hide = false))
                           ->  Index Scan using en_oid_index on en  (cost=0.00..5.01 rows=1 width=9) (actual time=0.015..0.019 rows=1

Re: [PERFORM] Optimization of this SQL sentence

2006-10-17 Thread Alexander Staubo

On Oct 17, 2006, at 11:33 , Ruben Rubio wrote:


CREATE TABLE comment
(
  idcomment int4 NOT NULL DEFAULT
nextval('comment_idcomment_seq'::regclass),

[snip 28 columns]

  CONSTRAINT comment_pkey PRIMARY KEY (idcomment)
)

Ficha structure:
No indexes in ficha
Ficha rows: 17.850

CREATE TABLE ficha
(
  idficha int4 NOT NULL DEFAULT nextval 
('ficha_idficha_seq'::regclass),

[snip 67 (!) columns]

  CONSTRAINT pk_ficha PRIMARY KEY (idficha),
  CONSTRAINT fk_ficha_geonivel6 FOREIGN KEY (idlocacion) REFERENCES
geonivel6 (idgeonivel6) ON UPDATE NO ACTION ON DELETE NO ACTION
)


 These tables are particularly egregious examples of ignorant
database design. You need to understand the relational model
(http://en.wikipedia.org/wiki/Relational_model), specifically data
normalization (http://en.wikipedia.org/wiki/Database_normalization)
and 3NF (http://en.wikipedia.org/wiki/3NF).


These columns are particularly telling:

  searchengine1 int4,
  searchengine2 int4,
  searchengine3 int4,
  searchengine4 int4,
  searchengine5 int4,
  searchengine6 int4,
  deseo1 int4,
  deseo2 int4,
  deseo3 int4,
  deseo4 int4,
  deseo5 int4,
  deseo6 int4,
  titulomapa_l0 varchar(255),
  titulomapa_l1 varchar(255),
  titulomapa_l2 varchar(255),
  titulomapa_l3 varchar(255),
  titulomapa_l4 varchar(255),
  titulomapa_l5 varchar(255),
  titulomapa_l6 varchar(255),
  titulomapa_l7 varchar(255),
  titulomapa_l8 varchar(255),
  titulomapa_l9 varchar(255),

Refactor into three separate tables:

  create table searchengine (
idficha int references ficha (idficha),
searchengine int,
primary key (idficha, searchengine)
  );

  create table deseo (
idficha int references ficha (idficha),
deseo int,
primary key (idficha, deseo)
  );

  create table titulomapa (
idficha int references ficha (idficha),
titulomapa int,
primary key (idficha, titulomapa)
  );

Now you can find all search engines for a single ficha row:

  select searchengine from searchengine where idficha = n

This design allows for more than 5 search engines per ficha row, and  
allows expressive joins such as:


  select ficha.idficha, searchengine.searchengine
  from ficha
  inner join searchengine on searchengine.idficha = ficha.idficha

Also, most of your columns are nullable. This alone shows that you  
don't understand your own data.


Lastly, note that in PostgreSQL these length declarations are not  
necessary:


  contacto varchar(255),
  fuente varchar(512),
  prefijopais varchar(10)

Instead, use:

  contacto text,
  fuente text,
  prefijopais text

See the PostgreSQL manual for an explanation of varchar vs. text.

Alexander.


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Optimization of this SQL sentence

2006-10-17 Thread Alexander Staubo

On Oct 17, 2006, at 17:10 , Craig A. James wrote:

 These tables are particularly egregious examples of ignorant  
database design. You need to understand the relational model


This email is a *particularly* egregious example of rudeness.  You  
owe Mr. Staubo, and the Postgress community, an apology.


I'm sorry you feel that way, but I don't think I was out of line. I  
did point to several informative sources of documentation, and  
described some of the problems (but by no means all) with the  
person's schema and how to solve them. If you think the database  
design in question is *not* ignorant database design, please do  
explain why, but on technical grounds. (Ignorance, of course, is not  
a sin.)


Alexander.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Optimization of this SQL sentence

2006-10-17 Thread Alexander Staubo

On Oct 17, 2006, at 17:29 , Mario Weilguni wrote:


On Tuesday, 17 October 2006 at 11:52, Alexander Staubo wrote:

Lastly, note that in PostgreSQL these length declarations are not
necessary:

   contacto varchar(255),
   fuente varchar(512),
   prefijopais varchar(10)


Enforcing length constraints with varchar(xyz) is good database  
design, not a
bad one. Using text everywhere might be tempting because it works,  
but it's

not a good idea.


Enforcing length constraints is generally a bad idea because it  
assumes you know the data domain as expressed in a quantity of  
characters. Off the top of your head, do you know the maximum length  
of a zip code? A street address? The name of a city?


In almost all cases the limit you invent is arbitrary, and the  
probability of being incompatible with any given input is inversely  
proportional to that arbitrary limit.


Encoding specific length constraints in the database makes sense when  
they relate explicitly to business logic, but I can think of only a  
few cases where it would make sense: restricting the length of  
passwords, user names, and so on. In a few cases you do know with  
100% certainty the limit of your field, such as with standardized  
abbreviations: ISO 3166 country codes, for example. And sometimes you  
want to cap data due to storage or transmission costs.
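
When such a business-logic limit does exist, a check constraint on a
text column states the intent explicitly; a sketch with invented names
and an invented limit:

  create table account (
    username text not null check (char_length(username) <= 32)
  );

Unlike a varchar(n) declaration, the constraint can later be dropped
or replaced without touching the column type.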


The length constraint on text fields is primarily a historical  
artifact stemming from the way databases have traditionally been  
implemented, as fixed-length fields in fixed-length row structures.  
The inexplicable, improbable space-padded (!) character data type  
in ANSI SQL is a vestige of this legacy. PostgreSQL's variable-length  
rows and TOAST mechanism makes the point moot.


Quoth the PostgreSQL manual, section 8.3:

There are no performance differences between these three types,  
apart from the increased storage size when using the blank-padded  
type. While character(n) has performance advantages in some other  
database systems, it has no such advantages in PostgreSQL. In most  
situations text or character varying should be used instead.


Alexander.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Poor performance on very simple query ?

2006-10-03 Thread Alexander Staubo

On Oct 3, 2006, at 13:25 , Arnaud Lesauvage wrote:

The problem is that simple select queries with the primary key in  
the WHERE statement take very long to run.

For example, this query returns only 7 rows and takes about 1
second to run !
SELECT * FROM table1 WHERE gid in (33,110,65,84,92,94,13,7,68,41);


This is a very small table, but generally speaking, such queries  
benefit from an index; eg.,


  create index table1_gid on table1 (gid);

Note that PostgreSQL may still perform a sequential scan if it thinks  
this has a lower cost, eg. for small tables that span just a few pages.



I have run VACUUM FULL on this table many times... I don't know
what to try next !


PostgreSQL's query planner relies on table statistics to perform  
certain optimizations; make sure you run analyze table1.
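
To see which plan is chosen once the index exists and the statistics
are fresh, something like this (reusing the query from your message)
will tell you:

  explain analyze
  select * from table1 where gid in (33,110,65,84,92,94,13,7,68,41);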


Alexander.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly