Re: [PERFORM] Linux Filesystems again - Ubuntu this time

2010-07-27 Thread Kevin Grittner
Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote:
 
 Also xfs has seen quite a bit of development in these later
 kernels, any thoughts on that?
 
We've been using xfs for a few years now with good performance and
no problems other than needing to disable write barriers to get good
performance out of our battery-backed RAID adapter.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Linux Filesystems again - Ubuntu this time

2010-07-27 Thread Whit Armstrong
Kevin,

While we're on the topic, do you also disable fsync?

We use xfs with battery-backed raid as well.  We have had no issues with xfs.

I'm curious whether anyone can comment on their experience (good or bad)
using xfs/battery-backed-cache/fsync=off.

Thanks,
Whit


On Tue, Jul 27, 2010 at 9:48 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote:

 Also xfs has seen quite a bit of development in these later
 kernels, any thoughts on that?

 We've been using xfs for a few years now with good performance and
 no problems other than needing to disable write barriers to get good
 performance out of our battery-backed RAID adapter.

 -Kevin

 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Linux Filesystems again - Ubuntu this time

2010-07-27 Thread Kevin Grittner
Whit Armstrong armstrong.w...@gmail.com wrote:
 
 While we're on the topic, do you also disable fsync?
 
We only disable fsync during bulk loads, where we would be starting
over anyway if there was a failure.  Basically, you should never use
fsync unless you are OK with losing everything in the database
server if you have an OS or hardware failure.  We have a few
databases where we would consider that if performance wasn't
otherwise acceptable, since they are consolidated replicas of
off-site source databases, and we have four identical ones in two
separate buildings; however, since performance is good with fsync on
and it would be a bother to have to copy from one of the other
servers in the event of an OS crash, we leave it on.
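 
(A rough sketch of that bulk-load pattern, assuming a load you could simply
redo from scratch after a crash; fsync can't be changed per session, so it
has to be flipped in postgresql.conf and picked up with a reload -- table
name and file path below are placeholders:)

    # postgresql.conf, temporarily, for the duration of the load only:
    fsync = off

    -- then, in psql:
    SELECT pg_reload_conf();   -- make the running server pick up the change
    SHOW fsync;                -- should now report 'off'
    COPY some_big_table FROM '/path/to/load.csv' WITH CSV;
    -- when the load is done, set fsync = on again and reload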
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Linux Filesystems again - Ubuntu this time

2010-07-27 Thread Kevin Grittner
Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 
 Basically, you should never use fsync unless you are OK with
 losing everything in the database server if you have an OS or
 hardware failure.
 
s/use/disable/
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Linux Filesystems again - Ubuntu this time

2010-07-27 Thread Whit Armstrong
Thanks.

But there is no such risk to turning off write barriers?

I'm only specifying noatime for xfs at the moment.

Did you get a substantial performance boost from disabling write
barriers?  Like 10x, or more like 2x?

Thanks,
Whit



On Tue, Jul 27, 2010 at 1:19 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov wrote:

 Basically, you should never use fsync unless you are OK with
 losing everything in the database server if you have an OS or
 hardware failure.

 s/use/disable/

 -Kevin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] potential performance gain by query planner optimization

2010-07-27 Thread Tom Lane
Kneringer, Armin armin.knerin...@fabasoft.com writes:
 I think I found a potential performance gain if the query planner would be
 optimized. All tests have been performed with 8.4.1 (and earlier versions) on
 CentOS 5.3 (x64).

 The following query will run on my database (~250 GB) for ca. 1600 seconds 
 and the sort will result in a disk merge deploying ca. 200 GB of data to the 
 local disk (ca. 180.000 tmp-files)

What have you got work_mem set to?  It looks like you must be using an
unreasonably large value, else the planner wouldn't have tried to use a
hash join here:

   ->  Hash  (cost=11917516.57..11917516.57 rows=55006045159 width=16)
         ->  Nested Loop  (cost=0.00..11917516.57 rows=55006045159 width=16)
               ->  Seq Scan on atdateval t5  (cost=0.00..294152.40 rows=1859934 width=12)
                     Filter: (attrid = 281479288456447::bigint)
               ->  Index Scan using ind_ataggval on ataggval q1_1  (cost=0.00..6.20 rows=4 width=12)
                     Index Cond: ((q1_1.attrid = 281479288456451::bigint) AND (q1_1.aggval = t5.aggrid))
                     Filter: (q1_1.aggrid = 0)

Also, please try something newer than 8.4.1 --- this might be some
already-fixed bug.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Linux Filesystems again - Ubuntu this time

2010-07-27 Thread Kevin Grittner
Whit Armstrong armstrong.w...@gmail.com wrote:
 
 But there is no such risk to turning off write barriers?
 
Supposedly not:
 
http://xfs.org/index.php/XFS_FAQ#Q._Should_barriers_be_enabled_with_storage_which_has_a_persistent_write_cache.3F
 
 Did you get a substantial performance boost from disabling write
 barriers?  Like 10x, or more like 2x?
 
It made a huge difference on creation and deletion of disk files. 
Unfortunately we have some procedures which use a cursor and loop
through rows calling a function which creates and drops a temporary
table.  While I would like to see those transactions rewritten to
use sane techniques, they run fast enough without the write barriers
to be acceptable to the users, which puts the issue pretty low on
the priority list.  I don't have the numbers anymore, but I'm sure
it was closer to 100 times slower than 10 times.  In some workloads
you might not notice the difference, although I would watch out for
checkpoint behavior.
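 
(For reference, on kernels of that era disabling barriers on an xfs volume is
just a mount option; something like the following fstab line, shown only as a
sketch -- the device, mount point and remaining fields are placeholders:)

    /dev/sdb1   /var/lib/pgsql   xfs   noatime,nobarrier   0 0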
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] how to handle a big table for data log

2010-07-27 Thread kuopo
Let me make my problem clearer. Here is a requirement to log data from a set
of objects consistently. For example, the object maybe a mobile phone and it
will report its location every 30s. To record its historical trace, I create
a table like
CREATE TABLE log_table
(
  id integer NOT NULL,
  data_type integer NOT NULL,
  data_value double precision,
  ts timestamp with time zone NOT NULL,
  CONSTRAINT log_table_pkey PRIMARY KEY (id, data_type, ts)
);
In my location log example, the field data_type could be longitude or
latitude.

I create a primary key (id, data_type, ts) to make my queries more
efficient. The major type of queries would ask the latest data_value of a
data_type by given id and timestamp. For this kind of query, I make the
following SQL statement
SELECT * FROM log_table
WHERE id = [given id] AND data_type = 'longitude'
  AND ts = (SELECT max(ts) FROM log_table
            WHERE id = [given id] AND data_type = 'longitude'
              AND ts <= [given timestamp]);
According to my evaluation, its performance is acceptable.
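
(For comparison, an equivalent formulation that walks the (id, data_type, ts)
primary-key index directly -- just a sketch, using the same placeholders as
above:)

    SELECT data_value
    FROM log_table
    WHERE id = [given id]
      AND data_type = 'longitude'
      AND ts <= [given timestamp]
    ORDER BY ts DESC
    LIMIT 1;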

However, I am more concerned about the performance of the insert operation.
As I have mentioned, the log_table keeps growing, so I decided to partition
it. Currently I partition it by date and only keep 60 days of data. This
partitioning is helpful. But when I also partition it by data_type (in my
case the number of data_types is limited, say 10), insert performance
degrades. I guess this is caused by multiple vacuum/analyze passes on these
partitioned data_type log tables. However, if I put all data_type logs
together, I can expect insert performance to degrade as well once I expand
the system to support more mobile phones or more data_types.
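
(A sketch of the date-based partitioning described above, using the 8.4-era
inheritance approach; the child table name and dates are placeholders:)

    CREATE TABLE log_table_20100727 (
      CHECK (ts >= '2010-07-27'::timestamptz AND ts < '2010-07-28'::timestamptz)
    ) INHERITS (log_table);

    ALTER TABLE log_table_20100727
      ADD CONSTRAINT log_table_20100727_pkey PRIMARY KEY (id, data_type, ts);

    -- inserts are routed to the current child (from the application or an
    -- INSERT trigger on the parent), and expiring old data is a cheap
    -- DROP TABLE of any child older than 60 days.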

This is my current situation. Please give me some hints to improve the
performance (especially for the insert part).


kuopo.


On Mon, Jul 19, 2010 at 11:37 PM, Jorge Montero 
jorge_mont...@homedecorators.com wrote:
 Large tables, by themselves, are not necessarily a problem. The problem is
 what you might be trying to do with them. Depending on the operations you
 are trying to do, partitioning the table might help performance or make it
 worse.

 What kind of queries are you running? How many days of history are you
 keeping? Could you post an explain analyze output of a query that is being
 problematic?
 Given the amount of data you hint about, your server configuration and any
 custom statistics targets for the big tables in question would be useful.

 kuopo sp...@cs.nctu.edu.tw 7/19/2010 1:27 AM 
 Hi,

 I have a situation to handle a log table which would accumulate a
 large amount of logs. This table only involves insert and query
 operations. To limit the table size, I tried to split this table by
 date. However, the number of the logs is still large (46 million
 records per day). To further limit its size, I tried to split this log
 table by log type. However, this action does not improve the
 performance. It is much slower than the big table solution. I guess
 this is because I need to pay more cost on the auto-vacuum/analyze for
 all split tables.

 Can anyone comment on this situation? Thanks in advance.


 kuopo.

 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-27 Thread Hannu Krosing
On Fri, 2010-07-23 at 09:52 -0700, Joshua D. Drake wrote:
 On Thu, 2010-07-22 at 20:56 +0100, Hannu Krosing wrote:
   
   Let's extend this shall we:
   
   Avoid adding yet another network hop
  
  postgreSQL is multi-process, so you either have a separate pooler
  process or need to put pooler functionality in postmaster, both ways
  you still have a two-hop scenario for connect. You may be able to pass
  the socket to the child process and also keep it, but doing this for both
  client and db sides seems really convoluted. 
 
 Which means, right now there are three hops. Reducing one is good.

No, it is still two, as postmaster passes the socket to the spawned child
postgresql process after login. 

the process is as follows

Client --connects-- postmaster --spawns-- postgreSQL server process

then socket is passed to be used directly so the use is


Client --talks-to--- postgreSQL server process

when using a pooler it becomes


Client --connects-to-- Pooler --passes-requests-to-- postgreSQL 

I see no way to have the pooler select the postgreSQL process, pass the
client connection in a way that talks directly to the postgreSQL server
process, AND be able to get the server connection back once the client is
finished with either the request, transaction or connection (depending on
pooling mode).



 
  Or is there a portable way to pass sockets back and forth between
  parent and child processes ?
  
  If so, then pgbouncer could use it as well.
  
   Remove of a point of failure
  
  rather move the point of failure from external pooler to internal
  pooler ;)
 
 Yes but at that point, it doesn't matter. 
 
  
   Reduction of administrative overhead
  
  Possibly. But once you start actually using it, you still need to
  configure and monitor it and do other administrator-y tasks.
 
 Yes, but it is inclusive.
 
  
   Integration into our core authentication mechanisms
  
  True, although for example having SSL on client side connection will be
  so slow that it hides any performance gains from pooling, at least for
  short-lived connections.
 
 Yes, but right now you can't use *any* pooler with LDAP for example. We
 could if pooling was in core. Your SSL argument doesn't really work
 because its true with or without pooling.

As the main slowdown in SSL is connection setup, you can get both the network
security and the pooling speedup if you run the pool on the client side and
make the pooler-to-server connection over SSL.


   Greater flexibility in connection control
  
  Yes, poolers can be much more flexible than default postgresql. See for
  example pgbouncers PAUSE , RECONFIGURE and RESUME commands 
 
 :D
 
  
   And, having connection pooling in core does not eliminate the use of an
   external pool where it makes since.
  
  Probably the easiest way to achieve pooling in core would be adding an
  option to start pgbouncer under postmaster control.
 
 Yeah but that won't happen. 

I guess it could happen as part of opening up the postgresql-controlled
process part to be configurable and able to run third-party stuff. 

Another thing to run under postmaster control would be pgqd. 

 Also I think we may have a libevent
 dependency that we have to work out.
 
  
  You probably can't get much leaner than pgbouncer.
 
 Oh don't get me wrong. I love pgbouncer. It is my recommended pooler but
 even it has limitations (such as auth).

As pgbouncer is single-threaded and the main goal has been performance,
there is not much enthusiasm about including _any_ auth method
which can't be completed in a few CPU cycles. It may be possible to add
threads to wait for LDAP/Kerberos/... responses or do SSL handshakes, but
I have not seen any interest from Marko to do it himself.

Maybe there is a way to modularise the auth part of postmaster in a way
that could be used from third party products through some nice API which
postmaster-controlled pgbouncer can start using.


-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-27 Thread Hannu Krosing
On Sat, 2010-07-24 at 14:36 +0800, Craig Ringer wrote:
 On 24/07/10 13:23, Greg Smith wrote:
  Joshua Tolley wrote:
  Relatively minor, but it would be convenient to avoid having to query
  $external_pooler to determine the client_addr of an incoming connection.

  
  You suggest this as a minor concern, but I consider it to be one of the
  most compelling arguments in favor of in-core pooling.  A constant pain
  with external poolers is the need to then combine two sources of data in
  order to track connections fully, which is something that everyone runs
  into eventually and finds annoying.  It's one of the few things that
  doesn't go away no matter how much fiddling you do with pgBouncer, it's
  always getting in the way a bit.  And it seems to seriously bother
  systems administrators and developers, not just the DBAs.
 
 
 Putting a pooler in core won't inherently fix this, and won't remove the
 need to solve it for cases where the pooler can't be on the same machine.
 
 9.0 has application_name to let apps identify themselves. Perhaps a
 pooled_client_ip, to be set by a pooler rather than the app, could be
 added to address this problem in a way that can be used by all poolers
 new and existing, not just any new in-core pooling system.
 
 If a privileged set of pooler functions was considered, as per my
 other recent mail, the pooler could use a management connection to set
 the client ip before handing the connection to the client, so the client
 couldn't change pooled_client_ip itself by accident or through malice.
 But even without that, it'd be awfully handy.

Or maybe we can add some command extensions to the protocol for passing
extra info, so that instead of sending just the (run_query:query)
command over the socket we could send both the extra info and execute
(set_params:(proxy_client_ip:a.b.c.d)(proxy_client_port:n)(something
else))(run_query:query) in one packet (for performance), and have these
things be available in logging and pg_stat_activity.

I see no need to try to somehow restrict these if you can always be sure
that they are set by the direct client. The proxy can decide to pass some of
these from the real client, but it would be a decision made by the proxy, not
mandated by some proxying rules.






-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Slow query using the Cube contrib module.

2010-07-27 Thread Liviu Mirea-Ghiban

 Hello,

I have a simple table which has a cube column and a cube GiST index. The 
column contains 3-dimensional points (not cubes or intervals). The 
problem is that I'm getting very slow queries when I'm using the index. 
The table has about 130,000 rows and is full-vacuumed after any 
updates/inserts (it is updated only once every 24 hours).



Table definition:
CREATE TABLE picof.photo_colors
(
  photo_id integer NOT NULL,
  color_percent real NOT NULL,
  lab_color picof.cube NOT NULL
)
WITH (
  OIDS=FALSE
);

CREATE INDEX photo_colors_index
  ON picof.photo_colors
  USING gist
  (lab_color);


My query:
SELECT photo_id FROM photo_colors
WHERE lab_color <@ cube_enlarge('0, 0, 0', 10, 3)


Explain analyze:
Bitmap Heap Scan on photo_colors  (cost=13.40..421.55 rows=135 width=4) (actual time=7.958..15.493 rows=14313 loops=1)
  Recheck Cond: (lab_color <@ '(-10, -10, -10),(10, 10, 10)'::cube)
  ->  Bitmap Index Scan on photo_colors_index  (cost=0.00..13.36 rows=135 width=0) (actual time=7.556..7.556 rows=14313 loops=1)
        Index Cond: (lab_color <@ '(-10, -10, -10),(10, 10, 10)'::cube)
Total runtime: 16.849 ms
(Executed in PostgreSQL 8.4.4 on Windows and CentOS - same query plan)


Now, while it might not seem much, this is part of a bigger query in 
which several such subqueries are being joined. The cost really adds up.


My question is: Why is it doing a Bitmap Heap Scan / Recheck Cond? I've 
executed dozens of such queries and not once did the rechecking remove 
any rows. Is there any way to disable it, or do you have any other 
suggestions for optimizations (because I'm all out of ideas)?


Thank you in advance!

---
Liviu Mirea

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-27 Thread Robert Haas
On Sat, Jul 24, 2010 at 2:23 AM, Craig Ringer
cr...@postnewspapers.com.au wrote:
 On 24/07/10 01:28, Robert Haas wrote:

 Well, if we could change the backends so that they could fully
 reinitialize themselves (disconnect from a database to which they are
 bound, etc.), I don't see why we couldn't use the Apache approach.

 This would offer the bonus on the side that it'd be more practical to
 implement database changes for a connection, akin to MySQL's USE.
 Inefficient, sure, but possible.

Yep.

 I don't care about that current limitation very much. I think anyone
 changing databases all the time probably has the wrong design and should
 be using schema. I'm sure there are times it'd be good to be able to
 switch databases on one connection, though.

I pretty much agree with this.  I think this is merely slightly nice
on its own, but I think it might be a building-block to other things
that we might want to do down the road.  Markus Wanner's Postgres-R
replication uses worker processes; autovacuum does as well; and then
there's parallel query.  I can't help thinking that not needing to
fork a new backend every time you want to connect to a new database
has got to be useful.

 My question with all this remains: is it worth the effort when external
 poolers already solve the problem.

Whether it's worth the effort is something anyone who is thinking
about working on this will have to decide for themselves.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-27 Thread Hannu Krosing
On Mon, 2010-07-26 at 14:34 -0400, Greg Smith wrote:
 Matthew Wakeling wrote:
  Yeb also made the point - there are far too many points on that graph 
  to really tell what the average latency is. It'd be instructive to 
  have a few figures, like only x% of requests took longer than y.
 
 Average latency is the inverse of TPS.  So if the result is, say, 1200 
 TPS, that means the average latency is 1 / (1200 transactions/second) = 
 0.83 milliseconds/transaction. 

This is probably only true if you run all transactions sequentially in
one connection? 

If you run 10 parallel threads and get 1200 TPS, the average transaction
time (latency?) is probably closer to 8.3 ms?
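
(Spelling out the arithmetic both of you are using, with the numbers from
this example:)

    average latency ~= concurrent clients / TPS
    1 client,   1200 TPS:   1 / 1200 s  ~= 0.83 ms per transaction
    10 clients, 1200 TPS:  10 / 1200 s  ~= 8.3 ms per transaction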

  The average TPS figure is normally on a 
 more useful scale as far as being able to compare them in ways that make 
 sense to people.
 
 pgbench-tools derives average, worst-case, and 90th percentile figures 
 for latency from the logs.  I have 37MB worth of graphs from a system 
 showing how all this typically works for regular hard drives I've been 
 given permission to publish; just need to find a place to host it at 
 internally and I'll make the whole stack available to the world.  So far 
 Yeb's data is showing that a single SSD is competitive with a small 
 array on average, but with better worst-case behavior than I'm used to 
 seeing.
 
 -- 
 Greg Smith  2ndQuadrant US  Baltimore, MD
 PostgreSQL Training, Services and Support
 g...@2ndquadrant.com   www.2ndQuadrant.us
 
 


-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Big difference in time returned by EXPLAIN ANALYZE SELECT ... AND SELECT ...

2010-07-27 Thread Lew

Piotr Gasidło wrote:

EXPLAIN ANALYZE SELECT ...
Total runtime: 4.782 ms
Time: 25,970 ms


Vitalii Tymchyshyn wrote:

Actually it's 20ms, so I suspect your point about planning time is correct.


Craig Ringer wrote:

Oh, a commas-as-fraction-separator locale.

That makes sense. Thanks for the catch.


Strangely, the runtime is shown with a period for the separator, though.

--
Lew

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] how to handle a big table for data log

2010-07-27 Thread Greg Spiegelberg
On Tue, Jul 20, 2010 at 9:51 PM, kuopo sp...@cs.nctu.edu.tw wrote:

 Let me make my problem clearer. Here is a requirement to log data from a
 set of objects consistently. For example, the object maybe a mobile phone
 and it will report its location every 30s. To record its historical trace, I
 create a table like
 *CREATE TABLE log_table
 (
   id integer NOT NULL,
  data_type integer NOT NULL,
  data_value double precision,
  ts timestamp with time zone NOT NULL,
  CONSTRAINT log_table_pkey PRIMARY KEY (id, data_type, ts)
 )*;
 In my location log example, the field data_type could be longitude or
 latitude.


I witnessed GridSQL in action many moons ago managing a massive database
log table.  From memory, the configuration was 4 database servers with a
cumulative 500M+ records, and queries were running under 5ms.  May be worth a
look.

http://www.enterprisedb.com/community/projects/gridsql.do

Greg


Re: [PERFORM] Slow query using the Cube contrib module.

2010-07-27 Thread Yeb Havinga

Liviu Mirea-Ghiban wrote:


My question is: Why is it doing a Bitmap Heap Scan / Recheck Cond? 
I've executed dozens of such queries and not once did the rechecking 
remove any rows. Is there any way to disable it, or do you have any 
other suggestions for optimizations (because I'm all out of ideas)?

It's probably because the index nodes store data values with a lossy
compression, which means that the index scan returns more rows than
wanted, and that in turn is filtered out by the rescanning. See the
comments for the 'RECHECK' parameter of CREATE OPERATOR CLASS
(http://www.postgresql.org/docs/8.4/static/sql-createopclass.html). It's
unwise to alter this behaviour without taking a look at / modifying the
underlying index implementation. The GiST index scan part could perhaps
be made a bit faster by using a smaller blocksize, but I'm not sure if
or how the recheck part can be improved. Maybe rewrite the top query
to not do bitmap heap scans in subqueries or inner loops?


regards,
Yeb Havinga

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-27 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Jul 22, 2010 at 5:29 PM, Andres Freund and...@anarazel.de wrote:
 The problem is harder for us because a backend can't switch identities
 once it's been assigned to a database.  I haven't heard an adequate
 explanation of why that couldn't be changed, though.

 Possibly it might decrease the performance significantly enough by
 reducing the cache locality (syscache, prepared plans)?

 Those things are backend-local.  The worst case scenario is you've got
 to flush them all when you reinitialize, in which case you still save
 the overhead of creating a new process.

Flushing them all is not zero-cost; it's not too hard to believe that
it could actually be slower than forking a clean new backend.

What's much worse, it's not zero-bug.  We've got little bitty caches
all over the backend, including (no doubt) some caching behavior in
third-party code that wouldn't get the word about whatever API you
invented to deal with this.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] how to handle a big table for data log

2010-07-27 Thread Josh Berkus
On 7/20/10 8:51 PM, kuopo wrote:
 Let me make my problem clearer. Here is a requirement to log data from a
 set of objects consistently. For example, the object maybe a mobile
 phone and it will report its location every 30s. To record its
 historical trace, I create a table like
 /CREATE TABLE log_table
 (
   id integer NOT NULL,
  data_type integer NOT NULL,
  data_value double precision,
  ts timestamp with time zone NOT NULL,
  CONSTRAINT log_table_pkey PRIMARY KEY (id, data_type, ts)
 )/;
 In my location log example, the field data_type could be longitude or
 latitude.

If what you have is longitude and latitude, why this brain-dead EAV
table structure?  You're making the table twice as large and half as
useful for no particular reason.

Use the point datatype instead of anonymizing the data.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow query using the Cube contrib module.

2010-07-27 Thread Tom Lane
Yeb Havinga yebhavi...@gmail.com writes:
 Liviu Mirea-Ghiban wrote:
 My question is: Why is it doing a Bitmap Heap Scan / Recheck Cond? 
 I've executed dozens of such queries and not once did the rechecking 
 remove any rows. Is there any way to disable it, or do you have any 
 other suggestions for optimizations (because I'm all out of ideas)?

 It's probably because the index nodes store data values with a lossy 
 compression, which means that the index scan returns more rows than 
 wanted, and that in turn is filtered out by the rescanning.

The recheck expression is only executed if the index reports that it's
not executed the search exactly.  If you don't see any difference
between the indexscan and bitmapscan output counts, it's probably
because the index can do the case exactly, so the recheck expression
isn't really getting used.  The planner has to include the expression
in the plan anyway, because the decision about lossiness is not known
until runtime.  But it's not costing any runtime.

The OP is mistaken to think there's anything wrong with this plan choice
--- more than likely, it's the best available plan.  The reason there's
a significant gap between the indexscan runtime and the bitmapscan
runtime is that that's the cost of going and actually fetching all those
rows from the table.  The only way to fix that is to buy a faster disk
or get more RAM so that more of the table can be held in memory.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Questions on query planner, join types, and work_mem

2010-07-27 Thread Peter Hussey
I have spent the last couple of weeks digging into a Postgres performance
problem that ultimately boiled down to this:  the planner was choosing to
use hash joins on a set of join keys that were much larger than the
configured work_mem.  We found we could make the  performance much better by
either
1) increasing work_mem to 500MB or more, or
2) forcing the planner to choose index-backed nested loops by turning off
hash and merge joins as well as bitmap and sequential scans.

Now we are trying to decide which of these paths to choose, and asking why
the planner doesn't handle this for us.

Background:  LabKey builds an open source platform for biomedical research
data.  The platform consists of a tomcat web application and a relational
database.  We support two databases, Postgres and SQL Server.  We started
with SQL Server because we were very familiar with it.  Two of our technical
team came from the SQL Server development team.  We chose Postgres because
we assessed that it was the open source database most likely to be able to
handle our application  requirements for capacity and complex, nested,
generated SQL handling.  Postgres is now the default database for our
platform and most of our key customers use it.  In general we've been very
satisfied with Postgres' performance and compatibility, but our customers
are starting to hit situations where we really need to be able to understand
why a particular operation is slow.  We are currently recommending version
8.4 and using that ourselves.

The core of the problem query was

SELECT * INTO snapshot_table FROM
  (SELECT ... FROM  tableA A LEFT  OUTER JOIN tableB B ON (A.lsid = B.lsid)
and A.datasetid = ? )  query1

the join column, lsid, is a poor choice for a join column as it is a long
varchar value (avg length 101 characters) that only gets unique way out
on the right hand side.  But we are stuck with this choice.  I can post the
SQL query and table definitions if it will help, but changes to either of
those would be risky and difficult, whereas setting the work_mem value or
forcing nested loop joins is less risky.

The Performance curve looks something like this

Join Type    work_mem (MB)   time to populate snapshot (min)
________________________________________________________________
Hash               50                 85
Hash              200                 38
Hash              400                 21
Hash              500                 12
Hash             1000                 12
________________________________________________________________
NestedLoop         50                 15
NestedLoop        200                 11
NestedLoop        400                 11
NestedLoop        500                 10
NestedLoop       1000                 10


Table A contains about 3.5 million rows, and table B contains about 4.4
million rows.  By looking at the EXPLAIN ANALYZE reports I concluded that
the planner seemed to be accurately determining the approximate number of
rows returned on each side of the join node.  I also noticed that at the
work_mem = 50 test, the hash join query execution was using over a GB of
space in the pgsql_tmp, space that grew and shrank slowly over the course of
the test.

Now for the questions:
1)  If we tell the customer to set his work_mem value to 500MB or 1GB in
postgres.config, what problems might they see?  the documentation and the
guidelines we received from Rupinder Singh in support suggest a much lower
value, e.g. a max work_mem of 10MB.  Other documentation such as the Guide
to Posting Slow Query Questions suggest at least testing up to 1GB.  What
is a reasonable maximum to configure for all connections?

2) How is work_mem used by a query execution?  For example, does each hash
table in an execution get allocated a full work_mem's worth of memory ?   Is
this memory released when the query is finished, or does it stay attached to
the connection or some other object?

3) is there a reason why the planner doesn't seem to recognize the condition
when the hash table won't fit in the current work_mem, and choose a
low-memory plan instead?

Excuse the long-winded post; I was trying to give the facts and nothing but
the facts.

Thanks,
Peter Hussey
LabKey Software


Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-07-27 Thread Andres Freund
Hi,

On Tue, Jul 27, 2010 at 04:08:16PM -0700, Peter Hussey wrote:
 Now for the questions:
 1)  If we tell the customer to set his work_mem value to 500MB or 1GB in
 postgres.config, what problems might they see?  the documentation and the
 guidelines we received from Rupinder Singh in support suggest a much lower
 value, e.g. a max work_mem of 10MB.  Other documentation such as the Guide
 to Posting Slow Query Questions suggest at least testing up to 1GB.  What
 is a reasonable maximum to configure for all connnections?
Well. That depends on the amount of expected concurrency and available
memory. Obviously you can set it much higher in an OLAPish, low
concurrency setting than in an OLTP environment.

That setting is quite complex to estimate, in my opinion. For one, the
actual usage depends on the complexity of the queries; for another, to
be halfway safe you have to use avail_mem/(max_connections *
max_nodes_of_most_complex_query), which is often a very pessimistic
and unusably low estimate.
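
(Purely illustrative numbers for that formula, not a recommendation:)

    avail_mem = 16 GB, max_connections = 100,
    max_nodes_of_most_complex_query = 4 (sorts + hashes)

    work_mem ceiling ~= 16 GB / (100 * 4) ~= 40 MB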

 2) How is work_mem used by a query execution?  For example, does each hash
 table in an execution get allocated a full work_mem's worth of memory ?   Is
 this memory released when the query is finished, or does it stay attached to
 the connection or some other object?
Each node of the query can use one work_mem worth of data (sometimes a
bit more). The memory is released after the query finishes (or
possibly earlier, depending on the structure of the query).
The specific allocation pattern and implementation details (of malloc)
influence how and when that memory is actually returned to the OS.

 3) is there a reason why the planner doesn't seem to recognize the condition
 when the hash table won't fit in the current work_mem, and choose a
 low-memory plan instead?
Hard to say without more information. Bad estimates maybe? Best show
your query plan (EXPLAIN ANALYZE), the table definition and some
details about the hardware in question (i.e. whether it has 1GB of memory or
256GB).

Andres

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-07-27 Thread Tom Lane
Peter Hussey pe...@labkey.com writes:
 I have spent the last couple of weeks digging into a Postgres performance
 problem that ultimately boiled down to this:  the planner was choosing to
 use hash joins on a set of join keys that were much larger than the
 configured work_mem.

What Postgres version is this, exactly?  (8.4 is not the answer I want.)

 the join column, lsid, is a poor choice for a join column as it is a long
 varchar value (avg length 101 characters) that only gets unique way out
 on the right hand side.

Hm, but it is unique eventually?  It's not necessarily bad for hashing
as long as that's so.

 1)  If we tell the customer to set his work_mem value to 500MB or 1GB in
 postgres.config, what problems might they see?

That would almost certainly be disastrous.  If you have to follow the
hack-work_mem path, I'd suggest increasing it locally in the session
executing the problem query, and only for the duration of that query.
Use SET, or even SET LOCAL.
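
(A sketch of that approach for the snapshot query above -- the 500MB figure
is just the value from the earlier tests, not a recommendation, and the
"..." and "?" placeholders are carried over from the original post:)

    BEGIN;
    SET LOCAL work_mem = '500MB';   -- applies to this transaction only
    SELECT * INTO snapshot_table FROM
      (SELECT ... FROM tableA A LEFT OUTER JOIN tableB B
         ON (A.lsid = B.lsid) AND A.datasetid = ?) query1;
    COMMIT;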

 2) How is work_mem used by a query execution?

Well, the issue you're hitting is that the executor is dividing the
query into batches to keep the size of the in-memory hash table below
work_mem.  The planner should expect that and estimate the cost of
the hash technique appropriately, but seemingly it's failing to do so.
Since you didn't provide EXPLAIN ANALYZE output, though, it's hard
to be sure.

 3) is there a reason why the planner doesn't seem to recognize the condition
 when the hash table won't fit in the current work_mem, and choose a
 low-memory plan instead?

That's the question, all right.  I wonder if it's got something to do
with the wide-varchar nature of the join key ... but again that's just
speculation with no facts.  Please show us EXPLAIN ANALYZE results
for the hash plan with both small and large work_mem, as well as for
the nestloop plan.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-27 Thread Robert Haas
On Tue, Jul 27, 2010 at 4:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Jul 22, 2010 at 5:29 PM, Andres Freund and...@anarazel.de wrote:
 The problem is harder for us because a backend can't switch identities
 once it's been assigned to a database.  I haven't heard an adequate
 explanation of why that couldn't be changed, though.

 Possibly it might decrease the performance significantly enough by
 reducing the cache locality (syscache, prepared plans)?

 Those things are backend-local.  The worst case scenario is you've got
 to flush them all when you reinitialize, in which case you still save
 the overhead of creating a new process.

 Flushing them all is not zero-cost; it's not too hard to believe that
 it could actually be slower than forking a clean new backend.

I'm not so sure I believe that.  Is a sinval overrun slower than
forking a clean new backend?  Is DISCARD ALL slower than forking a
clean new backend?  How much white space is there between either of
those and what would be needed here?  I guess it could be slower, but
I wouldn't want to assume that without evidence.
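
(For context, DISCARD ALL is what an external pooler typically issues between
clients today -- pgbouncer's server_reset_query is usually set to it -- shown
here only as a sketch of what a "reset" amounts to:)

    -- issued on a pooled server connection before handing it to the next
    -- client; roughly equivalent to:
    --   SET SESSION AUTHORIZATION DEFAULT; RESET ALL; DEALLOCATE ALL;
    --   CLOSE ALL; UNLISTEN *; DISCARD PLANS; DISCARD TEMP;
    DISCARD ALL;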

 What's much worse, it's not zero-bug.  We've got little bitty caches
 all over the backend, including (no doubt) some caching behavior in
 third-party code that wouldn't get the word about whatever API you
 invented to deal with this.

I think this is probably the biggest issue with the whole idea, and I
agree there would be some pain involved.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-27 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jul 27, 2010 at 4:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Flushing them all is not zero-cost; it's not too hard to believe that
 it could actually be slower than forking a clean new backend.

 I'm not so sure I believe that.

I'm not asserting it's true, just suggesting it's entirely possible.
Other than the fork() cost itself and whatever authentication activity
there might be, practically all the startup cost of a new backend can be
seen as cache-populating activities.  You'd have to redo all of that,
*plus* pay the costs of getting rid of the previous cache entries.
Maybe the latter costs less than a fork(), or maybe not.  fork() is
pretty cheap on modern Unixen.

 What's much worse, it's not zero-bug.

 I think this is probably the biggest issue with the whole idea, and I
 agree there would be some pain involved.

Yeah, if it weren't for that I'd say sure let's try it.  But I'm
afraid we'd be introducing significant headaches in return for a gain
that's quite speculative.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-27 Thread Robert Haas
On Tue, Jul 27, 2010 at 9:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Jul 27, 2010 at 4:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Flushing them all is not zero-cost; it's not too hard to believe that
 it could actually be slower than forking a clean new backend.

 I'm not so sure I believe that.

 I'm not asserting it's true, just suggesting it's entirely possible.
 Other than the fork() cost itself and whatever authentication activity
 there might be, practically all the startup cost of a new backend can be
 seen as cache-populating activities.  You'd have to redo all of that,
 *plus* pay the costs of getting rid of the previous cache entries.
 Maybe the latter costs less than a fork(), or maybe not.  fork() is
 pretty cheap on modern Unixen.

 What's much worse, it's not zero-bug.

 I think this is probably the biggest issue with the whole idea, and I
 agree there would be some pain involved.

 Yeah, if it weren't for that I'd say sure let's try it.  But I'm
 afraid we'd be introducing significant headaches in return for a gain
 that's quite speculative.

I agree that the gain is minimal of itself; after all, how often do
you need to switch databases, and what's the big deal if the
postmaster has to fork a new backend?  Where I see it as a potentially
big win is when it comes to things like parallel query.  I can't help
thinking that's going to be a lot less efficient if you're forever
forking new backends.  Perhaps the point here is that you'd actually
sort of like to NOT flush all those caches unless it turns out that
you're switching databases - many installations probably operate with
essentially one big database, so chances are good that even if you
distributed connections / parallel queries to backends round-robin,
you'd potentially save quite a bit of overhead.  Of course, for the
guy who has TWO big databases, you might hope to be a little smarter,
but that's another problem altogether.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-07-27 Thread Alvaro Herrera
Excerpts from Tom Lane's message of mar jul 27 20:05:02 -0400 2010:
 Peter Hussey pe...@labkey.com writes:

  2) How is work_mem used by a query execution?
 
 Well, the issue you're hitting is that the executor is dividing the
 query into batches to keep the size of the in-memory hash table below
 work_mem.  The planner should expect that and estimate the cost of
 the hash technique appropriately, but seemingly it's failing to do so.
 Since you didn't provide EXPLAIN ANALYZE output, though, it's hard
 to be sure.

Hmm, I wasn't aware that hash joins worked this way wrt work_mem.  Is
this visible in the explain output?  If it's something subtle (like an
increased total cost), may I suggest that it'd be a good idea to make it
explicit somehow in the machine-readable outputs?

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-07-27 Thread Jayadevan M
Hello,
 the join column, lsid, is a poor choice for a join column as it is a
 long varchar value (avg length 101 characters) that only gets
 unique way out on the right hand side.
Would a join on a substring of the 'way out on the right hand side' part (did
you mean 'rightmost characters' or 'only when we take almost all the 101
characters'?) together with a function-based index help?
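
(A sketch of that idea, purely hypothetical since the actual lsid format
isn't shown -- it assumes the distinguishing part is, say, the last 25 of the
~101 characters:)

    CREATE INDEX tablea_lsid_tail ON tableA (substring(lsid from 77));
    CREATE INDEX tableb_lsid_tail ON tableB (substring(lsid from 77));

    -- then join on the same expression (plus the full equality, to be safe):
    --   ON substring(A.lsid from 77) = substring(B.lsid from 77)
    --      AND A.lsid = B.lsid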
Regards,
Jayadevan





-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-07-27 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Tom Lane's message of mar jul 27 20:05:02 -0400 2010:
 Well, the issue you're hitting is that the executor is dividing the
 query into batches to keep the size of the in-memory hash table below
 work_mem.  The planner should expect that and estimate the cost of
 the hash technique appropriately, but seemingly it's failing to do so.

 Hmm, I wasn't aware that hash joins worked this way wrt work_mem.  Is
 this visible in the explain output?

As of 9.0, any significant difference between Hash Batches and
Original Hash Batches would be a cue that the planner blew the
estimate.  For Peter's problem, we're just going to have to look
to see if the estimated cost changes in a sane way between the
small-work_mem and large-work_mem cases.
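
(In 9.0's EXPLAIN ANALYZE that information shows up on the Hash node, roughly
like this -- the numbers here are invented for illustration:)

    ->  Hash  (cost=... rows=... width=...) (actual time=... rows=... loops=1)
          Buckets: 1024  Batches: 16 (originally 1)  Memory Usage: 4096kB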

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-27 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jul 27, 2010 at 9:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Other than the fork() cost itself and whatever authentication activity
 there might be, practically all the startup cost of a new backend can be
 seen as cache-populating activities.  You'd have to redo all of that,
 *plus* pay the costs of getting rid of the previous cache entries.
 Maybe the latter costs less than a fork(), or maybe not.  fork() is
 pretty cheap on modern Unixen.

 I agree that the gain is minimal of itself; after all, how often do
 you need to switch databases, and what's the big deal if the
 postmaster has to fork a new backend?  Where I see it as a potentially
 big win is when it comes to things like parallel query.  I can't help
 thinking that's going to be a lot less efficient if you're forever
 forking new backends.

Color me unconvinced.  To do parallel queries with pre-existing worker
processes, you'd need to hook up with a worker that was (at least) in
your own database, and then somehow feed it the query plan that it needs
to execute.  I'm thinking fork() could easily be cheaper.  But obviously
this is all speculation (... and Windows is going to be a whole 'nother
story in any case ...)

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-27 Thread Craig Ringer
On 28/07/10 04:40, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Jul 22, 2010 at 5:29 PM, Andres Freund and...@anarazel.de wrote:
 The problem is harder for us because a backend can't switch identities
 once it's been assigned to a database.  I haven't heard an adequate
 explanation of why that couldn't be changed, though.
 
 Possibly it might decrease the performance significantly enough by
 reducing the cache locality (syscache, prepared plans)?
 
 Those things are backend-local.  The worst case scenario is you've got
 to flush them all when you reinitialize, in which case you still save
 the overhead of creating a new process.
 
 Flushing them all is not zero-cost; it's not too hard to believe that
 it could actually be slower than forking a clean new backend.
 
 What's much worse, it's not zero-bug.  We've got little bitty caches
 all over the backend, including (no doubt) some caching behavior in
 third-party code that wouldn't get the word about whatever API you
 invented to deal with this.

In addition to caches, there may be some places where memory is just
expected to leak. Since it's a one-off allocation, nobody really cares;
why bother cleaning it up when it's quicker to just let the OS do it
when the backend terminates?

Being able to reset a backend for re-use would require that per-session
memory use be as neatly managed as per-query and per-transaction memory,
with no leaked stragglers left lying around.

Such cleanup (and management) has its own costs. Plus, you have a
potentially increasingly fragmented memory map to deal with the longer
the backend lives. Overall, there are plenty of advantages to letting
the OS clean it up.

... however, if the requirement is introduced that a given backend may
only be re-used for connections to the same database, lots of things get
simpler. You have to be able to change the current user (which would be
a bonus anyway), reset GUCs, etc, but how much else is there to do?

That way you can maintain per-database pools of idle workers (apache
prefork style) with ageing-out of backends that're unused. Wouldn't this
do the vast majority of what most pools are needed for anyway? And
wouldn't it potentially save quite a bit of load by avoiding having
backends constantly switching databases, flushing caches, etc?

-- 
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance