Re: [PERFORM] performance drop after upgrade (9.6 > 10)

2017-10-24 Thread Alvaro Herrera
johannes graën wrote:
> Hi Pavel, *,
> 
> you were right with ANALYZing the DB first. However, even after doing
> so, I frequently see Seq Scans where an index was used before. This
> usually co-occurs with parallelization and looked different before
> upgrading to 10. I can provide an example for 10 [1], but I cannot
> generate a query plan for 9.6 anymore.

https://wiki.postgresql.org/wiki/Slow_Query_Questions

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM]

2017-06-29 Thread Alvaro Herrera
Pavel Stehule wrote:
> 2017-06-29 7:17 GMT+02:00 Yevhenii Kurtov :
> 

> > I just tried UNION queries and got the following error:
> >
> > ERROR:  FOR UPDATE is not allowed with UNION/INTERSECT/EXCEPT
> 
> it is sad :(

I think we could lift this restriction for UNION ALL, but UNION sounds
difficult.
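
For illustration, one workaround that I believe already works is to push the
locking clause into sub-selects, so that it never attaches to the UNION
itself; a minimal sketch (table and column names hypothetical):

select * from (
  select id, payload from jobs_high where state = 'ready' for update
) h
union all
select * from (
  select id, payload from jobs_low where state = 'ready' for update
) l;

Each subquery locks the rows of its own table; only a plain UNION ALL runs on
top of the already-locked rows.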


BTW I wonder how much of the original problem is caused by using a
prepared query.  I understand the desire to avoid repeated planning
work, but I think in this case it may be working against you.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] Improving PostgreSQL insert performance

2017-06-11 Thread Alvaro Herrera
Vladimir Sitnikov wrote:
> Alvaro> Something like
> Alvaro> INSERT INTO .. VALUES ('col1', 'col2'), ('col1', 'col2'), ('col1', 'col2')
> Frits> I did not try that, to be honest.
> 
> pgjdbc does automatically rewrite insert values(); into insert ...
> values(),(),(),() when reWriteBatchedInserts=true. I don't expect manual
> multivalues to be noticeably faster there.

Ahh, so that's what that option does :-)  Nice to know -- great feature.
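
For readers of the archive: the rewrite is switched on through a pgjdbc
connection-URL parameter, roughly like this (host and database names
hypothetical):

jdbc:postgresql://dbhost:5432/mydb?reWriteBatchedInserts=true

It applies to inserts batched through PreparedStatement/executeBatch().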

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] Improving PostgreSQL insert performance

2017-06-10 Thread Alvaro Herrera
Frits Jalvingh wrote:

> So, I am still very interested in getting normal inserts faster, because
> that will gain speed for all work.. If Oracle can do it, and Postgres is
> able to insert fast with copy -- where lies the bottleneck with the insert
> command? There seems to be quite a performance hit with the JDBC driver
> itself (as the stored procedure is a lot faster), so I can look into that.
> But even after that there is quite a gap..

Did you try inserting multiple tuples in one command?  Something like
INSERT INTO .. VALUES ('col1', 'col2'), ('col1', 'col2'), ('col1', 'col2')
It's supposed to be faster than single-row inserts, though I don't
know by how much.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] Number of characters in column preventing index usage

2017-02-17 Thread Alvaro Herrera
Hustler DBA wrote:
> I am seeing this strange behavior, I don't know if this is by design by
> Postgres.
> 
> I have an index on a column which is defined as "character varying(255)".
> When the value I am searching for is of a certain length, the optimizer
> uses the index but when the value is long, the optimizer doesn't use the
> index but does a seq scan on the table. Is this by design? How can I make
> the optimizer use the index no matter what the size/length of the value
> being searched for?

As I recall, selectivity for strings is estimated based on the length of
the string.  Since your sample string looks suspiciously like a UUID,
perhaps you'd be better served by using a UUID column for it, which may
give better results.  This would prevent you from using the shortened
version for searches (which I suppose you can do with LIKE using the
varchar type), but you could replace it with something like this:

select *
from tab
where ID between '01625cfa-2bf8-45cf-0000-000000000000' and
  '01625cfa-2bf8-45cf-ffff-ffffffffffff';

Storage (both the table and indexes) is going to be more efficient this
way too.
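
A minimal sketch of that migration (table and column names hypothetical; the
index on the column is rebuilt automatically as part of the type change):

alter table tab alter column id type uuid using id::uuid;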

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] DELETE takes too much memory

2016-07-04 Thread Alvaro Herrera
Kouber Saparev wrote:
> I tried to DELETE about 7 million rows at once, and the query went up to
> 15% of the RAM (120 GB in total), which pushed some indexes out and the
> server load went up to 250, so I had to kill the query.
> 
> The involved table has neither foreign keys referring to other
> tables, nor other tables referring to it. The size of the table itself is 19 GB
> (15% of 120 GB). So why the DELETE tried to put the entire table in memory,
> or what did it do to take so much memory?

Are there triggers in the table?  Deferred triggers in particular can
use memory.
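
A quick way to look for them, as a sketch (table name hypothetical):

select tgname, tgdeferrable, tginitdeferred
from pg_trigger
where tgrelid = 'big_table'::regclass and not tgisinternal;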

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] Big number of connections

2016-03-31 Thread Alvaro Herrera
Andrew Dunstan wrote:

> On 03/31/2016 03:12 PM, Igor Neyman wrote:

> > >   We are going to build a system based on a PostgreSQL database for a huge
> > > number of individual users (a few thousand). Each user will have his own
> > > account; for authorization we will use Kerberos (MIT or Windows).
> > >   Most users will have low activity, but for various reasons, connections
> > > should be open all the time.
> > >   I'd like to know what potential problems and limitations we can expect
> > > with such a deployment.
> > >   During preliminary testing we have found that each connection needs
> > > ~1MB RAM. Is there any way to decrease this? Is there any risk that
> > > such a number of users will degrade performance?
> > >   I'll be happy to hear any remarks and suggestions related to design,
> > > administration and handling of such an installation.

> > Take a look at PgBouncer.
> > It should solve your problems.
> 
> If they are going to keep the client connections open, they would need to
> run pgbouncer in statement or transaction mode.

As I understand it, in pgbouncer you cannot have connections that serve
different users.  If each individual requires their own database-level
user, pgbouncer would not help at all.

I would look seriously into getting rid of the always-open requirement
for connections.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [SPAM] Re: [PERFORM] autovacuum disk IO

2016-03-02 Thread Alvaro Herrera
Scott Marlowe wrote:
> On Wed, Mar 2, 2016 at 9:11 AM, Moreno Andreo wrote:

> > ... or maybe add some more RAM to have more disk caching (if you're on
> > *nix) this worked for me in the past... even if IMHO it's more a
> > temporary "patch" while upgrading (if it can't be done in a hurry) than a
> > real solution...
> 
> Oh yeah, definitely worth looking at. But RAM can't speed up writes,
> just reads, so it's very workload dependent. If your IO subsystem is
> maxing out on writes, faster drives / IO. If it's maxing out on reads,
> more memory. But if your dataset is much bigger than memory (say 64GB
> RAM and a 1TB data store) then more RAM isn't going to be the answer.

In the particular case of autovacuum, it may be helpful to create a
"ramdisk" and put the stats temp file in it.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] checkpoints, proper config

2015-12-10 Thread Alvaro Herrera
Tomas Vondra wrote:

> Also, I don't think it makes much sense to set
> 
>(checkpoint_warning > checkpoint_timeout)
> 
> as it kinda defeats the whole purpose of the warning.

I agree, but actually, what is the sense of checkpoint_warning?  I think
it was useful back when we didn't have log_checkpoints, but now that we
have detailed checkpoint logging I think it's pretty much useless noise.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] unlogged tables

2015-04-13 Thread Alvaro Herrera
Jim Nasby wrote:

> Yeah, this is not something that would be very easy to accomplish, because a
> buffer can get evicted and written to disk at any point. It wouldn't be too
> hard to read every unlogged table during recovery and see if there are any
> pages that were written after the last checkpoint, but that obviously won't
> be very fast.

If you consider only tables, then yeah perhaps this is easy to
accomplish (not really convinced myself).  But if you consider indexes,
things are not so easy anymore.


In the thread from 2011 (which this started as a reply to) the OP was
doing frequent UPDATEs to keep track of counts of something.  I think
that would be better served by using INSERTs of deltas and periodic
accumulation of grouped values, as suggested in
http://www.postgresql.org/message-id/20150305211601.gw3...@alvh.no-ip.org
This has actually been suggested many times over the years.
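
A minimal sketch of the deltas approach (names hypothetical; data-modifying
CTEs need 9.1 or later):

-- hot path: append a delta instead of updating a counter row
insert into counts (item_id, delta) values (42, 1);

-- readers sum the deltas
select item_id, sum(delta) from counts group by item_id;

-- periodic accumulation: collapse the delta rows into one row per key;
-- rows inserted concurrently are not seen by the DELETE and survive
with folded as (
  delete from counts returning item_id, delta
)
insert into counts
select item_id, sum(delta) from folded group by item_id;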

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] Avoiding Refreezing XIDs Repeatedly

2015-02-09 Thread Alvaro Herrera
Matheus de Oliveira wrote:
> On Mon, Feb 9, 2015 at 1:58 PM, bkrug bk...@usatech.com wrote:
> 
> > Couldn't postgres reserve a special XID that is never available for normal
> > transactions but that indicates that any transaction can see it because it
> > is so old? Then instead of constantly having to freeze old XIDs each time
> > the XID is going to wrap, vacuum can just set it to the special XID and
> > never touch it again unless something really changes.
> 
> It changed in recent versions (9.3 or 9.4, I don't recall exactly which)
> and moved to tuple header, but what you described is exactly what was done,
> the xid was 2.

Actually, it's been done this way for ages -- it was introduced in 2001
(release 7.2) by these commits:

Author: Tom Lane t...@sss.pgh.pa.us
Branch: master Release: REL7_2 [2589735da] 2001-08-25 18:52:43 +

Replace implementation of pg_log as a relation accessed through the
buffer manager with 'pg_clog', a specialized access method modeled
on pg_xlog.  This simplifies startup (don't need to play games to
open pg_log; among other things, OverrideTransactionSystem goes away),
should improve performance a little, and opens the door to recycling
commit log space by removing no-longer-needed segments of the commit
log.  Actual recycling is not there yet, but I felt I should commit
this part separately since it'd still be useful if we chose not to
do transaction ID wraparound.


Author: Tom Lane t...@sss.pgh.pa.us
Branch: master Release: REL7_2 [bc7d37a52] 2001-08-26 16:56:03 +

Transaction IDs wrap around, per my proposal of 13-Aug-01.  More
documentation to come, but the code is all here.  initdb forced.


-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] Avoiding Refreezing XIDs Repeatedly

2015-02-09 Thread Alvaro Herrera
bkrug wrote:
> The problem I'm facing is that I have many large (several GB) tables that are
> not being changed (they are several days old) but auto-vacuum keeps scanning
> and updating them every time the xid wraps around and thus my rsync back-up
> process sees that the disk files have changed and must copy them.

We have considered changing this, but it needs a concerted effort.  It's
not a simple problem.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] Avoiding Refreezing XIDs Repeatedly

2015-02-09 Thread Alvaro Herrera
bkrug wrote:
> Matheus de Oliveira wrote
> > It changed in recent versions (9.3 or 9.4, I don't recall exactly which)
> > and moved to tuple header, but what you described is exactly what was
> > done, the xid was 2.
> 
> Should the relfrozenxid of pg_class then equal 2 for very old and already
> vacuumed tables? Because that is not what I am seeing.

No.  The problem is that it's not easy to change the relfrozenxid when
an INSERT/UPDATE command creates a tuple with a non-frozen XID.
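
To see how far each table's relfrozenxid lags the current transaction
counter, something like this sketch works:

select relname, age(relfrozenxid)
from pg_class
where relkind = 'r'
order by 2 desc
limit 10;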

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] Lock pileup causes server to stall

2014-11-12 Thread Alvaro Herrera
Jesper Krogh wrote:
> 
> On 10/11/2014, at 22.40, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
> 
> > Josh Berkus wrote:
> > > All,
> > > 
> > > pg version: 9.3.5
> > > RHEL 6.5
> > > 128GB/32 cores
> > > Configured with shared_buffers=16GB
> > > Java/Tomcat/JDBC application
> > > 
> > > Server has an issue that whenever we get lock waits (transaction lock
> > > waits, usually on an FK dependency) lasting over a minute or more than
> > > 10 at once, *all* queries on the server slow to a crawl, taking 100X to
> > > 400X normal execution times.
> > 
> > Current FK checking makes you wait if the referenced tuple is modified
> > on any indexed column, not just those that are actually used in
> > foreign keys.  Maybe this case would be sped up if we optimized that.
> 
> Even if it is a gin index that is being modified?  Seems like a harsh
> limitation to me.

Well, as I recall it's only unique indexes, so it's not *that* harsh.

Anyway, the fklocks patch was stupidly complex (and still got much stuff
wrong).  I didn't want to give objectors more ground by additionally
breaking the abstraction between heapam and the concept of columns
referenced by a foreign key constraint.  So it was discussed and
decided we'd leave that for future improvement.  Patches are welcome,
particularly if they come from the future.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Lock pileup causes server to stall

2014-11-10 Thread Alvaro Herrera
Josh Berkus wrote:
> All,
> 
> pg version: 9.3.5
> RHEL 6.5
> 128GB/32 cores
> Configured with shared_buffers=16GB
> Java/Tomcat/JDBC application
> 
> Server has an issue that whenever we get lock waits (transaction lock
> waits, usually on an FK dependency) lasting over a minute or more than
> 10 at once, *all* queries on the server slow to a crawl, taking 100X to
> 400X normal execution times.

Current FK checking makes you wait if the referenced tuple is modified
on any indexed column, not just those that are actually used in
foreign keys.  Maybe this case would be sped up if we optimized that.

> * This applies even to queries which are against other databases, so
> it's not purely a lock blocking issue.

Oh.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] 60 core performance with 9.3

2014-07-31 Thread Alvaro Herrera
Matt Clarkson wrote:

> The LWLOCK_STATS below suggest that ProcArrayLock might be the main
> source of locking that's causing throughput to take a dive as the client
> count increases beyond the core count.
> 
> Any thoughts or comments on these results are welcome!

Do these results change if you use Heikki's patch for CSN-based
snapshots?  See
http://www.postgresql.org/message-id/539ad153.9000...@vmware.com for the
patch (but note that you need to apply it on top of 89cf2d52030 in the
master branch -- it may apply to the 9.4 branch as well, but I didn't
try).

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] pg_repack solves alter table set tablespace lock

2014-01-27 Thread Alvaro Herrera
Ying He wrote:
> Thank you Josh. Won't double post again. Just thought the reorg mailing list is
> quite inactive.

Well, that tells you something about its maintenance state and what sort
of help you can expect if you find yourself in trouble with it.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Recommendations for partitioning?

2013-12-30 Thread Alvaro Herrera
Sergey Konoplev wrote:
> On Fri, Dec 20, 2013 at 7:59 AM, Alvaro Herrera
> alvhe...@2ndquadrant.com wrote:
> 
> > Eh.  Why can't you just do something like
> >
> > WITH moved AS (
> > DELETE FROM src WHERE ..
> > RETURNING *
> > ) INSERT INTO dst SELECT * FROM moved;
> 
> Alvaro, I think it could be cheaper to do it like it is shown below, correct?
> 
> psql dbname -c 'copy src to stdout' | \
> psql dbname -c 'copy dst from stdin; truncate src;'

Yes, if you can get rid of the old records by removing or emptying a
partition (or de-inheriting it, as suggested elsewhere in the thread),
that's better than DELETE because that way you don't create dead rows to
vacuum later.
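
A sketch of the de-inheriting variant (names hypothetical):

alter table measurements_2013_01 no inherit measurements;  -- detach partition
drop table measurements_2013_01;  -- or keep the detached table as an archive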

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Recommendations for partitioning?

2013-12-20 Thread Alvaro Herrera
Dave Johansen wrote:
> On Thu, Dec 19, 2013 at 10:27 AM, Scott Marlowe
> scott.marl...@gmail.com wrote:
> 
> > That's pretty much it. What I did was to create the new month table
> > and day tables, alter my triggers to reflect this, then move the data
> > with insert into / select from query for each old day partition. Then
> > once their data is moved you can just drop them. Since you changed the
> > triggers first those tables are no longer taking input so it's usually
> > safe to drop them now.
> 
> It would be nice if there was just a move command, but that seems like
> the type of model that we want and we'll probably move to that.

Eh.  Why can't you just do something like

WITH moved AS (
DELETE FROM src WHERE ..
RETURNING *
) INSERT INTO dst SELECT * FROM moved;

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Parallel Select query performance and shared buffers

2013-12-03 Thread Alvaro Herrera
Metin Doslu wrote:

> When we send concurrent Select queries to these tables, query performance
> doesn't scale out with the number of CPU cores. We find that complex Select
> queries scale out better than simpler ones. We also find that increasing
> the block size from 8 KB to 32 KB, or increasing shared_buffers to include
> the working set mitigates the problem to some extent.

Maybe you could help test this patch:
http://www.postgresql.org/message-id/20131115194725.gg5...@awork2.anarazel.de

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Efficiently query for the most recent record for a given user

2013-08-07 Thread Alvaro Herrera
Claudio Freire wrote:
> On Wed, Aug 7, 2013 at 3:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> 
> > Note that there's no particular need to specify desc in the index
> > definition.  This same index can support searches in either direction
> > on the called column.
> 
> Yeah, but it's faster if it's in the same direction, because the
> kernel read-ahead code detects sequential reads, whereas it doesn't
> when it goes backwards. The difference can be up to a factor of 10 for
> long index scans.

That might be true when an index is new, but as it grows, the leaf pages
are not going to be sequential anymore.  And this doesn't much apply for
an equality lookup anyway, does it?
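
For context, the pattern under discussion is roughly this sketch (names
hypothetical); the same btree can be walked in either direction, so desc in
the index definition is optional:

create index calls_user_called_idx on calls (user_id, called);

select *
from calls
where user_id = 42
order by called desc
limit 1;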

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-02 Thread Alvaro Herrera
Tom Lane wrote:
> Jeff Janes jeff.ja...@gmail.com writes:
> > On Thu, Aug 1, 2013 at 5:16 PM, Sergey Burladyan eshkin...@gmail.com
> > wrote:
> > > If I am not mistaken, there may be two code paths like this here:
> > > (1) mergejoinscansel -> scalarineqsel -> ineq_histogram_selectivity ->
> > > get_actual_variable_range -> index_getnext
> > > (2) scalargtsel -> scalarineqsel -> ineq_histogram_selectivity ->
> > > get_actual_variable_range -> index_getnext
> > 
> > Yeah, I think you are correct.
> 
> mergejoinscansel does *not* call scalarineqsel, nor get_actual_variable_range.
> It calls get_variable_range, which only looks at the pg_statistic entries.

Uh?  It's right there in line 2976 in HEAD.


-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-01 Thread Alvaro Herrera
Sergey Burladyan wrote:
> I also found this trace for another query:
> explain select * from xview.user_items_v v where ( v.item_id = 132358330 );
> 
> #0  0x7ff766967620 in read () from /lib/libc.so.6
> #1  0x7ff7689cfc25 in FileRead ()
> #2  0x7ff7689ea2f6 in mdread ()
> #3  0x7ff7689cc473 in ?? ()
> #4  0x7ff7689ccf54 in ReadBufferExtended ()
> #5  0x7ff7688050ca in index_fetch_heap ()
> #6  0x7ff76880523e in index_getnext ()
> #7  0x7ff768a63306 in ?? ()
> #8  0x7ff768a67624 in ?? ()
> #9  0x7ff768a67d9c in ?? ()
> #10 0x7ff768a688fc in scalargtsel ()

It'd be useful to see what's in frames 7-9, but this might be related to
get_actual_variable_range().  I don't see anything else nearby that
would try to read portions of the table.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] autovacuum fringe case?

2013-01-23 Thread Alvaro Herrera
AJ Weber wrote:

> On 1/23/2013 2:13 PM, Jeff Janes wrote:
> 
> > Scheduling a manual vacuum should be fine (but keep in mind that
> > vacuum has very different default cost_delay settings than autovacuum
> > does.  If the server is completely idle that shouldn't matter, but if
> > it is only mostly idle, you might want to throttle the IO a bit).  But
> > I certainly would not disable autovacuum without further evidence.  If
> > a table only needs to be vacuumed once a day and you preemptively do
> > it at 3 a.m., then autovac won't bother to do it itself during the day.
> > So there is no point, but much risk, in also turning autovac off.
> 
> If I set autovacuum_max_workers = 1, will that effectively
> single-thread it so I don't have two running at once?  Maybe that'll
> mitigate disk contention a little at least?

If you have a single worker, it will go three times as fast (the cost
limit is split across the default three workers).  If you want
to make the whole thing go slower (i.e. cause less impact on your I/O
system when running), crank up autovacuum_vacuum_cost_delay.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Query completed in 1s in PG 9.1 and ~ 700s in PG 9.2

2012-11-08 Thread Alvaro Herrera
Rodrigo Rosenfeld Rosas wrote:
> On 07-11-2012 22:58, Tom Lane wrote:
> > Rodrigo Rosenfeld Rosas rr.ro...@gmail.com writes:
> > > Ok, I could finally strip part of my database schema that will allow you
> > > to run the explain query and reproduce the issue.
> > > There is a simple SQL dump in plain format that you can restore both on
> > > 9.1 and 9.2 and an example EXPLAIN query so that you can see the
> > > difference between both versions.
> > > Please keep me up to date with regards to any progress. Let me know if
> > > the commit above fixed this issue.
> > AFAICT, HEAD and 9.2 branch tip plan this query a bit faster than 9.1
> > does.
> 
> Great! What is the estimate for the 9.2.2 release?

Hasn't been announced, but you can grab a snapshot right now from
ftp.postgresql.org if you want.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Replaying 48 WAL files takes 80 minutes

2012-10-29 Thread Alvaro Herrera
Albe Laurenz wrote:
> I am configuring streaming replication with hot standby
> with PostgreSQL 9.1.3 on RHEL 6 (kernel 2.6.32-220.el6.x86_64).
> PostgreSQL was compiled from source.
> 
> It works fine, except that starting the standby took for ever:
> it took the system more than 80 minutes to replay 48 WAL files
> and connect to the primary.
> 
> Can anybody think of an explanation why it takes that long?

Can you do a quick xlogdump of those files?  Maybe there is something
unusual (say particular types of GIN/GiST index updates) on the files
that take longer.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] autovacuum blocks the operations of other manual vacuum

2010-11-21 Thread Alvaro Herrera
Excerpts from kuopo's message of Sun Nov 21 11:15:52 -0300 2010:
> In my experiment, I need about 1~3 min to finish the analyze operation
> on the big table (which depends on the value of vacuum_cost_delay). I
> am not surprised because this table is a really big one (now, it has
> over 200M records).

Okay.  You may want to consider lowering the statistics size for all the
columns in that table; that would reduce analyze time, at the cost of
possibly worsening the plans for that table, depending on how irregular
the distribution is.  See ALTER TABLE / SET STATISTICS in the
documentation, and the default_statistics_target parameter in
postgresql.conf.
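
As a sketch (names and the value illustrative; the default sample size
comes from default_statistics_target):

alter table big_log_table alter column payload set statistics 10;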

> However, most of my concern is about the behavior of analyze/vacuum.
> You mentioned that the analyze-only operation cannot be optimized in
> the same way vacuum is optimized. Does that mean an analyze
> operation on one table would unavoidably affect a vacuum running on
> another one?

That's correct.  I think you can run VACUUM ANALYZE, and it would do
both things at once; AFAIK this is also optimized like VACUUM is, but I
admit I'm not 100% sure (and I can't check right now).

> If this is a normal reaction for an analyze operation,
> maybe I should try to lower vacuum_cost_delay or use more powerful
> hardware to minimize the interference period. Then the pages for the
> small table would not increase quickly.

I think it would make sense to have as low a cost_delay as possible for
this ANALYZE.  (Note you can change it locally with a SET command; no
need to touch postgresql.conf.  So you can change it when you analyze
just this large table).
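
For instance (table name illustrative; the setting reverts at session end):

set vacuum_cost_delay = 0;
analyze big_log_table;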

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] autovacuum blocks the operations of other manual vacuum

2010-11-19 Thread Alvaro Herrera
Excerpts from kuopo's message of Thu Nov 18 04:10:36 -0300 2010:
> Hi,
> 
> Thanks for your response. I've checked it again and found that the
> main cause is the execution of ANALYZE. As I have mentioned, I have
> two tables: table A is a big one (around 10M~100M records) for log
> data and table B is a small one (around 1k records) for keeping some
> current status. There are a lot of update operations and some search
> operations on table B. For the performance issue, I would like to
> keep table B as compact as possible. Following your suggestion, I try
> to invoke standard vacuum (not full) more frequently (e.g., once per
> min).
> 
> However, when I analyze table A, the autovacuum or vacuum on
> table B cannot find any removable row versions (the number of
> nonremovable row versions and pages keeps increasing). After the
> analysis finishes, the search operations on table B are still
> inefficient. If I call full vacuum right now, then I can have quick
> response times for the search operations on table B again.

Hmm, I don't think we can optimize the analyze-only operation the same
way we optimize vacuum (i.e. allow vacuum to proceed while it's in
progress).  Normally analyze shouldn't take all that long anyway -- why
is it that slow?  Are you calling it in a transaction that also does
other stuff?  Are you analyzing more than one table in a single
transaction, perhaps even the whole database?

Perhaps you could speed it up by lowering vacuum_cost_delay, if it's set
to a nonzero value.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] autovacuum blocks the operations of other manual vacuum

2010-11-16 Thread Alvaro Herrera
Excerpts from kuopo's message of Fri Nov 12 05:01:24 -0300 2010:
> Hi,
> 
> I have a question about the behavior of autovacuum. When I have a big
> table A which is being processed by autovacuum, I also manually use
> (full) vacuum to clean another table B. Then I found that I always got
> something like “found 0 removable, 14283 nonremovable row”. However,
> if I stop the autovacuum functionality and use vacuum on that big
> table A manually, I can clean table B (ex. found 22615 removable, 2049
> nonremovable row).
> 
> Is this correct? Why do vacuum and autovacuum have different actions?

Vacuum full does not assume that it can clean up tuples while other
transactions are running, and that includes the (non full, or lazy)
vacuum that autovacuum is running.  Autovacuum only runs lazy vacuum;
and that one is aware that other concurrent vacuums can be ignored.

Just don't use vacuum full unless strictly necessary.  It has other
drawbacks.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] BBU Cache vs. spindles

2010-10-29 Thread Alvaro Herrera
Excerpts from Greg Smith's message of Thu Oct 21 14:04:17 -0300 2010:
> 
> What I would like to do is beef up the documentation with some concrete
> examples of how to figure out if your cache and associated write path
> are working reliably or not.  It should be possible to include "does
> this handle full page writes correctly?" in that test suite.  Until we
> have something like that, I'm concerned that bugs in filesystem or
> controller handling may make full_page_writes unsafe even with a BBU,
> and we'd have no way for people to tell if that's true or not.

I think if you assume that there are bugs in the filesystem which you
need to protect against, you are already hosed.  I imagine there must be
some filesystem bug that makes it safe to have full_page_writes=on, but
unsafe to have full_page_writes=off; but I'd probably discard those as a
rare minority and thus not worth worrying about.

I agree it would be worth testing though.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] No hash join across partitioned tables?

2010-10-18 Thread Alvaro Herrera
Excerpts from Samuel Gendler's message of Mon Oct 18 03:13:01 -0300 2010:
> On Sat, Oct 16, 2010 at 8:29 AM, Alvaro Herrera
> alvhe...@commandprompt.com wrote:
> 
> > Excerpts from Samuel Gendler's message of Sat Oct 16 02:35:46 -0300 2010:
> >
> > > An issue with automatically analyzing the entire hierarchy is
> > > 'abstract' table definitions.  I've got a set of tables for
> > > storing the same data at different granularities of aggregation.
> > > Within each granularity, I've got partitions, but because the set
> > > of columns is identical for each granularity, I've got an abstract
> > > table definition that is inherited by everything.  I don't need or
> > > want statistics kept on that table because I never query across
> > > the abstract table, only the parent table of each aggregation
> > > granularity.
> >
> > Hmm, I think you'd be better served by using LIKE instead of regular
> > inheritance.
> 
> Yep.  I inherited the architecture, though, and changing it hasn't been a
> high priority.

I understand that; my point is merely that maybe we shouldn't work
through many hoops to solve this particular facet of the problem,
because it seems to be pilot error.  (If you really needed to avoid the
extra I/O that would be caused by unnecessary analyzes, you could turn
autovac off for the abstract tables).

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] No hash join across partitioned tables?

2010-10-16 Thread Alvaro Herrera
Excerpts from Samuel Gendler's message of Sat Oct 16 02:35:46 -0300 2010:
> 
> An issue with automatically analyzing the entire hierarchy is 'abstract'
> table definitions.  I've got a set of tables for storing the same data at
> different granularities of aggregation.  Within each granularity, I've got
> partitions, but because the set of columns is identical for each
> granularity, I've got an abstract table definition that is inherited by
> everything.  I don't need or want statistics kept on that table because I
> never query across the abstract table, only the parent table of each
> aggregation granularity.

Hmm, I think you'd be better served by using LIKE instead of regular
inheritance.
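
A sketch of the LIKE approach (names hypothetical): each concrete table
copies the column definitions without joining the inheritance tree, so the
abstract definition stays out of cross-hierarchy scans and statistics:

create table stats_5min (like stats_template including defaults including indexes);
create table stats_hourly (like stats_template including defaults including indexes);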

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] No hash join across partitioned tables?

2010-10-15 Thread Alvaro Herrera
Excerpts from Robert Haas's message of Wed Jun 09 15:47:55 -0400 2010:
> 
> In going back through emails I had marked as possibly needing another
> look before 9.0 is released, I came across this issue again.  As I
> understand it, analyze (or analyse) now collects statistics for both
> the parent individually, and for the parent and its children together.
> However, as I further understand it, autovacuum won't actually fire
> off an analyze unless there's enough activity on the parent table
> considered individually to warrant it.  So if you have an empty parent
> and a bunch of children with data in it, your stats will still stink,
> unless you analyze by hand.

So, is there something we could now do about this, while there's still
time before 9.1?

I haven't followed this issue very closely, but it seems to me that what
we want is that we want an ANALYZE in a child table to be mutated into
an analyze of its parent table, if the conditions are right; and that an
ANALYZE of a parent removes the child tables from being analyzed on the
same run.

If we analyze the parent, do we also update the children stats, or is it
just that we keep two stats for the parent, one with children and one
without, both being updated when the parent is analyzed?

If the latter's the case, maybe we should modify ANALYZE a bit more, so
that we can analyze the whole hierarchy in one go, and store the lot of
stats with a single pass (each child alone, the parent alone, the parent
plus children).  However, it's not really clear how this would work with
multiple inheritance levels.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] Where does data in pg_stat_user_tables come from?

2010-09-13 Thread Alvaro Herrera
Excerpts from Josh Berkus's message of Mon Sep 13 20:53:51 -0400 2010:
> 
> > If you select from pg_stat_user_tables, the counters should be
> > reasonably close unless your default_statistics_target is way off and
> > then pg_class.reltuples would be wrong.
> 
> At least in 8.3, running ANALYZE does not update pg_stat_user_tables in
> any way.  Does it in later versions?

It's been pure nonsense in this thread.  Please show an example of
what's not working.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] pgbench could not send data to client: Broken pipe

2010-09-09 Thread Alvaro Herrera
Excerpts from David Kerr's message of Wed Sep 08 18:29:59 -0400 2010:
> 
> Thanks for the insight. We're currently in performance testing of the
> app. Currently, the JVM is the bottleneck; once we get past that
> I'm sure it will be the database, at which point I'll have the kind
> of data you're talking about.

Hopefully you're not running the JVM stuff in the same machine.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] Inefficient query plan

2010-08-23 Thread Alvaro Herrera
Excerpts from Jann Röder's message of Mon Aug 23 00:23:38 -0400 2010:
> 
> Hash Join  (cost=516.66..17710110.47 rows=8358225 width=16)
>   Hash Cond: ((b.itemid)::bpchar = a.itemid)
>   ->  Seq Scan on b  (cost=0.00..15110856.68 rows=670707968 width=16)
>   ->  Hash  (cost=504.12..504.12 rows=1003 width=16)
>         ->  Index Scan using idx_issueid on a  (cost=0.00..504.12 rows=1003 width=16)
>               Index Cond: (issueid = 'A1983PW823'::bpchar)

Hmm, I'm placing bets on the bpchar weirdness.  I'd try getting rid of
that and using plain varchar for all the columns.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] PARSE WAITING

2010-08-23 Thread Alvaro Herrera
Excerpts from David Kerr's message of Mon Aug 23 18:15:56 -0400 2010:
> Howdy all,
> 
> We're doing some performance testing, and when we scaled our app up to
> about 250 concurrent users we started seeing a bunch of processes
> sitting in PARSE WAITING state.
> 
> Can anyone give me insight on what this means? What's the parse waiting for?

It means the parse phase is waiting for a lock.  You can see exactly
what it's waiting for by looking at pg_locks WHERE NOT GRANTED.

Have you got lots of partitions, or something?
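
A sketch of that lookup:

select pid, locktype, relation::regclass, mode
from pg_locks
where not granted;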

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] PARSE WAITING

2010-08-23 Thread Alvaro Herrera
Excerpts from David Kerr's message of Mon Aug 23 18:47:02 -0400 2010:
> 
> unlink("base/pgsql_tmp/pgsql_tmp28335.12593") = 0
> unlink("base/pgsql_tmp/pgsql_tmp28335.6041") = 0
> unlink("base/pgsql_tmp/pgsql_tmp28335.3030") = 0
> unlink("base/pgsql_tmp/pgsql_tmp28335.14737") = 0
> 
> which isn't the fastest operation.. just for my info, can anyone tell me what
> pgsql_tmp is, and why the engine is whacking each file individually?

These are temp files, which you can remove without concern if the server
is down.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] Quesion on the use of indexes

2010-08-17 Thread Alvaro Herrera
Excerpts from Tom Lane's message of Mon Aug 16 23:33:29 -0400 2010:
> Benjamin Krajmalnik k...@servoyant.com writes:
> > A little background - I have various multi-column indexes whenever I
> > have queries which restrict the output based on the values of the 2
> > fields (for example, a client code and the date of a transaction).
> >
> > Is there a performance gain using this approach as opposed to using 2
> > separate indexes, one on the first column and one on the second column?
> 
> Maybe, maybe not ... it's going to depend on a bunch of factors, one of
> which is what your update load is like compared to the queries that read
> the indexes.  There's a bit of coverage of this in the fine manual: see
> http://www.postgresql.org/docs/8.4/static/indexes-multicolumn.html
> and the next few pages.

Another important factor is how selective is each clause in isolation
compared to how selective they are together.  We have found that doing
BitmapAnd of two bitmap-scanned indexes is sometimes much too slow
compared to a two-column index.  (I have yet to see a case where indexes
beyond two columns are useful; at this point, combined bitmap indexscans
are enough.)
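
As a sketch, the two alternatives being compared (names hypothetical):

-- one composite index:
create index tx_client_date_idx on transactions (client_code, tx_date);

-- versus two single-column indexes, which the planner can combine
-- with a BitmapAnd when both clauses are present:
create index tx_client_idx on transactions (client_code);
create index tx_date_idx on transactions (tx_date);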

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] Sorted group by

2010-08-11 Thread Alvaro Herrera
Excerpts from Matthew Wakeling's message of Tue Aug 10 11:40:16 -0400 2010:
> 
> I am trying to retrieve, for many sets of rows grouped on a couple of
> fields, the value of an ungrouped field where the row has the highest
> value in another ungrouped field.

I think this does what you want (schema is from the tenk1 table in the
regression database):

select string4 as group,
   (array_agg(stringu1 order by unique1 desc))[1] as value
from tenk1
group by 1 ;

Please let me know how it performs with your data.  The plan is rather simple:

regression=# explain analyze select string4 as group,
  (array_agg(stringu1 order by unique1 desc))[1] as value from tenk1 group by 1 ;
                          QUERY PLAN
──────────────────────────────────────────────────────────────────────
 GroupAggregate  (cost=0.00..1685.16 rows=4 width=132) (actual time=22.825..88.922 rows=4 loops=1)
   ->  Index Scan using ts4 on tenk1  (cost=0.00..1635.11 rows=1 width=132) (actual time=0.135..33.188 rows=1 loops=1)
 Total runtime: 89.348 ms
(3 rows)


-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-07-27 Thread Alvaro Herrera
Excerpts from Tom Lane's message of Tue Jul 27 20:05:02 -0400 2010:
> Peter Hussey pe...@labkey.com writes:
> 
> > 2) How is work_mem used by a query execution?
> 
> Well, the issue you're hitting is that the executor is dividing the
> query into batches to keep the size of the in-memory hash table below
> work_mem.  The planner should expect that and estimate the cost of
> the hash technique appropriately, but seemingly it's failing to do so.
> Since you didn't provide EXPLAIN ANALYZE output, though, it's hard
> to be sure.

Hmm, I wasn't aware that hash joins worked this way wrt work_mem.  Is
this visible in the explain output?  If it's something subtle (like an
increased total cost), may I suggest that it'd be a good idea to make it
explicit somehow in the machine-readable outputs?



Re: [PERFORM] Identical query slower on 8.4 vs 8.3

2010-07-15 Thread Alvaro Herrera
Excerpts from Patrick Donlin's message of Thu Jul 15 11:12:53 -0400 2010:
> I'll read over that wiki entry, but for now here is the EXPLAIN ANALYZE
> output, assuming I did it correctly. I have run vacuumdb --full --analyze; it
> actually runs as a nightly cron job.

These plans seem identical (though the fact that the leading whitespace
was trimmed means it's untrustworthy -- please in the future send them
as text attachments instead so that your mailer doesn't interfere with
formatting).  The 8.4 plan is even a full second faster, according to
the total runtime line.

The slowness could've been caused by caching effects ...



Re: [PERFORM] Highly Efficient Custom Sorting

2010-07-03 Thread Alvaro Herrera
Excerpts from Merlin Moncure's message of Sat Jul 03 18:53:46 -0400 2010:
> 
> What about my suggestion doesn't work for your requirements?  (btw,
> let me disagree with my peers and state pl/perl is lousy for this type
> of job; only sql/plpgsql can interact with postgresql variables
> natively for the most part).

IIRC the other reason pl/perl sucks for this kind of thing is that it
forces a subtransaction to be created before the function call, which is
expensive.  (I might be misremembering and what actually causes a
subtransaction is a SPI call inside a PL/Perl function, which wouldn't
apply here.)



Re: [PERFORM] cpu bound postgresql setup.

2010-06-24 Thread Alvaro Herrera
Excerpts from Rajesh Kumar Mallah's message of Thu Jun 24 13:25:32 -0400 2010:
> 
> What prompted me to post to the list is that the server transitioned from
> being IO bound to CPU bound, with 90% of syscalls being
> lseek(XXX, 0, SEEK_END) = YYY

It could be useful to find out what file is being seeked.  Correlate the
XXX with files in /proc/<pid>/fd (at least on Linux) to find out more.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] requested shared memory size overflows size_t

2010-06-24 Thread Alvaro Herrera
Excerpts from Craig James's message of Thu Jun 24 19:03:00 -0400 2010:
> 
> select relname, pg_relation_size(relname) from pg_class
>  where pg_get_userbyid(relowner) = 'emol_warehouse_1'
>  and relname not like 'pg_%'
>  order by pg_relation_size(relname) desc;
> ERROR:  relation "rownum_temp" does not exist
> 
> emol_warehouse_1=> select relname from pg_class where relname = 'rownum_temp';
>    relname
> -------------
>  rownum_temp
> (1 row)

What's the full row?  I'd just add a WHERE relkind = 'r' to the above
query anyway.
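
Putting both fixes together, a sketch of the corrected query:

select c.oid::regclass, pg_relation_size(c.oid)
from pg_class c
where pg_get_userbyid(c.relowner) = 'emol_warehouse_1'
  and c.relkind = 'r'
order by pg_relation_size(c.oid) desc;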

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] System tables screwed up? (WAS requested shared memory size overflows size_t)

2010-06-24 Thread Alvaro Herrera
Excerpts from Craig James's message of Thu Jun 24 19:24:44 -0400 2010:
> On 6/24/10 4:19 PM, Alvaro Herrera wrote:
> > Excerpts from Craig James's message of Thu Jun 24 19:03:00 -0400 2010:
> >
> > > select relname, pg_relation_size(relname) from pg_class
> > >  where pg_get_userbyid(relowner) = 'emol_warehouse_1'
> > >  and relname not like 'pg_%'
> > >  order by pg_relation_size(relname) desc;
> > > ERROR:  relation "rownum_temp" does not exist
> > >
> > > emol_warehouse_1=> select relname from pg_class where relname = 'rownum_temp';
> > >    relname
> > > -------------
> > >  rownum_temp
> > > (1 row)
> >
> > What's the full row?  I'd just add a WHERE relkind = 'r' to the above
> > query anyway.
> 
> Thanks, in fact that works.  But my concern is that these are system tables
> and system functions and yet they seem to be confused.  I've used this query
> dozens of times and never seen this behavior before.  It makes me really
> nervous...

I think you're being bitten by lack of schema qualification.  Perhaps
you ought to pass pg_class.oid to pg_relation_size instead of relname.
What did you do to make pg_relation_size work on type name?

Why is this a -performance question anyway?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] slow index lookup

2010-06-22 Thread Alvaro Herrera
Excerpts from Anj Adu's message of Tue Jun 22 17:44:39 -0400 2010:
> This query seems unreasonably slow on a well-indexed table (13 million
> rows). Separate indexes are present on guardid_id, from_num and
> targetprt columns.

Maybe you need to vacuum or reindex?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] Aggressive autovacuuming ?

2010-06-21 Thread Alvaro Herrera
Excerpts from Scott Marlowe's message of Sun Jun 20 16:13:15 -0400 2010:
> On Sun, Jun 20, 2010 at 11:44 AM, Jesper Krogh jes...@krogh.cc wrote:
> > Hi.
> >
> > I have been wondering if anyone has been experimenting with really
> > aggressive autovacuuming.
> 
> I have been using moderately aggressive autovac, with 6 or more
> threads running with 1ms sleep, then keeping track of them to see if
> they're being too aggressive.  Basically as long as io utilization
> doesn't hit 100% it doesn't seem to have any negative or even
> noticeable effect.

Keep in mind that autovacuum scales down the cost limit the more workers
there are.  So if you have 10ms sleeps and 1 worker, it should use
roughly the same amount of I/O as 10ms sleeps and 10 workers (each
worker would sleep 10 times more frequently).

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-16 Thread Alvaro Herrera
Excerpts from jgard...@jonathangardner.net's message of Wed Jun 16 02:30:30 -0400 2010:
> 
> NOTE: If I do one giant commit instead of lots of littler ones, I get
> much better speeds for the slower cases, but I never exceed 5,500
> which appears to be some kind of wall I can't break through.
> 
> If there's anything else I should tinker with, I'm all ears.

increase wal_buffers?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] requested shared memory size overflows size_t

2010-06-16 Thread Alvaro Herrera
Excerpts from Tom Lane's message of Mon Jun 14 23:57:11 -0400 2010:
> Scott Carey sc...@richrelevance.com writes:
> > Great points.  There is one other option that is decent for the WAL:
> > If splitting out a volume is not acceptable for the OS and WAL --
> > absolutely split those two out into their own partitions.  It is most
> > important to make sure that WAL and data are not on the same filesystem,
> > especially if ext3 is involved.
> 
> Uh, no, WAL really needs to be on its own *spindle*.  The whole point
> here is to have one disk head sitting on the WAL and not doing anything
> else except writing to that file.

However, there's another point here -- probably what Scott is on about:
on Linux (at least ext3), an fsync of any file is not limited to
flushing that file's blocks -- it flushes *ALL* blocks on *ALL* files in
the filesystem.  This is particularly problematic if you have pgsql_tmp
in the same filesystem and do lots of disk-based sorts.

So if you have it in the same spindle but on a different filesystem, at
least you'll avoid that extra fsync work, even if you have to live with
the extra seeking.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Large (almost 50%!) performance drop after upgrading to 8.4.4?

2010-06-10 Thread Alvaro Herrera
Excerpts from Tom Lane's message of Thu Jun 10 11:46:25 -0400 2010:

 Yes, the folks at commandprompt need to be told about this.  Loudly.
 It's a serious packaging error.

Just notified Lacey, the packager (not so loudly, though); she's working
on new packages, and apologizes for the inconvenience.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-06-01 Thread Alvaro Herrera
Excerpts from Matthew Wakeling's message of Tue Jun 01 05:55:35 -0400 2010:
 On Sun, 23 May 2010, David Jarvis wrote:
  The measurement table indexes (on date and weather station) were not being
  used because the only given date ranges (e.g., 1900 - 2009) were causing the
  planner to do a full table scan, which is correct.
 
 I wonder if you might see some benefit from CLUSTERing the tables on the 
 index.

Eh, isn't this a GIN or GiST index?  I don't think you can cluster on
those, can you?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-06-01 Thread Alvaro Herrera
Excerpts from David Jarvis's message of Tue Jun 01 14:01:22 -0400 2010:
 Sorry, Alvaro.
 
 I was contemplating using a GIN or GiST index as a way of optimizing the
 query.

My fault -- I didn't read the whole thread.

 Instead, I found that re-inserting the data in order of station ID (the
 primary look-up column) and then CLUSTER'ing on the station ID, taken date,
 and category index increased the speed by an order of magnitude.

Hmm, that's nice, though I cannot but wonder whether the exclusive lock
required by CLUSTER is going to be a problem in the long run.

 I might be able to drop the station/taken/category index in favour of the
 simple station index and CLUSTER on that, instead (saving plenty of disk
 space). Either way, it's fast right now so I'm not keen to try and make it
 much faster.

Hm, keep in mind that if the station clause alone is not selective
enough, scanning it may be too expensive.  The current three-column
index is probably a lot faster to search (though of course it costs more
work to keep up to date on insertions).
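
A sketch of the two alternatives being weighed here (the table and index
names are invented, since the real DDL wasn't posted; CLUSTER ... USING is
the 8.3+ syntax):

CREATE INDEX m_station_taken_cat_idx
    ON measurement (station_id, taken, category_id);   -- fast to search

-- versus the slimmer index, leaning on CLUSTER for locality:
CREATE INDEX m_station_idx ON measurement (station_id);
CLUSTER measurement USING m_station_idx;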

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] autovacuum strategy / parameters

2010-04-30 Thread Alvaro Herrera
Josh Berkus wrote:

 #autovacuum_vacuum_scale_factor = 0.2
 
 This is set because in my experience, 20% bloat is about the level at
 which bloat starts affecting performance; thus, we want to vacuum at
 that level but not sooner.  This does mean that very large tables which
 never have more than 10% updates/deletes don't get vacuumed at all until
 freeze_age; this is a *good thing*. VACUUM on large tables is expensive;
 you don't *want* to vacuum a billion-row table which has only 100,000
 updates.

Hmm, now that we have partial vacuum, perhaps we should revisit this.


 It would be worth doing a DBT2/DBT5 test run with different autovac
 settings post-8.4 so see if we should specifically change the vacuum
 threshold.

Right.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] autovacuum strategy / parameters

2010-04-26 Thread Alvaro Herrera
Rick wrote:

 So, in a large table, the scale_factor is the dominant term. In a
 small
 table, the threshold is the dominant term. But both are taken into
 account.

Correct.

 The default values are set for small tables; it is not being run for
 large tables.

So decrease the scale factor and leave threshold alone.
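
The trigger point under discussion is roughly

    threshold + scale_factor * reltuples

so with the defaults (50 + 0.2 * reltuples) a 100-million-row table waits
for some 20 million dead rows.  From 8.4 on, the override can be made per
table; the name and value here are only an example:

ALTER TABLE big_table SET (autovacuum_vacuum_scale_factor = 0.02);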

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Autovaccum with cost_delay does not complete on one solaris 5.10 machine

2010-04-16 Thread Alvaro Herrera
Josh Berkus wrote:
 Tom,
 
 Neither database has any per-table autovacuum settings.
 
 However, since this is a production database, I had to try
 something, and set vacuum_cost_limit up to 1000.  The issue with
 vacuuming one page at a time went away, or at least I have not seen
 it repeat in the last 16 hours.

How many autovac workers are there?

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Autovaccum with cost_delay does not complete on one solaris 5.10 machine

2010-04-15 Thread Alvaro Herrera
Josh Berkus wrote:

 Basically, vacuuming of a table which normally takes about 20 minutes
 interactively with vacuum_cost_delay set to 20 had not completed after
 14 hours.  When I trussed it, I saw activity which indicated to me that
 autovacuum was doing a pollsys, presumably for cost_limit, every data page.
 
 Autovacuum was running with vacuum_cost_limit = 200 and
 autovacuum_vacuum_cost_delay = 20, which I believe is the default for 8.3.
 
 Truss output:
 
 pollsys(0xFD7FFFDF83E0, 0, 0xFD7FFFDF8470, 0x) = 0

So what is it polling?  Please try truss -v pollsys; is there a way in
Solaris to report what each file descriptor is pointing to?  (In Linux
I'd look at /proc/pid/fd)

We don't call pollsys anywhere.  Something in Solaris must be doing it
under the hood.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Autovaccum with cost_delay does not complete on one solaris 5.10 machine

2010-04-15 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
  We don't call pollsys anywhere.  Something in Solaris must be doing it
  under the hood.
 
 pg_usleep calls select(), and some googling indicates that select() is
 implemented as pollsys() on recent Solaris versions.  So Josh's
 assumption that those are delay calls seems plausible.  But it shouldn't
 be sleeping after each page with normal cost_delay parameters, should it?

Certainly not ... The only explanation would be that the cost balance
gets over the limit very frequently.  So one of the params would have to
be abnormally high (vacuum_cost_page_hit, vacuum_cost_page_miss,
vacuum_cost_page_dirty).
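
Rough arithmetic for why that would be abnormal: with vacuum_cost_limit =
200 and the default page costs (hit = 1, miss = 10, dirty = 20), a sleep
only comes due every ~200 buffer hits, ~20 misses, or ~10 dirtied pages.
Sleeping after every single page implies a per-page cost near the whole
limit.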

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] stats collector suddenly causing lots of IO

2010-04-13 Thread Alvaro Herrera
Chris wrote:
 I have a lot of CentOS servers which are running Postgres.  Postgres isn't
 used that heavily on any of them, but lately, the stats collector process
 keeps causing tons of I/O load.  It seems to happen only on servers with
 CentOS 5.

Does this correlate to an increase in size of the pgstat.stat file?
Maybe you could try resetting stats, so that the file goes back to an
initial size and is slowly repopulated.  I'd suggest monitoring the size
of the stats file, just in case there's something abnormal with it.
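
A minimal sketch of the reset (the function exists in the 8.x line and
clears the counters for the current database only):

SELECT pg_stat_reset();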

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Database size growing over time and leads to performance impact

2010-03-31 Thread Alvaro Herrera
Scott Carey wrote:
 
 On Mar 27, 2010, at 6:35 AM, Andy Colson wrote:
  
  Dont VACUUM FULL, its not helping you, and is being removed in newer 
  versions.
  
 
 Off topic:  How is that going to work?  CLUSTER doesn't work on tables
 without an index.  I would love to be able to CLUSTER on some column
 set that doesn't necessarily have an index.

VACUUM FULL has been rewritten in 9.0 so that it uses the CLUSTER logic,
except that it doesn't require an index.

If you want to do it in earlier versions, you can use a no-op SET TYPE
command, like so:

ALTER TABLE foo ALTER COLUMN bar SET TYPE baz;

assuming that table foo has a column bar which is already of type baz.
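
One way to confirm the rewrite actually reclaimed space, reusing the
placeholder names above:

SELECT pg_size_pretty(pg_total_relation_size('foo'));  -- before
ALTER TABLE foo ALTER COLUMN bar SET TYPE baz;          -- forces the rewrite
SELECT pg_size_pretty(pg_total_relation_size('foo'));  -- after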

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Building multiple indexes concurrently

2010-03-17 Thread Alvaro Herrera
Andres Freund wrote:

 I find it much easier to believe such issues exist on tables in
 contrast to indexes. The likelihood of getting sequential accesses on an
 index is small enough on a big table to make it unlikely to matter much.

Vacuum walks indexes sequentially, for one.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] shared_buffers advice

2010-03-16 Thread Alvaro Herrera
Greg Stark wrote:
 On Tue, Mar 16, 2010 at 2:30 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Pierre C li...@peufeu.com writes:
  Does PG issue checkpoint writes in sorted order ?
 
  No.  IIRC, a patch for that was submitted, and rejected because no
  significant performance improvement could be demonstrated.  We don't
  have enough information about the actual on-disk layout to be very
  intelligent about this, so it's better to just issue the writes and
  let the OS sort them.
 
 Keep in mind that postgres is issuing writes to the OS buffer cache.
 It defers fsyncing the files as late as it can in the hopes that most
 of those buffers will be written out by the OS before then. That gives
 the OS a long time window in which to flush them out in whatever order
 and whatever schedule is most convenient.

Maybe it would make more sense to try to reorder the fsync calls
instead.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] shared_buffers advice

2010-03-16 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
  Maybe it would make more sense to try to reorder the fsync calls
  instead.
 
 Reorder to what, though?  You still have the problem that we don't know
 much about the physical layout on-disk.

Well, to block numbers as a first step.

However, this reminds me that sometimes we take the block-at-a-time
extension policy too seriously.  We had a customer that had a
performance problem because they were inserting lots of data into TOAST
tables, causing very frequent extensions.  I kept wondering whether an
allocation policy that allocated several new blocks at a time could be
useful (but I didn't try it).  This would also alleviate fragmentation,
thus helping the physical layout be more similar to logical block
numbers.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] shared_buffers advice

2010-03-16 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
  Tom Lane wrote:
  Reorder to what, though?  You still have the problem that we don't know
  much about the physical layout on-disk.
 
  Well, to block numbers as a first step.
 
 fsync is a file-based operation, and we know exactly zip about the
 relative positions of different files on the disk.

Doh, right, I was thinking of the sync_file_range() kind of API.


  We had a customer that had a
  performance problem because they were inserting lots of data to TOAST
  tables, causing very frequent extensions.  I kept wondering whether an
  allocation policy that allocated several new blocks at a time could be
  useful (but I didn't try it).  This would also alleviate fragmentation,
  thus helping the physical layout be more similar to logical block
  numbers.
 
 That's not going to do anything towards reducing the actual I/O volume.
 Although I suppose it might be useful if it just cuts the number of
 seeks.

Oh, they had no problems with I/O volume.  It was relation extension
lock that was heavily contended for them.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] shared_buffers advice

2010-03-16 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
  Tom Lane wrote:
  That's not going to do anything towards reducing the actual I/O volume.
  Although I suppose it might be useful if it just cuts the number of
  seeks.
 
  Oh, they had no problems with I/O volume.  It was relation extension
  lock that was heavily contended for them.
 
 Really?  I guess that serialized all the I/O ... I'll bet if we got rid
 of that locking somehow, they *would* have a problem with I/O volume.

Well, that would solve the problem as far as I'm concerned and they'd
have to start talking to their storage provider ;-)

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Internal operations when the planner makes a hash join.

2010-02-23 Thread Alvaro Herrera
negora wrote:

 According to how I understood the process, the engine would get the
 name from the student with ID 1 and would look for the name of the
 father with ID 1 in the hashed table. It'd do exactly the same with the
 student #2 and father #2. But my big doubt is about the 3rd one
 (Anthony). Would the engine know that it already had retrieved the
 father's name for the student 1 and would avoid searching for it into
 the hashed table (using some kind of internal mechanism which allows to
 re-utilize the name)? Or would it search the hashed table again?

The hash table is searched again.  But that's fast, because it's a hash
table.
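
The plan shape makes this concrete.  A sketch using this thread's example
tables (names assumed, and the exact plan will of course vary):

EXPLAIN SELECT s.name, f.name
  FROM student s JOIN father f ON f.id = s.father_id;
--  Hash Join
--    Hash Cond: (s.father_id = f.id)
--    ->  Seq Scan on student s       -- outer side: probes the hash per row
--    ->  Hash
--          ->  Seq Scan on father f  -- inner side: hashed once up front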

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] AutoVacuum_NapTime

2010-02-22 Thread Alvaro Herrera
George Sexton wrote:

 If I'm cold starting the system, would it vacuum all 330 databases and then
 wait 720 minutes and then do them all again, or would it distribute the
 databases more or less evenly over the time period?

The latter.
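
Worked out with the poster's numbers: the launcher aims to visit each
database once per autovacuum_naptime, so 720 minutes spread over 330
databases means starting a worker roughly every 720/330, or about 2.2
minutes.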

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] moving pg_xlog -- yeah, it's worth it!

2010-02-12 Thread Alvaro Herrera
Kevin Grittner wrote:
 Hannu Krosing  wrote:
  
  Can it be, that each request does at least 1 write op (update
  session or log something) ?
  
 Well, the web application connects through a login which only has
 SELECT rights; but, in discussing a previous issue we've pretty well
 established that it's not unusual for a read to force a dirty buffer
 to write to the OS.  Perhaps this is the issue here again.  Nothing
 is logged on the database server for every request.

I don't think it explains it, because dirty buffers are obviously
written to the data area, not pg_xlog.

 I wonder if it might also pay to make the background writer even more
 aggressive than we have, so that SELECT-only queries don't spend so
 much time writing pages.

That's worth trying.

 Anyway, given that these are replication
 targets, and aren't the database of origin for any data of their
 own, I guess there's no reason not to try asynchronous commit. 

Yeah; since the transactions only ever write commit records to WAL, it
wouldn't matter a bit that they are lost on crash.  And you should see
an improvement, because they wouldn't have to flush at all.
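
A minimal sketch of turning it on (the setting exists as of 8.3 and can be
changed per session or even per transaction):

SET synchronous_commit = off;   -- or set it globally in postgresql.conf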

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] moving pg_xlog -- yeah, it's worth it!

2010-02-12 Thread Alvaro Herrera
Alvaro Herrera wrote:
 Kevin Grittner wrote:

  Anyway, given that these are replication
  targets, and aren't the database of origin for any data of their
  own, I guess there's no reason not to try asynchronous commit. 
 
 Yeah; since the transactions only ever write commit records to WAL, it
 wouldn't matter a bit that they are lost on crash.  And you should see
 an improvement, because they wouldn't have to flush at all.

Actually, a transaction that performed no writes doesn't get a commit
WAL record written, so it shouldn't make any difference at all.
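
This is observable with the WAL-position function of the era (renamed
pg_current_wal_lsn() in PostgreSQL 10); some_table is a placeholder, and
the position can still move if the reads happen to trigger page pruning:

SELECT pg_current_xlog_location();
BEGIN;
SELECT count(*) FROM some_table;   -- read-only work
COMMIT;
SELECT pg_current_xlog_location(); -- typically unchanged: no commit record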

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] moving pg_xlog -- yeah, it's worth it!

2010-02-12 Thread Alvaro Herrera
Kevin Grittner wrote:
 Alvaro Herrera alvhe...@commandprompt.com wrote:

  Actually, a transaction that performed no writes doesn't get a
  commit WAL record written, so it shouldn't make any difference at
  all.
  
 Well, concurrent to the web application is the replication.  Would
 asynchronous commit of that potentially alter the pattern of writes
 such that it had less impact on the reads?

Well, certainly async commit would completely change the pattern of
writes: it would give the controller an opportunity to reorder them
according to some scheduler.  Otherwise they are strictly serialized.

 I'm thinking, again, of
 why the placement of the pg_xlog on a separate file system made such
 a dramatic difference to the read-only response time -- might it
 make less difference if the replication was using asynchronous
 commit?

Yeah, I think it would have been less noticeable, but this is all
theoretical.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] moving pg_xlog -- yeah, it's worth it!

2010-02-11 Thread Alvaro Herrera
Kevin Grittner wrote:

 Another example of why I shouldn't trust my memory.  Per the
 hardware tech:
  
  
 OS:  /dev/sda   is RAID1  -  2  x  2.5" 15k SAS disk
 pg_xlog: /dev/sdb   is RAID1  -  2  x  2.5" 15k SAS disk
  
 These reside on a ServeRAID-MR10k controller with 256MB BB cache.
  
  
 data:/dev/sdc   is RAID5  -  30 x 3.5" 15k SAS disk
  
 These reside on the DS3200 disk subsystem with 512MB BB cache per
 controller and redundant drive loops.

Hmm, so maybe the performance benefit is not from it being on a separate
array, but from it being RAID1 instead of RAID5?

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Re: [HACKERS] Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2010-02-07 Thread Alvaro Herrera
Andres Freund wrote:

 I personally think the fsync on the directory should be added to the stable 
 branches - other opinions?
 If wanted I can prepare patches for that.

Yeah, it seems there are two patches here -- one is the addition of
fsync_fname() and the other is the fsync_prepare stuff.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] bigint integers up to 19 digits.

2010-02-04 Thread Alvaro Herrera
Tory M Blue wrote:

 I looked into the numeric data type, but the docs say that it can be slow.

It is slower than values that fit in a single CPU register, sure.  Is it
slow enough that you can't use it?  That's a different question.  I'd
give it a try -- maybe it's not all that slow.
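
A quick, unscientific way to try it (all names and sizes here are
illustrative):

CREATE TABLE t_big (v bigint);
CREATE TABLE t_num (v numeric(19,0));
INSERT INTO t_big SELECT generate_series(1, 1000000);
INSERT INTO t_num SELECT generate_series(1, 1000000);
\timing on
SELECT sum(v) FROM t_big;
SELECT sum(v) FROM t_num;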

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] bigint integers up to 19 digits.

2010-02-04 Thread Alvaro Herrera
Jochen Erwied wrote:

 Maybe using 'numeric(19)' instead of bigint is an alternative. I actually
 don't know how these numbers are stored internally (some kind of BCD, or as
 base-100?), but IMHO they should be faster than strings, although not as
 fast as 'native' types.

base 10000 in the current implementation
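
That is, each internal digit holds four decimal digits: 1234567 is stored
as the base-10000 digits [123, 4567], one 16-bit word apiece.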

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] New server to improve performance on our large and busy DB - advice?

2010-01-21 Thread Alvaro Herrera
Scott Marlowe wrote:
 On Thu, Jan 14, 2010 at 12:17 PM, Carlo Stonebanks
 stonec.regis...@sympatico.ca wrote:

  4) Is this the right PG version for our needs?
 
 8.3 is very stable.  Update to the latest.  8.4 seems good, but I've
 had, and still am having, problems with it crashing in production.
 Not often, maybe once every couple of months, but just enough that I'm
 not ready to try and use it there yet.  And I can't force the same
 failure in testing, at least not yet.

uh.  Is there a report of the crash somewhere with details, say stack
traces and such?

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] New server to improve performance on our large and busy DB - advice?

2010-01-21 Thread Alvaro Herrera
Scott Marlowe wrote:
 On Thu, Jan 21, 2010 at 8:51 AM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  Scott Marlowe wrote:
  On Thu, Jan 14, 2010 at 12:17 PM, Carlo Stonebanks
  stonec.regis...@sympatico.ca wrote:
 
   4) Is this the right PG version for our needs?
 
  8.3 is very stable.  Update to the latest.  8.4 seems good, but I've
  had, and still am having, problems with it crashing in production.
  Not often, maybe once every couple of months, but just enough that I'm
  not ready to try and use it there yet.  And I can't force the same
  failure in testing, at least not yet.
 
  uh.  Is there a report of the crash somewhere with details, say stack
  traces and such?
 
 No, the only server that does this is in production as our stats db
 and when it happens it usually gets restarted immediately.  It does
 this about once every two months.  Do the PGDG releases have debugging
 symbols and what not?  I'll see about having a stack trace ready to
 run for the next time it does this.

You mean the RPMs?  Yes, I think Devrim publishes debuginfo packages
which you need to install separately.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Alvaro Herrera
Lefteris wrote:
 Yes, I am reading the plan wrong! I thought that each row from the
 plan reported the total time for the operation but it actually reports
 the starting and ending point.
 
 So we all agree that the problem is on the scans:)
 
 So the next question is why changing shared memory buffers will fix
 that? I only have one session with one connection, do I have like many
 reader workers or something?

No amount of tinkering is going to change the fact that a seqscan is the
fastest way to execute these queries.  Even if you got it to be all in
memory, it would still be much slower than the other systems which, I
gather, are using columnar storage and thus are perfectly suited to this
problem (unlike Postgres).  The talk about compression ratios caught
me by surprise until I realized it was columnar stuff.  There's no way
you can get such high ratios with regular, row-oriented storage.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Alvaro Herrera
Alvaro Herrera wrote:

 No amount of tinkering is going to change the fact that a seqscan is the
 fastest way to execute these queries.  Even if you got it to be all in
 memory, it would still be much slower than the other systems which, I
 gather, are using columnar storage and thus are perfectly suited to this
 problem (unlike Postgres).  The talk about compression ratios caught
 me by surprise until I realized it was columnar stuff.  There's no way
 you can get such high ratios on a regular, row-oriented storage.

FWIW if you want a fair comparison, get InnoDB numbers.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] FSM - per database or per installation?

2009-12-23 Thread Alvaro Herrera
Craig James wrote:
 Heikki Linnakangas wrote:

 The parameter is gone in 8.4, BTW.
 
 Both max_fsm_relations and max_fsm_pages?

Yes, both are gone.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Is Diskeeper Automatic Mode safe?

2009-11-16 Thread Alvaro Herrera
Craig James wrote:

 Do it more than once.  This is a highly erratic test that can catch
 your system at a wide variety of points, some of which cause no
 problems, and some of which can be catastrophic.  If you test and it
 fails, you know you have a problem.  If you test and it doesn't fail,
 you don't know much.  It's only when you've tested a number of times
 without failure that you've gained any real knowledge.

Of course, you're only truly safe when you've tested infinite times,
which may take a bit longer than management expects.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] UUID as primary key

2009-10-16 Thread Alvaro Herrera
decibel wrote:

 If you want it to be seamless and fully optimal, you would
 introduce a new int256 type (or whatever the name of the type you
 are trying to represent). Adding new types to PostgreSQL is not
 that hard. This would allow queries (=, <, >, etc.) as well.
 
 If you want an example of that, we had Command Prompt create a full
 set of hash datatypes (SHA*, and I think md5). That stuff should be
 on pgFoundry; if it's not drop me a note at jna...@cashnetusa.com
 and I'll get it added.

It's on pgFoundry, under the project name shatypes.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How to post Performance Questions

2009-09-21 Thread Alvaro Herrera
Kevin Grittner wrote:
 Michael Glaesemann g...@seespotcode.net wrote:
  On Sep 14, 2009, at 16:55 , Josh Berkus wrote:
  
  Please read the following two documents before posting your  
  performance query here:
 
  http://wiki.postgresql.org/wiki/Guide_to_reporting_problems
  http://wiki.postgresql.org/wiki/SlowQueryQuestions
 
  This will help other users to troubleshoot your problems far
  more rapidly.
  
  Can something similar be added to the footer of (at least) the  
  performance list?
  
 Perhaps on this page?:
  
 http://www.postgresql.org/community/lists/

Done this part.  (It'll take some time to propagate.)

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Planner question - bit data types

2009-09-07 Thread Alvaro Herrera
Karl Denninger wrote:

 The individual boolean fields don't kill me and in terms of some of the
 application issues they're actually rather easy to code for.
 
 The problem with re-coding for them is extensibility (by those who
 install and administer the package); a mask leaves open lots of extra
 bits for site-specific use, where hard-coding booleans does not, and
 since the executable is a binary it instantly becomes a huge problem for
 everyone but me.

Did you try hiding the bitmask operations inside a function as Tom
suggested?
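
A minimal sketch of what that could look like (the function name, types and
example mask are all invented; SQL functions of this era refer to their
arguments as $n):

CREATE FUNCTION flag_set(integer, integer) RETURNS boolean
    AS 'SELECT ($1 & $2) = $2'
    LANGUAGE sql IMMUTABLE;

SELECT * FROM t WHERE flag_set(flags, 8);  -- the mask stays site-specific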

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Planner question - bit data types

2009-09-05 Thread Alvaro Herrera
Karl Denninger wrote:
 Tom Lane wrote:

  You never showed us any EXPLAIN results,
 Yes I did.  Go back and look at the archives.  I provided full EXPLAIN
 and EXPLAIN ANALYZE results for the original query.  Sheesh.

You did?  Where?  This is your first message in this thread:
http://archives.postgresql.org/pgsql-performance/2009-09/msg00059.php
No EXPLAINs anywhere to be seen.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] number of rows estimation for bit-AND operation

2009-08-21 Thread Alvaro Herrera
Robert Haas wrote:

 Scott, did you check whether a toast table got created here and what
 the size of it was?

A table with only bool columns (and, say, one int8 column) would not
have a toast table.  Only varlena columns produce toast tables.
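
Easy to check for any given table (the name is a placeholder; a zero
reltoastrelid displays as '-'):

SELECT relname, reltoastrelid::regclass
  FROM pg_class
 WHERE relname = 'my_table';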

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Number of tables

2009-08-20 Thread Alvaro Herrera
Greg Stark wrote:

 It would be nice to have a solution to that where you could create
 lightweight temporary objects which belong to an application session
 which can be picked up by a different database connection each go
 around.

It would be useful:

CREATE SCHEMA session1234 UNLOGGED
  CREATE TABLE hitlist ( ... );

Each table in the session1234 schema would not be WAL-logged, and
would be automatically dropped on crash recovery (actually the whole
schema would be).  But while the server is live it behaves like a
regular schema/table and can be seen by all backends (i.e. not temp).

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Per-database warm standby?

2009-08-14 Thread Alvaro Herrera
Tom Lane wrote:
 Craig James craig_ja...@emolecules.com writes:
  8.4 has vastly improved the warm-standby features, but it looks to me like 
  this is still an installation-wide backup, not a per-database backup.  That 
  is, if you have (say) a couple hundred databases, and you only want 
  warm-backup on one of them, you can't do it (except using other solutions 
  like Slony).  Is that right?
 
 Correct, and that's always going to be true of any WAL-based solution.

Except that we could create a WAL filter to restore only relevant
stuff to particular databases ...  Would that work?  Of course, it would
have to ensure that global objects are also recovered, but we could
simply ignore commands for other databases.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Memory usage of writer process

2009-08-13 Thread Alvaro Herrera
Alex wrote:
 The writer process seems to be using inordinate amounts of memory:
 
   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 11088 postgres  13  -2 3217m 2.9g 2.9g S    0 38.7   0:10.46 postgres: writer process
 20190 postgres  13  -2 3219m  71m  68m S    0  0.9   0:52.48 postgres: cribq cribq [local] idle
 
 I am writing moderately large (~3k) records to my database a few times
 a second.  Even when I stop doing that, the process continues to take
 up all of that memory.
 
 Am I reading this right?  Why is it using so much memory?

shared_buffers?
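
That is, the writer has probably just touched most of shared_buffers, and
top counts that shared memory against the process in both RES and SHR.  A
quick way to see whether the 2.9g figure lines up:

SHOW shared_buffers;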

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Re: [HACKERS] freezing tuples ( was: Why is vacuum_freeze_min_age 100m? )

2009-08-13 Thread Alvaro Herrera
Jeff Davis wrote:

 Why aren't we more opportunistic about freezing tuples? For instance, if
 we already have a dirty buffer in cache, we should be more aggressive
 about freezing those tuples than freezing tuples on disk.

The most widely cited reason is that you lose forensics data.  Although
they are increasingly rare, there are still situations in which the heap
tuple machinery messes up and the xmin/xmax/etc fields of the tuple are
the best/only way to find out what happened and thus fix the bug.  If
you freeze early, there's just no way to know.
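
Those fields are visible as system columns, which is what makes them useful
forensically; in this era, freezing overwrites xmin with the frozen XID,
which is exactly the information loss described (some_table is a
placeholder):

SELECT xmin, xmax, ctid, * FROM some_table LIMIT 5;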

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-08-08 Thread Alvaro Herrera
Tom Lane wrote:

 Well, it's a TO-THINK-ABOUT anyway.  I think the appropriate next step
 would not be to write code, but to do a detailed investigation of what
 would be gained or lost.  I don't remember exactly what we do with the
 flat-file contents.

Maybe what we need is not to get rid of the flat files, but to speed
them up.  If we're worried about speed in the pg_authid flatfile, and
come up with a solution to that problem, what will we do with the
pg_database flatfile when it grows too large?  We can't just get rid of
it, because autovacuum needs to access it.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-08-08 Thread Alvaro Herrera
Tom Lane wrote:

 Actually, I had forgotten that we were using the pg_database flatfile
 for purposes other than authentication checks.  In particular, we need
 it during backend startup to map from database name to database OID,
 without which it's impossible to locate the system catalogs for the
 target database.  It's pretty hard to see a way around that one.
 We could grovel through pg_database itself, as indeed is done to rebuild
 the flatfile during system start.  But that's certainly not going to be
 fast in cases where there are enough DBs to make the flatfile slow.

Also, IIRC flatfiles were introduced precisely to avoid having to read
the catalogs manually.

 So on third thought, Alvaro's right: the only real solution here is to
 adopt a more efficient representation of the flat files.  Maybe some
 sort of simple hashtable arrangement would work.  (Rendering them not so
 flat anymore...)

As long as there's a simple API, there should be no problem.

(Except that it would be nice to be able to build the file incrementally
...  If we have to write out a million lines each time a millionth user
is created, there will still be a bottleneck at CREATE USER time.)

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-08-08 Thread Alvaro Herrera
Tom Lane wrote:

 In some sense this is a bootstrap problem: what does it take to get to
 the point of being able to read pg_database and its indexes?  That is
 necessarily not dependent on the particular database we want to join.
 Maybe we could solve it by having the relcache write a global cache
 file containing only entries for the global tables, and load that before
 we have identified the database we want to join (after which, we'll load
 another cache file for the local entries).

This sounds good, because autovacuum could probably use this too.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] postgresql and syslog

2009-08-07 Thread Alvaro Herrera
Michael Nacos wrote:

 I would be very surprised if logging had a significant overhead with any
 method you choose. There's probably something very wrong with your setup
 if this is the case.

Either something very wrong, or the load is extremely high.  In the
latter case perhaps it would make sense to ship syslog to a remote
machine.  Since it uses UDP sockets, it wouldn't block when overloaded
but rather lose messages (besides, it means it has low overhead).
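
A sketch of that setup (the facility and host name are assumptions):

# postgresql.conf
log_destination = 'syslog'
syslog_facility = 'LOCAL0'

# /etc/syslog.conf -- forward over UDP to a remote log host
local0.*    @loghost.example.com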

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] [BUGS] Postgres user authentification or LDAP authentification

2009-07-26 Thread Alvaro Herrera
Lauris Ulmanis wrote:
 Hello!
 
  
 
 I posted a message (#4919) about the slowness of creating users when there
 are more than 500 000 of them. It seems there is no workaround for this
 problem because of the pg_auth flat file.
 
  
 
 To override this problem, is it possible to use the LDAP authentication
 method to identify each user and speed up the system?

No.  The users still need to exist in the PG auth system.

I'm sure this is just some missing optimization.  Feel free to work on
the code to improve performance for these cases.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very big insert/join performance problem (bacula)

2009-07-15 Thread Alvaro Herrera
Marc Cousin wrote:

 There are other things I am thinking of: maybe it would be better to have
 sort space on another (and not DRBD'ed) raid set? We have a quite cheap
 setup right now for the database, and I think maybe this would help scale
 better. I can get a filesystem in another volume group, which is not used
 that much for now.

You know, that's the first thing that came to me when I read you're using
DRBD.  Have you tried setting temp_tablespaces to a non-replicated disk?
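
A sketch of that, with an invented path and tablespace name (the
temp_tablespaces list exists as of 8.3):

CREATE TABLESPACE tmp_sort LOCATION '/mnt/local_disk/pg_tmp';
SET temp_tablespaces = 'tmp_sort';   -- or set it in postgresql.conf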

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Re: [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-07-15 Thread Alvaro Herrera
Lauris Ulmanis wrote:
 Hello again!
 
 I ran a test on my local test server.
 
 I created 500 000 users in a function loop very quickly - within 48
 seconds. I ran the script again, reaching up to 1 billion users - the
 result was the same - 48 seconds. That is very quick.
 
 But the problem seems to be with transaction preparation, because if the
 database has 1 billion users and I want to create 1 new one - it takes 4
 seconds!
 
 After that I generated up to 2 billion users on this server (the
 generation process took just 1.44 minutes - again quick).
 
 And did 1 user creation again - now it took 9 seconds!
 
 What is the reason for this slowness? Is there a workaround or solution
 to avoid it?

My bet is on the pg_auth flat file.  I doubt we have ever tested the
behavior of that code with 1 billion users ...

Do you really need 1 billion users?  Are you planning on giving accounts
to every human being on the planet or what?  I mean, what's the point of
this test?

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Re: [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-07-15 Thread Alvaro Herrera
toruvinn wrote:
 On Wed, 15 Jul 2009 16:02:09 +0200, Alvaro Herrera  
 alvhe...@commandprompt.com wrote:
 My bet is on the pg_auth flat file.  I doubt we have ever tested the
 behavior of that code with 1 billion users ...

 I was always wondering, though, why PostgreSQL uses this approach and not 
 its catalogs.

It does use the catalog for most things.  The flatfile is used for the
situations where the catalogs are not yet ready to be read.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Huge difference in query performance between 8.3 and 8.4 (possibly)

2009-07-09 Thread Alvaro Herrera
Robin Houston wrote:

 We have a query that runs very slowly on our 8.3 database. (I can't
 tell you exactly how slowly, because it has never successfully run to
 completion even when we left it running overnight.) On the 8.4
 database on my laptop, it runs in about 90 seconds. Of course there
 are several differences between the two instances, but I wonder
 whether query planning improvements in 8.4 could essentially account
 for it.

Of course.  Great news.  Congratulations.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] tsvector_update_trigger performance?

2009-06-24 Thread Alvaro Herrera
Oleg Bartunov wrote:
 On Wed, 24 Jun 2009, Chris St Denis wrote:

 Is tsvector_update_trigger() smart enough to not bother updating a 
 tsvector if the text in that column has not changed?

 no, you should do the check yourself. There are several examples in the
 mailing lists.

Or you could try using the suppress_redundant_updates_trigger() function
that has been included in 8.4 (should be easy to backport)
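
Usage is a one-liner per table; the trigger name is conventionally chosen
to sort last, and documents here is just a placeholder:

CREATE TRIGGER z_min_update
    BEFORE UPDATE ON documents
    FOR EACH ROW
    EXECUTE PROCEDURE suppress_redundant_updates_trigger();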

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] High cost of ... where ... not in (select ...)

2009-06-16 Thread Alvaro Herrera
Aaron Turner wrote:
 I'm trying to figure out how to optimize this query (yes, I ran 
 vacuum/analyze):
 
 musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid
 NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log);

What PG version is this?
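
Whatever the version, one rewrite is worth testing: NOT IN over a large
subquery often degenerates into a per-row subplan (its NULL semantics block
an anti-join), while NOT EXISTS does not have that problem -- assuming
pcap_storeid is non-null on both sides:

DELETE FROM muapp.pcap_store
 WHERE NOT EXISTS (SELECT 1
                     FROM muapp.pcap_store_log l
                    WHERE l.pcap_storeid = pcap_store.pcap_storeid);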


-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

