Re: [PERFORM] Scalability in postgres

2009-05-31 Thread Scott Marlowe
On Sat, May 30, 2009 at 9:41 PM, Greg Smith  wrote:
> On Fri, 29 May 2009, Scott Carey wrote:
>
>> There are operations/IT people won't touch Ubuntu etc with a ten foot pole
>> yet for production.
>
> The only thing I was suggesting is that because 2.6.28 is the latest Ubuntu
> kernel, that means it's gotten a lot more exposure and testing than, say,
> other options like 2.6.27 or 2.6.29.
>
> I build a fair number of RedHat/CentOS systems with an upgraded kernel based
> on mature releases from kernel.org, and a config as close as possible to the
> original RedHat one, with the generic kernel defaults for all the new
> settings.  I keep liking that combination better than just using an Ubuntu
> version with a newer kernel.  I've seen a couple of odd kernel setting
> choices in Ubuntu releases before that motivate that choice; the scheduler
> trainwreck described at
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/188226 comes to mind.

8.04  was a frakking train wreck in many ways.  It wasn't until 8.04.2
came out that it was even close to useable as a server OS, and even
then, not for databases yet.  It's still got broken bits and pieces
marked "fixed in 8.10"...  Uh, hello, it's your LTS release, fixes
should be made there as a priority.  There's a reason my dbs run on
Centos / RHEL.  It's not the fastest release ever, but it doesn't go
down on me and it just works.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] degenerate performance on one server of 3

2009-05-31 Thread Erik Aronesty
I have 3 servers, all with identical databases, and each performing
very differently for the same queries.

www3 is my fastest, www2 is the worst, and www1 is in the middle...
even though www2 has more ram, faster CPU and faster drives (by far),
and is running a newer version of postgres. I have been reluctant to
post because I know it's something that I'm doing wrong in the
settings or something that I should be able to figure out.

Last year at this time www2 was the fastest... in fact, I bought the
machine to be my "primary" server, and it performed as such with
the striped volumes and higher RAM, it outpaced the other 2 in every
query.  It has, over time, "degenerated" to being so slow it
frequently has to be taken out of the load-balance set.   The only
major changes to the sever have been "yum update (or ports upgrade)"
to the newer releases ... over time.

The query planner "knows" about the problem, but I'm not sure *why*
there's a difference... since the tables all have the same data ...
loaded from a dump nightly.

The planner shows a different number of "rows" even though the items
table has 22680 rows in all 3 instances.  I ran a vacuum analyze just
before these runs hoping to get them all into a similar "clean" state.

The difference is outlined below, with the query planner output from a
table-scan query that greatly exaggerates the differences in
performance, along with some info about the configuration and platform
differences.

QUERY: explain select count(*) from items where name like '%a%'

www3: psql (PostgreSQL) 8.1.14
www3: Linux www3 2.6.23.17-88.fc7 #1 SMP Thu May 15 00:02:29 EDT 2008
x86_64 x86_64 x86_64 GNU/Linux

www3: Mem:   1996288k total,  1537576k used,   458712k free,23124k buffers
www3: Swap:0k total,0k used,0k free,  1383208k cached

www3: shared_buffers = 1# min 16 or
max_connections*2, 8KB each

www3: QUERY PLAN
www3: --
www3:  Aggregate  (cost=3910.07..3910.08 rows=1 width=0)
www3:->  Seq Scan on items  (cost=0.00..3853.39 rows=22671 width=0)
www3:  Filter: (name ~~ '%a%'::text)
www3: (3 rows)
www3:

www1: psql (PostgreSQL) 8.1.17
www1: Linux www1 2.6.26.8-57.fc8 #1 SMP Thu Dec 18 18:59:49 EST 2008
x86_64 x86_64 x86_64 GNU/Linux

www1: Mem:   1019376k total,   973064k used,46312k free,27084k buffers
www1: Swap:  1959888k total,17656k used,  1942232k free,   769776k cached

www1: shared_buffers = 6000  # min 16 or
max_connections*2, 8KB each

www1: QUERY PLAN
www1: --
www1:  Aggregate  (cost=5206.20..5206.21 rows=1 width=0)
www1:->  Seq Scan on items  (cost=0.00..5149.50 rows=22680 width=0)
www1:  Filter: (name ~~ '%a%'::text)
www1: (3 rows)
www1:

www2: psql (PostgreSQL) 8.2.13
www2: FreeBSD www2 6.3-RELEASE-p7 FreeBSD 6.3-RELEASE-p7 #0: Sun Dec
21 03:24:04 UTC 2008
r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/SMP  amd64

www2: Mem: 57M Active, 1078M Inact, 284M Wired, 88M Cache, 213M Buf, 10M Free
www2: Swap: 4065M Total, 144K Used, 4065M Free

www2: shared_buffers = 360MB   # min 128kB or
max_connections*16kB

www2: QUERY PLAN
www2: --
www2:  Aggregate  (cost=17659.45..17659.46 rows=1 width=0)
www2:->  Seq Scan on items  (cost=0.00..17652.24 rows=2886 width=0)
www2:  Filter: (name ~~ '%a%'::text)
www2: (3 rows)
www2:

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] degenerate performance on one server of 3

2009-05-31 Thread Tom Lane
Erik Aronesty  writes:
> I have 3 servers, all with identical databases, and each performing
> very differently for the same queries.

I'm betting on varying degrees of table bloat.  Have you tried vacuum
full, cluster, etc?

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Scalability in postgres

2009-05-31 Thread Fabrix
2009/5/29 Scott Carey 

>
> On 5/28/09 6:54 PM, "Greg Smith"  wrote:
>
> > 2) You have very new hardware and a very old kernel.  Once you've done
> the
> > above, if you're still not happy with performance, at that point you
> > should consider using a newer one.  It's fairly simple to build a Linux
> > kernel using the same basic kernel parameters as the stock RedHat one.
> > 2.6.28 is six months old now, is up to 2.6.28.10, and has gotten a lot
> > more testing than most kernels due to it being the Ubuntu 9.04 default.
> > I'd suggest you try out that version.
>
>
> Comparing RedHat's 2.6.18, heavily patched, fix backported kernel to the
> original 2.6.18 is really hard.  Yes, much of it is old, but a lot of stuff
> has been backported.
> I have no idea if things related to this case have been backported.
>  Virtual
> memory management is complex and only bug fixes would likely go in however.
> But RedHat 5.3 for example put all the new features for Intel's latest
> processor in the release (which may not even be in 2.6.28!).
>
> There are operations/IT people won't touch Ubuntu etc with a ten foot pole
> yet for production.  That may be irrational, but such paranoia exists.  The
> latest postgres release is generally a hell of a lot safer than the latest
> linux kernel, and people get paranoid about their DB.
>
> If you told someone who has to wake up at 3AM by page if the system has an
> error that "oh, we patched our own kenrel build into the RedHat OS" they
> might not be ok with that.
>
> Its a good test to see if this problem is fixed in the kernel. I've seen
> CentOS 5.2 go completely nuts with system CPU time and context switches
> with
> kswapd many times before.  I haven't put the system under the same stress
> with 5.3 yet however.
>

One of the server is: Intel Xeon X7350 2.93GHz, RH 5.3 and kernel
2.6.18-128.el5.
and the perfonmace is bad too, so i don't  think the probles is the kernel

The two servers that I tested (HP-785 Opteron and IBM x3950 M2 Xeon) have
NUMA architecture. and I thought the problem was caused by NUMA.

http://archives.postgresql.org/pgsql-admin/2008-11/msg00157.php

I'm trying another server, an HP blade bl 680 with Xeon E7450 (4 CPU x 6
cores= 24 cores) without NUMA architecture, but the CPUs are also going up.

procs ---memory-- ---swap-- -io --system--
-cpu--
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id
wa st
 1  0  0 46949972 116908 1703296400153122  1  0
98  0  0
 2  0  0 46945880 116916 170330680072   140 2059 3140  1  1
97  0  0
329  0  0 46953260 116932 170332080024   612 1435 194237 44
3 53  0  0
546  0  0 46952912 116940 1703320800 4   136 1090 327047 96
4  0  0  0
562  0  0 46951052 116940 1703322400 0 0 1095 323034 95
4  0  0  0
514  0  0 46949200 116952 1703321200 0   224 1088 330178 96
3  1  0  0
234  0  0 46948456 116952 1703321200 0 0 1106 315359 91
5  4  0  0
 4  0  0 46958376 116968 170332720016   396 1379 223499 47
3 49  0  0
 1  1  0 46941644 116976 1703322400   152  1140 2662 5540  4  2
93  1  0
 1  0  0 46943196 116984 1703324800   104   604 2307 3992  4  2
94  0  0
 1  1  0 46931544 116996 1703356800   104  4304 2318 3585  1  1
97  1  0
 0  0  0 46943572 117004 170335680032   204 2007 2986  1  1
98  0  0


 Now i don't  think the probles is NUMA.


The developer team will fix de aplication  and then i will test again.

I believe that when the application closes the connection the problem could
be solved, and then 16 cores in a server does the work instead of a 32 or
24.


Regards...

--Fabrix


Re: [PERFORM] autovacuum hung?

2009-05-31 Thread Brian Cox

Tom Lane [...@sss.pgh.pa.us] wrote:


Are those processes actually doing anything, or just waiting?  strace
or local equivalent would be the most conclusive check.
These must not have been hung, because they finally completed (after 
10-15 hrs - some time between 11pm and 8am). Question is why does it 
take so long to do this on such a relatively small table?



This query isn't very helpful because it fails to show locks that are
not directly associated with tables.

How can that (locks not directly associated...) be determined?


Thanks,
Brian

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] autovacuum hung?

2009-05-31 Thread Tom Lane
Brian Cox  writes:
> Tom Lane [...@sss.pgh.pa.us] wrote:
>> Are those processes actually doing anything, or just waiting?  strace
>> or local equivalent would be the most conclusive check.

> These must not have been hung, because they finally completed (after 
> 10-15 hrs - some time between 11pm and 8am). Question is why does it 
> take so long to do this on such a relatively small table?

They might have been blocked behind some other process that was sitting
in an open transaction for some reason.  The other likely cause is badly
chosen autovacuum delay, but I think that was already covered.

>> This query isn't very helpful because it fails to show locks that are
>> not directly associated with tables.

> How can that (locks not directly associated...) be determined?

Don't assume every row in pg_locks has a join partner in pg_class.
You could use an outer join ...

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] autovacuum hung?

2009-05-31 Thread Brian Cox

Tom Lane [...@sss.pgh.pa.us] wrote:

They might have been blocked behind some other process that was sitting
in an open transaction for some reason.  The other likely cause is badly
chosen autovacuum delay, but I think that was already covered.
Well, after I noticed this running for a while, I shutdown the postgres 
port and restarted postgres. The autovacuum of these tables kicked in 
promptly when postgres was back up. I then let them run. So, I don't 
think that surmise #1 is likely.
As for #2, I'm using the default. These tables get updated once a day 
with each row (potentially) being updated 1-24 times over many minutes 
to a handful of hours. Dp you think it would be better to manually 
vacuum these tables? If so, would it be best to disable autovacuum of 
them? And while I'm at it, if you disable autovacuum of the master table 
will that disable it for the actual partitions?


 > Don't assume every row in pg_locks has a join partner in pg_class.

You could use an outer join ...
Yes, of course. It never occurred that there could be db locks not 
associated with tables.


Thanks,
Brian


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] autovacuum hung?

2009-05-31 Thread Tom Lane
Brian Cox  writes:
> Dp you think it would be better to manually 
> vacuum these tables? If so, would it be best to disable autovacuum of 
> them? And while I'm at it, if you disable autovacuum of the master table 
> will that disable it for the actual partitions?

No, no, and no.  What would be best is to find out what actually
happened.  The evidence is gone now, but if you see it again please
take a closer look.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] autovacuum hung?

2009-05-31 Thread Brian Cox

Tom Lane [...@sss.pgh.pa.us] wrote:

No, no, and no.  What would be best is to find out what actually
happened.  The evidence is gone now, but if you see it again please
take a closer look.
OK. You mentioned strace. It's got a lot of options; any in particular 
that would be useful if this happens again?


Brian


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] autovacuum hung?

2009-05-31 Thread Tom Lane
Brian Cox  writes:
> OK. You mentioned strace. It's got a lot of options; any in particular 
> that would be useful if this happens again?

I'd just do "strace -p processID" and watch it for a little while.
If it's not hung, you'll see the process issuing kernel calls at
some rate or other.

If it is hung, you'll most likely see something like

semop(...) 

and it just sits there.  Also, if you see nothing but a series of
select()s with varying timeouts, that would suggest a stuck spinlock
(although I doubt that was happening, as it would eventually timeout
and report a failure).

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] autovacuum hung?

2009-05-31 Thread Tom Lane
Brian Cox  writes:
> OK. You mentioned strace. It's got a lot of options; any in particular 
> that would be useful if this happens again?

Oh, and don't forget the more-complete pg_locks state.  We'll want all
the columns of pg_locks, not just the ones you showed before.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] degenerate performance on one server of 3

2009-05-31 Thread Craig Ringer
Tom Lane wrote:
> Erik Aronesty  writes:
>> I have 3 servers, all with identical databases, and each performing
>> very differently for the same queries.
> 
> I'm betting on varying degrees of table bloat.  Have you tried vacuum
> full, cluster, etc?

Or, if you have been using VACUUM FULL, try REINDEXing the tables,
because it could easily be index bloat. Clustering the table will take
care of index bloat as well as table bloat.

--
Craig Ringer

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] degenerate performance on one server of 3

2009-05-31 Thread Tom Lane
Craig Ringer  writes:
> Tom Lane wrote:
>> I'm betting on varying degrees of table bloat.  Have you tried vacuum
>> full, cluster, etc?

> Or, if you have been using VACUUM FULL, try REINDEXing the tables,
> because it could easily be index bloat. Clustering the table will take
> care of index bloat as well as table bloat.

Index bloat wouldn't explain the slow-seqscan behavior the OP was
complaining of.  Still, you're right that if the tables are bloated
then their indexes probably are too ... and that VACUUM FULL alone
will not fix that.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] degenerate performance on one server of 3

2009-05-31 Thread Erik Aronesty
it was all vacuum full...thanks

the other 2 servers truncate and reload that table from time to time
... IE: they are always vacuumed

as the "master" ... that server never does it... hence the bloat

but why wasn't autovac enough to reclaim at least *most* of the space?
  that table *does* get updated every day... but rows are not
overwritten, just edited.   it seems that most of the pages should be
"reused" via autovac 



On Sun, May 31, 2009 at 11:40 PM, Tom Lane  wrote:
> Craig Ringer  writes:
>> Tom Lane wrote:
>>> I'm betting on varying degrees of table bloat.  Have you tried vacuum
>>> full, cluster, etc?
>
>> Or, if you have been using VACUUM FULL, try REINDEXing the tables,
>> because it could easily be index bloat. Clustering the table will take
>> care of index bloat as well as table bloat.
>
> Index bloat wouldn't explain the slow-seqscan behavior the OP was
> complaining of.  Still, you're right that if the tables are bloated
> then their indexes probably are too ... and that VACUUM FULL alone
> will not fix that.
>
>                        regards, tom lane
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Vacuuming technique doubt

2009-05-31 Thread S Arvind
 Having a doubt, we want to vacuum and reindex some 50 most used tables
daily on specific time. Is it best to have a function in postgres and call
it in cron or is there any other good way to do the two process for
specified tables at specified time?
-Arvind S

*
**"Many of lifes failure are people who did not realize how close they were
to success when they gave up."
-Thomas Edison*