Re: [PERFORM] Solaris shared_buffers anomaly?

2006-06-14 Thread Mischa Sandberg

Tom Lane wrote:

Mischa Sandberg [EMAIL PROTECTED] writes:



Tom Lane wrote:

Does Solaris have any call that allows locking a shmem segment in RAM?

Yes, mlock(). But want to understand what's going on before patching.


Sure, but testing it with mlock() might help you understand what's going
on, by eliminating one variable: we don't really know if the shmem is
getting swapped, or something else.



For a dedicated DB server machine, Solaris has a feature:
create intimate shared memory with shmat(..., SHM_SHARE_MMU).
All backends share the same TLB entries (!). 


We use that already.  (Hmm, might be interesting for you to turn it
*off* and see if anything changes.  See src/backend/port/sysv_shmem.c.)


Gah. Always must remember to RTFSource.
And reproduce the problem on a machine I control :-)

--
Engineers think that equations approximate reality.
Physicists think that reality approximates the equations.
Mathematicians never make the connection.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Solaris shared_buffers anomaly?

2006-06-14 Thread Josh Berkus
Folks,

First off, you'll be glad to know that I've persuaded two of the Sun 
performance engineers to join this list soon.   So you should be able to 
get more difinitive answers to these questions.

Second, 7.4 still did linear scanning of shared_buffers as part of LRU and 
for other activities.   I don't know how that would cause swapping, but it 
certainly could cause dramatic slowdowns (like 2-5x) if you overallocated 
shared_buffers.   

Possibly this is also triggering a bug in Solaris 2.6.   2.6 is pretty 
darned old (1997); maybe you should upgrade?   We're testing with s_b set 
to 300,000 on Solaris 10 (Niagara) so this is obviously not a current 
Solaris issue.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PERFORM] Solaris shared_buffers anomaly?

2006-06-13 Thread Mischa Sandberg

Jim C. Nasby wrote:
...

Actually, in 8.1.x I've seen some big wins from greatly increasing the
amount of shared_buffers, even as high as 50% of memory, thanks to the
changes made to the buffer management code. ...


Anyone else run into a gotcha that one of our customers ran into?
PG 7.4.8 running on Solaris 2.6, USparc w 4GB RAM.
Usually about 50 active backends.
(No reason to believe this wouldn't apply to 8.x).

Initially shared_buffers were set to 1000 (8MB).
Then, we moved all apps but the database server off the box.

Raised shared_buffers to 2000 (16MB).
Modest improvement in some frequent repeated queries.

Raised shared_buffers to 16000 (128MB).
DB server dropped to a CRAWL.

vmstat showed that it was swapping like crazy.
Dropped shared_buffers back down again. 
Swapping stopped.


Stared at ps u a lot, and realized that the shm seg appeared to
be counted as part of the resident set (RSS).
Theory was that the kernel was reading the numbers the same way,
and swapping out resident sets, since they obviously wouldn't
all fit in RAM :-)

Anyone from Sun reading this list, willing to offer an opinion?

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Solaris shared_buffers anomaly?

2006-06-13 Thread Tom Lane
Mischa Sandberg [EMAIL PROTECTED] writes:
 vmstat showed that it was swapping like crazy.
 Dropped shared_buffers back down again. 
 Swapping stopped.

Does Solaris have any call that allows locking a shmem segment in RAM?

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Solaris shared_buffers anomaly?

2006-06-13 Thread Michael Fuhr
On Tue, Jun 13, 2006 at 06:22:07PM -0400, Tom Lane wrote:
 Mischa Sandberg [EMAIL PROTECTED] writes:
  vmstat showed that it was swapping like crazy.
  Dropped shared_buffers back down again. 
  Swapping stopped.
 
 Does Solaris have any call that allows locking a shmem segment in RAM?

The Solaris 9 shmctl manpage mentions this token:

  SHM_LOCK
Lock the shared memory segment specified by shmid in
memory. This command can be executed only by a process
that has an effective user ID equal to super-user.

-- 
Michael Fuhr

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Solaris shared_buffers anomaly?

2006-06-13 Thread Jim C. Nasby
On Tue, Jun 13, 2006 at 03:21:34PM -0700, Mischa Sandberg wrote:
 Jim C. Nasby wrote:
 ...
 Actually, in 8.1.x I've seen some big wins from greatly increasing the
 amount of shared_buffers, even as high as 50% of memory, thanks to the
 changes made to the buffer management code. ...
 
 Anyone else run into a gotcha that one of our customers ran into?
 PG 7.4.8 running on Solaris 2.6, USparc w 4GB RAM.
 Usually about 50 active backends.
 (No reason to believe this wouldn't apply to 8.x).
 
 Initially shared_buffers were set to 1000 (8MB).
 Then, we moved all apps but the database server off the box.
 
 Raised shared_buffers to 2000 (16MB).
 Modest improvement in some frequent repeated queries.
 
 Raised shared_buffers to 16000 (128MB).
 DB server dropped to a CRAWL.
 
 vmstat showed that it was swapping like crazy.
 Dropped shared_buffers back down again. 
 Swapping stopped.

What's sort_mem set to? I suspect you simply ran the machine out of
memory.

Also, Solaris by default will only use a portion of memory for
filesystem caching, which will kill PostgreSQL performance.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Solaris shared_buffers anomaly?

2006-06-13 Thread Joshua D. Drake

Initially shared_buffers were set to 1000 (8MB).
Then, we moved all apps but the database server off the box.

Raised shared_buffers to 2000 (16MB).
Modest improvement in some frequent repeated queries.

Raised shared_buffers to 16000 (128MB).
DB server dropped to a CRAWL.


Versions below 8.1 normally don't do well with high shared_buffers. 8.1 
would do much better. If you dropped that to more like 6k you would 
probably continue to see increase over 2k.


Sincerely,

Joshua D. Drake


--

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Solaris shared_buffers anomaly?

2006-06-13 Thread Mischa Sandberg

Tom Lane wrote:

Mischa Sandberg [EMAIL PROTECTED] writes:

vmstat showed that it was swapping like crazy.
Dropped shared_buffers back down again. 
Swapping stopped.


Does Solaris have any call that allows locking a shmem segment in RAM?


Yes, mlock(). But want to understand what's going on before patching.
No reason to believe that the multiply-attached shm seg was being swapped out
(which is frankly insane). Swapping out (and in) just the true resident set of
every backend would be enough to explain the vmstat io we saw.

http://www.carumba.com/talk/random/swol-09-insidesolaris.html

For a dedicated DB server machine, Solaris has a feature:
create intimate shared memory with shmat(..., SHM_SHARE_MMU).
All backends share the same TLB entries (!). 


Context switch rates on our in-house solaris boxes running PG
have been insane (4000/sec). Reloading the TLB map on every process
context switch might be one reason Solaris runs our db apps at less
than half the speed of our perftesters' Xeon beige-boxes.

That's guesswork. Sun is making PG part of their distro ... 
perhaps they've some knowledgeable input.


--
Engineers think that equations approximate reality.
Physicists think that reality approximates the equations.
Mathematicians never make the connection.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Solaris shared_buffers anomaly?

2006-06-13 Thread Mischa Sandberg

Jim C. Nasby wrote:

On Tue, Jun 13, 2006 at 03:21:34PM -0700, Mischa Sandberg wrote:

Raised shared_buffers to 16000 (128MB).
DB server dropped to a CRAWL.

vmstat showed that it was swapping like crazy.
Dropped shared_buffers back down again. 
Swapping stopped.


What's sort_mem set to? I suspect you simply ran the machine out of
memory.


8192 (8MB). No issue when shared_buffers was 2000; same apps always.


Also, Solaris by default will only use a portion of memory for
filesystem caching, which will kill PostgreSQL performance.


Yep, tested /etc/system segmap_percent at 20,40,60. 
No significant difference between 20 and 60.

Default is 10%? 12%? Can't recall.

Was not changed from 20 during the shared_buffer test.
--
Engineers think that equations approximate reality.
Physicists think that reality approximates the equations.
Mathematicians never make the connection.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Solaris shared_buffers anomaly?

2006-06-13 Thread Jim C. Nasby
On Tue, Jun 13, 2006 at 04:20:34PM -0700, Mischa Sandberg wrote:
 Jim C. Nasby wrote:
 On Tue, Jun 13, 2006 at 03:21:34PM -0700, Mischa Sandberg wrote:
 Raised shared_buffers to 16000 (128MB).
 DB server dropped to a CRAWL.
 
 vmstat showed that it was swapping like crazy.
 Dropped shared_buffers back down again. 
 Swapping stopped.
 
 What's sort_mem set to? I suspect you simply ran the machine out of
 memory.
 
 8192 (8MB). No issue when shared_buffers was 2000; same apps always.
 
So if all 50 backends were running a sort, you'd use 400MB. The box has
4G, right?

 Also, Solaris by default will only use a portion of memory for
 filesystem caching, which will kill PostgreSQL performance.
 
 Yep, tested /etc/system segmap_percent at 20,40,60. 
 No significant difference between 20 and 60.

That's pretty disturbing... how large is your database?
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Solaris shared_buffers anomaly?

2006-06-13 Thread Mischa Sandberg

Jim C. Nasby wrote:

On Tue, Jun 13, 2006 at 04:20:34PM -0700, Mischa Sandberg wrote:

Jim C. Nasby wrote:

What's sort_mem set to? I suspect you simply ran the machine out of
memory.

8192 (8MB). No issue when shared_buffers was 2000; same apps always.
 
So if all 50 backends were running a sort, you'd use 400MB. The box has

4G, right?


Umm ... yes. if. 35-40 of them are doing pure INSERTS. 
Not following your train.


Yep, tested /etc/system segmap_percent at 20,40,60. 
No significant difference between 20 and 60.

That's pretty disturbing... how large is your database?


~10GB. Good locality. Where heading?

--
Engineers think that equations approximate reality.
Physicists think that reality approximates the equations.
Mathematicians never make the connection.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Solaris shared_buffers anomaly?

2006-06-13 Thread Mark Kirkwood

Mischa Sandberg wrote:

Jim C. Nasby wrote:
...

Actually, in 8.1.x I've seen some big wins from greatly increasing the
amount of shared_buffers, even as high as 50% of memory, thanks to the
changes made to the buffer management code. ...


Anyone else run into a gotcha that one of our customers ran into?
PG 7.4.8 running on Solaris 2.6, USparc w 4GB RAM.
Usually about 50 active backends.
(No reason to believe this wouldn't apply to 8.x).

Initially shared_buffers were set to 1000 (8MB).
Then, we moved all apps but the database server off the box.

Raised shared_buffers to 2000 (16MB).
Modest improvement in some frequent repeated queries.

Raised shared_buffers to 16000 (128MB).
DB server dropped to a CRAWL.

vmstat showed that it was swapping like crazy.
Dropped shared_buffers back down again. Swapping stopped.

Stared at ps u a lot, and realized that the shm seg appeared to
be counted as part of the resident set (RSS).
Theory was that the kernel was reading the numbers the same way,
and swapping out resident sets, since they obviously wouldn't
all fit in RAM :-)

Anyone from Sun reading this list, willing to offer an opinion?



A while ago I ran 7.4.? on a Solaris 2.8 box (E280 or E220 can't recall) 
with 2G of ram - 40 users or so with shared_buffers = approx 12000 - 
with no swapping I recall (in fact I pretty sure there was free memory!).


I suspect something else is your culprit - what is work_mem (or 
sort_mem) set to? I'm thinking that you have this high and didn't have 
much memory headroom to begin with, so that upping shared_buffers from 
16MB - 128MB tipped things over the edge!


Cheers

Mark

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Solaris shared_buffers anomaly?

2006-06-13 Thread Tom Lane
Mischa Sandberg [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Does Solaris have any call that allows locking a shmem segment in RAM?

 Yes, mlock(). But want to understand what's going on before patching.

Sure, but testing it with mlock() might help you understand what's going
on, by eliminating one variable: we don't really know if the shmem is
getting swapped, or something else.

 For a dedicated DB server machine, Solaris has a feature:
 create intimate shared memory with shmat(..., SHM_SHARE_MMU).
 All backends share the same TLB entries (!). 

We use that already.  (Hmm, might be interesting for you to turn it
*off* and see if anything changes.  See src/backend/port/sysv_shmem.c.)

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Solaris shared_buffers anomaly?

2006-06-13 Thread Jim C. Nasby
On Tue, Jun 13, 2006 at 05:01:34PM -0700, Mischa Sandberg wrote:
 Jim C. Nasby wrote:
 On Tue, Jun 13, 2006 at 04:20:34PM -0700, Mischa Sandberg wrote:
 Jim C. Nasby wrote:
 What's sort_mem set to? I suspect you simply ran the machine out of
 memory.
 8192 (8MB). No issue when shared_buffers was 2000; same apps always.
  
 So if all 50 backends were running a sort, you'd use 400MB. The box has
 4G, right?
 
 Umm ... yes. if. 35-40 of them are doing pure INSERTS. 
 Not following your train.

If sort_mem is set too high and a bunch of sorts fire off at once,
you'll run the box out of memory and it'll start swapping. Won't really
matter much whether it's swapping shared buffers or not; performance
will just completely tank.

Actually, I think that Solaris can be pretty aggressive about swapping
stuff out to try and cache more data. Perhaps that's what's happening?

 Yep, tested /etc/system segmap_percent at 20,40,60. 
 No significant difference between 20 and 60.
 That's pretty disturbing... how large is your database?
 
 ~10GB. Good locality. Where heading?

I guess I should have asked what your working set size was... unless
that's very small, it doesn't make sense that changing the cache size
that much wouldn't help things.

BTW, on some versions of Solaris, segmap_percent doesn't actually work;
you have to change something else that's measured in bytes.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org