Are you running Solaris 10u3 or later? If so, this problem may be due to

6583268 tmpfs tries too hard to reserve memory
<http://monaco.sfbay/detail.jsp?cr=6583268>.

This is already fixed in Nevada; I expect it will be backported to S10
in a patch.
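(cat /etc/release or uname -v will show which update/build a given
machine is running.)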

-Prakash.

adrian cockcroft wrote:
> How fast do disks turn? You get one page per revolution. Adding more 
> swap disks would only help if there was more than one thread trying to 
> read the data. Ultra 1 had a nice fast 7200rpm SCSI disk...
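>
> As a rough sanity check of that limit (assuming 8 KB pages and the
> one-page-per-revolution worst case): 7200 rpm is 120 revolutions per
> second, so a single spindle delivers on the order of 120 random
> page-ins per second, roughly 1 MB/s. Shorter seeks or a second swap
> device buy a small multiple of that, but nothing like the gains CPUs
> and memory have seen since the Ultra 1.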
>
> Adrian
>
> On 8/15/07, Peter C. Norton <[EMAIL PROTECTED]> wrote:
>
>     What is confusing me in this case is that the speeds seem
>     unreasonably slow. I would expect, for instance, the speed of swap
>     access to scale with the speed of the disk, cpu, etc.
>
>     In this case, the speed of swap looks like it has stayed at about
>     what it was on an Ultra 1 while everything else has shot ahead,
>     which makes me think there's a specific limit in there that could
>     be easily removed or tuned to make this less visible.
>
>     Thanks,
>
>     -Peter
>
>     On Wed, Aug 15, 2007 at 03:49:33PM -0700, adrian cockcroft wrote:
>     > I've seen this before and it's expected behavior; this is my
>     > explanation:
>     >
>     > Pages are written to swap using large sequential writes, in
>     > physical memory scanner order, so they are guaranteed to be
>     > randomly jumbled on disk. This means that swap is as fast as
>     > possible for writes and as slow as possible for reads, which are
>     > random seeks, one page at a time. It's been this way forever.
>     > Anything that swaps/pages out will be horribly slow on the way
>     > back in.
>     >
>     > Add enough RAM to never swap, or possibly mount a real disk or a
>     > solid state disk for /tmp.
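>     >
>     > For the /tmp option that's just an /etc/vfstab change: newfs a
>     > spare slice and replace the default "swap - /tmp tmpfs - yes -"
>     > line with a UFS mount of it, something like the following (the
>     > slice name here is only a placeholder):
>     >
>     >   /dev/dsk/c1t1d0s7  /dev/rdsk/c1t1d0s7  /tmp  ufs  2  yes  -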
>     >
>     > Adrian
>     >
>     > On 8/15/07, Peter C. Norton <[EMAIL PROTECTED]> wrote:
>     > >
>     > > On Wed, Aug 15, 2007 at 04:29:54PM -0400, Jim Mauro wrote:
>     > > >
>     > > > What would be interesting here is the paging statistics
>     > > > during your test case.
>     > > > What does "vmstat 1" and "vmstat -p 1" look like while you're
>     > > > generating this behavior?
>     > > >
>     > > > Is it really the case that reading/writing from swap is slow,
>     > > > or simply that the system on the whole is slow because it's
>     > > > dealing with a sustained memory deficit?
>     > >
>     > > It tends to look something like this:
>     > >
>     > > $ vmstat 1
>     > >
>     > > kthr      memory            page            disk          faults      cpu
>     > >  r b  w   swap    free   re  mf   pi po fr de sr cd  cd m1 m1   in   sy   cs us sy id
>     > >  0 0  0 19625708 3285120  1   4    1  1  1  0  6   1   1 11 11  455  260  247  4  0 96
>     > >  0 1 70 16001276  645628  2  27 3428  0  0  0  0 442 447  0  0 3012  516 1982 97  3  0
>     > >  0 1 70 16001276  642208  0   0 3489  0  0  0  0 437 432  0  0 3074  381 2002 97  3  0
>     > >  0 1 70 16001276  638964  0   0 3343  0  0  0  0 417 417  0  0 2997  350 1914 98  2  0
>     > >  0 1 70 16001276  635504  0   0 3442  0  0  0  0 430 434  0  0 3067  536 2016 97  3  0
>     > >  0 1 70 16001276  632076  0   0 3434  0  0  0  0 429 425  0  0 3164  885 2125 97  3  0
>     > >  0 1 70 16001276  628548  0   0 3549  0  0  0  0 445 445  0  0 3185  582 2105 97  3  0
>     > >  0 1 70 16001276  625104  0   0 3459  0  0  0  0 463 469  0  0 3376  594 2100 97  3  0
>     > >
>     > > $ vmstat -p 1
>     > >      memory           page          executable      anonymous      filesystem
>     > >    swap    free   re  mf  fr  de  sr  epi epo epf  api apo apf  fpi fpo fpf
>     > > 19625616 3285052   1   4   1   0   6    0   0   0    0   0   0    1   0   1
>     > > 16001244  440392  21  31   0   0   0    0   0   0    0   0   0 2911   0   0
>     > > 16001244  437120  21   0   0   0   0    0   0   0    0   0   0 3188   0   0
>     > > 16001244  433592  14   0   0   0   0    0   0   0    0   0   0 3588   0   0
>     > > 16001244  429732  28   0   0   0   0    0   0   0    0   0   0 3712   0   0
>     > > 16001244  426036  18   0   0   0   0    0   0   0    0   0   0 3679   0   0
>     > > 16001244  422448   2   0   0   0   0    0   0   0    0   0   0 3468   0   0
>     > > 16001244  418980   5   0   0   0   0    0   0   0    0   0   0 3435   0   0
>     > > 16001244  416012   8   0   0   0   0    0   0   0    0   0   0 2855   0   0
>     > > 16001244  412648   8   0   0   0   0    0   0   0    0   0   0 3256   0   0
>     > > 16001244  409292  31   0   0   0   0    0   0   0    0   0   0 3426   0   0
>     > > 16001244  405760  10   0   0   0   0    0   0   0    0   0   0 3602   0   0
>     > >
>     > > > Also, I'd like to understand better what you're looking to
>     > > > optimize for. In general, "tuning" for swap is a pointless
>     > > > exercise (and it's not my contention that that is what you're
>     > > > looking to do - I'm not actually sure), because the IO
>     > > > performance of the swap device is really a second order effect
>     > > > of having a memory working set size larger than physical RAM,
>     > > > which means the kernel spends a lot of time doing memory
>     > > > management things.
>     > >
>     > > I think we're trying to optimize for swap usage having as little
>     > > impact as possible.  With multiple large Java processes needing
>     > > to run in as little time as possible, and with business demands
>     > > that make it impossible to keep the overall RSS < real memory
>     > > 100% of the time, we want to minimize the impact of page-ins.
>     > >
>     > > > The poor behavior of swap may really be just a symptom of
>     > > > other activities related to memory management.
>     > >
>     > > Possibly.
>     > >
>     > > > What kind of machine is this, and what does CPU utilization
>     > > > look like when you're inducing this behavior?
>     > >
>     > > These are a variety of systems: IBM 360, Sun v20z, and x4100 (we
>     > > have M1s and M2s; I personally have only tested on M1 systems).
>     > > This behavior seems consistent on all of them.
>     > >
>     > > The program we're using to pin memory is this:
>     > >
>     > >
>     > >
>     > > #include <stdio.h>
>     > > #include <stdlib.h>
>     > > #include <unistd.h>
>     > >
>     > > int main(int argc, char** argv)
>     > > {
>     > >     if (argc != 2) {
>     > >         printf("Bad args\n");
>     > >         return 1;
>     > >     }
>     > >
>     > >     const int count = atoi(argv[1]);
>     > >     if (count <= 3) {
>     > >         printf("Bad count: %s\n", argv[1]);
>     > >         return 1;
>     > >     }
>     > >
>     > >     // Malloc
>     > >     const int nints = count >> 2;
>     > >     int* buf = (int*)malloc(count);
>     > >     if (buf == NULL) {
>     > >         perror("Failed to malloc");
>     > >         return 1;
>     > >     }
>     > >
>     > >     // Init
>     > >     for (int i=0; i < nints; i++) {
>     > >         buf[i] = rand();
>     > >     }
>     > >
>     > >     // Maintain working set
>     > >     for (;;) {
>     > >         for (int i=0; i < nints; i++) {
>     > >             buf[i]++;
>     > >         }
>     > >         //sleep(1);
>     > >     }
>     > >
>     > >     return 0;
>     > > }
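>     > >
>     > > (To reproduce: compile it - note the C99-style loop declarations,
>     > > so gcc wants -std=c99 - and run one or more copies with byte
>     > > counts that together exceed physical memory. The count argument
>     > > is an int, so each instance can pin at most about 2 GB.)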
>     > >
>     > > Nothing too complex. Reads and writes to /tmp and /var/tmp in
>     > > our tests were all done with dd.
>     > >
>     > > I am following up with Sun support on this, but in the meantime
>     > > I am curious whether you or anyone else out there sees the same
>     > > behavior?
>     > >
>     > > Thanks,
>     > >
>     > > -Peter
>     > >
>     > > --
>     > > The 5 year plan:
>     > > In five years we'll make up another plan.
>     > > Or just re-use this one.
>     > >
>
>     --
>     The 5 year plan:
>     In five years we'll make up another plan.
>     Or just re-use this one.
>
>
