Hi Steven,

Sounds very familiar. Painfully familiar :(

But I really don't know. All I can see is that in this particular
configuration the instance has 2 x Intel Xeon E5-2670 eight-core
processors. I can't find any info on whether it's flex or round robin. AWS
typically doesn't disclose the underlying hardware. The exception is the
chip type on the higher-end instance types, which is where I got the info
above.

Below is an excerpt from atop taken when the problem occurs. The CPUs jump
to high sys usage; not sure if that is similar to what you saw?

How did you get it resolved in the end?

ATOP - ip-10-155-231-112      2013/04/02  01:25:40      ------      2s elapsed
PRC | sys   70.15s | user   8.19s | #proc   1015 | #zombie    0 | clones     0 | #exit      2 |
CPU | sys    3182% | user     30% | irq       1% | idle      0% | wait      0% | steal     0% | guest     0% |
cpu | sys      98% | user      1% | irq       1% | idle      0% | cpu000 w  0% | steal     0% | guest     0% |
cpu | sys      96% | user      4% | irq       0% | idle      0% | cpu001 w  0% | steal     0% | guest     0% |
cpu | sys     100% | user      0% | irq       0% | idle      0% | cpu002 w  0% | steal     0% | guest     0% |
cpu | sys      99% | user      1% | irq       0% | idle      0% | cpu003 w  0% | steal     0% | guest     0% |
cpu | sys     100% | user      0% | irq       0% | idle      0% | cpu004 w  0% | steal     0% | guest     0% |
cpu | sys     100% | user      0% | irq       0% | idle      0% | cpu005 w  0% | steal     0% | guest     0% |
cpu | sys      98% | user      2% | irq       0% | idle      0% | cpu006 w  0% | steal     0% | guest     0% |
cpu | sys      99% | user      1% | irq       0% | idle      0% | cpu007 w  0% | steal     0% | guest     0% |
cpu | sys      99% | user      1% | irq       0% | idle      0% | cpu008 w  0% | steal     0% | guest     0% |
cpu | sys     100% | user      0% | irq       0% | idle      0% | cpu009 w  0% | steal     0% | guest     0% |
cpu | sys      99% | user      1% | irq       0% | idle      0% | cpu010 w  0% | steal     0% | guest     0% |
cpu | sys     100% | user      0% | irq       0% | idle      0% | cpu011 w  0% | steal     0% | guest     0% |
cpu | sys      99% | user      1% | irq       0% | idle      0% | cpu012 w  0% | steal     0% | guest     0% |
cpu | sys      97% | user      3% | irq       0% | idle      0% | cpu013 w  0% | steal     0% | guest     0% |
cpu | sys     100% | user      0% | irq       0% | idle      0% | cpu014 w  0% | steal     0% | guest     0% |
cpu | sys     100% | user      0% | irq       0% | idle      0% | cpu015 w  0% | steal     0% | guest     0% |
cpu | sys     100% | user      0% | irq       0% | idle      0% | cpu016 w  0% | steal     0% | guest     0% |
cpu | sys      82% | user     18% | irq       0% | idle      0% | cpu017 w  0% | steal     0% | guest     0% |
cpu | sys     100% | user      0% | irq       0% | idle      0% | cpu018 w  0% | steal     0% | guest     0% |
cpu | sys     100% | user      0% | irq       0% | idle      0% | cpu019 w  0% | steal     0% | guest     0% |
cpu | sys     100% | user      0% | irq       0% | idle      0% | cpu020 w  0% | steal     0% | guest     0% |
cpu | sys     100% | user      0% | irq       0% | idle      0% | cpu021 w  0% | steal     0% | guest     0% |
cpu | sys     100% | user      0% | irq       0% | idle      0% | cpu022 w  0% | steal     0% | guest     0% |
cpu | sys     100% | user      0% | irq       0% | idle      0% | cpu023 w  0% | steal     0% | guest     0% |
cpu | sys     100% | user      0% | irq       0% | idle      0% | cpu024 w  0% | steal     0% | guest     0% |
cpu | sys     100% | user      0% | irq       0% | idle      0% | cpu025 w  0% | steal     0% | guest     0% |
cpu | sys     100% | user      0% | irq       0% | idle      0% | cpu026 w  0% | steal     0% | guest     0% |
cpu | sys     100% | user      0% | irq       0% | idle      0% | cpu027 w  0% | steal     0% | guest     0% |
cpu | sys      99% | user      1% | irq       0% | idle      0% | cpu028 w  0% | steal     0% | guest     0% |
cpu | sys     100% | user      0% | irq       0% | idle      0% | cpu029 w  0% | steal     0% | guest     0% |
cpu | sys     100% | user      0% | irq       0% | idle      0% | cpu030 w  0% | steal     0% | guest     0% |
cpu | sys      99% | user      1% | irq       0% | idle      0% | cpu031 w  0% | steal     0% | guest     0% |
CPL | avg1   90.60 | avg5   60.80 | avg15  39.77 | csw     1011 | intr   17568 | numcpu    32 |
MEM | tot    58.5G | free  418.4M | cache  45.0G | dirty   0.6M | buff    5.8M | slab  501.2M |
SWP | tot     0.0M | free    0.0M | vmcom  49.8G | vmlim  29.3G |
PAG | scan    1858 | stall      0 | swin       0 | swout      0 |
NET | transport    | tcpi     318 | tcpo     392 | udpi      34 | udpo      39 | tcpao      0 | tcppo      2 | tcprs      0 | tcpie      0 | tcpor      0 | udpnp      0 | udpip      0 |
NET | network      | ipi      357 | ipo      397 | ipfrw      0 | deliv    357 | icmpi      0 | icmpo      0 |
NET | eth0    ---- | pcki     318 | pcko     358 | si  200 Kbps | so  947 Kbps | coll       0 | mlti       0 | erri       0 | erro       0 | drpi       0 | drpo       0 |
NET | lo      ---- | pcki      39 | pcko      39 | si   79 Kbps | so   79 Kbps | coll       0 | mlti       0 | erri       0 | erro       0 | drpi       0 | drpo       0 |
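
In case you want to compare against a box without atop installed, here is a
minimal sketch of how the same per-CPU sys percentage can be derived from two
samples of /proc/stat. It is only an illustration, not how atop itself is
implemented; the function names are made up for this example, and it assumes
the standard Linux column order (user, nice, system, idle, iowait, irq,
softirq, steal).

```python
import time


def parse_proc_stat(text):
    """Map each per-CPU label (cpu0, cpu1, ...) to its list of tick counters."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-CPU lines start with "cpuN"; skip the aggregate "cpu" line
        # and unrelated lines such as "intr" or "ctxt".
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            counters[fields[0]] = [int(value) for value in fields[1:]]
    return counters


def sys_percent(before, after):
    """Return {cpu label: percent of the interval spent in system mode}."""
    start_counters = parse_proc_stat(before)
    end_counters = parse_proc_stat(after)
    result = {}
    for name, end in end_counters.items():
        start = start_counters.get(name)
        if start is None:
            continue
        # Only the first eight columns are summed (user .. steal);
        # guest time is already accounted for inside user time.
        delta = [e - s for s, e in zip(start[:8], end[:8])]
        total = sum(delta)
        # Column index 2 is "system".
        result[name] = 100.0 * delta[2] / total if total else 0.0
    return result


def sample_sys_percent(interval=2.0, path="/proc/stat"):
    """Take two samples `interval` seconds apart (Linux only)."""
    with open(path) as handle:
        first = handle.read()
    time.sleep(interval)
    with open(path) as handle:
        second = handle.read()
    return sys_percent(first, second)
```

Calling sample_sys_percent() while the stall is happening should show most
CPUs close to 100 if it is the same pattern as in the atop excerpt.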


On Tue, Apr 2, 2013 at 3:07 AM, Steven Crandell
<steven.crand...@gmail.com> wrote:

> Armand,
>
> All of the symptoms you describe line up perfectly with a problem I had
> recently when upgrading DB hardware.
> Everything ran fine until we hit some threshold somewhere at which point
> the locks would pile up in the thousands just as you describe, all while we
> were not I/O bound.
>
> I was moving from a DELL 810 that used a flex memory bridge to a DELL 820
> that used round robin on their quad core intels.
> (Interestingly we also found out that DELL is planning on rolling back to
> the flex memory bridge later this year.)
>
> Any chance you could find out if your old processors might have been using
> flex while your new processors might be using round robin?
>
> -s
>
>
>
