[zfs-discuss] zfs receive slowness - lots of systime spent in genunix`list_next ?

2011-12-05 Thread Lachlan Mulcahy
Hi All,

We are currently doing a zfs send/recv with mbuffer to send incremental
changes across, and it seems to be running quite slowly, with zfs receive
as the apparent bottleneck.

The process itself seems to be using almost 100% of a single CPU in sys
time.

Wondering if anyone has any idea whether this is normal or if it is just
going to run forever and never finish...


Details: two machines connected via Gigabit Ethernet on the same LAN.

Sending server:

zfs send -i 20111201_1 data@20111205_1 | mbuffer -s 128k -m 1G -O
tdp03r-int:9090

Receiving server:

mbuffer -s 128k -m 1G -I 9090 | zfs receive -vF tank/db/data

mbuffer showing:

in @  256 KiB/s, out @  256 KiB/s,  306 GiB total, buffer 100% full
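
Since the buffer on the receiving side sits at 100% full, the network is
keeping up and zfs receive looks like the slow consumer. If I wanted to rule
the network out entirely, a rough check would be to push raw data through
mbuffer alone between the same two hosts, with no zfs in the pipeline (same
port as above):

Receiving server (discard whatever arrives):

mbuffer -s 128k -m 1G -I 9090 > /dev/null

Sending server (feed zeros instead of a zfs stream):

mbuffer -s 128k -m 1G -O tdp03r-int:9090 < /dev/zero

If that runs at close to wire speed, the bottleneck is clearly on the
receive side.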



My debug:

DTraceToolkit hotkernel reports:

zfs`lzjb_decompress                            10   0.0%
unix`page_nextn                                31   0.0%
genunix`fsflush_do_pages                       37   0.0%
zfs`dbuf_free_range                           183   0.1%
genunix`list_next                            5822   3.7%
unix`mach_cpu_idle                         150261  96.1%


Top shows:

   PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
 22945 root        1  60    0   13M 3004K cpu/6  144:21  3.79% zfs
   550 root       28  59    0   39M   22M sleep  10:19  0.06% fmd

I'd say the 3.7% or so here looks low because hotkernel reports aggregate
CPU usage rather than per-CPU usage. mpstat seems to show the real story.

mpstat 1 shows output much like this each second:

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  00   00   329  108   830   1730 00   0   0 100
  10   00   1001   940   2310 00   0   0 100
  20   00320   280510 00   0   0 100
  30   00180   110000 00   0   0 100
  40   00166   100200 00   0   0 100
  50   00 6020000 00   0   0 100
  60   00 2000000 00   0   0 100
  70   00 9040000160   0   0 100
  80   00 6030000 00   3   0  97
  90   00 3100000 00   0   0 100
 100   00222   350110 00  89   0  11
 110   00 2000000 00   0   0 100
 120   00 3020100 20   0   0 100
 130   00 2000000 00   0   0 100
 140   0024   1760020610   0   0 100
 150   00140   240010 20   0   0 100
 160   00 2000000 00   0   0 100
 170   0010280050780   1   0  99
 180   00 2000000 00   0   0 100
 190   00 5120000100   0   0 100
 200   00 2000000 00   0   0 100
 210   00 9240000 40   0   0 100
 220   00 4000000 00   0   0 100
 230   00 2000000 00   0   0 100


So I'm led to believe that zfs receive is spending almost 100% of a single
CPU's time doing a lot of genunix`list_next ...
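
A per-LWP microstate view should back this up; using the zfs pid from top
above:

prstat -mL -p 22945 5

I'd expect the SYS column for its single LWP to sit near 100 while USR stays
near 0.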

Any ideas what is going on here?

Best Regards,
-- 
Lachlan Mulcahy
Senior DBA,
Marin Software Inc.
San Francisco, USA

AU Mobile: +61 458 448 721
US Mobile: +1 (415) 867 2839
Office : +1 (415) 671 6080
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs receive slowness - lots of systime spent in genunix`list_next ?

2011-12-05 Thread Bob Friesenhahn

On Mon, 5 Dec 2011, Lachlan Mulcahy wrote:

genunix`list_next    5822   3.7%
unix`mach_cpu_idle 150261  96.1%


Rather idle.


Top shows:

   PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
 22945 root    1  60    0   13M 3004K cpu/6  144:21  3.79% zfs
   550 root   28  59    0   39M   22M sleep   10:19  0.06% fmd


Having 'fmd' (fault manager daemon) show up in top at all is rather
ominous since it implies that faults are actively being reported.  You
might want to look into what is making it active.
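
A quick way to see whether fmd has actually diagnosed anything, as opposed
to just chewing on incoming telemetry, is something like:

  fmadm faulty
  fmstat

fmadm faulty lists any faults fmd has diagnosed, and fmstat shows per-module
event counts inside fmd.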



So I'm led to believe that zfs receive is spending almost 100% of a single
CPU's time doing a lot of
genunix`list_next ...


Or maybe it is only doing 3.7% of this.  There seems to be a whole lot 
of nothing going on.



Any ideas what is going on here?


Definitely check if low-level faults are being reported to fmd.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs receive slowness - lots of systime spent in genunix`list_next ?

2011-12-05 Thread Bill Sommerfeld
On 12/05/11 10:47, Lachlan Mulcahy wrote:
 zfs`lzjb_decompress                            10   0.0%
 unix`page_nextn                                31   0.0%
 genunix`fsflush_do_pages                       37   0.0%
 zfs`dbuf_free_range                           183   0.1%
 genunix`list_next                            5822   3.7%
 unix`mach_cpu_idle                         150261  96.1%

your best bet in a situation like this -- where there's a lot of cpu time
spent in a generic routine -- is to use an alternate profiling method that
shows complete stack traces rather than just the top function on the stack.

often the names of functions two or three or four deep in the stack will point
at what's really responsible.

something as simple as:

dtrace -n 'profile-1001 { @[stack()] = count(); }'

(let it run for a bit then interrupt it).

should show who's calling list_next() so much.
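
If you'd rather bound it in time and only keep the hottest stacks, a variant
along these lines works too (the 30s window and top-20 cutoff are arbitrary):

dtrace -n 'profile-1001 /arg0/ { @[stack()] = count(); }
           tick-30s { trunc(@, 20); exit(0); }'

The /arg0/ predicate restricts the samples to time spent in the kernel.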

- Bill
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs receive slowness - lots of systime spent in genunix`list_next ?

2011-12-05 Thread Lachlan Mulcahy
Hi Bob,

On Mon, Dec 5, 2011 at 11:19 AM, Bob Friesenhahn 
bfrie...@simple.dallas.tx.us wrote:

 On Mon, 5 Dec 2011, Lachlan Mulcahy wrote:

 genunix`list_next                            5822   3.7%
 unix`mach_cpu_idle                         150261  96.1%


 Rather idle.


Actually this is a bit misleading, since the figure is averaged over all the
cores in the system.

This system has 24 cores, so roughly 90% system time on a single core works
out to about 3.75% of the whole machine:

90 / 24 = 3.75

Similarly, 23 of the 24 cores are close to 100% idle.

 Top shows:

    PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
  22945 root        1  60    0   13M 3004K cpu/6  144:21  3.79% zfs
    550 root       28  59    0   39M   22M sleep  10:19  0.06% fmd


Having 'fmd' (fault manager daemon) show up in top at all is rather ominous
 since it implies that faults are actively being reported.  You might want
 to look into what is making it active.


Nothing in /var/adm/messages; these are the only entries from the last two days:

Dec  4 18:46:20 mslvstdp03r sshd[20926]: [ID 800047 auth.crit] fatal: Read
from socket failed: Connection reset by peer
Dec  5 01:40:07 mslvstdp03r sshd[21808]: [ID 800047 auth.crit] fatal: Read
from socket failed: Connection reset by peer


Anything else you suggest I check for faults? (Though I'm sort of doubting
it is an issue, I'm happy to be thorough.)



 So I'm led to believe that zfs receive is spending almost 100% of a
 single CPU's time doing a lot of
 genunix`list_next ...


Or maybe it is only doing 3.7% of this.  There seems to be a whole lot of
 nothing going on.


See the math above -- also, mpstat shows a single CPU at around 90% system
time, and since top reports zfs as the only active process, the
circumstantial evidence is fairly indicative.


 Any ideas what is going on here?


Definitely check if low-level faults are being reported to fmd.


Are there any logs other than /var/adm/messages I should check? (Apologies,
I'm quite new to (Open)Solaris.)

Regards,
-- 
Lachlan Mulcahy
Senior DBA,
Marin Software Inc.
San Francisco, USA

AU Mobile: +61 458 448 721
US Mobile: +1 (415) 867 2839
Office : +1 (415) 671 6080
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs receive slowness - lots of systime spent in genunix`list_next ?

2011-12-05 Thread Bob Friesenhahn

On Mon, 5 Dec 2011, Lachlan Mulcahy wrote:


Anything else you suggest I check for faults? (Though I'm sort of doubting
it is an issue, I'm happy to be thorough.)


Try running

  fmdump -ef

and see if new low-level fault events are coming in during the zfs
receive.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs receive slowness - lots of systime spent in genunix`list_next ?

2011-12-05 Thread Lachlan Mulcahy
Hi Bob,


On Mon, Dec 5, 2011 at 12:31 PM, Bob Friesenhahn 
bfrie...@simple.dallas.tx.us wrote:

 On Mon, 5 Dec 2011, Lachlan Mulcahy wrote:


 Anything else you suggest I check for faults? (Though I'm sort of doubting
 it is an issue, I'm happy to be thorough.)


 Try running

  fmdump -ef

 and see if new low-level fault events are coming in during the zfs
 receive.


Just a bunch of what I would guess is unrelated USB errors?

Dec 05 20:58:09.0246 ereport.io.usb.epse
Dec 05 20:58:50.9207 ereport.io.usb.epse
Dec 05 20:59:36.0242 ereport.io.usb.epse
Dec 05 20:59:39.0230 ereport.io.usb.epse
Dec 05 21:00:21.0223 ereport.io.usb.epse
Dec 05 21:01:06.0215 ereport.io.usb.epse
Dec 05 21:01:09.0314 ereport.io.usb.epse
Dec 05 21:01:50.9213 ereport.io.usb.epse
Dec 05 21:02:36.0299 ereport.io.usb.epse
Dec 05 21:02:39.0298 ereport.io.usb.epse
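
I'll pull the full detail for that class to see which device is generating
them. Assuming fmdump's -c class filter does what I think, something like:

  fmdump -eV -c ereport.io.usb.epse | head -50

should show the device path in the event payload.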

Regards,
-- 
Lachlan Mulcahy
Senior DBA,
Marin Software Inc.
San Francisco, USA

AU Mobile: +61 458 448 721
US Mobile: +1 (415) 867 2839
Office : +1 (415) 671 6080
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs receive slowness - lots of systime spent in genunix`list_next ?

2011-12-05 Thread Lachlan Mulcahy
Hi All,

Just a follow-up: it seems that whatever it was doing, it eventually got
done and the speed picked back up again. The send/recv finally finished --
I guess I could do with a little patience :)

Lachlan

On Mon, Dec 5, 2011 at 10:47 AM, Lachlan Mulcahy lmulc...@marinsoftware.com
 wrote:

 Hi All,

 We are currently doing a zfs send/recv with mbuffer to send incremental
 changes across, and it seems to be running quite slowly, with zfs receive
 as the apparent bottleneck.

 The process itself seems to be using almost 100% of a single CPU in sys
 time.

 Wondering if anyone has any idea whether this is normal or if it is just
 going to run forever and never finish...





-- 
Lachlan Mulcahy
Senior DBA,
Marin Software Inc.
San Francisco, USA

AU Mobile: +61 458 448 721
US Mobile: +1 (415) 867 2839
Office : +1 (415) 671 6080
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss