Re: [zfs-discuss] Maximum zfs send/receive throughput

2010-11-13 Thread Karsten Weiss
> Does this ring a bell with anyone?

Update: The cause of the problem was

OpenSolaris bug 6826836, "Deadlock possible in dmu_object_reclaim()"
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6826836

We were able to fix it by upgrading the OpenSolaris 2009.06 system to
0.5.11-0.111.17 (via the non-free official support repository).
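
For reference, the upgrade went roughly like this (a sketch, assuming a valid
support certificate; the key/cert paths are placeholders and the repository
URL is from memory):

# register the official support repository (key/cert paths are placeholders)
pkg set-publisher -k /path/to/key.pem -c /path/to/cert.pem \
    -O https://pkg.sun.com/opensolaris/support/ opensolaris.org
# update the whole image; this clones the boot environment, reboot into it
pkg image-update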


Re: [zfs-discuss] Maximum zfs send/receive throughput

2010-11-10 Thread Karsten Weiss
> I'm not very familiar with mdb. I've tried this:

Ah, this looks much better:

root       641  0.0  0.0  7660 2624 ?        S    Nov 08  2:16 /sbin/zfs receive -dF datapool/share/ (...)

# echo "0t641::pid2proc | ::walk thread | ::findstack -v" | mdb -k
stack pointer for thread ff09236198e0: ff003d9b5670
[ ff003d9b5670 _resume_from_idle+0xf1() ]
  ff003d9b56a0 swtch+0x147()
  ff003d9b56d0 cv_wait+0x61(ff0a4fbd4228, ff0a4fbd40e8)
  ff003d9b5710 dmu_tx_wait+0x80(ff0948aa4600)
  ff003d9b5750 dmu_tx_assign+0x4b(ff0948aa4600, 1)
  ff003d9b57e0 dmu_free_long_range_impl+0x12a(ff0911456d60, ff0a4fbd4028, 0, , 0)
  ff003d9b5840 dmu_free_long_range+0x5b(ff0911456d60, 53e34, 0, )
  ff003d9b58d0 dmu_object_reclaim+0x112(ff0911456d60, 53e34, 13, 1e00, 11, 108)
  ff003d9b5930 restore_object+0xff(ff003d9b5950, ff0911456d60, ff003d9b59c0)
  ff003d9b5a90 dmu_recv_stream+0x48d(ff003d9b5be0, ff094d089440, ff003d9b5ad8)
  ff003d9b5c40 zfs_ioc_recv+0x2c0(ff092492b000)
  ff003d9b5cc0 zfsdev_ioctl+0x10b(b6, 5a1c, 8044e50, 13, ff0948b60e50, ff003d9b5de4)
  ff003d9b5d00 cdev_ioctl+0x45(b6, 5a1c, 8044e50, 13, ff0948b60e50, ff003d9b5de4)
  ff003d9b5d40 spec_ioctl+0x83(ff0921e54640, 5a1c, 8044e50, 13, ff0948b60e50, ff003d9b5de4, 0)
  ff003d9b5dc0 fop_ioctl+0x7b(ff0921e54640, 5a1c, 8044e50, 13, ff0948b60e50, ff003d9b5de4, 0)
  ff003d9b5ec0 ioctl+0x18e(3, 5a1c, 8044e50)
  ff003d9b5f10 sys_syscall32+0x101()

Does this ring a bell with anyone?


Re: [zfs-discuss] Maximum zfs send/receive throughput

2010-11-08 Thread Karsten Weiss
Does anyone know the current state of bug #6975124? Has there been any progress 
since August?

I currently have an OpenSolaris 2009.06 snv_111b system (entire 
0.5.11-0.111.14) which *repeatedly* gets stuck after a couple of minutes during 
a large (xxx GB) incremental zfs receive operation. The process does not crash;
it simply keeps sleeping and makes no progress at all.

   PID USERNAME  NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
   641 root         1  60    0 7660K 2624K sleep    2:16  0.00% zfs

Neither truss nor mdb can show *any* activity or status of the zfs
receive process:

# truss -p 641
*hangs*

I'm not very familiar with mdb. I've tried this:

# mdb -p 641
mdb: failed to initialize //lib/libc_db.so.1: libthread_db call failed unexpectedly
mdb: warning: debugger will only be able to examine raw LWPs
Loading modules: [ ld.so.1 libumem.so.1 libavl.so.1 libnvpair.so.1 ]
> ::stack
> ::stackregs
> ::status
debugging PID 641 (32-bit)
file: /sbin/zfs
threading model: raw lwps
status: process is running, debugger stop directive pending
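
Two commands that can still produce a stack for a process stuck in the kernel
(a sketch, using the PID from above; the mdb -k pipeline is the one that
eventually worked in this thread):

# userland stack of the process; may hang just like truss if it never leaves the kernel
pstack 641
# kernel-side stacks of all threads of the process, via the kernel debugger
echo "0t641::pid2proc | ::walk thread | ::findstack -v" | mdb -k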

I'm wondering if #6975124 could be the cause of my problem, too.


Re: [zfs-discuss] Maximum zfs send/receive throughput

2010-11-08 Thread Markus Kovero

> I'm wondering if #6975124 could be the cause of my problem, too.

There are several zfs send (and receive) related issues in 111b. You might
seriously want to consider upgrading to a more recent OpenSolaris build (134)
or to OpenIndiana.

Yours
Markus Kovero


Re: [zfs-discuss] Maximum zfs send/receive throughput

2010-08-06 Thread Jim Barker
I have been looking at why a zfs receive operation is terribly slow, and one
observation that seems directly linked to the slowness is that at any one time
one of the CPUs is pegged at 100% sys while the others (five, in my case) are
relatively quiet. I haven't dug any deeper than that, but I was curious whether
anyone else has observed the same behavior.
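
A quick way to confirm this kind of imbalance (a sketch, not from the
original post):

# per-CPU usr/sys breakdown every 5 seconds; look for one CPU near 100% sys
mpstat 5
# per-LWP microstates; the SYS column shows which thread burns kernel time
prstat -mL 5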


Re: [zfs-discuss] Maximum zfs send/receive throughput

2010-08-06 Thread Jim Barker
Just an update: I had a ticket open with Sun regarding this, and it looks like
they have a CR for what I was seeing (6975124).


Re: [zfs-discuss] Maximum zfs send/receive throughput

2010-08-06 Thread Andrew Gabriel

Jim Barker wrote:
> Just an update: I had a ticket open with Sun regarding this, and it looks
> like they have a CR for what I was seeing (6975124).

That would seem to describe a zfs receive which has stopped for 12 hours.
You described yours as slow, which is not the term I would personally use
for one that has stopped.
However, you haven't given nearly enough detail about your situation and
what's happening for me to make any worthwhile guesses.


--
Andrew Gabriel


Re: [zfs-discuss] Maximum zfs send/receive throughput

2010-08-06 Thread Jim Barker
Andrew,

Correct. The reason I initially opened the case was that I could essentially
hang a zfs receive operation, after which any further zfs commands issued on
the box would never come back. Just today one of my slow receives came to a
screeching halt: where I previously saw one CPU spiking all the time, it is
now exhibiting the same behavior as the hang (absolutely no activity, quiet
as a mouse). I guess I didn't wait long enough for the slow process to
finally hang; it is hung now and will stay that way until the end of time.
I thought I had found a way to get around the freeze, but apparently I just
delayed it a little longer. I provided Oracle with explorer output and a
crash dump to analyze, and that is the data they used to produce the
information I passed on.

Jim Barker


Re: [zfs-discuss] Maximum zfs send/receive throughput

2010-06-26 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org
> [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Thomas Maier-Komor
>
> You can probably improve overall performance by using mbuffer [1] to
> stream the data over the network. At least some people have reported
> increased performance. mbuffer will buffer the data stream and decouple
> zfs send operations from network latencies.
>
> Get it here:
> original source: http://www.maier-komor.de/mbuffer.html
> binary package:  http://www.opencsw.org/packages/CSWmbuffer/

mbuffer is also available from opencsw / blastwave. IMHO, installing the
binary package is easier and faster and better than building from source,
most of the time.



[zfs-discuss] Maximum zfs send/receive throughput

2010-06-25 Thread Mika Borner


It seems we are hitting a boundary with zfs send/receive over a network
link (10 Gb/s). We see peak values of up to 150 MB/s, but on average only
about 40-50 MB/s are replicated. This is far below the bandwidth that a
10 Gb link can offer.

Is it possible that ZFS gives replication too low a priority, or throttles
it too much?
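
One way to separate the link from ZFS is to push raw data over the same path
and compare (a sketch; the host name 'receiver' and the port are placeholders,
and nc may have to be installed first):

# on the receiving host: listen and discard
nc -l 9090 > /dev/null
# on the sending host: stream ~10 GB of zeros across the link
dd if=/dev/zero bs=1024k count=10240 | nc receiver 9090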





Re: [zfs-discuss] Maximum zfs send/receive throughput

2010-06-25 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org
> [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Mika Borner
>
> It seems we are hitting a boundary with zfs send/receive over a network
> link (10 Gb/s). We see peak values of up to 150 MB/s, but on average only
> about 40-50 MB/s are replicated. This is far below the bandwidth that a
> 10 Gb link can offer.
>
> Is it possible that ZFS gives replication too low a priority, or throttles
> it too much?

I don't think this is properly called replication, so be careful about
terminology.

zfs send can go as fast as your hardware is able to read.  If you'd like to
know how fast your hardware is, try this:
zfs send somefilesystem@somesnapshot | pv -i 30 > /dev/null
(You might want to install pv from opencsw or blastwave.)

I think, in your case, you'll see something around 40-50 MB/s.

I will also add this much: if you send the initial snapshot of your complete
filesystem, it'll probably go very fast (much faster than 40-50 MB/s),
because all those blocks are essentially sequential on disk. When you're
sending incrementals, the changed blocks are essentially more fragmented, so
the total throughput is lower: the disks have to perform a greater percentage
of random I/O.
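
Both cases can be measured the same way (a sketch; the snapshot names are
placeholders):

# full stream: mostly sequential reads
zfs send somefilesystem@snap2 | pv -i 30 > /dev/null
# incremental stream: a higher share of random reads, usually noticeably slower
zfs send -i somefilesystem@snap1 somefilesystem@snap2 | pv -i 30 > /dev/null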

I have a very fast server, and my zfs send is about half as fast as yours.

In both cases, it's enormously faster than some other backup tool, like tar
or rsync or whatever.



Re: [zfs-discuss] Maximum zfs send/receive throughput

2010-06-25 Thread Thomas Maier-Komor
On 25.06.2010 14:32, Mika Borner wrote:
> It seems we are hitting a boundary with zfs send/receive over a network
> link (10 Gb/s). We see peak values of up to 150 MB/s, but on average only
> about 40-50 MB/s are replicated. This is far below the bandwidth that a
> 10 Gb link can offer.
>
> Is it possible that ZFS gives replication too low a priority, or throttles
> it too much?
 
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

You can probably improve overall performance by using mbuffer [1] to
stream the data over the network. At least some people have reported
increased performance. mbuffer will buffer the data stream and decouple
zfs send operations from network latencies.

Get it here:
original source: http://www.maier-komor.de/mbuffer.html
binary package:  http://www.opencsw.org/packages/CSWmbuffer/
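
A typical pair of invocations looks something like this (a sketch; host name,
port, buffer sizes, and dataset names are placeholders):

# on the receiving host: listen on a port, buffer, and feed zfs receive
mbuffer -s 128k -m 1G -I 9090 | zfs receive -dF datapool/share
# on the sending host: read from zfs send and stream to the receiver
zfs send somefilesystem@snap | mbuffer -s 128k -m 1G -O receiver:9090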

- Thomas