On 03/02/2010 09:25, Arnaud Brand wrote:
On 03/02/2010 04:44, Brent Jones wrote:
On Tue, Feb 2, 2010 at 7:41 PM, Brent Jones <br...@servuhome.net> wrote:
On Tue, Feb 2, 2010 at 12:05 PM, Arnaud Brand <t...@tib.cc> wrote:
Hi folks,

I'm having (as the title suggests) a problem with zfs send/receive.
Command line is like this:
pfexec zfs send -Rp tank/t...@snapshot | ssh remotehost pfexec zfs recv -v -F -d tank

This works like a charm as long as the snapshot is small enough.

When it gets too big (meaning somewhere between 17G and 900G), I get ssh
errors (can't read from remote host).

I tried various encryption options (the fastest in my case being arcfour) with no better results.
I tried to set up a script that inserts dd on the sending and receiving sides to buffer the flow, but still got read errors.
I tried with mbuffer (which gives better performance), but it didn't get better.
Today I tried with netcat (and mbuffer) and got better throughput, but it failed at 269GB transferred.
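
For reference, the netcat/mbuffer pipelines were roughly of this form (options and port number quoted from memory, so treat this as a sketch rather than the exact command lines):

# receiving side: listen on an arbitrary port, buffer, and feed zfs recv
/usr/bin/nc -l -p 8023 | /usr/local/bin/mbuffer -s1024k -m512M | pfexec zfs recv -v -F -d tank

# sending side: buffer the send stream and push it to the receiver
pfexec zfs send -Rp tank/t...@snapshot | /usr/local/bin/mbuffer -s1024k -m512M | /usr/bin/nc remotehost 8023

The encryption tests were simply the original ssh pipeline run with different ciphers (ssh -c arcfour and so on).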

The two machines are connected to the switch with 2x1GbE (Intel) links aggregated with LACP.
The switch logs show no errors on the ports.
kstat -p | grep e1000g shows one recv error on the sending side.

I can't find anything in the logs which could give me a clue about what's
happening.

I'm running build 131.

If anyone has the slightest clue about where I could look or what I could do to pinpoint/solve the problem, I'd be very grateful if (s)he could share it with me.

Thanks and have a nice evening.

Arnaud




This issue seems to have started after snv_129 for me. I get "connection
reset by peer", or transfers (of any kind) simply time out.

Smaller transfers succeed most of the time, while larger ones usually
fail. Rolling back to snv_127 (my previous build) does not exhibit this
issue. I have not had time to narrow down any causes, but I did find
one bug report noting that some TCP test scenarios failed during one of
the builds; I am unable to find that CR at this time.

--
Brent Jones
br...@servuhome.net

Ah, I found the CR that seemed to describe the situation (broken
pipe/connection reset by peer)

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6905510
This CR is marked as fixed in b131 and seems to relate only to loopback, or am I missing something?

The transfer I started last night finished with no errors:
/usr/bin/nc -l -p 8023 | /usr/local/bin/mbuffer -s1024k -m512M -P40 | dd of=/tank/repl.zfs bs=128k
summary: 1992 GByte in 7 h 10 min 78.9 MB/s, 8472x empty
2139334374612 bytes (2,1 TB) copied, 25860,5 s, 82,7 MB/s

So this seems to be somehow linked to high CPU load.
I'll change the network cables and, as Richard suggested, remove LACP.
Then I'll launch another transfer while, at the same time, zfs receiving the file I transferred last night.
If that transfer fails, I guess the problem is related to e1000g under load rather than to zfs, so a better place to post would be opensolaris-discuss.
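
For the replay of the saved stream I plan to use something along these lines (assuming the file plays back cleanly through zfs recv):

dd if=/tank/repl.zfs bs=128k | pfexec zfs recv -v -F -d tank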

Thanks for your help,
Arnaud

Seems to be network-related: the transfer failed after 129GB even without LACP and with different network cables.
I'll post to networking-discuss.

Arnaud
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
