Re: [zfs-discuss] Puzzling problem with zfs receive exit status
On 03/29/12 10:46 PM, Borja Marcos wrote:
> Hello,
>
> I hope someone has an idea. I have a replication program that copies a
> dataset from one server to another. The replication mechanism is the
> obvious one, of course:
>
>   zfs send -Ri snapshot(n-1) snapshot(n) > file
>   scp file to the remote machine
>
> (I do it this way instead of using a pipeline so that a network error
> won't interrupt a receive data stream.) And on the remote machine:
>
>   zfs receive -Fd pool
>
> It's been working perfectly for months, no issues. However, yesterday
> we began to see something weird: the zfs receive being executed on the
> remote machine is exiting with a status of 1, even though the
> replication finishes and I see the copied snapshots on the remote
> machine.
>
> Any ideas? It's really puzzling. The replication seems to work (zfs
> list -t snapshot shows the new snapshots correctly applied to the
> dataset), but I'm afraid there may be some kind of corruption.

Does zfs receive produce any warnings? Have you tried adding -v?

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
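The send-to-file scheme Borja describes could be sketched roughly as below. All names are made up for illustration (pool "tank", dataset "tank/data", remote host "backuphost", "snap-NNNN" numbering), and DRY_RUN=1 (the default here) only prints the commands, since no real pool is assumed:

```shell
#!/bin/sh
# Sketch of the replication flow from the thread: dump the incremental
# stream to a file, copy it, then replay it remotely. Hypothetical names
# throughout; DRY_RUN=1 echoes commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"
PREV="tank/data@snap-0041"            # snapshot(n-1), hypothetical
CURR="tank/data@snap-0042"            # snapshot(n), hypothetical
STREAM="/var/tmp/snap-0042.zstream"   # local stream file
REMOTE="backuphost"                   # hypothetical remote machine

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $1"          # dry run: show what would be executed
    else
        sh -c "$1"           # real run: execute the command line
    fi
}

# Write the incremental stream to a local file first, so a network error
# cannot interrupt a live zfs receive.
run "zfs send -Ri $PREV $CURR > $STREAM"
# Copy the finished stream, then replay it on the remote machine.
run "scp $STREAM $REMOTE:$STREAM"
run "ssh $REMOTE 'zfs receive -Fd pool < $STREAM'"
```

The point of the intermediate file is that the send and the receive each see only a local, complete artifact; a dropped connection fails the scp, not a half-applied receive.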
Re: [zfs-discuss] Puzzling problem with zfs receive exit status
On Mar 29, 2012, at 11:59 AM, Ian Collins wrote:
> Does zfs receive produce any warnings? Have you tried adding -v?

Thank you very much, Ian and Carsten.

Well, adding -v gave me a clue. It turns out that one of the old
snapshots had a clone. zfs receive -v was complaining that it couldn't
destroy an old snapshot, which wasn't visible but had been cloned (and
forgotten) long ago. A truss of the zfs receive process showed it
accessing the clone.

So zfs receive was doing its job, and the new snapshot was applied
correctly, but it was exiting with a value of 1 without printing any
warnings, which I think is wrong.

I've destroyed the clone and everything has gone back to normal. Now
zfs receive exits with 0. I'm still not sure whether this could be a
bug; the snapshot was cloned in November 2011 and had been sitting
around for a long time. The pool had less than 20% free space two days
ago, so maybe that triggered something. Anyway, as I said, with the
clone removed everything has gone back to normal.

Thank you very much,

Borja.
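Forgotten clones like this one can be hunted down before they bite: every clone carries an `origin` property naming the snapshot it was created from, so filtering `zfs list -H -o name,origin` for rows whose origin is not "-" lists them. A minimal sketch, with made-up dataset names and a canned sample standing in for the live command output:

```shell
#!/bin/sh
# Hypothetical sample of `zfs list -H -o name,origin -t filesystem,volume`
# output (tab-separated). In real use, pipe the live command instead:
#   zfs list -H -o name,origin -t filesystem,volume | awk -F'\t' '$2 != "-"'
sample=$(printf 'pool/data\t-\npool/old-clone\tpool/data@snap-2011-11\npool/other\t-\n')

# Clones are exactly the rows whose origin column is not "-".
clones=$(printf '%s\n' "$sample" | awk -F'\t' '$2 != "-" {print $1 " <- " $2}')
echo "$clones"
```

Run against the sample above, this prints `pool/old-clone <- pool/data@snap-2011-11`, i.e. the clone and the snapshot that is pinned by it.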
Re: [zfs-discuss] Puzzling problem with zfs receive exit status
On Mar 29, 2012, at 4:33 AM, Borja Marcos wrote:
> On Mar 29, 2012, at 11:59 AM, Ian Collins wrote:
>> Does zfs receive produce any warnings? Have you tried adding -v?
>
> Thank you very much, Ian and Carsten.
>
> Well, adding -v gave me a clue. It turns out that one of the old
> snapshots had a clone. zfs receive -v was complaining that it couldn't
> destroy an old snapshot, which wasn't visible but had been cloned (and
> forgotten) long ago. A truss of the zfs receive process showed it
> accessing the clone.
>
> So zfs receive was doing its job, and the new snapshot was applied
> correctly, but it was exiting with a value of 1 without printing any
> warnings, which I think is wrong.

You are correct. Both zfs and zpool have a bad case of "exit 1 if
something isn't right". At Nexenta, I filed a bug against the ambiguity
of the return code. You should consider filing a similar bug with
Oracle. In the open-source ZFS implementations there is some other work
to get out of the way before properly tackling this, but that work is
in progress :-)

> I've destroyed the clone and everything has gone back to normal. Now
> zfs receive exits with 0. I'm still not sure whether this could be a
> bug; the snapshot was cloned in November 2011 and had been sitting
> around for a long time. The pool had less than 20% free space two days
> ago, so maybe that triggered something. Anyway, as I said, with the
> clone removed everything has gone back to normal.

good!

> Thank you very much,
>
> Borja.

 -- richard

--
DTrace Conference, April 3, 2012, http://wiki.smartos.org/display/DOC/dtrace.conf
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422
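Until the ambiguous return code is fixed, a replication script can work around it by checking whether the expected snapshot actually landed before treating a non-zero zfs receive as fatal. A sketch under stated assumptions: the dataset and snapshot names are made up, `receive_rc=1` simulates the behavior seen in this thread, and `snapshot_exists` stands in for something like `zfs list -H -t snapshot -o name "$1" >/dev/null 2>&1`:

```shell
#!/bin/sh
# Disambiguating zfs receive's exit status by verifying the result.
snapshot_exists() {
    # Simulated check: pretend pool/data@snap-0042 is present after the
    # receive. Real use would query zfs list instead of this echo.
    echo "pool/data@snap-0042" | grep -qx "$1"
}

receive_rc=1   # what zfs receive returned despite applying the snapshot
if [ "$receive_rc" -eq 0 ]; then
    status="ok"
elif snapshot_exists "pool/data@snap-0042"; then
    # Exit 1 but the snapshot is there: the thread's exact situation.
    status="warning: receive exited $receive_rc but the snapshot is present"
else
    status="error: receive failed"
fi
echo "$status"
```

This turns the "exit 1 but everything replicated" case into a warning rather than a hard failure, while a genuinely missing snapshot still aborts the run.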
Re: [zfs-discuss] Puzzling problem with zfs receive exit status
On Mar 29, 2012, at 5:11 PM, Richard Elling wrote:
>> Thank you very much, Ian and Carsten.
>>
>> Well, adding -v gave me a clue. It turns out that one of the old
>> snapshots had a clone. zfs receive -v was complaining that it couldn't
>> destroy an old snapshot, which wasn't visible but had been cloned (and
>> forgotten) long ago. A truss of the zfs receive process showed it
>> accessing the clone.
>>
>> So zfs receive was doing its job, and the new snapshot was applied
>> correctly, but it was exiting with a value of 1 without printing any
>> warnings, which I think is wrong.
>
> You are correct. Both zfs and zpool have a bad case of "exit 1 if
> something isn't right". At Nexenta, I filed a bug against the ambiguity
> of the return code. You should consider filing a similar bug with
> Oracle. In the open-source ZFS implementations there is some other work
> to get out of the way before properly tackling this, but that work is
> in progress :-)

I understand that either a warning or, at least, a syslog message with
LOG_WARNING is in order.

Regarding the open-source camp: yes, I'm using ZFS on FreeBSD as well :)

Borja.
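In the meantime, the LOG_WARNING idea can be approximated from a replication script with logger(1). A minimal sketch with made-up tag and message text; DRY_RUN=1 (the default here) prints the logger invocation instead of writing to syslog:

```shell
#!/bin/sh
# Surface a non-zero-but-apparently-successful receive to syslog at
# warning priority, roughly what a LOG_WARNING from zfs itself would do.
DRY_RUN="${DRY_RUN:-1}"
rc=1   # simulated zfs receive exit status from the thread
if [ "$rc" -ne 0 ]; then
    msg="zfs receive exited $rc although the snapshot was applied"
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ logger -p daemon.warning -t zfs-replicate '$msg'"
    else
        logger -p daemon.warning -t zfs-replicate "$msg"
    fi
fi
```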