Re: HEADS UP: Capsicum overhaul.
On Fri, Mar 01, 2013 at 09:45:02PM -0600, Larry Rosenman wrote:
> On Sat, 2 Mar 2013, Pawel Jakub Dawidek wrote:
> > I just committed a pretty large change that affects not only Capsicum,
> > but also the descriptor handling code in the kernel. If you find any
> > strange problems after r243611 (like panics, unexpected application
> > errors, etc.), I may be at fault. I'll be watching the current@
> > mailing list closely, so report here if you find problems that look
> > related to my change.
>
> Similar to another post:
>
> # svn up
> Updating '.':
> U    databases/py-sqlite3/Makefile
> U    databases/py-sqlite3/files/setup.py
> U    databases/py-sqlite3/files/setup3.py
> svn: E93: Can't move '/usr/ports/.svn/tmp/svn-X6U5KQ' to
> '/usr/ports/databases/py-sqlite3/Makefile': Capabilities insufficient
> # svn up
> svn: E155037: Previous operation has not finished; run 'cleanup' if it
> was interrupted
> # svn cleanup
> svn: E93: Can't move '/usr/ports/.svn/tmp/svn-Bb1iSM' to
> '/usr/ports/databases/py-sqlite3/Makefile': Capabilities insufficient

This should now be fixed in r247616. Thank you for the report!

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://tupytaj.pl
Re: HEADS UP: Capsicum overhaul.
On Sun, Mar 03, 2013 at 10:18:02PM +0300, Jan Beich wrote:
> Pawel Jakub Dawidek p...@freebsd.org writes:
> > I just committed a pretty large change that affects not only Capsicum,
> > but also the descriptor handling code in the kernel. If you find any
> > strange problems after r243611 (like panics, unexpected application
> > errors, etc.), I may be at fault. I'll be watching the current@
> > mailing list closely, so report here if you find problems that look
> > related to my change.
>
> tmux started to behave weirdly, sometimes failing to attach:
>
> $ printenv
> PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin
> OLDPWD=/
> DISPLAY=:0
> PWD=/home/foo
> TERM=xterm
> USER=foo
> HOME=/home/foo
> SHELL=/bin/sh
>
> $ ktrace -i tmux -L test -f /dev/null
> $ echo $?
> 1
> $ kdump -r | pastebinit -a 'tmux fails to attach'
> http://pastebin.com/U3nCPrFY
>
> $ env -i TERM=$TERM ktrace -i /usr/local/bin/tmux -L test -f /dev/null
> $ ^D
> [exited]
> $ kdump -r | pastebinit -a 'tmux fails to attach (workaround)'
> http://pastebin.com/w1dsUAU4
>
> I've tried so far:
> * booting allbsd.org snapshot - no joy
> * enabling capsicum options - no joy
> * reverting recent capsicum commits - works fine

Yes, it was already reported to me and I'm investigating the problem.
Thanks.

-- 
Pawel Jakub Dawidek
Re: HEADS UP: Capsicum overhaul.
On Sun, Mar 03, 2013 at 10:18:02PM +0300, Jan Beich wrote:
> Pawel Jakub Dawidek p...@freebsd.org writes:
> > I just committed a pretty large change that affects not only Capsicum,
> > but also the descriptor handling code in the kernel. If you find any
> > strange problems after r243611 (like panics, unexpected application
> > errors, etc.), I may be at fault. I'll be watching the current@
> > mailing list closely, so report here if you find problems that look
> > related to my change.
>
> tmux started to behave weirdly, sometimes failing to attach:

I committed a work-around in r247740, but the root cause of the problem
is yet to be found.

-- 
Pawel Jakub Dawidek
Re: kernel build failure
On Sun, Mar 03, 2013 at 06:47:00PM -0500, Michael Butler wrote:
> SVN r247736 prompts this ..
>
> cc -c -O2 -pipe -fno-strict-aliasing -march=pentium4 -std=c99 -Wall
> -Wredundant-decls -Wnested-externs -Wstrict-prototypes
> -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef
> -Wno-pointer-sign -fformat-extensions -Wmissing-include-dirs
> -fdiagnostics-show-option -Wno-error-tautological-compare
> -Wno-error-empty-body -Wno-error-parentheses-equality -nostdinc -I.
> -I/usr/src/sys -I/usr/src/sys/contrib/altq -D_KERNEL
> -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -mno-aes -mno-avx
> -mno-mmx -mno-sse -msoft-float -ffreestanding -fstack-protector -Werror
> /usr/src/sys/kern/uipc_usrreq.c
> /usr/src/sys/kern/uipc_usrreq.c:1689:18: error: use of undeclared
> identifier 'fdep'; did you mean 'fde'?
>                 filecaps_free(&fdep->fde_caps);
>                                ^~~~
>                                fde
> /usr/src/sys/kern/uipc_usrreq.c:1682:36: note: 'fde' declared here
> unp_freerights(struct filedescent *fde, int fdcount)
>                                    ^
> 1 error generated.

This happened because I split a larger change into smaller commits.
r247738 should be fine.

-- 
Pawel Jakub Dawidek
Re: r247839: broken pipe - for top, sudo and ports
On Wed, Mar 06, 2013 at 08:04:57AM -0500, John Baldwin wrote:
> On Tuesday, March 05, 2013 2:35:48 pm Hartmann, O. wrote:
> > On recent FreeBSD 10.0-CURRENT/amd64 (CLANG buildworld, the same
> > symptoms on several (3) systems), many services sporadically fail
> > with a broken pipe.
> >
> > This happens to the system's top (I have to type it several times to
> > finally get a top), it happens to sudo su -, it happens to SSH (drops
> > the connection with broken pipe) and, as I reported earlier, it seems
> > to affect the entire ports system, since I cannot build any port. I
> > receive
> >
> > *** [do-extract] Signal 13
> >
> > This is dramatic for me, because several modules (rtc, linux_adobe
> > ...) cannot be recompiled as required by the last
> > /usr/src/UPDATING entry, 20130304. dbus fails to start, and even
> > the nVidia driver (which is a kernel module) cannot be built and
> > therefore ...
> >
> > Dimitry, I put you into CC, just in case. It seems that the last
> > commits (not only the new DRM2 mess) broke something.
> >
> > I hope that others using FreeBSD 10.0-CURRENT with CLANG can confirm
> > this.
>
> Have you tried backing up to just before all of pjd@'s file descriptor
> and capsicum commits? It broke some other stuff initially related to fd
> passing, so I don't think it is beyond imagination that it broke
> something with UNIX domain sockets in general.

Is there a consensus already on whether this is the result of my changes
or of davide's r247804? I just upgraded my laptop to today's HEAD and I
don't see any weird behaviour yet. If someone can provide a way to
reproduce the problem, I'd be happy to investigate.

-- 
Pawel Jakub Dawidek
Re: pidfile_open incorrectly returns EAGAIN when pidfile is locked
On Wed, Mar 13, 2013 at 11:18:36AM -0400, John Baldwin wrote:
> On Tuesday, March 12, 2013 4:16:32 pm Dirk Engling wrote:
> > While debugging my own daemon I noticed that pidfile_open does not
> > perform the appropriate checks for a running daemon if the caller
> > does not provide a pidptr to pidfile_open. When
> >
> >     fd = flopen(pfh->pf_path,
> >         O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC | O_NONBLOCK, mode);
> >
> > fails because another daemon holds the lock and flopen sets errno to
> > EAGAIN, the check 4 lines below,
> >
> >     if (errno == EWOULDBLOCK && pidptr != NULL) {
> >
> > means that pidfile_read is never executed.
> >
> > As a result my second daemon receives an EAGAIN, which clearly was
> > meant to report a race condition between two daemons starting at the
> > same time, with the first one not yet having finished pidfile_write.
> >
> > The expected behavior would be to set errno to EEXIST, even if no
> > pidptr was passed.
>
> Yes, I think that even if pidptr is NULL it should perform the same
> logic of waiting for the other daemon to finish starting up.
> Something like this:
>
> Index: lib/libutil/pidfile.c
> ===================================================================
> --- pidfile.c	(revision 248162)
> +++ pidfile.c	(working copy)
> @@ -100,6 +100,7 @@ pidfile_open(const char *path, mode_t mode, pid_t
>  	struct stat sb;
>  	int error, fd, len, count;
>  	struct timespec rqtp;
> +	pid_t dummy;
>
>  	pfh = malloc(sizeof(*pfh));
>  	if (pfh == NULL)
> @@ -126,7 +127,9 @@ pidfile_open(const char *path, mode_t mode, pid_t
>  	fd = flopen(pfh->pf_path,
>  	    O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC | O_NONBLOCK, mode);
>  	if (fd == -1) {
> -		if (errno == EWOULDBLOCK && pidptr != NULL) {
> +		if (errno == EWOULDBLOCK) {
> +			if (pidptr == NULL)
> +				pidptr = &dummy;
>  			count = 20;
>  			rqtp.tv_sec = 0;
>  			rqtp.tv_nsec = 500;

I agree EEXIST should be returned, but I don't like reading the existing
pidfile (including waiting for the other process to write its PID) just
to throw the PID away once it has been read.

How about this patch?

	http://people.freebsd.org/~pjd/patches/pidfile.c.patch

-- 
Pawel Jakub Dawidek
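For readers following the errno discussion, here is a minimal sketch of where the "would block" error comes from and how it can be turned into EEXIST without reading the pidfile. It is not the libutil code: try_lock_pidfile() is a hypothetical helper that uses plain open(2) and flock(2) in place of flopen(3), and it checks both EWOULDBLOCK and EAGAIN on the assumption that a platform may define them as different values.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of libutil: take an exclusive,
 * non-blocking lock on `path`.
 *
 * Returns the locked descriptor on success, or -1 with errno set.
 * If the lock is already held through another open of the file,
 * errno is reported as EEXIST instead of EWOULDBLOCK/EAGAIN.
 */
int
try_lock_pidfile(const char *path)
{
	int fd, saved;

	assert(path != NULL);

	fd = open(path, O_WRONLY | O_CREAT, 0600);
	if (fd == -1)
		return (-1);

	if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
		saved = errno;
		close(fd);
		/* Check both names; they need not share a value everywhere. */
		if (saved == EWOULDBLOCK || saved == EAGAIN)
			saved = EEXIST;
		errno = saved;
		return (-1);
	}
	return (fd);
}
```

A second call on the same path while the first descriptor is still open fails with EEXIST, which is the behaviour requested in this thread for callers that pass no pidptr.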
Re: pidfile_open incorrectly returns EAGAIN when pidfile is locked
On Wed, Mar 13, 2013 at 10:59:17PM +0100, Dirk Engling wrote:
> On Wed, 13 Mar 2013, Pawel Jakub Dawidek wrote:
> > How about this patch?
> >
> > 	http://people.freebsd.org/~pjd/patches/pidfile.c.patch
>
> If you move the lines
>
> +			if (errno == 0 || errno == EAGAIN)
> +				errno = EEXIST;
>
> out of the else branch, you can get rid of the if branch, guard the
> else branch by a
>
> +		if (pidptr) {
>
> and let the if (errno == 0 || errno == EAGAIN) fix the errno

I think I considered something similar at first, but the change I
proposed was optimal, IMHO, at the cost of producing a pretty large
diff because of the indentation change. But to be sure, can you send a
patch of your proposed change?

-- 
Pawel Jakub Dawidek
Re: pidfile_open incorrectly returns EAGAIN when pidfile is locked
On Thu, Mar 14, 2013 at 08:28:25AM +0100, Dirk Engling wrote:
> On 13.03.13 23:08, Pawel Jakub Dawidek wrote:
> > I think I considered something similar at first, but the change I
> > proposed was optimal, IMHO at the cost of producing pretty large
> > diff, because of indentation change. But to be sure, can you send a
> > patch of your proposed change?
>
> http://erdgeist.org/arts/software/Code/pidfile.c.diff

Right. Your patch assumes EWOULDBLOCK is equal to EAGAIN, which is true
on FreeBSD, but is not portable. Also, in case pidptr is NULL you
compare errno three times instead of just once (not a big deal of
course, just something that could be done a bit more optimally:)).

-- 
Pawel Jakub Dawidek
Re: pidfile_open incorrectly returns EAGAIN when pidfile is locked
On Thu, Mar 14, 2013 at 09:42:40AM -0400, John Baldwin wrote:
> On Thursday, March 14, 2013 4:44:20 am Pawel Jakub Dawidek wrote:
> > On Thu, Mar 14, 2013 at 08:28:25AM +0100, Dirk Engling wrote:
> > > On 13.03.13 23:08, Pawel Jakub Dawidek wrote:
> > > > I think I considered something similar at first, but the change
> > > > I proposed was optimal, IMHO at the cost of producing pretty
> > > > large diff, because of indentation change. But to be sure, can
> > > > you send a patch of your proposed change?
> > >
> > > http://erdgeist.org/arts/software/Code/pidfile.c.diff
> >
> > Right. Your patch assumes EWOULDBLOCK is equal to EAGAIN, which is
> > true on FreeBSD, but is not portable. Also in case pidptr is NULL
> > you compare errno three times instead of just one (not a big deal of
> > course, just something that could be done a bit more optimal:)).
>
> Geeze, why not just add an else. That's the really short diff:
>
> Index: pidfile.c
> ===================================================================
> --- pidfile.c	(revision 248162)
> +++ pidfile.c	(working copy)
> @@ -140,7 +140,8 @@ pidfile_open(const char *path, mode_t mode, pid_t
>  				*pidptr = -1;
>  			if (errno == 0 || errno == EAGAIN)
>  				errno = EEXIST;
> -		}
> +		} else if (errno == EWOULDBLOCK)
> +			errno = EEXIST;
>  		free(pfh);
>  		return (NULL);
>  	}

Heh, I did consider that as well, but here you check errno twice,
instead of once. Guys, is there anything wrong with the patch I
proposed?

-- 
Pawel Jakub Dawidek
Re: pidfile_open incorrectly returns EAGAIN when pidfile is locked
On Thu, Mar 14, 2013 at 10:11:07AM -0700, Chuck Swiger wrote:
> Hi--
>
> On Mar 14, 2013, at 9:50 AM, John Baldwin wrote:
> > On Thursday, March 14, 2013 12:29:58 pm Pawel Jakub Dawidek wrote:
> [ ... ]
> > > Heh, I did consider that as well, but here you check errno twice,
> > > instead of once. Guys, is there anything wrong with the patch I
> > > proposed?
> >
> > I'm sure the compiler can work that out just fine and it should do
> > whatever is most readable to the programmer. I don't care either
> > way.
>
> Strong +1.
>
> Having the code be correct and readable is much more important than
> trying to hand-optimize a single-digit # of integer compares in
> startup code that usually runs ~once per process.

Well, I think my version is more obvious, just the diff is larger.
Anyway, I think enough has been said already about this crucial
change:)

-- 
Pawel Jakub Dawidek
Re: r248583 Kernel panic: negative refcount 0xfffffe0031b59168
, v_mount = 0x0, v_nmntvnodes = { tqe_next = 0xfe014fd95760, tqe_prev = 0xfe011d500958}, v_un = {vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, le_prev = 0x0}, v_cache_src = { lh_first = 0x0}, v_cache_dst = {tqh_first = 0x0, tqh_last = 0xfe01967007b0}, v_cache_dd = 0x0, v_lock = {lock_object = {lo_name = 0x80dddbb1 zfs, lo_flags = 91881472, lo_data = 0, lo_witness = 0x0}, lk_lock = 1, lk_exslpfail = 0, lk_timo = 51, lk_pri = 96}, v_interlock = { lock_object = {lo_name = 0x807bfbb9 vnode interlock, lo_flags = 16908288, lo_data = 0, lo_witness = 0x0}, mtx_lock = 6}, v_vnlock = 0xfe01967007c8, v_actfreelist = { tqe_next = 0xfe0031985b10, tqe_prev = 0xfe014fd95820}, v_bufobj = {bo_mtx = {lock_object = { lo_name = 0x807bfbc9 bufobj interlock, lo_flags = 16908288, lo_data = 0, lo_witness = 0x0}, mtx_lock = 6}, bo_ops = 0x80a5af10, bo_object = 0x0, bo_synclist = { le_next = 0x0, le_prev = 0x0}, bo_private = 0xfe0196700760, __bo_vnode = 0xfe0196700760, bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 0xfe0196700880}, bv_root = 0x0, bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xfe01967008a0}, bv_root = 0x0, bv_cnt = 0}, bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072}, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = {tqh_first = 0x0, tqh_last = 0xfe01967008e8}, rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_holdcnt = 0, v_usecount = 0, v_iflag = 128, v_vflag = 4, v_writecount = 0, v_hash = 26636295, v_type = VBAD} # kgdb -n 0 GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type show copying to see the conditions. There is absolutely no warranty for GDB. Type show warranty for details. This GDB was configured as amd64-marcel-freebsd... 
Unread portion of the kernel message buffer:
panic: negative refcount 0xfe0059a400c8
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xff823aff8770
kdb_backtrace() at kdb_backtrace+0x39/frame 0xff823aff8820
vpanic() at vpanic+0x127/frame 0xff823aff8860
kassert_panic() at kassert_panic+0x136/frame 0xff823aff88d0
closef() at closef+0x1ff/frame 0xff823aff8960
closefp() at closefp+0xa0/frame 0xff823aff89b0
amd64_syscall() at amd64_syscall+0x1f9/frame 0xff823aff8ab0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xff823aff8ab0
--- syscall (6, FreeBSD ELF64, sys_close), rip = 0x80aeaaa8a, rsp = 0x7fffbd28, rbp = 0x7fffbd40 ---
Uptime: 21m3s
[...]
(kgdb) bt
#0  doadump (textdump=1) at pcpu.h:231
#1  0x804f5827 in kern_reboot (howto=260)
    at /freebsd-src/local/sys/kern/kern_shutdown.c:447
#2  0x804f5d36 in vpanic (fmt=<value optimized out>,
    ap=<value optimized out>)
    at /freebsd-src/local/sys/kern/kern_shutdown.c:754
#3  0x804f5bc6 in kassert_panic (fmt=<value optimized out>)
    at /freebsd-src/local/sys/kern/kern_shutdown.c:642
#4  0x804b900f in closef (fp=<value optimized out>,
    td=<value optimized out>) at refcount.h:66
#5  0x804b7030 in closefp (fdp=0xfe018dc79800, fd=<value optimized out>,
    fp=0xfe0059a400a0, td=0xfe016dfca920,
    holdleaders=<value optimized out>)
    at /freebsd-src/local/sys/kern/kern_descrip.c:1136
#6  0x806e26c9 in amd64_syscall (td=0xfe016dfca920, traced=0)
    at subr_syscall.c:134
#7  0x806cb13b in Xfast_syscall () at exception.S:387
#8  0x00080aeaaa8a in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language: auto; currently minimal
(kgdb)

Thanks,
Shawn Webb

-- 
Pawel Jakub Dawidek
Re: r248583 Kernel panic: negative refcount 0xfffffe0031b59168
On Sun, Jun 30, 2013 at 01:18:36PM +0200, Mateusz Guzik wrote:
> On Sun, Jun 30, 2013 at 05:21:42PM +1000, Kubilay Kocak wrote:
> > I'm seeing what I believe is a related panic, reliably generated by
> > the Python regression test suite on a newly created FreeBSD
> > 10-CURRENT buildbot.
> >
> > Symptoms were first seen in a freebsd.org FTP snapshot dated Thu May
> > 30 20:01:46 UTC 2013 and are also reproducible on a freshly updated
> > r252400.
> >
> > It is additionally reproducible after checking out pure upstream
> > python sources, using the following steps:
> >
> > hg clone http://hg.python.org/cpython
> > cd cpython
> > configure
> > make buildbottest
> >
> > An interesting possible correlation is that it seems to drop out
> > during/around test_socket
>
> Turns out the bug is quite funny ;)
>
> Try this:
>
> diff --git a/sys/kern/uipc_usrreq.c b/sys/kern/uipc_usrreq.c
> index 5d8e814..7a4db04 100644
> --- a/sys/kern/uipc_usrreq.c
> +++ b/sys/kern/uipc_usrreq.c
> @@ -1764,8 +1764,8 @@ unp_externalize(struct mbuf *control, struct mbuf **controlp, int flags)
>  				}
>  				for (i = 0; i < newfds; i++, fdp++) {
>  					fde = &fdesc->fd_ofiles[*fdp];
> -					fde->fde_file = fdep[0]->fde_file;
> -					filecaps_move(&fdep[0]->fde_caps,
> +					fde->fde_file = fdep[i]->fde_file;
> +					filecaps_move(&fdep[i]->fde_caps,
>  					    &fde->fde_caps);
>  					if ((flags & MSG_CMSG_CLOEXEC) != 0)
>  						fde->fde_flags |= UF_EXCLOSE;

Thanks for tracking it down before I had time to get to it!
The change looks good.

-- 
Pawel Jakub Dawidek
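To make the one-character fix easier to see, here is a small userland sketch of the loop. It is only a model: struct file, struct slot, install_received() and drop_ref() are hypothetical stand-ins for the kernel's file, filedescent and refcount code, and the explanation in the comment (every received descriptor ending up on the first file, which is then released more often than it was referenced) is an assumption drawn from the diff and the panic message, not something stated in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct file {
	int refs;		/* reference count */
};

struct slot {
	struct file *fp;	/* file installed in this descriptor slot */
};

/*
 * Install `n` received files into the receiver's slots.
 * `src` is an array of pointers to the sender-side entries.
 *
 * The buggy form of the loop read src[0] on every iteration, so
 * every new slot pointed at the first file.
 */
void
install_received(struct slot *dst, struct slot *const *src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i].fp = src[i]->fp;		/* was: src[0]->fp */
}

/* Drop one reference; a count that is already zero is a bug. */
int
drop_ref(struct file *fp)
{
	assert(fp->refs > 0);	/* analogous to "negative refcount" */
	return (--fp->refs);
}
```

With the indexed form each slot releases its own file exactly once. With src[0], closing the second received descriptor would trip the assertion in this model, because the first file carries only one reference.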
HEADSUP! dhclient(8) sandboxing.
Hi.

I've just committed Capsicum sandboxing for dhclient(8).

Let me know (ideally by sending e-mail to current@ and CCing me) if you
notice any weird behaviour.

The work was sponsored by the FreeBSD Foundation.

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://mobter.com
Re: HEADSUP! dhclient(8) sandboxing.
On Wed, Jul 03, 2013 at 11:04:21PM -0700, Alfred Perlstein wrote:
> On 7/3/13 3:52 PM, Pawel Jakub Dawidek wrote:
> > Hi.
> >
> > I've just committed Capsicum sandboxing for the dhclient(8).
> >
> > Let me know (ideally by sending e-mail to current@ and CCing me) if
> > you notice any weird behaviour.
> >
> > The work was sponsored by the FreeBSD Foundation.
>
> It broke running dhclient on igb0 for me. It says interface not found
> or something to that effect.
>
> Can I help somehow?
>
> Basically just ifconfig down igb0 then try to run dhclient. It will
> not work. If you up the interface and then run it, it is OK.
>
> See attached image.

Thanks for the report. Could you try this patch?

	http://people.freebsd.org/~pjd/patches/dhclient.c.patch

-- 
Pawel Jakub Dawidek
Re: HEADSUP! dhclient(8) sandboxing.
On Thu, Jul 04, 2013 at 04:55:14PM +0400, Andrey Chernov wrote:
> On 04.07.2013 2:52, Pawel Jakub Dawidek wrote:
> > I've just committed Capsicum sandboxing for the dhclient(8).
> >
> > Let me know (ideally by sending e-mail to current@ and CCing me) if
> > you notice any weird behaviour.
>
> I haven't tested your most recent commit yet, but the whole previous
> chain of commits causes dhclient to break:
>
> Starting dhclient.
> em0: no link .. got link
> em0: not found
> exiting.
> /etc/rc.d/dhclient: WARNING: failed to start dhclient
>
> and a bit later in rc
>
> Starting dhclient.
> em0: not found
> exiting.
> /etc/rc.d/dhclient: WARNING: failed to start dhclient

It should be fixed in r252697. Could you give it a try?

-- 
Pawel Jakub Dawidek
Re: r253070 and disappearing zpool
On Mon, Jul 22, 2013 at 10:29:40AM +0300, Andriy Gapon wrote:
> I think that this setup (on ZFS level) is quite untypical, although
> not impossible on FreeBSD (and perhaps only FreeBSD). It's untypical
> because you have separate boot pool (where loader, loader.conf and
> kernel are taken from) and root pool (where / is mounted from).

As I said elsewhere, it is pretty typical when full disk encryption is
used. /boot/ has to be unencrypted and can be stored on e.g. a USB
pendrive which is never left unattended, unlike a laptop, which can be
left in e.g. a hotel room, but with its entire disk encrypted.

> So, I see three ways of resolving the problem that my changes caused
> for your configuration.
>
> 1. [the easiest] Put zpool.cache loading instructions that used to be
> in defaults/loader.conf into your loader.conf. This way everything
> should work as before -- zpool.cache would be loaded from your boot
> pool.
>
> 2. Somehow (I don't want to go into any technical details here)
> arrange that your root pool has /boot/zfs/zpool.cache that describes
> your boot pool. This is probably hard given that your /boot is a
> symlink at the moment. This probably would be easier to achieve if
> zpool.cache lived in /etc/zfs.
>
> 3. [my favorite] Remove an artificial difference between your boot and
> root pools, so that they are a single root+boot pool (as zfs gods
> intended). As far as I understand your setup, you use GELI to protect
> some sensitive data. Apparently your kernel is not sensitive data, so
> I wonder if your /bin/sh or /sbin/init are really sensitive either.
> So perhaps you can arrange your unencrypted pool to hold all of the
> base system (boot + root) and put all your truly sensitive filesystems
> (like e.g. /home or /var/data or /opt/xyz) onto your encrypted pool.

If all you care about is the laptop being stolen, then that would work.
If, however, you want to be protected from someone replacing your
/sbin/init with something evil, then you use encryption or, even
better, integrity verification, which is also supported by GELI.

Remember, tools not policies.

There is also option number 4 - backing out your commit.

When I saw your commit removing those entries from
defaults/loader.conf, I thought it was fine, as we now don't require
zpool.cache to import the root pool, which was, BTW, a very nice and
handy improvement. Now that we know it breaks existing installations
I'd prefer the commit to be backed out. This is because, apart from
breaking some existing installations, it doesn't gain us anything.

> So I understand that my change causes a problem for a setup like
> yours, but I believe that the change is correct.

The change is clearly incorrect or incomplete, as it breaks existing
installations and doesn't allow for a full disk encryption
configuration on ZFS-only systems.

BTW. If moving zpool.cache to /etc/zfs/ will work for both cases that's
fine by me, although the migration might be tricky.

-- 
Pawel Jakub Dawidek
Re: r253070 and disappearing zpool
On Wed, Jul 24, 2013 at 02:47:11PM +0300, Andriy Gapon wrote:
> on 22/07/2013 23:38 Pawel Jakub Dawidek said the following:
> > The /boot/ has to be unencrypted and can be stored on eg. USB
> > pendrive which is never left unattended, unlike laptop which can be
> > left in eg. a hotel room, but with entire disk encrypted.
>
> As we discussed elsewhere, there are many options of configuring full
> disk encryption. Including decisions whether root filesystem should be
> separate from boot filesystem, choice of filesystem type for boot fs,
> ways of tying various pieces together, and many more.
> I do not believe that my change is incompatible with full disk
> encryption in general.

Maybe you can imagine many ways of configuring it, but definitely the
most typical one is to have /boot/ separate from /, where /boot/ is
unencrypted and where you use one file system type for both (UFS or
ZFS).

> Let's also recall that the system was not created / configured by any
> of the existing official or semi-official tools and thus it does not
> represent any recommended way of setting up such systems. Glen
> configured it this way, but it doesn't mean that that is the way.

Note that there are no official tools to install FreeBSD on ZFS. Is
that enough reason to stop supporting it? What Glen did is the
recommended way of setting up full disk encryption with ZFS. I'd do it
the same way and I'd recommend this configuration to anyone who will
ask (or has asked) me.

> I think that there are many of ways of changing configuration of that
> system to make behave as before again. Three I mentioned already.
> Another is to add rc script to import the boot pool, given that it is
> a special, designated pool. Yet another is to place zpool.cache onto
> the root pool and use nullfs (instead of a symlink) to make /boot be
> from the boot pool but /boot/zfs be from the root pool.

Come on...

> > BTW. If moving zpool.cache to /etc/zfs/ will work for both cases
> > that's fine by me, although the migration might be tricky.
>
> Yes, that's migration that's scary to me too.
>
> Now, about the postponed points. I will reproduce a section from my
> email that you've snipped.
>
> P.S.
> ZFS/FreeBSD boot process is extremely flexible. For example zfsboot
> can take zfsloader from pool1/fsA, zfsloader can boot kernel from
> pool2/fsB and kernel can mount / from pool3/fsC. Of these 3
> filesystems from where should zpool.cache be taken?
> My firm opinion is that it should be taken from / (pool3/fsC in the
> example above). Because it is the root filesystem that defines what a
> system is going to do ultimately: what daemons are started, with what
> configurations, etc. And thus it should also determine what pools to
> auto-import. We can say that zpool.cache is analogous to /etc/fstab
> in this respect.
>
> So do you or do you not agree with my reasoning about from where
> zpool.cache should be taken?
> If you do not, then please explain why.
> If you do, then please explain how this would be compatible with the
> old way of loading zpool.cache.

I don't have a strong opinion about this. As I said above, I'm fine
with moving zpool.cache to /etc/zfs/ if we can ensure it won't break
existing installations.

Still, I'm not sure this was your initial goal, because you weren't
aware of systems with a separate boot pool until recently (if you had
been aware of them, I hope you wouldn't have committed the change
without prior discussion). Which means that in your eyes zpool.cache
was always part of the root pool, because /boot/ was.

> I think that ensuring that zpool.cache is always loaded from a root
> filesystem is the gain from my change.

Were people complaining about zpool.cache being loaded from /boot/zfs/
and not from /etc/zfs/? I don't think so. But people do complain about
the boot pool not being autoimported.

In my opinion, for the end user it doesn't really matter whether it is
/etc/zfs/zpool.cache or /boot/zfs/zpool.cache, as both directories are
available once the system is booted. For most people those two
directories are placed on the same file system. For the people who
actually care whether it is /etc/zfs/ or /boot/zfs/, because those are
separate file systems, the latter works and the former doesn't.

In my opinion the gain, if any, is only theoretical.

-- 
Pawel Jakub Dawidek
Re: diskid documentation
On Mon, Jun 02, 2014 at 11:01:08AM -0700, John-Mark Gurney wrote:
> Michael W. Lucas wrote this message on Mon, Jun 02, 2014 at 11:36 -0400:
> > On Mon, Jun 02, 2014 at 10:45:52AM -0400, Ryan Stone wrote:
> > > On Mon, Jun 2, 2014 at 9:26 AM, Allan Jude allanj...@freebsd.org wrote:
> > > > It also tends to sometimes hide the gpt label provider on me
> > > > (not sure in which cases it does this, but it is annoying)
> > >
> > > This happens when something (e.g. zfs) happens to open the diskid
> > > provider instead of the gpt label. For me this ended up being a
> > > bit more than annoying; my swap was mounted in /etc/fstab via a
> > > gpt label so I silently lost my swap when I did an upgrade.
> >
> > Wait-- one type of one label can hide another?
> >
> > I thought a big point of labels was to remove ambiguity...
>
> Surprisingly, yes... I didn't think about this, but it's true... A
> disk will get exported via two different devices, diskid and normal
> da/ada... The tasting will go through and create all the necessary
> sub devices, but the problem is that we now have two different paths,
> and if something opens the diskid path, then the da/ada paths all
> disappear...
>
> This sounds like we need to fix geom to bind the two together so that
> when one opens, the other doesn't disappear... The problem is that
> geom views them as two separate disks when in fact they are the
> same... someone who knows geom well should think about how to solve
> this problem, as diskid isn't the first time this has happened, just
> most prevalent w/ ZFS and diskid.

The problem is that GPT labels (or GPT IDs for that matter) should not
be implemented within GLABEL. This is wrong. They should be implemented
as part of GPART, so that GPART would create ada0p1, gpt/label and
gptid/whatever. Opening one of those should not make the others
disappear then. Only opening ada0 for writing would make them
disappear.

-- 
Pawel Jakub Dawidek
Re: diskid documentation
On Mon, Jun 02, 2014 at 03:27:06PM -0700, John-Mark Gurney wrote:
> Pawel Jakub Dawidek wrote this message on Mon, Jun 02, 2014 at 22:26 +0200:
> > The problem is that GPT labels (or GPT IDs for that matter) should
> > not be implemented within GLABEL. This is wrong. It should be
> > implemented as part of GPART, so that GPART would create ada0p1,
> > gpt/label and gptid/whatever. Opening one of those should not make
> > the others disappear then. Only opening ada0 for writting would
> > make them disappear.
>
> even gpart would be wrong IMO... What happens if there is another
> provider like GPART, but different, do they need to implement diskid
> creation too to prevent the same issue?
>
> Shouldn't geom be updated to say, this ident is an alias, everything
> you do w/ this, it's exactly the same as the other one? This would
> also have the advantage of possibly removing one layer in the call
> chain when dealing w/ IO. (or does GEOM has a pass-through flag that
> says, I don't do anything, just skip me?)

As for disk IDs, they definitely shouldn't be implemented in GPART or
GLABEL. IMHO the right place is the DISK class - both ada0 and
diskid-of-ada0 should exist with equal standing (two providers of one
geom). This would also address your concern about the additional
layer.

-- 
Pawel Jakub Dawidek
Re: taskqueue_create() name parameter lifetime
On Tue, Nov 16, 2010 at 08:27:11AM -0500, John Baldwin wrote:
> On Tuesday, November 16, 2010 7:20:47 am Andriy Gapon wrote:
> > taskqueue_create() documentation never explicitly says this, but
> > the current taskqueue_create() implementation just stores the
> > 'name' pointer parameter internally. Thus it depends on 'name'
> > having a lifetime encompassing that of the taskqueue.
> > I think that alternatively we could have copied the name (or a
> > portion of it) into an internal buffer.
> >
> > I don't have any argument for either approach, just curious which
> > one looks more preferable from a general (FreeBSD, kernel)
> > programming practices point of view.
>
> Hmm, in many other places we store a separate copy (e.g. all the
> interrupt code uses separate MAXCOMLEN char arrays to hold names). If
> that is easy to do, that is probably the best approach.

The most friendly API would keep the name internally, but would also
allow me to provide the name in printf-like format, so I don't have to
use sprintf()/snprintf() before calling it. This unfortunately would
change the taskqueue API, as name is the first argument, which makes it
not worth the pain.

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
p...@freebsd.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
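As an illustration of the API shape discussed above, here is a userland sketch of a constructor that both copies the name into an internal buffer and accepts a printf-like format. Nothing here is the taskqueue(9) interface: struct queue, queue_create() and QNAME_MAX are hypothetical names, and the fixed-size buffer is only an assumption modelled on the MAXCOMLEN arrays mentioned for the interrupt code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define	QNAME_MAX	32	/* hypothetical bound, in the spirit of MAXCOMLEN */

/* Hypothetical queue object that owns a private copy of its name. */
struct queue {
	char name[QNAME_MAX];
};

/*
 * Create a queue whose name is built from a printf-like format.
 * The result is stored inside the object, so the caller's buffers
 * may go away right after the call. Names that do not fit are
 * truncated by vsnprintf(3).
 */
struct queue *
queue_create(const char *fmt, ...)
{
	struct queue *q;
	va_list ap;

	assert(fmt != NULL);

	q = calloc(1, sizeof(*q));
	if (q == NULL)
		return (NULL);

	va_start(ap, fmt);
	vsnprintf(q->name, sizeof(q->name), fmt, ap);
	va_end(ap);
	return (q);
}
```

A caller could then write queue_create("%s%d", driver, unit) and free or reuse its own buffers immediately. The trade-off raised in the thread still applies: an existing function that takes the name as its first plain argument cannot gain this form without changing its signature.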
Next ZFSv28 patchset ready for testing.
Hi.

The new patchset is ready for testing:

http://people.freebsd.org/~pjd/patches/zfs_20101212.patch.bz2

When applying the patch be sure to use correct options for patch(1)!:

# cd /usr/src
# fetch http://people.freebsd.org/~pjd/patches/zfs_20101212.patch.bz2
# bzip2 -d zfs_20101212.patch.bz2
# patch -E -p0 < zfs_20101212.patch

The patch is against FreeBSD HEAD as of 2010-12-12.

Some of the changes since the last patchset (zfs_20100831.patch):

- Boot support for ZFS v28 (only RAIDZ3 is not yet supported).
- Various fixes for the existing ZFS boot code.
- Support for sendfile(2) (by avg@).
- Userland-kernel compatibility with v13-v15 (by mm@).
- ACL fixes (by trasz@).
- Various bug fixes.

Please test, test, test. Chances are this is the last patchset before v28 goes to HEAD (finally). Especially test new changes, like boot support and sendfile(2) support. Also be sure to verify that you can import your existing ZFS pools (v13-v15) when running v28, or boot from your existing pools.

Enjoy!

PS. Martin (mm@) will be providing a patch against 8-STABLE soon.

-- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! pgptzjMdmsjno.pgp Description: PGP signature
Re: Next ZFSv28 patchset ready for testing.
On Mon, Dec 13, 2010 at 10:45:56PM +0100, Pawel Jakub Dawidek wrote: Hi. The new patchset is ready for testing: http://people.freebsd.org/~pjd/patches/zfs_20101212.patch.bz2 When applying the patch be sure to use correct options for patch(1)!:

# cd /usr/src
# fetch http://people.freebsd.org/~pjd/patches/zfs_20101212.patch.bz2
# bzip2 -d zfs_20101212.patch.bz2
# patch -E -p0 < zfs_20101212.patch

[...]

If patch(1) reports a reject of the sys/cddl/compat/opensolaris/sys/sysmacros.h file or you see the following error while compiling world:

/usr/src/cddl/usr.bin/ctfconvert/../../../cddl/contrib/opensolaris/tools/ctf/cvt/strtab.c:249: undefined reference to `MIN'
strtab.o(.text+0x28d): In function `strtab_insert':
/usr/src/cddl/usr.bin/ctfconvert/../../../cddl/contrib/opensolaris/tools/ctf/cvt/strtab.c:119: undefined reference to `MIN'
strtab.o(.text+0x3a1):/usr/src/cddl/usr.bin/ctfconvert/../../../cddl/contrib/opensolaris/tools/ctf/cvt/strtab.c:145: undefined reference to `MIN'
*** Error code 1

Simply remove the sys/cddl/compat/opensolaris/sys/sysmacros.h file from the tree. Unfortunately the patch can work either on source downloaded via cvsup or on source downloaded via subversion, but not both, as those two have different $FreeBSD$ id strings (at least in the case of this file). The patch is generated based on subversion source, so if you use cvsup, you will most likely see the reject and the error.

-- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! pgp46myIfopSX.pgp Description: PGP signature
Re: Next ZFSv28 patchset ready for testing.
On Mon, Dec 13, 2010 at 10:45:56PM +0100, Pawel Jakub Dawidek wrote: Hi. The new patchset is ready for testing: http://people.freebsd.org/~pjd/patches/zfs_20101212.patch.bz2 You can also download the whole source tree already patched from here: http://people.freebsd.org/~pjd/zfs_20101212.tbz -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! pgpJ41aQDwAYd.pgp Description: PGP signature
Re: Next ZFSv28 patchset ready for testing.
On Mon, Dec 13, 2010 at 11:00:31PM -, Steven Hartland wrote: What's the expected behaviour for the sendfile changes as sendfile is one of the problems we have here with the double memory allocation required for it under ZFS compared to UFS. Does this patch address that?

No. The patch doesn't address that. It only adds support for sendfile(2), as it was commented out in the previous patchset.

Inspecting the patch the following segment looks odd:-

--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c.orig
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
...
        while (n > 0) {
                nbytes = MIN(n, zfs_read_chunk_size -
                    P2PHASE(uio->uio_loffset, zfs_read_chunk_size));

+#ifdef __FreeBSD__
+               if (uio->uio_segflg == UIO_NOCOPY)
+                       error = mappedread_sf(vp, nbytes, uio);
+               else
+#endif /* __FreeBSD__ */
                if (vn_has_cached_data(vp))
                        error = mappedread(vp, nbytes, uio);
                else

Is there an extra else in there which will break things or should the __FreeBSD__ mappedread_sf block replace the standard mappedread call or is the indentation just a bit weird?

The code is correct. It is just hard to split 'else' and 'if' with a '#endif' and keep the indentation pretty. Depending on the conditions, we use one of the three methods to read the data.

-- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! pgpSKGrAP0AYX.pgp Description: PGP signature
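[Editor's sketch] To see why the 'else' before '#endif' is not a stray one, here is a small standalone C illustration of the same three-way chain. Everything in it is hypothetical: SKETCH_HAVE_SF stands in for __FreeBSD__, and the enum values merely label the three read paths (the third path is not shown in the quoted diff, so READ_OTHER is a placeholder).

```c
#define SKETCH_HAVE_SF 1	/* stands in for __FreeBSD__ in this sketch */

/* Labels for the three ways of reading; names are illustrative only. */
enum read_path { READ_SF, READ_MAPPED, READ_OTHER };

static enum read_path
pick_read_path(int nocopy, int has_cached_data)
{
	enum read_path path;

#ifdef SKETCH_HAVE_SF
	if (nocopy)
		path = READ_SF;
	else
#endif /* SKETCH_HAVE_SF */
	if (has_cached_data)		/* becomes "else if" when the block above is compiled in */
		path = READ_MAPPED;
	else
		path = READ_OTHER;
	return (path);
}
```

With the macro defined the function is an ordinary if / else if / else chain; with it undefined the first test simply disappears and the remaining if / else still parses on its own.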
Re: Next ZFSv28 patchset ready for testing.
On Tue, Dec 14, 2010 at 03:20:05PM +0100, Olivier Smedts wrote: make installworld That's what I wanted to do, and why I rebooted single-user on the new kernel. But isn't the v13-v15 userland supposed to work with the v28 kernel ? Yes, it is supposed to work, exactly so that the common FreeBSD upgrade path can be followed. Martin was working on this (CCed). -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! pgpCsgsK8Mp9u.pgp Description: PGP signature
Re: Next ZFSv28 patchset ready for testing.
On Wed, Dec 15, 2010 at 10:15:00PM -0500, ben wilber wrote: On Mon, Dec 13, 2010 at 10:45:56PM +0100, Pawel Jakub Dawidek wrote: Hi. The new patchset is ready for testing: Running fine for 24 hours now under load with a ~50 disk v15 (not upgraded) pool from -CURRENT. Thanks! Only strange thing is the rc script complains: /etc/rc: DEBUG: run_rc_command: doit: zvol_start unrecognized command 'volinit' usage: zfs command args ... Did you run mergemaster(8) after the upgrade? The patch includes change to etc/rc.d/zvol to remove 'zfs volinit'/'zfs volfini' which are no longer available. -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! pgp7c4gzudIbP.pgp Description: PGP signature
Re: Next ZFSv28 patchset ready for testing.
On Fri, Dec 17, 2010 at 12:54:36AM +0300, Rechistov Grigory (Речистов Григорий) wrote: I started to check the new ZFS version inside a VirtualBox machine. So far it works for me without crashes, but I got some observations worth mentioning. Here are the steps I made: 1. Installed 8.1-RELEASE (from minimal install CD) 2. Csup'ped sources to CURRENT (as of 14/12/2010) [note that I haven't used SVN repository] 3. Applied the patch in question. 4. Created a zpool raidz of two disks of old version 15. Also some usual tuning of ZFS in loader.conf was done as I am running 32 bit version with low amount of memory. zfs_enable=YES in rc.conf was added too. 4.1 Moved /usr/ports to ZFS to have some files on it. 5. Make buildworld, buildkernel, installkernel, installworld - all the canonical steps from the Handbook. 6. After reboot to final 9.0-CURRENT world I got a dmesg with some trace stack related to ZFS and also a rc.d script message about unrecognized command 'volinit' (see the text of it in attachment). This one is because mergemaster(8) skips files with the same $FreeBSD$ value, so you need to copy /usr/src/etc/rc.d/zvol to /etc/rc.d/ by hand. 7. Nevertheless the system booted. Files 8. `zpool upgrade -a` worked all right and reported that now I have ZFS version 28 Overall I am pleasantly surprised how streamlined the whole process was. That's good to hear, thanks. -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! pgp5O7SANNIX6.pgp Description: PGP signature
Re: Next ZFSv28 patchset ready for testing.
On Wed, Dec 15, 2010 at 10:15:40AM +0200, Andrei Kolu wrote: 2010/12/14 Pawel Jakub Dawidek p...@freebsd.org On Mon, Dec 13, 2010 at 10:45:56PM +0100, Pawel Jakub Dawidek wrote: Hi. The new patchset is ready for testing: http://people.freebsd.org/~pjd/patches/zfs_20101212.patch.bz2 You can also download the whole source tree already patched from here: http://people.freebsd.org/~pjd/zfs_20101212.tbz # uname -a FreeBSD freebsd9.raidon.eu 9.0-CURRENT FreeBSD 9.0-CURRENT #0: Tue Dec 14 14:37:01 EET 2010 r...@freebsd9.raidon.eu:/usr/obj/usr/src/sys/GENERIC amd64 Create files filled with zeroes: # mkfile 512m disk1 disk2 disk3 disk4 # zpool create andmed raidz /home/antik/disk{1,2,3,4} # zpool status andmed pool: andmed state: ONLINE scan: none requested config: NAME STATE READ WRITE CKSUM andmed ONLINE 0 0 0 raidz1-0 ONLINE 0 0 0 /home/antik/disk1 ONLINE 0 0 0 /home/antik/disk2 ONLINE 0 0 0 /home/antik/disk3 ONLINE 0 0 0 /home/antik/disk4 ONLINE 0 0 0 errors: No known data errors Now let's try to scrub: # zpool scrub andmed Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 01 fault virtual address = 0x1fb8007b fault code = supervisor read data, page not present instruction pointer = 0x20:0x812967d2 stack pointer = 0x20:0xff80ee605548 frame pointer = 0x28:0xff80ee605730 code segment = base 0x0, limit 0xf, type 0x1b = DPL 0, pres1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 2081 (initial thread) [ thread pid 2081 tid 100121 ] Stopped at vdev_file_open+0x92: testb $0x20,0x7b(%rax) Could you verify if this patch fixes the problem for you? http://people.freebsd.org/~pjd/patches/vdev_file.c.2.patch -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! pgplp1JmNuuvJ.pgp Description: PGP signature
Re: My ZFS v28 Testing Experience
On Wed, Jan 12, 2011 at 11:03:19PM -0400, Chris Forgeron wrote: I've been testing out the v28 patch code for a month now, and I've yet to report any real issues other than what is mentioned below. I'll detail some of the things I've tested, hopefully the stability of v28 in FreeBSD will convince others to give it a try so the final release of v28 will be as solid as possible. I've been using FreeBSD 9.0-CURRENT as of Dec 12th, and 8.2PRE as of Dec 16th What's worked well: - I've made and destroyed small raidz's (3-5 disks), large 26 disk raid-10's, and a large 20 disk raid-50. - I've upgraded from v15, zfs 4, no issues on the different arrays noted above - I've confirmed that a v15 or v28 pool will import into Solaris 11 Express, and vice versa, with the exception about dual log or cache devices noted below. - I've run many TB of data through the ZFS storage via benchmarks from my VM's connected via NFS, to simple copies inside the same pool, or copies from one pool to another. - I've tested pretty much every compression level, and changing them as I tweak my setup and try to find the best blend. - I've added and subtracted many a log and cache device, some in failed states from hot-removals, and the pools always stayed intact. Thank you very much for all your testing, that's really a valuable contribution. I'll be happy to work with you on tracking down the bottleneck in ZFSv28. Issues: - Import of pools with multiple cache or log devices. (May be a very minor point) A v28 pool created in Solaris 11 Express with 2 or more log devices, or 2 or more cache devices won't import in FreeBSD 9. This also applies to a pool that is created in FreeBSD, is imported in Solaris to have the 2 log devices added there, then exported and attempted to be imported back in FreeBSD. No errors, zpool import just hangs forever. If I reboot into Solaris, import the pool, remove the dual devices, then reboot into FreeBSD, I can then import the pool without issue. 
A single cache, or log device will import just fine. Unfortunately I deleted my witness-enabled FreeBSD-9 drive, so I can't easily fire it back up to give more debug info. I'm hoping some kind soul will attempt this type of transaction and report more detail to the list. Note - I just decided to try adding 2 cache devices to a raidz pool in FreeBSD, export, and then importing, all without rebooting. That seems to work. BUT - As soon as you try to reboot FreeBSD with this pool staying active, it hangs on boot. Booting into Solaris, removing the 2 cache devices, then booting back into FreeBSD then works. Something is kept in memory between exporting then importing that allows this to work. Unfortunately I'm unable to reproduce this. It works for me with 2 cache and 2 log vdevs. I tried to reboot, etc. My test looks exactly like this: # zpool create tank raidz ada0 ada1 # zpool add tank cache ada0 ada1 # zpool export tank # kldunload zfs # zpool import tank works # reboot works - Speed. (More of an issue, but what do we do?) Wow, it's much slower than Solaris 11 Express for transactions. I do understand that Solaris will have a slight advantage over any port of ZFS. All of my speed tests are made with a kernel without debug, and yes, these are -CURRENT and -PRE releases, but the speed difference is very large. Before we go any further could you please confirm that you commented out this line in sys/modules/zfs/Makefile: CFLAGS+=-DDEBUG=1 This turns on all kinds of ZFS debugging and slows it down a lot, but it is invaluable for correctness testing. This will be turned off once we import ZFS into FreeBSD-CURRENT. BTW. In my testing Solaris 11 Express is much, much slower than FreeBSD/ZFSv28. And by much I mean two or more times in some tests. I was wondering if they have some debug turned on in Express. 
At first, I thought it may be more of an issue with the ix0/Intel X520DA2 10Gbe drivers that I'm using, since the bulk of my tests are over NFS (I'm going to use this as a SAN via NFS, so I test in that environment). But - I did a raw cp command from one pool to another of several TB. I executed the same command under FreeBSD as I did under Solaris 11 Express. When executed in FreeBSD, the copy took 36 hours. With a fresh destination pool of the same settings/compression/etc under Solaris, the copy took 7.5 hours. When you turn off compression (because it turns all-zero blocks into holes) you can test it by simply: # dd if=/dev/zero of=/zfs_fs/zero bs=1m -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! pgprFLLYTe9F4.pgp Description: PGP signature
Re: [head tinderbox] failure on ia64/ia64
On Mon, Jan 31, 2011 at 10:56:18PM +, FreeBSD Tinderbox wrote: [...] cc -O2 -pipe -I/src/sbin/hastctl/../hastd -DINET -DINET6 -DYY_NO_UNPUT -DYY_NO_INPUT -DHAVE_CRYPTO -std=gnu99 -Wsystem-headers -Werror -Wall -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wreturn-type -Wcast-qual -Wwrite-strings -Wswitch -Wshadow -Wunused-parameter -Wcast-align -Wchar-subscripts -Winline -Wnested-externs -Wredundant-decls -Wold-style-definition -Wno-pointer-sign -c /src/sbin/hastctl/../hastd/proto_common.c cc1: warnings being treated as errors /src/sbin/hastctl/../hastd/proto_common.c: In function 'proto_common_descriptor_send': /src/sbin/hastctl/../hastd/proto_common.c:116: warning: cast increases required alignment of target type /src/sbin/hastctl/../hastd/proto_common.c: In function 'proto_common_descriptor_recv': /src/sbin/hastctl/../hastd/proto_common.c:146: warning: cast increases required alignment of target type /src/sbin/hastctl/../hastd/proto_common.c:149: warning: cast increases required alignment of target type *** Error code 1 Marcel, do you have an idea how one can use CMSG_NXTHDR() on ia64 with high WARNS? With WARNS=6 I get those errors and I've no idea how to fix it properly. If there is a fix, CMSG_NXTHDR() should probably be fixed, but maybe I'm wrong? -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! pgphFUx7Q3q4K.pgp Description: PGP signature
Re: [head tinderbox] failure on ia64/ia64
On Mon, Jan 31, 2011 at 04:56:06PM -0800, Marcel Moolenaar wrote: On Jan 31, 2011, at 3:51 PM, Pawel Jakub Dawidek wrote: On Mon, Jan 31, 2011 at 10:56:18PM +, FreeBSD Tinderbox wrote: [...] cc -O2 -pipe -I/src/sbin/hastctl/../hastd -DINET -DINET6 -DYY_NO_UNPUT -DYY_NO_INPUT -DHAVE_CRYPTO -std=gnu99 -Wsystem-headers -Werror -Wall -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wreturn-type -Wcast-qual -Wwrite-strings -Wswitch -Wshadow -Wunused-parameter -Wcast-align -Wchar-subscripts -Winline -Wnested-externs -Wredundant-decls -Wold-style-definition -Wno-pointer-sign -c /src/sbin/hastctl/../hastd/proto_common.c cc1: warnings being treated as errors /src/sbin/hastctl/../hastd/proto_common.c: In function 'proto_common_descriptor_send': /src/sbin/hastctl/../hastd/proto_common.c:116: warning: cast increases required alignment of target type /src/sbin/hastctl/../hastd/proto_common.c: In function 'proto_common_descriptor_recv': /src/sbin/hastctl/../hastd/proto_common.c:146: warning: cast increases required alignment of target type /src/sbin/hastctl/../hastd/proto_common.c:149: warning: cast increases required alignment of target type *** Error code 1 Marcel, do you have an idea how one can use CMSG_NXTHDR() on ia64 with high WARNS? With WARNS=6 I get those errors and I've no idea how to fix it properly. If there is a fix, CMSG_NXTHDR() should probably be fixed, but maybe I'm wrong? this warning indicates that you're casting from a pointer to type P (P having alignment constraints Ap) to a pointer to type Q (Q having alignment constraints Aq), and Aq > Ap. The compiler tells you that you may end up with misaligned accesses. If you know that the pointer satisfies Aq, you can cast through (void *) to silence the compiler. If you cannot guarantee that, you have a bigger problem. Solutions include packing type Q to reduce Aq or to copy the data to a local variable. 
Take the statement at line 116 for example: *((int *)CMSG_DATA(cmsg)) = fd; We're effectively casting from a (char *) to a (int *) and then doing a 32-bit access (write). The easy fix (casting through (void *)) is not possible, because you cannot guarantee that the address is properly aligned. cmsg points to memory set aside by the following local variable: unsigned char ctrl[CMSG_SPACE(sizeof(fd))]; There's no guarantee that the compiler will align the character array at a 32-bit boundary (though in practice it seems to be). I have seen this kind of construct fail on ARM and PowerPC for example. In any case: The safest approach here is to use le32enc or be32enc rather than casting through (void *). Obviously these functions encode using a fixed byte order when the original code is using the native byte order of the CPU. Having native encoding functions would help. You could use bcopy as well, but the compiler is typically too smart for its own good and it will try to optimize the call away. This leaves you with the same misaligned access that you tried to avoid by using bcopy(). You need to trick the compiler so that it won't optimize the bcopy away, like: bcopy((void *)&fd, CMSG_DATA(cmsg), sizeof(fd)); Interesting. I did use bcopy() to silence the warning, but the need to cast to (void *) is surprising. Still, I'm more concerned with the CMSG_NXTHDR() macro, which from what I see might not be fixed by casting arguments. -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! pgpEWqZfPoVvr.pgp Description: PGP signature
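[Editor's sketch] The copy-instead-of-cast approach discussed above can be illustrated with a small standalone descriptor-passing example. This is not the hastd code: send_fd(), recv_fd() and union fd_ctrl are hypothetical names, memcpy() is used where the thread mentions bcopy(), and wrapping the control buffer in a union with struct cmsghdr is one common way to get an aligned buffer, assumed here rather than taken from the thread.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>

/* The union makes the byte buffer at least as aligned as struct cmsghdr. */
union fd_ctrl {
	struct cmsghdr	hdr;
	unsigned char	buf[CMSG_SPACE(sizeof(int))];
};

/* Send descriptor fd over the UNIX-domain socket sock. 0 on success, -1 on error. */
static int
send_fd(int sock, int fd)
{
	union fd_ctrl ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	struct iovec iov;
	char byte = 0;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	iov.iov_base = &byte;		/* carry one data byte along with the control message */
	iov.iov_len = sizeof(byte);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
	/* Byte-wise copy: no (int *) cast of CMSG_DATA(), so no alignment assumption. */
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));

	return (sendmsg(sock, &msg, 0) == -1 ? -1 : 0);
}

/* Receive a descriptor from sock into *fdp. 0 on success, -1 on error. */
static int
recv_fd(int sock, int *fdp)
{
	union fd_ctrl ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	struct iovec iov;
	char byte;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	iov.iov_base = &byte;
	iov.iov_len = sizeof(byte);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) == -1)
		return (-1);
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len < CMSG_LEN(sizeof(*fdp)))
		return (-1);
	memcpy(fdp, CMSG_DATA(cmsg), sizeof(*fdp));
	return (0);
}
```

This sketch only touches CMSG_FIRSTHDR() and CMSG_DATA(); it does not address the CMSG_NXTHDR() warning raised in the reply, since that cast lives inside the system macro itself.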
Re: Replacing a failed disk in raidz2 zfs (and gpt)
On Thu, Feb 03, 2011 at 06:11:34AM +, Philip M. Gollucci wrote: All, I have a zroot(mirror)+zmysql(raidz2) setup on a MySQL db box. One drive failed (mfid3). We've since replaced it. I can't for the life of me get zpool to replace it. I can't remember why I used gpt instead of direct disks for the zmysql pool (but that's how it is). I've tried all of the following commands with different errors, and I must say I'm stumped. I've done this several times before for the ASF (but no gpt at play there). $ zpool scrub zmysql just runs, and completes, no error $ zpool replace zmysql gpt/disk3 cannot replace gpt/disk3 with gpt/disk3: one or more devices is currently unavailable [...] $ zpool offline zmysql gpt/disk3 cannot offline gpt/disk3: no valid replicas I'm afraid this is a ZFS bug that is fixed in v28 for sure; I'm not sure about v14/v15. -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! pgpvKfbSsGxHk.pgp Description: PGP signature
Re: Replacing a failed disk in raidz2 zfs (and gpt)
On Thu, Feb 03, 2011 at 07:52:52PM +, Philip M. Gollucci wrote: Do you have a bug ID ? I think it is 6328632. Change 5a60f16123ba. Note, there are many, many other unrelated changes. Do you have any work arounds? From what I can see, this change is in HEAD already, so I'll try that. Will a reboot help ? No idea, sorry. -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! pgpEXAC6VatmN.pgp Description: PGP signature
Re: Replacing a failed disk in raidz2 zfs (and gpt)
On Thu, Feb 03, 2011 at 08:08:15PM +, Philip M. Gollucci wrote: On 02/03/11 20:02, Pawel Jakub Dawidek wrote: On Thu, Feb 03, 2011 at 07:52:52PM +, Philip M. Gollucci wrote: Do you have a bug ID ? I think it is 6328632. Change 5a60f16123ba. Note, there are many, many other unrelated changes. Do you have any work arounds? From what I can see, this change is in HEAD already, so I'll try that. Do you have a pointer to how to get the hg repo handy. There's no diff there. The repo is still online: ssh://a...@hg.opensolaris.org/hg/onnv/onnv-gate But if you are thinking about extracting only part of the change responsible for your problem that might not be easy. -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! pgpmkyX9M3bLW.pgp Description: PGP signature
Re: [PATCH] OpenSolaris/ZFS: C++ compatibility
On Fri, Feb 04, 2011 at 11:03:53AM -0700, Justin T. Gibbs wrote: The attached patch is sufficient to allow a C++ program to use libzfs. The motivation for these changes is work I'm doing on a ZFS fault handling daemon that is written in C++. SpectraLogic's intention is to return this work to the FreeBSD project once it is a bit more complete. Since these changes modify files that come from OpenSolaris, I want to be sure I understand the project's policies regarding divergence from the vendor before I check them in. All of the changes save one should be trivial to merge with vendor changes and I will do that work for the v28 import. Is there any reason I should not commit these changes? Now that OpenSolaris is dead we don't have to be so strict with keeping the diff against vendor small at all cost. I'd prefer not to modify vendor code whenever possible so it is easier for us to cooperate with IllumOS (we already took some code from them). Me and my company are also interested in a fault management daemon (although not restricted to ZFS, but a more general purpose mechanism like FMA in Solaris). My question would be are there any chances you may be convinced to use plain C? With C we might be able to help, but not with C++. -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! pgphkmODt5wu8.pgp Description: PGP signature
Re: [PATCH] OpenSolaris/ZFS: C++ compatibility
On Sat, Feb 05, 2011 at 02:36:40PM -0700, Justin T. Gibbs wrote: On 2/5/2011 8:39 AM, Pawel Jakub Dawidek wrote: On Fri, Feb 04, 2011 at 11:03:53AM -0700, Justin T. Gibbs wrote: The attached patch is sufficient to allow a C++ program to use libzfs. The motivation for these changes is work I'm doing on a ZFS fault handling daemon that is written in C++. SpectraLogic's intention is to return this work to the FreeBSD project once it is a bit more complete. Since these changes modify files that come from OpenSolaris, I want to be sure I understand the project's policies regarding divergence from the vendor before I check them in. All of the changes save one should be trivial to merge with vendor changes and I will do that work for the v28 import. Is there any reason I should not commit these changes? Now that OpenSolaris is dead we don't have to be so strict with keeping the diff against vendor small at all cost. I'd prefer not to modify vendor code whenever possible so it is easier for us to cooperate with IllumOS (we already took some code from them). Perhaps IllumOS will accept these changes back? As I mentioned in the change descriptions included with the patch, the header files already show the intention of providing C++ support (extern "C" blocks), they just don't quite deliver. The changes shouldn't be controversial. Sure. To be clear: I'm not against those changes, I think they are worth it. And getting IllumOS to accept them back is definitely a good idea. Me and my company are also interested in a fault management daemon (although not restricted to ZFS, but a more general purpose mechanism like FMA in Solaris). We have talked internally about this at Spectra too. Since we don't have BSD licensed nvpair code, we've thought of using Google protocol buffers to allow extensible encoding of fault data. The GP implementation is MIT licensed and looks like it might be less cumbersome to use than nvpairs. 
For the first release of our product, however, we are just making do with the string data that devctl provides. I've developed a similar API during HAST work, maybe it is a good starting point? src/sbin/hastd/nv.{c,h}. My question would be are there any chances you may be convinced to use plain C? With C we might be able to help, but not with C++. The core FMA support needs to be reasonably accessible from C code of course (fully functional and not cumbersome to use). But we should allow FMA agents to be coded in whatever language is convenient to the developer. The project may only be able to accept agents in C (and I'm voting for C++ too) into its distribution, but that policy should not drive us to make the FMA architecture hard to access from shell, python, ruby, or some other language. Yes, agents should not be limited to one language. I wouldn't be surprised if the majority of agents turn out to be shell scripts. The reason I chose C++ for this task is that devd, the source of the events I process, already requires C++ so using C++ in zfsd doesn't impose any new requirements on the system. Zfsd, like even the C kernel of FreeBSD is coded in an object oriented fashion, but it's much cleaner to implement this type of design in a language that inherently supports object oriented concepts. Could I rewrite all that I have in C? Sure, but there would have to be some compelling reasons to offset the reduction in clarity and maintainability such a change would cause. Hmm, so zfsd will receive events from devd? I'm of the opinion that we should leave devd alone. In my initial port I used devd, because it was the closest match, but if we want to clean it up, we shouldn't go through devd. For example ZFS v28 can report whole binary blocks where the checksum doesn't match, and passing those through devd would be cumbersome. Is your inability to help on a C++ version of this code due to distaste for C++ or just a lack of experience with it? The latter. 
I'm sure there are many committers that are fluent in C++, but all of them know C. I was under the impression that Warner implemented devd in C++ also as a kind of experiment, which nobody really followed. -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! pgpQQMrZ5Hwdv.pgp Description: PGP signature
HEADS UP: ZFSv28 is in!
Hi. I just committed ZFSv28 to HEAD. New major features: - Data deduplication. - Triple parity RAIDZ (RAIDZ3). - zfs diff. - zpool split. - Snapshot holds. - zpool import -F. Allows rewinding a corrupted pool to an earlier transaction group. - Possibility to import a pool in read-only mode. PS. If you like my work, you can help me promote yomoli.com:) http://yomoli.com http://www.facebook.com/pages/Yomolicom/178311095544155 -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpGTPfcT34QE.pgp Description: PGP signature
Re: HEADS UP: ZFSv28 is in!
On Sun, Feb 27, 2011 at 04:03:01PM -0700, Shawn Webb wrote: I'm so excited for your work. Thanks so much for bringing zpool v28 to FreeBSD. Will v28 come to 8-stable? Yes, hopefully in 1-2 month(s). -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgp1UOEA9rzOR.pgp Description: PGP signature
Re: HEADS UP: ZFSv28 is in!
On Mon, Feb 28, 2011 at 10:37:25AM +, krad wrote: On 28 February 2011 08:47, Pawel Jakub Dawidek p...@freebsd.org wrote: On Sun, Feb 27, 2011 at 04:03:01PM -0700, Shawn Webb wrote: I'm so excited for your work. Thanks so much for bringing zpool v28 to FreeBSD. Will v28 come to 8-stable? Yes, hopefully in 1-2 month(s). -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com I've never managed to boot off my 4k aligned pool (ashift=12) on stable, does the import to head provide all the patches for this or is it a case of using the latest zfs v28 patch set for stable? I have no dying need for v28 yet, I just want to be able to boot onto the 4k drive and tidy things up. Support for this is included in what I committed to HEAD. Even HEAD couldn't boot off of pools with ashift != 9 until now. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpoBcg2ska7K.pgp Description: PGP signature
Re: HEADS UP: ZFSv28 is in!
On Mon, Feb 28, 2011 at 08:34:08AM +0100, Martin Sugioarto wrote: PS. If you like my work, you help me to promote yomoli.com:) http://yomoli.com http://www.facebook.com/pages/Yomolicom/178311095544155 I would like, but you should at least tell me what it is (what will be sold there). I don't like to advertise things I don't know or even things that seem evil to me. I'll post your answer to a well-known German *BSD forum, if you want. Well, I didn't want to say too much about it here, as it isn't really related to FreeBSD. This is a startup I'm working on which is location-based chat, which allows users to communicate with their neighborhood. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpe1gJOLMeSe.pgp Description: PGP signature
Re: [head tinderbox] failure on ia64/ia64
On Mon, Mar 07, 2011 at 01:06:11AM +, FreeBSD Tinderbox wrote: TB --- 2011-03-07 00:25:55 - tinderbox 2.6 running on freebsd-current.sentex.ca TB --- 2011-03-07 00:25:55 - starting HEAD tinderbox run for ia64/ia64 TB --- 2011-03-07 00:25:55 - cleaning the object tree TB --- 2011-03-07 00:26:06 - cvsupping the source tree TB --- 2011-03-07 00:26:06 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup.sentex.ca /tinderbox/HEAD/ia64/ia64/supfile TB --- 2011-03-07 00:26:19 - building world TB --- 2011-03-07 00:26:19 - MAKEOBJDIRPREFIX=/obj TB --- 2011-03-07 00:26:19 - PATH=/usr/bin:/usr/sbin:/bin:/sbin TB --- 2011-03-07 00:26:19 - TARGET=ia64 TB --- 2011-03-07 00:26:19 - TARGET_ARCH=ia64 TB --- 2011-03-07 00:26:19 - TZ=UTC TB --- 2011-03-07 00:26:19 - __MAKE_CONF=/dev/null TB --- 2011-03-07 00:26:19 - cd /src TB --- 2011-03-07 00:26:19 - /usr/bin/make -B buildworld World build started on Mon Mar 7 00:26:20 UTC 2011 Rebuilding the temporary build tree stage 1.1: legacy release compatibility shims stage 1.2: bootstrap tools stage 2.1: cleaning up the object tree stage 2.2: rebuilding the object tree stage 2.3: build tools stage 3: cross tools stage 4.1: building includes stage 4.2: building libraries stage 4.3: make dependencies [...] mkdep -f .depend -a /src/sbin/growfs/growfs.c echo growfs: /obj/ia64.ia64/src/tmp/usr/lib/libc.a .depend === sbin/gvinum (depend) rm -f .depend mkdep -f .depend -a-I/src/sbin/gvinum/../../sys /src/sbin/gvinum/gvinum.c /src/sbin/gvinum/../../sys/geom/vinum/geom_vinum_share.c echo gvinum: /obj/ia64.ia64/src/tmp/usr/lib/libc.a /obj/ia64.ia64/src/tmp/usr/lib/libreadline.a /obj/ia64.ia64/src/tmp/usr/lib/libtermcap.a /obj/ia64.ia64/src/tmp/usr/lib/libdevstat.a /obj/ia64.ia64/src/tmp/usr/lib/libkvm.a /obj/ia64.ia64/src/tmp/usr/lib/libgeom.a .depend === sbin/hastctl (depend) make: don't know how to make hast_compression.c. Stop *** Error code 2 Interesting race. hast_compression.c was added in the same commit it was added to hastctl Makefile. 
-- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com
Re: missing files in readdir(3) on NFS export of ZFS volume (since v28?)
On Mon, Mar 07, 2011 at 01:08:46AM +0100, Pierre Beyssac wrote: Hello, I'm running a 9-current server as compiled on Sat Mar 5 02:17:14 CET 2011. Since I upgraded to ZFS v28 I noticed missing files from NFS. The files are still accessible through NFS but they don't show up on a readdir(3). [...] Could you try r219404? -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpeiqDGOvkQL.pgp Description: PGP signature
Re: Any success stories for HAST + ZFS?
On Thu, Mar 24, 2011 at 01:36:32PM -0700, Freddie Cash wrote: I've tried with FreeBSD 8.2-RELEASE, 8-STABLE, 8-STABLE w/ZFSv28 patches, and 9-CURRENT (after the ZFSv28 commit). Things work well until I start hastd. Then either the system locks up, or hastd causes a kernel panic, or hastd dumps core. The minimum amount of information (as always) would be backtrace from the kernel and also hastd backtrace when it coredumps. There is really decent logging in hast, so I'm also sure it does log something interesting on primary or secondary. Another useful thing would be to turn on debugging in hast (single -d option for hastd). The best you can do is to give me the simplest and quickest procedure to reproduce the issue, eg. configure two hast resources, put ZFS mirror on top, start rsync /usr/src to the file system on top of hast and switch roles. The simpler the better. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpYcvgL105vI.pgp Description: PGP signature
Re: Any success stories for HAST + ZFS?
On Thu, Mar 24, 2011 at 01:36:32PM -0700, Freddie Cash wrote: [Not sure which list is most appropriate since it's using HAST + ZFS on -RELEASE, -STABLE, and -CURRENT. Feel free to trim the CC: on replies.] I'm having a hell of a time making this work on real hardware, and am not ruling out hardware issues as yet, but wanted to get some reassurance that someone out there is using this combination (FreeBSD + HAST + ZFS) successfully, without kernel panics, without core dumps, without deadlocks, without issues, etc. I need to know I'm not chasing a dead rabbit. I just committed a fix for a problem that might look like a deadlock. With trociny@ patch and my last fix (to GEOM GATE and hastd) do you still have any issues? -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpfaqPYEbyOO.pgp Description: PGP signature
Re: panic: g_eli_key_hold: sc_ekeys_total=1
On Fri, Apr 22, 2011 at 05:04:01PM +0200, Fabian Keil wrote: With sources from today my system panics at boot time after attaching the swap device: GEOM_ELI: Device ada0s1b.eli created. GEOM_ELI: Encryption: AES-XTS 256 GEOM_ELI: Crypto: software panic: g_eli_key_hold: sc_ekeys_total=1 cpuid = 0 KDB: enter: panic Uptime: 2m16s Physical memory: 1974 MB Dumping 213 MB: 198 182 166 150 134 118 102 86 70 54 38 22 6 [...] Could you provide the output of: # diskinfo -v /dev/ada0s1b And could you try: # /sbin/geli onetime -l 256 -s 4096 /dev/ada0s1b -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgp1PdPS9g7QC.pgp Description: PGP signature
Re: panic: g_eli_key_hold: sc_ekeys_total=1
On Sun, Apr 24, 2011 at 11:12:03AM +0200, Fabian Keil wrote: The panic can be reproduced with: /sbin/geli onetime -l 256 -s 4096 /dev/ada0s1b That's why I asked for ada0s1b size. It should be fixed in HEAD (r220984). -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpAuN5zvAX8k.pgp Description: PGP signature
Re: geli on r221012
On Mon, Apr 25, 2011 at 01:31:55PM +, Anton Yuzhaninov wrote: Geli no longer works for me after upgrade to r221012. # geli attach -k ~citrin/private.key /dev/label/spool2 Enter passphrase: # from dmesg: GEOM_ELI: Device label/spool2.eli created. GEOM_ELI: Encryption: Blowfish-CBC 128 GEOM_ELI: Integrity: HMAC/MD5 GEOM_ELI: Crypto: software # dd if=/dev/label/spool2.eli of=/dev/null dd: /dev/label/spool2.eli: Invalid argument 0+0 records in 0+0 records out 0 bytes transferred in 0.000669 secs (0 bytes/sec) Thanks for the report! It should be fixed in r221628. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpUlmhHjBPXE.pgp Description: PGP signature
Re: Randomization in hastd(8) synchronization thread
On Tue, May 17, 2011 at 12:39:19PM -0700, Maxim Sobolev wrote:
> Hi Pawel,
>
> I am trying to use hastd(8) over slow links and one problem is apparent right now - current approach with synchronizing content sequentially is not working in this case. What happens is that hastd hits the first frequently updated block and cannot make any progress anymore. In my case I have 30GB of dirty space to be synchronized over just 1mbps uplink.
>
> The quick fix that I've applied is randomization in the block selection code. This way eventually all least used blocks will be synchronized, leaving only hot ones dirty. More effective approach would be to use some kind of LRU selection algorithm, but statistical approach would work just as good in this case.
>
> Please review the patch below:
>
> http://sobomax.sippysoft.com/activemap.c.diff

Hmm, hastd keeps a separate bitmap for synchronization. It is stored in the am_syncmap field. Blocks that are dirtied during regular writes should not affect the synchronization bitmap or the synchronization progress.

-- 
Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com
LOR (ffs_snapshot.c:651 vm_map.c:2258).
Hello. lock order reversal 1st 0xc66a6db0 vnode interlock (vnode interlock) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:651 2nd 0xc0c2f110 system map (system map) @ /usr/src/sys/vm/vm_map.c:2258 Stack backtrace: backtrace(c05bbfcb,c0c2f110,c05c650b,c05c650b,c05c6581) at backtrace+0x17 witness_lock(c0c2f110,8,c05c6581,8d2,c0c2f0b0) at witness_lock+0x686 _mtx_lock_flags(c0c2f110,0,c05c6581,8d2,c6aee000) at _mtx_lock_flags+0xb5 _vm_map_lock(c0c2f0b0,c05c6581,8d2,c69e61b0,0) at _vm_map_lock+0x36 vm_map_remove(c0c2f0b0,c6aee000,c6af,e1b1a7f0,c0555f99) at vm_map_remove+0x30 kmem_free(c0c2f0b0,c6aee000,2000,e1b1a80c,c05579f9) at kmem_free+0x32 page_free(c6aee000,2000,22,c060c4b8,c05e9100) at page_free+0x3a uma_large_free(c69e61b0,e1b1a83c,c0487f64,c66a6db0,2000) at uma_large_free+0xf9 free(c6aee000,c05e9100,c05c3358,28b,c25aff00) at free+0xe9 ffs_snapshot(c6522600,80c39a0,70,c04b5d36,c060d3e0) at ffs_snapshot+0x23f4 ffs_mount(c6522600,c69c4380,bfbffcc0,e1b1abf0,c6496720) at ffs_mount+0x617 vfs_mount(c6496720,c258ecd0,c69c4380,1211000,bfbffcc0) at vfs_mount+0x7d1 mount(c6496720,e1b1ad14,c05cd44e,3ee,4) at mount+0xba syscall(2f,2f,2f,0,bfbffdc0) at syscall+0x28f Xint0x80_syscall() at Xint0x80_syscall+0x1d --- syscall (21), eip = 0x80557bb, esp = 0xbfbffb6c, ebp = 0xbfbffd48 --- -- Pawel Jakub Dawidek [EMAIL PROTECTED] UNIX Systems Programmer/Administrator http://garage.freebsd.pl Am I Evil? Yes, I Am! http://cerber.sourceforge.net pgp0.pgp Description: PGP signature
Panic after mount() fail.
Hello.

There is a problem with mount(2) failures: they can cause panics.

How to repeat:

	# dd if=/dev/random of=/test.img bs=1m count=8
	# mdconfig -a -t vnode -f /test.img -u 25
	# mkdir -p /mnt/test
	# mount /dev/md25 /mnt/test
	(fail)
	# mount /dev/md25 /mnt/test
	(panic Memory modified after free ...)

This is because the mutex is not destroyed on failure.

Patch:

--- vfs_mount.c.orig	Sun Nov 16 15:46:56 2003
+++ vfs_mount.c	Sun Nov 16 15:21:48 2003
@@ -1061,6 +1061,7 @@ update:
 			vfs_unbusy(mp, td);
 		else {
 			mp->mnt_vfc->vfc_refcount--;
+			mtx_destroy(&mp->mnt_mtx);
 			vfs_unbusy(mp, td);
 #ifdef MAC
 			mac_destroy_mount(mp);
@@ -1142,6 +1143,7 @@ update:
 		vp->v_iflag &= ~VI_MOUNT;
 		VI_UNLOCK(vp);
 		mp->mnt_vfc->vfc_refcount--;
+		mtx_destroy(&mp->mnt_mtx);
 		vfs_unbusy(mp, td);
 #ifdef MAC
 		mac_destroy_mount(mp);

-- 
Pawel Jakub Dawidek [EMAIL PROTECTED] UNIX Systems Programmer/Administrator http://garage.freebsd.pl Am I Evil? Yes, I Am! http://cerber.sourceforge.net
Re: panic: sleeping without a mutex (acd related)
On Tue, Nov 25, 2003 at 11:21:03AM +0100, Christian Laursen wrote:
+ I have been experiencing some random lockups after upgrading from
+ 5.1-RELEASE to 5.2-BETA.
+
+ I then went on and enabled all the debug options in my kernel config
+ hoping to be able to find the cause.
+
+ But now I cannot boot at all. In the end of the boot process when
+ detecting ATA drives, I get this:
+
+ ad0: 76319MB ST380011A [155061/16/63] at ata0-master UDMA100
+ acd0-5: CDROM with 6 CD changer CD-C68E at ata1-master PIO4
+ acd6: DVDROM CREATIVEDVD5240E-1 at ata1-slave PIO4
+ panic: sleeping without a mutex
+ Debugger(panic)
+ Stopped at Debugger+0x54: xchgl %ebx,in_Debugger.0
+ db>
+ db> trace
+ Debugger(c06e3744,c07549a0,c06e3ec9,d861ab60,100) at Debugger+0x54
+ panic(c06e3ec9,0,c06e3eb8,c06d6584,10) at panic+0xd5
+ msleep(c45173d8,0,4c,c06d6584,0) at msleep+0x505
+ acd_geom_access(c452de00,1,0,0,0) at acd_geom_access+0x115

Yeah. There are two calls to tsleep(9) without a timeout set (at lines 499 and 509), so this KASSERT is reached:

	KASSERT(timo != 0 || mtx_owned(&Giant) || mtx != NULL,
	    ("sleeping without a mutex"));

-- 
Pawel Jakub Dawidek [EMAIL PROTECTED] UNIX Systems Programmer/Administrator http://garage.freebsd.pl Am I Evil? Yes, I Am! http://cerber.sourceforge.net
Panic: if_simloop: attempted use of a free mbuf!
Hello. I'm reaching assertion from /sys/net/if_loop.c:270. This is very easy to reproduce: First you need to put loopback into promiscuous mode: # tcpdump -i lo0 Then try to connect to loopback, for example: # telnet 127.0.0.1 22 Enjoy!:) -- Pawel Jakub Dawidek [EMAIL PROTECTED] UNIX Systems Programmer/Administrator http://garage.freebsd.pl Am I Evil? Yes, I Am! http://cerber.sourceforge.net pgp0.pgp Description: PGP signature
Re: jail and emulators/linux_base
On Wed, Dec 03, 2003 at 10:22:16AM +0100, Niklas Saers Mailinglistaccount wrote: + I'm running CURRENT and set up a jail where I want to install SUN JDK + 1.4.2. In the process, linux emulation needs to be installed. While + installing emulators/linux_base, I get the following: + + === Installing for linux_base-7.1_5 + Un-mounting linprocfs... + umount: retrying using path instead of file system ID + === Generating temporary packing list + === Checking if emulators/linux_base already installed + mknod: /compat/linux/dev/null: Operation not permitted + *** Error code 1 + + While Linux-emulation is already up and running on the host-machine, it + seems the jail is not allowed to create what it needs to run it. I + understand allowing mknod(8) within a jail is dangerous in the case where + you allow untrusted users to be root. Is there some way to either say I + don't let untrusted users be root thus allowing this or to compile + emulators/linux_base more jail-friendly, possibly setting things up from + outside the jail? Erm. You may install it using chroot(8) only and then run jail with the same path. You may also use chroot(8) instead of jail if you're looking for full functionality. -- Pawel Jakub Dawidek [EMAIL PROTECTED] UNIX Systems Programmer/Administrator http://garage.freebsd.pl Am I Evil? Yes, I Am! http://cerber.sourceforge.net pgp0.pgp Description: PGP signature
HAST (Highly Available Storage) now in HEAD.
Hi.

Yesterday I committed HAST to the HEAD branch.

HAST makes it possible to transparently store data on two physically separated machines connected over a TCP/IP network. HAST works in a Primary-Secondary (Master-Backup, Master-Slave) configuration, which means that only one of the cluster nodes can be active at any given time. Only the Primary node is able to handle I/O requests to HAST-managed devices. Currently HAST is limited to two cluster nodes in total.

HAST operates on the block level - it provides disk-like devices in the /dev/hast/ directory for use by file systems and/or applications. Working on the block level makes it transparent for file systems and applications. There is no difference between using a HAST-provided device and a raw disk, partition, etc. All of them are just regular GEOM providers in FreeBSD.

For more information please consult the hastd(8), hastctl(8) and hast.conf(5) manual pages, as well as:

	http://wiki.FreeBSD.org/HAST

On the wiki page above you should find instructions on how to initialize hast and integrate it with ucarp.

Let me know (using the freebsd...@freebsd.org mailing list) if you have any questions or comments.

And last, but not least, I'd like to thank the sponsors who made this project possible:

	The FreeBSD Foundation, http://www.freebsdfoundation.org
	OMCnet Internet Service GmbH, http://www.omc.net
	TransIP BV, http://www.transip.nl

-- 
Pawel Jakub Dawidek http://www.wheel.pl p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: ZFS: statfs and recordsize problem
On Thu, Feb 18, 2010 at 03:39:28PM +0300, Alexander Zagrebin wrote: I have noticed, that statfs called for ZFS file systems, returns the value of FS's recordsize property in both f_bsize and f_iosize. It's a problem for some software. For example, squid uses block size of cache's file system to calculate the space occupied by file. So by default it considers that any small file uses 128KB of a cache (when default value of recordsize is used), though really this file may use 512B only. This miscalculation leads to unreasonable cleaning of a cache. IMHO the behavior of statfs have to be changed, as ZFS uses variable (up to recordsize) block sizes. It must return 512 as f_bsize and recordsize as f_iosize. One of possible solutions is the attached patch. Could somebody look it? I committed (slightly modified version of) your patch to HEAD. Thanks! -- Pawel Jakub Dawidek http://www.wheel.pl p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! pgp67WCYnRd70.pgp Description: PGP signature
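To make the miscalculation described above concrete, here is a minimal C sketch. It is not squid's actual accounting code; `estimated_usage` is a hypothetical helper that rounds a file size up to a whole number of blocks, which is roughly what a program does when it trusts the block size reported in f_bsize.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from squid or from the ZFS patch):
 * estimate the space a file occupies by rounding its size up to a
 * whole number of blocks of size bsize.
 */
static uint64_t
estimated_usage(uint64_t file_size, uint64_t bsize)
{
	assert(bsize > 0);
	return (((file_size + bsize - 1) / bsize) * bsize);
}
```

With bsize set to a 128KB recordsize, a 300-byte file is charged 131072 bytes; with bsize set to 512 it is charged 512 bytes, which is much closer to what ZFS may really use for such a file.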
Re: check for jailed environment for adjkerntz
On Mon, Mar 01, 2010 at 02:15:41AM +0300, Subbsd wrote:
> jail with complete type have standard crontab a file of tasks. However not all standard task are adapted for work in jail an environment. For example adjkerntz which generates
>
> adjkerntz [46733]: sysctl (set: machdep.wall_cmos_clock): Operation not permitted
>
> I suggest to give adjkerntz concept about jail in which to it it is not necessary to work:
> [...]

I have always found that annoying too, but only your e-mail made me think about ways to fix it. Maybe a simple patch like the one below will do?

--- etc/crontab	(revision 204363)
+++ etc/crontab	(working copy)
@@ -22,4 +22,4 @@
 #
 # Adjust the time zone if the CMOS clock keeps local time, as opposed to
 # UTC time.  See adjkerntz(8) for details.
-1,31	0-5	*	*	*	root	adjkerntz -a
+1,31	0-5	*	*	*	root	[ `sysctl -n security.jail.jailed` -eq 0 ] && adjkerntz -a

-- 
Pawel Jakub Dawidek http://www.wheel.pl p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: Increasing MAXPHYS
On Mon, Mar 22, 2010 at 08:23:43AM +, Poul-Henning Kamp wrote:
> In message 4ba633a0.2090...@icyb.net.ua, Andriy Gapon writes:
> on 21/03/2010 16:05 Alexander Motin said the following:
> Ivan Voras wrote:
> Hmm, it looks like it could be easy to spawn more g_* threads (and, barring specific class behaviour, it has a fair chance of working out of the box) but the incoming queue will need to also be broken up for greater effect.
>
> According to notes, looks there is a good chance to obtain races, as some places expect only one up and one down thread.
>
> I haven't given any deep thought to this issue, but I remember us discussing them over beer :-)
>
> The easiest way to obtain more parallelism, is to divide the mesh into multiple independent meshes. This will do you no good if you have five disks in a RAID-5 config, but if you have two disks each mounted on its own filesystem, you can run a g_up & g_down for each of them.

A class is supposed to interact with other classes only via GEOM, so I think it should be safe to choose g_up/g_down threads for each class individually, for example:

	/dev/ad0s1a (DEV)
	       |
	g_up_0 + g_down_0
	       |
	  ad0s1a (BSD)
	       |
	g_up_1 + g_down_1
	       |
	   ad0s1 (MBR)
	       |
	g_up_2 + g_down_2
	       |
	   ad0 (DISK)

We could easily calculate the g_down thread based on bio_to->geom->class and the g_up thread based on bio_from->geom->class, so we know I/O requests for our class are always coming from the same threads.

If we could make the same assumption for geoms it would allow for even better distribution.

-- 
Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
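The per-class thread selection suggested above could look roughly like the following C sketch. Everything in it is an assumption made for illustration: the `sk_*` structures are simplified stand-ins rather than the real GEOM structures, the `gclass` and `down_thread` fields are invented names, and `pick_down_thread` is a hypothetical helper, not an existing GEOM function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified, hypothetical stand-ins for the GEOM structures; the real
 * ones have many more fields and no down_thread member.
 */
struct sk_class {
	const char	*name;
	unsigned	 down_thread;	/* assumed: g_down thread index for this class */
};
struct sk_geom {
	struct sk_class	*gclass;
};
struct sk_provider {
	struct sk_geom	*geom;
};
struct sk_bio {
	struct sk_provider *bio_to;
};

/*
 * Choose the g_down thread from the class of the geom the request is
 * sent to, so all requests entering one class use the same thread.
 */
static unsigned
pick_down_thread(const struct sk_bio *bp)
{
	assert(bp != NULL && bp->bio_to != NULL);
	return (bp->bio_to->geom->gclass->down_thread);
}
```

In this sketch two requests aimed at different geoms of the same class map to the same thread index, which is the property the proposal relies on. A g_up counterpart would walk bio_from in the same way.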
Re: ZFS behavior when device disappears
On Tue, Apr 13, 2010 at 05:39:30PM -0600, Jason J. W. Williams wrote:
> Hello,
>
> Currently, we're an OpenSolaris shop but with the way things are going over at Oracle/Sun we're starting to evaluate our options for keeping ZFS but moving off Solaris. One of my concerns is that FreeBSD is implementing ZFSv14 (ZFS itself is up to v23 I believe). For quite a long time, ZFS under Solaris had a real problem with the following scenario:
>
> * Hard drive starts to die
> * Controller and SCSI subsystem continue to retry an I/O rather than failing fast
> * Even if the I/O does fail fast ZFS doesn't really notice a spike in I/O failures and continues to use the drive.
> * Result: I/O on the zpool stalls completely while the I/Os continue to be tried against the drive.
>
> This got fixed in later revs of OpenSolaris by enhancements to ZFS and greater integration with the Fault Management Architecture (FMA) of Solaris...lots of I/Os failing on a drive get communicated to ZFS who then offlines the drive out of the pool.
>
> My question is, what is the situation in FreeBSD 8 with ZFS if that type of situation occurs?

I believe FreeBSD does whatever OpenSolaris did for this version of ZFS.

There is ongoing work to bring v24 to FreeBSD. Basic functionality works already, but a lot of work is still needed. At some point I'll see what we can do about it, because we don't have FMA in FreeBSD and we would need to find another way to deal with it.

I have limited time I can spend on ZFS right now, so I'm making small steps, but I'm making good progress too.

-- 
Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: ZFS behavior when device disappears
On Tue, Apr 20, 2010 at 07:24:53AM -0600, Jason J. W. Williams wrote:
> Hi Pawel,
>
> Thank you very much for the response! Please forgive some of my questions, as I'm a bit unfamiliar with the FreeBSD port. What is the nature of the port? Is it something where each new version of ZFS is a from-scratch effort to some degree? Or is it a point where new ZFS versions are a matter of just making the newer features operational?

Definitely the latter, but there are some problems:

- Some changes in OpenSolaris ZFS are very hard to port in a short time, and when porting takes a lot of time, new versions arrive and it is nice to get them too, etc., which makes the whole process take a long time. A good example here is moving some functionality to Python, where we have to decide what to do about that without importing Python into the base system.

- OpenSolaris ZFS is experimental and I don't think the Solaris version is published anywhere. This means it needs extensive testing on our side, which of course takes time.

- OpenSolaris changes are often not easy to understand. They have different commit rules than we have. Commit logs are not very helpful and multiple fixes are committed in one go, which makes it hard to separate individual changes if we just need a fix and not the intrusive change that came along with it.

I'm doing my best, but my time is limited. I see more and more people are interested in helping with ZFS, which is a very good sign I have been waiting for for a long time:)

It is of course still wonderful that we can use ZFS. All my servers and my laptop are running exclusively on ZFS at this point:)

-- 
Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: Switchover to CAM ATA?
On Mon, Apr 26, 2010 at 10:33:27AM -0600, M. Warner Losh wrote:
> I've read most of this thread. I think this is cool technology. However, before we move forward with this, we need to have a plan for the various issues that have come up. The plan needs to be specific, have owners for key items, warnings about ownerless == obsoleted, and target dates.
>
> I think this is one of the cases where we should record the plan of record on a wiki. It worked well for other times we've had big, disruptive changes.
>
> My opinion for the path forward:
> (1) Send a big heads up about the future of ataraid(5). It will be shot in the head soon, to be replaced be a bunch of geom classes for each different container format. At least that seems to be the rough consensus I've seen so far. We need worker bees to do many of these classes, although much can be mined from the ataraid code today.

This shouldn't be a bunch of GEOM classes. This should be one class which recognizes multiple formats, just like the LABEL class. I don't think it is feasible to reuse gmirror for that; it wasn't designed with something like this in mind.

> (2) Send another big heads up strongly recommending people go to glabel based fstabs. Maybe the right option here is to provide a simple script walk people through the conversion. This will render the carnage of ad - ada (or da) a mostly non-event, and also protect people from 'oops' of rebooting with that thumb drive in the system.
> (3) Create a wiki to record all the new geom classes needed. Find people to own each one, or note it is unowned, and support will be dropped if no owner can be found.
> (4) sysinstall should default to creating label systems, if it doesn't already.
> (5) Issues with glabel and ataraid(5) need an owner, and need to be resolved, since the device names here are likely to change.

What are the issues?

-- 
Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: Switchover to CAM ATA?
On Mon, Apr 26, 2010 at 12:19:46PM -0600, M. Warner Losh wrote: In message: 20100426181209.gb3...@garage.freebsd.pl Pawel Jakub Dawidek p...@freebsd.org writes: : On Mon, Apr 26, 2010 at 10:33:27AM -0600, M. Warner Losh wrote: : I've read most of this thread. I think this is cool technology. : However, before we move forward with this, we need to have a plan for : the various issues that have come up. The plan needs to be specific, : have owners for key items, warnings about ownerless == obsoleted, and : target dates. : : I think this is one of the cases where we should record the plan of : record on a wiki. It worked well for other times we've had big, : disruptive changes. : : My opinion for the path forward: : (1) Send a big heads up about the future of ataraid(5). It will be : shot in the head soon, to be replaced be a bunch of geom classes : for each different container format. At least that seems to be : the rough consensus I've seen so far. We need worker bees to do : many of these classes, although much can be mined from the ataraid : code today. : : This shouldn't be a bunch of GEOM classes. This should one class which : recognize multiple formats, just like the LABEL class. : I don't think it is feasible to reuse gmirror for that, it wasn't : designed in something like this in mind. OK. Maybe I got the consensus wrong... My key point is that we need a plan moving forward, we need to identify what's actively being worked on vs somebody else[tm] should do tihs and when it needs to be done or else. You most likely got it right, I'm just saying creating separate GEOM class for each metadata format is wrong direction. :) : (5) Issues with glabel and ataraid(5) need an owner, and need to be : resolved, since the device names here are likely to change. : : What are the issues? ataraid doesn't remove the underlying ad* devices, so glabel often picks those up instead of the ataraid device, and you only get 1 disk's worth of raid device... 
> So no mirroring or only 1/2 a striped volume.

It not only leaves the ad* devices in place, it doesn't even open them properly using GEOM. It's an internal ATA hack, which is a PITA.

-- 
Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: AESNI driver and fpu_kern KPI
On Sat, May 15, 2010 at 01:04:01PM +0300, Kostik Belousov wrote:
> Hello,
> please find at http://people.freebsd.org/~kib/misc/aesni.1.patch the combined patch, containing the fpu_kern KPI and Intel AESNI crypto(9) driver. I did development and some testing on the hardware generously provided by Sentex Communications to Netperf cluster.

Nice work. A few comments:

- Could you modify this chunk in padlock.c:

	+	td = curthread;
	+	error = fpu_kern_enter(td, ses->ses_fpu_ctx);
	+	if (error != 0)
	+		goto out;
	 	error = padlock_hash_setup(ses, macini);
	+	fpu_kern_leave(td, ses->ses_fpu_ctx);
	+ out:

  to something without goto, e.g.:

	td = curthread;
	error = fpu_kern_enter(td, ses->ses_fpu_ctx);
	if (error == 0) {
		error = padlock_hash_setup(ses, macini);
		fpu_kern_leave(td, ses->ses_fpu_ctx);
	}

- I see that in sys/dev/random/nehemiah.c you don't check the return value of fpu_kern_enter(). That's the only place where you ignore it. Is that intended?

- Unfortunately the driver in its current version can't be used with IPsec or with GELI when authentication is enabled. This is because the driver doesn't support sessions where both encryption and authentication are defined. Do you have plans to change it? I saw that you based the crypto(9) bits on padlock, which does support sessions with authentication by calculating hashes in software.

-- 
Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
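The goto-free shape requested in the first comment can be exercised outside the kernel with a small C sketch. Note the assumptions: `fake_enter` and `fake_leave` are hypothetical userland stand-ins, not the fpu_kern KPI, and `struct fake_ctx` exists only to count calls so the pairing of enter and leave can be checked.

```c
#include <assert.h>

/* Hypothetical stand-in for the saved FPU context; not a kernel type. */
struct fake_ctx {
	int	enter_error;	/* error the fake enter call should return */
	int	depth;		/* enter calls not yet balanced by a leave */
	int	leaves;		/* number of leave calls made so far */
};

static int
fake_enter(struct fake_ctx *c)
{
	if (c->enter_error != 0)
		return (c->enter_error);
	c->depth++;
	return (0);
}

static void
fake_leave(struct fake_ctx *c)
{
	assert(c->depth > 0);	/* leave must follow a successful enter */
	c->depth--;
	c->leaves++;
}

/*
 * Run setup() between enter and leave. Leave is called only when enter
 * succeeded, and the first error seen is the one returned.
 */
static int
guarded_setup(struct fake_ctx *c, int (*setup)(void))
{
	int error;

	error = fake_enter(c);
	if (error == 0) {
		error = setup();
		fake_leave(c);
	}
	return (error);
}

static int setup_ok(void) { return (0); }
static int setup_fail(void) { return (22); }
```

The point of the shape is that there is a single exit path: a failed enter skips both the setup and the leave, while a failed setup still reaches the leave.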
Re: glabel force sectorsize patch
On Sun, Aug 08, 2010 at 02:02:17PM +0200, Ivan Voras wrote: On 8.8.2010 12:30, Pawel Jakub Dawidek wrote: So why do you want to obfuscate glabel with it? For people to start depend on it? Once we start supporting 4kB sectors what do we do with such a change? Remove it and decrease version number? What people will do with providers already labeled this way? If its temporary, just allow to list providers you want to increase sector size in /boot/loader.conf. Once we start supporting it properly people might simply remove it from loader.conf and it should just work. Glabel is not for that and I don't agree for such obfuscation. Of course, there are good and bad sides to it. My take on it is that the only bad side is that it really isn't glabel's primary function to (optionally) fixup geometry, while the good sides are: It isn't its secondary function either. * glabel is in GENERIC and judging by the mailing lists' traffic it is one of the better used parts of the system so people are familiar with it. It is also already used as a perfectly valid fixup for device renaming, making both UFS and ZFS more stable for usage. That's an excellent argument. But you know what? The em(4) is also in GENERIC, why not to add it in there? * You can't really make people depend on glabel both because it is in GENERIC and because of it storing metadata in the last sector, making the rest of the drive completely usable without it in the event native 4k sector support is grown. I never said that. I do want people to depend on glabel, because it is free of such ugly hacks, so I know it won't bite them in the future. I don't want people to start depend on the fact that glabel supports changing sector sizes. Once we start supporting 4kB sectors properly people configuration will stop working, because glabel won't be able to read its metadata anymore. Your hack will break all configurations that started to depend on your hack. 
In what I proposed, the GEOM provider will be presented to glabel (or any other GEOM class) as a 4kB provider and everything will just work, also after proper support for 4kB sectors is added.

> I'd like to hear comments from the wider audience. In respect with your comment, I will compromise: as 4k sector drives have become available over the counter more than 6 months ago and so far I think this is the first effort to give some support for them, I will commit this patch before 9.0 code freeze only if no other support gets developed.

I'll repeat. You won't commit this patch, because it is a totally wrong solution and can only do a lot of damage in the future. If you look forward, even temporary solutions can be done right.

-- 
Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: glabel force sectorsize patch
On Sun, Aug 08, 2010 at 02:57:20PM +0200, Marius Nünnerich wrote: On Sun, Aug 8, 2010 at 14:02, Ivan Voras ivo...@freebsd.org wrote: I'd like to hear comments from the wider audience. In respect with your comment, I will compromise: as 4k sector drives have become available over the counter more than 6 months ago and so far I think this is the first effort to give some support for them, I will commit this patch before 9.0 code freeze only if no other support gets developed. I do not like this at all. Even if it's just for the KISS and POLA principles. A geom should do one thing and do it right imo. Why not write a new geom class that does what you want? New GEOM class only for sectorsize conversion that can operate on metadata will be useful, not only to solve this particular problem. Although keep in mind that if at some point disks will be detected and presented as 4kB providers to the GEOM, this class won't be able to find its metadata anymore (as it was stored in the last 512 bytes, not in the last 4 kilobytes). -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! pgpMenhUo3zq1.pgp Description: PGP signature
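The caveat about metadata location comes down to offset arithmetic, sketched below in C. `last_sector_offset` is a hypothetical helper written for this illustration, not a GEOM function; it only assumes what the message states, namely that a class keeps its metadata in the last sector of the provider.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: byte offset of the last sector of a provider,
 * given its media size and sector size in bytes.
 */
static uint64_t
last_sector_offset(uint64_t mediasize, uint32_t sectorsize)
{
	assert(sectorsize > 0 && mediasize >= sectorsize);
	assert(mediasize % sectorsize == 0);
	return (mediasize - sectorsize);
}
```

For a 1048576-byte provider the last 512-byte sector starts at offset 1048064, while the last 4096-byte sector starts at 1044480. Metadata written at the former offset therefore sits 3584 bytes into the last 4kB sector, which is not where a class reading from the start of that sector would look for it.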
Re: Mounting cd9660 multiple times gives EBUSY [Was: unionfs a little improvement]
On Wed, Aug 18, 2010 at 12:48:53PM +0200, Ed Schouten wrote: Hi Daichi, I think Keith Packard of Xorg once wrote a commit message along the lines of 5000 lines of code removed, feature added This seems to be similar, albeit on a smaller scale. ;-) Apart from this issue with unionfs, I am also experiencing another issue, where for some reason I cannot perform a second mount of the CD right after booting the system. Basically, my WIP FreeBSD boot CD does the following (but written in C): mount -t cd9660 /dev/iso9660/freebsd /mnt mount -t tmpfs none /tmp mount -t unionfs /tmp /mnt mount -t devfs none /mnt/dev chroot /mnt /sbin/init The first step fails with EBUSY. I use the following hack to get it working, but I don't think it's the proper way to solve it: What you are trying to do here is to mount /dev/iso9660/freebsd for the second time? This is not supported. The check is there to prevent doing this, as it will panic on you when you try to unmount first mount (not really a problem in your case, as the first mount is /, so you probably don't want to unmount it, but it is a problem in general). You should be able to reproduce the panic with your patch applied by doing the following: # mount -t cd9660 /dev/iso9660/freebsd /mnt0 # mount -t cd9660 /dev/iso9660/freebsd /mnt1 # umount /mnt0 -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! pgp88NLmz310d.pgp Description: PGP signature
Re: [CFT] Improved ZFS metaslab code (faster write speed)
On Sat, Aug 28, 2010 at 05:03:42AM -0400, jhell wrote: On 08/28/2010 04:20, Andriy Gapon wrote: on 28/08/2010 04:24 jhell said the following: The modified patch from avg@ (portion patch) is: #ifdef _KERNEL if (arc_reclaim_needed()) { needfree = 0; wakeup(needfree); } #endif I still moved that down to below _KERNEL for the obvious reasons. But when I was using the original patch with if (needfree) I noticed a performance degradation after ~12 hours of use with and without UMA turned on. So far with ~48 hours of testing with the top half of that being with the above change, I have not seen more degradation of This is quite unexpected. needfree should be checked as the very first thing in arc_reclaim_needed() [unless you have patched it locally]. So if needfree is 1 then arc_reclaim_needed() should also return 1. But the converse is not true, arc_reclaim_needed() may return 1 even if needfree is zero. So if your testing results are conclusive then it must mean that some extra wakeups on needfree are needed. I.e. needfree is zero, so there shouldn't be anything waiting on it (see arc_lowmem) and no notification should be needed, but issuing somehow does make difference, Hmm... I will look further into this and see if I can throw a counter around it or some printf's so I can at least log what its doing in both instances. I thought the very same thing you said above when I saw your patch for that and was astounded at the results that were returned from it. So in short testing I reverted it back quickly to see if that was the cause of the problem and sure enough everything resumed to the way it was before. Anyway thanks for the reply. I will get back to you if I see anything cool arise from this. Could you include the following patch to your testing: http://people.freebsd.org/~pjd/patches/arc.c.9.patch -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! 
ZFS v28 is ready for wider testing.
Hello.

I'd like to give you ZFS v28 for testing. If you are neither brave nor mad, you can stop here. The patchset is very experimental. It can eat your cookie and hurt your teddy bear, so be warned. Don't try it for anything except testing.

This patchset is also a message we, as the FreeBSD project, would like to send to our users: even though OpenSolaris is dead, the ZFS file system is going to stay in FreeBSD. At this point we have quite a few developers involved in ZFS on FreeBSD, as well as several companies. We are also looking forward to working with IllumOS.

So, what does this new ZFS bring?

- Data deduplication. Read more here: http://blogs.sun.com/bonwick/entry/zfs_dedup
- Triple parity RAIDZ (RAIDZ3). Read more here: http://dtrace.org/blogs/ahl/2009/07/21/triple-parity-raid-z/
- zfs diff. Read more here: http://arc.opensolaris.org/caselog/PSARC/2010/105/20100328_tim.haley
- zpool split. Read more here: http://arc.opensolaris.org/caselog/PSARC/2009/511/20090924_mark.musante
- Snapshot holds. Read more here: http://arc.opensolaris.org/caselog/PSARC/2009/297/20090511_chris.kirby
- zpool import -F. Allows rewinding a corrupted pool to an earlier transaction group.
- Possibility to import a pool in read-only mode.

And much, much more, including plenty of performance improvements and bug fixes.

So test whatever you can and report back. Look for regressions, strange behaviour, missing features, deadlocks, livelocks, performance degradation, etc.

The boot code is not updated at all, so booting off of ZFS doesn't currently work.

The patch is against today's FreeBSD HEAD.

The patch enables (in sys/modules/zfs/Makefile) ZFS internal debugging; please don't turn it off. Also, compile your kernel with the following options:

    options KDB
    options DDB
    options INVARIANTS
    options INVARIANT_SUPPORT
    options WITNESS
    options WITNESS_SKIPSPIN
    options DEBUG_LOCKS
    options DEBUG_VFS_LOCKS

Ignore all the LOR (Lock Order Reversal) reports from WITNESS. There will be plenty of those, and you'll desperately want to report them, but please don't.

The best way to report a problem is to reply to this e-mail with the shortest possible procedure to reproduce it, plus debugging info. I'd prefer a textdump if possible. Below you can find a quick procedure for setting up textdumps:

Choose a spare/swap disk/partition in your system, let's say it is /dev/ad0s1b.

Add the following line to /etc/fstab:

    /dev/ad0s1b none swap sw 0 0

Add the following line to /etc/rc.conf:

    ddb_enable=YES

Run the following commands:

    # /etc/rc.d/swap1 start
    # /etc/rc.d/dumpon start
    # /etc/rc.d/ddb start

This will set up swap, mark it as the dump device and set up some DDB scripts. Or you can just reboot.

Now, when your system panics or deadlocks, enter DDB and run the following command:

    ddb> run kdb.enter.panic

It will execute all the commands I need, dump their output in text format to your swap device and reboot the machine. After the reboot, you should find a textdump.tar.0 file in the /var/crash/ directory. This is the debug info I need.

End of textdumps procedure.

Ok, now that I know you read everything carefully, here is the patch:

    http://people.freebsd.org/~pjd/patches/zfs_20100831.patch.bz2

Good luck!

-- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: ZFS v28 is ready for wider testing.
On Tue, Aug 31, 2010 at 11:59:15PM +0200, Pawel Jakub Dawidek wrote: Ok, now that I know you read everything carefully, here is the patch: http://people.freebsd.org/~pjd/patches/zfs_20100831.patch.bz2 Important note. Please patch with the following command: # patch -E -p0 < zfs_20100831.patch If you don't use the -E option, patch(1) won't remove empty files and you won't be able to compile the result. -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: ZFS v28 is ready for wider testing.
On Thu, Sep 02, 2010 at 01:55:51AM -0700, Rob Farmer wrote:
> On Tue, Aug 31, 2010 at 2:59 PM, Pawel Jakub Dawidek p...@freebsd.org wrote:
> > Ok, now that I know you read everything carefully, here is the patch:
> > http://people.freebsd.org/~pjd/patches/zfs_20100831.patch.bz2
> buildworld on i386 (yes I know ZFS isn't ideal there): [...]

Yes, I know about this problem. You can use the attached patch or wait for the full patch, which I'll be sending later today.

-- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!

--- sys/cddl/compat/opensolaris/sys/atomic.h
+++ sys/cddl/compat/opensolaris/sys/atomic.h
@@ -39,10 +39,9 @@
 #ifndef __LP64__
 extern void atomic_add_64(volatile uint64_t *target, int64_t delta);
 extern void atomic_dec_64(volatile uint64_t *target);
-extern void *atomic_cas_ptr(volatile void *target, void *cmp, void *newval);
 #endif
 #ifndef __sparc64__
-extern uint64_t atomic_cas_32(volatile uint32_t *target, uint32_t cmp,
+extern uint32_t atomic_cas_32(volatile uint32_t *target, uint32_t cmp,
     uint32_t newval);
 extern uint64_t atomic_cas_64(volatile uint64_t *target, uint64_t cmp,
     uint64_t newval);
@@ -119,21 +118,19 @@
 }
 
 #ifndef COMPAT_32BIT
-#if defined(__LP64__)
+#ifdef __LP64__
 static __inline void *
 atomic_cas_ptr(volatile void *target, void *cmp, void *newval)
 {
-	return ((void *)atomic_cas_64((volatile uint64_t *)target, (uint64_t)cmp,
-	    (uint64_t)newval));
+	return ((void *)atomic_cas_64(target, (uint64_t)cmp, (uint64_t)newval));
 }
 #else
 static __inline void *
 atomic_cas_ptr(volatile void *target, void *cmp, void *newval)
 {
-	return ((void *)atomic_cas_32((volatile uint64_t *)target, (uint64_t)cmp,
-	    (uint64_t)newval));
+	return ((void *)atomic_cas_32(target, (uint32_t)cmp, (uint32_t)newval));
 }
 #endif
-#endif
+#endif /* !COMPAT_32BIT */
 
 #endif /* !_OPENSOLARIS_SYS_ATOMIC_H_ */
Re: ZFS v28 is ready for wider testing.
On Tue, Aug 31, 2010 at 11:59:15PM +0200, Pawel Jakub Dawidek wrote:
> [...]
> Ok, now that I know you read everything carefully, here is the patch:
> http://people.freebsd.org/~pjd/patches/zfs_20100831.patch.bz2

Now it is even easier to test the new ZFS! :)

Here you can find a VirtualBox Appliance (113MB) with FreeBSD 9-CURRENT and ZFSv28:

    http://people.freebsd.org/~pjd/misc/FreeBSD9_ZFSv28_0.1.tgz

Untar it, import it (zfsv28.ovf) into VirtualBox and have fun. You can log in as root with no password (via virtual console or via SSH). The system IP address is 192.168.56.66/24. There are 16 ada(4) disks to play with. For example:

    zfsv28:root:~# zpool create tank raidz3 ada{0,1,2,3,4,5,6,7} raidz3 ada{8,9,10,11,12,13,14,15}
    zfsv28:root:~# zpool status
      pool: tank
     state: ONLINE
      scan: none requested
    config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz3-0  ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
          raidz3-1  ONLINE       0     0     0
            ada8    ONLINE       0     0     0
            ada9    ONLINE       0     0     0
            ada10   ONLINE       0     0     0
            ada11   ONLINE       0     0     0
            ada12   ONLINE       0     0     0
            ada13   ONLINE       0     0     0
            ada14   ONLINE       0     0     0
            ada15   ONLINE       0     0     0

    errors: No known data errors

-- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
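For the example pool above, a rough back-of-the-envelope count of data disks may help. This is a simplified sketch: it only subtracts the parity disks from each vdev and ignores padding and metadata overhead, and the function name is made up.

```sh
#!/bin/sh
# Hypothetical helper: disks' worth of usable space in a RAIDZ pool.
# $1 = number of vdevs, $2 = disks per vdev, $3 = parity level (3 for raidz3).
raidz_data_disks() {
    echo $(( $1 * ($2 - $3) ))
}

# Two raidz3 vdevs of 8 disks each, as in the zpool create example.
raidz_data_disks 2 8 3
```

So of the 16 disks in the appliance, roughly 10 hold data and 6 hold parity, and each vdev can lose up to three disks.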
Re: ZFS v28 is ready for wider testing.
On Fri, Sep 03, 2010 at 04:50:44PM +0100, Peter Molnar, BSD wrote: Hi, I would like to try ZFS + VirtualBox but I have got problems: 1) Linux 2.6.32-24-generic #42-Ubuntu SMP Fri Aug 20 14:21:58 UTC 2010 x86_64 GNU/Linux I tried to import that file in my VirtualBox but I got an error: Failed to import appliance. /home/peter/FreeBSD/zfsv28.ovf Too many IDE controllers in OVF; import facility only supports one. Which VirtualBox version do you use? 3.2.8? Exporting appliances is a bit broken (if you have more than one disk, it will point all disks at the last one from the configuration), so I had to edit the .ovf file manually to fix this. Maybe I messed something up, but I was able to successfully import it before publishing it. PS. I waited so long for decent virtualization software for FreeBSD, and I must say VirtualBox is really great, and free, and open-source. Are you reading this, VMware? -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
gptboot rewrite, bootonce, etc.
things will have to wait until I can sleep at nights again. Well, there is still dedup support that waits to be implemented in gptzfsboot... -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! pgpm1w4OWOKIR.pgp Description: PGP signature
Re: gptboot rewrite, bootonce, etc.
On Mon, Sep 20, 2010 at 09:46:56AM +0100, krad wrote: does it work for zfs boot as that would be really nice if it did? No, it doesn't. ZFS works a bit differently. ZFS operates on pools, not really on partitions. One ZFS file system can span multiple disks/partitions. I'm not yet sure how to implement it so that it is intuitive, but I also haven't spent much time thinking about it. We needed UFS and that is what I implemented. It took me much more time than I expected anyway :) -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: gptboot rewrite, bootonce, etc.
On Mon, Sep 20, 2010 at 01:17:38AM +0200, Oliver Pinter wrote: Hi PJD! Can you release this patchset for 7-STABLE? I have no plans atm to port this work to 7-STABLE. I don't even have 7.x systems anymore. I'm not sure how the boot code differs; maybe the patch will apply without modifications? No idea. I'd like to MFC this to 8-STABLE, though. -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: gptboot rewrite, bootonce, etc.
On Sun, Sep 19, 2010 at 09:10:52PM +0400, Boris Samorodov wrote:
> Hi!
> On Sat, 18 Sep 2010 01:45:42 +0200 Pawel Jakub Dawidek wrote:
> > My company was in need of functionality similar to nextboot(8), but at the boot loader level, so we can have two partitions we boot from, where one is known to be good and the other is used for upgrades. We upgrade by dd(1)ing an entire partition image onto the unused partition, we mark it as try-to-boot-from-it-but-only-once, reboot, and if we fail to boot from the new partition, we fall back to the old, good partition. If we succeed, on the other hand, we mark the new partition as our boot partition and mark the other one as unused.
> > Well, how hard can it be? After around two weeks of work, I ended up rewriting gptboot in large parts, reorganizing a lot of code, improving and extending gpart a bit and implementing the desired functionality.
> > Here is the patch for review and test:
> > http://people.freebsd.org/~pjd/patches/gptboot.patch
> Great! Since I need to have both i386 and amd64 on my box, here are my test results:
> -----
> [~]b...@alya% uname -a
> FreeBSD alya 9.0-CURRENT FreeBSD 9.0-CURRENT #1 r212758M: Sat Sep 18 16:13:38 MSD 2010 b...@alya:/space/FreeBSD/base/head/obj/space/FreeBSD/base/head/src/sys/ALYA amd64
> [~]b...@alya% glabel status
>                                       Name  Status  Components
> gptid/c6053c9b-abcc-11df-b740-00251124aff4     N/A  ad4p1
>                              label/9-amd64     N/A  ad4p2
>                                 label/swap     N/A  ad4p3
>                                label/space     N/A  ad4p4
>                               label/9-i386     N/A  ad4p5
> [~]b...@alya% mount
> /dev/label/9-amd64 on / (ufs, local)
> devfs on /dev (devfs, local, multilabel)
> /dev/label/space on /space (ufs, local)
> /dev/md0 on /tmp (ufs, local, nosuid, soft-updates)
> procfs on /proc (procfs, local)
> linprocfs on /compat/linux/proc (linprocfs, local)
> linsysfs on /compat/linux/sys (linsysfs, local)
> fdescfs on /dev/fd (fdescfs)
> [~]b...@alya% gpart show
> =>        34  490234685  ad4  GPT  (234G)
>           34        128    1  freebsd-boot  (64K)
>          162   41943040    2  freebsd-ufs  (20G)
>     41943202    8388608    3  freebsd-swap  (4.0G)
>     50331810  209715200    4  freebsd-ufs  (100G)
>    260047010   41943040    5  freebsd-ufs  (20G)
>    301990050  188244669       - free -  (90G)
> [~]b...@alya% gpart set -a bootme -i 2 ad4
> bootme set on ad4p2
> [~]b...@alya% gpart set -a bootonce -i 5 ad4
> bootonce set on ad4p5
> [~]b...@alya% gpart show
> =>        34  490234685  ad4  GPT  (234G)
>           34        128    1  freebsd-boot  (64K)
>          162   41943040    2  freebsd-ufs  [bootme]  (20G)
>     41943202    8388608    3  freebsd-swap  (4.0G)
>     50331810  209715200    4  freebsd-ufs  (100G)
>    260047010   41943040    5  freebsd-ufs  [bootonce,bootme]  (20G)
>    301990050  188244669       - free -  (90G)
> -----
> Install i386 kernel/world to ad4p5, successful reboot, get i386 system. Next reboot (get amd64 system back):
> -----
> [~]b...@alya% gpart show
> =>        34  490234685  ad4  GPT  (234G)
>           34        128    1  freebsd-boot  (64K)
>          162   41943040    2  freebsd-ufs  [bootme]  (20G)
>     41943202    8388608    3  freebsd-swap  (4.0G)
>     50331810  209715200    4  freebsd-ufs  (100G)
>    260047010   41943040    5  freebsd-ufs  (20G)
>    301990050  188244669       - free -  (90G)
> -----
> All seems to work fine.

Great, thanks for testing!

> > Any comments or suggestions?
> Only one for now. With the current default syslog configuration, logging to local0.warning and local0.info goes nowhere. It would be good if those messages left traces on a default system.

Good point. I changed those to local0.notice.

-- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
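The upgrade procedure described at the top of this message can be sketched as a dry run. The script below only prints gpart(8) commands instead of running them. The function names are invented, the disk and partition indexes are borrowed from the test above, and the thread does not spell out which attributes gptboot clears on its own after a trial boot, so the "promote" step is an assumption about what an administrator would do by hand.

```sh
#!/bin/sh
# Dry run: print the gpart(8) commands instead of executing them.

# Mark partition $2 on disk $1 as "try to boot from it, but only once".
mark_trial() {
    echo "gpart set -a bootonce -i $2 $1"
}

# After a successful trial boot: make partition $2 the regular boot
# partition and stop booting from the old one, $3.
promote() {
    echo "gpart set -a bootme -i $2 $1"
    echo "gpart unset -a bootme -i $3 $1"
}

mark_trial ad4 5
promote ad4 5 2
```

If the trial boot fails, nothing needs to be promoted: the old partition still carries bootme and is used again, which matches the fallback behaviour described in the message.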
Recent GELI additions.
Hi.

I'd like to announce three new GELI features available in HEAD:

1. AES-XTS encryption. XTS mode is a standard that is recommended these days for storage encryption. This is the default now. AES-XTS support was also added to the opencrypto framework and the aesni(4) driver.

2. Multiple encryption keys. GELI will use one encryption key for at most 2^20 blocks (sectors), as it is not recommended to use the same encryption key for too much data. It generates an array of keys from the master key on attach and uses them accordingly. This is the default now.

3. The passphrase can now be loaded from a file (-J and -j options).

-- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
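To get a feel for the second item, the number of keys and the key covering a given offset follow from the 2^20-blocks-per-key rule. This is a sketch of the arithmetic only, not the GELI implementation: the function names are invented, and the exact way the kernel maps an offset to a key is an assumption here.

```sh
#!/bin/sh
# Sketch only: the real key derivation lives in the GELI kernel code.
BLOCKS_PER_KEY=$(( 1 << 20 ))   # 2^20 sectors per key, per the announcement

# Number of keys needed for a provider.
# $1 = media size in bytes, $2 = sector size in bytes.
geli_key_count() {
    blocks=$(( $1 / $2 ))
    echo $(( (blocks + BLOCKS_PER_KEY - 1) / BLOCKS_PER_KEY ))
}

# Index of the key covering a byte offset.
# $1 = byte offset, $2 = sector size in bytes.
geli_key_index() {
    echo $(( ($1 / $2) / BLOCKS_PER_KEY ))
}

# 1 TiB provider with 4096-byte sectors: 2^28 blocks.
geli_key_count 1099511627776 4096

# Byte offset 8 GiB with 4096-byte sectors is block 2^21.
geli_key_index 8589934592 4096
```

With 4096-byte sectors each key covers 4 GiB of data, so a 1 TiB provider needs 256 keys under this rule.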
Re: letting glabel recognise a media change
On Thu, Sep 30, 2010 at 08:46:11PM +0300, Alexander Motin wrote: Andriy Gapon wrote: on 30/09/2010 01:28 Matthew Jacob said the following: If something like that was in place, I assure you that things would start to use it very quickly. I am not sure about this. Because, e.g. I don't see an easy way to know that media is changed in scsi_cd driver. That is, without polling. I don't consider polling to be an easy way for a number of reasons. The SATA specification defines the concept of Asynchronous Notification. It is already used by port multipliers to report PHY events. It is also supposed to be used by CD drives to report media change. I haven't seen such devices yet, but hope they may appear sometime. And even without AN support it would be nice to implement proper handling for SCSI UA - media changed errors within CAM. It still won't be perfect without using polling, but probably still something. I'd like to know the original reason why it is the CD device that is represented by a GEOM provider and not the CD media. To my naive thinking, the CD media should be the GEOM provider, which we taste once the media is inserted and orphan once the media is removed. I don't see any reason for the CD device to be a useful GEOM provider, but maybe I'm overlooking something. Poul-Henning or Soren, do you remember who made this design choice and why? -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: letting glabel recognise a media change
On Mon, Oct 11, 2010 at 11:03:26AM -0400, John Baldwin wrote: With CD drives you are also rather stuck in that the existing ABI for controlling CD drives (e.g. ioctls in 3rd party software to eject a CD) are done on the /dev/cdX device. Ideally enclosures for removable media would be separate devices from the removable media itself, but a lot of existing software for CD's would break if this changes now. Right, but I still wonder if we could execute provider orphan and retaste on various events like media insertion or removal. If media is removed, we orphan the provider and recreate it, which will trigger a retaste. This is fine: there will be nothing to read from or write to (we will simply return errors as we do now, I think). This way we cooperate nicely with GEOM, but also with other tools that don't require media to be present (if there is no media, the devfs entry still exists and handles ioctls; it just returns errors on read requests). -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: ZFS v28 is ready for wider testing.
On Wed, Nov 03, 2010 at 07:28:15PM +0100, Olivier Smedts wrote: http://people.freebsd.org/~pjd/patches/zfs_20100831.patch.bz2 Hello, Any status update on this ? I regularly check http://people.freebsd.org/~pjd/patches/ to see if there's an updated version of your patch. 2 months old is quite a bit for -CURRENT, which often receives commits on zfsco parts. Thanks for all your work on FreeBSD (not only ZFS). It took a while, but I should have something new shortly. I recently finished boot support for v28 (the most-missed feature in the previous patch?) and will work on a new patch soon. I'm heading to meetBSD California tomorrow and I'll be back in a week, so nothing will happen till then for sure. -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Read-only /usr/obj/ no longer kosher?
I used to build world and kernel on one machine and export both /usr/src/ and /usr/obj read-only to other machines. It doesn't work anymore (this is from 'make installworld'): === bin/freebsd-version (install) eval $(egrep '^(TYPE|REVISION|BRANCH)=' /usr/src/bin/freebsd-version/../../sys/conf/newvers.sh) ; if ! sed -e s/@@TYPE@@/${TYPE}/g; s/@@REVISION@@/${REVISION}/g; s/@@BRANCH@@/${BRANCH}/g; /usr/src/bin/freebsd-version/freebsd-version.sh.in freebsd-version.sh ; then rm -f freebsd-version.sh ; exit 1 ; fi cannot create freebsd-version.sh: Permission denied rm: freebsd-version.sh: Read-only file system *** Error code 1 -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com pgp0DzHE4AU2t.pgp Description: PGP signature
Re: Memory modified after free, seemingly geli related
On Thu, Aug 06, 2015 at 04:06:40AM +0200, Pawel Jakub Dawidek wrote: On Wed, Aug 05, 2015 at 03:24:26AM +, Ed Maste wrote: I've encountered a few memory modified after free panics recently, which seem to be from geli. I don't yet have any debugging to completely confirm it's geli, but it has not happened on my other test laptop, which is configured similarly but without geli. This has a few local patches from my to-commit-to-HEAD queue. FreeBSD volta 11.0-CURRENT FreeBSD 11.0-CURRENT #10 r284409+6a002d9(staging): Tue Jul 7 17:57:01 EDT 2015 panic: Memory modified after free 0xf80009d504d8(248) val=0 @ 0xf80009d50518 I'm seeing it too. I tracked it down to ZFS. The bio was last owned by the ZFS::VDEV GEOM class, which is modifying bio_error on a freed bio. I'm investigating further and will let you know here once I find the cause. Ok. It was a bio from ZFS in my case, but it was GELI which modified bio_error after delivering the bio. This patch fixes the race: http://people.freebsd.org/~pjd/patches/geom_eli.patch Using a bio after calling crypto_dispatch() is a bug. The 'done' callbacks might have already called g_io_deliver() and the upper layer might have already freed the bio. I'm not fully convinced that panic is the right response to crypto_dispatch() failure. It means that the driver failed our request and didn't call our callback, which is bad, as we never complete the I/O. The crypto drivers tend to return errors only if the request itself is bogus, but that is a programming bug and not a runtime condition. In other words, panic should be fine here. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com
Re: Memory modified after free, seemingly geli related
On Wed, Aug 05, 2015 at 03:24:26AM +, Ed Maste wrote:
> I've encountered a few memory modified after free panics recently, which seem to be from geli. I don't yet have any debugging to completely confirm it's geli, but it has not happened on my other test laptop, which is configured similarly but without geli.
> This has a few local patches from my to-commit-to-HEAD queue.
> FreeBSD volta 11.0-CURRENT FreeBSD 11.0-CURRENT #10 r284409+6a002d9(staging): Tue Jul 7 17:57:01 EDT 2015
> panic: Memory modified after free 0xf80009d504d8(248) val=0 @ 0xf80009d50518

I'm seeing it too. I tracked it down to ZFS. The bio was last owned by the ZFS::VDEV GEOM class, which is modifying bio_error on a freed bio. I'm investigating further and will let you know here once I find the cause.

> cpuid = 1
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe011414a880
> vpanic() at vpanic+0x189/frame 0xfe011414a900
> panic() at panic+0x43/frame 0xfe011414a960
> trash_ctor() at trash_ctor+0x48/frame 0xfe011414a970
> uma_zalloc_arg() at uma_zalloc_arg+0x573/frame 0xfe011414a9e0
> g_clone_bio() at g_clone_bio+0x1d/frame 0xfe011414aa00
> g_eli_start() at g_eli_start+0xbd/frame 0xfe011414aa30
> g_io_schedule_down() at g_io_schedule_down+0xe6/frame 0xfe011414aa60
> g_down_procbody() at g_down_procbody+0x7d/frame 0xfe011414aa70
> fork_exit() at fork_exit+0x84/frame 0xfe011414aab0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe011414aab0
> --- trap 0, rip = 0, rsp = 0xfe011414ab70, rbp = 0 ---

-- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com
Re: Read-only /usr/obj/ no longer kosher?
On Tue, Aug 25, 2015 at 03:32:35PM -0700, NGie Cooper wrote: On Tue, Aug 25, 2015 at 3:21 PM, Xin Li delp...@delphij.net wrote: On 08/25/15 14:55, Pawel Jakub Dawidek wrote: Now that I think of it, it might have been that I did buildworld/buildkernel before -p1. Then freebsd-update updated newvers.sh and then I was trying to do installworld. Yes, I can now reproduce it with source updated to -p2. Yes, that's because freebsd-version.sh is generated from the files (but it's not clear to me whether it's a bug or a feature that 'make install' checks if it's up-to-date and decides to regenerate it...). It's a quirk for sure. If you change the behavior, people will definitely complain as they will now need to go back and rebuild everything. What we have now is misleading. People should recompile. It is rather rare to see a security advisory which bumps only the patch level and touches only something that doesn't require recompilation (e.g. a shell script). The current behaviour would make people think they are running the latest patch level because freebsd-version says so, even though they only did 'make installworld' without rebuilding the affected binaries. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com
Re: Read-only /usr/obj/ no longer kosher?
On Tue, Aug 25, 2015 at 11:04:37PM +0200, Pawel Jakub Dawidek wrote: On Sun, Aug 23, 2015 at 03:29:01PM -0700, Xin Li wrote: On 8/23/15 14:55, Pawel Jakub Dawidek wrote: I used to build world and kernel on one machine and export both /usr/src/ and /usr/obj read-only to other machines. It doesn't work anymore (this is from 'make installworld'): === bin/freebsd-version (install) eval $(egrep '^(TYPE|REVISION|BRANCH)=' /usr/src/bin/freebsd-version/../../sys/conf/newvers.sh) ; if ! sed -e s/@@TYPE@@/${TYPE}/g; s/@@REVISION@@/${REVISION}/g; s/@@BRANCH@@/${BRANCH}/g; /usr/src/bin/freebsd-version/freebsd-version.sh.in freebsd-version.sh ; then rm -f freebsd-version.sh ; exit 1 ; fi cannot create freebsd-version.sh: Permission denied rm: freebsd-version.sh: Read-only file system *** Error code 1 What are the modification times of /usr/obj/usr/bin/freebsd-version/freebsd-version.sh, /usr/src/bin/freebsd-version/freebsd-version.sh and /usr/src/sys/conf/newvers.sh? I saw it twice, but cannot reproduce it anymore. This is 10.2-RELEASE; I've sent it to current@ by mistake. All in all, my expectation is that we shouldn't modify obj/ during installworld. Now that I think of it, it might have been that I did buildworld/buildkernel before -p1. Then freebsd-update updated newvers.sh and then I was trying to do installworld. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Read-only /usr/obj/ no longer kosher?
On Tue, Aug 25, 2015 at 11:53:47PM +0200, Pawel Jakub Dawidek wrote: On Tue, Aug 25, 2015 at 11:04:37PM +0200, Pawel Jakub Dawidek wrote: On Sun, Aug 23, 2015 at 03:29:01PM -0700, Xin Li wrote: On 8/23/15 14:55, Pawel Jakub Dawidek wrote: I used to build world and kernel on one machine and export both /usr/src/ and /usr/obj read-only to other machines. It doesn't work anymore (this is from 'make installworld'): === bin/freebsd-version (install) eval $(egrep '^(TYPE|REVISION|BRANCH)=' /usr/src/bin/freebsd-version/../../sys/conf/newvers.sh) ; if ! sed -e s/@@TYPE@@/${TYPE}/g; s/@@REVISION@@/${REVISION}/g; s/@@BRANCH@@/${BRANCH}/g; /usr/src/bin/freebsd-version/freebsd-version.sh.in freebsd-version.sh ; then rm -f freebsd-version.sh ; exit 1 ; fi cannot create freebsd-version.sh: Permission denied rm: freebsd-version.sh: Read-only file system *** Error code 1 What are the modification times of /usr/obj/usr/bin/freebsd-version/freebsd-version.sh, /usr/src/bin/freebsd-version/freebsd-version.sh and /usr/src/sys/conf/newvers.sh? I saw it twice, but cannot reproduce it anymore. This is 10.2-RELEASE; I've sent it to current@ by mistake. All in all, my expectation is that we shouldn't modify obj/ during installworld. Now that I think of it, it might have been that I did buildworld/buildkernel before -p1. Then freebsd-update updated newvers.sh and then I was trying to do installworld. Yes, I can now reproduce it with source updated to -p2. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com
Re: Read-only /usr/obj/ no longer kosher?
On Sun, Aug 23, 2015 at 03:29:01PM -0700, Xin Li wrote: On 8/23/15 14:55, Pawel Jakub Dawidek wrote: I used to build world and kernel on one machine and export both /usr/src/ and /usr/obj read-only to other machines. It doesn't work anymore (this is from 'make installworld'): === bin/freebsd-version (install) eval $(egrep '^(TYPE|REVISION|BRANCH)=' /usr/src/bin/freebsd-version/../../sys/conf/newvers.sh) ; if ! sed -e s/@@TYPE@@/${TYPE}/g; s/@@REVISION@@/${REVISION}/g; s/@@BRANCH@@/${BRANCH}/g; /usr/src/bin/freebsd-version/freebsd-version.sh.in freebsd-version.sh ; then rm -f freebsd-version.sh ; exit 1 ; fi cannot create freebsd-version.sh: Permission denied rm: freebsd-version.sh: Read-only file system *** Error code 1 What are the modification times of /usr/obj/usr/bin/freebsd-version/freebsd-version.sh, /usr/src/bin/freebsd-version/freebsd-version.sh and /usr/src/sys/conf/newvers.sh? I saw it twice, but cannot reproduce it anymore. This is 10.2-RELEASE; I've sent it to current@ by mistake. All in all, my expectation is that we shouldn't modify obj/ during installworld. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com
Re: dumpdev in loader.conf vs rc.d/dumpon
On Thu, Sep 24, 2015 at 02:18:50PM +0300, Slawa Olhovchenkov wrote: > On Thu, Sep 24, 2015 at 11:28:05AM +0300, Andrey V. Elsukov wrote: > > > On 23.09.2015 19:57, Andriy Gapon wrote: > > > I do not have a strong opinion. Either option, rc.d/dumpon change or > > > geom_dev > > > change, is fine with me. > > > > I added the ability to set dumpdev via loader. But I wasn't aware that > > it was used in rc.d script. > > > > If you have set dumpdev kenv, it will be already enabled in the time > > when rc.d/dumpon will be run. So, I think it is useless to try to > > enable dumpdev again. I prefer remove this old code from rc.d script. > > rc.d script can redirect dump to device, not available at boot time, > iSCSI disk, for example. No. The dump device is very special. It is used in an environment where the kernel has already panicked: there are no interrupts, so there is no networking. Storage controllers have special methods to handle dumping kernel memory. It doesn't go through GEOM, and it cannot go through GEOM, as the scheduler doesn't work either. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com
Re: dumpdev in loader.conf vs rc.d/dumpon
On Fri, Sep 25, 2015 at 12:11:51AM +0300, Slawa Olhovchenkov wrote: > On Thu, Sep 24, 2015 at 10:58:00PM +0200, Pawel Jakub Dawidek wrote: > > > On Thu, Sep 24, 2015 at 02:18:50PM +0300, Slawa Olhovchenkov wrote: > > > On Thu, Sep 24, 2015 at 11:28:05AM +0300, Andrey V. Elsukov wrote: > > > > > > > On 23.09.2015 19:57, Andriy Gapon wrote: > > > > > I do not have a strong opinion. Either option, rc.d/dumpon change or > > > > > geom_dev > > > > > change, is fine with me. > > > > > > > > I added the ability to set dumpdev via loader. But I wasn't aware that > > > > it was used in rc.d script. > > > > > > > > If you have set dumpdev kenv, it will be already enabled in the time > > > > when rc.d/dumpon will be run. So, I think it is useless to try to > > > > enable dumpdev again. I prefer remove this old code from rc.d script. > > > > > > rc.d script can redirect dump to device, not available at boot time, > > > iSCSI disk, for example. > > > > No. Dump device is very special. It runs in an environment when kernel > > already panicked, there are no interrupts, so there is no networking. > > Storage controllers have special methods to handle dumping kernel memory > > - it doesn't go through GEOM, it cannot go through GEOM as the scheduler > > doesn't work too. > > Can a ZFS VOL act as dump device? I don't think so. IIRC there was a hack in Illumos to allocate contiguous space for dump in one of the vdevs (then I think it was extended to multiple vdevs). I don't think any ZFS feature worked for such a ZVOL (no checksumming, no compression, etc.). Others may have more up-to-date info about that. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com
Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics
On 9/8/23 15:09, Alexander Motin wrote:
> Thank you, Martin. I was able to reproduce the issue with your script and found the cause. I first thought the issue was triggered by `cp`, but it appeared to be triggered by `cat`. It also got copy_file_range() support, but later than `cp`. That is probably why it slipped through testing. This patch fixes it for me: https://github.com/openzfs/zfs/pull/15251 .
>
> Mark, could you please try the patch?

Thank you Alex for the fix!

-- 
Pawel Jakub Dawidek
Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
On 4/17/23 18:15, Pawel Jakub Dawidek wrote:
> There were three issues that I know of after the recent OpenZFS merge:
>
> 1. Data corruption unrelated to block cloning, so it can happen even with block cloning disabled or not in use. This was the problematic commit:
>    https://github.com/openzfs/zfs/commit/519851122b1703b8445ec17bc89b347cea965bb9
>    It was reverted in 63ee747febbf024be0aace61161241b53245449e.
>
> 2. Data corruption with embedded blocks when block cloning is enabled. It can happen when compression is enabled and the block contains between 60 and 112 bytes (this might be hard to determine). Fix exists, it is merged to OpenZFS already, but isn't in FreeBSD yet. OpenZFS pull request:
>    https://github.com/openzfs/zfs/pull/14739
>
> 3. Panic on VERIFY(zil_replaying(zfsvfs->z_log, tx)). This is triggered when block cloning is enabled, the sync property is set to disabled and copy_file_range(2) is used. Easy fix exists, it is not yet merged to OpenZFS and not yet in FreeBSD HEAD. OpenZFS pull request:
>    https://github.com/openzfs/zfs/pull/14758
>
> Block cloning was disabled in 46ac8f2e7d9601311eb9b3cd2fed138ff4a11a66, so 2 and 3 should not occur.

As of 068913e4ba3dd9b3067056e832cefc5ed264b5cc all known issues are fixed, as far as I can tell.

Block cloning remains disabled for now just to be on the safe side, but can be enabled by setting sysctl vfs.zfs.bclone_enabled to 1. Don't rely on this sysctl as it will be removed in 2-3 weeks.

-- 
Pawel Jakub Dawidek
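For anyone who wants to try it, a minimal sketch of flipping the sysctl mentioned above at runtime. The function name enable_bclone is made up for illustration; only the sysctl name comes from the message, and the commands assume a FreeBSD kernel that still carries this knob and a root shell.

```sh
#!/bin/sh
# Hypothetical helper (name is an assumption, not from the thread).
# Prints the current value, then enables block cloning until the next reboot.
enable_bclone() {
    # Read the current setting first so the change is visible in the output.
    sysctl vfs.zfs.bclone_enabled
    # Runtime-only change; nothing is written to /etc/sysctl.conf,
    # since the sysctl is described as temporary.
    sysctl vfs.zfs.bclone_enabled=1
}
```

Calling enable_bclone as root performs the change; defining the function alone has no effect on the system.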
Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
On 4/17/23 21:28, José Pérez wrote:
> Hi Pawel,
> thank you for your reply and for the fixes.
>
> I think there is a 4th issue that needs to be addressed: how do we recover from the worst case scenario which is a machine with a kernel > 2a58b312b62f and ZFS root upgraded with block cloning enabled.
>
> In particular, is it safe to turn such a machine on in the first place, and what are the risks involved in doing so? Any potential data loss?
>
> Would such a machine be able to fix itself by compiling a kernel, or would compilation fail and might data be corrupted in the process?
>
> I have two poudriere builders powered off (I am not alone in this situation) and I need to recover them, ideally minimizing data loss. The builders are also hosting current and used to build kernels and worlds for 13 and current: as of now all my production machines are stuck on the 13 they run, I cannot update binaries nor packages and I would like to be back online.

José,

I can only speak of block cloning in detail, but I'll try to address everything.

The easiest way to avoid block_cloning-related corruption on a kernel built after the last OpenZFS merge but before e0bb199925 is to set the compress property to 'off' and the sync property to something other than 'disabled'. This will avoid the block_cloning-related corruption and the zil_replaying() panic.

As for the other corruption, unfortunately I don't know the details, but my understanding is that it happens under higher load. I'm not sure I'd trust a kernel built on a machine with this bug present. What I would do is compile the kernel as of 068913e4ba somewhere else, boot the problematic machine in single-user mode and install the newly built kernel.

As far as I can tell, contrary to some initial reports, none of the problems introduced by the recent OpenZFS merge corrupt the pool metadata, only file data.

You can locate the files modified with the bogus kernel using find(1) with a proper modification time, but you have to decide what to do with them (either throw them away, restore them from backup or inspect them).

-- 
Pawel Jakub Dawidek
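The find(1) step above could look roughly like the sketch below. It assumes you know the window during which the affected kernel was running; the helper name list_suspect_files, the directory and the dates are placeholders and not from the thread. It relies on the -newermt primary, which both FreeBSD and GNU find provide.

```sh
#!/bin/sh
# Hypothetical helper: print regular files under a directory whose
# modification time is later than START and not later than END.
# Usage: list_suspect_files DIR START END   (dates as YYYY-MM-DD)
list_suspect_files() {
    dir=$1
    start=$2
    end=$3
    # -newermt compares each file's mtime against the given timestamp;
    # negating the second test bounds the window from above.
    find "$dir" -type f -newermt "$start" ! -newermt "$end" -print
}
```

For example, `list_suspect_files /usr/local/poudriere 2023-04-08 2023-04-18` would list candidates under that (assumed) path; what to do with each file is still a manual decision, as noted above.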
Re: another crash and going forward with zfs
On 4/18/23 03:51, Mateusz Guzik wrote:
> After bugfixes got committed I decided to zpool upgrade and sysctl vfs.zfs.bclone_enabled=1 vs poudriere for testing purposes. I very quickly got a new crash:
>
> panic: VERIFY(arc_released(db->db_buf)) failed
>
> cpuid = 9
> time = 1681755046
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0a90b8e5f0
> vpanic() at vpanic+0x152/frame 0xfe0a90b8e640
> spl_panic() at spl_panic+0x3a/frame 0xfe0a90b8e6a0
> dbuf_redirty() at dbuf_redirty+0xbd/frame 0xfe0a90b8e6c0
> dmu_buf_will_dirty_impl() at dmu_buf_will_dirty_impl+0xa2/frame 0xfe0a90b8e700
> dmu_write_uio_dnode() at dmu_write_uio_dnode+0xe9/frame 0xfe0a90b8e780
> dmu_write_uio_dbuf() at dmu_write_uio_dbuf+0x42/frame 0xfe0a90b8e7b0
> zfs_write() at zfs_write+0x672/frame 0xfe0a90b8e960
> zfs_freebsd_write() at zfs_freebsd_write+0x39/frame 0xfe0a90b8e980
> VOP_WRITE_APV() at VOP_WRITE_APV+0xdb/frame 0xfe0a90b8ea90
> vn_write() at vn_write+0x325/frame 0xfe0a90b8eb20
> vn_io_fault_doio() at vn_io_fault_doio+0x43/frame 0xfe0a90b8eb80
> vn_io_fault1() at vn_io_fault1+0x161/frame 0xfe0a90b8ecc0
> vn_io_fault() at vn_io_fault+0x1b5/frame 0xfe0a90b8ed40
> dofilewrite() at dofilewrite+0x81/frame 0xfe0a90b8ed90
> sys_write() at sys_write+0xc0/frame 0xfe0a90b8ee00
> amd64_syscall() at amd64_syscall+0x157/frame 0xfe0a90b8ef30
> fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe0a90b8ef30
> --- syscall (4, FreeBSD ELF64, write), rip = 0x103cddf7949a, rsp = 0x103cdc85dd48, rbp = 0x103cdc85dd80 ---
> KDB: enter: panic
> [ thread pid 95000 tid 135035 ]
> Stopped at kdb_enter+0x32: movq    $0,0x9e4153(%rip)
>
> The posted 14.0 schedule plans to branch stable/14 on May 12 and one cannot bet on the feature getting beaten into production shape by that time. Given whatever non-block_cloning and not even zfs bugs which are likely to come out I think this makes the feature a non-starter for said release.
>
> I note:
> 1. the current problems did not make it into stable branches.
> 2. there was block_cloning-related data corruption (fixed) and there may be more
> 3. there was unrelated data corruption (see https://github.com/openzfs/zfs/issues/14753), sorted out by reverting the problematic commit in FreeBSD, not yet sorted out upstream
>
> As such people's data may be partially hosed as is.
>
> Consequently the proposed plan is as follows:
> 1. whack the block cloning feature for the time being, but make sure pools which upgraded to it can be mounted read-only
> 2. run ztest and whatever other stress testing on FreeBSD, along with restoring openzfs CI -- I can do the first part, I'm sure pho will not mind to run some tests of his own
> 3. recommend people create new pools and restore data from backup. if restoring from backup is not an option, tar or cp (not zfs send) from the read-only mount
>
> block cloning beaten into shape would use block_cloning_v2 or whatever else, key point that the current feature name would be considered bogus (not blocking RO import though) to prevent RW usage of the current pools with it enabled.
>
> Comments?

Correct me if I'm wrong, but from my understanding there were zero problems with block cloning when it wasn't in use, or now that it is disabled.

The reason I introduced the vfs.zfs.bclone_enabled sysctl was exactly to avoid a mess like this and give us more time to sort all the problems out while making it easy for people to try it.

If there is no plan to revert the whole import, I don't see what value removing just block cloning will bring if it is now disabled by default and didn't cause any problems when disabled.

-- 
Pawel Jakub Dawidek
Re: another crash and going forward with zfs
On 4/18/23 05:14, Mateusz Guzik wrote:
> On 4/17/23, Pawel Jakub Dawidek wrote:
> > Correct me if I'm wrong, but from my understanding there were zero problems with block cloning when it wasn't in use, or now that it is disabled.
> >
> > The reason I introduced the vfs.zfs.bclone_enabled sysctl was exactly to avoid a mess like this and give us more time to sort all the problems out while making it easy for people to try it.
> >
> > If there is no plan to revert the whole import, I don't see what value removing just block cloning will bring if it is now disabled by default and didn't cause any problems when disabled.
>
> The feature definitely was not properly stress tested and what not and trying to do it keeps running into panics. Given the complexity of the feature I would expect there are many bugs lurking, some of which possibly related to the on disk format. Not having to deal with any of this can be arranged as described above and is imo the most sensible route given the timeline for 14.0

Block cloning doesn't create, remove or modify any on-disk data until it is in use.

Again, if we are not going to revert the whole merge, I see no point in reverting block cloning, because until it is enabled its code is not executed. This allows people who upgraded their pools to do nothing special, and it will allow people to test it easily.

-- 
Pawel Jakub Dawidek
Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
On 4/16/23 01:07, Florian Smeets wrote:
> On the pool that has block_cloning enabled I see the above insta panic when poudriere starts building. I found a workaround though:
>
> --- /usr/local/share/poudriere/include/fs.sh.orig 2023-04-15 18:03:50.090823000 +0200
> +++ /usr/local/share/poudriere/include/fs.sh 2023-04-15 18:04:04.144736000 +0200
> @@ -295,7 +295,6 @@
>          fi
>          zfs clone -o mountpoint=${mnt} \
> -            -o sync=disabled \
>              -o atime=off \
>              -o compression=off \
>              ${fs}@${snap} \
>
> With this workaround I was able to build thousands of packages without panics or failures due to data corruption.

Thank you, Florian, that was very helpful!

This should fix the problem:

https://github.com/openzfs/zfs/pull/14758

-- 
Pawel Jakub Dawidek
Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
On 4/13/23 23:05, Shawn Webb wrote:
> I've learned over the years downstream that it's not really my place to tell upstream what to do or how to do it. However, I think given the seriousness of this, upstream might do well to revert the commit until a solid fix is in place. Upstream might want to consider the impacts this is having not just with downstream projects, but also regular users.
>
> Really bad timing to have a lot of new tax documentation that I really don't want to lose. I'd really like to have an up-to-date, security patched OS, but I guess I'll stay behind so that I don't risk losing critical financial documentation.

Shawn, I'm working on a patch to safely revert this that would also work for people who already upgraded their pools.

I'm sorry for this mess.

-- 
Pawel Jakub Dawidek
Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
On 4/13/23 22:56, Cy Schubert wrote:
> I'm in the process of building a branch reverting the merge altogether and will test it on my sandbox machine later today.

Cy, thank you for your testing and patience so far.

I'm working on a patch to revert block cloning without affecting people who already upgraded their pools.

I'd also greatly appreciate it if you could provide a procedure for me to reproduce the corruption, ideally without internet access, as I'll be on the plane(s) for the next ~24h.

-- 
Pawel Jakub Dawidek
Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
On 4/14/23 07:52, Charlie Li wrote:
> Pawel Jakub Dawidek wrote:
> > thank you for your testing and patience so far. I'm working on a patch to revert block cloning without affecting people who already upgraded their pools.
>
> Testing with mjg@ earlier today revealed that block_cloning was not the cause of poudriere bulk build (and similar cp(1)/install(1)-based) corruption, although may have exacerbated it.

Can you please elaborate on how you were testing and what exactly you excluded?

Thanks.

-- 
Pawel Jakub Dawidek
Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
On 4/14/23 09:23, Charlie Li wrote:
> Pawel Jakub Dawidek wrote:
> > Here is the change that reverts most of the modifications and disables cloning new blocks. It does retain the ability to free existing cloned blocks and keeps the block_cloning feature around, so upgraded pools can be imported and existing cloned blocks freed. It does not handle replaying ZIL with block-cloning logs, so make sure you import pools that were cleanly exported.
> >
> > I'd appreciate it if someone who can reproduce those corruptions could try it.
> >
> > https://github.com/pjd/openzfs/commit/f2cfbcf76a733c44e25cba8c649162ef68047103
>
> Does not apply to sys/contrib/openzfs tip, conflicts in module/os/freebsd/zfs/zfs_vnops_os.c and module/zfs/dmu.c.

This should work:

https://people.freebsd.org/~pjd/patches/brt_revert.patch

-- 
Pawel Jakub Dawidek
Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
On 4/14/23 07:40, Pawel Jakub Dawidek wrote:
> On 4/13/23 22:56, Cy Schubert wrote:
> > I'm in the process of building a branch reverting the merge altogether and will test it on my sandbox machine later today.
>
> Cy, thank you for your testing and patience so far.
>
> I'm working on a patch to revert block cloning without affecting people who already upgraded their pools.
>
> I'd also greatly appreciate it if you could provide a procedure for me to reproduce the corruption, ideally without internet access, as I'll be on the plane(s) for the next ~24h.

Here is the change that reverts most of the modifications and disables cloning new blocks. It does retain the ability to free existing cloned blocks and keeps the block_cloning feature around, so upgraded pools can be imported and existing cloned blocks freed. It does not handle replaying ZIL with block-cloning logs, so make sure you import pools that were cleanly exported.

I'd appreciate it if someone who can reproduce those corruptions could try it.

https://github.com/pjd/openzfs/commit/f2cfbcf76a733c44e25cba8c649162ef68047103

Thank you guys for your help!

-- 
Pawel Jakub Dawidek