Re: HEADS UP: Capsicum overhaul.

2013-03-02 Thread Pawel Jakub Dawidek
On Fri, Mar 01, 2013 at 09:45:02PM -0600, Larry Rosenman wrote:
 On Sat, 2 Mar 2013, Pawel Jakub Dawidek wrote:
 
  I just committed a pretty large change that affects not only Capsicum, but
  also descriptor handling code in the kernel. If you find some
  strange problems after r243611 (like panics, unexpected application
  errors, etc.) I may be at fault. I'll be watching the current@ mailing
  list closely, so report here if you find problems that look related to my
  change.
 
 
 
 Similar to another post:
 # svn up
 Updating '.':
 U    databases/py-sqlite3/Makefile
 U    databases/py-sqlite3/files/setup.py
 U    databases/py-sqlite3/files/setup3.py
 svn: E000093: Can't move '/usr/ports/.svn/tmp/svn-X6U5KQ' to
 '/usr/ports/databases/py-sqlite3/Makefile': Capabilities insufficient
 # svn up
 svn: E155037: Previous operation has not finished; run 'cleanup' if it was
 interrupted
 # svn cleanup
 svn: E000093: Can't move '/usr/ports/.svn/tmp/svn-Bb1iSM' to
 '/usr/ports/databases/py-sqlite3/Makefile': Capabilities insufficient

This should now be fixed in r247616.

Thank you for the report!

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: HEADS UP: Capsicum overhaul.

2013-03-03 Thread Pawel Jakub Dawidek
On Sun, Mar 03, 2013 at 10:18:02PM +0300, Jan Beich wrote:
 Pawel Jakub Dawidek p...@freebsd.org writes:
 
  I just committed a pretty large change that affects not only Capsicum, but
  also descriptor handling code in the kernel. If you find some
  strange problems after r243611 (like panics, unexpected application
  errors, etc.) I may be at fault. I'll be watching the current@ mailing
  list closely, so report here if you find problems that look related to my
  change.
 
 tmux started to behave weirdly, sometimes failing to attach:
 
   $ printenv
   PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin
   OLDPWD=/
   DISPLAY=:0
   PWD=/home/foo
   TERM=xterm
   USER=foo
   HOME=/home/foo
   SHELL=/bin/sh
 
   $ ktrace -i tmux -L test -f /dev/null
   $ echo $?
   1
   $ kdump -r | pastebinit -a 'tmux fails to attach'
   http://pastebin.com/U3nCPrFY
 
   $ env -i TERM=$TERM ktrace -i /usr/local/bin/tmux -L test -f /dev/null
   $ ^D
   [exited]
   $ kdump -r | pastebinit -a 'tmux fails to attach (workaround)'
   http://pastebin.com/w1dsUAU4
 
 I've tried so far:
 
   * booting allbsd.org snapshot - no joy
   * enabling capsicum options - no joy
   * reverting recent capsicum commits - works fine

Yes, it was already reported to me and I'm investigating the problem.
Thanks.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: HEADS UP: Capsicum overhaul.

2013-03-03 Thread Pawel Jakub Dawidek
On Sun, Mar 03, 2013 at 10:18:02PM +0300, Jan Beich wrote:
 Pawel Jakub Dawidek p...@freebsd.org writes:
 
  I just committed a pretty large change that affects not only Capsicum, but
  also descriptor handling code in the kernel. If you find some
  strange problems after r243611 (like panics, unexpected application
  errors, etc.) I may be at fault. I'll be watching the current@ mailing
  list closely, so report here if you find problems that look related to my
  change.
 
 tmux started to behave weirdly, sometimes failing to attach:

I committed a work-around in r247740, but the root of the problem is yet
to be found.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: kernel build failure

2013-03-03 Thread Pawel Jakub Dawidek
On Sun, Mar 03, 2013 at 06:47:00PM -0500, Michael Butler wrote:
 -----BEGIN PGP SIGNED MESSAGE-----
 Hash: SHA1
 
 SVN r247736 prompts this ..
 
 cc -c -O2 -pipe -fno-strict-aliasing -march=pentium4 -std=c99  -Wall
 -Wredundant-decls -Wnested-externs -Wstrict-prototypes
 -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual  -Wundef
 -Wno-pointer-sign -fformat-extensions  -Wmissing-include-dirs
 -fdiagnostics-show-option  -Wno-error-tautological-compare
 -Wno-error-empty-body  -Wno-error-parentheses-equality -nostdinc  -I.
 -I/usr/src/sys -I/usr/src/sys/contrib/altq -D_KERNEL
 -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h  -mno-aes -mno-avx
 -mno-mmx -mno-sse -msoft-float -ffreestanding -fstack-protector -Werror
  /usr/src/sys/kern/uipc_usrreq.c
 /usr/src/sys/kern/uipc_usrreq.c:1689:18: error: use of undeclared
 identifier 'fdep'; did you mean 'fde'?
                 filecaps_free(&fdep->fde_caps);
                                ^~~~
                                fde
 /usr/src/sys/kern/uipc_usrreq.c:1682:36: note: 'fde' declared here
 unp_freerights(struct filedescent *fde, int fdcount)
                                    ^
 1 error generated.

This happened because I split a larger change into smaller commits.
r247738 should be fine.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: r247839: broken pipe - for top, sudo and ports

2013-03-10 Thread Pawel Jakub Dawidek
On Wed, Mar 06, 2013 at 08:04:57AM -0500, John Baldwin wrote:
 On Tuesday, March 05, 2013 2:35:48 pm Hartmann, O. wrote:
  On recent FreeBSD 10.0-CURRENT/amd64 (CLANG buildworld, several systems
  (3) show the same symptoms), many services sporadically report
  
  broken pipe
  
  This happens to the system's top (I have to type it several times to
  finally get a top), it happens to sudo su -, it happens to SSH (drops
  connection with broken pipe) and as I reported earlier, it seems to
  affect the entire ports system, since I can not build any port; I receive
  
  *** [do-extract] Signal 13
  
  This is dramatic for me, because several modules (rtc, linux_adobe ...)
  can not be recompiled as required by the last /usr/src/UPDATING
  entry 20130304.
  
  dbus fails to start, and so does even the nVidia driver (which is a kernel
  module; it cannot be built and therefore ... ).
  
  Dimitry, I put you into CC, just in case. It seems that the last commits
  (not only the new DRM2 mess) broke something.
  
  I hope that others using FreeBSD 10.0-CURRENT with CLANG can confirm this.
 
 Have you tried backing up to just before all of pjd@'s file descriptor and
 capsicum commits?  It broke some other stuff initially related to fd passing,
 so I don't think it is beyond imagination that it broke something with UNIX
 domain sockets in general.

Is there a consensus yet on whether this is a result of my changes or of
davide's r247804?

I just upgraded my laptop to today's HEAD and I don't see any weird
behaviour yet. If someone can provide a way to reproduce the problem,
I'd be happy to investigate.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: pidfile_open incorrectly returns EAGAIN when pidfile is locked

2013-03-13 Thread Pawel Jakub Dawidek
On Wed, Mar 13, 2013 at 11:18:36AM -0400, John Baldwin wrote:
 On Tuesday, March 12, 2013 4:16:32 pm Dirk Engling wrote:
  While debugging my own daemon I noticed that pidfile_open does not
  perform the appropriate checks for a running daemon if the caller does
  not provide a pidptr to pidfile_open
  
  fd = flopen(pfh->pf_path,
  O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC | O_NONBLOCK, mode);
  
  fails when another daemon holds the lock and flopen sets errno to
  EAGAIN, the check 4 lines below in
  
  if (errno == EWOULDBLOCK && pidptr != NULL) {
  
  means that the pidfile_read is never executed.
  
  This results in my second daemon receiving an EAGAIN which clearly was
  meant to report a race condition between two daemons starting at the
  same time and the first one not yet finishing pidfile_write.
  
  The expected behavior would be to set errno to EEXIST, even if no pidptr
  was passed.
 
 Yes, I think it should actually perform the same logic even if pidptr is
 NULL of waiting for the other daemon to finish starting up.  Something like
 this:
 
 Index: lib/libutil/pidfile.c
 ===================================================================
 --- pidfile.c (revision 248162)
 +++ pidfile.c (working copy)
 @@ -100,6 +100,7 @@ pidfile_open(const char *path, mode_t mode, pid_t
   struct stat sb;
   int error, fd, len, count;
   struct timespec rqtp;
 + pid_t dummy;
  
   pfh = malloc(sizeof(*pfh));
   if (pfh == NULL)
 @@ -126,7 +127,9 @@ pidfile_open(const char *path, mode_t mode, pid_t
   fd = flopen(pfh->pf_path,
   O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC | O_NONBLOCK, mode);
   if (fd == -1) {
 - if (errno == EWOULDBLOCK && pidptr != NULL) {
 + if (errno == EWOULDBLOCK) {
 + if (pidptr == NULL)
 + pidptr = &dummy;
   count = 20;
   rqtp.tv_sec = 0;
   rqtp.tv_nsec = 5000000;

I agree EEXIST should be returned, but I don't like reading existing
pidfile (including waiting for the other process to write its PID) just
to throw read PID away.

How about this patch?

http://people.freebsd.org/~pjd/patches/pidfile.c.patch
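
The behaviour argued for above can be modelled outside libutil. What
follows is a minimal sketch, assuming a hypothetical helper
lock_failure_errno() that is not part of pidfile.c and does not reproduce
the linked patch. It only shows the intended mapping: a locked pidfile is
reported as EEXIST whether or not the caller asked for the PID, and the
pidfile is not read when nobody wants the PID.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model, not the libutil source.
 *
 * open_errno: errno left behind by the failed flopen().
 * pidptr:     the caller's output pointer, may be NULL.
 * read_error: outcome of reading the existing pidfile (0 on success,
 *             EAGAIN if the other daemon has not written its PID yet);
 *             only consulted when pidptr is not NULL.
 *
 * Returns the errno a pidfile_open()-style function would report.
 */
static int
lock_failure_errno(int open_errno, const void *pidptr, int read_error)
{
	if (open_errno != EWOULDBLOCK)
		return (open_errno);	/* unrelated failure: pass through */
	if (pidptr == NULL)
		return (EEXIST);	/* locked; skip reading the PID */
	if (read_error == 0 || read_error == EAGAIN)
		return (EEXIST);	/* locked; PID known or not yet written */
	return (read_error);		/* reading the pidfile itself failed */
}
```

Errors other than "would block" pass through untouched, so a caller can
still tell a permission problem apart from a running daemon.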

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: pidfile_open incorrectly returns EAGAIN when pidfile is locked

2013-03-13 Thread Pawel Jakub Dawidek
On Wed, Mar 13, 2013 at 10:59:17PM +0100, Dirk Engling wrote:
 
 On Wed, 13 Mar 2013, Pawel Jakub Dawidek wrote:
 
  How about this patch?
 
  http://people.freebsd.org/~pjd/patches/pidfile.c.patch
 
 If you move the lines
 
 + if (errno == 0 || errno == EAGAIN)
 + errno = EEXIST;
 
 out of the else branch, you can get rid of the if branch, guard the else 
 branch by a
 
 + if (pidptr) {
 
 and let the if (errno == 0 || errno == EAGAIN) fix the errno

I think I considered something similar at first, but the change I
proposed was optimal, IMHO at the cost of producing pretty large diff,
because of indentation change. But to be sure, can you send a patch of
your proposed change?

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: pidfile_open incorrectly returns EAGAIN when pidfile is locked

2013-03-14 Thread Pawel Jakub Dawidek
On Thu, Mar 14, 2013 at 08:28:25AM +0100, Dirk Engling wrote:
 -----BEGIN PGP SIGNED MESSAGE-----
 Hash: SHA1
 
 On 13.03.13 23:08, Pawel Jakub Dawidek wrote:
 
  I think I considered something similar at first, but the change I 
  proposed was optimal, IMHO at the cost of producing pretty large
  diff, because of indentation change. But to be sure, can you send a
  patch of your proposed change?
 
 http://erdgeist.org/arts/software/Code/pidfile.c.diff

Right. Your patch assumes EWOULDBLOCK is equal to EAGAIN, which is true
on FreeBSD, but is not portable. Also in case pidptr is NULL you compare
errno three times instead of just one (not a big deal of course, just
something that could be done a bit more optimal:)).
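
The portability point can be shown in a few lines. POSIX allows EAGAIN and
EWOULDBLOCK to be either the same value or two distinct values, so portable
code tests for both. This is a minimal sketch; the helper name would_block()
is made up for illustration and does not come from either patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: true if err means "the operation would have blocked". */
static bool
would_block(int err)
{
#if EAGAIN == EWOULDBLOCK
	/* Same value (true on FreeBSD): one comparison is enough. */
	return (err == EAGAIN);
#else
	/* Distinct values: either one signals the condition. */
	return (err == EAGAIN || err == EWOULDBLOCK);
#endif
}
```

The preprocessor check keeps the single comparison on systems where the two
names are aliases, which also answers the "compare errno only once" concern.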

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: pidfile_open incorrectly returns EAGAIN when pidfile is locked

2013-03-14 Thread Pawel Jakub Dawidek
On Thu, Mar 14, 2013 at 09:42:40AM -0400, John Baldwin wrote:
 On Thursday, March 14, 2013 4:44:20 am Pawel Jakub Dawidek wrote:
  On Thu, Mar 14, 2013 at 08:28:25AM +0100, Dirk Engling wrote:
   -----BEGIN PGP SIGNED MESSAGE-----
   Hash: SHA1
   
   On 13.03.13 23:08, Pawel Jakub Dawidek wrote:
   
    I think I considered something similar at first, but the change I
    proposed was optimal, IMHO at the cost of producing pretty large
    diff, because of indentation change. But to be sure, can you send a
    patch of your proposed change?
   
   http://erdgeist.org/arts/software/Code/pidfile.c.diff
  
  Right. Your patch assumes EWOULDBLOCK is equal to EAGAIN, which is true
  on FreeBSD, but is not portable. Also in case pidptr is NULL you compare
  errno three times instead of just one (not a big deal of course, just
  something that could be done a bit more optimal:)).
 
 Geeze, why not just add an else.  That's the really short diff:

Heh, I did consider that as well, but here you check errno twice,
instead of once. Guys, is there anything wrong with the patch I
proposed?

 Index: pidfile.c
 ===================================================================
 --- pidfile.c (revision 248162)
 +++ pidfile.c (working copy)
 @@ -140,7 +140,8 @@ pidfile_open(const char *path, mode_t mode, pid_t
   *pidptr = -1;
   if (errno == 0 || errno == EAGAIN)
   errno = EEXIST;
 - }
 + } else if (errno == EWOULDBLOCK)
 + errno = EEXIST;
   free(pfh);
   return (NULL);
   }

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: pidfile_open incorrectly returns EAGAIN when pidfile is locked

2013-03-14 Thread Pawel Jakub Dawidek
On Thu, Mar 14, 2013 at 10:11:07AM -0700, Chuck Swiger wrote:
 Hi--
 
 On Mar 14, 2013, at 9:50 AM, John Baldwin wrote:
  On Thursday, March 14, 2013 12:29:58 pm Pawel Jakub Dawidek wrote:
 
 [ ... ]
  Heh, I did consider that as well, but here you check errno twice,
  instead of once. Guys, is there anything wrong with the patch I
  proposed?
  
  I'm sure the compiler can work that out just fine and it should do whatever
  is most readable to the programmer.  I don't care either way.
 
 Strong +1.  Having the code be correct and readable is much more important
 than trying to hand-optimize a single-digit # of integer compares in
 startup code that usually runs ~once per process.

Well, I think my version is more obvious, just the diff is larger.
Anyway, I think enough has been said already about this crucial change:)

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: r248583 Kernel panic: negative refcount 0xfffffe0031b59168

2013-04-15 Thread Pawel Jakub Dawidek
, v_mount = 
 0x0, v_nmntvnodes = {
 tqe_next = 0xfe014fd95760, tqe_prev = 0xfe011d500958}, v_un = 
 {vu_mount = 0x0, vu_socket = 0x0, 
 vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, le_prev = 
 0x0}, v_cache_src = {
 lh_first = 0x0}, v_cache_dst = {tqh_first = 0x0, tqh_last = 
 0xfe01967007b0}, v_cache_dd = 0x0, 
   v_lock = {lock_object = {lo_name = 0x80dddbb1 zfs, lo_flags = 
 91881472, lo_data = 0, 
   lo_witness = 0x0}, lk_lock = 1, lk_exslpfail = 0, lk_timo = 51, lk_pri 
 = 96}, v_interlock = {
 lock_object = {lo_name = 0x807bfbb9 vnode interlock, lo_flags = 
 16908288, lo_data = 0, 
   lo_witness = 0x0}, mtx_lock = 6}, v_vnlock = 0xfe01967007c8, 
 v_actfreelist = {
 tqe_next = 0xfe0031985b10, tqe_prev = 0xfe014fd95820}, v_bufobj = 
 {bo_mtx = {lock_object = {
 lo_name = 0x807bfbc9 bufobj interlock, lo_flags = 16908288, 
 lo_data = 0, 
 lo_witness = 0x0}, mtx_lock = 6}, bo_ops = 0x80a5af10, 
 bo_object = 0x0, bo_synclist = {
   le_next = 0x0, le_prev = 0x0}, bo_private = 0xfe0196700760, 
 __bo_vnode = 0xfe0196700760, 
 bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 0xfe0196700880}, 
 bv_root = 0x0, bv_cnt = 0}, 
 bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xfe01967008a0}, 
 bv_root = 0x0, bv_cnt = 0}, 
 bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072}, v_pollinfo = 0x0, 
 v_label = 0x0, v_lockf = 0x0, 
   v_rl = {rl_waiters = {tqh_first = 0x0, tqh_last = 0xfe01967008e8}, 
 rl_currdep = 0x0}, v_cstart = 0, 
   v_lasta = 0, v_lastw = 0, v_clen = 0, v_holdcnt = 0, v_usecount = 0, 
 v_iflag = 128, v_vflag = 4, 
   v_writecount = 0, v_hash = 26636295, v_type = VBAD}
 
 
 # kgdb -n 0
 GNU gdb 6.1.1 [FreeBSD]
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as "amd64-marcel-freebsd"...
 
 Unread portion of the kernel message buffer:
 panic: negative refcount 0xfffffe0059a400c8
 cpuid = 0
 KDB: stack backtrace:
 db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffff823aff8770
 kdb_backtrace() at kdb_backtrace+0x39/frame 0xffffff823aff8820
 vpanic() at vpanic+0x127/frame 0xffffff823aff8860
 kassert_panic() at kassert_panic+0x136/frame 0xffffff823aff88d0
 closef() at closef+0x1ff/frame 0xffffff823aff8960
 closefp() at closefp+0xa0/frame 0xffffff823aff89b0
 amd64_syscall() at amd64_syscall+0x1f9/frame 0xffffff823aff8ab0
 Xfast_syscall() at Xfast_syscall+0xfb/frame 0xffffff823aff8ab0
 --- syscall (6, FreeBSD ELF64, sys_close), rip = 0x80aeaaa8a, rsp =
 0x7fffffffbd28, rbp = 0x7fffffffbd40 ---
 Uptime: 21m3s
 [...]
 (kgdb) bt
 #0  doadump (textdump=1) at pcpu.h:231
 #1  0xffffffff804f5827 in kern_reboot (howto=260) at
 /freebsd-src/local/sys/kern/kern_shutdown.c:447
 #2  0xffffffff804f5d36 in vpanic (fmt=<value optimized out>, ap=<value
 optimized out>)
 at /freebsd-src/local/sys/kern/kern_shutdown.c:754
 #3  0xffffffff804f5bc6 in kassert_panic (fmt=<value optimized out>)
 at /freebsd-src/local/sys/kern/kern_shutdown.c:642
 #4  0xffffffff804b900f in closef (fp=<value optimized out>, td=<value
 optimized out>) at refcount.h:66
 #5  0xffffffff804b7030 in closefp (fdp=0xfffffe018dc79800, fd=<value
 optimized out>, fp=0xfffffe0059a400a0,
 td=0xfffffe016dfca920, holdleaders=<value optimized out>)
 at /freebsd-src/local/sys/kern/kern_descrip.c:1136
 #6  0xffffffff806e26c9 in amd64_syscall (td=0xfffffe016dfca920, traced=0) at
 subr_syscall.c:134
 #7  0xffffffff806cb13b in Xfast_syscall () at exception.S:387
 #8  0x000000080aeaaa8a in ?? ()
 Previous frame inner to this frame (corrupt stack?)
 Current language:  auto; currently minimal
 (kgdb) 
 
  
  Thanks,
  
  Shawn Webb
  ___
  freebsd-current@freebsd.org mailing list
  http://lists.freebsd.org/mailman/listinfo/freebsd-current
  To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com




Re: r248583 Kernel panic: negative refcount 0xfffffe0031b59168

2013-07-01 Thread Pawel Jakub Dawidek
On Sun, Jun 30, 2013 at 01:18:36PM +0200, Mateusz Guzik wrote:
 On Sun, Jun 30, 2013 at 05:21:42PM +1000, Kubilay Kocak wrote:
  I'm seeing what I believe is related panic, reliably being generated by
  the Python regression test suite on a newly created FreeBSD 10-CURRENT
  buildbot.
  
  Symptoms first seen in an freebsd.org FTP snapshot dated Thu May 30
  20:01:46 UTC 2013 and also reproducible on a freshly updated r252400
  
  It is additionally reproducible after checking out pure upstream python
  sources, using the following steps:
  
  hg clone http://hg.python.org/cpython
  cd cpython && configure && make buildbottest
  
  An interesting possible correlation is that it seems to drop out
  during/around test_socket
  
 
 Turns out the bug is quite funny ;)
 
 Try this:
 diff --git a/sys/kern/uipc_usrreq.c b/sys/kern/uipc_usrreq.c
 index 5d8e814..7a4db04 100644
 --- a/sys/kern/uipc_usrreq.c
 +++ b/sys/kern/uipc_usrreq.c
 @@ -1764,8 +1764,8 @@ unp_externalize(struct mbuf *control, struct mbuf **controlp, int flags)
   }
   for (i = 0; i < newfds; i++, fdp++) {
   fde = &fdesc->fd_ofiles[*fdp];
 - fde->fde_file = fdep[0]->fde_file;
 - filecaps_move(&fdep[0]->fde_caps,
 + fde->fde_file = fdep[i]->fde_file;
 + filecaps_move(&fdep[i]->fde_caps,
   &fde->fde_caps);
   if ((flags & MSG_CMSG_CLOEXEC) != 0)
   fde->fde_flags |= UF_EXCLOSE;

Thanks for tracking it down before I had time to get to it!
The change looks good.
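
Why indexing with 0 instead of i ends in a "negative refcount" panic can be
seen in a simplified userspace model. This is only a sketch under stated
assumptions: struct file here holds nothing but a counter, and
install_buggy(), install_fixed() and close_all() are hypothetical names, not
kernel functions. Each received file arrives holding one reference; if every
new descriptor slot points at the first file, closing the descriptors drops
that one reference more than once.

```c
#include <stddef.h>

/* Simplified stand-in for a kernel file object: only the reference count. */
struct file {
	int refcount;
};

/* The pre-fix loop: every destination slot receives source entry 0. */
static void
install_buggy(struct file **dst, struct file **src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] = src[0];
}

/* The fixed loop: slot i receives source entry i. */
static void
install_fixed(struct file **dst, struct file **src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] = src[i];
}

/*
 * Drop one reference per slot, as closing each descriptor would, and
 * return the lowest count reached (0 if no count went negative).
 */
static int
close_all(struct file **slots, size_t n)
{
	int lowest = 0;

	for (size_t i = 0; i < n; i++) {
		int count = --slots[i]->refcount;

		if (count < lowest)
			lowest = count;
	}
	return (lowest);
}
```

With a single passed descriptor both loops behave the same, which may be why
the problem only showed up with workloads that pass several descriptors in
one message.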

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com




HEADSUP! dhclient(8) sandboxing.

2013-07-03 Thread Pawel Jakub Dawidek
Hi.

I've just committed Capsicum sandboxing for the dhclient(8).
Let me know (ideally by sending e-mail to current@ and CCing me) if you
notice any weird behaviour.

The work was sponsored by the FreeBSD Foundation.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com




Re: HEADSUP! dhclient(8) sandboxing.

2013-07-04 Thread Pawel Jakub Dawidek
On Wed, Jul 03, 2013 at 11:04:21PM -0700, Alfred Perlstein wrote:
 On 7/3/13 3:52 PM, Pawel Jakub Dawidek wrote:
  Hi.
 
  I've just committed Capsicum sandboxing for the dhclient(8).
  Let me know (ideally by sending e-mail to current@ and CCing me) if you
  notice any weird behaviour.
 
  The work was sponsored by the FreeBSD Foundation.
 
 It broke running dhclient on igb0 for me.  It says interface not found 
 or something to that effect.
 
 Can I help somehow?
 
 Basically just ifconfig down igb0 then try to run dhclient.  It will 
 not work.  If you up the interface and then run it, it is OK.
 
 See attached image.

Thanks for the report. Could you try this patch?

http://people.freebsd.org/~pjd/patches/dhclient.c.patch

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com




Re: HEADSUP! dhclient(8) sandboxing.

2013-07-04 Thread Pawel Jakub Dawidek
On Thu, Jul 04, 2013 at 04:55:14PM +0400, Andrey Chernov wrote:
 On 04.07.2013 2:52, Pawel Jakub Dawidek wrote:
  I've just committed Capsicum sandboxing for the dhclient(8).
  Let me know (ideally by sending e-mail to current@ and CCing me) if you
  notice any weird behaviour.
 
 I haven't tested your most recent commit yet, but the whole chain of
 previous commits leaves dhclient broken:
 
 Starting dhclient.
 em0: no link .. got link
 em0: not found
 exiting.
 /etc/rc.d/dhclient: WARNING: failed to start dhclient
 
 and a bit later in rc
 
 Starting dhclient.
 em0: not found
 exiting.
 /etc/rc.d/dhclient: WARNING: failed to start dhclient

It should be fixed in r252697. Could you give it a try?

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com




Re: r253070 and disappearing zpool

2013-07-22 Thread Pawel Jakub Dawidek
On Mon, Jul 22, 2013 at 10:29:40AM +0300, Andriy Gapon wrote:
 I think that this setup (on ZFS level) is quite untypical, although not
 impossible on FreeBSD (and perhaps only FreeBSD).
 It's untypical because you have separate boot pool (where loader, loader.conf
 and kernel are taken from) and root pool (where / is mounted from).

As I said elsewhere, it is pretty typical when full disk encryption is
used. /boot/ has to be unencrypted and can be stored on e.g. a USB
pendrive which is never left unattended, unlike a laptop, which can be left
in e.g. a hotel room, but with its entire disk encrypted.

 So, I see three ways of resolving the problem that my changes caused for your
 configuration.
 
 1.  [the easiest] Put zpool.cache loading instructions that used to be in
 defaults/loader.conf into your loader.conf.  This way everything should work
 as before -- zpool.cache would be loaded from your boot pool.
 
 2. Somehow (I don't want to go into any technical details here) arrange that
 your root pool has /boot/zfs/zpool.cache that describes your boot pool.  This
 is probably hard given that your /boot is a symlink at the moment.  This
 probably would be easier to achieve if zpool.cache lived in /etc/zfs.
 
 3. [my favorite]  Remove an artificial difference between your boot and root
 pools, so that they are a single root+boot pool (as zfs gods intended).  As
 far as I understand your setup, you use GELI to protect some sensitive data.
 Apparently your kernel is not sensitive data, so I wonder if your /bin/sh or
 /sbin/init are really sensitive either.
 So perhaps you can arrange your unencrypted pool to hold all of the base
 system (boot + root) and put all your truly sensitive filesystems (like e.g.
 /home or /var/data or /opt/xyz) onto your encrypted pool.

If all you care about is the laptop being stolen, then that would work.

If, however, you want to be protected from someone replacing your /sbin/init
with something evil, then you use encryption or, even better, integrity
verification, which GELI also supports.

Remember, tools not policies.

There is also option number 4 - backing out your commit.

When I saw your commit removing those entries from defaults/loader.conf,
I thought it was fine, as we no longer require zpool.cache to import the
root pool, which was, BTW, a very nice and handy improvement. Now that we
know it breaks existing installations I'd prefer the commit to be backed
out. This is because, apart from breaking some existing installations, it
doesn't gain us anything.

 So I understand that my change causes a problem for a setup like yours, but I
 believe that the change is correct.

The change is clearly incorrect or incomplete as it breaks existing
installations and doesn't allow for full disk encryption configuration
on ZFS-only systems.

BTW. If moving zpool.cache to /etc/zfs/ will work for both cases that's
fine by me, although the migration might be tricky.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com




Re: r253070 and disappearing zpool

2013-07-25 Thread Pawel Jakub Dawidek
On Wed, Jul 24, 2013 at 02:47:11PM +0300, Andriy Gapon wrote:
 on 22/07/2013 23:38 Pawel Jakub Dawidek said the following:
  The /boot/ has to be unencrypted and can be stored on eg. USB
  pendrive which is never left unattended, unlike laptop which can be left
  in eg. a hotel room, but with entire disk encrypted.
 
 As we discussed elsewhere, there are many options of configuring full disk
 encryption.  Including decisions whether root filesystem should be separate
 from boot filesystem, choice of filesystem type for boot fs, ways of tying
 various pieces together, and many more.
 
 I do not believe that my change is incompatible with full disk encryption in
 general.

Maybe you can imagine many ways of configuring it, but definitely the
most typical one is to have /boot/ separate from /, where /boot/ is
unencrypted and where you use one file system type for both (UFS or ZFS).

 Let's also recall that the system was not created / configured by any of the
 existing official or semi-official tools and thus it does not represent any
 recommended way of setting up such systems.  Glen configured it this way, but
 it doesn't mean that that is the way.

Note that there are no official tools to install FreeBSD on ZFS. Is that
enough reason to stop supporting it?

What Glen did is the recommended way of setting up full disk encryption
with ZFS. I'd do it the same way and I'd recommend this configuration to
anyone who will (or did) ask me.

 I think that there are many ways of changing the configuration of that system
 to make it behave as before again.
 Three I mentioned already.  Another is to add an rc script to import the boot
 pool, given that it is a special, designated pool.  Yet another is to place
 zpool.cache onto the root pool and use nullfs (instead of a symlink) to make
 /boot be from the boot pool but /boot/zfs be from the root pool.

Come on...

  BTW. If moving zpool.cache to /etc/zfs/ will work for both cases that's
  fine by me, although the migration might be tricky.
 
 Yes, that's migration that's scary to me too.
 
 
 Now, about the postponed points.
 I will reproduce a section from my email that you've snipped.
 
  P.S.
  ZFS/FreeBSD boot process is extremely flexible.  For example zfsboot can
  take zfsloader from pool1/fsA, zfsloader can boot kernel from pool2/fsB and
  kernel can mount / from pool3/fsC.  Of these 3 filesystems from where should
  zpool.cache be taken?
  My firm opinion is that it should be taken from / (pool3/fsC in the example
  above).  Because it is the root filesystem that defines what a system is
  going to do ultimately: what daemons are started, with what configurations,
  etc.  And thus it should also determine what pools to auto-import.
  We can say that zpool.cache is analogous to /etc/fstab in this respect.
 
 So do you or do you not agree with my reasoning about from where zpool.cache
 should be taken?
 If you do not, then please explain why.
 If you do, then please explain how this would be compatible with the old way
 of loading zpool.cache.

I don't have a strong opinion about this. As I said above I'm fine with
moving zpool.cache to /etc/zfs/ if we can ensure it won't break existing
installations.

Still I'm not sure this was your initial goal, because you weren't aware
of systems with separate boot pool until recently (if you were aware of
this I hope you wouldn't commit the change without prior discussion).
Which means in your eyes zpool.cache was always part of the root pool,
because /boot/ was.

 I think that ensuring that zpool.cache is always loaded from a root filesystem
 is the gain from my change.

Were people complaining about zpool.cache being loaded from /boot/zfs/
and not from /etc/zfs/? I don't think so. But people do complain about
boot pool not being autoimported. In my opinion for the end user it
doesn't really matter if it is /etc/zfs/zpool.cache or
/boot/zfs/zpool.cache, as both directories are available once the system
is booted. For most people those two directories are placed on the same
file system. For some people who actually care if this is /etc/zfs/ or
/boot/zfs/, because those are separate file systems the latter works,
the former doesn't.

In my opinion the gain, if any, is only theoretical.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com




Re: diskid documentation

2014-06-02 Thread Pawel Jakub Dawidek
On Mon, Jun 02, 2014 at 11:01:08AM -0700, John-Mark Gurney wrote:
 Michael W. Lucas wrote this message on Mon, Jun 02, 2014 at 11:36 -0400:
  On Mon, Jun 02, 2014 at 10:45:52AM -0400, Ryan Stone wrote:
   On Mon, Jun 2, 2014 at 9:26 AM, Allan Jude allanj...@freebsd.org wrote:
    It also tends to sometimes hide the gpt label provider on me (not sure
    in which cases it does this, but it is annoying)
   
   This happens when something (e.g. zfs) happens to open the diskid
   provider instead of the gpt label.  For me this ended up being a bit
   more than annoying; my swap was mounted in /etc/fstab via a gpt label
   so I silently lost my swap when I did an upgrade.
  
  Wait-- one type of one label can hide another?
  
  I thought a big point of labels was to remove ambiguity...
 
 Surprisingly, yes...  I didn't think about this, but it's true...
 
 A disk will get exported via two different devices, diskid and normal
 da/ada...  The tasting will go through and create all the necessary
 sub devices, but the problem is that we now have two different paths,
 and if something opens the diskid path, then the da/ada paths all
 disappear...
 
 This sounds like we need to fix geom to bind the two together so
 that when one opens, the other doesn't disappear... The problem is
 that geom views them as two separate disks when in fact they are the
 same...  someone who knows geom well should think about how to solve
 this problem, as diskid isn't the first time this has happened, just
 most prevalent w/ ZFS and diskid.

The problem is that GPT labels (or GPT IDs for that matter) should not
be implemented within GLABEL. This is wrong. It should be implemented as
part of GPART, so that GPART would create ada0p1, gpt/label and
gptid/whatever. Opening one of those should not make the others
disappear then. Only opening ada0 for writing would make them disappear.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com




Re: diskid documentation

2014-06-03 Thread Pawel Jakub Dawidek
On Mon, Jun 02, 2014 at 03:27:06PM -0700, John-Mark Gurney wrote:
 Pawel Jakub Dawidek wrote this message on Mon, Jun 02, 2014 at 22:26 +0200:
  The problem is that GPT labels (or GPT IDs for that matter) should not
  be implemented within GLABEL. This is wrong. It should be implemented as
  part of GPART, so that GPART would create ada0p1, gpt/label and
  gptid/whatever. Opening one of those should not make the others
  disappear then. Only opening ada0 for writing would make them disappear.
 
 even gpart would be wrong IMO... What happens if there is another
 provider like GPART, but different, do they need to implement diskid
 creation too to prevent the same issue?
 
 Shouldn't geom be updated to say, this ident is an alias, everything
 you do w/ this, it's exactly the same as the other one?  This would
 also have the advantage of possibly removing one layer in the call
 chain when dealing w/ IO. (or does GEOM have a pass-through flag that
 says, I don't do anything, just skip me?)

As for disk IDs, they definitely shouldn't be implemented in GPART or
GLABEL. IMHO the right place is the DISK class - both ada0 and
diskid-of-ada0 should exist on equal terms (two providers of one
geom). This would also address your concern about the additional layer.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com




Re: taskqueue_create() name parameter lifetime

2010-11-16 Thread Pawel Jakub Dawidek
On Tue, Nov 16, 2010 at 08:27:11AM -0500, John Baldwin wrote:
 On Tuesday, November 16, 2010 7:20:47 am Andriy Gapon wrote:
  
  taskqueue_create() documentation never explicitly says this, but current
  taskqueue_create() implementation just stores a 'name' pointer parameter
  internally.  Thus it depends on the 'name' having a life time encompassing 
  that of
  the taskqueue.
  I think that alternatively we could have copied the name (or a portion of 
  it) into
  an internal buffer.
  I don't have any argument for either approach, just curious which one looks more
  preferable from general (FreeBSD, kernel) programming practices point of 
  view.
 
 Hmm, in many other places we store a separate copy (e.g. all the interrupt
 code uses separate MAXCOMLEN char arrays to hold names).  If that is easy to
 do, that is probably the best approach.

The most friendly API would keep a copy of the name internally, but would also
allow me to provide the name in printf-like format, so I don't have to use
sprintf()/snprintf() before calling it. Unfortunately this would change the
taskqueue API, as name is the first argument, which makes it not worth
the pain.
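
As a rough illustration of the copy-plus-format idea (a userland sketch
only, not the kernel taskqueue(9) code; my_taskqueue,
my_taskqueue_create() and TQ_NAMELEN are hypothetical names made up for
this example), the create function could take a printf-like format and
store the result in its own buffer:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

#define	TQ_NAMELEN	32	/* hypothetical limit, not a kernel constant */

/* Hypothetical queue object that owns a private copy of its name. */
struct my_taskqueue {
	char	name[TQ_NAMELEN];
};

/*
 * Hypothetical constructor: formats the name straight into storage owned
 * by the queue, so the caller's buffers need not outlive the queue.
 * Names longer than TQ_NAMELEN - 1 are silently truncated by vsnprintf().
 */
struct my_taskqueue *
my_taskqueue_create(const char *fmt, ...)
{
	struct my_taskqueue *tq;
	va_list ap;

	tq = calloc(1, sizeof(*tq));
	if (tq == NULL)
		return (NULL);
	va_start(ap, fmt);
	vsnprintf(tq->name, sizeof(tq->name), fmt, ap);
	va_end(ap);
	return (tq);
}
```

A caller would then write something like
my_taskqueue_create("%s%d taskq", drvname, unit) and could reuse or free
drvname immediately afterwards.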

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Next ZFSv28 patchset ready for testing.

2010-12-13 Thread Pawel Jakub Dawidek
Hi.

The new patchset is ready for testing:

http://people.freebsd.org/~pjd/patches/zfs_20101212.patch.bz2

When applying the patch be sure to use the correct options for patch(1):

# cd /usr/src
# fetch http://people.freebsd.org/~pjd/patches/zfs_20101212.patch.bz2
# bzip2 -d zfs_20101212.patch.bz2
# patch -E -p0 < zfs_20101212.patch

The patch is against FreeBSD HEAD as of 2010-12-12.

Some of the changes since the last patchset (zfs_20100831.patch):

- Boot support for ZFS v28 (only RAIDZ3 is not yet supported).
- Various fixes for the existing ZFS boot code.
- Support for sendfile(2) (by avg@).
- Userland-kernel compatibility with v13-v15 (by mm@).
- ACL fixes (by trasz@).
- Various bug fixes.

Please test, test, test. Chances are this is the last patchset before
v28 goes into HEAD (finally). Especially test new changes, like boot
support and sendfile(2) support. Also be sure to verify that you can
import existing ZFS pools (v13-v15) when running v28 and that you can
boot from your existing pools.

Enjoy!

PS. Martin (mm@) will be providing a patch against 8-STABLE soon.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Next ZFSv28 patchset ready for testing.

2010-12-13 Thread Pawel Jakub Dawidek
On Mon, Dec 13, 2010 at 10:45:56PM +0100, Pawel Jakub Dawidek wrote:
 Hi.
 
 The new patchset is ready for testing:
 
   http://people.freebsd.org/~pjd/patches/zfs_20101212.patch.bz2
 
 When applying the patch be sure to use correct options for patch(1)!:
 
   # cd /usr/src
   # fetch http://people.freebsd.org/~pjd/patches/zfs_20101212.patch.bz2
   # bzip2 -d zfs_20101212.patch.bz2
   # patch -E -p0 < zfs_20101212.patch
[...]

If patch(1) reports a reject for the
sys/cddl/compat/opensolaris/sys/sysmacros.h file, or you see the
following error while compiling world:

/usr/src/cddl/usr.bin/ctfconvert/../../../cddl/contrib/opensolaris/tools/ctf/cvt/strtab.c:249:
 undefined reference to `MIN'
strtab.o(.text+0x28d): In function `strtab_insert':
/usr/src/cddl/usr.bin/ctfconvert/../../../cddl/contrib/opensolaris/tools/ctf/cvt/strtab.c:119:
 undefined reference to `MIN'
strtab.o(.text+0x3a1):/usr/src/cddl/usr.bin/ctfconvert/../../../cddl/contrib/opensolaris/tools/ctf/cvt/strtab.c:145:
 undefined reference to `MIN'
*** Error code 1

Simply remove the sys/cddl/compat/opensolaris/sys/sysmacros.h file from the tree.

Unfortunately the patch can work either on source downloaded via cvsup or on
source downloaded via subversion, but not both, as those two have different
$FreeBSD$ id strings (at least in case of this file). The patch is generated
from subversion source, so if you use cvsup, you will most likely see the
reject and the error.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Next ZFSv28 patchset ready for testing.

2010-12-13 Thread Pawel Jakub Dawidek
On Mon, Dec 13, 2010 at 10:45:56PM +0100, Pawel Jakub Dawidek wrote:
 Hi.
 
 The new patchset is ready for testing:
 
   http://people.freebsd.org/~pjd/patches/zfs_20101212.patch.bz2

You can also download the whole source tree already patched from here:

http://people.freebsd.org/~pjd/zfs_20101212.tbz

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Next ZFSv28 patchset ready for testing.

2010-12-13 Thread Pawel Jakub Dawidek
On Mon, Dec 13, 2010 at 11:00:31PM -, Steven Hartland wrote:
 What's the expected behaviour for the sendfile changes as
 sendfile is one of the problems we have here with the
 double memory allocation required for it under ZFS compared
 to UFS. Does this patch address that?

No. The patch doesn't address that. It only adds support for
sendfile(2), as it was commented out in the previous patchset.

 Inspecting the patch the following segment looks odd:-
 --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c.orig
 +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
 ...
while (n > 0) {
nbytes = MIN(n, zfs_read_chunk_size -
P2PHASE(uio->uio_loffset, zfs_read_chunk_size));
 
 +#ifdef __FreeBSD__
 +   if (uio->uio_segflg == UIO_NOCOPY)
 +   error = mappedread_sf(vp, nbytes, uio);
 +   else
 +#endif /* __FreeBSD__ */
if (vn_has_cached_data(vp))
error = mappedread(vp, nbytes, uio);
else
 
 Is there an extra else in there which will break things or should
 the __FreeBSD__ mappedread_sf block replace the standard mappedread
 call or is the indentation just a bit weird?

The code is correct. It is just hard to split 'else' and 'if' with a
'#endif' and keep the indentation pretty. Depending on the conditions, we
use one of three methods to read the data.
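
To make the control flow easier to see, here is a small stand-alone
sketch of the same else/#endif/if chain. It is not the zfs_vnops.c code:
pick_read_path(), the enum and SKETCH_FREEBSD (standing in for
__FreeBSD__) are assumptions for illustration, and the third branch is
left generic because the quoted hunk stops at the final 'else'.

```c
#include <stdbool.h>

/* Hypothetical labels for the three ways of reading the data. */
enum read_path {
	READ_SENDFILE,	/* mappedread_sf() in the quoted patch */
	READ_MAPPED,	/* mappedread() in the quoted patch */
	READ_DEFAULT	/* whatever follows the last 'else' (not quoted) */
};

#define	SKETCH_FREEBSD	1	/* stand-in for __FreeBSD__ in this sketch */

enum read_path
pick_read_path(bool nocopy, bool has_cached_data)
{
#ifdef SKETCH_FREEBSD
	if (nocopy)
		return (READ_SENDFILE);
	else
#endif /* SKETCH_FREEBSD */
	/* With the macro undefined, the chain simply starts at this 'if'. */
	if (has_cached_data)
		return (READ_MAPPED);
	else
		return (READ_DEFAULT);
}
```

With the macro defined this is a plain if / else if / else chain; without
it, the first test disappears and the remaining if / else is unchanged.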

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Next ZFSv28 patchset ready for testing.

2010-12-14 Thread Pawel Jakub Dawidek
On Tue, Dec 14, 2010 at 03:20:05PM +0100, Olivier Smedts wrote:
  make installworld
 
 That's what I wanted to do, and why I rebooted single-user on the new
 kernel. But isn't the v13-v15 userland supposed to work with the v28
 kernel ?

Yes, it is supposed to work, precisely so that one can follow the common
FreeBSD upgrade path. Martin was working on this (CCed).

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Next ZFSv28 patchset ready for testing.

2010-12-15 Thread Pawel Jakub Dawidek
On Wed, Dec 15, 2010 at 10:15:00PM -0500, ben wilber wrote:
 On Mon, Dec 13, 2010 at 10:45:56PM +0100, Pawel Jakub Dawidek wrote:
  Hi.
  
  The new patchset is ready for testing:
 
 Running fine for 24 hours now under load with a ~50 disk v15 (not
 upgraded) pool from -CURRENT.  Thanks!
 
 Only strange thing is the rc script complains:
 
 /etc/rc: DEBUG: run_rc_command: doit: zvol_start 
 unrecognized command 'volinit'
 usage: zfs command args ...

Did you run mergemaster(8) after the upgrade? The patch includes a change
to etc/rc.d/zvol to remove 'zfs volinit'/'zfs volfini', which are no
longer available.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Next ZFSv28 patchset ready for testing.

2010-12-19 Thread Pawel Jakub Dawidek
On Fri, Dec 17, 2010 at 12:54:36AM +0300, Rechistov Grigory (Речистов Григорий) 
wrote:
 I started to check the new ZFS version inside a VirtualBox machine. So far  
 it works for me without crashes, but I got some observations worth  
 mentioning. Here are the steps I made:
 
 1. Installed 8.1-RELEASE (from minimal install  CD)
 2. Csup'ped sources to CURRENT (as of 14/12/2010) [note that I haven't  
 used SVN repository]
 3. Applied the patch in question.
 4. Created a zpool raidz of two disks of old  version 15. Also some usual  
 tuning of ZFS in loader.conf was done as I am running 32 bit version with  
 low amount of memory.  zfs_enable=YES in rc.conf was added too.
 4.1 Moved /usr/ports to ZFS to have some files on it.
 5. Make buildworld, buildkernel, installkernel, installworld - all the  
 canonical steps from the Handbook.
 6. After reboot to final 9.0-CURRENT world I got a dmesg with some trace  
 stack related to ZFS and also a rc.d script message about unrecognized  
 command 'volinit' (see the text of it in attachment).

This one is because mergemaster(8) skips files with the same $FreeBSD$
value, so you need to copy /usr/src/etc/rc.d/zvol to /etc/rc.d/ by hand.

 7. Nevertheless the system booted. Files
 8. `zpool upgrade -a` worked all right and reported that now I have ZFS  
 version 28
 
 Overall I am pleasantly surprised how streamlined the whole process was.

That's good to hear, thanks.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Next ZFSv28 patchset ready for testing.

2011-01-04 Thread Pawel Jakub Dawidek
On Wed, Dec 15, 2010 at 10:15:40AM +0200, Andrei Kolu wrote:
 2010/12/14 Pawel Jakub Dawidek p...@freebsd.org
 
  On Mon, Dec 13, 2010 at 10:45:56PM +0100, Pawel Jakub Dawidek wrote:
   Hi.
  
   The new patchset is ready for testing:
  
         http://people.freebsd.org/~pjd/patches/zfs_20101212.patch.bz2
 
  You can also download the whole source tree already patched from here:
 
         http://people.freebsd.org/~pjd/zfs_20101212.tbz
 
 
 # uname -a
 FreeBSD freebsd9.raidon.eu 9.0-CURRENT FreeBSD 9.0-CURRENT #0: Tue Dec
 14 14:37:01 EET 2010
 r...@freebsd9.raidon.eu:/usr/obj/usr/src/sys/GENERIC  amd64
 
 Create files filled with zeroes:
 # mkfile 512m disk1 disk2 disk3 disk4
 # zpool create andmed raidz /home/antik/disk{1,2,3,4}
 # zpool status andmed
   pool: andmed
  state: ONLINE
  scan: none requested
 config:
 
 NAME   STATE READ WRITE CKSUM
 andmed ONLINE   0 0 0
   raidz1-0 ONLINE   0 0 0
 /home/antik/disk1  ONLINE   0 0 0
 /home/antik/disk2  ONLINE   0 0 0
 /home/antik/disk3  ONLINE   0 0 0
 /home/antik/disk4  ONLINE   0 0 0
 
 errors: No known data errors
 
 Now let's try to scrub:
 # zpool scrub andmed
 
 Fatal trap 12: page fault while in kernel mode
 cpuid = 1; apic id = 01
 fault virtual address = 0x1fb8007b
 fault code = supervisor read data, page not present
 instruction pointer = 0x20:0x812967d2
 stack pointer = 0x20:0xff80ee605548
 frame pointer = 0x28:0xff80ee605730
 code segment = base 0x0, limit 0xf, type 0x1b
  = DPL 0, pres1, long 1, def32 0, gran 1
 processor eflags = interrupt enabled, resume, IOPL = 0
 current process = 2081 (initial thread)
 [ thread pid 2081 tid 100121 ]
 Stopped at  vdev_file_open+0x92:  testb  $0x20,0x7b(%rax)

Could you verify if this patch fixes the problem for you?

http://people.freebsd.org/~pjd/patches/vdev_file.c.2.patch

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: My ZFS v28 Testing Experience

2011-01-13 Thread Pawel Jakub Dawidek
On Wed, Jan 12, 2011 at 11:03:19PM -0400, Chris Forgeron wrote:
 I've been testing out the v28 patch code for a month now, and I've yet to 
 report any real issues other than what is mentioned below. 
 
 I'll detail some of the things I've tested, hopefully the stability of v28 in 
 FreeBSD will convince others to give it a try so the final release of v28 
 will be as solid as possible.
 
 I've been using FreeBSD 9.0-CURRENT as of Dec 12th, and 8.2PRE as of Dec 16th
 
 What's worked well:
 
 - I've made and destroyed small raidz's (3-5 disks), large 26 disk raid-10's, 
 and a large 20 disk raid-50.
 - I've upgraded from v15, zfs 4, no issues on the different arrays noted above
 - I've confirmed that a v15 or v28 pool will import into Solaris 11 Express, 
 and vice versa, with the exception about dual log or cache devices noted 
 below. 
 - I've run many TB of data through the ZFS storage via benchmarks from my 
 VM's connected via NFS, to simple copies inside the same pool, or copies from 
 one pool to another. 
 - I've tested pretty much every compression level, and changing them as I 
 tweak my setup and try to find the best blend.
 - I've added and subtracted many a log and cache device, some in failed 
 states from hot-removals, and the pools always stayed intact.

Thank you very much for all your testing, that's really a valuable
contribution. I'll be happy to work with you on tracking down the
bottleneck in ZFSv28.

 Issues:
 
 - Import of pools with multiple cache or log devices. (May be a very minor 
 point)
 
 A v28 pool created in Solaris 11 Express with 2 or more log devices, or 2 or 
 more cache devices won't import in FreeBSD 9. This also applies to a pool 
 that is created in FreeBSD, is imported in Solaris to have the 2 log devices 
 added there, then exported and attempted to be imported back in FreeBSD. No 
 errors, zpool import just hangs forever. If I reboot into Solaris, import the 
 pool, remove the dual devices, then reboot into FreeBSD, I can then import 
 the pool without issue. A single cache, or log device will import just fine. 
 Unfortunately I deleted my witness-enabled FreeBSD-9 drive, so I can't easily 
 fire it back up to give more debug info. I'm hoping some kind soul will 
 attempt this type of transaction and report more detail to the list.
 
 Note - I just decided to try adding 2 cache devices to a raidz pool in 
 FreeBSD, export, and then importing, all without rebooting. That seems to 
 work. BUT - As soon as you try to reboot FreeBSD with this pool staying 
 active, it hangs on boot. Booting into Solaris, removing the 2 cache devices, 
 then booting back into FreeBSD then works. Something is kept in memory 
 between exporting then importing that allows this to work.  

Unfortunately I'm unable to reproduce this. It works for me with 2 cache
and 2 log vdevs. I tried to reboot, etc. My test looks exactly like
this:

# zpool create tank raidz ada0 ada1
# zpool add tank cache ada0 ada1
# zpool export tank
# kldunload zfs
# zpool import tank
works
# reboot
works

 - Speed. (More of an issue, but what do we do?)
 
 Wow, it's much slower than Solaris 11 Express for transactions. I do 
 understand that Solaris will have a slight advantage over any port of ZFS. 
 All of my speed tests are made with a kernel without debug, and yes, these 
 are -CURRENT and -PRE releases, but the speed difference is very large.

Before we go any further could you please confirm that you commented out
this line in sys/modules/zfs/Makefile:

CFLAGS+=-DDEBUG=1

This turns on all kinds of ZFS debugging and slows it down a lot, but it
is invaluable for correctness testing. This will be turned off once we
import ZFS into FreeBSD-CURRENT.

BTW. In my testing Solaris 11 Express is much, much slower than
FreeBSD/ZFSv28. And by much I mean two or more times in some tests.
I was wondering if they have some debug turned on in Express.

 At first, I thought it may be more of an issue with the ix0/Intel X520DA2 
 10Gbe drivers that I'm using, since the bulk of my tests are over NFS (I'm 
 going to use this as a SAN via NFS, so I test in that environment). 
 
 But - I did a raw cp command from one pool to another of several TB. I 
 executed the same command under FreeBSD as I did under Solaris 11 Express. 
 When executed in FreeBSD, the copy took 36 hours. With a fresh destination 
 pool of the same settings/compression/etc under Solaris, the copy took 7.5 
 hours. 

If you turn off compression (because it turns all-zero blocks into
holes), you can test it simply with:

# dd if=/dev/zero of=/zfs_fs/zero bs=1m

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: [head tinderbox] failure on ia64/ia64

2011-01-31 Thread Pawel Jakub Dawidek
On Mon, Jan 31, 2011 at 10:56:18PM +, FreeBSD Tinderbox wrote:
[...]
 cc -O2 -pipe  -I/src/sbin/hastctl/../hastd -DINET -DINET6 -DYY_NO_UNPUT 
 -DYY_NO_INPUT -DHAVE_CRYPTO -std=gnu99 -Wsystem-headers -Werror -Wall 
 -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes 
 -Wmissing-prototypes -Wpointer-arith -Wreturn-type -Wcast-qual 
 -Wwrite-strings -Wswitch -Wshadow -Wunused-parameter -Wcast-align 
 -Wchar-subscripts -Winline -Wnested-externs -Wredundant-decls 
 -Wold-style-definition -Wno-pointer-sign -c 
 /src/sbin/hastctl/../hastd/proto_common.c
 cc1: warnings being treated as errors
 /src/sbin/hastctl/../hastd/proto_common.c: In function 
 'proto_common_descriptor_send':
 /src/sbin/hastctl/../hastd/proto_common.c:116: warning: cast increases 
 required alignment of target type
 /src/sbin/hastctl/../hastd/proto_common.c: In function 
 'proto_common_descriptor_recv':
 /src/sbin/hastctl/../hastd/proto_common.c:146: warning: cast increases 
 required alignment of target type
 /src/sbin/hastctl/../hastd/proto_common.c:149: warning: cast increases 
 required alignment of target type
 *** Error code 1

Marcel, do you have an idea how one can use CMSG_NXTHDR() on ia64 with
high WARNS? With WARNS=6 I get those errors and I've no idea how to fix
it properly. If there is a fix, CMSG_NXTHDR() should probably be fixed,
but maybe I'm wrong?

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: [head tinderbox] failure on ia64/ia64

2011-02-01 Thread Pawel Jakub Dawidek
On Mon, Jan 31, 2011 at 04:56:06PM -0800, Marcel Moolenaar wrote:
 
 On Jan 31, 2011, at 3:51 PM, Pawel Jakub Dawidek wrote:
 
  On Mon, Jan 31, 2011 at 10:56:18PM +, FreeBSD Tinderbox wrote:
  [...]
  cc -O2 -pipe  -I/src/sbin/hastctl/../hastd -DINET -DINET6 -DYY_NO_UNPUT 
  -DYY_NO_INPUT -DHAVE_CRYPTO -std=gnu99 -Wsystem-headers -Werror -Wall 
  -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes 
  -Wmissing-prototypes -Wpointer-arith -Wreturn-type -Wcast-qual 
  -Wwrite-strings -Wswitch -Wshadow -Wunused-parameter -Wcast-align 
  -Wchar-subscripts -Winline -Wnested-externs -Wredundant-decls 
  -Wold-style-definition -Wno-pointer-sign -c 
  /src/sbin/hastctl/../hastd/proto_common.c
  cc1: warnings being treated as errors
  /src/sbin/hastctl/../hastd/proto_common.c: In function 
  'proto_common_descriptor_send':
  /src/sbin/hastctl/../hastd/proto_common.c:116: warning: cast increases 
  required alignment of target type
  /src/sbin/hastctl/../hastd/proto_common.c: In function 
  'proto_common_descriptor_recv':
  /src/sbin/hastctl/../hastd/proto_common.c:146: warning: cast increases 
  required alignment of target type
  /src/sbin/hastctl/../hastd/proto_common.c:149: warning: cast increases 
  required alignment of target type
  *** Error code 1
  
  Marcel, do you have an idea how one can use CMSG_NXTHDR() on ia64 with
  high WARNS? With WARNS=6 I get those errors and I've no idea how to fix
  it properly. If there is a fix, CMSG_NXTHDR() should probably be fixed,
  but maybe I'm wrong?
 
 this warning indicates that you're casting from a pointer to type P
 (P having alignment constraints Ap) to a pointer to type Q (Q having
 alignment constraints Aq), and Aq > Ap. The compiler tells you that
 you may end up with misaligned accesses.
 
 If you know that the pointer satisfies Aq, you can cast through (void *)
 to silence the compiler. If you cannot guarantee that, you have a bigger
 problem. Solutions include packing type Q to reduce Aq or to copy the
 data to a local variable.
 
 Take the statement at line 116 for example:
   *((int *)CMSG_DATA(cmsg)) = fd;
 
 We're effectively casting from a (char *) to a (int *) and then doing
 a 32-bit access (write). The easy fix (casting through (void *)) is not
 possible, because you cannot guarantee that the address is properly
 aligned. cmsg points to memory set aside by the following local
 variable:
   unsigned char ctrl[CMSG_SPACE(sizeof(fd))];
 
 There's no guarantee that the compiler will align the character array
 at a 32-bit boundary (though in practice it seems to be). I have seen
 this kind of construct fail on ARM and PowerPC for example.
 
 In any case: The safest approach here is to use le32enc or be32enc
 rather than casting through (void *). Obviously these functions encode
 using a fixed byte order when the original code is using the native
 byte order of the CPU. Having native encoding functions would help.
 
 You could use bcopy as well, but the compiler is typically too smart
 for its own good and it will try to optimize the call away. This
 leaves you with the same misaligned access that you tried to avoid
 by using bcopy(). You need to trick the compiler so that it won't
 optimize the bcopy away, like:
   bcopy((void *)&fd, CMSG_DATA(cmsg), sizeof(fd));

Interesting. I did use bcopy() to silence the warning, but the need to
cast to (void *) is surprising.

Still, I'm more concerned with the CMSG_NXTHDR() macro, which from what I
see might not be fixed by casting arguments.
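
For the CMSG_DATA() part only, a sketch of the copy-based approach might
look like the following. The union fd_ctrl and the fd_cmsg_pack() /
fd_cmsg_unpack() helpers are hypothetical names, not what proto_common.c
does; memcpy() is used in place of bcopy(), and the union is an
assumption to give the control buffer cmsghdr alignment. It does nothing
about the cast inside CMSG_NXTHDR() itself.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <string.h>

/*
 * Hypothetical control buffer: the cmsghdr member forces the alignment
 * that a bare unsigned char array is not guaranteed to have.
 */
union fd_ctrl {
	struct cmsghdr	hdr;
	unsigned char	buf[CMSG_SPACE(sizeof(int))];
};

/* Hypothetical helper: fill msg/ctrl with one SCM_RIGHTS message for fd. */
int
fd_cmsg_pack(struct msghdr *msg, union fd_ctrl *ctrl, int fd)
{
	struct cmsghdr *cmsg;

	memset(msg, 0, sizeof(*msg));
	memset(ctrl, 0, sizeof(*ctrl));
	msg->msg_control = ctrl->buf;
	msg->msg_controllen = sizeof(ctrl->buf);

	cmsg = CMSG_FIRSTHDR(msg);
	if (cmsg == NULL)
		return (-1);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
	/* Copy instead of "*((int *)CMSG_DATA(cmsg)) = fd". */
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));
	return (0);
}

/* Hypothetical helper: read the descriptor back, or -1 if malformed. */
int
fd_cmsg_unpack(struct msghdr *msg)
{
	struct cmsghdr *cmsg;
	int fd;

	cmsg = CMSG_FIRSTHDR(msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(fd)))
		return (-1);
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
	return (fd);
}
```

The msghdr filled by fd_cmsg_pack() would still need msg_iov set up
before being handed to sendmsg(2); that part is left out here.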

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Replacing a failed disk in raidz2 zfs (and gpt)

2011-02-03 Thread Pawel Jakub Dawidek
On Thu, Feb 03, 2011 at 06:11:34AM +, Philip M. Gollucci wrote:
 All,
 
 I have a zroot(mirror)+zmysql(raidz2) setup on a MySQL db box.
 One drive failed (mfid3).  We've since replaced it.
 
 I can't for the life of me get zpool to replace it. I can't remember why
 I used gpt instead of direct disks for the zmysql pool (but that's how it
 is).  I've tried all of the following commands with different errors,
 and I must say I'm stumped.  I've done this several times before for the
 ASF (but no gpt at play there).
 
 $ zpool scrub zmysql
 just runs, and completes, no error
 
 $ zpool replace zmysql gpt/disk3
 cannot replace gpt/disk3 with gpt/disk3: one or more devices is
 currently unavailable
[...]
 $ zpool offline zmysql gpt/disk3
 cannot offline gpt/disk3: no valid replicas

I'm afraid this is a ZFS bug that is definitely fixed in v28; I'm not
sure about v14/v15.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Replacing a failed disk in raidz2 zfs (and gpt)

2011-02-03 Thread Pawel Jakub Dawidek
On Thu, Feb 03, 2011 at 07:52:52PM +, Philip M. Gollucci wrote:
 Do you have a bug ID ?

I think it is 6328632. Change 5a60f16123ba. Note, there are many, many
other unrelated changes.

 Do you have any work arounds?

From what I can see, this change is in HEAD already, so I'll try that.

 Will a reboot help ?

No idea, sorry.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Replacing a failed disk in raidz2 zfs (and gpt)

2011-02-03 Thread Pawel Jakub Dawidek
On Thu, Feb 03, 2011 at 08:08:15PM +, Philip M. Gollucci wrote:
 On 02/03/11 20:02, Pawel Jakub Dawidek wrote:
  On Thu, Feb 03, 2011 at 07:52:52PM +, Philip M. Gollucci wrote:
  Do you have a bug ID ?
  
  I think it is 6328632. Change 5a60f16123ba. Note, there are many, many
  other unrelated changes.
  
  Do you have any work arounds?
  
  From what I can see, this change is in HEAD already, so I'll try that.
 Do you have a pointer to how to get the hg repo handy.  There's no diff
 there.

The repo is still online:

ssh://a...@hg.opensolaris.org/hg/onnv/onnv-gate

But if you are thinking about extracting only part of the change
responsible for your problem that might not be easy.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: [PATCH] OpenSolaris/ZFS: C++ compatibility

2011-02-05 Thread Pawel Jakub Dawidek
On Fri, Feb 04, 2011 at 11:03:53AM -0700, Justin T. Gibbs wrote:
 The attached patch is sufficient to allow a C++ program to use libzfs.
 The motivation for these changes is work I'm doing on a ZFS fault
 handling daemon that is written in C++.  SpectraLogic's intention
 is to return this work to the FreeBSD project once it is a bit more
 complete.
 
 Since these changes modify files that come from OpenSolaris, I want to be
 sure I understand the project's policies regarding divergence from
 the vendor before I check them in.  All of the changes save one should
 be trivial to merge with vendor changes and I will do that work for the
 v28 import.  Is there any reason I should not commit these changes?

Now that OpenSolaris is dead we don't have to be so strict about keeping
the diff against vendor small at all costs. Still, I'd prefer not to
modify vendor code whenever possible, so it is easier for us to cooperate
with IllumOS (we already took some code from them).

My company and I are also interested in a fault management daemon
(although not restricted to ZFS, but a more general purpose mechanism
like FMA in Solaris). My question would be: is there any chance you could
be convinced to use plain C? With C we might be able to help, but not
with C++.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: [PATCH] OpenSolaris/ZFS: C++ compatibility

2011-02-05 Thread Pawel Jakub Dawidek
On Sat, Feb 05, 2011 at 02:36:40PM -0700, Justin T. Gibbs wrote:
 On 2/5/2011 8:39 AM, Pawel Jakub Dawidek wrote:
  On Fri, Feb 04, 2011 at 11:03:53AM -0700, Justin T. Gibbs wrote:
  The attached patch is sufficient to allow a C++ program to use libzfs.
  The motivation for these changes is work I'm doing on a ZFS fault
  handling daemon that is written in C++.  SpectraLogic's intention
  is to return this work to the FreeBSD project once it is a bit more
  complete.
 
  Since these changes modify files that come from OpenSolaris, I want to be
  sure I understand the project's policies regarding divergence from
  the vendor before I check them in.  All of the changes save one should
  be trivial to merge with vendor changes and I will do that work for the
  v28 import.  Is there any reason I should not commit these changes?
  
  Now that OpenSolaris is dead we don't have to be so strict with keeping
  the diff against vendor small at all cost. I'd prefer not to modify
  vendor code whenever possible so it is easier for us to cooperate with
  IllumOS (we already took some code from them).
 
 Perhaps IllumOS will accept these changes back?  As I mentioned in the
 change descriptions included with the patch, the header files already
 show the intention of providing C++ support (extern C blocks), they
 just don't quite deliver.  The changes shouldn't be controversial.

Sure. To be clear: I'm not against those changes, I think they are worth
it. And getting IllumOS to accept them back is definitely a good idea.

  Me and my company are also interested in fault management daemon
  (although not restricted to ZFS, but a more general purpose mechanism
  like FMA in Solaris).
 
 We have talked internally about this at Spectra too.  Since we don't have
 BSD licensed nvpair code, we've thought of using Google protocol buffers
 to allow extensible encoding of fault data.  The GP implementation is
 MIT licensed and looks like it might be less cumbersome to use than
 nvpairs.  For the first release of our product, however, we are just
 making do with the string data that devctl provides.

I've developed similar API during HAST work, maybe it is a good starting
point? src/sbin/hastd/nv.{c,h}.

  My question would be are there any chances you may
  be convinced to use plain C? With C we might be able to help, but not
  with C++.
 
 The core FMA support needs to be reasonably accessible from C code of
 course (fully functional and not cumbersome to use).  But we should
 allow FMA agents to be coded in whatever language is convenient to the
 developer.  The project may only be able to accept agents in C (and I'm
 voting for C++ too) into its distribution, but that policy should not
 drive us to make the FMA architecture hard to access from shell, python,
 ruby, or some other language.

Yes, agents should not be limited to one language. I wouldn't be
surprised if the majority of agents turn out to be shell scripts.

 The reason I chose C++ for this task is that devd, the source of the
 events I process, already requires C++ so using C++ in zfsd doesn't
 impose any new requirements on the system.  Zfsd, like even the C
 kernel of FreeBSD is coded in an object oriented fashion, but it's
 much cleaner to implement this type of design in a language that
 inherently supports object oriented concepts.  Could I rewrite all
 that I have in C?  Sure, but there would have to be some compelling
 reasons to offset the reduction in clarity and maintainability such
 a change would cause.

Hmm, so zfsd will receive events from devd? I'm of the opinion that we
should leave devd alone. In my initial port I used devd, because it was
the closest match, but if we want to clean it up, we shouldn't go through
devd. For example ZFS v28 can report whole binary blocks where the checksum
doesn't match, and passing those through devd would be cumbersome.

 Is your inability to help on a C++ version of this code due to distaste
 for C++ or just a lack of experience with it?

The latter. I'm sure there are many committers who are fluent in C++,
but all of them know C. I was under the impression that Warner implemented
devd in C++ also as a kind of experiment, which nobody really followed.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


pgpQQMrZ5Hwdv.pgp
Description: PGP signature


HEADS UP: ZFSv28 is in!

2011-02-27 Thread Pawel Jakub Dawidek
Hi.

I just committed ZFSv28 to HEAD.

New major features:

- Data deduplication.
- Triple parity RAIDZ (RAIDZ3).
- zfs diff.
- zpool split.
- Snapshot holds.
- zpool import -F. Allows rewinding a corrupted pool to an earlier
  transaction group.
- Ability to import a pool in read-only mode.

PS. If you like my work, you can help me promote yomoli.com :)

http://yomoli.com
http://www.facebook.com/pages/Yomolicom/178311095544155

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com




Re: HEADS UP: ZFSv28 is in!

2011-02-28 Thread Pawel Jakub Dawidek
On Sun, Feb 27, 2011 at 04:03:01PM -0700, Shawn Webb wrote:
 I'm so excited for your work. Thanks so much for bringing zpool v28 to
 FreeBSD. Will v28 come to 8-stable?

Yes, hopefully in 1-2 month(s).

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com




Re: HEADS UP: ZFSv28 is in!

2011-02-28 Thread Pawel Jakub Dawidek
On Mon, Feb 28, 2011 at 10:37:25AM +, krad wrote:
 On 28 February 2011 08:47, Pawel Jakub Dawidek p...@freebsd.org wrote:
  On Sun, Feb 27, 2011 at 04:03:01PM -0700, Shawn Webb wrote:
  I'm so excited for your work. Thanks so much for bringing zpool v28 to
  FreeBSD. Will v28 come to 8-stable?
 
  Yes, hopefully in 1-2 month(s).
 
  --
  Pawel Jakub Dawidek                       http://www.wheelsystems.com
  FreeBSD committer                         http://www.FreeBSD.org
  Am I Evil? Yes, I Am!                     http://yomoli.com
 
 
 I've never managed to boot off my 4k-aligned pool
 (ashift=12) on stable. Does the import to head provide all the patches
 for this, or is it a case of using the latest zfs v28 patch set for
 stable? I have no dying need for v28 yet, I just want to be able to
 boot onto the 4k drive and tidy things up.

Support for this is included in what I committed to HEAD. Even HEAD
couldn't boot off of pools with ashift != 9 until now.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com




Re: HEADS UP: ZFSv28 is in!

2011-03-01 Thread Pawel Jakub Dawidek
On Mon, Feb 28, 2011 at 08:34:08AM +0100, Martin Sugioarto wrote:
  PS. If you like my work, you can help me promote yomoli.com :)
  
  http://yomoli.com
  http://www.facebook.com/pages/Yomolicom/178311095544155
  
 
 I would like to, but you should at least tell me what it is (what will be
 sold there). I don't like to advertise things I don't know or even
 things that seem evil to me.
 
 I'll post your answer to a well-known German *BSD forum, if you want.

Well, I didn't want to say too much about it here, as it isn't really
related to FreeBSD. It is a startup I'm working on: a location-based
chat that lets users communicate with their neighborhood.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com




Re: [head tinderbox] failure on ia64/ia64

2011-03-06 Thread Pawel Jakub Dawidek
On Mon, Mar 07, 2011 at 01:06:11AM +, FreeBSD Tinderbox wrote:
 TB --- 2011-03-07 00:25:55 - tinderbox 2.6 running on 
 freebsd-current.sentex.ca
 TB --- 2011-03-07 00:25:55 - starting HEAD tinderbox run for ia64/ia64
 TB --- 2011-03-07 00:25:55 - cleaning the object tree
 TB --- 2011-03-07 00:26:06 - cvsupping the source tree
 TB --- 2011-03-07 00:26:06 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup.sentex.ca 
 /tinderbox/HEAD/ia64/ia64/supfile
 TB --- 2011-03-07 00:26:19 - building world
 TB --- 2011-03-07 00:26:19 - MAKEOBJDIRPREFIX=/obj
 TB --- 2011-03-07 00:26:19 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
 TB --- 2011-03-07 00:26:19 - TARGET=ia64
 TB --- 2011-03-07 00:26:19 - TARGET_ARCH=ia64
 TB --- 2011-03-07 00:26:19 - TZ=UTC
 TB --- 2011-03-07 00:26:19 - __MAKE_CONF=/dev/null
 TB --- 2011-03-07 00:26:19 - cd /src
 TB --- 2011-03-07 00:26:19 - /usr/bin/make -B buildworld
  World build started on Mon Mar  7 00:26:20 UTC 2011
  Rebuilding the temporary build tree
  stage 1.1: legacy release compatibility shims
  stage 1.2: bootstrap tools
  stage 2.1: cleaning up the object tree
  stage 2.2: rebuilding the object tree
  stage 2.3: build tools
  stage 3: cross tools
  stage 4.1: building includes
  stage 4.2: building libraries
  stage 4.3: make dependencies
 [...]
 mkdep -f .depend -a /src/sbin/growfs/growfs.c
 echo growfs: /obj/ia64.ia64/src/tmp/usr/lib/libc.a   .depend
 === sbin/gvinum (depend)
 rm -f .depend
 mkdep -f .depend -a-I/src/sbin/gvinum/../../sys /src/sbin/gvinum/gvinum.c 
 /src/sbin/gvinum/../../sys/geom/vinum/geom_vinum_share.c
 echo gvinum: /obj/ia64.ia64/src/tmp/usr/lib/libc.a 
 /obj/ia64.ia64/src/tmp/usr/lib/libreadline.a 
 /obj/ia64.ia64/src/tmp/usr/lib/libtermcap.a 
 /obj/ia64.ia64/src/tmp/usr/lib/libdevstat.a 
 /obj/ia64.ia64/src/tmp/usr/lib/libkvm.a 
 /obj/ia64.ia64/src/tmp/usr/lib/libgeom.a  .depend
 === sbin/hastctl (depend)
 make: don't know how to make hast_compression.c. Stop
 *** Error code 2

Interesting race: hast_compression.c was added in the same commit that
added it to the hastctl Makefile.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com




Re: missing files in readdir(3) on NFS export of ZFS volume (since v28?)

2011-03-08 Thread Pawel Jakub Dawidek
On Mon, Mar 07, 2011 at 01:08:46AM +0100, Pierre Beyssac wrote:
 Hello,
 
 I'm running a 9-current server as compiled on Sat Mar  5 02:17:14
 CET 2011.
 
 Since I upgraded to ZFS v28 I noticed missing files from NFS. The
 files are still accessible through NFS but they don't show up on a
 readdir(3).
[...]

Could you try r219404?

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com




Re: Any success stories for HAST + ZFS?

2011-03-25 Thread Pawel Jakub Dawidek
On Thu, Mar 24, 2011 at 01:36:32PM -0700, Freddie Cash wrote:
 I've tried with FreeBSD 8.2-RELEASE, 8-STABLE, 8-STABLE w/ZFSv28
 patches, and 9-CURRENT (after the ZFSv28 commit).  Things work well
 until I start hastd.  Then either the system locks up, or hastd causes
 a kernel panic, or hastd dumps core.

The minimum amount of information (as always) would be a backtrace from
the kernel and also a hastd backtrace when it dumps core. There is really
decent logging in hast, so I'm also sure it logs something
interesting on the primary or the secondary. Another useful thing would be to
turn on debugging in hast (a single -d option for hastd).

The best you can do is to give me the simplest and quickest procedure to
reproduce the issue, e.g. configure two hast resources, put a ZFS mirror on
top, start an rsync of /usr/src to the file system on top of hast, and
switch roles. The simpler the better.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com




Re: Any success stories for HAST + ZFS?

2011-04-02 Thread Pawel Jakub Dawidek
On Thu, Mar 24, 2011 at 01:36:32PM -0700, Freddie Cash wrote:
 [Not sure which list is most appropriate since it's using HAST + ZFS
 on -RELEASE, -STABLE, and -CURRENT.  Feel free to trim the CC: on
 replies.]
 
 I'm having a hell of a time making this work on real hardware, and am
 not ruling out hardware issues as yet, but wanted to get some
 reassurance that someone out there is using this combination (FreeBSD
 + HAST + ZFS) successfully, without kernel panics, without core dumps,
 without deadlocks, without issues, etc.  I need to know I'm not
 chasing a dead rabbit.

I just committed a fix for a problem that might look like a deadlock.
With trociny@'s patch and my last fix (to GEOM GATE and hastd), do you
still have any issues?

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com




Re: panic: g_eli_key_hold: sc_ekeys_total=1

2011-04-22 Thread Pawel Jakub Dawidek
On Fri, Apr 22, 2011 at 05:04:01PM +0200, Fabian Keil wrote:
 With sources from today my system panics at boot time
 after attaching the swap device:
 
 GEOM_ELI: Device ada0s1b.eli created.
 GEOM_ELI: Encryption: AES-XTS 256
 GEOM_ELI: Crypto: software
 panic: g_eli_key_hold: sc_ekeys_total=1
 cpuid = 0
 KDB: enter: panic
 Uptime: 2m16s
 Physical memory: 1974 MB
 Dumping 213 MB: 198 182 166 150 134 118 102 86 70 54 38 22 6
[...]

Could you provide the output of:

# diskinfo -v /dev/ada0s1b

And could you try:

# /sbin/geli onetime -l 256 -s 4096 /dev/ada0s1b

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com




Re: panic: g_eli_key_hold: sc_ekeys_total=1

2011-04-24 Thread Pawel Jakub Dawidek
On Sun, Apr 24, 2011 at 11:12:03AM +0200, Fabian Keil wrote:
 The panic can be reproduced with:
 /sbin/geli onetime -l 256 -s 4096 /dev/ada0s1b

That's why I asked for the size of ada0s1b. It should be fixed in HEAD (r220984).

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com




Re: geli on r221012

2011-05-08 Thread Pawel Jakub Dawidek
On Mon, Apr 25, 2011 at 01:31:55PM +, Anton Yuzhaninov wrote:
 Geli no longer works for me after upgrade to r221012.
 
 # geli attach -k ~citrin/private.key /dev/label/spool2
 Enter passphrase:
 #
 
 from dmesg:
 GEOM_ELI: Device label/spool2.eli created.
 GEOM_ELI: Encryption: Blowfish-CBC 128
 GEOM_ELI:  Integrity: HMAC/MD5
 GEOM_ELI: Crypto: software
 
 # dd if=/dev/label/spool2.eli of=/dev/null
 dd: /dev/label/spool2.eli: Invalid argument
 0+0 records in
 0+0 records out
 0 bytes transferred in 0.000669 secs (0 bytes/sec)

Thanks for the report! It should be fixed in r221628.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com




Re: Randomization in hastd(8) synchronization thread

2011-05-21 Thread Pawel Jakub Dawidek
On Tue, May 17, 2011 at 12:39:19PM -0700, Maxim Sobolev wrote:
 Hi Pawel,
 
 I am trying to use hastd(8) over slow links and one problem is
 apparent right now - current approach with synchronizing content
 sequentially is not working in this case. What happens is that hastd
 hits the first frequently updated block and cannot make any progress
 anymore. In my case I have 30GB of dirty space to be synchronized
 over just 1mbps uplink.
 
 The quick fix that I've applied is randomization in the block
 selection code. This way  eventually all least used blocks will be
 synchronized, leaving only hot ones dirty. More effective approach
 would be to use some kind of LRU selection algorithm, but
 statistical approach would work just as good in this case.
 
 Please review the patch below:
 
 http://sobomax.sippysoft.com/activemap.c.diff

Hmm, hastd keeps a separate bitmap for synchronization. It is stored in
the am_syncmap field. Blocks that are dirtied during regular writes should
not affect the synchronization bitmap or synchronization progress.
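
To illustrate the idea of a synchronization bitmap kept apart from the
dirty bitmap, here is a minimal userland sketch. It is not the hastd
code: the structure and function names (amap_sketch, amap_write,
amap_sync_one) and the fixed extent count are assumptions made up for
this example. Only the notion of a separate sync map (am_syncmap) comes
from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NEXTENTS 8	/* hypothetical number of extents */

/* Hypothetical stand-in for an activemap: two independent bitmaps. */
struct amap_sketch {
	bool dirty[NEXTENTS];	/* set by regular writes */
	bool sync[NEXTENTS];	/* extents still to be synchronized */
};

/* Start synchronization: every extent has to be copied once. */
static void
amap_sync_start(struct amap_sketch *am)
{
	for (size_t i = 0; i < NEXTENTS; i++) {
		am->dirty[i] = false;
		am->sync[i] = true;
	}
}

/* A regular write only touches the dirty map, never the sync map. */
static void
amap_write(struct amap_sketch *am, size_t extent)
{
	assert(extent < NEXTENTS);
	am->dirty[extent] = true;
}

/* Synchronize the next pending extent; false means nothing is left. */
static bool
amap_sync_one(struct amap_sketch *am)
{
	for (size_t i = 0; i < NEXTENTS; i++) {
		if (am->sync[i]) {
			am->sync[i] = false;
			return (true);
		}
	}
	return (false);
}
```

In this sketch a hot extent that is rewritten between every sync step
keeps its dirty bit set, but the sync map still drains after exactly
NEXTENTS steps.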

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com




LOR (ffs_snapshot.c:651 vm_map.c:2258).

2003-11-05 Thread Pawel Jakub Dawidek
Hello.

lock order reversal
 1st 0xc66a6db0 vnode interlock (vnode interlock) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:651
 2nd 0xc0c2f110 system map (system map) @ /usr/src/sys/vm/vm_map.c:2258
Stack backtrace:
backtrace(c05bbfcb,c0c2f110,c05c650b,c05c650b,c05c6581) at backtrace+0x17
witness_lock(c0c2f110,8,c05c6581,8d2,c0c2f0b0) at witness_lock+0x686
_mtx_lock_flags(c0c2f110,0,c05c6581,8d2,c6aee000) at _mtx_lock_flags+0xb5
_vm_map_lock(c0c2f0b0,c05c6581,8d2,c69e61b0,0) at _vm_map_lock+0x36
vm_map_remove(c0c2f0b0,c6aee000,c6af,e1b1a7f0,c0555f99) at vm_map_remove+0x30
kmem_free(c0c2f0b0,c6aee000,2000,e1b1a80c,c05579f9) at kmem_free+0x32
page_free(c6aee000,2000,22,c060c4b8,c05e9100) at page_free+0x3a
uma_large_free(c69e61b0,e1b1a83c,c0487f64,c66a6db0,2000) at uma_large_free+0xf9
free(c6aee000,c05e9100,c05c3358,28b,c25aff00) at free+0xe9
ffs_snapshot(c6522600,80c39a0,70,c04b5d36,c060d3e0) at ffs_snapshot+0x23f4
ffs_mount(c6522600,c69c4380,bfbffcc0,e1b1abf0,c6496720) at ffs_mount+0x617
vfs_mount(c6496720,c258ecd0,c69c4380,1211000,bfbffcc0) at vfs_mount+0x7d1
mount(c6496720,e1b1ad14,c05cd44e,3ee,4) at mount+0xba
syscall(2f,2f,2f,0,bfbffdc0) at syscall+0x28f
Xint0x80_syscall() at Xint0x80_syscall+0x1d
--- syscall (21), eip = 0x80557bb, esp = 0xbfbffb6c, ebp = 0xbfbffd48 ---

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net




Panic after mount() fail.

2003-11-17 Thread Pawel Jakub Dawidek
Hello.

There is a problem with mount(2) failures. It can cause panics.

How-to-repeat.

# dd if=/dev/random of=/test.img bs=1m count=8
# mdconfig -a -t vnode -f /test.img -u 25
# mkdir -p /mnt/test
# mount /dev/md25 /mnt/test
(fail)
# mount /dev/md25 /mnt/test
(panic Memory modified after free ...)

This is because the mutex is not destroyed on failure.

Patch:

--- vfs_mount.c.orig    Sun Nov 16 15:46:56 2003
+++ vfs_mount.c Sun Nov 16 15:21:48 2003
@@ -1061,6 +1061,7 @@ update:
                vfs_unbusy(mp, td);
        else {
                mp->mnt_vfc->vfc_refcount--;
+               mtx_destroy(&mp->mnt_mtx);
                vfs_unbusy(mp, td);
 #ifdef MAC
                mac_destroy_mount(mp);
@@ -1142,6 +1143,7 @@ update:
                vp->v_iflag &= ~VI_MOUNT;
                VI_UNLOCK(vp);
                mp->mnt_vfc->vfc_refcount--;
+               mtx_destroy(&mp->mnt_mtx);
                vfs_unbusy(mp, td);
 #ifdef MAC
                mac_destroy_mount(mp);
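
The rule behind the patch (whatever is initialized must be destroyed
again on the error path) can be shown with a small userland analogy.
This is only a sketch, not kernel code: lock_sketch, lock_init,
lock_destroy and mount_sketch are hypothetical names, and a boolean flag
stands in for the real mutex state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock object that tracks whether it is initialized. */
struct lock_sketch {
	bool initialized;
};

static void
lock_init(struct lock_sketch *lk)
{
	assert(!lk->initialized);	/* a double init would be a bug */
	lk->initialized = true;
}

static void
lock_destroy(struct lock_sketch *lk)
{
	assert(lk->initialized);
	lk->initialized = false;
}

/* Hypothetical mount attempt; fs_ok says whether the file system is valid. */
static int
mount_sketch(struct lock_sketch *lk, bool fs_ok)
{
	lock_init(lk);
	if (!fs_ok) {
		/* Error path: undo the init before returning. */
		lock_destroy(lk);
		return (-1);
	}
	return (0);
}
```

Without the lock_destroy() call on the error path, the second failing
call in this sketch would trip the assertion in lock_init().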

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net




Re: panic: sleeping without a mutex (acd related)

2003-11-25 Thread Pawel Jakub Dawidek
On Tue, Nov 25, 2003 at 11:21:03AM +0100, Christian Laursen wrote:
+ I have been experiencing some random lockups after upgrading from
+ 5.1-RELEASE to 5.2-BETA.
+ 
+ I then went on and enabled all the debug options in my kernel config
+ hoping to be able to find the cause.
+ 
+ But now I cannot boot at all. In the end of the boot process when
+ detecting ATA drives, I get this:
+ 
+ ad0: 76319MB ST380011A [155061/16/63] at ata0-master UDMA100  
+ acd0-5: CDROM with 6 CD changer CD-C68E at ata1-master PIO4   
+ acd6: DVDROM CREATIVEDVD5240E-1 at ata1-slave PIO4
+ panic: sleeping without a mutex 
+ Debugger(panic)   
+ Stopped at  Debugger+0x54:  xchgl   %ebx,in_Debugger.0  
+ db 
+ db trace   
+ Debugger(c06e3744,c07549a0,c06e3ec9,d861ab60,100) at Debugger+0x54  
+ panic(c06e3ec9,0,c06e3eb8,c06d6584,10) at panic+0xd5
+ msleep(c45173d8,0,4c,c06d6584,0) at msleep+0x505
+ acd_geom_access(c452de00,1,0,0,0) at acd_geom_access+0x115  

Yeah. There are two calls to tsleep(9) with no timeout set
(on lines 499 and 509), so this KASSERT is reached:

KASSERT(timo != 0 || mtx_owned(&Giant) || mtx != NULL,
    ("sleeping without a mutex"));
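
Spelled out as a plain predicate, the asserted condition reads as
follows. This is a userland sketch only: sleep_allowed and checked_sleep
are hypothetical names, and the giant_owned flag stands in for the
kernel's ownership check on Giant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Mirror of the asserted condition: sleeping is acceptable if a timeout
 * is set, or Giant is held, or the caller passed a mutex.
 */
static bool
sleep_allowed(int timo, bool giant_owned, const void *mtx)
{
	return (timo != 0 || giant_owned || mtx != NULL);
}

/* Hypothetical wrapper that enforces the check, as a KASSERT would. */
static void
checked_sleep(int timo, bool giant_owned, const void *mtx)
{
	assert(sleep_allowed(timo, giant_owned, mtx));
	/* ... a real implementation would block here ... */
}
```

A sleep with no timeout, without Giant and without a mutex is the one
combination that fails the check.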

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net




Panic: if_simloop: attempted use of a free mbuf!

2003-11-28 Thread Pawel Jakub Dawidek
Hello.

I'm hitting the assertion at /sys/net/if_loop.c:270.

This is very easy to reproduce:

First you need to put loopback into promiscuous mode:

# tcpdump -i lo0

Then try to connect to loopback, for example:

# telnet 127.0.0.1 22

Enjoy!:)

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net




Re: jail and emulators/linux_base

2003-12-03 Thread Pawel Jakub Dawidek
On Wed, Dec 03, 2003 at 10:22:16AM +0100, Niklas Saers Mailinglistaccount wrote:
+ I'm running CURRENT and set up a jail where I want to install SUN JDK
+ 1.4.2. In the process, linux emulation needs to be installed. While
+ installing emulators/linux_base, I get the following:
+ 
+ === Installing for linux_base-7.1_5
+ Un-mounting linprocfs...
+ umount: retrying using path instead of file system ID
+ ===  Generating temporary packing list
+ === Checking if emulators/linux_base already installed
+ mknod: /compat/linux/dev/null: Operation not permitted
+ *** Error code 1
+ 
+ While Linux-emulation is already up and running on the host-machine, it
+ seems the jail is not allowed to create what it needs to run it. I
+ understand allowing mknod(8) within a jail is dangerous in the case where
+ you allow untrusted users to be root. Is there some way to either say I
+ don't let untrusted users be root thus allowing this or to compile
+ emulators/linux_base more jail-friendly, possibly setting things up from
+ outside the jail?

Erm. You can install it using plain chroot(8) and then run the jail with
the same path. You can also use chroot(8) instead of jail if you're looking
for full functionality.

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net




HAST (Highly Available Storage) now in HEAD.

2010-02-19 Thread Pawel Jakub Dawidek
Hi.

Yesterday I committed HAST to the HEAD branch.

HAST allows data to be stored transparently on two physically separated
machines connected over a TCP/IP network. HAST works in a
Primary-Secondary (Master-Backup, Master-Slave) configuration, which
means that only one of the cluster nodes can be active at any given
time. Only the Primary node is able to handle I/O requests to HAST-managed
devices. Currently HAST is limited to two cluster nodes in total.

HAST operates at the block level: it provides disk-like devices in the
/dev/hast/ directory for use by file systems and/or applications.
Working at the block level makes it transparent to file systems and
applications. There is no difference between using a HAST-provided device
and a raw disk, partition, etc. All of them are just regular GEOM
providers in FreeBSD.

For more information please consult hastd(8), hastctl(8) and
hast.conf(5) manual pages, as well as:

http://wiki.FreeBSD.org/HAST

On the wiki page above you should find instructions on how to initialize
HAST and integrate it with ucarp.

Let me know (using the freebsd...@freebsd.org mailing list) if you have any
questions or comments.

And last, but not least, I'd like to thank the sponsors who made this
project possible:

The FreeBSD Foundation, http://www.freebsdfoundation.org
OMCnet Internet Service GmbH, http://www.omc.net
TransIP BV, http://www.transip.nl

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: ZFS: statfs and recordsize problem

2010-02-19 Thread Pawel Jakub Dawidek
On Thu, Feb 18, 2010 at 03:39:28PM +0300, Alexander Zagrebin wrote:
 I have noticed, that statfs called for ZFS file systems,
 returns the value of FS's recordsize property in both f_bsize and
 f_iosize.
 
 It's a problem for some software.
 For example, squid uses block size of cache's file system to calculate
 the space occupied by file.
 So by default it considers that any small file uses 128KB of a cache
 (when default value of recordsize is used), though really this file
 may use 512B only.
 This miscalculation leads to unreasonable cleaning of a cache.
 
 IMHO the behavior of statfs have to be changed, as ZFS uses variable
 (up to recordsize) block sizes.
 It must return 512 as f_bsize and recordsize as f_iosize.
 One of possible solutions is the attached patch.
 Could somebody look at it?

I committed (slightly modified version of) your patch to HEAD.
Thanks!
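
For readers wondering why the reported block size matters, here is a
small sketch of the kind of rounding an application might do when it
estimates on-disk usage from f_bsize. This is not squid's actual code;
space_accounted is a hypothetical helper written for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Round a file size up to a whole number of blocks of size bsize. */
static uint64_t
space_accounted(uint64_t filesize, uint64_t bsize)
{
	assert(bsize > 0);
	return ((filesize + bsize - 1) / bsize * bsize);
}
```

With f_bsize reported as the 128KB recordsize, a 300-byte file is
accounted as 131072 bytes; with f_bsize reported as 512 it is accounted
as 512 bytes.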

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: check for jailed environment for adjkerntz

2010-03-01 Thread Pawel Jakub Dawidek
On Mon, Mar 01, 2010 at 02:15:41AM +0300, Subbsd wrote:
 A full (complete) jail has the standard crontab file of tasks. However, not
 all standard tasks are suited to running in a jail environment. For example,
 adjkerntz, which generates:
 
 adjkerntz [46733]: sysctl (set: machdep.wall_cmos_clock): Operation not
 permitted
 
 I suggest making adjkerntz aware of jails, so that it does nothing when
 run inside one:
[...]

I have always found that annoying too, but only your e-mail made me
think about ways to fix it. Maybe a simple patch like the one
below will do?

--- etc/crontab (revision 204363)
+++ etc/crontab (working copy)
@@ -22,4 +22,4 @@
 #
 # Adjust the time zone if the CMOS clock keeps local time, as opposed to
 # UTC time.  See adjkerntz(8) for details.
-1,31   0-5 *   *   *   root    adjkerntz -a
+1,31   0-5 *   *   *   root    [ `sysctl -n security.jail.jailed` -eq 0 ] && adjkerntz -a

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Increasing MAXPHYS

2010-03-22 Thread Pawel Jakub Dawidek
On Mon, Mar 22, 2010 at 08:23:43AM +, Poul-Henning Kamp wrote:
 In message 4ba633a0.2090...@icyb.net.ua, Andriy Gapon writes:
 on 21/03/2010 16:05 Alexander Motin said the following:
  Ivan Voras wrote:
  Hmm, it looks like it could be easy to spawn more g_* threads (and,
  barring specific class behaviour, it has a fair chance of working out of
  the box) but the incoming queue will need to also be broken up for
  greater effect.
  
  According to notes, looks there is a good chance to obtain races, as
  some places expect only one up and one down thread.
 
 I haven't given any deep thought to this issue, but I remember us discussing
 them over beer :-)
 
 The easiest way to obtain more parallelism, is to divide the mesh into
 multiple independent meshes.
 
 This will do you no good if you have five disks in a RAID-5 config, but
 if you have two disks each mounted on its own filesystem, you can run
 a g_up & g_down for each of them.

A class is supposed to interact with other classes only via GEOM, so I
think it should be safe to choose g_up/g_down threads for each class
individually, for example:

/dev/ad0s1a (DEV)
   |
g_up_0 + g_down_0
   |
 ad0s1a (BSD)
   |
g_up_1 + g_down_1
   |
 ad0s1 (MBR)
   |
g_up_2 + g_down_2
   |
 ad0 (DISK)

We could easily calculate the g_down thread based on bio_to->geom->class and
the g_up thread based on bio_from->geom->class, so we know I/O requests for
our class are always coming from the same threads.

If we could make the same assumption for geoms, it would allow for even
better distribution.
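
A rough sketch of that selection follows. The structures are trimmed-down,
hypothetical stand-ins for the GEOM ones (the _sk names, the numeric class
id and the modulo mapping are all assumptions made for this example); only
the bio_to->geom->class and bio_from->geom->class paths come from the idea
above.

```c
#include <assert.h>

/* Hypothetical, trimmed-down versions of the GEOM structures. */
struct g_class_sk { const char *name; unsigned id; };
struct g_geom_sk { struct g_class_sk *class; };
struct g_provider_sk { struct g_geom_sk *geom; };
struct g_consumer_sk { struct g_geom_sk *geom; };
struct bio_sk {
	struct g_consumer_sk *bio_from;
	struct g_provider_sk *bio_to;
};

/* Requests going down are keyed on the class of the destination provider. */
static unsigned
pick_down_thread(const struct bio_sk *bp, unsigned nthreads)
{
	assert(nthreads > 0);
	return (bp->bio_to->geom->class->id % nthreads);
}

/* Completions going up are keyed on the class of the originating consumer. */
static unsigned
pick_up_thread(const struct bio_sk *bp, unsigned nthreads)
{
	assert(nthreads > 0);
	return (bp->bio_from->geom->class->id % nthreads);
}
```

Because the thread index depends only on the class, every request for a
given class lands on the same thread, which is the property the class
code would rely on.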

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: ZFS behavior when device disappears

2010-04-20 Thread Pawel Jakub Dawidek
On Tue, Apr 13, 2010 at 05:39:30PM -0600, Jason J. W. Williams wrote:
 Hello,
 
 Currently, we're an OpenSolaris shop but with the way things are going
 over at Oracle/Sun we're starting to evaluate our options for keeping
 ZFS but moving off Solaris. One of my concerns is that FreeBSD is
 implementing ZFSv14 (ZFS itself is up to v23 I believe). For quite a
 long time, ZFS under Solaris had a real problem with the following
 scenario:
 
 * Hard drive starts to die
 * Controller and SCSI subsystem continue to retry an I/O rather than
 failing fast
 * Even if the I/O does fail fast ZFS doesn't really notice a spike in
 I/O failures and continues to use the drive.
 * Result: I/O on the zpool stalls completely while the I/Os continue
 to be tried against the drive.
 
 This got fixed in later revs of OpenSolaris by enhancements to ZFS and
 greater integration with the Fault Management Architecture (FMA) of
 Solaris...lots of I/Os failing on a drive get communicated to ZFS who
 then offlines the drive out of the pool.
 
 My question is, what is the situation in FreeBSD 8 with ZFS if that
 type of situation occurs?

I believe FreeBSD does whatever OpenSolaris did for this version of ZFS.
There is ongoing work to bring v24 to FreeBSD. Basic functionality works
already, but a lot of work is still needed. At some point I'll see what we
can do about it, because we don't have FMA in FreeBSD and we would need
to find another way to deal with it. I have limited time to spend on
ZFS right now, so I'm taking small steps, but I'm making good progress
too.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: ZFS behavior when device disappears

2010-04-20 Thread Pawel Jakub Dawidek
On Tue, Apr 20, 2010 at 07:24:53AM -0600, Jason J. W. Williams wrote:
 Hi Pawel,
 
 Thank you very much for the response! Please forgive some of my
 questions, as I'm a bit unfamiliar with the FreeBSD port.
 
 What is the nature of the port? Is it something where each new version
 of ZFS is a from-scratch effort to some degree? Or is it a point where
 new ZFS versions are a matter of just making the newer features
 operational?

Definitely the latter, but there are some problems:

- Some changes in OpenSolaris ZFS are very hard to port in a short time,
  and while porting takes a long time, new versions arrive and it is nice
  to get them too, etc., which makes the whole process take a long time.

  A good example here is the move of some functionality to Python, where
  we have to decide what to do about that without importing Python into
  the base system.

- OpenSolaris ZFS is experimental and I don't think the Solaris version is
  published anywhere. This means it needs extensive testing on our side,
  which of course takes time.

- OpenSolaris changes are often not easy to understand. They have
  different commit rules than we have. Commit logs are not very helpful
  and multiple fixes are committed in one go, which makes it hard to
  separate individual changes if we just need a fix and not the intrusive
  change that came along with it.

I'm doing my best, but my time is limited. I see more and more people
are interested in helping with ZFS, which is a very good sign, one I have
been waiting for for a long time :)

It is of course still wonderful that we can use ZFS. All my servers and
my laptop are running exclusively on ZFS at this point :)

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Switchover to CAM ATA?

2010-04-26 Thread Pawel Jakub Dawidek
On Mon, Apr 26, 2010 at 10:33:27AM -0600, M. Warner Losh wrote:
 I've read most of this thread.  I think this is cool technology.
 However, before we move forward with this, we need to have a plan for
 the various issues that have come up.  The plan needs to be specific,
 have owners for key items, warnings about ownerless == obsoleted, and
 target dates.
 
 I think this is one of the cases where we should record the plan of
 record on a wiki.  It worked well for other times we've had big,
 disruptive changes.
 
 My opinion for the path forward:
 (1) Send a big heads up about the future of ataraid(5).  It will be
     shot in the head soon, to be replaced by a bunch of geom classes
 for each different container format.  At least that seems to be
 the rough consensus I've seen so far.  We need worker bees to do
 many of these classes, although much can be mined from the ataraid
 code today.

This shouldn't be a bunch of GEOM classes. This should be one class which
recognizes multiple formats, just like the LABEL class.
I don't think it is feasible to reuse gmirror for that; it wasn't
designed with something like this in mind.
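
To make the "one class, many formats" idea concrete, here is a minimal
userland sketch of a single taste routine that walks a table of known
metadata formats. Everything in it is hypothetical: the format names, the
magic strings and the taste_sector() helper are invented for this example
and do not correspond to real RAID metadata or to the LABEL class code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry per on-disk metadata format the single class understands. */
struct format_sk {
	const char *name;	/* format name */
	const char *magic;	/* hypothetical magic string at offset 0 */
};

/* Hypothetical formats and magics; real RAID metadata looks different. */
static const struct format_sk formats[] = {
	{ "vendorA", "AAAA-RAID" },
	{ "vendorB", "BBBB-RAID" },
};

/* Try every known format on a sector; return its name or NULL. */
static const char *
taste_sector(const unsigned char *sector, size_t len)
{
	assert(sector != NULL);
	for (size_t i = 0; i < sizeof(formats) / sizeof(formats[0]); i++) {
		size_t mlen = strlen(formats[i].magic);

		if (len >= mlen && memcmp(sector, formats[i].magic, mlen) == 0)
			return (formats[i].name);
	}
	return (NULL);
}
```

Supporting another format in this sketch means adding one table entry
rather than writing another class.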

 (2) Send another big heads up strongly recommending people go to
 glabel based fstabs.  Maybe the right option here is to provide a
     simple script to walk people through the conversion.  This will
 render the carnage of ad - ada (or da) a mostly non-event, and
 also protect people from 'oops' of rebooting with that thumb drive
 in the system.
 (3) Create a wiki to record all the new geom classes needed.  Find
 people to own each one, or note it is unowned, and support will be
 dropped if no owner can be found.
 (4) sysinstall should default to creating label systems, if it doesn't
 already.
 (5) Issues with glabel and ataraid(5) need an owner, and need to be
 resolved, since the device names here are likely to change.

What are the issues?

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Switchover to CAM ATA?

2010-04-26 Thread Pawel Jakub Dawidek
On Mon, Apr 26, 2010 at 12:19:46PM -0600, M. Warner Losh wrote:
 In message: 20100426181209.gb3...@garage.freebsd.pl
 Pawel Jakub Dawidek p...@freebsd.org writes:
 : On Mon, Apr 26, 2010 at 10:33:27AM -0600, M. Warner Losh wrote:
 :  I've read most of this thread.  I think this is cool technology.
 :  However, before we move forward with this, we need to have a plan for
 :  the various issues that have come up.  The plan needs to be specific,
 :  have owners for key items, warnings about ownerless == obsoleted, and
 :  target dates.
 :  
 :  I think this is one of the cases where we should record the plan of
 :  record on a wiki.  It worked well for other times we've had big,
 :  disruptive changes.
 :  
 :  My opinion for the path forward:
 :  (1) Send a big heads up about the future of ataraid(5).  It will be
 :      shot in the head soon, to be replaced by a bunch of geom classes
 :  for each different container format.  At least that seems to be
 :  the rough consensus I've seen so far.  We need worker bees to do
 :  many of these classes, although much can be mined from the ataraid
 :  code today.
 : 
 : This shouldn't be a bunch of GEOM classes. This should be one class which
 : recognizes multiple formats, just like the LABEL class.
 : I don't think it is feasible to reuse gmirror for that; it wasn't
 : designed with something like this in mind.
 
 OK.  Maybe I got the consensus wrong...  My key point is that we need
 a plan moving forward, we need to identify what's actively being
 worked on vs somebody else[tm] should do this and when it needs to
 be done or else.

You most likely got it right, I'm just saying that creating a separate GEOM
class for each metadata format is the wrong direction. :)

 :  (5) Issues with glabel and ataraid(5) need an owner, and need to be
 :  resolved, since the device names here are likely to change.
 : 
 : What are the issues?
 
 ataraid doesn't remove the underlying ad* devices, so glabel often
 picks those up instead of the ataraid device, and you only get 1
 disk's worth of raid device...  So no mirroring or only 1/2 a striped
 volume.

It not only leaves the ad* devices in place, it doesn't even open them
properly through GEOM. It's an internal ATA hack, which is a PITA.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: AESNI driver and fpu_kern KPI

2010-05-18 Thread Pawel Jakub Dawidek
On Sat, May 15, 2010 at 01:04:01PM +0300, Kostik Belousov wrote:
 Hello,
 
 please find at http://people.freebsd.org/~kib/misc/aesni.1.patch the
 combined patch, containing the fpu_kern KPI and Intel AESNI crypto(9)
 driver.  I did development and some testing on the hardware generously
 provided by Sentex Communications to Netperf cluster.

Nice work. A few comments:

- Could you modify this chunk in padlock.c:

+   td = curthread;
+   error = fpu_kern_enter(td, ses->ses_fpu_ctx);
+   if (error != 0)
+       goto out;
    error = padlock_hash_setup(ses, macini);
+   fpu_kern_leave(td, ses->ses_fpu_ctx);
+   out:

  To something without goto, eg.:

    td = curthread;
    error = fpu_kern_enter(td, ses->ses_fpu_ctx);
    if (error == 0) {
        error = padlock_hash_setup(ses, macini);
        fpu_kern_leave(td, ses->ses_fpu_ctx);
    }

- I see that in sys/dev/random/nehemiah.c you don't check the return
  value of fpu_kern_enter(). That's the only place where you ignore it.
  Is that intended?

- Unfortunately the driver in its current version can't be used with
  IPsec or with GELI where authentication is enabled. This is because
  the driver doesn't support sessions where both encryption and
  authentication are defined. Do you have plans to change it?
  I saw that you based the crypto(9) bits on padlock, which does support
  sessions with authentication by calculating hashes in software.
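
The goto-free shape suggested for padlock.c above can be tried in
isolation. This is only a sketch: fake_enter(), fake_leave() and
fake_setup() are hypothetical stand-ins for fpu_kern_enter(),
fpu_kern_leave() and padlock_hash_setup(), not the real KPI, and
enter_depth exists only so the pairing can be checked.

```c
#include <errno.h>

/* Hypothetical stand-ins for the kernel calls; not the real KPI. */
int enter_depth;	/* successful enters without a matching leave */

static int
fake_enter(int fail)
{
	if (fail)
		return (EINVAL);
	enter_depth++;
	return (0);
}

static void
fake_leave(void)
{
	enter_depth--;
}

static int
fake_setup(int fail)
{
	return (fail ? ENOMEM : 0);
}

/*
 * Run the setup step only when the enter step succeeded, and always pair
 * a successful enter with a leave.  No goto is needed.
 */
int
session_setup(int enter_fails, int setup_fails)
{
	int error;

	error = fake_enter(enter_fails);
	if (error == 0) {
		error = fake_setup(setup_fails);
		fake_leave();
	}
	return (error);
}
```

Whichever step fails, enter_depth is back at zero when session_setup()
returns, which is the property the goto version was protecting.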

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: glabel force sectorsize patch

2010-08-08 Thread Pawel Jakub Dawidek
On Sun, Aug 08, 2010 at 02:02:17PM +0200, Ivan Voras wrote:
 On 8.8.2010 12:30, Pawel Jakub Dawidek wrote:
  So why do you want to obfuscate glabel with it? For people to start
  depending on it? Once we start supporting 4kB sectors what do we do with
  such a change? Remove it and decrease the version number? What will people
  do with providers already labeled this way?
  
  If it's temporary, just allow listing the providers whose sector size you
  want to increase in /boot/loader.conf. Once we start supporting it properly
  people might simply remove it from loader.conf and it should just work.
  
  Glabel is not for that and I don't agree to such obfuscation.
 
 Of course, there are good and bad sides to it. My take on it is that the
 only bad side is that it really isn't glabel's primary function to
 (optionally) fixup geometry, while the good sides are:

It isn't its secondary function either.

 * glabel is in GENERIC and judging by the mailing lists' traffic it is
 one of the better used parts of the system so people are familiar with
 it. It is also already used as a perfectly valid fixup for device
 renaming, making both UFS and ZFS more stable for usage.

That's an excellent argument. But you know what? em(4) is also in
GENERIC, so why not add it there?

 * You can't really make people depend on glabel both because it is in
 GENERIC and because of it storing metadata in the last sector, making
 the rest of the drive completely usable without it in the event native
 4k sector support is grown.

I never said that. I do want people to depend on glabel, because it is
free of such ugly hacks, so I know it won't bite them in the future.

I don't want people to start depending on the fact that glabel supports
changing sector sizes.

Once we start supporting 4kB sectors properly, people's configurations will
stop working, because glabel won't be able to read its metadata anymore.
Your hack will break all configurations that started to depend on your
hack. In what I proposed, the GEOM provider will be presented to glabel (or
any other GEOM class) as a 4kB provider and everything will just work,
also after adding proper support for 4kB sectors.

 I'd like to hear comments from the wider audience. In respect with your
 comment, I will compromise: as 4k sector drives have become available
 over the counter more than 6 months ago and so far I think this is the
 first effort to give some support for them, I will commit this patch
 before 9.0 code freeze only if no other support gets developed.

I'll repeat. You won't commit this patch, because it is a totally wrong
solution and can only do a lot of damage in the future.
If you look forward, even temporary solutions can be done right.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: glabel force sectorsize patch

2010-08-08 Thread Pawel Jakub Dawidek
On Sun, Aug 08, 2010 at 02:57:20PM +0200, Marius Nünnerich wrote:
 On Sun, Aug 8, 2010 at 14:02, Ivan Voras ivo...@freebsd.org wrote:
  I'd like to hear comments from the wider audience. In respect with your
  comment, I will compromise: as 4k sector drives have become available
  over the counter more than 6 months ago and so far I think this is the
  first effort to give some support for them, I will commit this patch
  before 9.0 code freeze only if no other support gets developed.
 
 I do not like this at all. Even if it's just for the KISS and POLA
 principles. A geom should do one thing and do it right imo.
 Why not write a new geom class that does what you want?

A new GEOM class only for sectorsize conversion that can operate on
metadata would be useful, not only to solve this particular problem.
Keep in mind, though, that if at some point disks are detected and
presented as 4kB providers to GEOM, this class won't be able to find
its metadata anymore (as it was stored in the last 512 bytes, not in the
last 4 kilobytes).
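
To make the arithmetic concrete, here is a small sketch of where a class
that keeps its metadata in the last sector would look for it. The
function name last_sector_offset() is made up for illustration; it is not
taken from GEOM.

```c
#include <stdint.h>

/*
 * Byte offset of the last sector of a provider, i.e. where a class that
 * stores metadata in the last sector would read it from.
 * Returns UINT64_MAX for invalid input.
 */
uint64_t
last_sector_offset(uint64_t mediasize, uint32_t sectorsize)
{
	if (sectorsize == 0 || mediasize < sectorsize ||
	    mediasize % sectorsize != 0)
		return (UINT64_MAX);
	return (mediasize - sectorsize);
}
```

For a 1 GiB provider the result is 1073741312 with 512-byte sectors and
1073737728 with 4096-byte sectors, so metadata written under one sector
size is not found at the offset probed under the other.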

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Mounting cd9660 multiple times gives EBUSY [Was: unionfs a little improvement]

2010-08-22 Thread Pawel Jakub Dawidek
On Wed, Aug 18, 2010 at 12:48:53PM +0200, Ed Schouten wrote:
 Hi Daichi,
 
 I think Keith Packard of Xorg once wrote a commit message along the
 lines of "5000 lines of code removed, feature added". This seems to be
 similar, albeit on a smaller scale. ;-)
 
 Apart from this issue with unionfs, I am also experiencing another
 issue, where for some reason I cannot perform a second mount of the CD
 right after booting the system. Basically, my WIP FreeBSD boot CD does
 the following (but written in C):
 
   mount -t cd9660 /dev/iso9660/freebsd /mnt
   mount -t tmpfs none /tmp
   mount -t unionfs /tmp /mnt
   mount -t devfs none /mnt/dev
   chroot /mnt /sbin/init
 
 The first step fails with EBUSY. I use the following hack to get it
 working, but I don't think it's the proper way to solve it:

What you are trying to do here is to mount /dev/iso9660/freebsd for the
second time? This is not supported. The check is there to prevent doing
this, as it will panic on you when you try to unmount the first mount (not
really a problem in your case, as the first mount is /, so you probably
don't want to unmount it, but it is a problem in general).

You should be able to reproduce the panic with your patch applied by
doing the following:

# mount -t cd9660 /dev/iso9660/freebsd /mnt0
# mount -t cd9660 /dev/iso9660/freebsd /mnt1
# umount /mnt0

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: [CFT] Improved ZFS metaslab code (faster write speed)

2010-08-28 Thread Pawel Jakub Dawidek
On Sat, Aug 28, 2010 at 05:03:42AM -0400, jhell wrote:
 On 08/28/2010 04:20, Andriy Gapon wrote:
  on 28/08/2010 04:24 jhell said the following:
  The modified patch from avg@ (portion patch) is:
 
  #ifdef _KERNEL
  if (arc_reclaim_needed()) {
  needfree = 0;
  wakeup(needfree);
  }
  #endif
 
 I still moved that down to below _KERNEL for the obvious reasons.  But
  when I was using the original patch with if (needfree) I noticed a
  performance degradation after ~12 hours of use with and without UMA
  turned on. So far with ~48 hours of testing with the top half of that
  being with the above change, I have not seen more degradation of
  
  This is quite unexpected.
  needfree should be checked as the very first thing in arc_reclaim_needed()
  [unless you have patched it locally].  So if needfree is 1 then
  arc_reclaim_needed() should also return 1.  But the converse is not true,
  arc_reclaim_needed() may return 1 even if needfree is zero.
  
  So if your testing results are conclusive then it must mean that some extra
  wakeups on needfree are needed.  I.e. needfree is zero, so there shouldn't 
  be
  anything waiting on it (see arc_lowmem) and no notification should be 
  needed,
  but issuing somehow does make difference,
  Hmm...
  
 
 I will look further into this and see if I can throw a counter around it
 or some printf's so I can at least log what its doing in both instances.
 
 I thought the very same thing you said above when I saw your patch for
 that and was astounded at the results that were returned from it. So in
 short testing I reverted it back quickly to see if that was the cause of
 the problem and sure enough everything resumed to the way it was before.
 
 Anyway thanks for the reply. I will get back to you if I see anything
 cool arise from this.

Could you include the following patch in your testing:

http://people.freebsd.org/~pjd/patches/arc.c.9.patch

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




ZFS v28 is ready for wider testing.

2010-08-31 Thread Pawel Jakub Dawidek
Hello.

I'd like to give you ZFS v28 for testing. If you are neither brave nor
mad, you can stop here.

The patchset is very experimental. It can eat your cookie and hurt your
teddy bear, so be warned. Don't try it for anything except testing.

This patchset is also a message we, as the FreeBSD project, would like
to send to our users: Even though OpenSolaris is dead, the ZFS file
system is going to stay in FreeBSD. At this point we have quite a few
developers involved in ZFS on FreeBSD as well as several companies.
We are also looking forward to working with IllumOS.

So, what does this new ZFS bring?

- Data deduplication. Read more here:

http://blogs.sun.com/bonwick/entry/zfs_dedup

- Triple parity RAIDZ (RAIDZ3). Read more here:

http://dtrace.org/blogs/ahl/2009/07/21/triple-parity-raid-z/

- zfs diff. Read more here:

http://arc.opensolaris.org/caselog/PSARC/2010/105/20100328_tim.haley

- zpool split. Read more here:

http://arc.opensolaris.org/caselog/PSARC/2009/511/20090924_mark.musante

- Snapshot holds. Read more here:

http://arc.opensolaris.org/caselog/PSARC/2009/297/20090511_chris.kirby

- zpool import -F. Allows rewinding a corrupted pool to an earlier
  transaction group.

- The ability to import a pool in read-only mode.

And much, much more, including plenty of performance improvements and bug
fixes.

So test whatever you can and report back. Look for regressions, strange
behaviour, missing features, deadlocks, livelocks, performance
degradation, etc.

The boot code is not updated at all, so booting off of ZFS doesn't
currently work.

The patch is against today's FreeBSD HEAD.

The patch enables (in sys/modules/zfs/Makefile) ZFS internal debugging,
please don't turn it off. Also, compile your kernel with the following
options:

options KDB
options DDB
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options WITNESS_SKIPSPIN
options DEBUG_LOCKS
options DEBUG_VFS_LOCKS

Ignore all the LOR (Lock Order Reversal) reports from WITNESS. There will
be plenty of those, and you'll desperately want to report them, but please
don't.

The best way to report a problem is to reply to this e-mail with the
shortest possible procedure for reproducing it, plus debugging info. I'd
prefer a textdump if possible. Below you can find a quick procedure for
setting up textdumps:

Choose a spare/swap disk/partition in your system, let's say it is
/dev/ad0s1b.

Add the following line to /etc/fstab:

/dev/ad0s1b	none	swap	sw	0	0

Add the following line to /etc/rc.conf:

ddb_enable=YES

Run the following commands:

# /etc/rc.d/swap1 start
# /etc/rc.d/dumpon start
# /etc/rc.d/ddb start

This will set up swap, mark it as the dump device and set up some DDB
scripts. Or you can just reboot.

Now when your system panics or deadlocks, enter DDB and run the
following command:

ddb run kdb.enter.panic

It will execute all the commands I need, dump their output in text format
to your swap device and reboot the machine.

After the reboot, you should find a textdump.tar.0 file in the /var/crash/
directory. This is the debug info I need.

End of textdumps procedure.

Ok, now that I know you read everything carefully, here is the patch:

http://people.freebsd.org/~pjd/patches/zfs_20100831.patch.bz2

Good luck! :)

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: ZFS v28 is ready for wider testing.

2010-09-01 Thread Pawel Jakub Dawidek
On Tue, Aug 31, 2010 at 11:59:15PM +0200, Pawel Jakub Dawidek wrote:
 Ok, now that I know you read everything carefully, here is the patch:
 
   http://people.freebsd.org/~pjd/patches/zfs_20100831.patch.bz2

Important note. Please patch with the following command:

# patch -E -p0 < zfs_20100831.patch

If you don't use the -E option, patch(1) won't remove empty files and you
won't be able to compile it.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: ZFS v28 is ready for wider testing.

2010-09-02 Thread Pawel Jakub Dawidek
On Thu, Sep 02, 2010 at 01:55:51AM -0700, Rob Farmer wrote:
 On Tue, Aug 31, 2010 at 2:59 PM, Pawel Jakub Dawidek p...@freebsd.org wrote:
 
  Ok, now that I know you read everything carefully, here is the patch:
 
         http://people.freebsd.org/~pjd/patches/zfs_20100831.patch.bz2
 
 
 buildworld on i386 (yes I know ZFS isn't ideal there):
[...]

Yes, I know about this problem. You can use the attached patch or wait for
the full patch, which I'll be sending later today.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!
--- sys/cddl/compat/opensolaris/sys/atomic.h
+++ sys/cddl/compat/opensolaris/sys/atomic.h
@@ -39,10 +39,9 @@
 #ifndef __LP64__
 extern void atomic_add_64(volatile uint64_t *target, int64_t delta);
 extern void atomic_dec_64(volatile uint64_t *target);
-extern void *atomic_cas_ptr(volatile void *target, void *cmp,  void *newval);
 #endif
 #ifndef __sparc64__
-extern uint64_t atomic_cas_32(volatile uint32_t *target, uint32_t cmp,
+extern uint32_t atomic_cas_32(volatile uint32_t *target, uint32_t cmp,
 uint32_t newval);
 extern uint64_t atomic_cas_64(volatile uint64_t *target, uint64_t cmp,
 uint64_t newval);
@@ -119,21 +118,19 @@
 }
 
 #ifndef COMPAT_32BIT
-#if defined(__LP64__)
+#ifdef __LP64__
 static __inline void *
 atomic_cas_ptr(volatile void *target, void *cmp,  void *newval)
 {
-	return ((void *)atomic_cas_64((volatile uint64_t *)target, (uint64_t)cmp,
-	(uint64_t)newval));
+	return ((void *)atomic_cas_64(target, (uint64_t)cmp, (uint64_t)newval));
 }
 #else
 static __inline void *
 atomic_cas_ptr(volatile void *target, void *cmp,  void *newval)
 {
-	return ((void *)atomic_cas_32((volatile uint64_t *)target, (uint64_t)cmp,
-	(uint64_t)newval));
+	return ((void *)atomic_cas_32(target, (uint32_t)cmp, (uint32_t)newval));
 }
 #endif
-#endif
+#endif	/* !COMPAT_32BIT */
 
 #endif	/* !_OPENSOLARIS_SYS_ATOMIC_H_ */
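
The patch above selects atomic_cas_64() or atomic_cas_32() for
atomic_cas_ptr() depending on __LP64__, so that the compare-and-swap
matches the pointer width. The same Solaris-style semantics (return the
value seen before the operation) can be sketched in userland with C11
<stdatomic.h>. This is a swapped-in technique for illustration, not the
kernel's atomic(9) interface, and cas_ptr() is a made-up name.

```c
#include <stdatomic.h>
#include <stddef.h>

/*
 * Solaris-style compare-and-swap on a pointer: store newval if *target
 * equals cmp, and return the value *target held before the call.
 */
void *
cas_ptr(_Atomic(void *) *target, void *cmp, void *newval)
{
	void *expected = cmp;

	/* On failure, expected is overwritten with the current value. */
	atomic_compare_exchange_strong(target, &expected, newval);
	return (expected);
}
```

The caller detects success by comparing the return value with cmp, and
the compiler picks the right operand width for the platform.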




Re: ZFS v28 is ready for wider testing.

2010-09-02 Thread Pawel Jakub Dawidek
On Tue, Aug 31, 2010 at 11:59:15PM +0200, Pawel Jakub Dawidek wrote:
[...]
 Ok, now that I know you read everything carefully, here is the patch:
 
   http://people.freebsd.org/~pjd/patches/zfs_20100831.patch.bz2

Now it is even easier to test new ZFS! :)

Here you can find a VirtualBox appliance (113MB) with
FreeBSD 9-CURRENT and ZFSv28:

http://people.freebsd.org/~pjd/misc/FreeBSD9_ZFSv28_0.1.tgz

Untar it, import it (zfsv28.ovf) into VirtualBox and have fun.

You can log in as root with no password (via virtual console or via SSH).
The system's IP address is 192.168.56.66/24.
There are 16 ada(4) disks to play with. For example:

zfsv28:root:~# zpool create tank raidz3 ada{0,1,2,3,4,5,6,7} raidz3 ada{8,9,10,11,12,13,14,15}
zfsv28:root:~# zpool status
  pool: tank
 state: ONLINE
 scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    tank        ONLINE       0     0     0
      raidz3-0  ONLINE       0     0     0
        ada0    ONLINE       0     0     0
        ada1    ONLINE       0     0     0
        ada2    ONLINE       0     0     0
        ada3    ONLINE       0     0     0
        ada4    ONLINE       0     0     0
        ada5    ONLINE       0     0     0
        ada6    ONLINE       0     0     0
        ada7    ONLINE       0     0     0
      raidz3-1  ONLINE       0     0     0
        ada8    ONLINE       0     0     0
        ada9    ONLINE       0     0     0
        ada10   ONLINE       0     0     0
        ada11   ONLINE       0     0     0
        ada12   ONLINE       0     0     0
        ada13   ONLINE       0     0     0
        ada14   ONLINE       0     0     0
        ada15   ONLINE       0     0     0

errors: No known data errors

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: ZFS v28 is ready for wider testing.

2010-09-03 Thread Pawel Jakub Dawidek
On Fri, Sep 03, 2010 at 04:50:44PM +0100, Peter Molnar, BSD wrote:
 Hi,
 I would like to try ZFS + VirtualBox but I have got problems:
 
 
 1) Linux 2.6.32-24-generic #42-Ubuntu SMP Fri Aug 20 14:21:58 UTC 2010 
 x86_64 GNU/Linux
 
 I tried import that file in my  VirtualBox but I have got error:
 Failed to import appliance.
 /home/peter/FreeBSD/zfsv28.ovf
 Too many IDE controllers in OVF; import facility only supports one.

Which VirtualBox version do you use? 3.2.8?

Exporting appliances is a bit broken (if you have more than one disk, it
will point all disks at the last one from the configuration), so I had to
edit the .ovf file manually to fix this. Maybe I messed something up, but I
was able to successfully import it before publishing it.

PS. I waited so long for decent virtualization software for FreeBSD,
and I must say VirtualBox is really great, and free, and open-source.
Are you reading this, VMWare?

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




gptboot rewrite, bootonce, etc.

2010-09-17 Thread Pawel Jakub Dawidek
 things will have to wait until I can
sleep at nights again. Well, there is still dedup support that waits to
be implemented in gptzfsboot...

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: gptboot rewrite, bootonce, etc.

2010-09-20 Thread Pawel Jakub Dawidek
On Mon, Sep 20, 2010 at 09:46:56AM +0100, krad wrote:
 does it work for zfs boot as that would be really nice if it did?

No, it doesn't. ZFS works a bit differently. ZFS operates on pools, not
really on partitions. One ZFS file system can span multiple
disks/partitions. I'm not yet sure how to implement it so that it is
intuitive, but I also haven't spent much time thinking about it. We
needed UFS and that is what I implemented. It took me much more time
than I expected anyway :)

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: gptboot rewrite, bootonce, etc.

2010-09-20 Thread Pawel Jakub Dawidek
On Mon, Sep 20, 2010 at 01:17:38AM +0200, Oliver Pinter wrote:
 Hi PJD!
 
 Can you release this patchset for 7-STABLE?

I've no plans atm to port this work to 7-STABLE. I don't even have 7.x
systems anymore. Not sure how boot code differs, maybe the patch will
apply without modifications? No idea. I'd like to MFC this to 8-STABLE,
though.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: gptboot rewrite, bootonce, etc.

2010-09-20 Thread Pawel Jakub Dawidek
On Sun, Sep 19, 2010 at 09:10:52PM +0400, Boris Samorodov wrote:
 Hi!
 
 On Sat, 18 Sep 2010 01:45:42 +0200 Pawel Jakub Dawidek wrote:
 
  My company was in need for functionality similar to nextboot(8), but on
  boot loader level, so we can have two partitions we boot from where one
  is known to be good and the other is used for upgrades. We upgrade by
  dd(1)ing entire partition image onto unused partition, we mark it as
  try-to-boot-from-it-but-only-once, reboot and if we fail to boot from
  the new partition, we fall back to the old, good partition. If we
  succeed on the other hand, we mark the new partition as our boot
  partition and mark the other one as unused.
 
  Well, how hard can it be?
 
  After around two weeks of work, I ended up rewriting gptboot in large
  parts, reorganizing a lot of code, improving and extending gpart a bit
  and implementing the desired functionality.
 
  Here is the patch for review and test:
 
  http://people.freebsd.org/~pjd/patches/gptboot.patch
 
 Great! Since I need to have both i386 and amd64 at my box
 here are my test results:
 -
 [~]b...@alya% uname -a
 FreeBSD alya 9.0-CURRENT FreeBSD 9.0-CURRENT #1 r212758M: Sat Sep 18 16:13:38 
 MSD 2010
 b...@alya:/space/FreeBSD/base/head/obj/space/FreeBSD/base/head/src/sys/ALYA 
 amd64
 
 [~]b...@alya% glabel status
   Name  Status  Components
 gptid/c6053c9b-abcc-11df-b740-00251124aff4 N/A  ad4p1
  label/9-amd64 N/A  ad4p2
 label/swap N/A  ad4p3
label/space N/A  ad4p4
   label/9-i386 N/A  ad4p5
 [~]b...@alya% mount
 /dev/label/9-amd64 on / (ufs, local)
 devfs on /dev (devfs, local, multilabel)
 /dev/label/space on /space (ufs, local)
 /dev/md0 on /tmp (ufs, local, nosuid, soft-updates)
 procfs on /proc (procfs, local)
 linprocfs on /compat/linux/proc (linprocfs, local)
 linsysfs on /compat/linux/sys (linsysfs, local)
 fdescfs on /dev/fd (fdescfs)
 
 [~]b...@alya% gpart show
 =>       34  490234685  ad4  GPT  (234G)
          34        128    1  freebsd-boot  (64K)
         162   41943040    2  freebsd-ufs  (20G)
    41943202    8388608    3  freebsd-swap  (4.0G)
    50331810  209715200    4  freebsd-ufs  (100G)
   260047010   41943040    5  freebsd-ufs  (20G)
   301990050  188244669       - free -  (90G)
 
 [~]b...@alya% gpart set -a bootme -i 2 ad4
 bootme set on ad4p2
 [~]b...@alya% gpart set -a bootonce -i 5 ad4
 bootonce set on ad4p5
 [~]b...@alya% gpart show
 =>       34  490234685  ad4  GPT  (234G)
          34        128    1  freebsd-boot  (64K)
         162   41943040    2  freebsd-ufs  [bootme]  (20G)
    41943202    8388608    3  freebsd-swap  (4.0G)
    50331810  209715200    4  freebsd-ufs  (100G)
   260047010   41943040    5  freebsd-ufs  [bootonce,bootme]  (20G)
   301990050  188244669       - free -  (90G)
 -
 
 Install i386 kernel/world to ad4p5, successful reboot, get i386
 system. Next reboot (get amd64 system back):
 -
 [~]b...@alya% gpart show
 =>       34  490234685  ad4  GPT  (234G)
          34        128    1  freebsd-boot  (64K)
         162   41943040    2  freebsd-ufs  [bootme]  (20G)
    41943202    8388608    3  freebsd-swap  (4.0G)
    50331810  209715200    4  freebsd-ufs  (100G)
   260047010   41943040    5  freebsd-ufs  (20G)
   301990050  188244669       - free -  (90G)
 -
 
 All seems to work fine.

Great, thanks for testing!
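
For readers following along, the try-once-then-fall-back behaviour tested
above can be modelled roughly as below. This is a simplified sketch and
not the gptboot code: ATTR_BOOTME, ATTR_BOOTONCE and pick_boot_partition()
are names invented for the example, and the real implementation tracks
more state than this.

```c
#include <stddef.h>

#define	ATTR_BOOTME	0x1u	/* partition is a boot candidate */
#define	ATTR_BOOTONCE	0x2u	/* try this partition once, then fall back */

/*
 * Simplified model: pick the partition to boot from.  A partition marked
 * bootonce+bootme wins, and its bootme flag is cleared so that a failed
 * attempt falls back to the plain bootme partition on the next boot.
 * Returns the index chosen, or -1 if nothing is bootable.
 */
int
pick_boot_partition(unsigned int *attrs, size_t nparts)
{
	size_t i;

	for (i = 0; i < nparts; i++) {
		if ((attrs[i] & (ATTR_BOOTONCE | ATTR_BOOTME)) ==
		    (ATTR_BOOTONCE | ATTR_BOOTME)) {
			attrs[i] &= ~ATTR_BOOTME;
			return ((int)i);
		}
	}
	for (i = 0; i < nparts; i++) {
		if ((attrs[i] & ATTR_BOOTME) != 0)
			return ((int)i);
	}
	return (-1);
}
```

With the layout from the test (index 1 marked bootme, index 4 marked
bootonce and bootme), the first call picks index 4 and the second call
picks index 1, which matches the i386-then-amd64 sequence reported above.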

  Any comments or suggestions?
 
 Only one for now. With current default syslog configuration
 logging to local0.warning and local0.info goes nowhere.
 It will be good if those messages have traces at the
 default system.

Good point. I changed those to local0.notice.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Recent GELI additions.

2010-09-25 Thread Pawel Jakub Dawidek
Hi.

I'd like to announce three new features in GELI available in HEAD:

1. AES-XTS encryption. XTS mode is a standard that is recommended these
   days for storage encryption. This is the default now. AES-XTS support
   was also added to the opencrypto framework and the aesni(4) driver.

2. Multiple encryption keys. GELI will use one encryption key for at
   most 2^20 blocks (sectors), as it is not recommended to use the same
   encryption key for too much data. It generates an array of keys from
   the master key on attach and uses it accordingly. This is the default now.

3. Passphrase can now be loaded from a file (-J and -j options).
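
As a rough illustration of item 2, the key that covers a given sector can
be found with simple arithmetic. The names below (key_index(),
BLOCKS_PER_KEY_SHIFT) are made up for this sketch, and it shows only the
indexing, not how the keys themselves are derived from the master key.

```c
#include <stdint.h>

#define	BLOCKS_PER_KEY_SHIFT	20	/* at most 2^20 sectors per key */

/*
 * Index of the encryption key covering the sector at the given byte
 * offset.  sectorsize must be non-zero.
 */
uint64_t
key_index(uint64_t offset, uint32_t sectorsize)
{
	uint64_t sector = offset / sectorsize;

	return (sector >> BLOCKS_PER_KEY_SHIFT);
}
```

With 512-byte sectors each key therefore covers 512 MB of the provider,
and with 4096-byte sectors it covers 4 GB.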

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: letting glabel recognise a media change

2010-10-10 Thread Pawel Jakub Dawidek
On Thu, Sep 30, 2010 at 08:46:11PM +0300, Alexander Motin wrote:
 Andriy Gapon wrote:
  on 30/09/2010 01:28 Matthew Jacob said the following:
  If something like that was in place, I assure you that things would start 
  to use
  it very quickly.
  
  I am not sure about this.
  Because, e.g. I don't see an easy way to know that media is changed in 
  scsi_cd
  driver.  That is, without polling.  I don't consider polling to be an easy 
  way for
  a number of reasons.
 
 SATA specification defines concept of Asynchronous Notification. It is
 already used by port multipliers to report about PHY events. It is also
 supposed to be used by CD drives to report media change. I haven't seen
 such devices yet, but hope they may appear sometimes.
 
 And even without AN support it would be nice to implement proper
 handling for SCSI UA - media changed errors within CAM. It still won't
 be perfect without using polling, but probably still something.

I'd like to know the original reason why the CD device is represented by
a GEOM provider and not the CD media. To my naive thinking the CD media
should be the GEOM provider that we taste once the media is inserted and
orphan once the media is removed. I don't see any reason for the CD device
to be a useful GEOM provider, but maybe I'm overlooking something.

Poul-Henning or Soren, do you remember who made this design choice, and why?

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: letting glabel recognise a media change

2010-10-11 Thread Pawel Jakub Dawidek
On Mon, Oct 11, 2010 at 11:03:26AM -0400, John Baldwin wrote:
 With CD drives you are also rather stuck in that the existing ABI for
 controlling CD drives (e.g. ioctls in 3rd party software to eject a CD) are
 done on the /dev/cdX device.  Ideally enclosures for removable media would
 be separate devices from the removable media itself, but a lot of existing
 software for CD's would break if this changes now.

Right, but I still wonder if we could execute provider orphan and
retaste on various events like media insertion or removal. If media is
removed we orphan the provider and recreate it, which will trigger a
retaste, and this is fine: there will be nothing to read from or write to
(we will simply return errors as we do now, I think). This way we nicely
co-operate with GEOM, but also with other tools that don't require media
to be present (if there is no media the devfs entry still exists and
handles ioctls, it just returns errors on read requests).

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: ZFS v28 is ready for wider testing.

2010-11-03 Thread Pawel Jakub Dawidek
On Wed, Nov 03, 2010 at 07:28:15PM +0100, Olivier Smedts wrote:
         http://people.freebsd.org/~pjd/patches/zfs_20100831.patch.bz2
 
 Hello,
 
 Any status update on this ? I regularly check
 http://people.freebsd.org/~pjd/patches/ to see if there's an updated
 version of your patch. 2 months old is quite a bit for -CURRENT, which
 often receives commits on zfsco parts.
 
 Thanks for all your work on FreeBSD (not only ZFS).

It took a while, but I should have something new shortly. I recently
finished boot support for v28 (the most-missed feature in the previous
patch?) and will work on a new patch soon. I'm heading to meetBSD
California tomorrow and I'll be back in a week, so nothing will happen
till then for sure.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Read-only /usr/obj/ no longer kosher?

2015-08-23 Thread Pawel Jakub Dawidek
I used to build world and kernel on one machine and export both /usr/src/ and
/usr/obj read-only to other machines. It doesn't work anymore (this is from
'make installworld'):

=== bin/freebsd-version (install)
eval $(egrep '^(TYPE|REVISION|BRANCH)=' 
/usr/src/bin/freebsd-version/../../sys/conf/newvers.sh) ;  if ! sed -e  
s/@@TYPE@@/${TYPE}/g;  s/@@REVISION@@/${REVISION}/g;  s/@@BRANCH@@/${BRANCH}/g; 
  /usr/src/bin/freebsd-version/freebsd-version.sh.in freebsd-version.sh ; 
then  rm -f freebsd-version.sh ;  exit 1 ;  fi
cannot create freebsd-version.sh: Permission denied
rm: freebsd-version.sh: Read-only file system
*** Error code 1

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com




Re: Memory modified after free, seemingly geli related

2015-08-05 Thread Pawel Jakub Dawidek
On Thu, Aug 06, 2015 at 04:06:40AM +0200, Pawel Jakub Dawidek wrote:
 On Wed, Aug 05, 2015 at 03:24:26AM +, Ed Maste wrote:
  I've encountered a few memory modified after free panics recently,
  which seem to be from geli. I don't yet have any debugging to
  completely confirm it's geli, but it has not happened on my other test
  laptop which configured similarly but without geli.
  
  This has a few local patches from my to-commit-to-HEAD queue.
  FreeBSD volta 11.0-CURRENT FreeBSD 11.0-CURRENT #10
  r284409+6a002d9(staging): Tue Jul  7 17:57:01 EDT 2015
  
  panic: Memory modified after free 0xf80009d504d8(248) val=0 @
  0xf80009d50518
 
 I'm seeing it too. I tracked it down to ZFS. The bio was last owned by
 the ZFS::VDEV GEOM class, which is modifying bio_error on a freed bio. I'm
 investigating further and will let you know here once I find the
 cause.

Ok. It was a bio from ZFS in my case, but it was GELI which modified
bio_error after delivering the bio.

This patch fixes the race:

http://people.freebsd.org/~pjd/patches/geom_eli.patch

Using the bio after calling crypto_dispatch() is a bug. The 'done'
callbacks might have already called g_io_deliver() and the upper layer
might have already freed the bio.
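
The rule generalizes to any asynchronous dispatch whose completion
callback may run, and free the request, before the dispatch call returns.
Below is a userland sketch of that ownership rule. struct request,
dispatch(), request_done() and submit() are hypothetical stand-ins, not
the GELI or crypto(9) code.

```c
#include <stdlib.h>

struct request {
	int	error;
	void	(*done)(struct request *);
};

int last_completed_error = -1;	/* recorded by the completion callback */

/* Completion callback: consumes the request and frees it. */
static void
request_done(struct request *req)
{
	last_completed_error = req->error;
	free(req);
}

/*
 * Stand-in for an asynchronous dispatch.  It may complete the request,
 * and therefore free it, before returning to the caller.
 */
static int
dispatch(struct request *req)
{
	req->done(req);
	return (0);
}

/* Submit a request; returns 0 on success, -1 if allocation failed. */
int
submit(int error)
{
	struct request *req;

	req = malloc(sizeof(*req));
	if (req == NULL)
		return (-1);
	/* Fill in everything the callback needs before dispatching. */
	req->error = error;
	req->done = request_done;
	/* After this call req may already be freed: do not touch it. */
	return (dispatch(req));
}
```

Anything the submitter still needs after dispatch has to be copied out of
the request beforehand, because ownership passes to the callback at the
moment of dispatch.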

I'm not fully convinced that panic is the right response to a
crypto_dispatch() failure. It means that the driver failed our request
and didn't call our callback, which is bad as we never complete the I/O.
The crypto drivers tend to return errors only if the request itself is
bogus, but that is a programming bug and not a runtime condition. In other
words panic should be fine here.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com




Re: Memory modified after free, seemingly geli related

2015-08-05 Thread Pawel Jakub Dawidek
On Wed, Aug 05, 2015 at 03:24:26AM +, Ed Maste wrote:
 I've encountered a few memory modified after free panics recently,
 which seem to be from geli. I don't yet have any debugging to
 completely confirm it's geli, but it has not happened on my other test
 laptop, which is configured similarly but without geli.
 
 This has a few local patches from my to-commit-to-HEAD queue.
 FreeBSD volta 11.0-CURRENT FreeBSD 11.0-CURRENT #10
 r284409+6a002d9(staging): Tue Jul  7 17:57:01 EDT 2015
 
 panic: Memory modified after free 0xf80009d504d8(248) val=0 @
 0xf80009d50518

I'm seeing it too. I tracked it down to ZFS. The bio was last owned by
the ZFS::VDEV GEOM class, which is modifying bio_error on a freed bio. I'm
investigating further and will let you know here once I find the
cause.

 cpuid = 1
 KDB: stack backtrace:
 db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe011414a880
 vpanic() at vpanic+0x189/frame 0xfe011414a900
 panic() at panic+0x43/frame 0xfe011414a960
 trash_ctor() at trash_ctor+0x48/frame 0xfe011414a970
 uma_zalloc_arg() at uma_zalloc_arg+0x573/frame 0xfe011414a9e0
 g_clone_bio() at g_clone_bio+0x1d/frame 0xfe011414aa00
 g_eli_start() at g_eli_start+0xbd/frame 0xfe011414aa30
 g_io_schedule_down() at g_io_schedule_down+0xe6/frame 0xfe011414aa60
 g_down_procbody() at g_down_procbody+0x7d/frame 0xfe011414aa70
 fork_exit() at fork_exit+0x84/frame 0xfe011414aab0
 fork_trampoline() at fork_trampoline+0xe/frame 0xfe011414aab0
 --- trap 0, rip = 0, rsp = 0xfe011414ab70, rbp = 0 ---

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com




Re: Read-only /usr/obj/ no longer kosher?

2015-08-26 Thread Pawel Jakub Dawidek
On Tue, Aug 25, 2015 at 03:32:35PM -0700, NGie Cooper wrote:
 On Tue, Aug 25, 2015 at 3:21 PM, Xin Li delp...@delphij.net wrote:
  On 08/25/15 14:55, Pawel Jakub Dawidek wrote:
  Now that I think of it, it might have been that I did
  buildworld/buildkernel before -p1. Then freebsd-update updated
  newvers.sh and then I was trying to do installworld.
 
  Yes, I can now reproduce it with source updated to -p2.
 
  Yes, that's because freebsd-version.sh is generated from the files (but
  it's not clear to me whether it's a bug or a feature that 'make
  install' checks if it's up-to-date and decides to regenerate it...).
 
 It's a quirk for sure. If you change the behavior, people will
 definitely complain as they will now need to go back and rebuild
 everything.

What we have now is misleading. People should recompile. It is rather
rare to see a security advisory that bumps only the patch level and
touches something that doesn't require recompilation (e.g. a shell
script). The current behaviour would make people think they are running
the latest patch level because freebsd-version says so, even though they
only did 'make installworld' without rebuilding the affected binaries.
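
A minimal sketch of why a read-only obj/ trips over this, assuming the
install step follows the usual make rule that a target is out of date when
one of its sources has a newer modification time. The is_stale helper and
the temporary files are hypothetical stand-ins, not the actual build logic:

```sh
#!/bin/sh
# Hypothetical stand-in for make's up-to-date check: a target is stale
# when it is missing or when any listed source is newer than it.
is_stale() {
    target=$1; shift
    [ -e "$target" ] || return 0
    for src in "$@"; do
        [ "$src" -nt "$target" ] && return 0
    done
    return 1
}

tmp=$(mktemp -d)
# The generated script was built first; the version file was updated later.
touch -t 201508010000 "$tmp/freebsd-version.sh"
touch -t 201508250000 "$tmp/newvers.sh"

if is_stale "$tmp/freebsd-version.sh" "$tmp/newvers.sh"; then
    echo "stale: install would try to regenerate the script"
fi
rm -rf "$tmp"
```

Here the stand-in for newvers.sh is newer than the generated script, so the
check reports it as stale; that is the point at which a real install would
try to write into obj/.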

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com




Re: Read-only /usr/obj/ no longer kosher?

2015-08-25 Thread Pawel Jakub Dawidek
On Tue, Aug 25, 2015 at 11:04:37PM +0200, Pawel Jakub Dawidek wrote:
 On Sun, Aug 23, 2015 at 03:29:01PM -0700, Xin Li wrote:
  
  
  On 8/23/15 14:55, Pawel Jakub Dawidek wrote:
   I used to build world and kernel on one machine and export both /usr/src/ 
   and
   /usr/obj read-only to other machines. It doesn't work anymore (this is 
   from
   'make installworld'):
   
   === bin/freebsd-version (install)
   eval $(egrep '^(TYPE|REVISION|BRANCH)=' 
   /usr/src/bin/freebsd-version/../../sys/conf/newvers.sh) ;  if ! sed -e  
   s/@@TYPE@@/${TYPE}/g;  s/@@REVISION@@/${REVISION}/g;  
   s/@@BRANCH@@/${BRANCH}/g;   
   /usr/src/bin/freebsd-version/freebsd-version.sh.in freebsd-version.sh ; 
   then  rm -f freebsd-version.sh ;  exit 1 ;  fi
   cannot create freebsd-version.sh: Permission denied
   rm: freebsd-version.sh: Read-only file system
   *** Error code 1
  
  What are the modification times of
  /usr/obj/usr/bin/freebsd-version/freebsd-version.sh,
  /usr/src/bin/freebsd-version/freebsd-version.sh and
  /usr/src/sys/conf/newvers.sh?
 
 I saw it twice, but cannot reproduce it anymore. This is 10.2-RELEASE,
 I've sent it to current@ by mistake. All in all my expectation is that
 we shouldn't modify obj/ during installworld.

Now that I think of it, it might have been that I did
buildworld/buildkernel before -p1. Then freebsd-update updated
newvers.sh and then I was trying to do installworld.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Read-only /usr/obj/ no longer kosher?

2015-08-25 Thread Pawel Jakub Dawidek
On Tue, Aug 25, 2015 at 11:53:47PM +0200, Pawel Jakub Dawidek wrote:
 On Tue, Aug 25, 2015 at 11:04:37PM +0200, Pawel Jakub Dawidek wrote:
  On Sun, Aug 23, 2015 at 03:29:01PM -0700, Xin Li wrote:
   
   
   On 8/23/15 14:55, Pawel Jakub Dawidek wrote:
    I used to build world and kernel on one machine and export both /usr/src/
    and /usr/obj read-only to other machines. It doesn't work anymore (this is
    from 'make installworld'):
    
    === bin/freebsd-version (install)
    eval $(egrep '^(TYPE|REVISION|BRANCH)=' 
    /usr/src/bin/freebsd-version/../../sys/conf/newvers.sh) ;  if ! sed -e 
     s/@@TYPE@@/${TYPE}/g;  s/@@REVISION@@/${REVISION}/g;  
    s/@@BRANCH@@/${BRANCH}/g;   
    /usr/src/bin/freebsd-version/freebsd-version.sh.in freebsd-version.sh 
    ; then  rm -f freebsd-version.sh ;  exit 1 ;  fi
    cannot create freebsd-version.sh: Permission denied
    rm: freebsd-version.sh: Read-only file system
    *** Error code 1
   
   What are the modification times of
   /usr/obj/usr/bin/freebsd-version/freebsd-version.sh,
   /usr/src/bin/freebsd-version/freebsd-version.sh and
   /usr/src/sys/conf/newvers.sh?
  
  I saw it twice, but cannot reproduce it anymore. This is 10.2-RELEASE,
  I've sent it to current@ by mistake. All in all my expectation is that
  we shouldn't modify obj/ during installworld.
 
 Now that I think of it, it might have been that I did
 buildworld/buildkernel before -p1. Then freebsd-update updated
 newvers.sh and then I was trying to do installworld.

Yes, I can now reproduce it with source updated to -p2.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com


Re: Read-only /usr/obj/ no longer kosher?

2015-08-25 Thread Pawel Jakub Dawidek
On Sun, Aug 23, 2015 at 03:29:01PM -0700, Xin Li wrote:
 
 
 On 8/23/15 14:55, Pawel Jakub Dawidek wrote:
  I used to build world and kernel on one machine and export both /usr/src/ 
  and
  /usr/obj read-only to other machines. It doesn't work anymore (this is from
  'make installworld'):
  
  === bin/freebsd-version (install)
  eval $(egrep '^(TYPE|REVISION|BRANCH)=' 
  /usr/src/bin/freebsd-version/../../sys/conf/newvers.sh) ;  if ! sed -e  
  s/@@TYPE@@/${TYPE}/g;  s/@@REVISION@@/${REVISION}/g;  
  s/@@BRANCH@@/${BRANCH}/g;   
  /usr/src/bin/freebsd-version/freebsd-version.sh.in freebsd-version.sh ; 
  then  rm -f freebsd-version.sh ;  exit 1 ;  fi
  cannot create freebsd-version.sh: Permission denied
  rm: freebsd-version.sh: Read-only file system
  *** Error code 1
 
 What are the modification times of
 /usr/obj/usr/bin/freebsd-version/freebsd-version.sh,
 /usr/src/bin/freebsd-version/freebsd-version.sh and
 /usr/src/sys/conf/newvers.sh?

I saw it twice, but cannot reproduce it anymore. This is 10.2-RELEASE,
I've sent it to current@ by mistake. All in all my expectation is that
we shouldn't modify obj/ during installworld.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com


Re: dumpdev in loader.conf vs rc.d/dumpon

2015-09-24 Thread Pawel Jakub Dawidek
On Thu, Sep 24, 2015 at 02:18:50PM +0300, Slawa Olhovchenkov wrote:
> On Thu, Sep 24, 2015 at 11:28:05AM +0300, Andrey V. Elsukov wrote:
> 
> > On 23.09.2015 19:57, Andriy Gapon wrote:
> > > I do not have a strong opinion.  Either option, rc.d/dumpon change or 
> > > geom_dev
> > > change, is fine with me.
> > 
> > I added the ability to set dumpdev via loader. But I wasn't aware that
> > it was used in rc.d script.
> > 
> > If you have set the dumpdev kenv, it will already be enabled by the time
> > rc.d/dumpon runs. So, I think it is useless to try to enable dumpdev
> > again. I prefer to remove this old code from the rc.d script.
> 
> The rc.d script can redirect the dump to a device not available at boot
> time, an iSCSI disk, for example.

No. The dump device is very special. It is used in an environment where the
kernel has already panicked; there are no interrupts, so there is no
networking. Storage controllers have special methods to handle dumping
kernel memory: it doesn't go through GEOM, and it cannot go through GEOM,
as the scheduler doesn't work either.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com




Re: dumpdev in loader.conf vs rc.d/dumpon

2015-09-24 Thread Pawel Jakub Dawidek
On Fri, Sep 25, 2015 at 12:11:51AM +0300, Slawa Olhovchenkov wrote:
> On Thu, Sep 24, 2015 at 10:58:00PM +0200, Pawel Jakub Dawidek wrote:
> 
> > On Thu, Sep 24, 2015 at 02:18:50PM +0300, Slawa Olhovchenkov wrote:
> > > On Thu, Sep 24, 2015 at 11:28:05AM +0300, Andrey V. Elsukov wrote:
> > > 
> > > > On 23.09.2015 19:57, Andriy Gapon wrote:
> > > > > I do not have a strong opinion.  Either option, rc.d/dumpon change or 
> > > > > geom_dev
> > > > > change, is fine with me.
> > > > 
> > > > I added the ability to set dumpdev via loader. But I wasn't aware that
> > > > it was used in rc.d script.
> > > > 
> > > > If you have set the dumpdev kenv, it will already be enabled by the time
> > > > rc.d/dumpon runs. So, I think it is useless to try to enable dumpdev
> > > > again. I prefer to remove this old code from the rc.d script.
> > > 
> > > The rc.d script can redirect the dump to a device not available at boot
> > > time, an iSCSI disk, for example.
> > 
> > No. The dump device is very special. It is used in an environment where the
> > kernel has already panicked; there are no interrupts, so there is no
> > networking. Storage controllers have special methods to handle dumping
> > kernel memory: it doesn't go through GEOM, and it cannot go through GEOM,
> > as the scheduler doesn't work either.
> 
> Can a ZFS VOL act as a dump device?

I don't think so. IIRC there was a hack in Illumos to allocate
contiguous space for dump in one of the vdevs (then I think it was
extended to multiple vdevs). I don't think any ZFS feature has worked
for such a ZVOL (no checksumming, no compression, etc.).

Others may have more up-to-date info about that.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com




Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics

2023-09-08 Thread Pawel Jakub Dawidek

On 9/8/23 15:09, Alexander Motin wrote:
Thank you, Martin.  I was able to reproduce the issue with your script 
and found the cause.


I first thought the issue was triggered by `cp`, but it appears to be 
triggered by `cat`.  It also got copy_file_range() support, but later 
than `cp`.  That is probably why it slipped through testing.  This patch 
fixes it for me: https://github.com/openzfs/zfs/pull/15251 .


Mark, could you please try the patch?


Thank you Alex for the fix!

--
Pawel Jakub Dawidek




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-17 Thread Pawel Jakub Dawidek

On 4/17/23 18:15, Pawel Jakub Dawidek wrote:

There were three issues that I know of after the recent OpenZFS merge:

1. Data corruption unrelated to block cloning, so it can happen even 
with block cloning disabled or not in use. This was the problematic commit:

 
https://github.com/openzfs/zfs/commit/519851122b1703b8445ec17bc89b347cea965bb9

It was reverted in 63ee747febbf024be0aace61161241b53245449e.

2. Data corruption with embedded blocks when block cloning is enabled. 
It can happen when compression is enabled and the block contains between 
60 and 112 bytes (this might be hard to determine). A fix exists and is 
already merged into OpenZFS, but isn't in FreeBSD yet.

 OpenZFS pull request: https://github.com/openzfs/zfs/pull/14739

3. Panic on VERIFY(zil_replaying(zfsvfs->z_log, tx)). This is triggered 
when block cloning is enabled, the sync property is set to disabled and 
copy_file_range(2) is used. An easy fix exists, but it is not yet merged 
into OpenZFS and not yet in FreeBSD HEAD.

 OpenZFS pull request: https://github.com/openzfs/zfs/pull/14758

Block cloning was disabled in 46ac8f2e7d9601311eb9b3cd2fed138ff4a11a66, 
so 2 and 3 should not occur.


As of 068913e4ba3dd9b3067056e832cefc5ed264b5cc all known issues are 
fixed, as far as I can tell.


Block cloning remains disabled for now just to be on the safe side, but 
can be enabled by setting sysctl vfs.zfs.bclone_enabled to 1.


Don't rely on this sysctl as it will be removed in 2-3 weeks.

--
Pawel Jakub Dawidek




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-17 Thread Pawel Jakub Dawidek

On 4/17/23 21:28, José Pérez wrote:

Hi Pawel,
thank you for your reply and for the fixes.

I think there is a 4th issue that needs to be addressed: how do we 
recover from the worst case scenario which is a machine with a kernel > 
2a58b312b62f and ZFS root upgraded with block cloning enabled.


In particular, is it safe to turn such a machine on in the first place, 
and what are the risks involved in doing so? Any potential data loss?


Would such a machine be able to fix itself by compiling a kernel, or 
would compilation fail and might data be corrupted in the process?


I have two poudriere builders powered off (I am not alone in this 
situation) and I need to recover them, ideally minimizing data loss. The 
builders are also hosting current and used to build kernels and worlds 
for 13 and current: as of now all my production machines are stuck on 
the 13 they run, I cannot update binaries nor packages and I would like 
to be back online.


José,

I can only speak of block cloning in detail, but I'll try to address 
everything.


The easiest way to avoid block_cloning-related corruption on a kernel 
built after the last OpenZFS merge, but before e0bb199925, is to set the 
compress property to 'off' and the sync property to something other than 
'disabled'. This will avoid the block_cloning-related corruption and the 
zil_replaying() panic.
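
The two property changes above can be sketched as follows. This is a dry
run by default and only prints the commands; tank/poudriere is a
hypothetical dataset name, mitigate_dataset is a hypothetical helper, and
sync=standard is just one value other than 'disabled':

```sh
#!/bin/sh
# Print the zfs(8) commands for one dataset, or run them when APPLY=1.
# The dataset name passed in is an assumption for illustration only.
mitigate_dataset() {
    ds=$1
    for prop in compression=off sync=standard; do
        if [ "${APPLY:-0}" = 1 ]; then
            zfs set "$prop" "$ds"
        else
            echo "zfs set $prop $ds"
        fi
    done
}

mitigate_dataset tank/poudriere
```

Note that changing the compression property only affects blocks written
afterwards; data already on disk stays as it was written.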


As for the other corruption, unfortunately I don't know the details, but 
my understanding is that it is happening under higher load. Not sure I'd 
trust a kernel built on a machine with this bug present. What I would do 
is to compile the kernel as of 068913e4ba somewhere else, boot the 
problematic machine in single-user mode and install the newly built kernel.


As far as I can tell, contrary to some initial reports, none of the 
problems introduced by the recent OpenZFS merge corrupt the pool 
metadata, only file data. You can locate the files modified with the 
bogus kernel using find(1) with a proper modification time, but you have 
to decide what to do with them (either throw them away, restore them 
from backup or inspect them).
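
A minimal sketch of the find(1) approach, assuming you know roughly when
the affected kernel was booted and when it was replaced. The timestamps
below are placeholders, not the actual dates of the affected commits, and
list_modified_between is a hypothetical helper:

```sh
#!/bin/sh
# Print regular files under DIR modified after START and no later than END.
# START and END use the touch(1) -t format: CCYYMMDDhhmm.
list_modified_between() {
    dir=$1 start=$2 end=$3
    ref=$(mktemp -d) || return 1
    # Reference files carry the two boundary timestamps.
    touch -t "$start" "$ref/start"
    touch -t "$end" "$ref/end"
    find "$dir" -type f -newer "$ref/start" ! -newer "$ref/end" -print
    rm -rf "$ref"
}

# Demo on throwaway files with fixed modification times.
demo=$(mktemp -d)
touch -t 202304010000 "$demo/before"
touch -t 202304100000 "$demo/during"
touch -t 202304200000 "$demo/after"
list_modified_between "$demo" 202304030000 202304170000
rm -rf "$demo"
```

Only the file stamped inside the window is listed. On a real system the
first argument would be the mountpoint of the affected dataset.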


--
Pawel Jakub Dawidek




Re: another crash and going forward with zfs

2023-04-17 Thread Pawel Jakub Dawidek

On 4/18/23 03:51, Mateusz Guzik wrote:

After bugfixes got committed I decided to zpool upgrade and sysctl
vfs.zfs.bclone_enabled=1 vs poudriere for testing purposes. I very
quickly got a new crash:

panic: VERIFY(arc_released(db->db_buf)) failed

cpuid = 9
time = 1681755046
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0a90b8e5f0
vpanic() at vpanic+0x152/frame 0xfe0a90b8e640
spl_panic() at spl_panic+0x3a/frame 0xfe0a90b8e6a0
dbuf_redirty() at dbuf_redirty+0xbd/frame 0xfe0a90b8e6c0
dmu_buf_will_dirty_impl() at dmu_buf_will_dirty_impl+0xa2/frame
0xfe0a90b8e700
dmu_write_uio_dnode() at dmu_write_uio_dnode+0xe9/frame 0xfe0a90b8e780
dmu_write_uio_dbuf() at dmu_write_uio_dbuf+0x42/frame 0xfe0a90b8e7b0
zfs_write() at zfs_write+0x672/frame 0xfe0a90b8e960
zfs_freebsd_write() at zfs_freebsd_write+0x39/frame 0xfe0a90b8e980
VOP_WRITE_APV() at VOP_WRITE_APV+0xdb/frame 0xfe0a90b8ea90
vn_write() at vn_write+0x325/frame 0xfe0a90b8eb20
vn_io_fault_doio() at vn_io_fault_doio+0x43/frame 0xfe0a90b8eb80
vn_io_fault1() at vn_io_fault1+0x161/frame 0xfe0a90b8ecc0
vn_io_fault() at vn_io_fault+0x1b5/frame 0xfe0a90b8ed40
dofilewrite() at dofilewrite+0x81/frame 0xfe0a90b8ed90
sys_write() at sys_write+0xc0/frame 0xfe0a90b8ee00
amd64_syscall() at amd64_syscall+0x157/frame 0xfe0a90b8ef30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe0a90b8ef30
--- syscall (4, FreeBSD ELF64, write), rip = 0x103cddf7949a, rsp =
0x103cdc85dd48, rbp = 0x103cdc85dd80 ---
KDB: enter: panic
[ thread pid 95000 tid 135035 ]
Stopped at  kdb_enter+0x32: movq$0,0x9e4153(%rip)

The posted 14.0 schedule plans to branch stable/14 on May 12, and
one cannot bet on the feature getting beaten into production shape
by that time. Given whatever non-block_cloning and not even ZFS bugs
are likely to come out, I think this makes the feature a
non-starter for said release.

I note:
1. the current problems did not make it into stable branches.
2. there was block_cloning-related data corruption (fixed) and there may be more
3. there was unrelated data corruption (see
https://github.com/openzfs/zfs/issues/14753), sorted out by reverting
the problematic commit in FreeBSD, not yet sorted out upstream

As such people's data may be partially hosed as is.

Consequently the proposed plan is as follows:
1. whack the block cloning feature for the time being, but make sure
pools which upgraded to it can be mounted read-only
2. run ztest and whatever other stress testing on FreeBSD, along with
restoring openzfs CI -- I can do the first part, I'm sure pho will not
mind to run some tests of his own
3. recommend people create new pools and restore data from backup. if
restoring from backup is not an option, tar or cp (not zfs send) from
the read-only mount

block cloning beaten into shape would use block_cloning_v2 or whatever
else, key point that the current feature name would be considered
bogus (not blocking RO import though) to prevent RW usage of the
current pools with it enabled.

Comments?


Correct me if I'm wrong, but from my understanding there were zero 
problems with block cloning when it wasn't in use or, as now, disabled.


The reason I introduced the vfs.zfs.bclone_enabled sysctl was exactly to 
avoid a mess like this and give us more time to sort all the problems out 
while making it easy for people to try it.


If there is no plan to revert the whole import, I don't see what value 
removing just block cloning will bring if it is now disabled by default 
and didn't cause any problems when disabled.


--
Pawel Jakub Dawidek




Re: another crash and going forward with zfs

2023-04-17 Thread Pawel Jakub Dawidek

On 4/18/23 05:14, Mateusz Guzik wrote:

On 4/17/23, Pawel Jakub Dawidek  wrote:

Correct me if I'm wrong, but from my understanding there were zero
problems with block cloning when it wasn't in use or, as now, disabled.

The reason I introduced the vfs.zfs.bclone_enabled sysctl was exactly to
avoid a mess like this and give us more time to sort all the problems out
while making it easy for people to try it.

If there is no plan to revert the whole import, I don't see what value
removing just block cloning will bring if it is now disabled by default
and didn't cause any problems when disabled.



The feature definitely was not properly stress tested and whatnot, and
trying to do it keeps running into panics. Given the complexity of the
feature I would expect there are many bugs lurking, some of which are
possibly related to the on-disk format. Not having to deal with any of
this can be arranged as described above and is imo the most
sensible route given the timeline for 14.0


Block cloning doesn't create, remove or modify any on-disk data until it 
is in use.


Again, if we are not going to revert the whole merge, I see no point in 
reverting block cloning, as its code is not executed until it is 
enabled. This allows people who upgraded their pools to do nothing 
special, and it will allow people to test it easily.


--
Pawel Jakub Dawidek




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-16 Thread Pawel Jakub Dawidek

On 4/16/23 01:07, Florian Smeets wrote:
On the pool that has block_cloning enabled I see the above insta panic 
when poudriere starts building. I found a workaround though:


--- /usr/local/share/poudriere/include/fs.sh.orig    2023-04-15 18:03:50.090823000 +0200
+++ /usr/local/share/poudriere/include/fs.sh    2023-04-15 18:04:04.144736000 +0200
@@ -295,7 +295,6 @@
  fi

  zfs clone -o mountpoint=${mnt} \
-    -o sync=disabled \
  -o atime=off \
  -o compression=off \
  ${fs}@${snap} \

With this workaround I was able to build thousands of packages without 
panics or failures due to data corruption.


Thank you, Florian, that was very helpful!

This should fix the problem:

https://github.com/openzfs/zfs/pull/14758

--
Pawel Jakub Dawidek




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Pawel Jakub Dawidek

On 4/13/23 23:05, Shawn Webb wrote:

I've learned over the years downstream that it's not really my place
to tell upstream what to do or how to do it. However, I think given
the seriousness of this, upstream might do well to revert the commit
until a solid fix is in place. Upstream might want to consider the
impacts this is having not just with downstream projects, but also
regular users.

Really bad timing to have a lot of new tax documentation that I really
don't want to lose. I'd really like to have an up-to-date, security
patched OS, but I guess I'll stay behind so that I don't risk losing
critical financial documentation.


Shawn,

I'm working on a patch to safely revert this that would also work for 
people who already upgraded their pools.


I'm sorry for this mess.

--
Pawel Jakub Dawidek




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Pawel Jakub Dawidek

On 4/13/23 22:56, Cy Schubert wrote:

I'm in the process of building a branch reverting the merge altogether and
will test it on my sandbox machine later today.


Cy,

thank you for your testing and patience so far. I'm working on a patch 
to revert block cloning without affecting people who already upgraded 
their pools.


I'd also greatly appreciate if you could provide a procedure for me to 
reproduce the corruption, ideally without the internet access, as I'll 
be on the plane(s) for the next ~24h.


--
Pawel Jakub Dawidek




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Pawel Jakub Dawidek

On 4/14/23 07:52, Charlie Li wrote:

Pawel Jakub Dawidek wrote:
thank you for your testing and patience so far. I'm working on a patch 
to revert block cloning without affecting people who already upgraded 
their pools.


Testing with mjg@ earlier today revealed that block_cloning was not the 
cause of poudriere bulk build (and similar cp(1)/install(1)-based) 
corruption, although may have exacerbated it.


Can you please elaborate on how you were testing and what exactly you 
excluded?


Thanks.

--
Pawel Jakub Dawidek




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Pawel Jakub Dawidek

On 4/14/23 09:23, Charlie Li wrote:

Pawel Jakub Dawidek wrote:
Here is the change that reverts most of the modifications and disables 
cloning new blocks. It does retain ability to free existing cloned 
blocks and keeps block_cloning feature around, so upgraded pools can 
be imported and existing cloned blocks freed.


It does not handle replaying ZIL with block-cloning logs, so make sure 
you import pools that were cleanly exported.


I'd appreciate if someone who can reproduce those corruptions could 
try it.


https://github.com/pjd/openzfs/commit/f2cfbcf76a733c44e25cba8c649162ef68047103

Does not apply to sys/contrib/openzfs tip, conflicts in 
module/os/freebsd/zfs/zfs_vnops_os.c and module/zfs/dmu.c.


This should work:

https://people.freebsd.org/~pjd/patches/brt_revert.patch

--
Pawel Jakub Dawidek




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Pawel Jakub Dawidek

On 4/14/23 07:40, Pawel Jakub Dawidek wrote:

On 4/13/23 22:56, Cy Schubert wrote:
I'm in the process of building a branch reverting the merge altogether and
will test it on my sandbox machine later today.


Cy,

thank you for your testing and patience so far. I'm working on a patch 
to revert block cloning without affecting people who already upgraded 
their pools.


I'd also greatly appreciate if you could provide a procedure for me to 
reproduce the corruption, ideally without the internet access, as I'll 
be on the plane(s) for the next ~24h.


Here is the change that reverts most of the modifications and disables 
cloning new blocks. It does retain ability to free existing cloned 
blocks and keeps block_cloning feature around, so upgraded pools can be 
imported and existing cloned blocks freed.


It does not handle replaying ZIL with block-cloning logs, so make sure 
you import pools that were cleanly exported.


I'd appreciate if someone who can reproduce those corruptions could try it.

https://github.com/pjd/openzfs/commit/f2cfbcf76a733c44e25cba8c649162ef68047103

Thank you guys for your help!

--
Pawel Jakub Dawidek



