Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
On 15.04.23 17:51, FreeBSD User wrote:
On Sat, 15 Apr 2023 07:36:25 -0700, Cy Schubert wrote:
With an up-to-date tree + pjd@'s "Fix data corruption when cloning embedded blocks. #14739" patch I didn't have any issues, except for email messages with corruption in my sent directory, nowhere else. I'm still investigating the email messages issue. IMO one is generally safe to run poudriere on the latest ZFS with the additional patch.

This is also my current observation. I have 2 hosts where I was unfortunate enough to update at the wrong time. I currently *think* that I'm *not* seeing data corruption with head from April 12th and this patch https://github.com/openzfs/zfs/commit/d3a6e5ca3b2f684132238ca968bf0b96f17ec7e1.diff applied.

One pool has been upgraded with feature@block_cloning and the other hasn't.

FreeBSD 14.0-CURRENT #8 main-n262175-5ee1c90e50ce: Sat Apr 15 07:57:16 CEST 2023 amd64

The box is crashing while trying to update ports with the well known issue:

Panic String: VERIFY(!zil_replaying(zilog, tx)) failed

On the pool that has block_cloning enabled I see the above insta panic when poudriere starts building. I found a workaround though:

--- /usr/local/share/poudriere/include/fs.sh.orig 2023-04-15 18:03:50.090823000 +0200
+++ /usr/local/share/poudriere/include/fs.sh 2023-04-15 18:04:04.144736000 +0200
@@ -295,7 +295,6 @@
 fi
 zfs clone -o mountpoint=${mnt} \
- -o sync=disabled \
 -o atime=off \
 -o compression=off \
 ${fs}@${snap} \

With this workaround I was able to build thousands of packages without panics or failures due to data corruption.

Florian
Re: IPv6 TCP: first two SYN packets to local v6 unicast addresses ignored
On 03.05.22 19:08, Gleb Smirnoff wrote:
On Sun, Apr 24, 2022 at 09:49:48AM +0200, Florian Smeets wrote:
F> On 23.04.22 01:38, Gleb Smirnoff wrote:
F> >Hi Florian,
F> >
F> > here is a patch that should help with the IPv6 problem. I'm not
F> > yet committing it, it might be not final.
F>
F> yes, the patch resolves the issue. There is just one SYN packet, and it
F> gets a reply immediately.

Alexander provided a patch against the ip6_output inconsistency:

https://reviews.freebsd.org/D35117

You might be interested in testing it together with my patch. I will commit mine only after Alexander commits his.

With both patches applied it's working fine and I cannot reproduce the initial issue.

Thanks
Florian
Re: IPv6 TCP: first two SYN packets to local v6 unicast addresses ignored
On 23.04.22 01:38, Gleb Smirnoff wrote:
Hi Florian,

here is a patch that should help with the IPv6 problem. I'm not yet committing it, it might be not final.

Hi Gleb,

yes, the patch resolves the issue. There is just one SYN packet, and it gets a reply immediately.

Thanks,
Florian
Re: IPv6 TCP: first two SYN packets to local v6 unicast addresses ignored
On 16.04.22 07:22, Gleb Smirnoff wrote:
Hi Florian, Hi Michael,

Hi Gleb,

thanks for looking into it.

On Fri, Apr 15, 2022 at 06:11:13PM -0400, Michael Butler wrote:
M> >
M> > Found the culprit 1817be481b8703ae86730b151a6f49cc3022930f. And indeed
M> > toggling net.inet6.ip6.source_address_validation makes the issue go away
M> > on latest main.
M>
M> I found this commit and the ipv4 analog also cause packets between
M> non-VNET jails on the same host and to the host itself to be dropped :-(

I see your mails and will look into the problem ASAP. Meanwhile...

Florian, can you please confirm you are using jails too?

No, two of the 3 hosts I tested on do not use jails.

Florian
Re: IPv6 TCP: first two SYN packets to local v6 unicast addresses ignored
On 15.04.22 21:24, tue...@freebsd.org wrote:
On 15. Apr 2022, at 20:20, Florian Smeets wrote:
Hi,

there seems to be an issue with local IPv6 TCP connections on main. I have been seeing this for a couple of months at least. pkg upgr on my webserver hosting the pkg repo is very slow, all other hosts can connect to the pkg repo just fine. So IPv6 connections from external hosts are not affected. I thought I must have misconfigured something, as my setup is a bit weird. Yesterday I noticed the same issue on a different host, turns out all my 14.0 hosts seem to be affected, cognet@ could also reproduce it on one of his systems.

The service/software used does not seem to matter, I tried with port 22, 25, 80 and 443. ICMP and UDP don't seem to be affected. ping6 gets replies immediately. And UDP connections with nc -l -u / nc -u don't have any delay, sent data is received immediately.

Testing local TCP connections shows this:

flo@rp64:~ $ ifconfig dwc0|grep 2003
	inet6 2003:cf:df49:c97:4c59:ebff:fec1:463d prefixlen 64 autoconf
flo@rp64:~ $ nc -v 2003:cf:df49:c97:4c59:ebff:fec1:463d 22
[3 second delay here]
Connection to 2003:cf:df49:c97:4c59:ebff:fec1:463d 22 port [tcp/ssh] succeeded!
SSH-2.0-OpenSSH_8.9 FreeBSD-20220413

I need help debugging this, I don't know how to analyze this further. I will start bisecting this, but I thought maybe someone has an idea.

Hi Florian,

I can reproduce this locally, will try to figure out what is going on. If you can bisect it, it would be great.

Found the culprit 1817be481b8703ae86730b151a6f49cc3022930f. And indeed toggling net.inet6.ip6.source_address_validation makes the issue go away on latest main.

Florian
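To check the current value of the knob named above from a program rather than with sysctl(8), a minimal C sketch using sysctlbyname(3) could look like this. It assumes the OID is a plain integer; the helper name read_int_sysctl is hypothetical, and the non-FreeBSD branch exists only so the file compiles on systems without sysctlbyname(3).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#if defined(__FreeBSD__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/*
 * Hypothetical helper: read an integer sysctl by name.
 * Returns 0 and stores the value in *out on success; returns -1 with errno
 * set otherwise (for example ENOENT for an unknown OID).
 */
int read_int_sysctl(const char *name, int *out)
{
#if defined(__FreeBSD__)
    int value = 0;
    size_t len = sizeof(value);

    if (sysctlbyname(name, &value, &len, NULL, 0) == -1)
        return -1;
    if (len != sizeof(value)) {
        errno = EINVAL; /* the OID exists but is not a plain int */
        return -1;
    }
    *out = value;
    return 0;
#else
    (void)name;
    (void)out;
    errno = ENOSYS; /* sysctlbyname(3) is not available on this platform */
    return -1;
#endif
}
```

Calling read_int_sysctl("net.inet6.ip6.source_address_validation", &v) on an affected host reports the current setting; changing it still requires root, e.g. `sysctl net.inet6.ip6.source_address_validation=0`, which is presumably the "toggle" meant above.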
IPv6 TCP: first two SYN packets to local v6 unicast addresses ignored
[bcc to net@ for wider exposure]

Hi,

there seems to be an issue with local IPv6 TCP connections on main. I have been seeing this for a couple of months at least. pkg upgr on my webserver hosting the pkg repo is very slow, all other hosts can connect to the pkg repo just fine. So IPv6 connections from external hosts are not affected. I thought I must have misconfigured something, as my setup is a bit weird. Yesterday I noticed the same issue on a different host, turns out all my 14.0 hosts seem to be affected, cognet@ could also reproduce it on one of his systems.

The service/software used does not seem to matter, I tried with port 22, 25, 80 and 443. ICMP and UDP don't seem to be affected. ping6 gets replies immediately. And UDP connections with nc -l -u / nc -u don't have any delay, sent data is received immediately.

Testing local TCP connections shows this:

flo@rp64:~ $ ifconfig dwc0|grep 2003
	inet6 2003:cf:df49:c97:4c59:ebff:fec1:463d prefixlen 64 autoconf
flo@rp64:~ $ nc -v 2003:cf:df49:c97:4c59:ebff:fec1:463d 22
[3 second delay here]
Connection to 2003:cf:df49:c97:4c59:ebff:fec1:463d 22 port [tcp/ssh] succeeded!
SSH-2.0-OpenSSH_8.9 FreeBSD-20220413

tcpdump on lo0 shows that the first two SYN packets are ignored / time out, then the connection is successfully established.

19:28:38.685128 IP6 2003:cf:df49:c97:4c59:ebff:fec1:463d.61294 > 2003:cf:df49:c97:4c59:ebff:fec1:463d.22: Flags [S], seq 2489479594, win 65535, options [mss 16324,nop,wscale 6,sackOK,TS val 3410505643 ecr 0], length 0
19:28:39.696047 IP6 2003:cf:df49:c97:4c59:ebff:fec1:463d.61294 > 2003:cf:df49:c97:4c59:ebff:fec1:463d.22: Flags [S], seq 2489479594, win 65535, options [mss 16324,nop,wscale 6,sackOK,TS val 3410506654 ecr 0], length 0
19:28:41.897836 IP6 2003:cf:df49:c97:4c59:ebff:fec1:463d.61294 > 2003:cf:df49:c97:4c59:ebff:fec1:463d.22: Flags [S], seq 2489479594, win 65535, options [mss 16324,nop,wscale 6,sackOK,TS val 3410508856 ecr 0], length 0
19:28:41.897907 IP6 2003:cf:df49:c97:4c59:ebff:fec1:463d.22 > 2003:cf:df49:c97:4c59:ebff:fec1:463d.61294: Flags [S.], seq 2857552476, ack 2489479595, win 65535, options [mss 16324,nop,wscale 6,sackOK,TS val 1858349482 ecr 3410508856], length 0
19:28:41.897962 IP6 2003:cf:df49:c97:4c59:ebff:fec1:463d.61294 > 2003:cf:df49:c97:4c59:ebff:fec1:463d.22: Flags [.], ack 1, win 1276, options [nop,nop,TS val 3410508856 ecr 1858349482], length 0

I need help debugging this, I don't know how to analyze this further. I will start bisecting this, but I thought maybe someone has an idea.

Thanks,
Florian
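The capture accounts for the delay nc reported. As a small worked check, the C sketch below (helper names made up for illustration) converts the tcpdump HH:MM:SS.ffffff stamps to seconds and computes the gaps: the SYN is retransmitted after about 1.0 s and again after about 2.2 s, and the SYN-ACK answers only the third copy (it echoes that copy's timestamp, ecr 3410508856), roughly 3.2 s after the first SYN.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a tcpdump-style "HH:MM:SS.ffffff" timestamp to seconds since
 * midnight. Returns a negative value if the string does not parse. */
double ts_to_seconds(const char *ts)
{
    int h, m;
    double s;

    if (sscanf(ts, "%d:%d:%lf", &h, &m, &s) != 3)
        return -1.0;
    return h * 3600.0 + m * 60.0 + s;
}

/* Timestamps copied from the capture above. */
static const char *const syn_times[] = {
    "19:28:38.685128", /* first SYN (ignored) */
    "19:28:39.696047", /* first retransmission (ignored) */
    "19:28:41.897836", /* second retransmission (answered) */
};
static const char *const synack_time = "19:28:41.897907";

/* Gap in seconds between SYN number i and SYN number i + 1. */
double syn_gap(int i)
{
    return ts_to_seconds(syn_times[i + 1]) - ts_to_seconds(syn_times[i]);
}

/* Time from the first SYN until the SYN-ACK shows up. */
double handshake_delay(void)
{
    return ts_to_seconds(synack_time) - ts_to_seconds(syn_times[0]);
}
```

syn_gap(0), syn_gap(1) and handshake_delay() come out at about 1.011 s, 2.202 s and 3.213 s, which matches the "[3 second delay here]" seen with nc.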
netstat / sockstat in jail don't list tcp/udp sockets
Hi,

maybe all of this is related to the net.inet.tcp.pcblist error message further down.

What works on 12.1 (host running stable/12, jail running 12.1-RELEASE)

root@db21:~ # sockstat -4 -l -P tcp
USER COMMAND PID FD PROTO LOCAL ADDRESS FOREIGN ADDRESS
root sshd 3907 4 tcp4 *:22 *:*
root perl 3885 6 tcp4 *:4949 *:*
mysql mysqld 3771 11 tcp4 *:4567 *:*
mysql mysqld 3771 38 tcp46 *:3306 *:*
nagios nrpe3 3465 4 tcp4 *:5666 *:*
root master 3459 13 tcp4 *:25 *:*
root@db21:~ # netstat -nl -p tcp
Active Internet connections
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp4 0 0 172.17.2.33.4567 172.17.1.33.0 ESTABLISHED
tcp4 0 0 172.17.2.33.58278 172.17.3.33.4567 ESTABLISHED
root@db21:~ # sysctl net.inet.tcp.pcblist
root@db21:~ # echo $?
0

doesn't work on head (r356268, host world/kernel and jail are in sync)

root@db31:/ # sockstat -4 -l -P tcp
USER COMMAND PID FD PROTO LOCAL ADDRESS FOREIGN ADDRESS
root@db31:/ # netstat -nl -p tcp
netstat: sysctl: net.inet.tcp.pcblist: No such file or directory
root@db31:/ # sysctl net.inet.tcp.pcblist
root@db31:/ # echo $?
0
root@db31:/ # exit
root@host:~ # sockstat -4 -l -P tcp -j db31
USER COMMAND PID FD PROTO LOCAL ADDRESS FOREIGN ADDRESS
root sshd 403 tcp4 *:22 *:*
88 mysqld 23261 11 tcp4 *:4567 *:*
88 mysqld 23261 37 tcp4 *:3306 *:*

As db31 is my only jail running head and I only set this up in December I cannot tell how long this has been broken.

Florian
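sysctl(8) prints nothing for this opaque node and exits 0 in both environments, so it does not reveal the failure. A small C probe can ask only for the size of the data (oldp == NULL), which is a cheap way to see which errno the kernel returns for the OID inside the jail versus on the host. This is a sketch; probe_sysctl_size is a hypothetical name, and the non-FreeBSD branch only keeps the file compilable elsewhere.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#if defined(__FreeBSD__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/*
 * Hypothetical probe: ask the kernel how many bytes an OID would return
 * without copying any data (oldp == NULL). Returns 0 and stores the size
 * in *len on success, otherwise the errno value the call failed with.
 */
int probe_sysctl_size(const char *name, size_t *len)
{
    *len = 0;
#if defined(__FreeBSD__)
    if (sysctlbyname(name, NULL, len, NULL, 0) == -1)
        return errno;
    return 0;
#else
    (void)name;
    return ENOSYS; /* sysctlbyname(3) is not available on this platform */
#endif
}
```

Running probe_sysctl_size("net.inet.tcp.pcblist", &n) inside db31 and on the host would show whether the jail gets ENOENT, matching the "No such file or directory" from netstat above.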
Re: Time to be real
Hi,

Joe has indicated in the past that spam was sent from his account:

https://lists.freebsd.org/pipermail/freebsd-ports/2014-September/095407.html

We (postmaster@) contacted Joe and are looking into the issue. Please do not reply to the thread anymore.

Florian
Re: External toolchain support
On 29/11/14 16:04, Baptiste Daroussin wrote:
Hi all,

It is now possible to use an external toolchain to build the kernel and base (tested with gcc 4.9.1 and latest binutils)

make CROSS_TOOLCHAIN=sparc64-gcc -j8 buildkernel

I built a sparc64 kernel on amd64 using sparc64-xtoolchain-gcc and was able to boot it successfully.

$ uname -a
FreeBSD v240 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r275267M: Sat Nov 29 22:23:38 CET root@storage:/usr/obj/sparc64.sparc64/usr/src/sys/GENERIC sparc64
$ sysctl kern.ostype kern.osrelease kern.osrevision kern.compiler_version
kern.ostype: FreeBSD
kern.osrelease: 11.0-CURRENT
kern.osrevision: 199506
kern.compiler_version: gcc version 4.9.1 (FreeBSD Ports Collection for sparc64)

Florian
Re: [ZFS][PANIC] Solaris Assert/zio.c:2548
On 20/07/14 16:03, Larry Rosenman wrote:
panic: solaris assert: !(zio->io_flags & ZIO_FLAG_DELEGATED), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line: 2874

This was fixed by r268980.

Florian
Re: [ZFS][PANIC] Solaris Assert/zio.c:2548
On 20/07/14 16:03, Larry Rosenman wrote:
Panic String: solaris assert: !(zio->io_flags & ZIO_FLAG_DELEGATED), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line: 2874

Unread portion of the kernel message buffer:
panic: solaris assert: !(zio->io_flags & ZIO_FLAG_DELEGATED), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line: 2874
cpuid = 7
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe100c49f930
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe100c49f9e0
vpanic() at vpanic+0x126/frame 0xfe100c49fa20
panic() at panic+0x43/frame 0xfe100c49fa80
assfail() at assfail+0x1d/frame 0xfe100c49fa90
zio_vdev_io_assess() at zio_vdev_io_assess+0x2ed/frame 0xfe100c49fac0
zio_execute() at zio_execute+0x1e9/frame 0xfe100c49fb20
taskqueue_run_locked() at taskqueue_run_locked+0xf0/frame 0xfe100c49fb80
taskqueue_thread_loop() at taskqueue_thread_loop+0x9b/frame 0xfe100c49fbb0
fork_exit() at fork_exit+0x84/frame 0xfe100c49fbf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe100c49fbf0
--- trap 0, rip = 0, rsp = 0xfe100c49fcb0, rbp = 0 ---
Uptime: 8h57m17s
(ada2:ahcich2:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
(ada2:ahcich2:0:0:0): CAM status:

Same here, running poudriere the box panics reproducibly within 2-5 seconds.

panic: solaris assert: !(zio->io_flags & ZIO_FLAG_DELEGATED), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line: 2874
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe2e97f0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe2e98a0
vpanic() at vpanic+0x126/frame 0xfe2e98e0
panic() at panic+0x43/frame 0xfe2e9940
assfail() at assfail+0x1d/frame 0xfe2e9950
zio_vdev_io_assess() at zio_vdev_io_assess+0x2e8/frame 0xfe2e9980
zio_execute() at zio_execute+0x1e9/frame 0xfe2e99e0
taskqueue_run_locked() at taskqueue_run_locked+0xf0/frame 0xfe2e9a40
taskqueue_thread_loop() at taskqueue_thread_loop+0x9b/frame 0xfe2e9a70
fork_exit() at fork_exit+0x84/frame 0xfe2e9ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe2e9ab0
--- trap 0, rip = 0, rsp = 0xfe2e9b70, rbp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100422 ]
Stopped at kdb_enter+0x3e: movq $0,kdb_why

Florian
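The assertion that fires at zio.c:2874 is a plain bit test, `!(zio->io_flags & ZIO_FLAG_DELEGATED)`: the zio being assessed must not have the DELEGATED flag set. The C sketch below models only that check. The struct and the flag's bit value are made up for illustration and do not match the real zio_t or the real ZIO_FLAG_DELEGATED value.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit value only; the real ZIO_FLAG_DELEGATED differs. */
enum { MOCK_ZIO_FLAG_DELEGATED = 1 << 4 };

/* Minimal stand-in for zio_t holding just the field the assertion reads. */
struct mock_zio {
    uint32_t io_flags;
};

/* Nonzero when the asserted condition holds, i.e. the flag is clear. */
int vdev_assess_precondition_ok(const struct mock_zio *zio)
{
    return !(zio->io_flags & MOCK_ZIO_FLAG_DELEGATED);
}
```

Any other flag bits are ignored by the test; only the DELEGATED bit decides whether the ASSERT passes.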
Re: [ZFS][PANIC] Solaris Assert/zio.c:2548
On 21/07/14 01:46, Steven Hartland wrote:
- Original Message -
From: Larry Rosenman l...@lerctr.org
To: Steven Hartland kill...@multiplay.co.uk
Cc: freebsd...@freebsd.org; freebsd-current@freebsd.org
Sent: Monday, July 21, 2014 12:22 AM
Subject: Re: [ZFS][PANIC] Solaris Assert/zio.c:2548

On 2014-07-20 18:21, Steven Hartland wrote:
Can you try reverting r265321 and see if you still see the same crash?

Regards
Steve

I'll do the revert, but it's been a ONE TIME hit.

There was a followup to mine with a reproducible poudriere crash like mine.

If you don't have a reproducable senario I'd hold off.

Florian, is yours reproducable and can you send me a pretty print of the crashing zio?

My backtrace looks a little different.

panic: solaris assert: !(zio->io_flags & ZIO_FLAG_DELEGATED), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line: 2874
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe2e97f0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe2e98a0
vpanic() at vpanic+0x126/frame 0xfe2e98e0
panic() at panic+0x43/frame 0xfe2e9940
assfail() at assfail+0x1d/frame 0xfe2e9950
zio_vdev_io_assess() at zio_vdev_io_assess+0x2e8/frame 0xfe2e9980
zio_execute() at zio_execute+0x1e9/frame 0xfe2e99e0
taskqueue_run_locked() at taskqueue_run_locked+0xf0/frame 0xfe2e9a40
taskqueue_thread_loop() at taskqueue_thread_loop+0x9b/frame 0xfe2e9a70
fork_exit() at fork_exit+0x84/frame 0xfe2e9ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe2e9ab0
--- trap 0, rip = 0, rsp = 0xfe2e9b70, rbp = 0 ---
KDB: enter: panic

(kgdb) where
#0  doadump (textdump=-2125462752) at pcpu.h:219
#1  0x80347655 in db_fncall (dummy1=<value optimized out>, dummy2=<value optimized out>, dummy3=<value optimized out>, dummy4=<value optimized out>) at /usr/src/sys/ddb/db_command.c:578
#2  0x8034733d in db_command (cmd_table=0x0) at /usr/src/sys/ddb/db_command.c:449
#3  0x803470b4 in db_command_loop () at /usr/src/sys/ddb/db_command.c:502
#4  0x80349a90 in db_trap (type=<value optimized out>, code=0) at /usr/src/sys/ddb/db_main.c:231
#5  0x80944159 in kdb_trap (type=3, code=0, tf=<value optimized out>) at /usr/src/sys/kern/subr_kdb.c:654
#6  0x80d1e532 in trap (frame=0xfe2e97d0) at /usr/src/sys/amd64/amd64/trap.c:542
#7  0x80d01202 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:231
#8  0x809438be in kdb_enter (why=0x80f9ce38 "panic", msg=<value optimized out>) at cpufunc.h:63
#9  0x8090bb66 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:737
#10 0x8090bbd3 in panic (fmt=0x815a59a0 "\004") at /usr/src/sys/kern/kern_shutdown.c:673
#11 0x81fb821d in assfail (a=<value optimized out>, f=<value optimized out>, l=<value optimized out>) at /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:81
#12 0x81eca848 in zio_vdev_io_assess (ziop=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:2874
#13 0x81ec58b9 in zio_execute (zio=0xf801a8abc398) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1416
#14 0x80954150 in taskqueue_run_locked (queue=0xf80009249b00) at /usr/src/sys/kern/subr_taskqueue.c:356
#15 0x80954c1b in taskqueue_thread_loop (arg=<value optimized out>) at /usr/src/sys/kern/subr_taskqueue.c:623
#16 0x808d9834 in fork_exit (callout=0x80954b80 <taskqueue_thread_loop>, arg=0xf80003dfeed0, frame=0xfe2e9ac0) at /usr/src/sys/kern/kern_fork.c:977
#17 0x80d0173e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:605
#18 0x in ?? ()
(kgdb) frame 12
#12 0x81eca848 in zio_vdev_io_assess (ziop=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:2874
2874		ASSERT(!(zio->io_flags & ZIO_FLAG_DELEGATED));
(kgdb) print zio
$3 = (zio_t *) 0xf801a8abc398
(kgdb) print *zio
$4 = {io_bookmark = {zb_objset = 4339, zb_object = 327827, zb_level = 0, zb_blkid = 0}, io_prop = {zp_checksum = ZIO_CHECKSUM_INHERIT, zp_compress = ZIO_COMPRESS_INHERIT, zp_type = DMU_OT_NONE, zp_level = 0 '\0', zp_copies = 0 '\0', zp_dedup = 0, zp_dedup_verify = 0, zp_nopwrite = 0}, io_type = ZIO_TYPE_WRITE, io_child_type = ZIO_CHILD_VDEV, io_cmd = 0, io_priority = ZIO_PRIORITY_ASYNC_WRITE, io_reexecute = 0 '\0', io_state = "\001", io_txg = 1312558, io_spa = 0xfe00022e6000, io_bp = 0xfe000a94a640, io_bp_override = 0x0, io_bp_copy = {blk_dva = {{dva_word = {1, 58754170}}, {dva_word = {1, 69614673}},
Re: HEADS UP: sparc64 backend for llvm/clang imported
On 01/03/14 20:51, John-Mark Gurney wrote:
Florian Smeets wrote this message on Sat, Mar 01, 2014 at 16:28 +0100:
On 01/03/14 02:16, John-Mark Gurney wrote:
Ok, I have a new pcpu patch to try. I have only compile tested it. It is available here: https://www.funkthat.com/~jmg/sparc64.pcpu.patch I've also attached it.

Craig, do you mind testing it?

My machine doesn't boot with this patch.

OK boot -v
Booting...
jumping to kernel entry at 0xc0088000.
OF_panic: sparc64_init: cannot find boot CPU node
Program terminated
{1} ok

I'm now going to try the version that dim sent.

Does it boot w/o the patch? Is this a clang built loader/kernel or a gcc built loader/kernel that you tried the patch on? From a quick look at the code, it doesn't look like my patch would have effected this part of the kernel...

Ok, all of the following was with dim's version of the patch. I can retry with your version too, but I don't think it will make a difference.

The kernel works fine with gcc, but doesn't work compiled with clang.

Booting [/boot/kernel/kernel]...
jumping to kernel entry at 0xc0088000.
OF_panic: sparc64_init: cannot find boot CPU node
Program terminated
{1} ok

So it's the same panic with both your patch and dim's, compiled with clang. Userland was compiled with gcc, cc is gcc, and I used CC=clang make kernel to build the kernel with clang.

Florian
Re: HEADS UP: sparc64 backend for llvm/clang imported
On 01/03/14 02:16, John-Mark Gurney wrote:
Dimitry Andric wrote this message on Fri, Feb 28, 2014 at 20:22 +0100:
For building the sparc64 kernel, there is one open issue left, which is that sys/sparc64/include/pcpu.h uses global register variables, and this is not supported by clang. A preliminary patch for this is attached, but it may or may not blow up your system, please beware!

The patch changes the pcpu and curpcb global register variables into inline functions, similar to what is done on other architectures. However, the current approach is not optimal, and the emitted code is slightly different from what gcc outputs. Any improvements to this patch are greatly appreciated!

Last but not least, thanks go out to Roman Divacky for his work with llvm/clang upstream in getting the sparc64 backend into shape.

Ok, I have a new pcpu patch to try. I have only compile tested it. It is available here: https://www.funkthat.com/~jmg/sparc64.pcpu.patch I've also attached it.

Craig, do you mind testing it?

My machine doesn't boot with this patch.

OK boot -v
Booting...
jumping to kernel entry at 0xc0088000.
OF_panic: sparc64_init: cannot find boot CPU node
Program terminated
{1} ok

I'm now going to try the version that dim sent.

Florian
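The change Dimitry describes (global register variables replaced by inline functions) can be illustrated with a sketch. This is not the patch itself: the accessor name get_pcpu() and the use of %g7 as the per-CPU register are assumptions based on a reading of the sparc64 headers and should be checked against sys/sparc64/include/pcpu.h. On other architectures the sketch falls back to an ordinary static variable so it compiles anywhere.

```c
#include <assert.h>

struct pcpu {
    int pc_cpuid;
};

#if defined(__sparc64__)
/*
 * Old style (GCC only, not supported by clang at the time):
 *     register struct pcpu *pcpup __asm__("%g7");
 * New style: an inline function that copies the register via inline asm.
 * Assumption: %g7 holds the per-CPU pointer.
 */
static __inline struct pcpu *get_pcpu(void)
{
    struct pcpu *p;

    __asm__ __volatile__("mov %%g7, %0" : "=r" (p));
    return (p);
}
#else
/* Portable stand-in so the sketch builds on other architectures. */
static struct pcpu fake_pcpu = { 0 };

static __inline struct pcpu *get_pcpu(void)
{
    return (&fake_pcpu);
}
#endif

/* Callers use the accessor instead of naming the register directly. */
int current_cpuid(void)
{
    return get_pcpu()->pc_cpuid;
}
```

The trade-off mentioned in the thread shows up here: every access becomes an explicit register copy, so the generated code can differ slightly from what gcc emits for a global register variable.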
Re: PACKAGESITE spam
On 22/12/13 00:04, Steve Kargl wrote:
On Sat, Dec 21, 2013 at 11:14:39PM +0100, Baptiste Daroussin wrote:
Other than the noise in /var/log/message, what does this provide that 'pkg info' doesn't! Please turn of this feature by default.

No, please don't. It's a very useful feature: you can see when packages were installed or upgraded, which can be very useful in certain environments.

Additionally, looking at other systems (SLES, CentOS, Debian and Ubuntu off the top of my head), they all have a log that keeps track of packages. The idea cannot be *that* useless.

Florian
Re: Kernel hangs on reboot on system with 05/2013~06/2013 CURRENT sources
On 26.06.13 03:19, Attilio Rao wrote:
On Tue, Jun 25, 2013 at 11:27 PM, Florian Smeets f...@smeets.im wrote:
On 06/25/2013 22:45, Garrett Cooper wrote:
Long story short is that I've run into an issue on several VM images and real machines where UFS on mpt fails to reboot because it hangs in the kernel. I don't have any specific details, other than it occurs regularly with cam/mpt on VMware boxes running builds; however I've also seen this occur with a Dell box that has an mpt SAS controller with 2 zpools and gobs of RAM. Does anyone know of any issues in this area [recently]? This set of issues appears to have started cropping up after 03/2013, because I was running reliable builds off those sources.

Thanks!
-Garrett

Yes, I saw the same thing today when rebooting a box running r251905: Tue Jun 18 10:12:42 CEST 2013 with ahci on a ZFS-only system. I update this box about once a week; the previous kernel was from Jun 11, and that one still rebooted successfully. As the kernel from June 18 is now kernel.old, I don't know the SVN rev for the June 11 kernel, but it looks like it was broken between June 11 and 18.

Can you break into KDB once the loop happens?

I cannot reproduce on kernels from yesterday/today. So for me the case is closed.

Florian
Re: Kernel hangs on reboot on system with 05/2013~06/2013 CURRENT sources
On 06/25/2013 22:45, Garrett Cooper wrote:
Long story short is that I've run into an issue on several VM images and real machines where UFS on mpt fails to reboot because it hangs in the kernel. I don't have any specific details, other than it occurs regularly with cam/mpt on VMware boxes running builds; however I've also seen this occur with a Dell box that has an mpt SAS controller with 2 zpools and gobs of RAM. Does anyone know of any issues in this area [recently]? This set of issues appears to have started cropping up after 03/2013, because I was running reliable builds off those sources.

Thanks!
-Garrett

Yes, I saw the same thing today when rebooting a box running r251905: Tue Jun 18 10:12:42 CEST 2013 with ahci on a ZFS-only system. I update this box about once a week; the previous kernel was from Jun 11, and that one still rebooted successfully. As the kernel from June 18 is now kernel.old, I don't know the SVN rev for the June 11 kernel, but it looks like it was broken between June 11 and 18.

However, my workstation, also running a kernel from June 18, still reboots OK; it uses ahci and UFS.

Florian
Re: Booting an alternative kernel from loader prompt fails the first time only
On 20.04.13 18:05, Steven Hartland wrote:
When trying to boot an alternative kernel from the loader prompt it fails the first time the command is run but succeeds the second time.

Type '?' for a list of commands, 'help' for more detailed help.
OK boot kernel.generic
Booting...
don't know how to load module '/boot/kernel.generic/kernel'
OK boot kernel.generic
Booting...
/boot/kernel.generic/kernel text=0xd21288 data=..

Yes, I've been seeing the same thing for about 6-12 months, maybe more. None of the people I asked were able to confirm, so I'm happy that I'm not imagining it :) I see this on serial as well as on the normal console in front of a PC.

Florian
panic: vputx: missed vn_close
Hi,

I got this while building packages with poudriere. I'm running r245188. Let me know if you need anything else from the dump.

Florian

VNASSERT failed
0xfe04fda5bba0: tag zfs, type VREG
    usecount 1, writecount 1, refcount 1 mountedhere 0
    flags (VI_ACTIVE)
    VI_LOCKed v_object 0xfe062f6479f8 ref 0 pages 0
    lock type zfs: EXCL by thread 0xfe00bd683480 (pid 34602, umount, tid 100578)
panic: vputx: missed vn_close
cpuid = 3
Uptime: 9h25m23s
Dumping 13255 out of 32647 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

[...]

(kgdb) where
#0  doadump (textdump=1) at pcpu.h:229
#1  0x804c4ab7 in kern_reboot (howto=260) at /usr/home/flo/dev/checkouts/svn-src/sys/kern/kern_shutdown.c:446
#2  0x804c4fc6 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/home/flo/dev/checkouts/svn-src/sys/kern/kern_shutdown.c:753
#3  0x804c4e56 in kassert_panic (fmt=<value optimized out>) at /usr/home/flo/dev/checkouts/svn-src/sys/kern/kern_shutdown.c:641
#4  0x8055714d in vputx (vp=0xfe04fda5bba0, func=2) at /usr/home/flo/dev/checkouts/svn-src/sys/kern/vfs_subr.c:2243
#5  0x80d6b42f in null_reclaim (ap=<value optimized out>) at /usr/home/flo/dev/checkouts/svn-src/sys/modules/nullfs/../../fs/nullfs/null_vnops.c:743
#6  0x8070aee8 in VOP_RECLAIM_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:1959
#7  0x8055844c in vgonel (vp=0xfe04fda5b7c0) at vnode_if.h:830
#8  0x80557a7f in vflush (mp=0xfe0533ce3cc0, rootrefs=1, flags=2, td=0xfe00bd683480) at /usr/home/flo/dev/checkouts/svn-src/sys/kern/vfs_subr.c:2625
#9  0x80d6aa4e in nullfs_unmount (mp=0xfe0533ce3cc0, mntflags=<value optimized out>) at /usr/home/flo/dev/checkouts/svn-src/sys/modules/nullfs/../../fs/nullfs/null_vfsops.c:250
#10 0x805502cf in dounmount (mp=0xfe0533ce3cc0, flags=134742016, td=<value optimized out>) at /usr/home/flo/dev/checkouts/svn-src/sys/kern/vfs_mount.c:1314
#11 0x8054ff8b in sys_unmount (td=0xfe00bd683480, uap=0xff90d2c87a40) at /usr/home/flo/dev/checkouts/svn-src/sys/kern/vfs_mount.c:1211
#12 0x806b4845 in amd64_syscall (td=0xfe00bd683480, traced=0) at subr_syscall.c:134
#13 0x8069d04b in Xfast_syscall () at exception.S:387
#14 0x000800882ffa in ?? ()
Previous frame inner to this frame (corrupt stack?)
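The VNASSERT output shows usecount 1 and writecount 1 on the vnode that vputx() is releasing. Assuming the check in vputx() at that revision was `vp->v_writecount < vp->v_usecount || vp->v_usecount < 1`, the toy C model below (struct and function names made up) shows why this vnode trips it: the last use reference is going away while one writer is still counted, i.e. something opened the vnode for writing and never went through vn_close().

```c
#include <assert.h>

/* Toy model of the two vnode counters printed by VNASSERT above. */
struct toy_vnode {
    int v_usecount;   /* active use references */
    int v_writecount; /* opens for writing not yet closed */
};

/*
 * Assumed form of the invariant behind "vputx: missed vn_close":
 * either no use references remain, or writers are fewer than users.
 */
int vputx_invariant_ok(const struct toy_vnode *vp)
{
    return vp->v_writecount < vp->v_usecount || vp->v_usecount < 1;
}
```

With the dumped values (usecount 1, writecount 1) neither side of the condition holds, so the assertion fires; with two users and one writer it would pass.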
Re: ZFS cache devs UNAVAIL
On 23.10.12 22:23, Andriy Gapon wrote:
on 23/10/2012 23:08 Andriy Gapon said the following:
on 23/10/2012 20:56 Michael Schmiedgen said the following:
FreeBSD 10.0-CURRENT #0: Tue Oct 23 00:14:32 CEST 2012 root@gizeh.smoke:/usr/obj/usr/src/sys/GIZEH amd64
...
vdev_geom_open_by_path:519[1]: guid mismatch for provider /dev/ada0p1: 5267967234359339128 != 0.

Thank you for this valuable information. Do you have a rough estimate of when you started to experience this issue? Could you please also provide output of the following command captured right after a reboot and then after you re-add the cache disks?

$ zdb -lll /dev/ada0p

I still would like to get the above information if possible. But here is a patch that you can try:

I think that I introduced this bug because I used some old OpenSolaris code as an inspiration and completely missed the new states.

My NAS experienced the same problem. I thought the old IDE SSD had just died of old age, which is why I didn't investigate further yet. :) With the patch the cache device is back.

Thanks,
Florian
Re: MPSAFE VFS -- List of upcoming actions
On 20.09.12 00:26, Bryan Drewery wrote:
On 9/18/2012 9:48 PM, Attilio Rao wrote:
In addition to fusefs-kmod, Bryan and Florian have also updated fusefs-lib and fusefs-ntfs ports. For instance, please refer to this e-mail: http://lists.freebsd.org/pipermail/freebsd-ports/2012-August/077950.html

Even if this work is someway independent by the fusefs-kmod import, I warmly suggest to all of you to use their patches (and this what we have been testing so far too).

I have committed my updates to sysutils/fusefs-ntfs now.

The sysutils/fusefs-libs port was updated a few minutes ago.

Florian
Re: pkgng suggestion: renaming /usr/sbin/pkg to /usr/sbin/pkg-bootstrap
On 08/24/2012 10:15, Baptiste Daroussin wrote:
On Thu, Aug 23, 2012 at 06:19:57PM -0400, Steve Wills wrote:
Hi,

It seems to me that renaming the pkg binary in /usr/sbin/pkg to /usr/sbin/pkg-bootstrap would make sense. From a user standpoint, it is confusing that running the command gets different results the second time it is run vs. the first time. I can imagine a user saying I ran pkg, but it didn't do what they said it would. Now I run it again, and it does do what it is supposed to. Also, it would enable setting up a pkg-bootstrap man page separate from the pkg man page, without confusion about which one you're looking at.

So, opinions? There may still be time to fix it for 9.1 if we can decide quickly.

Thanks,
Steve

BTW for people who haven't tested and want to share their opinion, here is how work /usr/sbin/pkg:

it first checks if ${LOCALBASE}/sbin/pkg is there
- if yes it directly execute ${LOCALBASE}/sbin/pkg with arguments passed to /usr/sbin/pkg
- if no then it will determine you ABI (or take the one in environnement variable), and fetch the last available pkgng version from http://pkgbeta... it will extract pkg-static and use it to install pkgng with itself. on installation is done: it executes ${LOCALBASE}/sbin/pkg with arguments passed to /usr/sbin/pkg.

Lots of people having ask in the early days of pkgng for a transparent bootstrap I have done it that way. On of the thing I forgot and kan@ has added is a prompt for the user in case it is going to bootstrap.

So that mean that for a normal user, on a fresh vanilla FreeBSD pkg install vim-lite will prompt the user asking if he wants to bootstrap pkgng, and once bootstraped proceed to the installation of vim-lite

if pkgng is already there then it will just install vim-lite.

It was just to clarify, so that anyone understand was this is about.

I tend to like the bootstrap like it is now (I find it transparent, and straight forward) but as I said earlier I have no strong opinion on this, so it most people prefers a separate pkg-bootstrap tools then I'll do it :)

Having installed a few 9.1-BETA1 boxes recently, I have to say I absolutely like this behavior: it's totally transparent and you can just start installing packages like you could with pkg_add -r. I don't see the need to introduce an additional step. I actually think the current behavior is user friendly, and renaming it would make it more difficult. If people think the current behavior is misleading we could still clarify the confirmation message.

Florian
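Baptiste's description of /usr/sbin/pkg boils down to one decision at startup. The C sketch below models only that first step, choosing between handing off to ${LOCALBASE}/sbin/pkg and starting the bootstrap (prompt, fetch, install). It is not the pkg(7) source: the function and enum names are hypothetical, and reading LOCALBASE from the environment with a /usr/local default is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical names; this is not the pkg(7) source. */
enum pkg_action {
    PKG_EXEC_INSTALLED, /* exec ${LOCALBASE}/sbin/pkg with the original arguments */
    PKG_BOOTSTRAP       /* prompt, fetch and install pkg, then exec it */
};

/* Build "${LOCALBASE}/sbin/pkg" into buf, assuming /usr/local when
 * LOCALBASE is unset or empty. Returns 0 on success, -1 if buf is too small. */
int pkg_path(char *buf, size_t size, const char *localbase)
{
    int n;

    if (localbase == NULL || localbase[0] == '\0')
        localbase = "/usr/local";
    n = snprintf(buf, size, "%s/sbin/pkg", localbase);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Run the installed pkg if it is executable, otherwise bootstrap. */
enum pkg_action choose_pkg_action(const char *localbase)
{
    char path[1024];

    if (pkg_path(path, sizeof(path), localbase) == 0 &&
        access(path, X_OK) == 0)
        return PKG_EXEC_INSTALLED;
    return PKG_BOOTSTRAP;
}

/* Typical call site: honour LOCALBASE from the environment. */
enum pkg_action choose_pkg_action_from_env(void)
{
    return choose_pkg_action(getenv("LOCALBASE"));
}
```

Because the same binary takes either branch depending on what is installed, the first run on a fresh system bootstraps and later runs simply hand off, which is exactly the behavior the thread is debating.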
Re: CURRENT as gateway on not-so-fast hardware: where is a bottleneck?
On 20.08.12 10:32, Doug Barton wrote: On 08/15/2012 03:18, Alexander Motin wrote: It is quite pointless to speculate without real info like the KTR_SCHED traces mentioned above. I'm sorry, you're quite wrong about that. In the cases I mentioned, and in about 2 out of 3 of the cases where users reported problems and I suggested that they try 4BSD, the results were clear. This obviously points out that there is a serious problem with ULE, and if I were the one who was responsible for that code I would be looking at ways of helping users figure out where the problems are. But that's just me. The main thing I've learned about schedulers is that things there never work as you expect. There are too many factors and relations to predict behavior in every case. In the web hosting case that I mentioned, I purposely kept every other factor consistent and changed only s/ULE/4BSD/. The results were both clear and consistent. Can you please prove that with some actual numbers? I seem to recall you posted something not too long ago, but I was unable to find it right now. Also, can you tell us what you ran and how? I would really like to reproduce this. Thanks, Florian
Re: Scheduler + IPC performance on FreeBSD 7.4, 8.2, 9.0 and -CURRENT
On 05.04.12 20:03, Arnaud Lacombe wrote: Hi folks, Hi, Over the past months, I ran the `hackbench' [HACKBENCH] benchmark, used by the Linux folks for tracking down various kinds of regressions/improvements, on a couple of unused boxes. `hackbench' is a scheduler + IPC test (socket xor pipe). It creates producer/consumer groups and lets a variable quantity of small messages flow happily. Producers and consumers are either processes xor threads. [Lots of likely very interesting and valuable data.] Q4: So, how can I get all the graphs? R4: All you need is git, a POSIX shell, a couple of utilities (find, sort, ...), a recent gnuplot, and a Ruby interpreter. Can you give us some hints on *how* to get the results? I checked the repo out, but it's not immediately obvious what to do and how to get the graphs, and staring at thousands of numbers in lots of different files isn't exactly practical. Thanks, Florian
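hackbench itself is not reproduced here, but the kind of traffic it generates can be sketched: a forked producer writes small fixed-size messages into a pipe while the parent consumes them. The 100-byte message size and the function name pipe_stream_messages() are assumptions for illustration; the real benchmark runs many such sender/receiver groups (processes or threads, over sockets or pipes) at once and times them.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MSG_SIZE 100   /* assumed "small message" size */

/*
 * One producer/consumer pair over a pipe: a forked child writes nmsgs
 * messages of MSG_SIZE bytes, the parent reads until EOF.  Returns the
 * number of complete messages received, or -1 on error.
 */
static long pipe_stream_messages(int nmsgs)
{
    int fds[2];
    pid_t pid;

    assert(nmsgs >= 0);
    if (pipe(fds) == -1)
        return -1;
    if ((pid = fork()) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                      /* child: producer */
        char msg[MSG_SIZE] = {0};
        close(fds[0]);
        for (int i = 0; i < nmsgs; i++)
            if (write(fds[1], msg, sizeof(msg)) != (ssize_t)sizeof(msg))
                _exit(1);
        _exit(0);                        /* also closes the write end */
    }

    /* Parent: consumer.  Reads may return partial messages, so count bytes. */
    char buf[MSG_SIZE];
    long bytes = 0;
    ssize_t n;
    int status;

    close(fds[1]);
    while ((n = read(fds[0], buf, sizeof(buf))) > 0)
        bytes += n;
    close(fds[0]);
    if (waitpid(pid, &status, 0) == -1 || n == -1 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return bytes / MSG_SIZE;
}
```

Running many such pairs concurrently and timing them is what turns this into a scheduler + IPC benchmark; the sketch only shows the message flow of a single pair.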
Re: /sys/conf/kmod.mk, line 111: Malformed conditional (${MK_CLANG_IS_CC}
On 03.03.12 14:24, Chris Rees wrote: On 3 March 2012 11:48, O. Hartmann ohart...@zedat.fu-berlin.de wrote: This morning, on one of my FreeBSD 10.0-CURRENT boxes, I received the error message shown below. I should add that I compiled the nvidia-driver shown hours ago on all FreeBSD 9.0-STABLE boxes with the same settings, and I compiled the driver just two days before in the same way I tried it this morning. What's wrong? Some unexpected breakage? Then this is my shout to the community. Message below. Regards and thanks in advance, Oliver
===> Vulnerability check disabled, database not found
===> License NVIDIA accepted by the user
===> Found saved configuration for nvidia-driver-295.20
===> Extracting for nvidia-driver-295.20
=> SHA256 Checksum OK for NVIDIA-FreeBSD-x86_64-295.20.tar.gz.
===> Patching for nvidia-driver-295.20
===> nvidia-driver-295.20 depends on file: /usr/local/libdata/pkgconfig/xorg-server.pc - found
===> nvidia-driver-295.20 depends on shared library: GL.1 - found
===> Configuring for nvidia-driver-295.20
===> Building for nvidia-driver-295.20
===> src (all)
/sys/conf/kmod.mk, line 111: Malformed conditional (${MK_CLANG_IS_CC} == no && ${CC:T:Mclang} != clang)
/sys/conf/kmod.mk, line 115: if-less endif
/sys/conf/kern.mk, line 18: Malformed conditional (${MK_CLANG_IS_CC} != no || ${CC:T:Mclang} == clang)
/sys/conf/kern.mk, line 31: if-less endif
/sys/conf/kern.mk, line 101: Malformed conditional (${MK_CLANG_IS_CC} == no && ${CC:T:Mclang} != clang)
/sys/conf/kern.mk, line 109: if-less endif
make: fatal errors encountered -- cannot continue
*** [all] Error code 1
Please post your make.conf and src.conf.
No need, a buildworld / installworld cycle will fix it. A make install in src/share/mk *could* also be enough, but I haven't tested it. Florian
Re: SeaMonkey eats the CPU as of r232144
On 02.03.12 20:08, Aleksandr Rybalko wrote: On Fri, 2 Mar 2012 09:01:25 -0800 Adrian Chadd adr...@freebsd.org wrote: Ok. So it's that exact commit? david, what did you break? :) I bet it is old enough :) I'm on 9.0-PRERELEASE #3 r227950, and when SeaMonkey can't reach some document it goes to 100% CPU. One time I even attached to it and found that SeaMonkey was polling a socket very, very fast, but now I don't have much free time to find out what is really broken. IIRC the same happens in FF too. Aleksandr, please upgrade your nspr to the latest version. This should have been fixed by http://lists.freebsd.org/pipermail/cvs-ports/2011-September/225460.html Florian
Re: Processes getting stuck in state tmpfs
On 01.03.12 20:31, Gleb Kurtsou wrote: Could you test the attached patch? It's also available here as separate commits: https://github.com/glk/freebsd-head/commits/tmpfs-rename The test that used to hang within a minute has now been running successfully for almost 2 hours. Looks good to me. Florian
flowtable usable or not (was: Re: [CFT] modular kernel config
On 28.02.12 23:14, Doug Barton wrote: On 2/28/2012 10:48 AM, Arnaud Lacombe wrote: You will surely go really far with this kind of "It is broken? Let's not fix it and disable it instead" mentality, even more when coming from a committer. As long as there are these kinds of comments around here, FreeBSD will deserve nothing but to keep dying piece by piece, and it will be deserved. In general, I tend to agree with you, but in this case it's useful to know the history of the flowtable option.
1. It was introduced in -current.
2. It received fairly good testing, was pronounced good and useful, and MFC'ed.
3. Several releases happened with flowtable.
4. Users started to report problems that were ultimately tracked down to flowtable.
5. Ultimately it was decided that flowtable was not a universal good.
6. The developer of the option agreed that it should be disabled by default until such time as it can be fixed.
7. The fixing hasn't happened yet.
I talked to Kip Macy, who implemented flowtable, about this. He thinks that the problem was caused by an inappropriate default setting of net.inet.ip.output_flowtable_size. This should have been fixed by r205488, which was MFC'd to 8 and should be part of 8.2 and of course 9.0. However, nobody who experienced the problem wanted to try any of these releases with flowtable enabled, so we still don't know whether it's fixed or not. Should anyone try this, it could certainly be the case that net.inet.ip.output_flowtable_size needs to be tuned even more. Florian
Re: Processes getting stuck in state tmpfs
On 11.02.12 11:20, Gleb Kurtsou wrote: On (10/02/2012 22:41), Florian Smeets wrote: Hi, if you set WRKDIRPREFIX to a tmpfs mountpoint and try to build audio/gsm from ports, one of the mv processes gets stuck in state tmpfs quite often. Traces from a kernel with WITNESS and DEBUG_VFS_LOCKS are available here http://tb.smeets.im/~flo/tmpfs.txt It's because of incorrect vnode locking order in tmpfs_rename. The issue is known, and tmpfs is not the only file system suffering from it (e.g. ext2). There are two ways of working around it in the tree:
* UFS: try locking the vnode; on failure unlock all vnodes, restart, and relookup the vnodes needed.
* ZFS: introduce directory entry locks to guarantee fvp won't disappear, fdvp can be safely traversed, etc. That won't be easy.
The UFS way would be a good temporary solution, but I think we should work on improving VOP_RENAME() in the long run. I'll try to prepare a patch in several days. Hey Gleb, did you get anywhere with this? Thanks, Florian
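The "UFS way" Gleb describes (take one lock, only try the next one, and on failure drop everything and start over) is a general way to avoid lock-order deadlocks. A minimal C sketch of the pattern, using POSIX mutexes as stand-ins for vnode locks; lock_both() is a hypothetical helper, and the relookup the real code must do after a restart is only noted in a comment.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/*
 * Acquire both locks without ever blocking on the second one while holding
 * the first: if the second is busy, release everything and retry.  Two
 * callers using opposite orders therefore cannot deadlock each other.
 * Returns the number of retries needed.
 */
static int lock_both(pthread_mutex_t *first, pthread_mutex_t *second)
{
    int retries = 0;

    assert(first != second);   /* same lock twice would never succeed */
    for (;;) {
        pthread_mutex_lock(first);
        if (pthread_mutex_trylock(second) == 0)
            return retries;    /* both locks held */
        /* Back off: drop what we hold and let the other side finish. */
        pthread_mutex_unlock(first);
        sched_yield();
        retries++;
        /*
         * In the file system case the objects may have changed while
         * nothing was locked, so the vnodes would be looked up again here.
         */
    }
}
```

The cost of this approach is possible retries under contention, which is why Gleb calls it a temporary solution compared to reworking VOP_RENAME().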
Processes getting stuck in state tmpfs
Hi, if you set WRKDIRPREFIX to a tmpfs mountpoint and try to build audio/gsm from ports, one of the mv processes gets stuck in state tmpfs quite often. Traces from a kernel with WITNESS and DEBUG_VFS_LOCKS are available here http://tb.smeets.im/~flo/tmpfs.txt Florian
ULE vs. 4BSD scheduler benchmarks
[current@ bcc'ed to get a wider audience, please discuss on performance@] Hi, recently I have seen a lot of threads where it was suggested that people should switch from the ULE to the 4BSD scheduler. That got me thinking and I decided to run a few benchmarks. I looked through all the stuff Kris and Jeff did a few years ago and tried to follow their example. The main motivation, however, is that we (Attilio Rao and I) want to set a baseline for future reference, mainly for the work that's going on in the vmcontention branch right now; that is the reason why all tests were run on head@r229659. All debugging was disabled (WITNESS and friends for the kernel and MALLOC_PRODUCTION=yes for libc). For now I ran 3 different things: MySQL/sysbench, PostgreSQL/pgbench and pbzip2. All software was installed from ports with the default system gcc (gcc version 4.2.1 20070831 patched [FreeBSD]), with the exception of PostgreSQL. I created new postgres92-{server,client} ports with a snapshot of PostgreSQL 9.2dev from 16.01.2012, as a lot of scalability work was done in PostgreSQL 9.2.
MySQL version 5.5.20
sysbench version 0.4.12
PostgreSQL/pgbench version 9.2dev
PBZIP2 version v1.1.6
The machine these tests were run on is a 2x4 core Xeon L5310 @ 1.60GHz with 4GB RAM. Here is the complete topology:
kern.sched.topology_spec: <groups>
 <group level="1" cache-level="0">
  <cpu count="8" mask="ff">0, 1, 2, 3, 4, 5, 6, 7</cpu>
  <children>
   <group level="2" cache-level="2">
    <cpu count="4" mask="f">0, 1, 2, 3</cpu>
   </group>
   <group level="2" cache-level="2">
    <cpu count="4" mask="f0">4, 5, 6, 7</cpu>
   </group>
  </children>
 </group>
</groups>
The database benchmarks were all run with a working set that fit into the configured database memory, so after the warmup phase no disk I/O was involved. sysbench was run with 1 million rows; InnoDB was the engine we used, as Kris's work already showed that it scales much better than MyISAM (also, InnoDB is the default in MySQL's 5.5 branch). Pgbench was run using a scaling factor of 100.
The connections to the databases used a Unix socket, and only read-only tests were run. The input and output files for the pbzip2 test were on tmpfs. The results are available in this Google Docs spreadsheet; if you scroll down there are also some nice graphs. https://docs.google.com/spreadsheet/ccc?key=0Ai0N1xDe3uNAdDRxcVFiYjNMSnJWOTZhUWVWWlBlemc Over time I will add more benchmarks to the doc (e.g. nginx/php-fpm and so on). I tried to run some nginx benchmarks, but those are limited by netisr, as I did not find a web server benchmark tool which can use Unix sockets; any suggestions are welcome. The conclusion right now seems to be that ULE is faster for database workloads, but for strongly CPU-bound workloads 4BSD can be a better choice. I can provide KTR traces and/or schedgraph output for cases where 4BSD is better than ULE. I want to thank Sean Bruno and Yahoo for setting up / providing the machines to run these tests on, and Attilio for suggestions and his general helpfulness. Florian
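The mask attributes in the topology_spec output are hexadecimal CPU bitmaps (bit n set means CPU n belongs to the group), which is why the second L2 group carries mask f0 for CPUs 4-7. A small C sketch decoding such a mask, assuming it fits in an unsigned long; mask_to_cpus() is a hypothetical helper, not part of FreeBSD.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Decode a hex CPU mask such as "f0" into CPU ids (bit n => CPU n).
 * Stores at most max ids in out[] and returns the number of bits set.
 */
static int mask_to_cpus(const char *hexmask, int *out, int max)
{
    unsigned long mask;
    int n = 0;

    assert(hexmask != NULL && (out != NULL || max == 0));
    mask = strtoul(hexmask, NULL, 16);
    for (int cpu = 0; mask != 0; cpu++, mask >>= 1) {
        if (mask & 1UL) {
            if (n < max)
                out[n] = cpu;
            n++;
        }
    }
    return n;
}
```

For example, mask_to_cpus("f0", cpus, 8) returns 4 and fills cpus with 4, 5, 6, 7, matching the second cache group of the Xeon box.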
Re: dogfooding over in clusteradm land
On 03.01.2012 10:18, Kostik Belousov wrote: On Tue, Jan 03, 2012 at 12:02:22AM -0800, Don Lewis wrote: On 2 Jan, Don Lewis wrote: On 2 Jan, Don Lewis wrote: On 2 Jan, Florian Smeets wrote: This does not make a difference. I tried on 32K/4K with/without journal and on 16K/2K; all exhibit the same problem. At some point during the cvs2svn conversion the syncer starts to use 100% CPU. The whole process hangs at that point, sometimes for hours; from time to time it does continue doing some work, but really, really slowly. It's usually between revision 21 and 22, when the resulting svn file gets bigger than about 11-12Gb. At that point an ls in the target dir hangs in state ufs. I broke into ddb and ran all commands which I thought could be useful. The output is at http://tb.smeets.im/~flo/giant-ape_syncer.txt
Tracing command syncer pid 9 tid 100183 td 0xfe00120e9000
cpustop_handler() at cpustop_handler+0x2b
ipi_nmi_handler() at ipi_nmi_handler+0x50
trap() at trap+0x1a8
nmi_calltrap() at nmi_calltrap+0x8
--- trap 0x13, rip = 0x8082ba43, rsp = 0xff8000270fe0, rbp = 0xff88c97829a0 ---
_mtx_assert() at _mtx_assert+0x13
pmap_remove_write() at pmap_remove_write+0x38
vm_object_page_remove_write() at vm_object_page_remove_write+0x1f
vm_object_page_clean() at vm_object_page_clean+0x14d
vfs_msync() at vfs_msync+0xf1
sync_fsync() at sync_fsync+0x12a
sync_vnode() at sync_vnode+0x157
sched_sync() at sched_sync+0x1d1
fork_exit() at fork_exit+0x135
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff88c9782d00, rbp = 0 ---
I think this explains why the r228838 patch seems to help the problem. Instead of an application call to msync(), you're getting bitten by the syncer doing the equivalent. I don't know why the syncer is CPU bound, though. From my understanding of the patch, it only optimizes the I/O. Without the patch, I would expect that the syncer would just spend a lot of time waiting on I/O.
My guess is that this is actually a vm problem. There are nested loops in vm_object_page_clean() and vm_object_page_remove_write(), so you could be doing something that's causing lots of looping in that code. Does the machine recover if you suspend cvs2svn? I think what is happening is that cvs2svn is continuing to dirty pages while the syncer is trying to sync the file. From my limited understanding of this code, it looks to me like every time cvs2svn dirties a page, it will trigger a call to vm_object_set_writeable_dirty(), which will increment object->generation. Whenever vm_object_page_clean() detects a change in the generation count, it restarts its scan of the pages associated with the object. This is probably not optimal ... Since the syncer is only trying to flush out pages that have been dirty for the last 30 seconds, I think that vm_object_set_writeable_dirty() should just make one pass through the object, ignoring generation, and then return when it is called from the syncer. That should keep vm_object_set_writeable_dirty() from looping over the object again and again if another process is actively dirtying the object. This sounds very plausible. I think that there is no sense in restarting the scan at all if it is requested in async mode. See below. Would be thrilled if this finally solves the cvs2svn issues.
commit 41aaafe5e3be5387949f303b8766da64ee4a521f
Author: Kostik Belousov kostik@sirion
Date: Tue Jan 3 11:16:30 2012 +0200
    Do not restart the scan in vm_object_page_clean() if requested mode is async.
    Proposed by: truckman

diff --git a/sys/vm/vm_object.c b/sys/vm/vm_object.c
index 716916f..52fc08b 100644
--- a/sys/vm/vm_object.c
+++ b/sys/vm/vm_object.c
@@ -841,7 +841,8 @@ rescan:
 		if (p->valid == 0)
 			continue;
 		if (vm_page_sleep_if_busy(p, TRUE, "vpcwai")) {
-			if (object->generation != curgeneration)
+			if ((flags & OBJPC_SYNC) != 0 &&
+			    object->generation != curgeneration)
 				goto rescan;
 			np = vm_page_find_least(object, pi);
 			continue;
@@ -851,7 +852,8 @@ rescan:
 
 		n = vm_object_page_collect_flush(object, p, pagerflags,
 		    flags, &clearobjflags);
-		if (object->generation != curgeneration)
+		if ((flags & OBJPC_SYNC) != 0 &&
+		    object->generation != curgeneration)
 			goto rescan;
 
 		/*
Yes, the patch fixes the problem. The cvs2svn run completed this time.
 9132.25 real 8387.05 user 403.86 sys
I did not see any significant syncer activity in top -S anymore. Thanks a lot. Florian
Re: dogfooding over in clusteradm land
On 29.12.11 01:04, Kirk McKusick wrote: Rather than changing BKVASIZE, I would try running the cvs2svn conversion on a 16K/2K filesystem and see if that sorts out the problem. If it does, it tells us that doubling the main block size and reducing the number of buffers by half is the problem. If that is the problem, then we will have to increase the KVM allocated to the buffer cache. This does not make a difference. I tried on 32K/4K with/without journal and on 16K/2K; all exhibit the same problem. At some point during the cvs2svn conversion the syncer starts to use 100% CPU. The whole process hangs at that point, sometimes for hours; from time to time it does continue doing some work, but really, really slowly. It's usually between revision 21 and 22, when the resulting svn file gets bigger than about 11-12Gb. At that point an ls in the target dir hangs in state ufs. I broke into ddb and ran all commands which I thought could be useful. The output is at http://tb.smeets.im/~flo/giant-ape_syncer.txt The machine is still in ddb and I could run any additional commands. The kernel is from Attilio's vmcontention branch, which was MFCed yesterday, and updated after the MFC. The same problem happens on 9.0-RC3. If I run the same test on a ZFS filesystem I don't see any problems. Florian
Re: dogfooding over in clusteradm land
On 14.12.11 14:20, Sean Bruno wrote: We're seeing what looks like a syncer/ufs resource starvation on 9.0 on the cvs2svn ports conversion box. I'm not sure what resource is tapped out. Effectively, I cannot access the directory in use, and the converter application stalls out waiting for some resource that isn't clear. (Peter had posited kmem of some kind.) I've upped maxvnodes a bit on the host, turned off SUJ and mounted the f/s in question with async and noatime for performance reasons. Can someone hit me up with the cluebat? I can give you direct access to the box for debuginationing. Just for the archives: this is fixed, or at least considerably improved, by r228838. The ports cvs2svn run went from panicking after ~22h to finishing after ~10h. Thanks to Sean and Attilio for giving me access to test boxes. Florian
Re: buildworld has been broken for me since Sunday 20110815 at atrun
On 16.08.2011 15:47, eculp wrote: Is anyone else seeing this? This is current AMD64.
Yes, there have been X mails about it on this mailing list and an entry in UPDATING ;)
I'm running the compile from Saturday 20110814 that I have been hammering, and it has been rock solid.
/usr/obj/usr/src/tmp/usr/lib/libc.so: undefined reference to `_nsyylex'
/usr/obj/usr/src/tmp/usr/lib/libc.so: undefined reference to `_nsyyin'
/usr/obj/usr/src/tmp/usr/lib/libc.so: undefined reference to `_nsyytext'
/usr/obj/usr/src/tmp/usr/lib/libc.so: undefined reference to `_nsyyerror'
/usr/obj/usr/src/tmp/usr/lib/libc.so: undefined reference to `_nsyylineno'
*** Error code 1
You need to update your kernel; buildworld will work after that. Cheers, Florian
Re: missing files in readdir(3) on NFS export of ZFS volume (since v28?)
On 08.03.11 19:40, Pawel Jakub Dawidek wrote: On Mon, Mar 07, 2011 at 01:08:46AM +0100, Pierre Beyssac wrote: Hello, I'm running a 9-current server as compiled on Sat Mar 5 02:17:14 CET 2011. Since I upgraded to ZFS v28 I noticed missing files from NFS. The files are still accessible through NFS but they don't show up on a readdir(3). [...] Could you try r219404? Yes, that fixed it for me. Thanks, Florian
Re: libcompiler_rt now part of FreeBSD's base system
On 11.11.10 16:52, Ed Schouten wrote: I just committed libcompiler_rt.a to HEAD. Even though I don't expect serious issues -- especially not on the tier 1 architectures -- be sure to contact me in case something goes wrong. I hooked it up to the build in a separate commit, so if your system starts to act weird, just revert r215127. Hi Ed, I'm at r215149 on sparc64, and my compiler stopped working. buildworld stops after 42 lines (http://smeets.im/~flo/bw.log). cc1 dumps a 1GB core file.
Program terminated with signal 4, Illegal instruction.
#0 0x004ced80 in ?? ()
(gdb) where
#0 0x004ced80 in ?? ()
#1 0x004cedb0 in ?? ()
Previous frame identical to this frame (corrupt stack?)
Right now I cannot go back to r215126 to verify that it really is this change which is causing it :-) Previously the system was running a build from around Nov. 1st. Is there anything I can do to narrow this down? -- Florian Smeets
Re: Deadlock in UFS/SU+J?
On 12.08.10 13:19, Kostik Belousov wrote: On Thu, Aug 12, 2010 at 09:37:04AM +0200, Lucius Windschuh wrote: Dear list members, I tried to reproduce another bug on my test machine (i386, CURRENT r211175), but ran into the following deadlock: This is not a deadlock, but the LOR. It is irrelevant for your deadlock. Supposedly, the deadlock is fixed by r211213. Thank you! My Soekris used to deadlock within 20-30 minutes when using nzbget on a SUJ filesystem. It has now been running for almost 5 hours. Looking good :-) Thanks, Florian
Re: Weird reboots from bootmgr or loader
Lukas Ertl wrote: I saw _exactly_ the same problem on one of my boxes today: it was shut down correctly yesterday, and today it wouldn't boot, but panicked right after boot0. The only thing I could see were some hex numbers and "BTX halted" for a split second, then it immediately rebooted. It's a -current box from Sunday evening CEST. I managed to fix it by booting from floppies and running the Fixit floppy, writing a new disklabel, which seems to have become corrupted somehow. Yes, this really did the trick! :-) Thanks Lukas! regards, flo
Re: -current won't boot this morning
Lukas Ertl wrote: On Thu, 7 Aug 2003, Gordon Bergling wrote: I tried to boot my -current this morning, but the boot process only gets to bootmgr. It shows the normal F1 FreeBSD F2 Other (not sure if this is Other or Unknown). After pressing the F1 key the computer resets itself. I tried to wait, but on autoboot the same thing happens. Let me guess: your swap partition is the first partition? The -current was built yesterday with recent sources. You need to boot a fixit floppy and re-write your bootblocks. Then cvsup to the very latest -current. This problem should be fixed already. Shouldn't we mention this in UPDATING, since this is becoming a FAQ? regards, flo
Re: Weird reboots from bootmgr or loader
Lukas Ertl wrote: On Tue, 5 Aug 2003, Florian Smeets wrote: Lukas Ertl wrote: I managed to fix it by booting from floppies and running the Fixit floppy, writing a new disklabel, which seems to have become corrupted somehow. Yes this really did the trick! :-) May I ask if you boot from a vinum volume? No i don't use vinum on any of the boxes. regards, flo
Weird reboots from bootmgr or loader
Hello everyone! I ran into a really annoying problem today. 3 different boxes started to reboot when I hit enter at the bootmgr; or, when I don't hit enter and wait for it to boot FreeBSD, they reboot when the loader should appear. I can see that it prints out some numbers, but it's too fast to recognise anything. I did not mess with the disk configuration on any of the boxes. I only rebuilt world and kernel as usual every few days. They all had sources from the 3rd or 4th of August. I don't have my serial cable handy; I'll try to get it back this evening and see if I can provide any further information. One thing that might be special is that they all have -march={p2,pentium3,pentium4} in make.conf. I don't have *any* clue what might be happening here. Regards, flo
Re: Weird reboots from bootmgr or loader
Lukas Ertl wrote: On Tue, 5 Aug 2003, John-Mark Gurney wrote: Are you running -current w/ a kernel from the last 24 hrs? (After phk's mass swap check-in?) If so, make sure your swap isn't at the start of your disk. If it is, phk was nice enough to only blow away your boot blocks instead of your disk label too. :) Swap currently uses all but the first page (4k on i386). Argl, YES, swap _is_ the first partition: Yes, same here! Does that mean I have to rearrange or never build world again? Yeah, good question, what can we do? regards, flo
Re: Does linux-sun-jdk_1.4.2 work?
Adam wrote: On Mon, 2003-07-21 at 08:55, Yamada Ken Takeshi wrote:
tyd3# /usr/local/linux-sun-jdk1.4.2/bin/java -version
#
# HotSpot Virtual Machine Error, Internal Error
# Please report this error at
# http://java.sun.com/cgi-bin/bugreport.cgi
#
# Java VM: Java HotSpot(TM) Client VM (1.4.2-b28 mixed mode)
#
# Error ID: 4F533F4C494E55580E43505001C9
#
Heap at VM Abort:
Heap
It happens to me too. Looks like this port was released too quickly. Java on FreeBSD is always a bit dodgy, it seems. Hi, try mounting linprocfs(5); that seemed to solve the problem for me. regards, flo
Re: [acpi-jp 2382] Re: Updated ec-burst.diff patch
Nate Lawson wrote: On Thu, 3 Jul 2003, John Baldwin wrote: On 03-Jul-2003 Nate Lawson wrote: On Thu, 3 Jul 2003, M. Warner Losh wrote: I personally think that all tunables should be read-only (or rw if possible) sysctls... I'm still not sure why we have both mechanisms. Perhaps a useful approach would be to sweep the tree for tunables and change them to sysctls with appropriate permissions (read-only if in doubt). Then remove the tunable mechanism. Care to put together a patch? Because you can't set sysctls from the loader, only tunables? Are you going to duplicate the entire kernel environment from 'kenv' in sysctl? Ah, I thought the two had been merged such that you could do that. You can set sysctls from loader.conf. I just checked it to be sure. regards, flo
Re: Updated ec-burst.diff patch
Nate Lawson wrote: On Wed, 2 Jul 2003, Florian Smeets wrote: I set hw.acpi.ec.burst_mode=0 in loader.conf, but when I was trying to check if it was set to 0 with sysctl hw.acpi.ec.burst_mode I got:
[EMAIL PROTECTED] [~] 15 #sysctl hw.acpi.ec.burst_mode
sysctl: unknown oid 'hw.acpi.ec.burst_mode'
It's a tunable, not a sysctl. So you can only set it in loader.conf. Are there any messages when you boot with that in your loader.conf? Would you please post a separate dmesg for that case? Well, I think I need to read the docs about this topic again ;) but actually the dmesg is with hw.acpi.ec.burst_mode=0 set in loader.conf. I have put up another one, http://flds.dyndns.org/dmesg2, without hw.acpi.ec.burst_mode=0, but it doesn't seem to change anything. Regards, flo
Re: Updated ec-burst.diff patch
Nate Lawson wrote: Also, please report how adding hw.acpi.ec.burst_mode=0 to loader.conf changes things (but turn on hw.acpi.verbose first so we get good msgs). Well, with hw.acpi.verbose=1 the messages look like this:
Jul 2 00:30:39 lappi kernel: ACPI-0432: *** Error: Handler for [EmbeddedControl] returned AE_ERROR
Jul 2 00:30:39 lappi kernel: ACPI-1287: *** Error: Method execution failed [\_SB_.BAT0._BST] (Node 0xc2502700), AE_ERROR
Jul 2 00:30:39 lappi kernel: acpi_cmbat0: error fetching current battery status -- AE_ERROR
Jul 2 00:30:54 lappi kernel: acpi_ec0: EcCommand: no response to 82
Jul 2 00:30:54 lappi kernel: acpi_ec0: EcCommand: no response to 80
Jul 2 00:30:54 lappi kernel: ACPI-0432: *** Error: Handler for [EmbeddedControl] returned AE_ERROR
Jul 2 00:30:54 lappi kernel: ACPI-1287: *** Error: Method execution failed [\_TZ_.TZN0._TMP] (Node 0xc2502b60), AE_ERROR
Jul 2 00:30:54 lappi kernel: acpi_tz0: error fetching current temperature -- AE_ERROR
Jul 2 00:31:09 lappi kernel: acpi_ec0: EcCommand: no response to 80
Jul 2 00:31:09 lappi kernel: ACPI-0432: *** Error: Handler for [EmbeddedControl] returned AE_ERROR
Jul 2 00:31:09 lappi kernel: ACPI-1287: *** Error: Method execution failed [\_SB_.ADP0._PSR] (Node 0xc2502880), AE_ERROR
Jul 2 00:31:24 lappi kernel: acpi_ec0: EcCommand: no response to 80
Jul 2 00:31:24 lappi kernel: ACPI-0432: *** Error: Handler for [EmbeddedControl] returned AE_ERROR
Jul 2 00:31:24 lappi kernel: ACPI-1287: *** Error: Method execution failed [\_TZ_.TZN0._TMP] (Node 0xc2502b60), AE_ERROR
Jul 2 00:31:24 lappi kernel: acpi_tz0: error fetching current temperature -- AE_ERROR
Jul 2 00:31:39 lappi kernel: acpi_ec0: EcCommand: no response to 80
Jul 2 00:31:39 lappi kernel: ACPI-0432: *** Error: Handler for [EmbeddedControl] returned AE_ERROR
Jul 2 00:31:39 lappi kernel: ACPI-1287: *** Error: Method execution failed [\_SB_.ADP0._PSR] (Node 0xc2502880), AE_ERROR
Jul 2 00:31:39 lappi kernel: acpi_ec0: EcCommand: no response to 80
Jul 2 00:31:39 lappi kernel: ACPI-0432: *** Error: Handler for [EmbeddedControl] returned AE_ERROR
Jul 2 00:31:39 lappi kernel: ACPI-1287: *** Error: Method execution failed [\_SB_.BAT0._BST] (Node 0xc2502700), AE_ERROR
Jul 2 00:31:39 lappi kernel: acpi_cmbat0: error fetching current battery status -- AE_ERROR
Here is the output of dmesg | egrep acpi_ec0\|EC\ Wait:
acpi_ec0: embedded controller port 0x66,0x62 on acpi0
EC Waited max 17 cycles, event occurred
acpi_ec0: EcCommand: no response to 80
acpi_ec0: EcCommand: no response to 80
[...]
acpi_ec0: EcCommand: no response to 80
acpi_ec0: EcCommand: no response to 82
acpi_ec0: EcRead: Failed waiting for EC to send data.
acpi_ec0: EcCommand: no response to 80
acpi_ec0: EcCommand: no response to 80
The complete dmesg is available at http://flds.dyndns.org/dmesg . I set hw.acpi.ec.burst_mode=0 in loader.conf, but when I was trying to check if it was set to 0 with sysctl hw.acpi.ec.burst_mode I got:
[EMAIL PROTECTED] [~] 15 #sysctl hw.acpi.ec.burst_mode
sysctl: unknown oid 'hw.acpi.ec.burst_mode'
Let me know if I can do something else. Regards, flo