CURRENT r296381 panic in vn_sendfile (/usr/src/sys/kern/kern_sendfile.c:833)
Hello.

I am getting a kernel panic on a heavily loaded server, with this message:

savecore: reboot after panic: vn_sendfile: mlen 326 space -20 hdrlen 326

# kgdb kernel.debug /var/crash/vmcore.0

Unread portion of the kernel message buffer:
panic: vn_sendfile: mlen 326 space -20 hdrlen 326
cpuid = 5
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe20206314f0
vpanic() at vpanic+0x182/frame 0xfe2020631570
kassert_panic() at kassert_panic+0x126/frame 0xfe20206315e0
vn_sendfile() at vn_sendfile+0x14ca/frame 0xfe2020631900
sys_sendfile() at sys_sendfile+0x11e/frame 0xfe20206319a0
amd64_syscall() at amd64_syscall+0x2db/frame 0xfe2020631ab0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe2020631ab0
--- syscall (393, FreeBSD ELF64, sys_sendfile), rip = 0x801ef062a, rsp = 0x7fffd8d8, rbp = 0x7fffe1d0 ---
KDB: enter: panic

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/carp.ko...Reading symbols from /usr/lib/debug//boot/kernel/carp.ko.debug...done.
done.
Loaded symbols for /boot/kernel/carp.ko
Reading symbols from /boot/kernel/ums.ko...Reading symbols from /usr/lib/debug//boot/kernel/ums.ko.debug...done.
done.
Loaded symbols for /boot/kernel/ums.ko
Reading symbols from /boot/kernel/tmpfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/tmpfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/tmpfs.ko
#0 doadump (textdump=0) at pcpu.h:221
221 __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) bt
#0 doadump (textdump=0) at pcpu.h:221
#1 0x80384a0b in db_dump (dummy=, dummy2=false, dummy3=0, dummy4=0x0) at /usr/src/sys/ddb/db_command.c:533
#2 0x803847fe in db_command (cmd_table=0x0) at /usr/src/sys/ddb/db_command.c:440
#3 0x80384594 in db_command_loop () at /usr/src/sys/ddb/db_command.c:493
#4 0x8038702b in db_trap (type=, code=0) at /usr/src/sys/ddb/db_main.c:251
#5 0x80a656e3 in kdb_trap (type=3, code=0, tf=) at /usr/src/sys/kern/subr_kdb.c:654
#6 0x80ea1298 in trap (frame=0xfe2020631420) at /usr/src/sys/amd64/amd64/trap.c:556
#7 0x80e81a77 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:234
#8 0x80a64dcb in kdb_enter (why=0x813b6c2f "panic", msg=0x80 ) at cpufunc.h:63
#9 0x80a27b5f in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:750
#10 0x80a279b6 in kassert_panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:647
#11 0x80a25efa in vn_sendfile (fp=, sockfd=1619, hdr_uio=, trl_uio=0x0, offset=0, nbytes=, sent=, flags=, kflags=, td=0xa8) at /usr/src/sys/kern/kern_sendfile.c:833
#12 0x80a2641e in sys_sendfile (td=0xf80253593000, uap=0xfe2020631a40) at file.h:382
#13 0x80ea214b in amd64_syscall (td=0xf80253593000, traced=0) at subr_syscall.c:135
#14 0x80e81d5b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:394
#15 0x000801ef062a in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language: auto; currently minimal
(kgdb) list *0x80a25efa
0x80a25efa is in vn_sendfile (/usr/src/sys/kern/kern_sendfile.c:833).
828 free(sfio, M_TEMP);
829 goto done;
830 }
831
832 /* Add the buffer chain to the socket buffer. */
833 KASSERT(m_length(m, NULL) == space + hdrlen,
834 ("%s: mlen %u space %d hdrlen %d",
835 __func__, m_length(m, NULL), space, hdrlen));
836
837 CURVNET_SET(so->so_vnet);

The system has 128 GB of memory and uses ZFS as its file system. It runs databases and serves web pages. The core dump has been saved.

The panic recurs periodically (every few hours to every few days).

Before this, an older CURRENT (about two years old) ran on this server without crashes.

Can anybody point me to a way to diagnose the problem in more depth, or to anything else useful?

Thank you.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
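The KASSERT at kern_sendfile.c:833 requires the mbuf chain length to equal space + hdrlen. As a minimal userland sketch (chain_length_ok is a hypothetical helper, not a kernel function), the values from the panic message can be plugged into that same condition to see why it fired:

```c
#include <stdbool.h>

/*
 * Userland model of the condition checked by the KASSERT in the listing:
 *     m_length(m, NULL) == space + hdrlen
 * chain_length_ok() is a hypothetical name used only for illustration.
 */
static bool
chain_length_ok(unsigned int mlen, int space, int hdrlen)
{
	return ((long)mlen == (long)space + (long)hdrlen);
}
```

With mlen 326, space -20 and hdrlen 326 the right-hand side is 306, so the check fails. A negative space is the unusual part of the report; the sketch only restates the assertion and does not explain how space became negative.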
CURRENT r296381 somewhere near tcp_detach?
Hello.

Just after reporting the panic somewhere in sendfile (http://docs.freebsd.org/cgi/getmsg.cgi?fetch=883140+0+current/freebsd-current) and disabling sendfile in the software (nginx), I got another kernel panic (at least twice so far).

System messages after reboot:

Mar 5 05:49:11 srv11 savecore: reboot after panic: tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp != NULL
Mar 5 05:49:11 srv11 savecore: writing core to /var/crash/vmcore.2

The output of kgdb kernel.debug /var/crash/vmcore.2 is:

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp != NULL
cpuid = 11
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe1f9d1f4730
vpanic() at vpanic+0x182/frame 0xfe1f9d1f47b0
kassert_panic() at kassert_panic+0x126/frame 0xfe1f9d1f4820
tcp_usr_detach() at tcp_usr_detach+0x1bc/frame 0xfe1f9d1f4850
sofree() at sofree+0x1a6/frame 0xfe1f9d1f4880
tcp_close() at tcp_close+0x11e/frame 0xfe1f9d1f48b0
tcp_timer_2msl() at tcp_timer_2msl+0x278/frame 0xfe1f9d1f48e0
softclock_call_cc() at softclock_call_cc+0x1af/frame 0xfe1f9d1f49c0
softclock() at softclock+0x47/frame 0xfe1f9d1f49e0
intr_event_execute_handlers() at intr_event_execute_handlers+0x96/frame 0xfe1f9d1f4a20
ithread_loop() at ithread_loop+0xa6/frame 0xfe1f9d1f4a70
fork_exit() at fork_exit+0x84/frame 0xfe1f9d1f4ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe1f9d1f4ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/carp.ko...Reading symbols from /usr/lib/debug//boot/kernel/carp.ko.debug...done.
done.
Loaded symbols for /boot/kernel/carp.ko
Reading symbols from /boot/kernel/ums.ko...Reading symbols from /usr/lib/debug//boot/kernel/ums.ko.debug...done.
done.
Loaded symbols for /boot/kernel/ums.ko
Reading symbols from /boot/kernel/tmpfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/tmpfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/tmpfs.ko
#0 doadump (textdump=0) at pcpu.h:221
221 __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) bt
#0 doadump (textdump=0) at pcpu.h:221
#1 0x80384a0b in db_dump (dummy=, dummy2=false, dummy3=0, dummy4=0x0) at /usr/src/sys/ddb/db_command.c:533
#2 0x803847fe in db_command (cmd_table=0x0) at /usr/src/sys/ddb/db_command.c:440
#3 0x80384594 in db_command_loop () at /usr/src/sys/ddb/db_command.c:493
#4 0x8038702b in db_trap (type=, code=0) at /usr/src/sys/ddb/db_main.c:251
#5 0x80a656e3 in kdb_trap (type=3, code=0, tf=) at /usr/src/sys/kern/subr_kdb.c:654
#6 0x80ea1298 in trap (frame=0xfe1f9d1f4660) at /usr/src/sys/amd64/amd64/trap.c:556
#7 0x80e81a77 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:234
#8 0x80a64dcb in kdb_enter (why=0x813b6c2f "panic", msg=0x80 ) at cpufunc.h:63
#9 0x80a27b5f in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:750
#10 0x80a279b6 in kassert_panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:647
#11 0x80bf9bbc in tcp_usr_detach (so=) at /usr/src/sys/netinet/tcp_usrreq.c:213
#12 0x80aad0b6 in sofree (so=0xf81820f89000) at /usr/src/sys/kern/uipc_socket.c:820
#13 0x80bf179e in tcp_close (tp=) at /usr/src/sys/netinet/tcp_subr.c:1496
#14 0x80bf72f8 in tcp_timer_2msl (xtp=0xf81650263820) at /usr/src/sys/netinet/tcp_timer.c:374
#15 0x80a3d72f in softclock_call_cc (c=0xf81650263b68, cc=0x81d2db80, direct=0) at /usr/src/sys/kern/kern_timeout.c:723
#16 0x80a3dae7 in softclock (arg=) at /usr/src/sys/kern/kern_timeout.c:861
#17 0x809ee7b6 in intr_event_execute_handlers (p=, ie=0xf80114558d00) at /usr/src/sys/kern/kern_intr.c:1262
#18 0x809eee46 in ithread_loop (arg=0xf8011452fac0) at /usr/src/sys/kern/kern_intr.c:1275
#19 0x809ec074 in fork_exit (callout=0x809eeda0 , arg=0xf8011452fac0, frame=0xfe1f9d1f4ac0) at /usr/src/sys/kern/kern_fork.c:1034
#20 0x80e81fae in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:609
#21 0x in ?? ()
Current language: auto; currently minimal

kernel.debug and c
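The panic string itself names the state the kernel refuses to accept: INP_TIMEWAIT and INP_DROPPED both set while tp is still non-NULL. A small userland model of that condition follows. The MODEL_ flag values and the function name are placeholders chosen for illustration; they are not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits; the values are placeholders, not the kernel's. */
#define MODEL_INP_TIMEWAIT 0x1
#define MODEL_INP_DROPPED  0x2

/*
 * Model of the state the panic message complains about: both flags set
 * while the tcpcb pointer is still non-NULL.
 * Returns true when the state is consistent, false for the panic case.
 */
static bool
detach_state_ok(int inp_flags, const void *tp)
{
	bool timewait = (inp_flags & MODEL_INP_TIMEWAIT) != 0;
	bool dropped = (inp_flags & MODEL_INP_DROPPED) != 0;

	return (!(timewait && dropped && tp != NULL));
}
```

In the backtrace the state is reached from tcp_timer_2msl() through tcp_close() and sofree(), so whatever leaves tp set is on the timer path rather than in a system call.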
Re: CURRENT r296381 panic in vn_sendfile (/usr/src/sys/kern/kern_sendfile.c:833)
Hello.

OK, about 3 hours with the latest patch and no panic.

The sysctl:

sysctl kern.ipc.sf_long_headers
kern.ipc.sf_long_headers: 1

Gleb Smirnoff wrote:
GS> Vitalij,
GS>
GS> here is latest version of the patch. If you already run the
GS> previous one, no need to switch to this one, keep running as is.
GS> The update covers only FreeBSD 4 and i386 compatibilties.
GS>
GS> current@, a review is appreciated. The patch not only fixes a
GS> recent bug, but also fixes a long standing problem that headers
GS> were not checked against socket buffer size. One could push
GS> unlimited data into sendfile() with headers. The patch also
GS> pushes also compat code under ifdef, so it is cut away if
GS> you aren't interested in COMPAT_FREEBSD4.
GS>
GS> On Wed, Mar 23, 2016 at 04:59:25PM -0700, Gleb Smirnoff wrote:
GS> T> Vitalij,
GS> T>
GS> T> although the first patch should fixup the panic, can you please
GS> T> instead run this one. And if it is possible, can you please
GS> T> monitor this sysctl:
GS> T>
GS> T> sysctl kern.ipc.sf_long_headers
GS> T>
GS> T>
GS> T> --
GS> T> Totus tuus, Glebius.
GS>
GS> T> Index: sys/kern/kern_descrip.c
GS> T> ===
GS> T> --- sys/kern/kern_descrip.c (revision 297217)
GS> T> +++ sys/kern/kern_descrip.c (working copy)
GS> T> @@ -3958,7 +3958,7 @@ badfo_chown(struct file *fp, uid_t uid, gid_t gid,
GS> T> static int
GS> T> badfo_sendfile(struct file *fp, int sockfd, struct uio *hdr_uio,
GS> T> struct uio *trl_uio, off_t offset, size_t nbytes, off_t *sent, int flags,
GS> T> -int kflags, struct thread *td)
GS> T> +struct thread *td)
GS> T> {
GS> T>
GS> T> return (EBADF);
GS> T> @@ -4044,7 +4044,7 @@ invfo_chown(struct file *fp, uid_t uid, gid_t gid,
GS> T> int
GS> T> invfo_sendfile(struct file *fp, int sockfd, struct uio *hdr_uio,
GS> T> struct uio *trl_uio, off_t offset, size_t nbytes, off_t *sent, int flags,
GS> T> -int kflags, struct thread *td)
GS> T> +struct thread *td)
GS> T> {
GS> T>
GS> T> return (EINVAL);
GS> T> Index: sys/kern/kern_sendfile.c
GS> T> ===
GS> T> --- sys/kern/kern_sendfile.c (revision 297217)
GS> T> +++ sys/kern/kern_sendfile.c (working copy)
GS> T> @@ -95,6 +95,7 @@ struct sendfile_sync {
GS> T> };
GS> T>
GS> T> counter_u64_t sfstat[sizeof(struct sfstat) / sizeof(uint64_t)];
GS> T> +static counter_u64_t sf_long_headers; /* QQQGL */
GS> T>
GS> T> static void
GS> T> sfstat_init(const void *unused)
GS> T> @@ -102,6 +103,7 @@ sfstat_init(const void *unused)
GS> T>
GS> T> COUNTER_ARRAY_ALLOC(sfstat, sizeof(struct sfstat) / sizeof(uint64_t),
GS> T> M_WAITOK);
GS> T> +sf_long_headers = counter_u64_alloc(M_WAITOK); /* QQQGL */
GS> T> }
GS> T> SYSINIT(sfstat, SI_SUB_MBUF, SI_ORDER_FIRST, sfstat_init, NULL);
GS> T>
GS> T> @@ -117,6 +119,8 @@ sfstat_sysctl(SYSCTL_HANDLER_ARGS)
GS> T> }
GS> T> SYSCTL_PROC(_kern_ipc, OID_AUTO, sfstat, CTLTYPE_OPAQUE | CTLFLAG_RW,
GS> T> NULL, 0, sfstat_sysctl, "I", "sendfile statistics");
GS> T> +SYSCTL_COUNTER_U64(_kern_ipc, OID_AUTO, sf_long_headers, CTLFLAG_RW,
GS> T> +&sf_long_headers, "times headers did not fit into socket buffer");
GS> T>
GS> T> /*
GS> T> * Detach mapped page and release resources back to the system. Called
GS> T> @@ -516,7 +520,7 @@ sendfile_getsock(struct thread *td, int s, struct
GS> T> int
GS> T> vn_sendfile(struct file *fp, int sockfd, struct uio *hdr_uio,
GS> T> struct uio *trl_uio, off_t offset, size_t nbytes, off_t *sent, int flags,
GS> T> -int kflags, struct thread *td)
GS> T> +struct thread *td)
GS> T> {
GS> T> struct file *sock_fp;
GS> T> struct vnode *vp;
GS> T> @@ -534,7 +538,7 @@ vn_sendfile(struct file *fp, int sockfd, struct ui
GS> T> so = NULL;
GS> T> m = mh = NULL;
GS> T> sfs = NULL;
GS> T> -sbytes = 0;
GS> T> +hdrlen = sbytes = 0;
GS> T> softerr = 0;
GS> T>
GS> T> error = sendfile_getobj(td, fp, &obj, &vp, &shmfd, &obj_size, &bsize);
GS> T> @@ -560,26 +564,6 @@ vn_sendfile(struct file *fp, int sockfd, struct ui
GS> T> cv_init(&sfs->cv, "sendfile");
GS> T> }
GS> T>
GS> T> -/* If headers are specified copy them into mbufs. */
GS> T> -if (hdr_uio != NULL && hdr_uio->uio_resid > 0) {
GS> T> -hdr_uio->uio_td = td;
GS> T> -hdr_uio->uio_rw = UIO_WRITE;
GS> T> -/*
GS> T> - * In FBSD < 5.0 the nbytes to send also included
GS> T> - * the header. If compat is specified subtract the
GS> T> - * header size from nbytes.
GS> T> - */
GS> T> -if (kflags & SFK_COMPAT) {
GS> T> -if (nbytes > hdr_uio->uio_resid)
GS> T> -nbytes -= hdr_u
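Gleb describes the long-standing problem as headers not being checked against the socket buffer size, and the patch adds a counter described as "times headers did not fit into socket buffer". The quoted patch is cut off before the new check, so what follows is only a userland sketch of that idea. The names hdr_fits and long_headers are hypothetical, and this is not the code from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the sf_long_headers counter. */
static uint64_t long_headers;

/*
 * Return true if a header of hdrlen bytes fits into sbspace bytes of free
 * socket buffer space. Otherwise count the event and return false.
 * A negative sbspace is treated as "nothing fits".
 */
static bool
hdr_fits(size_t hdrlen, long sbspace)
{
	if (sbspace < 0 || hdrlen > (size_t)sbspace) {
		long_headers++;
		return (false);
	}
	return (true);
}
```

A reading of 1 from kern.ipc.sf_long_headers, as reported above, would correspond to one call in which the header was larger than the space available.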
Re: CURRENT r296381 panic in vn_sendfile (/usr/src/sys/kern/kern_sendfile.c:833)
I forgot to mention: the system was upgraded to r296385 (just to sync with the other servers).

Vitalij Satanivskij wrote:
VS>
VS> Hello.
VS>
VS> OK about 3 hours with last patch
VS>
VS> No panic.
VS>
VS> Sysctl -
VS> sysctl kern.ipc.sf_long_headers
VS> kern.ipc.sf_long_headers: 1
VS>
VS>
VS> Gleb Smirnoff wrote:
VS> GS> Vitalij,
VS> GS>
VS> GS> here is latest version of the patch. If you already run the
VS> GS> previous one, no need to switch to this one, keep running as is.
VS> GS> The update covers only FreeBSD 4 and i386 compatibilties.
VS> GS>
VS> GS> current@, a review is appreciated. The patch not only fixes a
VS> GS> recent bug, but also fixes a long standing problem that headers
VS> GS> were not checked against socket buffer size. One could push
VS> GS> unlimited data into sendfile() with headers. The patch also
VS> GS> pushes also compat code under ifdef, so it is cut away if
VS> GS> you aren't interested in COMPAT_FREEBSD4.
VS> GS>
VS> GS> On Wed, Mar 23, 2016 at 04:59:25PM -0700, Gleb Smirnoff wrote:
VS> GS> T> Vitalij,
VS> GS> T>
VS> GS> T> although the first patch should fixup the panic, can you please
VS> GS> T> instead run this one. And if it is possible, can you please
VS> GS> T> monitor this sysctl:
VS> GS> T>
VS> GS> T> sysctl kern.ipc.sf_long_headers
VS> GS> T>
VS> GS> T>
VS> GS> T> --
VS> GS> T> Totus tuus, Glebius.
Patch from https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=179721 broke some applications (xterm, pidgin)
Hello.

After updating my system to 11.0-ALPHA2 #20 r301583, I found that at least some applications are broken.

Here is the backtrace for xterm:

#0 0x0008022d48b4 in mbsrtowcs_l () from /lib/libc.so.7
[New Thread 804816000 (LWP 102346/)]
(gdb) bt
#0 0x0008022d48b4 in mbsrtowcs_l () from /lib/libc.so.7
#1 0x0008022d1b4f in strcoll_l () from /lib/libc.so.7
#2 0x0008022d0ddf in __collate_range_cmp () from /lib/libc.so.7
#3 0x0008022cf6ce in vfscanf () from /lib/libc.so.7
#4 0x0008022b0114 in vsscanf () from /lib/libc.so.7
#5 0x0008022aee6d in sscanf () from /lib/libc.so.7
#6 0x004523a3 in ?? ()
#7 0x00430edd in ?? ()

For pidgin it looks the same.

It seems the patch does not fully handle all the cases where functions like __collate_range_cmp are used.

Manually rolling back the changes from http://svnweb.freebsd.org/base?view=revision&revision=301461 fixes the problem for now.
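The backtrace shows sscanf() reaching __collate_range_cmp() through vfscanf(), which suggests a conversion that uses a character range in a scanset. The sketch below is a minimal call of that shape. The format string and the input are assumptions, since the backtrace does not show what xterm actually passes, and scan_lower_prefix is a hypothetical name.

```c
#include <stdio.h>

/*
 * Copy the leading run of lowercase letters from 'in' into 'out',
 * which must hold at least 16 bytes. The "%15[a-z]" range scanset is
 * the kind of conversion that needs a range comparison inside scanf.
 * Returns the number of converted items: 1 on a match, 0 if the input
 * does not start with a lowercase letter.
 */
static int
scan_lower_prefix(const char *in, char *out)
{
	return (sscanf(in, "%15[a-z]", out));
}
```

To exercise the locale-dependent path rather than the C locale, a test program would presumably call setlocale(LC_ALL, "") from locale.h before calling scan_lower_prefix().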
lam broken in 313938 ?
Hello.

I found that after r313938 (Capsicum-ize lam), lam doesn't work.

portsnap auto
Looking up portsnap.FreeBSD.org mirrors... 6 mirrors found.
Fetching snapshot tag from your-org.portsnap.freebsd.org... done.
Fetching snapshot metadata... done.
Updating from Thu Feb 16 11:34:22 EET 2017 to Tue Feb 21 08:57:00 EET 2017.
Fetching 5 metadata patches.lam: unable to limit stdio: Capabilities insufficient
done.
Applying metadata patches... done.
Fetching 5 metadata files... lam: unable to limit stdio: Capabilities insufficient
/usr/sbin/portsnap: cannot open 789d9ed1b338af92d7dfd15adeebe34ecf15455ff60ca989ca07dea13d1fed8b.gz: No such file or directory
metadata is corrupt.

I checked this on a few machines running CURRENT.

Any suggestions?
dhclient cause up/down cycle after 239356 ?
Hi all,

After the last update my home machine began doing some strange things:

Aug 21 08:28:25 home kernel: fxp0: link state changed to UP
Aug 21 08:28:25 home kernel: fxp0: link state changed to DOWN
Aug 21 08:28:27 home kernel: fxp0: link state changed to UP
Aug 21 08:28:33 home dhclient: New IP Address (fxp0): xx.xx.xx.xx
Aug 21 08:28:33 home kernel: fxp0: link state changed to DOWN
Aug 21 08:28:33 home dhclient: New Subnet Mask (fxp0): 255.255.255.0
Aug 21 08:28:33 home dhclient: New Broadcast Address (fxp0): xx.xx.xx.255
Aug 21 08:28:33 home dhclient: New Routers (fxp0): xx.xx.xx.1
Aug 21 08:28:33 home dhclient[1395]: Interface fxp0 is down, dhclient exiting
Aug 21 08:28:33 home dhclient[1339]: connection closed
Aug 21 08:28:33 home dhclient[1339]: exiting.
Aug 21 08:28:35 home kernel: fxp0: link state changed to UP
Aug 21 08:28:35 home kernel: fxp0: link state changed to DOWN
Aug 21 08:28:37 home kernel: fxp0: link state changed to UP
Aug 21 08:28:40 home dhclient: New IP Address (fxp0): xx.xx.xx.xx
Aug 21 08:28:40 home kernel: fxp0: link state changed to DOWN
Aug 21 08:28:40 home dhclient: New Subnet Mask (fxp0): 255.255.255.0
Aug 21 08:28:40 home dhclient: New Broadcast Address (fxp0): xx.xx.xx.255
Aug 21 08:28:40 home dhclient: New Routers (fxp0): xx.xx.xx.1
Aug 21 08:28:40 home dhclient[1519]: Interface fxp0 is down, dhclient exiting
Aug 21 08:28:40 home dhclient[1465]: connection closed
Aug 21 08:28:40 home dhclient[1465]: exiting.
Aug 21 08:28:42 home kernel: fxp0: link state changed to UP
Aug 21 08:28:42 home kernel: fxp0: link state changed to DOWN
Aug 21 08:28:44 home kernel: fxp0: link state changed to UP
Aug 21 08:28:48 home dhclient: New IP Address (fxp0): xx.xx.xx.xx

I have the following configuration in rc.conf:

ifconfig_fxp0="SYNCDHCP"

In /etc/dhclient.conf:

interface "fxp0" {
    supersede domain-name "home";
    supersede domain-name-servers 127.0.0.1;
}

There is also /etc/start_if.fxp0, with this content:

ifconfig fxp0 ether xx:xx:xx:xx:xx:xx
ifconfig fxp0 -tso
ifconfig fxp0 polling

And yes, with a static IP on fxp0 there is no problem and no up/down sequence.
Re: dhclient cause up/down cycle after 239356 ?
Garrett Cooper wrote:
GC> On Tue, Aug 21, 2012 at 2:55 AM, Vitalij Satanivskij wrote:
GC> > Hi all,
GC> >
GC> > After last update my home machine begin doin some strange things -
GC>
GC> ...
GC>
GC> Try reverting r239356 -- if that works, then please let jhb@ know.
GC> -Garrett

Yes, I reverted it and everything is OK.

It looks like dhclient does a down/up sequence:

Aug 21 19:21:00 home kernel: fxp0: link state changed to UP
Aug 21 19:21:01 home kernel: fxp0: link state changed to DOWN
Aug 21 19:21:01 home dhclient: New IP Address (fxp0): xx.xx.xx.xx
Aug 21 19:21:01 home dhclient: New Subnet Mask (fxp0): 255.255.255.0
Aug 21 19:21:01 home dhclient: New Broadcast Address (fxp0): xx.xx.xx.xx
Aug 21 19:21:01 home dhclient: New Routers (fxp0): xx.xx.xx.xx
Aug 21 19:21:03 home kernel: fxp0: link state changed to UP

With r239356, when the interface goes down dhclient exits; then the interface comes up, dhclient starts, gets an address, brings the interface down (why?) and exits.

Before r239356 the interface just went down and up without dhclient exiting, and everything worked fine.
Re: dhclient cause up/down cycle after 239356 ?
OK, next round :)

dhclient updated to revision 239564, with fxp:

Aug 22 20:06:48 home kernel: fxp0: link state changed to DOWN
Aug 22 20:06:48 home dhclient: New Subnet Mask (fxp0): 255.255.255.0
Aug 22 20:06:48 home dhclient: New Broadcast Address (fxp0): xx.xx.xx.255
Aug 22 20:06:48 home dhclient: New Routers (fxp0): xx.xx.xx.1
Aug 22 20:06:50 home kernel: fxp0: link state changed to UP
Aug 22 20:06:53 home dhclient: New IP Address (fxp0): xx.xx.xx.xx
Aug 22 20:06:53 home kernel: fxp0: link state changed to DOWN
Aug 22 20:06:53 home dhclient: New Subnet Mask (fxp0): 255.255.255.0
Aug 22 20:06:53 home dhclient: New Broadcast Address (fxp0): xx.xx.xx.255
Aug 22 20:06:53 home dhclient: New Routers (fxp0): xx.xx.xx.xx
Aug 22 20:06:55 home kernel: fxp0: link state changed to UP
Aug 22 20:07:01 home dhclient: New IP Address (fxp0): xx.xx.xx.xx
Aug 22 20:07:01 home kernel: fxp0: link state changed to DOWN
Aug 22 20:07:01 home dhclient: New Subnet Mask (fxp0): 255.255.255.0
Aug 22 20:07:01 home dhclient: New Broadcast Address (fxp0): xx.xx.xx.255
Aug 22 20:07:01 home dhclient: New Routers (fxp0): xx.xx.xx.xx
Aug 22 20:07:03 home kernel: fxp0: link state changed to UP
Aug 22 20:07:07 home dhclient: New IP Address (fxp0): xx.xx.xx.xx
Aug 22 20:07:07 home kernel: fxp0: link state changed to DOWN
Aug 22 20:07:07 home dhclient: New Subnet Mask (fxp0): 255.255.255.0
Aug 22 20:07:07 home dhclient: New Broadcast Address (fxp0): xx.xx.xx.255
Aug 22 20:07:07 home dhclient: New Routers (fxp0): xx.xx.xx.xx
Aug 22 20:07:09 home kernel: fxp0: link state changed to UP
Aug 22 20:07:13 home dhclient: New IP Address (fxp0): xx.xx.xx.xx
Aug 22 20:07:13 home kernel: fxp0: link state changed to DOWN
Aug 22 20:07:13 home dhclient: New Subnet Mask (fxp0): 255.255.255.0
Aug 22 20:07:13 home dhclient: New Broadcast Address (fxp0): xx.xx.xx.255
Aug 22 20:07:13 home dhclient: New Routers (fxp0): xx.xx.xx.xx
Aug 22 20:07:15 home kernel: fxp0: link state changed to UP

ifconfig shows that the interface does not lose its IP address, but the link really is lost (for example, 10 of the ICMP packets cannot reach the destination).

Yes, my problem is easily fixed by changing the Ethernet card to em, but there are many motherboards with integrated Ethernet...

YongHyeon PYUN wrote:
YP> On Wed, Aug 22, 2012 at 08:27:01AM +1000, Peter Jeremy wrote:
YP> > On 2012-Aug-21 19:42:17 +0300, Vitalij Satanivskij wrote:
YP> > >Look's like dhclient do down/up sequence -
YP> >
YP> > Not intentionally.
YP> >
YP> > >Aug 21 19:21:00 home kernel: fxp0: link state changed to UP
YP> > >Aug 21 19:21:01 home kernel: fxp0: link state changed to DOWN
YP> > >Aug 21 19:21:01 home dhclient: New IP Address (fxp0): xx.xx.xx.xx
YP> > >Aug 21 19:21:01 home dhclient: New Subnet Mask (fxp0): 255.255.255.0
YP> > >Aug 21 19:21:01 home dhclient: New Broadcast Address (fxp0): xx.xx.xx.xx
YP> > >Aug 21 19:21:01 home dhclient: New Routers (fxp0): xx.xx.xx.xx
YP> > >Aug 21 19:21:03 home kernel: fxp0: link state changed to UP
YP> >
YP> > I can reproduce this behaviour - but only on fxp (i82559 in my case)
YP> > NICs. My bge (BCM5750) and rl (RTL8139) NICs do not report the
YP> > spurious DOWN/UP. (I don't normally run DHCP on any fxp interfaces,
YP> > so I didn't see it during my testing).
YP> >
YP> > The problem appears to be the
YP> > $IFCONFIG $interface inet alias 0.0.0.0 netmask 255.0.0.0 broadcast 255.255.255.255 up
YP> > executed by /sbin/dhclient-script during PREINIT. This is making the
YP> > fxp NIC reset the link (actually, assigning _any_ IP address to an fxp
YP> > NIC causes it to reset the link). The post r239356 dhclient detects
YP>
YP> This comes from the hardware limitation. Assigning addresses will
YP> result in programming multicast filter and fxp(4) controllers
YP> require full controller reset to reprogram the multicast filter.
YP>
YP> > the link going down and exits.
YP> >
YP> > >Before r239356 iface just doing down/up without dhclient exit and
YP> > >everything work fine.
YP> >
YP> > For you, anyway. Failing to detect link down causes problems for me
YP> > because my dhclient was not seeing my cable-modem resets and therefore
YP> > failing to reacquire a DHCP lease.
YP> >
YP> > --
YP> > Peter Jeremy
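In the logs above each DOWN on fxp0 is followed by UP about two seconds later, while Peter's cable-modem case needs a real link loss to be noticed. One way to separate the two is to wait a short grace period before acting on DOWN. The sketch below only illustrates that idea. It is not the dhclient code, and the function name and the grace period are assumptions.

```c
#include <stdbool.h>
#include <time.h>

/*
 * Decide whether a link-down event was a transient flap.
 * down_at: time the link went DOWN
 * up_at:   time the link came back UP, or (time_t)-1 if it has not
 * grace:   how many seconds of outage to tolerate (an assumed tunable)
 */
static bool
is_transient_flap(time_t down_at, time_t up_at, int grace)
{
	if (up_at == (time_t)-1 || up_at < down_at)
		return (false);
	return (difftime(up_at, down_at) <= (double)grace);
}
```

With a grace period of a few seconds, the fxp reset (DOWN at 20:06:48, UP at 20:06:50) would count as a flap, while a link that stays down would still be reported.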
panic after r244584
Hello.

After upgrading a server from old hardware/software to FreeBSD CURRENT (## SVN ## Exported commit - http://svnweb.freebsd.org/changeset/base/245479), the system hangs with this message:

panic: make_dev_alias_v: bad si_name (error=22 si_name=enc@n5003048000bab37d/tpe0/slot@1/elmdesc@Slot 01/pass7)

A screenshot can be found here: http://quad.org.ua/IMAG0055.jpg

The old system was FreeBSD 8.2-STABLE.

All hardware except the drives from the array was replaced. The drives have GPT labels and belong to a ZFS pool.

For now we have gone back to an older version of FreeBSD (HEAD dated Nov 26 2012).

As I understand it, the problem is the white space in si_name. Is there any way to fix this?

Interestingly, 3 other servers with the same FreeBSD version and hardware were upgraded without problems.

Another question: how/where can I see the si_name for my devices on a running system?
Re: panic after r244584
Alexander Motin wrote:
AM> On 18.01.2013 11:44, Gleb Smirnoff wrote:
AM> > On Fri, Jan 18, 2013 at 09:36:00AM +0200, Vitalij Satanivskij wrote:
AM> > V> After upgrading server from old hardware/software to freebsd current (## SVN ## Exported commit - http://svnweb.freebsd.org/changeset/base/245479),
AM> > V> system hung's with message -
AM> > V> panic: make_dev_alias_v: bad si_name (error=22 si_name=enc@n5003048000bab37d/tpe0/slot@1/elmdesc@Slot 01/pass7)
AM> >
AM> > EINVAL (22) is caused by space character in the si_name:
AM> >
AM> > si_name=enc@n5003048000bab37d/tpe0/slot@1/elmdesc@Slot 01/pass7
AM> >
AM> > I think Alexander (in Cc) has idea on why did that happen and how
AM> > should that be fixed.
AM>
AM> The panic is triggered by the check added by the recent r244584 change.
AM> The space in device name came from the enclosure device, and I guess it
AM> may be quite often situation. Using human readable name supposed to help
AM> system administrators, but with spaces banned that may be a problem.
AM>

That was not created by a human; it was generated (I think) by the system.

Maybe the problem is not in r244584 at all, but in incorrect generation of the si_name?

More info: the drives (actually all 36 of them have the same problem) are inserted in the backplane of a Supermicro chassis with "LSI CORP SAS2X36 0417" on board.

All of them are attached to an LSI SAS 9211-4i controller in HBA mode.
Re: panic after r244584
Jaakko Heinonen wrote:
JH> On 2013-01-18, Alexander Motin wrote:
JH> > > AM> > V> panic: make_dev_alias_v: bad si_name (error=22 si_name=enc@n5003048000bab37d/tpe0/slot@1/elmdesc@Slot 01/pass7)
JH>
JH> > > AM> The panic is triggered by the check added by the recent r244584 change.
JH> > > AM> The space in device name came from the enclosure device, and I guess it
JH> > > AM> may be quite often situation. Using human readable name supposed to help
JH> > > AM> system administrators, but with spaces banned that may be a problem.
JH> > >
JH> > > That's was not created by human, it was generated (I think so) by system.
JH> >
JH> > These strings are flashed into enclosure firmware by manufacturer.
JH>
JH> You can't rely on that any string can be safely used as a device name
JH> even if spaces were allowed. Consider for example duplicate names and
JH> "../".
JH>
JH> Where these names are generated? The original report didn't contain a
JH> backtrace.

Yes. There is no backtrace, because all debugging was switched off in the kernel.

For now I can't use that server for testing, but there are other servers waiting for an upgrade.

I will try to reproduce the problem with the kernel debugger enabled.
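Jaakko's point is that a firmware-supplied string can be unsafe as a device name for reasons beyond spaces, such as "../" components. A rough userland sketch of a check along those lines follows. name_is_safe is hypothetical and is not the check that r244584 added, and it does not address duplicate names, which need knowledge of the names already in use.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Reject names that contain whitespace, empty path components,
 * a trailing slash, or "." / ".." components.
 * Hypothetical helper for illustration only.
 */
static bool
name_is_safe(const char *name)
{
	const char *p = name;

	if (*p == '\0')
		return (false);
	while (*p != '\0') {
		size_t len = strcspn(p, "/");

		if (len == 0)
			return (false);		/* empty component, e.g. "a//b" */
		if ((len == 1 && p[0] == '.') ||
		    (len == 2 && p[0] == '.' && p[1] == '.'))
			return (false);		/* "." or ".." component */
		for (size_t i = 0; i < len; i++)
			if (isspace((unsigned char)p[i]))
				return (false);
		p += len;
		if (*p == '/') {
			p++;
			if (*p == '\0')
				return (false);	/* trailing slash */
		}
	}
	return (true);
}
```

Applied to the si_name from the panic, the check fails on the space in "elmdesc@Slot 01".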
Re: panic after r244584
Maybe just sanitize elmpriv->descr?

Something like changing whitespace to "_", or just deleting it?

Vitalij Satanivskij wrote:
VS> Jaakko Heinonen wrote:
VS> JH> On 2013-01-18, Alexander Motin wrote:
VS> JH> > > AM> > V> panic: make_dev_alias_v: bad si_name (error=22 si_name=enc@n5003048000bab37d/tpe0/slot@1/elmdesc@Slot 01/pass7)
VS> JH>
VS> JH> > > AM> The panic is triggered by the check added by the recent r244584 change.
VS> JH> > > AM> The space in device name came from the enclosure device, and I guess it
VS> JH> > > AM> may be quite often situation. Using human readable name supposed to help
VS> JH> > > AM> system administrators, but with spaces banned that may be a problem.
VS> JH> > >
VS> JH> > > That's was not created by human, it was generated (I think so) by system.
VS> JH> >
VS> JH> > These strings are flashed into enclosure firmware by manufacturer.
VS> JH>
VS> JH> You can't rely on that any string can be safely used as a device name
VS> JH> even if spaces were allowed. Consider for example duplicate names and
VS> JH> "../".
VS> JH>
VS> JH> Where these names are generated? The original report didn't contain a
VS> JH> backtrace.
VS>
VS> Yes. No backtrace, because of switching off all debuging in kernel.
VS>
VS> For now I can't use that's server for testing, but there are another servers waiting for upgrade.
VS>
VS> I will try to reproduce problem with kernel debuger enabled.
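The suggestion above, replacing whitespace in the element descriptor with "_", can be sketched in a few lines of userland C. sanitize_descr is a hypothetical helper operating on a plain string; it is not an existing kernel function, and where such a step would go in the driver is not shown.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Replace every whitespace character in 'descr' with '_', in place.
 * Returns the number of characters replaced.
 * Hypothetical helper for illustration only.
 */
static size_t
sanitize_descr(char *descr)
{
	size_t replaced = 0;

	for (char *p = descr; *p != '\0'; p++) {
		if (isspace((unsigned char)*p)) {
			*p = '_';
			replaced++;
		}
	}
	return (replaced);
}
```

Replacing keeps the name readable ("Slot 01" becomes "Slot_01"), but two distinct descriptors such as "Slot 01" and "Slot_01" would then map to the same name, so the mapping is not collision-free.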
Re: panic after r244584
Alexander Motin wrote:
AM> On 18.01.2013 15:49, Vitalij Satanivskij wrote:
AM> > May be just do sanitizing for elmpriv->descr?
AM> >
AM> > something like change whitespace to "_" or just delete it?
AM>
AM> Yes, that is not difficult. The only question is how to stay consistent,
AM> compatible, user-readable.
AM>

OK, now I have a kernel dump:

kgdb /boot/kernel/kernel vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
da0 at mps0 bus 0 scbus7 target 8 lun 0
da0: Fixed Direct Access SCSI-6 device
da0: 300.000MB/s transfers
da0: Command Queueing enabled
da0: 476940MB (976773168 512 byte sectors: 255H 63S/T 60801C)
panic: make_dev_alias_v: bad si_name (error=22, si_name=enc@n5003048000baa87d/type@0/slot@a/elmdesc@Slot 10/pass7)
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xff9b9ec84760
kdb_backtrace() at kdb_backtrace+0x39/frame 0xff9b9ec84810
vpanic() at vpanic+0x127/frame 0xff9b9ec84850
panic() at panic+0x43/frame 0xff9b9ec848b0
make_dev_alias_v() at make_dev_alias_v+0x1d0/frame 0xff9b9ec84900
make_dev_alias_p() at make_dev_alias_p+0x37/frame 0xff9b9ec84960
make_dev_physpath_alias() at make_dev_physpath_alias+0x14a/frame 0xff9b9ec849c0
pass_add_physpath() at pass_add_physpath+0xbd/frame 0xff9b9ec849f0
taskqueue_run_locked() at taskqueue_run_locked+0xf0/frame 0xff9b9ec84a40
taskqueue_thread_loop() at taskqueue_thread_loop+0x6c/frame 0xff9b9ec84a70
fork_exit() at fork_exit+0x84/frame 0xff9b9ec84ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xff9b9ec84ab0
--- trap 0, rip = 0, rsp = 0xff9b9ec84b70, rbp = 0 ---
KDB: enter: panic
Re: panic after r244584
And of course, the backtrace:

(kgdb) bt
#0 doadump (textdump=0) at pcpu.h:229
#1 0x8034002e in db_dump (dummy=, dummy2=0, dummy3=0, dummy4=0x0) at /usr/src/sys/ddb/db_command.c:543
#2 0x8033fada in db_command (last_cmdp=, cmd_table=, dopager=1) at /usr/src/sys/ddb/db_command.c:449
#3 0x8033f892 in db_command_loop () at /usr/src/sys/ddb/db_command.c:502
#4 0x80342240 in db_trap (type=, code=0) at /usr/src/sys/ddb/db_main.c:231
#5 0x808b9753 in kdb_trap (type=3, code=0, tf=) at /usr/src/sys/kern/subr_kdb.c:654
#6 0x80c0d3b8 in trap (frame=0xff9b9ec84740) at /usr/src/sys/amd64/amd64/trap.c:579
#7 0x80bf6512 in calltrap () at exception.S:228
#8 0x808b8f3e in kdb_enter (why=0x80e7adb1 "panic", msg=) at cpufunc.h:63
#9 0x80885a47 in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:746
#10 0x80885ab3 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:682
#11 0x8083add0 in make_dev_alias_v (flags=, cdev=0xfe0031b78cd0, pdev=, fmt=, ap=0xff9b9ec84940) at /usr/src/sys/kern/kern_conf.c:925
#12 0x8083ae27 in make_dev_alias_p (flags=-1631041792, cdev=0x80, pdev=0x80e72a0a, fmt=0x80 ) at /usr/src/sys/kern/kern_conf.c:968
#13 0x8083af7a in make_dev_physpath_alias (flags=8, cdev=0xfe0031b78cd0, pdev=0xfe042bb8f000, old_alias=0x0, physpath=) at /usr/src/sys/kern/kern_conf.c:1025
#14 0x80308b7d in pass_add_physpath (context=0xfe04fe563a00, pending=) at /usr/src/sys/cam/scsi/scsi_pass.c:258
#15 0x808c8050 in taskqueue_run_locked (queue=0xfe002fddf800) at /usr/src/sys/kern/subr_taskqueue.c:312
#16 0x808c87ec in taskqueue_thread_loop (arg=) at /usr/src/sys/kern/subr_taskqueue.c:501
#17 0x80855444 in fork_exit (callout=0x808c8780 , arg=0x81502690, frame=0xff9b9ec84ac0) at /usr/src/sys/kern/kern_fork.c:991
#18 0x80bf6a4e in fork_trampoline () at exception.S:602
#19 0x in ?? ()
Current language: auto; currently minimal
(kgdb)

What can I do next to investigate the problem?
Re: panic after r244584
Jaakko Heinonen wrote:
JH> On 2013-01-18, Alexander Motin wrote:
JH> > At cam/scsi/ses_set_physpath.c ses_set_physpath(). Duplicate names are
JH> > impossible there, as previous name components are unique. Special
JH> > characters haven't yet seen, but I think theoretically possible.
JH>
JH> I see two possible solutions for the problem.
JH>
JH> 1) Replace non-printable, space and '/' characters for example with '_'.
JH>    '/' should be replaced anyway.
JH>
JH> 2) Apply the patches in
JH>    http://lists.freebsd.org/pipermail/svn-src-all/2013-January/063661.html
JH>    to allow spaces again. I haven't committed the patches because I
JH>    think that there isn't full consensus that it's right thing to do and
JH>    also I personally prefer not to have spaces in device names.

After the patch was applied, the problem went away.

da0 at mps0 bus 0 scbus7 target 8 lun 0
da0: Fixed Direct Access SCSI-6 device
da0: 300.000MB/s transfers
da0: Command Queueing enabled
da0: 476940MB (976773168 512 byte sectors: 255H 63S/T 60801C)
ses1: da0,pass7: Element descriptor: 'Slot 08'
ses1: da0,pass7: SAS Device Slot Element: 1 Phys at Slot 7
ses1: phy 0: SATA device
ses1: phy 0: parent 5003048000baa87f addr 5003048000baa853

JH>
JH> --
JH> Jaakko
Re: panic after r244584
Jaakko Heinonen wrote:
JH> On 2013-01-19, Jaakko Heinonen wrote:
JH> > 1) Replace non-printable, space and '/' characters for example with '_'.
JH> >    '/' should be replaced anyway.
JH>
JH> Here's a patch to implement 1:
JH>
JH> http://people.freebsd.org/~jh/patches/scsi_enc_ses-si_name.diff

OK, that patch works too.

ses1: da0,pass5,probe8: Element descriptor: 'Slot 08'
ses1: da0,pass5,probe8: SAS Device Slot Element: 1 Phys at Slot 7
ses1: phy 0: SATA device
ses1: phy 0: parent 5003048000baa87f addr 5003048000baa853
da0 at mps0 bus 0 scbus7 target 8 lun 0
da0: Fixed Direct Access SCSI-6 device
da0: 300.000MB/s transfers
da0: Command Queueing enabled
da0: 476940MB (976773168 512 byte sectors: 255H 63S/T 60801C)
Re: panic after r244584
Vitalij Satanivskij wrote:
VS> Ok that patch work's too.

Is there any chance that one of these patches will be merged to head?
Re: panic after r244584
Jaakko Heinonen wrote:
JH> On 2013-01-23, Vitalij Satanivskij wrote:
JH> > VS> JH> http://people.freebsd.org/~jh/patches/scsi_enc_ses-si_name.diff
JH> > VS>
JH> > VS> Ok that patch work's too.
JH> >
JH> > Is there any chance, that one of this patches will be merged to head?
JH>
JH> Committed as r245891. Thanks for reporting and testing!
JH>

Thank you all for the quick help in solving the problem.
Re: ZFS L2ARC - incorrect size and abnormal system load on r255173
Data on the pool has a compressratio of around 1.4.

On different servers with the same data type and load, L2 ARC Size: (Adaptive) can differ,

for example 1.04TiB vs 1.45TiB.

But they all have the same problem: it grows over time.

Stranger still for us:

ARC: 80G Total, 4412M MFU, 5040M MRU, 76M Anon, 78G Header, 2195M Other

A 78G header size, and also abnormal:

kstat.zfs.misc.arcstats.l2_cksum_bad: 210920592
kstat.zfs.misc.arcstats.l2_io_error: 7362414

These sysctls grow every second.

All parts of the server (the hardware parts) are in a normal state.

After a reboot there are no problems for some period, until the cache size grows to some limit.

Mark Felder wrote:
MF> On Mon, Oct 7, 2013, at 13:09, Dmitriy Makarov wrote:
MF> >
MF> > How can L2 ARC Size: (Adaptive) be 1.44 TiB (up) with total physical size
MF> > of L2ARC devices 490GB?
MF> >
MF>
MF> http://svnweb.freebsd.org/base?view=revision&revision=251478
MF>
MF> L2ARC compression perhaps?
Re: ZFS L2ARC - incorrect size and abnormal system load on r255173
One more question.

We have two counters:

kstat.zfs.misc.arcstats.l2_size: 1256609410560
kstat.zfs.misc.arcstats.l2_asize: 1149007667712

Can anybody explain how to interpret them? I.e., is l2_asize the real space used on the L2ARC and l2_size the uncompressed size,

or maybe something else?
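If the reading suggested in the question holds (l2_size is the logical, uncompressed size and l2_asize is the space actually allocated on the cache devices; this is an assumption here, not something confirmed in the thread), the two counters give an effective compression ratio. A minimal sketch in C, with a made-up function name:

```c
#include <stdint.h>

/*
 * Hypothetical helper: ratio of logical bytes to allocated bytes,
 * assuming l2_size is logical and l2_asize is allocated.
 * Returns 0.0 when nothing is allocated.
 */
double
l2arc_ratio(uint64_t l2_size, uint64_t l2_asize)
{
	if (l2_asize == 0)
		return (0.0);
	return ((double)l2_size / (double)l2_asize);
}
```

For the two values above, l2arc_ratio(1256609410560, 1149007667712) comes out at a little over 1.09, noticeably below the pool-wide compressratio of around 1.4 mentioned earlier.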
Assertion in zdb?
Hello.

System: 10.0-CURRENT FreeBSD 10.0-CURRENT #2 r255173

While trying to get some statistics from zdb with

zdb -dd disk1 > stat.log

I get an assertion:

Assertion failed: object_count == usedobjs (0x85727 == 0x3aa93d), file /usr/src/cddl/usr.sbin/zdb/../../../cddl/contrib/opensolaris/cmd/zdb/zdb.c, line 1767.
zsh: abort (core dumped) zdb -dd disk1 > stat.log

Maybe somebody has an idea what this can be and how big a problem it is (or whether it is a problem at all)?
Re: ZFS L2ARC - incorrect size and abnormal system load on r255173
The same situation happened again yesterday :(

What confuses me while trying to understand where I'm wrong:

First, some info.

We have a zfs pool "POOL" and one more zfs on it, "POOL/zfs".

POOL - has only primarycache enabled ("ALL")
POOL/zfs - has both primary and secondary cache set to "ALL"

POOL has compression=lz4

POOL/zfs has none

POOL - has around 9TB of data

POOL/zfs - has 1TB

The secondary cache has this configuration:

cache
  gpt/cache0  ONLINE  0  0  0
  gpt/cache1  ONLINE  0  0  0
  gpt/cache2  ONLINE  0  0  0

gpt/cache0-2 are Intel SSD SSDSC2BW180A4 180GB.

So the full real size for L2 is 540GB (really 489GB).

First question: will data on the L2ARC be compressed or not?

Second: in the stats we see

L2 ARC Size: (Adaptive) 2.08TiB

Earlier it was 1.1, 1.4 ...

So a) how can the cache be bigger than the zfs itself?
   b) in case it's not compressed (the answer to the first question), how can it be bigger than the real SSD size?

One more comment: if the L2ARC size grows above the physical size, I see the following stats

kstat.zfs.misc.arcstats.l2_cksum_bad: 50907344
kstat.zfs.misc.arcstats.l2_io_error: 4547377

and growing.

System is r255173 with patch from rr255173

Finally, maybe somebody has an idea of what really happened...
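The size question in this message can be checked with simple arithmetic. The sketch below treats the "489gb" figure as GiB and borrows the pool-wide compressratio of about 1.4 from the earlier message as a generous upper bound; both are assumptions, and the function name is made up for illustration.

```c
/*
 * Hypothetical helper: largest logical amount of data (GiB) that
 * could fit on cache devices of device_gib GiB if every block
 * compressed by max_ratio.
 */
double
l2arc_max_logical_gib(double device_gib, double max_ratio)
{
	return (device_gib * max_ratio);
}
```

With 489 GiB of cache and a ratio of 1.4 the bound is roughly 685 GiB, while the reported 2.08 TiB is about 2130 GiB. So under these assumptions compression alone cannot account for the reported L2 ARC size.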
Re: Assertion in zdb?
Hello.

Yes, the load on the machine (on the filesystem) is very heavy.

OK, thank you for the useful information.

Richard Todd wrote:
RT> Vitalij Satanivskij writes:
RT>
RT> > Hello.
RT> >
RT> > System - 10.0-CURRENT FreeBSD 10.0-CURRENT #2 r255173
RT> >
RT> > While trying to get some statistics from zdb
RT> >
RT> > zdb -dd disk1 > stat.log
RT> >
RT> > get some assertion:
RT> >
RT> > Assertion failed: object_count == usedobjs (0x85727 == 0x3aa93d), file /usr/src/cddl/usr.sbin/zdb/../../../cddl/contrib/opensolaris/cmd/zdb/zdb.c, line 1767.
RT> > zsh: abort (core dumped) zdb -dd disk1 > stat.log
RT> >
RT> > Maybe somebody have any idea about what's it's can be and how big problem it's (or not a problem at all)?
RT>
RT> Probably not a problem unless it happens reliably when you try it multiple
RT> times. Since zdb looks at the raw disks, if the filesystem/zpool is active,
RT> zdb can easily read bits of the zpool metadata off the disks at different
RT> times and thus see an inconsistent state. Hence trying to get stats out of
RT> zdb always carries a certain risk of not working.
Re: ZFS L2ARC - incorrect size and abnormal system load on r255173
Hm, another strange thing on another server:

zfs-stats -L

ZFS Subsystem Report                            Thu Oct 10 12:56:54 2013

L2 ARC Summary: (DEGRADED)
        Passed Headroom:                        8.34m
        Tried Lock Failures:                    145.66m
        IO In Progress:                         9.76m
        Low Memory Aborts:                      526
        Free on Write:                          1.70m
        Writes While Full:                      29.28k
        R/W Clashes:                            341.30k
        Bad Checksums:                          865.91k
        IO Errors:                              44.19k
        SPA Mismatch:                           32.03m

L2 ARC Size: (Adaptive)                         189.28 GiB
        Header Size:                    4.88%   9.24GiB

It looks like the size has nothing to do with the IO errors.

So the question is: when can errors like Bad Checksums and IO Errors happen?

It looks like there are no hardware problems.

All SSDs are attached to the onboard Intel SATA controller (the motherboard is a Supermicro X9SRL-F).
Re: ZFS L2ARC - incorrect size and abnormal system load on r255173
AJ> Some background on L2ARC compression for you:
AJ>
AJ> http://wiki.illumos.org/display/illumos/L2ARC+Compression

I have already seen it.

AJ> http://svnweb.freebsd.org/base?view=revision&revision=251478
AJ>
AJ> Are you sure that compression on pool/zfs is off? it would normally
AJ> inherit from the parent, so double check with: zfs get compression pool/zfs

Yes, compression is turned off on pool/zfs; it was rechecked many times.

AJ> Is the data on pool/zfs related to the data on the root pool? if
AJ> pool/zfs were a clone, and the data is actually used in both places, the
AJ> newer 'single copy ARC' feature may come in to play:
AJ> https://www.illumos.org/issues/3145

No, pool and pool/zfs hold different types of data. pool/zfs was created as a new empty zfs (zfs create pool/zfs) and the data was written to it from another server.

Right now one machine works fine with L2ARC. This machine runs without the patch for correcting ashift on cache devices. It has been working for at least 3 days with zero errors.

Other servers with the same config, similar data, load and so on began reporting errors after 2 days of work.

AJ>
AJ> --
AJ> Allan Jude
Re: ZFS secondarycache on SSD problem on r255173
Hello.

The patch broke cache functionality.

Look at Dmitriy's mail from Mon, 07 Oct 2013 21:09:06 +0300

with the subject "ZFS L2ARC - incorrect size and abnormal system load on r255173".

As the patch is already in head and BETA, that's not good.

Yesterday we updated one machine to beta1 and forgot about the patch. So 12 hours later the cache was broken... :((

Dmitriy Makarov wrote:
DM> The attached patch by Steven Hartland fixes issue for me too. Thank you!
DM>
DM>
DM> --- Original message ---
DM> From: "Steven Hartland" < kill...@multiplay.co.uk >
DM> Date: 18 September 2013, 01:53:10
DM>
DM> - Original Message -
DM> From: "Justin T. Gibbs" <
DM>
DM> ---
DM> Дмитрий Макаров
Re: ZFS secondarycache on SSD problem on r255173
Yes.

We have 15 servers, and all of them had the problem while running with the ashift patch, so we rolled back the patch (for r255173)
and all of them worked for a week without that problem. Yesterday one of the servers was updated to stable/10 (beta1),

which includes the patch, and after around 12 hours of work the L2ARC began to get errors in these counters:

kstat.zfs.misc.arcstats.l2_cksum_bad
kstat.zfs.misc.arcstats.l2_io_error

For now the patch is disabled in our production.

Please note that we have a very heavy load on the ZFS pool, so the 90GB ARC and the 3x180GB L2ARC take a very large number of hits.

The SSDs used for the caches are Intel SSD 530 series; SMART for all devices is in a normal state,
with no bad values.

Steven Hartland wrote:
SH> Have you confirmed the ashift changes are the actual cause of this
SH> by backing out just those changes and retesting on the same hardware.
SH>
SH> Also worth checking your disks smart values to confirm there are no
SH> visible signs of HW errors.
SH>
SH> Regards
SH> Steve
SH>
SH> - Original Message -
SH> From: "Vitalij Satanivskij"
SH> To: "Dmitriy Makarov"
SH> Cc: "Steven Hartland" ; "Justin T. Gibbs" ; "Borja Marcos" ;
SH> Sent: Wednesday, October 16, 2013 9:01 AM
SH> Subject: Re: ZFS secondarycache on SSD problem on r255173
Re: ZFS secondarycache on SSD problem on r255173
Steven Hartland wrote:
SH> I'm not clear what you rolled back there as r255173 has nothing to do
SH> with this. Could you clarify

r255173 with your patch from the email dated Tue, 17 Sep 2013 23:53:12 +0100
with the subject "Re: ZFS secondarycache on SSD problem on r255173".

The errors we get are in the arcstats counters, not in messages, and were
described some time ago in mails by me and Dmitriy Makarov with the subject
"ZFS L2ARC - incorrect size and abnormal system load on r255173".

On r255173 without the patch and with vfs.zfs.max_auto_ashift=9, when 2 SSDs
were added to the pool as caches we get:

cache
  gpt/cache1  ONLINE  0 0 0  block size: 512B configured, 4096B native
  gpt/cache2  ONLINE  0 0 0  block size: 512B configured, 4096B native

We saw the same message with the default vfs.zfs.max_auto_ashift.

Will wait some time to see how it works.

SH> Any errors recorded in /var/log/messages?
SH>
SH> Could you add code to record the non-zero value of zio->io_error in
SH> l2arc_read_done as this may give some indication of the underlying
SH> issue.
SH>
SH> Additionally you could always put a panic in that code path too and then
SH> create a dump so the details can be fully examined.
SH>
SH> In terms of the slowness, that's going to be a side effect of the cache
SH> failures.
SH>
SH> Oh, could you also confirm that the issue doesn't exist if you
SH> 1. Exclude r255753
SH> 2. Set vfs.zfs.max_auto_ashift=9
SH>
SH> Regards
SH> Steve
SH> ----- Original Message -----
SH> From: "Vitalij Satanivskij"
SH> To: "Steven Hartland"
SH> Cc: "Vitalij Satanivskij" ; "Dmitriy Makarov" ; "Justin T. Gibbs" ; "Borja Marcos" ;
SH> Sent: Wednesday, October 16, 2013 3:10 PM
SH> Subject: Re: ZFS secondarycache on SSD problem on r255173
SH>
SH> > Yes
SH> >
SH> > We have 15 servers, all of them had the problem while running with the ashift patch,
SH> > so we rolled back the patch (for r255173) and all of them have worked for a week
SH> > without that problem.
SH> >
SH> > Yesterday one of the servers was updated to stable/10 (beta1), which includes
SH> > the patch, and after around 12 hours of work l2arc began to get errors like these:
SH> >
SH> > kstat.zfs.misc.arcstats.l2_cksum_bad
SH> > kstat.zfs.misc.arcstats.l2_io_error
SH> >
SH> > For now the patch is disabled in our production.
SH> >
SH> > Please note we have a very heavy load on the zfs pool, so the 90GB ARC and
SH> > 3x180GB L2ARC get a very large number of hits.
SH> >
SH> > The SSDs used for the caches are Intel SSD 530 series; SMART for all devices
SH> > is in a normal state, no bad values on them.
SH> >
SH> > Steven Hartland wrote:
SH> > SH> Have you confirmed the ashift changes are the actual cause of this
SH> > SH> by backing out just those changes and retesting on the same hardware.
SH> > SH>
SH> > SH> Also worth checking your disks' SMART values to confirm there are no
SH> > SH> visible signs of HW errors.
SH> > SH>
SH> > SH> Regards
SH> > SH> Steve
Re: ZFS secondarycache on SSD problem on r255173
Hello.

The problem description is in
http://lists.freebsd.org/pipermail/freebsd-current/2013-October/045088.html

As we found later, the problem first begins with the error counters in arcstats, then the size of l2 grows abnormally.

After rolling back the patch everything is OK.

Justin T. Gibbs wrote:
JTG> You'll have to be more specific. I don't have that email or know what list on which to search.
JTG>
JTG> Thanks,
JTG> Justin
JTG>
JTG> On Oct 16, 2013, at 2:01 AM, Vitalij Satanivskij wrote:
JTG>
JTG> > Hello.
JTG> >
JTG> > The patch broke cache functionality.
JTG> >
JTG> > Look at Dmitriy's mail from Mon, 07 Oct 2013 21:09:06 +0300
JTG> >
JTG> > with the subject "ZFS L2ARC - incorrect size and abnormal system load on r255173"
JTG> >
JTG> > As the patch is already in head and BETA, that's not good.
JTG> >
JTG> > Yesterday we updated one machine to beta1 and forgot about the patch. So 12 hours and the cache is broken... :((
JTG> >
JTG> > Dmitriy Makarov wrote:
JTG> > DM> The attached patch by Steven Hartland fixes issue for me too. Thank you!
JTG> > DM>
JTG> > DM> --- Original message ---
JTG> > DM> From: "Steven Hartland" < kill...@multiplay.co.uk >
JTG> > DM> Date: 18 September 2013, 01:53:10
JTG> > DM>
JTG> > DM> - Original Message -
JTG> > DM> From: "Justin T. Gibbs" <
JTG> > DM>
JTG> > DM> ---
JTG> > DM> Дмитрий Макаров
Re: ZFS secondarycache on SSD problem on r255173
Hello.

The SSD is an Intel SSD 530 series (INTEL SSDSC2BW180A4 DC12).

The controller is the onboard Intel SATA controller; the motherboard is a Supermicro X9SRL-F, so it's the Intel C602 chipset.

All cache SSDs are connected to SATA 2 ports.

The system has an LSI MPS controller (SAS2308) with firmware version 16.00.00.00, but only HDDs (36 1TB WD RE4 drives) are connected to it.

Steven Hartland wrote:
SH> Ohh stupid question, what hardware are you running this on,
SH> specifically what SSDs and what controller and, if relevant,
SH> what controller firmware version?
SH>
SH> I wonder if you might have bad HW / FW, such as older LSI
SH> mps firmware, which is known to cause corruption with
SH> some delete methods.
SH>
SH> Without the ashift fixes, you'll be sending short requests
SH> to the disk which will then ignore them, so I can see a
SH> potential correlation there.
SH>
SH> You could rule this out by disabling ZFS TRIM with
SH> vfs.zfs.trim.enabled=0 in /boot/loader.conf
SH>
SH> Regards
SH> Steve
Re: ZFS secondarycache on SSD problem on r255173
Just to be sure I understand you clearly, I need to test the following configuration:

1) System with the ashift patch, e.g. just the latest stable/10 revision
2) vfs.zfs.trim.enabled=0 in /boot/loader.conf

So really the only difference from the default system configuration is that TRIM is disabled?

Steven Hartland wrote:
SH> Still worth testing with the problem version installed but
SH> with trim disabled to see if that clears the issues, if
SH> nothing else it will confirm / deny if trim is involved.
SH>
SH> Regards
SH> Steve
SH>
SH> ----- Original Message -----
SH> From: "Vitalij Satanivskij"
SH> To: "Steven Hartland"
SH> Cc: "Justin T. Gibbs" ; "Vitalij Satanivskij" ; ; "Borja Marcos" ; "Dmitriy Makarov"
SH> Sent: Thursday, October 17, 2013 7:12 AM
SH> Subject: Re: ZFS secondarycache on SSD problem on r255173
Re: ZFS secondarycache on SSD problem on r255173
Hello.

Yesterday the system was rebooted with vfs.zfs.trim.enabled=0

System version: 10.0-BETA1 FreeBSD 10.0-BETA1 #6 r256669, without any changes in code

Uptime: 10:51 up 16:41

sysctl vfs.zfs.trim.enabled
vfs.zfs.trim.enabled: 0

Around 2 hours ago the error counters

kstat.zfs.misc.arcstats.l2_cksum_bad: 854359
kstat.zfs.misc.arcstats.l2_io_error: 38254

began to grow from zero.

After removing the cache devices

2013-10-18.10:37:10 zpool remove disk1 gpt/cache0 gpt/cache1 gpt/cache2

and attaching them again

2013-10-18.10:38:28 zpool add disk1 cache gpt/cache0 gpt/cache1 gpt/cache2

the counters stopped growing (of course they are not zeroed).

Before the cache removal kstat.zfs.misc.arcstats.l2_asize was around 280GB.

The hardware size of the l2 cache is 3x164G:

=>        34  351651821  ada3  GPT  (168G)
          34          6        - free -  (3.0K)
          40    8388608     1  zil2  (4.0G)
     8388648  343263200     2  cache2  (164G)
   351651848          7        - free -  (3.5K)

Any hypothesis on what else we can test/try, etc.?

Steven Hartland wrote:
SH> Correct.
SH> ----- Original Message -----
SH> From: "Vitalij Satanivskij"
SH>
SH> > Just to be sure I understand you clearly, I need to test the following configuration:
SH> >
SH> > 1) System with the ashift patch, e.g. just the latest stable/10 revision
SH> > 2) vfs.zfs.trim.enabled=0 in /boot/loader.conf
SH> >
SH> > So really the only difference from the default system configuration is that TRIM is disabled?
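The check described in this message (watching whether l2_cksum_bad and l2_io_error keep growing after the cache devices are re-added) could be sketched as a comparison of two counter snapshots. This is only an illustration and not something posted in the thread: the helper name `growing_counters` and the second snapshot are hypothetical, while the first snapshot reuses the values quoted above.

```python
# Hypothetical helper: report which ARC error counters grew between two
# snapshots of `sysctl kstat.zfs.misc.arcstats` (already parsed into dicts).

WATCHED = ("l2_cksum_bad", "l2_io_error")


def growing_counters(before, after, watched=WATCHED):
    """Return {name: delta} for the watched counters whose value increased."""
    deltas = {}
    for name in watched:
        delta = after.get(name, 0) - before.get(name, 0)
        if delta > 0:
            deltas[name] = delta
    return deltas


# Values reported in the message above.
snapshot_1 = {"l2_cksum_bad": 854359, "l2_io_error": 38254}
# Hypothetical later snapshot: counters unchanged after re-adding the cache.
snapshot_2 = {"l2_cksum_bad": 854359, "l2_io_error": 38254}

print(growing_counters(snapshot_1, snapshot_2) or "counters are stable")
```

Because the counters are cumulative and are not reset by `zpool remove` / `zpool add`, only the difference between snapshots says anything about whether errors are still occurring.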
Re: ZFS secondarycache on SSD problem on r255173
Right now the stats are not very representative because of another test.

The test is simple: all gpart information was destroyed on the SSDs and they are used as raw cache devices. Just

2013-10-18.11:30:49 zpool add disk1 cache /dev/ada1 /dev/ada2 /dev/ada3

So at least the l2_size and l2_asize sizes are not representative.

But here it is:

kstat.zfs.misc.arcstats.hits: 5178174063
kstat.zfs.misc.arcstats.misses: 57690806
kstat.zfs.misc.arcstats.demand_data_hits: 313995744
kstat.zfs.misc.arcstats.demand_data_misses: 37414740
kstat.zfs.misc.arcstats.demand_metadata_hits: 4719242892
kstat.zfs.misc.arcstats.demand_metadata_misses: 9266394
kstat.zfs.misc.arcstats.prefetch_data_hits: 1182495
kstat.zfs.misc.arcstats.prefetch_data_misses: 9951733
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 143752935
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 1057939
kstat.zfs.misc.arcstats.mru_hits: 118609738
kstat.zfs.misc.arcstats.mru_ghost_hits: 1895486
kstat.zfs.misc.arcstats.mfu_hits: 4914673425
kstat.zfs.misc.arcstats.mfu_ghost_hits: 14537497
kstat.zfs.misc.arcstats.allocated: 103796455
kstat.zfs.misc.arcstats.deleted: 40168100
kstat.zfs.misc.arcstats.stolen: 20832742
kstat.zfs.misc.arcstats.recycle_miss: 15663428
kstat.zfs.misc.arcstats.mutex_miss: 1456781
kstat.zfs.misc.arcstats.evict_skip: 25960184
kstat.zfs.misc.arcstats.evict_l2_cached: 891379153920
kstat.zfs.misc.arcstats.evict_l2_eligible: 50578438144
kstat.zfs.misc.arcstats.evict_l2_ineligible: 956055729664
kstat.zfs.misc.arcstats.hash_elements: 8693451
kstat.zfs.misc.arcstats.hash_elements_max: 14369414
kstat.zfs.misc.arcstats.hash_collisions: 90967764
kstat.zfs.misc.arcstats.hash_chains: 1891463
kstat.zfs.misc.arcstats.hash_chain_max: 24
kstat.zfs.misc.arcstats.p: 73170954752
kstat.zfs.misc.arcstats.c: 85899345920
kstat.zfs.misc.arcstats.c_min: 42949672960
kstat.zfs.misc.arcstats.c_max: 85899345920
kstat.zfs.misc.arcstats.size: 85899263104
kstat.zfs.misc.arcstats.hdr_size: 1425948696
kstat.zfs.misc.arcstats.data_size: 77769994240
kstat.zfs.misc.arcstats.other_size: 6056233632
kstat.zfs.misc.arcstats.l2_hits: 21725934
kstat.zfs.misc.arcstats.l2_misses: 35876251
kstat.zfs.misc.arcstats.l2_feeds: 130197
kstat.zfs.misc.arcstats.l2_rw_clash: 110181
kstat.zfs.misc.arcstats.l2_read_bytes: 391282009600
kstat.zfs.misc.arcstats.l2_write_bytes: 1098703347712
kstat.zfs.misc.arcstats.l2_writes_sent: 130037
kstat.zfs.misc.arcstats.l2_writes_done: 130037
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 375921
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 331
kstat.zfs.misc.arcstats.l2_evict_reading: 43
kstat.zfs.misc.arcstats.l2_free_on_write: 255730
kstat.zfs.misc.arcstats.l2_abort_lowmem: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 854359
kstat.zfs.misc.arcstats.l2_io_error: 38254
kstat.zfs.misc.arcstats.l2_size: 136696884736
kstat.zfs.misc.arcstats.l2_asize: 131427690496
kstat.zfs.misc.arcstats.l2_hdr_size: 742951208
kstat.zfs.misc.arcstats.l2_compress_successes: 5565311
kstat.zfs.misc.arcstats.l2_compress_zeros: 0
kstat.zfs.misc.arcstats.l2_compress_failures: 0
kstat.zfs.misc.arcstats.l2_write_trylock_fail: 325157131
kstat.zfs.misc.arcstats.l2_write_passed_headroom: 4897854
kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 115704249
kstat.zfs.misc.arcstats.l2_write_in_l2: 15114214372
kstat.zfs.misc.arcstats.l2_write_io_in_progress: 63417
kstat.zfs.misc.arcstats.l2_write_not_cacheable: 3291593934
kstat.zfs.misc.arcstats.l2_write_full: 47672
kstat.zfs.misc.arcstats.l2_write_buffer_iter: 130197
kstat.zfs.misc.arcstats.l2_write_pios: 130037
kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 369077156457472
kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 8015080
kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 79825
kstat.zfs.misc.arcstats.memory_throttle_count: 0
kstat.zfs.misc.arcstats.duplicate_buffers: 0
kstat.zfs.misc.arcstats.duplicate_buffers_size: 0
kstat.zfs.misc.arcstats.duplicate_reads: 0

The values of

kstat.zfs.misc.arcstats.l2_cksum_bad: 854359
kstat.zfs.misc.arcstats.l2_io_error: 38254

have not grown since the last cache reconfiguration. Will just wait some time to see; maybe the problem disappears :)

Steven Hartland wrote:
SH> Hmm so that rules out a TRIM related issue. I wonder if the
SH> increase in ashift has triggered a problem in compression.
SH>
SH> What are all the values reported by:
SH> sysctl -a kstat.zfs.misc.arcstats
SH>
SH> Regards
SH> Steve
SH>
SH> ----- Original Message -----
SH> From: "Vitalij Satanivskij"
SH> To: "Steven Hartland"
SH> Cc: ; "Justin T. Gibbs" ; ; "Borja Marcos" ; "Dmitriy Makarov"
SH> Sent: Friday, October 18, 2013 9:01 AM
SH> Subject: Re: ZFS secondarycache on SSD problem on r255173
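A dump like the one above is easier to reason about once it is parsed. The following Python sketch is not from the thread; it assumes the `name: value` layout shown in the sysctl output, and the function names `parse_arcstats` and `l2_summary` are hypothetical. Treating `l2_cksum_bad / l2_hits` as "fraction of L2 reads that failed the checksum" is also an assumption about how the counters relate.

```python
# Sketch: parse `sysctl kstat.zfs.misc.arcstats` text and derive L2ARC ratios.

PREFIX = "kstat.zfs.misc.arcstats."

# A few lines copied from the message above.
SAMPLE = """\
kstat.zfs.misc.arcstats.l2_hits: 21725934
kstat.zfs.misc.arcstats.l2_misses: 35876251
kstat.zfs.misc.arcstats.l2_cksum_bad: 854359
kstat.zfs.misc.arcstats.l2_io_error: 38254
"""


def parse_arcstats(text):
    """Turn 'kstat.zfs.misc.arcstats.name: value' lines into {name: int}."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(PREFIX) or ":" not in line:
            continue
        name, _, value = line[len(PREFIX):].partition(":")
        stats[name.strip()] = int(value)
    return stats


def l2_summary(stats):
    """Hypothetical summary: L2 hit ratio and bad-checksum share of L2 hits."""
    lookups = stats["l2_hits"] + stats["l2_misses"]
    return {
        "l2_hit_ratio": stats["l2_hits"] / lookups if lookups else 0.0,
        # Assumption: every checksum failure belongs to one counted L2 hit.
        "cksum_bad_per_hit": (
            stats["l2_cksum_bad"] / stats["l2_hits"] if stats["l2_hits"] else 0.0
        ),
    }


summary = l2_summary(parse_arcstats(SAMPLE))
for key, value in summary.items():
    print(f"{key}: {value:.4f}")
```

Since all of these counters are cumulative since boot, the ratios describe the whole uptime and not the state after the last cache reconfiguration.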
Re: ZFS secondarycache on SSD problem on r255173
OK. The system has just been rebooted with your patch.

TRIM is enabled again.

Will wait some time until the size of the used cache grows.

Steven Hartland wrote:
SH> Looking at the l2arc compression code I believe that metadata is always
SH> compressed with lz4, even if compression is off on all datasets.
SH>
SH> This is backed up by what I'm seeing on my system here as it shows a
SH> non-zero l2_compress_successes value even though I'm not using
SH> compression at all.
SH>
SH> I think we may well need the following patch to set the minblock
SH> size based on the vdev ashift and not SPA_MINBLOCKSIZE.
SH>
SH> svn diff -x -p sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
SH> Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
SH> ===
SH> --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c (revision 256554)
SH> +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c (working copy)
SH> @@ -5147,7 +5147,7 @@ l2arc_compress_buf(l2arc_buf_hdr_t *l2hdr)
SH>   len = l2hdr->b_asize;
SH>   cdata = zio_data_buf_alloc(len);
SH>   csize = zio_compress_data(ZIO_COMPRESS_LZ4, l2hdr->b_tmp_cdata,
SH> -     cdata, l2hdr->b_asize, (size_t)SPA_MINBLOCKSIZE);
SH> +     cdata, l2hdr->b_asize, (size_t)(1ULL << l2hdr->b_dev->l2ad_vdev->vdev_ashift));
SH>
SH>   if (csize == 0) {
SH>   /* zero block, indicate that there's nothing to write */
SH>
SH> Could you try this patch on your system Vitalij, see if it has any effect
SH> on the number of l2_cksum_bad / l2_io_error?
SH>
SH> Regards
SH> Steve
SH> ----- Original Message -----
SH> From: "Vitalij Satanivskij"
SH> To: "Steven Hartland"
SH> Cc: "Vitalij Satanivskij" ; "Dmitriy Makarov" ; "Justin T. Gibbs" ; "Borja Marcos" ;
SH> Sent: Friday, October 18, 2013 3:45 PM
SH> Subject: Re: ZFS secondarycache on SSD problem on r255173
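To see why the minimum block size passed to the compressor might matter on a cache device with a 4096-byte native sector, here is a small Python sketch. It is not the kernel code: it only assumes that the compressed length is padded up to a multiple of the minimum block size, and the helper names `roundup` and `padded_size` are hypothetical. `SPA_MINBLOCKSIZE` is 512 bytes; `1 << ashift` is the expression used in the patch quoted above.

```python
# Sketch: padding a compressed buffer to the minimum block size.

SPA_MINBLOCKSIZE = 512  # 1 << 9, the ZFS-wide minimum block size


def roundup(size, align):
    """Round size up to the next multiple of align (align is a power of two)."""
    return (size + align - 1) & ~(align - 1)


def padded_size(compressed_len, ashift=None):
    """Size after padding to the minimum block size.

    ashift=None models padding to SPA_MINBLOCKSIZE; otherwise the minimum
    block is 1 << ashift, as in the proposed patch.
    """
    minblock = SPA_MINBLOCKSIZE if ashift is None else 1 << ashift
    return roundup(compressed_len, minblock)


# Hypothetical compressed lengths, for illustration only.
for clen in (700, 3000, 5000):
    old = padded_size(clen)
    new = padded_size(clen, ashift=12)
    print(f"len={clen} pad512={old} aligned_to_4k={old % 4096 == 0} pad4k={new}")
```

Under that assumption, padding to 512 bytes can yield sizes that are not multiples of 4096, which would not match what an ashift=12 device expects; padding to `1 << ashift` always yields whole device blocks.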
Re: ZFS secondarycache on SSD problem on r255173
Right now I cannot say, as to trigger the problem we need at least 200+GB in l2arc, which usually takes one production day to grow.

But for some reason the server was rebooted this morning, so the cache was flushed and is now only 100GB.

Need to wait some more time.

At least for now there are no errors on l2.

Steven Hartland wrote:
SH> How's things looking Vitalij?
SH>
SH> ----- Original Message -----
SH> From: "Vitalij Satanivskij"
SH>
SH> > OK. The system has just been rebooted with your patch.
SH> >
SH> > TRIM is enabled again.
SH> >
SH> > Will wait some time until the size of the used cache grows.
Re: ZFS secondarycache on SSD problem on r255173
Steven Hartland wrote:
SH> So previously you only started seeing l2 errors after there was
SH> a significant amount of data in l2arc? That's interesting in itself
SH> if that's the case.

Yes, something around 200+GB.

SH> I wonder if it's the type of data, or something similar. Do you
SH> run compression on any of your volumes?
SH> zfs get compression

Right now testing runs on the following configuration:

the first zfs is the top-level pool called disk1, which has lz4 compression enabled and secondarycache = metadata

the next zfs is disk1/data with compression=off and secondarycache = all

The error was seen on a configuration like that, and also on a configuration where secondarycache = none was set for disk1 (disk1/data still fully cached).

SH> Regards
SH> Steve
SH> ----- Original Message -----
SH> From: "Vitalij Satanivskij"
SH>
SH> > Right now I cannot say, as to trigger the problem we need at least 200+GB in l2arc, which usually takes one production day to grow.
SH> >
SH> > But for some reason the server was rebooted this morning, so the cache was flushed and is now only 100GB.
SH> >
SH> > Need to wait some more time.
SH> >
SH> > At least for now there are no errors on l2.
How to debug whats cause to much __mtx_lock_sleep in system
Hello.

We have 10.0-BETA1 #7 r256765 with terrible load ("load averages: 23.31, 30.53, 31") which degrades more and more with time.

The kernel is compiled with dtrace support, and using the script called hotkernel from DTraceToolkit-0.99 we found some strange statistics:

zfs.ko`lz4_compress              5045   0.2%
kernel`0x80                      5185   0.2%
kernel`uma_zalloc_arg            5302   0.2%
kernel`bcopy                     5322   0.2%
kernel`_sx_xlock                 7310   0.3%
kernel`_sx_xunlock               7434   0.3%
zfs.ko`l2arc_feed_thread         9797   0.4%
zfs.ko`lzjb_compress             9912   0.4%
zfs.ko`list_prev                17894   0.7%
kernel`__rw_wlock_hard          30522   1.2%
kernel`spinlock_exit            31310   1.3%
kernel`acpi_cpu_c1             103495   4.1%
kernel`_sx_xlock_hard          138743   5.5%
kernel`vmem_xalloc             175869   7.0%
kernel`cpu_idle                371159  14.8%
kernel`__mtx_lock_sleep       1345815  53.8%

There is another identical machine with similar data and usage but with old CURRENT r245701, which has no problems with load:

zfs.ko`fletcher_4_native         2366   0.1%
kernel`uma_zfree_arg             2387   0.1%
zfs.ko`lzjb_decompress           2392   0.1%
kernel`__rw_rlock                2477   0.1%
zfs.ko`dmu_zfetch                2553   0.1%
kernel`bcopy                     3035   0.1%
kernel`vm_page_splay             3089   0.1%
kernel`_mtx_trylock_flags_       3346   0.2%
kernel`bzero                     3411   0.2%
kernel`0x80                      3665   0.2%
kernel`_sx_xunlock               3818   0.2%
kernel`uma_zalloc_arg            4216   0.2%
kernel`vmtotal                   4702   0.2%
kernel`_sx_xlock                 5117   0.2%
kernel`free                      5476   0.2%
zfs.ko`lzjb_compress             6674   0.3%
kernel`spinlock_exit            21590   1.0%
kernel`__mtx_lock_sleep         40819   1.9%
kernel`acpi_cpu_c1             311077  14.1%
kernel`cpu_idle               1639418  74.6%

Both servers have the same hardware and the same software, except of course the system version.

So what is the right way to investigate the problem and find a resolution?
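One way to compare the two hotkernel profiles is to parse both and rank symbols by how much their share grew on the loaded host. The sketch below is only an illustration, not a tool from the thread: it assumes the three-column `symbol count pct%` layout of the hotkernel output, the function names are hypothetical, and only a few rows from each listing are included.

```python
# Sketch: rank kernel symbols by the growth of their sample share
# between a healthy host and a loaded host.

SLOW = """\
kernel`acpi_cpu_c1 103495 4.1%
kernel`_sx_xlock_hard 138743 5.5%
kernel`vmem_xalloc 175869 7.0%
kernel`cpu_idle 371159 14.8%
kernel`__mtx_lock_sleep 1345815 53.8%
"""

FAST = """\
kernel`__mtx_lock_sleep 40819 1.9%
kernel`acpi_cpu_c1 311077 14.1%
kernel`cpu_idle 1639418 74.6%
"""


def parse_hotkernel(text):
    """Parse 'module`function count pct%' lines into {symbol: (count, pct)}."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or not parts[2].endswith("%"):
            continue
        symbol, count, pct = parts
        result[symbol] = (int(count), float(pct.rstrip("%")))
    return result


def biggest_increases(slow, fast, top=3):
    """Return (delta_pct, symbol) pairs, largest increase on the slow host first."""
    diffs = []
    for symbol, (_, pct) in slow.items():
        baseline = fast.get(symbol, (0, 0.0))[1]
        diffs.append((pct - baseline, symbol))
    diffs.sort(reverse=True)
    return diffs[:top]


for delta, symbol in biggest_increases(parse_hotkernel(SLOW), parse_hotkernel(FAST)):
    print(f"{symbol}: {delta:+.1f} percentage points")
```

hotkernel only shows where samples land; the ranking suggests which symbols to look at first (here the mutex sleep path and the vmem allocator) but does not by itself show which lock is contended.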
Re: ZFS secondarycache on SSD problem on r255173
OK, up to now there are no errors on l2arc.

L2 ARC Summary: (HEALTHY)
  Passed Headroom:      1.99m
  Tried Lock Failures:  144.53m
  IO In Progress:       130.15k
  Low Memory Aborts:    7
  Free on Write:        335.56k
  Writes While Full:    30.31k
  R/W Clashes:          115.31k
  Bad Checksums:        0
  IO Errors:            0
  SPA Mismatch:         153.15m

L2 ARC Size: (Adaptive)  433.75 GiB
  Header Size:  0.49%    2.12 GiB

I will test for a longer time, but it looks like the problem is gone.
Re: ZFS L2ARC - incorrect size and abnormal system load on r255173
First of all, thank you for the help.

As for the high load on the system, it looks like the L2ARC problems have little impact on load compared to other things that are not yet fully classified.

It looks like our internal software and the libraries it uses don't like the new VMEM subsystem; at least, system behavior is completely different from a 6-month-older CURRENT.

So for now there are no problems with L2ARC errors.

I will try to understand the reason for the load and fix it, or at least ask for help again ^).

Steven Hartland wrote:
SH> First off I just wanted to clarify that you don't need to enable compression on the
SH> dataset for L2ARC to use LZ4 compression; it does this by default and is
SH> not currently configurable.
SH>
SH> Next up I believe we've found the cause of this high load and I've just
SH> committed the fix to head:
SH> http://svnweb.freebsd.org/base?view=revision&sortby=file&revision=256889
SH>
SH> Thanks to Vitalij for testing :)
SH>
SH> Dmitriy if you could test on your side too that would be appreciated.
SH>
SH> Regards
SH> Steve
SH>
SH> ----- Original Message -----
SH> From: "Vitalij Satanivskij"
SH> To: "Allan Jude"
SH> Cc:
SH> Sent: Thursday, October 10, 2013 6:03 PM
SH> Subject: Re: ZFS L2ARC - incorrect size and abnormal system load on r255173
SH>
SH> > AJ> Some background on L2ARC compression for you:
SH> > AJ>
SH> > AJ> http://wiki.illumos.org/display/illumos/L2ARC+Compression
SH> >
SH> > I have already seen it.
SH> >
SH> > AJ> http://svnweb.freebsd.org/base?view=revision&revision=251478
SH> > AJ>
SH> > AJ> Are you sure that compression on pool/zfs is off? it would normally
SH> > AJ> inherit from the parent, so double check with: zfs get compression pool/zfs
SH> >
SH> > Yes, compression is turned off on pool/zfs; it was rechecked many times.
SH> >
SH> > AJ> Is the data on pool/zfs related to the data on the root pool? if
SH> > AJ> pool/zfs were a clone, and the data is actually used in both places, the
SH> > AJ> newer 'single copy ARC' feature may come in to play:
SH> > AJ> https://www.illumos.org/issues/3145
SH> >
SH> > No, pool and pool/zfs hold different types of data. pool/zfs was created as a new empty
SH> > dataset (zfs create pool/zfs) and the data was written to it from another server.
SH> >
SH> > Right now one machine works fine with L2ARC. This machine is without the patch for
SH> > correcting ashift on cache devices.
SH> >
SH> > It has been working for at least 3 days with zero errors. Other servers with the same
SH> > config, similar data, load and so on began reporting errors after 2 days of work.
SH> >
SH> > AJ> --
SH> > AJ> Allan Jude

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: ZFS L2ARC - incorrect size and abnormal system load on r255173
Hello.

Bad news: today two servers with the patches applied got a degraded L2ARC.

L2 ARC Summary: (DEGRADED)
        Passed Headroom:                        4.17m
        Tried Lock Failures:                    635.53m
        IO In Progress:                         41.89k
        Low Memory Aborts:                      8
        Free on Write:                          1.03m
        Writes While Full:                      12.95k
        R/W Clashes:                            405.05k
        Bad Checksums:                          362.19k
        IO Errors:                              45.60k
        SPA Mismatch:                           526.37m

L2 ARC Size: (Adaptive)                         391.03 GiB
        Header Size:                    0.59%   2.30 GiB

So it looks like the problem has not disappeared ^(
Re: ZFS L2ARC - incorrect size and abnormal system load on r255173
Just after a system reboot with this patch I found:

kstat.zfs.misc.arcstats.l2_compress_successes: 6083
kstat.zfs.misc.arcstats.l2_compress_zeros: 1
kstat.zfs.misc.arcstats.l2_compress_failures: 296

Compression on the test pool (where I'm testing this patch) is lz4.

So is it OK?

Steven Hartland wrote:
SH> If you are still seeing high load try commenting out the following
SH> which should disable l2arc compression.
SH> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
SH>         if (l2arc_compress)
SH>                 hdr->b_flags |= ARC_L2COMPRESS;
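For readers who want to check these counters mechanically, here is a minimal Python sketch. It assumes the counters arrive as plain "name: value" lines, as in the message above; the helper names parse_kstat and compression_active are hypothetical and are not part of any FreeBSD tool.

```python
# Sketch only: parse sysctl-style "name: value" lines and check whether any
# L2ARC compression counter is non-zero. Helper names are hypothetical.

SAMPLE = """\
kstat.zfs.misc.arcstats.l2_compress_successes: 6083
kstat.zfs.misc.arcstats.l2_compress_zeros: 1
kstat.zfs.misc.arcstats.l2_compress_failures: 296
"""

COMPRESS_KEYS = ("l2_compress_successes", "l2_compress_zeros", "l2_compress_failures")


def parse_kstat(text):
    """Return {short_name: int} for every 'name: integer' line in text."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            # Keep only the last dotted component, e.g. "l2_compress_zeros".
            stats[name.strip().rsplit(".", 1)[-1]] = int(value)
    return stats


def compression_active(stats):
    """True if any compression counter moved, i.e. compression is still in use."""
    return any(stats.get(key, 0) > 0 for key in COMPRESS_KEYS)


stats = parse_kstat(SAMPLE)
attempts = sum(stats[key] for key in COMPRESS_KEYS)
print(compression_active(stats), attempts)
```

With the values quoted above the counters are non-zero, which is consistent with the suspicion that the patch did not actually disable L2ARC compression.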
Re: ZFS L2ARC - incorrect size and abnormal system load on r255173
So that patch does not disable compression on L2ARC?

Steven Hartland wrote:
SH> I would have expected zero for all l2_compress values.
SH>
SH> Regards
SH> Steve
Re: ZFS L2ARC - incorrect size and abnormal system load on r255173
I didn't apply the previous patch on the high-load server, only on the test server, where I found that it does not disable compression.

Thank you for the help; I will try the new patch as soon as possible.

SH> Have you seen any l2_io_error or l2_cksum_bad since
SH> applying the ashift patch?
Re: ZFS L2ARC - incorrect size and abnormal system load on r255173
Hello.

The patch worked as expected (no compression on L2), but:

a) L2ARC causes a big overhead on filesystem I/O
b) after some time the errors appear again.

For now the best choice is not to use any patch on the system, so we have removed the following changes from the system:

1) http://svnweb.freebsd.org/base?view=revision&sortby=file&revision=256889
2) http://svnweb.freebsd.org/base?view=revision&revision=255753

Yes, zpool status says:

        cache
          gpt/cache0    ONLINE       0     0     0  block size: 512B configured, 4096B native
          gpt/cache1    ONLINE       0     0     0  block size: 512B configured, 4096B native
          gpt/cache2    ONLINE       0     0     0  block size: 512B configured, 4096B native

But L2ARC works as expected (no errors, no noticeable performance problem).

There are many other performance problems on FreeBSD 10 after the new vmem subsystem was implemented (at least there were no problems before), but they are not related to L2.

I don't even know what to do with this situation.
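As a side note on the "512B configured, 4096B native" lines: ZFS expresses a vdev's sector size as ashift, the base-2 logarithm of the size in bytes. The following Python sketch only illustrates that arithmetic; the function name ashift_for is hypothetical, and nothing here queries a real pool.

```python
# Sketch only: relate the block sizes printed by zpool status to ashift values.
# ashift is log2 of the sector size in bytes; ashift_for is a hypothetical helper.

def ashift_for(block_size):
    """Return log2(block_size) for a positive power-of-two size in bytes."""
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block size must be a positive power of two")
    return block_size.bit_length() - 1


configured, native = 512, 4096  # values from the zpool status output above
print(ashift_for(configured), ashift_for(native))
print(native // configured, "configured blocks per native sector")
```

So the cache devices above run with ashift 9 on hardware that reports a 4096-byte (ashift 12) native sector, which is the mismatch the ashift patch discussed in this thread was meant to address.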
Re: ARC "pressured out", how to control/stabilize ? (reformatted to text/plain)
Dear Andriy and FreeBSD community,

Andriy Gapon wrote:
AG> on 14/01/2014 07:27 Vladimir Sharun said the following:
AG> > Dear Andriy and FreeBSD community,
AG> >
AG> >> I am not sure if the buffers are leaked somehow or if they are actually in use.
AG> >> It's one of the very few places where data buffers are allocated without
AG> >> charging ARC. In all other places it's quite easy to match allocations and
AG> >> deallocations. But in L2ARC it is not obvious that all buffers get freed or
AG> >> when that happens.
AG> >
AG> > After one week under load I think we figure out the cause: it's L2ARC.
AG> > Here's the top's header for 7d17h of the runtime:
AG> >
AG> > last pid: 46409; load averages: 0.37, 0.62, 0.70 up 7+17:14:01 07:24:10
AG> > 173 processes: 1 running, 171 sleeping, 1 zombie
AG> > CPU: 2.0% user, 0.0% nice, 3.5% system, 0.4% interrupt, 94.2% idle
AG> > Mem: 8714M Active, 14G Inact, 96G Wired, 1929M Cache, 3309M Buf, 3542M Free
AG> > ARC: 85G Total, 2558M MFU, 77G MRU, 28M Anon, 1446M Header, 4802M Other
AG> >
AG> > ARC related tunables:
AG> >
AG> > vm.kmem_size="110G"
AG> > vfs.zfs.arc_max="90G"
AG> > vfs.zfs.arc_min="42G"
AG> >
AG> > For more than 7 days of hard runtime the picture clearly shows:
AG> > Wired minus ARC = 11..12Gb, ARC grow and shrinks in 80-87Gb range and the
AG> > system runs just fine.
AG> >
AG> > So what shall we do with L2ARC leakage ?
AG>
AG> Could you please try this patch
AG> http://cr.illumos.org/~webrev/skiselkov/3995/illumos-gate.patch ?

While applying the patch to the current version of arc.c (r260622) I ran into the following compilation problem:

olaris/uts/common/fs/zfs/arc.c -o arc.o
/usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4628:18: error: use of undeclared identifier 'abl2'
        trim_map_free(abl2->b_dev->l2ad_vdev, abl2->b_daddr,
                      ^
1 error generated.
*** Error code 1

The code is:

        if (zio->io_error != 0) {
                /*
                 * Error - drop L2ARC entry.
                 */
                list_remove(buflist, ab);
                ARCSTAT_INCR(arcstat_l2_asize, -l2hdr->b_asize);
                ab->b_l2hdr = NULL;
                trim_map_free(abl2->b_dev->l2ad_vdev, abl2->b_daddr,
                    ab->b_size, 0);
                kmem_free(l2hdr, sizeof (l2arc_buf_hdr_t));
                ARCSTAT_INCR(arcstat_l2_size, -ab->b_size);
        }

It looks like this part is a FreeBSD-specific change.

Can somebody help with this part of the code?
Re: ARC "pressured out", how to control/stabilize ? (reformatted to text/plain)
Dear Andriy and FreeBSD community,

AG> The first hunk of the patch is renaming of abl2 to l2hdr.

So is it OK to just change

        trim_map_free(abl2->b_dev->l2ad_vdev, abl2->b_daddr, ab->b_size, 0);

to

        trim_map_free(l2hdr->b_dev->l2ad_vdev, l2hdr->b_daddr, ab->b_size, 0);

?

OK. Thank you. I will try this patch.
Re: ARC "pressured out", how to control/stabilize ? (reformatted to text/plain)
Dear Andriy and FreeBSD community,

Build world with the patch failed with this error:

/usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4642:13: error: use of undeclared identifier 'l2hdr'
        ASSERT3P(l2hdr->b_tmp_cdata, ==, NULL);
                 ^
/usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/sys/debug.h:125:40: note: expanded from macro 'ASSERT3P'
#define ASSERT3P(x, y, z) VERIFY3_IMPL(x, y, z, uintptr_t)
                                       ^
/usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/sys/debug.h:109:29: note: expanded from macro 'VERIFY3_IMPL'
        const TYPE __left = (TYPE)(LEFT); \
                                   ^
1 error generated.
*** Error code 1

Vladimir Sharun wrote:
VS> Dear Andriy and FreeBSD community,
VS>
VS> L2ARC temporarily turned off by setting secondarycache=none everywhere it was enabled,
VS> so no more leak for one particular day.
VS>
VS> Here's the top header:
VS> last pid: 89916; load averages: 2.49, 2.91, 2.89 up 5+19:21:42 14:09:12
VS> 561 processes: 2 running, 559 sleeping
VS> CPU: 5.7% user, 0.0% nice, 14.0% system, 1.0% interrupt, 79.3% idle
VS> Mem: 23G Active, 1017M Inact, 98G Wired, 1294M Cache, 3285M Buf, 1997M Free
VS> ARC: 69G Total, 3498M MFU, 59G MRU, 53M Anon, 1651M Header, 4696M Other
VS> Swap:
VS>
VS> Here's the calculated vmstat -z (only the allocations that exceed 100*1024^2 are printed):
VS> UMA Slabs:              199,915M
VS> VM OBJECT:              207,354M
VS> 32:                     205,558M
VS> 64:                     901,122M
VS> 128:                    215,211M
VS> 256:                    242,262M
VS> 4096:                   2316,01M
VS> range_seg_cache:        205,396M
VS> zio_buf_512:            1103,31M
VS> zio_buf_16384:          15697,9M
VS> zio_data_buf_16384:     348,297M
VS> zio_data_buf_24576:     129,352M
VS> zio_data_buf_32768:     104,375M
VS> zio_data_buf_36864:     163,371M
VS> zio_data_buf_53248:     100,496M
VS> zio_data_buf_57344:     105,93M
VS> zio_data_buf_65536:     101,75M
VS> zio_data_buf_73728:     111,938M
VS> zio_data_buf_90112:     104,414M
VS> zio_data_buf_106496:    100,242M
VS> zio_data_buf_131072:    61652,5M
VS> dnode_t:                3203,98M
VS> dmu_buf_impl_t:         797,695M
VS> arc_buf_hdr_t:          1498,76M
VS> arc_buf_t:              105,802M
VS> zfs_znode_cache:        352,61M
VS>
VS> zio_data_buf_131072 (61652M) + zio_buf_16384 (15698M) = 77350M
VS> easily exceeds ARC total (70G)
VS>
VS> Here's the same calculations from exact the same system where L2 was disabled before reboot:
VS> last pid: 63407; load averages: 2.35, 2.71, 2.73 up 8+19:42:54 14:17:33
VS> 527 processes: 1 running, 526 sleeping
VS> CPU: 4.8% user, 0.0% nice, 6.6% system, 1.1% interrupt, 87.4% idle
VS> Mem: 21G Active, 1460M Inact, 99G Wired, 1748M Cache, 3308M Buf, 952M Free
VS> ARC: 87G Total, 4046M MFU, 76G MRU, 37M Anon, 2026M Header, 4991M Other
VS> Swap:
VS>
VS> and the vmstat -z filtered:
VS> UMA Slabs:              208,004M
VS> VM OBJECT:              207,392M
VS> 32:                     172,831M
VS> 64:                     752,226M
VS> 128:                    210,024M
VS> 256:                    244,204M
VS> 4096:                   2249,02M
VS> range_seg_cache:        245,711M
VS> zio_buf_512:            1145,25M
VS> zio_buf_16384:          15170,1M
VS> zio_data_buf_16384:     422,766M
VS> zio_data_buf_20480:     120,742M
VS> zio_data_buf_24576:     148,641M
VS> zio_data_buf_28672:     112,848M
VS> zio_data_buf_32768:     117,375M
VS> zio_data_buf_36864:     185,379M
VS> zio_data_buf_45056:     103,168M
VS> zio_data_buf_53248:     105,32M
VS> zio_data_buf_57344:     122,828M
VS> zio_data_buf_65536:     109,25M
VS> zio_data_buf_69632:     100,406M
VS> zio_data_buf_73728:     126,844M
VS> zio_data_buf_77824:     101,086M
VS> zio_data_buf_81920:     100,391M
VS> zio_data_buf_86016:     101,391M
VS> zio_data_buf_90112:     112,836M
VS> zio_data_buf_98304:     100,688M
VS> zio_data_buf_102400:    106,543M
VS> zio_data_buf_106496:    108,875M
VS> zio_data_buf_131072:    63190,5M
VS> dnode_t:                3437,36M
VS> dmu_buf_impl_t:         840,62M
VS> arc_buf_hdr_t:          1870,88M
VS> arc_buf_t:              114,942M
VS> zfs_znode_cache:        353,055M
VS>
VS> Everything seems within ARC total range.
VS>
VS> We will try patch attached within few days and will come back with the result.
VS>
VS> Thank you for your help.
VS>
VS> > on 28/01/2014 11:28 Vladimir Sharun said the following:
VS> > > Dear Andriy and FreeBSD community,
VS> > >
VS> > > After applying this patch one of the systems runs fine (disk subsystem load low to moderate
VS> > > - 10-20% busy sustained),
VS> > >
VS> > > Then I saw this patch was merged to the HEAD and we apply it to the one of the systems
VS> > > with moderate to high disk load: 30-60% busy (11.0-CURRENT #7 r261118: Fri Jan 24 17:25:08 EET 2014)
VS> > >
VS> > > Within 4 days we experiencing the same leak(?) as without patch:
VS> > >
VS> > > last pid: 53841; load averages: 4.47, 4.18, 3.78 up 3+16:37:09 11:24:39
VS> > > 543 processes: 6 running, 537
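The comparison Vladimir makes by hand (the two largest zio zones against the ARC total) can be sketched in Python as follows. This assumes the "M" and "G" suffixes mean MiB and GiB and that the comma is a decimal separator, as the figures in the message suggest; to_mib and zones_vs_arc are hypothetical helper names.

```python
# Sketch only: redo the "largest zio zones vs. ARC total" arithmetic from the
# quoted message. Assumes M = MiB, G = GiB, and a comma decimal separator.

def to_mib(text):
    """Convert strings such as '61652,5M' or '70G' to a float number of MiB."""
    text = text.strip()
    scale = {"M": 1.0, "G": 1024.0}[text[-1]]
    return float(text[:-1].replace(",", ".")) * scale


def zones_vs_arc(zones, arc_total):
    """Return (sum of zones in MiB, excess over the ARC total in MiB)."""
    total = sum(to_mib(size) for size in zones.values())
    return total, total - to_mib(arc_total)


# First system in the message (ARC total stated as 70G in the text).
first = zones_vs_arc(
    {"zio_data_buf_131072": "61652,5M", "zio_buf_16384": "15697,9M"}, "70G")
# Second system (ARC: 87G Total in its top header).
second = zones_vs_arc(
    {"zio_data_buf_131072": "63190,5M", "zio_buf_16384": "15170,1M"}, "87G")

for label, (total, excess) in (("first", first), ("second", second)):
    print(f"{label}: zones {total:.1f} MiB, excess over ARC {excess:+.1f} MiB")
```

A positive excess means the two zones alone hold more memory than the ARC accounts for, which is the symptom reported for the first system; the second system stays below its ARC total.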
Re: ARC "pressured out", how to control/stabilize ? (reformatted to text/plain)
Dear Andriy and FreeBSD community,

With the patch the system panics on boot.

After removing the cache device from the pool the system boots without problems.

After this the cache was added again and soon a kernel panic happened.

A screenshot of the panic is here: http://i61.tinypic.com/30sbx2g.jpg
Re: ARC "pressured out", how to control/stabilize ? (reformatted to text/plain)
Dear Andriy and FreeBSD community,

I applied the patch and after a few minutes of work got a new panic.

A screenshot is here: http://i59.tinypic.com/sfctvc.jpg

Andriy Gapon wrote:
AG> on 04/02/2014 12:08 Vitalij Satanivskij said the following:
AG>
AG> I think that my previous patch was wrong.
AG> I've updated it in place:
AG> http://people.freebsd.org/~avg/l2arc-b_tmp_cdata-diag.patch
AG>
AG> --
AG> Andriy Gapon
Re: ARC "pressured out", how to control/stabilize ? (reformatted to text/plain)
Dear Andriy and FreeBSD community,

Andriy Gapon wrote:
AG> on 04/02/2014 19:10 Vitalij Satanivskij said the following:
AG>
AG> Does this happen too early to get a crashdump?
AG> Do you have a chance to attach with remote kgdb?

How I reproduce the crash: simply attach the cache device (zpool add pool cache /dev/gpt/cache0) and run ls -R -la /pool

I will repeat the experiment and try to get a core.

About kgdb: the server on which we test the patch is not very critical, so I can connect via remote IPMI (accessible from the local network) and run some commands at any time, and of course I will try to get a kernel core dump.
Re: ARC "pressured out", how to control/stabilize ? (reformatted to text/plain)
Dear Andriy and FreeBSD community,

OK. I got a coredump on panic.

What else do I need to do?
Re: ARC "pressured out", how to control/stabilize ? (reformatted to text/plain)
Dear Andriy and FreeBSD community,

For now I am beginning to test L2 cache without compression (with the patch you provided in the last messages) in production.

I will test the new patch on the test server first, and then, if all is OK, on one of the production servers.

Andriy Gapon wrote:
AG> on 07/02/2014 11:11 Andriy Gapon said the following:
AG> > on 05/02/2014 14:22 Vitalij Satanivskij said the following:
AG> >
AG> > Vitalij, Vladimir,
AG> >
AG> > I have been able to reproduce the leak at work, so now I have full access to all
AG> > debugging information that I need. Thank you for your testing and reports.
AG> >
AG> > I have reported my observations to OpenZFS developers. It looks like the author
AG> > of L2ARC compression code is too busy right now to produce a fix.
AG> > Unfortunately, I am not very familiar with the L2ARC code, so I can not promise
AG> > to produce a patch soon.
AG>
AG> I've been able to spend some time on this issue.
AG> Could you please try the following patch?
AG> http://people.freebsd.org/~avg/l2arc-b_tmp_cdata-diag.2.patch
AG> It obsoletes all previous patches from me.
AG>
AG> --
AG> Andriy Gapon
Re: ARC "pressured out", how to control/stabilize ? (reformatted to text/plain)
I got the first results from testing L2 without compression.

No memory leak has been seen so far (the system has been working for only 20 hours), but zfs-stats says that L2 is degraded.

Output of zfs-stats -L:

ZFS Subsystem Report                            Tue Feb 11 16:34:43 2014

L2 ARC Summary: (DEGRADED)
        Passed Headroom:                        3.81m
        Tried Lock Failures:                    79.52m
        IO In Progress:                         9
        Low Memory Aborts:                      235
        Free on Write:                          54.37k
        Writes While Full:                      9.68k
        R/W Clashes:                            2.82k
        Bad Checksums:                          211.94k
        IO Errors:                              0
        SPA Mismatch:                           58.33m

L2 ARC Size: (Adaptive)                         243.32 GiB
        Header Size:                    0.36%   895.11 MiB

L2 ARC Evicts:
        Lock Retries:                           45
        Upon Reading:                           0

L2 ARC Breakdown:                               38.15m
        Hit Ratio:                      17.79%  6.79m
        Miss Ratio:                     82.21%  31.36m
        Feeds:                                  88.88k

L2 ARC Buffer:
        Bytes Scanned:                          292.58 TiB
        Buffer Iterations:                      88.88k
        List Iterations:                        5.63m
        NULL List Iterations:                   17.26k

L2 ARC Writes:
        Writes Sent: (FAULTED)                  77.95k
          Done Ratio:                   100.00% 77.95k
          Error Ratio:                  0.00%   0

As you can see, we have Bad Checksums: 211.94k and growing, and also

        Writes Sent: (FAULTED)                  77.95k
          Done Ratio:                   100.00% 77.95k

Another question: please provide the revision number of arc.c against which the diff (http://people.freebsd.org/~avg/l2arc-b_tmp_cdata-diag.2.patch) was created, because the version in head has some small differences and I need to apply the patch manually.

Thank you.
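The ratios in the report above can be cross-checked from its own rounded counts. The Python sketch below assumes that the "k" and "m" suffixes are decimal thousands and millions (an assumption about the report format, not something stated in the thread); the helper name count is hypothetical. The checksum-failures-per-hit figure is only a rough indicator, since the report does not say how many reads were attempted.

```python
# Sketch only: recompute the L2ARC hit ratio from the rounded counts in the
# report. Assumes k = 1e3 and m = 1e6; "count" is a hypothetical helper.

SUFFIX = {"k": 1e3, "m": 1e6}


def count(text):
    """Convert a rounded count such as '211.94k' or '6.79m' to a float."""
    text = text.strip()
    if text[-1] in SUFFIX:
        return float(text[:-1]) * SUFFIX[text[-1]]
    return float(text)


hits, misses = count("6.79m"), count("31.36m")
hit_ratio = 100.0 * hits / (hits + misses)          # report says 17.79%
bad_per_hit = 100.0 * count("211.94k") / hits       # rough error indicator

print(f"accesses: {hits + misses:.0f}")
print(f"hit ratio: {hit_ratio:.2f}%")
print(f"bad checksums per 100 hits: {bad_per_hit:.2f}")
```

Because the inputs are already rounded, the recomputed hit ratio can differ from the reported 17.79% in the last digit.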
Re: ARC "pressured out", how to control/stabilize ? (reformatted to text/plain)
Dear Andriy and FreeBSD community,

I have been testing your patch for some time and it looks like everything is OK.

At least over 5 days of operation no noticeable memory leak was found.
Re: ARC "pressured out", how to control/stabilize ? (reformatted to text/plain)
Dear Andriy and FreeBSD community,

No checksum errors or any other errors have been found so far.

Andriy Gapon wrote:
AG> on 18/02/2014 15:38 Vitalij Satanivskij said the following:
AG> > Dear Andriy and FreeBSD community,
AG> >
AG> > I have been testing your patch for some time and everything looks OK.
AG> >
AG> > After at least 5 days of operation, no noticeable memory leak has been found.
AG>
AG> Vitalij,
AG>
AG> thank you very much for testing!
AG> What about those checksum errors? Do you see them now?
AG>
AG> > AG> I've been able to spend some time on this issue.
AG> > AG> Could you please try the following patch?
AG> > AG> http://people.freebsd.org/~avg/l2arc-b_tmp_cdata-diag.2.patch
AG> > AG> It obsoletes all previous patches from me.
AG>
AG> --
AG> Andriy Gapon
Re: ARC "pressured out", how to control/stabilize ? (reformatted to text/plain)
Dear Andriy,

system uptime is 8 days, 20:26

Output:

kstat.zfs.misc.arcstats.evict_l2_cached: 9771077767680
kstat.zfs.misc.arcstats.evict_l2_eligible: 3844577713152
kstat.zfs.misc.arcstats.evict_l2_ineligible: 8855320643072
kstat.zfs.misc.arcstats.l2_hits: 79824726
kstat.zfs.misc.arcstats.l2_misses: 217864980
kstat.zfs.misc.arcstats.l2_feeds: 760023
kstat.zfs.misc.arcstats.l2_rw_clash: 61903
kstat.zfs.misc.arcstats.l2_read_bytes: 3058416338944
kstat.zfs.misc.arcstats.l2_write_bytes: 2487863166464
kstat.zfs.misc.arcstats.l2_writes_sent: 732146
kstat.zfs.misc.arcstats.l2_writes_done: 732146
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 51888
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 4416
kstat.zfs.misc.arcstats.l2_evict_reading: 1
kstat.zfs.misc.arcstats.l2_free_on_write: 282867
kstat.zfs.misc.arcstats.l2_cdata_free_on_write: 326028
kstat.zfs.misc.arcstats.l2_abort_lowmem: 1348
kstat.zfs.misc.arcstats.l2_cksum_bad: 0
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_size: 257940027392
kstat.zfs.misc.arcstats.l2_asize: 108789048832
kstat.zfs.misc.arcstats.l2_hdr_size: 881715600
kstat.zfs.misc.arcstats.l2_compress_successes: 60790954
kstat.zfs.misc.arcstats.l2_compress_zeros: 0
kstat.zfs.misc.arcstats.l2_compress_failures: 1738173
kstat.zfs.misc.arcstats.l2_write_trylock_fail: 1168505250
kstat.zfs.misc.arcstats.l2_write_passed_headroom: 29511803
kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 1307899433
kstat.zfs.misc.arcstats.l2_write_in_l2: 51108634609
kstat.zfs.misc.arcstats.l2_write_io_in_progress: 637
kstat.zfs.misc.arcstats.l2_write_not_cacheable: 100398037509
kstat.zfs.misc.arcstats.l2_write_full: 97839
kstat.zfs.misc.arcstats.l2_write_buffer_iter: 760023
kstat.zfs.misc.arcstats.l2_write_pios: 732146
kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 4642717602824192
kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 48013995
kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 80483

Andriy Gapon wrote:
AG> on 18/02/2014 15:47 Vitalij Satanivskij said the following:
AG> > No checksum errors or any other errors have been found so far.
AG>
AG> Thank you again!
AG> Could you please send me an output of
AG> sysctl kstat | fgrep 'l2'
AG> from a system that has been patched and with a sufficiently long uptime?
AG>
AG> --
AG> Andriy Gapon
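For readers who want to condense the counters above into a few figures, here is a minimal Python sketch. It is not part of the original exchange. It assumes that l2_size is the logical (uncompressed) size of the data held in L2ARC and that l2_asize is the space actually allocated on the cache device; the helper name l2arc_summary is made up for illustration. The numbers are copied from the sysctl output in the message.

```python
# Counters copied from the sysctl output quoted in the message above.
arcstats = {
    "l2_hits": 79824726,
    "l2_misses": 217864980,
    "l2_size": 257940027392,
    "l2_asize": 108789048832,
    "l2_hdr_size": 881715600,
    "l2_compress_successes": 60790954,
    "l2_compress_failures": 1738173,
    "l2_cksum_bad": 0,
    "l2_io_error": 0,
    "l2_writes_error": 0,
}


def l2arc_summary(stats):
    """Hypothetical helper: derive a few ratios from raw L2ARC counters."""
    lookups = stats["l2_hits"] + stats["l2_misses"]
    attempts = stats["l2_compress_successes"] + stats["l2_compress_failures"]
    return {
        # Fraction of L2ARC lookups that were served from the cache device.
        "hit_ratio": stats["l2_hits"] / lookups,
        # Assumption: logical size divided by allocated size.
        "compress_ratio": stats["l2_size"] / stats["l2_asize"],
        # Fraction of compression attempts counted as successes.
        "compress_success_rate": stats["l2_compress_successes"] / attempts,
        # Memory used by L2ARC headers, in MiB.
        "hdr_mib": stats["l2_hdr_size"] / 2**20,
        # Sum of the error counters Andriy asked about.
        "errors": stats["l2_cksum_bad"]
        + stats["l2_io_error"]
        + stats["l2_writes_error"],
    }


summary = l2arc_summary(arcstats)
for name, value in summary.items():
    print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

The error sum is the part that matters for this thread: all three error counters in the reported output are zero.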
USB Keyboard not worked on current (r251681)
Hello.

The system is:

CPU: Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz (2400.05-MHz K8-class CPU)
real memory = 137438953472 (131072 MB)
avail memory = 132517056512 (126378 MB)

Motherboard: X9DR3-F

The keyboard is a Logitech, identified as

ugen1.2: at usbus1
ukbd0: on usbus1
kbd2 at ukbd0
kbd2: ukbd0, generic (0), config:0x0, flags:0x3d

Messages from dmesg (with "-v" in boot.conf):

Jan 1 03:21:31 fmst-test kernel: usbd_ctrl_transfer_setup: could not setup default USB transfer
Jan 1 03:21:31 fmst-test kernel: usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_NOMEM
Jan 1 03:21:31 fmst-test kernel: usbd_ctrl_transfer_setup: could not setup default USB transfer
Jan 1 03:21:31 fmst-test kernel: usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_NOMEM
Jan 1 03:21:31 fmst-test kernel: Root mount waiting for: usbus1 usbus0
Jan 1 03:21:31 fmst-test kernel: usbd_ctrl_transfer_setup: could not setup default USB transfer
Jan 1 03:21:31 fmst-test kernel: usbd_ctrl_transfer_setup: could not setup default USB transfer
Jan 1 03:21:31 fmst-test kernel: usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_NOMEM, ignored)
Jan 1 03:21:31 fmst-test kernel: usbd_ctrl_transfer_setup: could not setup default USB transfer
Jan 1 03:21:31 fmst-test kernel: usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_NOMEM, ignored)
Jan 1 03:21:31 fmst-test kernel: usbd_ctrl_transfer_setup: could not setup default USB transfer
Jan 1 03:21:31 fmst-test last message repeated 4 times
Jan 1 03:21:31 fmst-test kernel: Root mount waiting for: usbus1 usbus0
Jan 1 03:21:31 fmst-test kernel: usbd_ctrl_transfer_setup: could not setup default USB transfer
Jan 1 03:21:31 fmst-test last message repeated 8 times

and at the end:

Jan 1 03:21:31 fmst-test last message repeated 2 times
Jan 1 03:21:31 fmst-test kernel: usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_NOMEM
Jan 1 03:21:31 fmst-test kernel: ugen0.2: at usbus0 (disconnected)
Jan 1 03:21:31 fmst-test kernel: uhub_reattach_port: could not allocate new device
Jan 1 03:21:31 fmst-test kernel: usbd_ctrl_transfer_setup: could not setup default USB transfer
Jan 1 03:21:31 fmst-test kernel: usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_NOMEM
Jan 1 03:21:31 fmst-test kernel: ugen1.2: at usbus1 (disconnected)
Jan 1 03:21:31 fmst-test kernel: uhub_reattach_port: could not allocate new device

I saw earlier threads about the same problem, but the patch is already in head and nothing has changed.

What can I debug, and how, to find where the problem is?
Re: USB Keyboard not worked on current (r251681)
Yes, that's why I'm asking for help.

Update: the IPMI virtual keyboard is also not working :(

Sergey V. Dyatko wrote:
SVD> On Fri, 14 Jun 2013 11:36:56 +0200
SVD> Hans Petter Selasky wrote:
SVD>
SVD> > See this thread and solution "Supermicro 6027R-N3RF+head, usb trouble"
SVD> >
SVD> It was 'fixed' by r251282, but I see r251681 on subject :)
SVD> added kib@ to Cc
SVD>
SVD> > --HPS
SVD>
SVD> --
SVD> wbr, tiger
Re: USB Keyboard not worked on current (r251681)
Sergey V. Dyatko wrote:
SVD> On Fri, 14 Jun 2013 12:51:44 +0300
SVD> Vitalij Satanivskij wrote:
SVD>
SVD> >
SVD> > Yes, that's why I'm asking for help.
SVD> >
SVD> > Update: the IPMI virtual keyboard is also not working :(
SVD> >
SVD> can you try sysctl values from "Supermicro 6027R-N3RF+head, usb
SVD> trouble" thread ?
SVD>

I tried

kern.maxbcache="128M"
vfs.maxbufspace=134217728

without success.
Re: USB Keyboard not worked on current (r251681)
Attached is a file with the dmesg log, taken with USB debugging enabled. Maybe it will help?

Ryan Stone wrote:
RS> I am able to reproduce this on a Supermicro X8-something that I have. A
RS> git bisect took me down a strange path into a /projects branch. It is
RS> possible the branch got a bad merge from -CURRENT at one point; I'm still
RS> trying to narrow down where things went wrong (and even whether the branch
RS> is actually responsible or whether git got confused by the svn->git export
RS> process).

Copyright (c) 1992-2013 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 10.0-CURRENT #2 r251990: Wed Jun 19 12:27:14 EEST 2013
r...@fmst-test.ukr.net:/usr/obj/usr/src/sys/ZEBRA amd64
FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610
CPU: Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz (2400.06-MHz K8-class CPU)
Origin = "GenuineIntel" Id = 0x206d7 Family = 0x6 Model = 0x2d Stepping = 7
Features=0xbfebfbff
Features2=0x1fbee3ff
AMD Features=0x2c100800
AMD Features2=0x1
TSC: P-state invariant, performance statistics
real memory = 137438953472 (131072 MB)
avail memory = 128714944512 (122752 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 2 package(s) x 4 core(s)
cpu0 (BSP): APIC ID: 0
cpu1 (AP): APIC ID: 2
cpu2 (AP): APIC ID: 4
cpu3 (AP): APIC ID: 6
cpu4 (AP): APIC ID: 32
cpu5 (AP): APIC ID: 34
cpu6 (AP): APIC ID: 36
cpu7 (AP): APIC ID: 38
ioapic0 irqs 0-23 on motherboard
ioapic1 irqs 24-47 on motherboard
ioapic2 irqs 48-71 on motherboard
kbd1 at kbdmux0
acpi0: on motherboard
acpi0: Power Button (fixed)
acpi0: reservation of 400, 100 (3) failed
cpu0: on acpi0
cpu1: on acpi0
cpu2: on acpi0
cpu3: on acpi0
cpu4: on acpi0
cpu5: on acpi0
cpu6: on acpi0
cpu7: on acpi0
attimer0: port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0: port 0x70-0x71 irq 8 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
hpet0: iomem 0xfed0-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 550
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
pcib1: irq 26 at device 1.0 on pci0
pci1: on pcib1
pcib2: irq 26 at device 1.1 on pci0
pci2: on pcib2
igb0: port 0x8020-0x803f mem 0xdfa2-0xdfa3,0xdfa44000-0xdfa47fff irq 27 at device 0.0 on pci2
igb0: Using MSIX interrupts with 9 vectors
igb0: Ethernet address: 00:25:90:91:e6:44
igb0: Bound queue 0 to cpu 0
igb0: Bound queue 1 to cpu 1
igb0: Bound queue 2 to cpu 2
igb0: Bound queue 3 to cpu 3
igb0: Bound queue 4 to cpu 4
igb0: Bound queue 5 to cpu 5
igb0: Bound queue 6 to cpu 6
igb0: Bound queue 7 to cpu 7
igb1: port 0x8000-0x801f mem 0xdfa0-0xdfa1,0xdfa4-0xdfa43fff irq 30 at device 0.1 on pci2
igb1: Using MSIX interrupts with 9 vectors
igb1: Ethernet address: 00:25:90:91:e6:45
igb1: Bound queue 0 to cpu 0
igb1: Bound queue 1 to cpu 1
igb1: Bound queue 2 to cpu 2
igb1: Bound queue 3 to cpu 3
igb1: Bound queue 4 to cpu 4
igb1: Bound queue 5 to cpu 5
igb1: Bound queue 6 to cpu 6
igb1: Bound queue 7 to cpu 7
pcib3: irq 32 at device 2.0 on pci0
pci4: on pcib3
pcib4: irq 40 at device 3.0 on pci0
pci5: on pcib4
pcib5: irq 40 at device 3.2 on pci0
pci6: on pcib5
pci0: at device 4.0 (no driver attached)
pci0: at device 4.1 (no driver attached)
pci0: at device 4.2 (no driver attached)
pci0: at device 4.3 (no driver attached)
pci0: at device 4.4 (no driver attached)
pci0: at device 4.5 (no driver attached)
pci0: at device 4.6 (no driver attached)
pci0: at device 4.7 (no driver attached)
pci0: at device 5.0 (no driver attached)
pci0: at device 5.2 (no driver attached)
pcib6: irq 16 at device 17.0 on pci0
pci7: on pcib6
isci0: port 0x7000-0x70ff mem 0xde47c000-0xde47,0xde00-0xde3f irq 16 at device 0.0 on pci7
pci7: at device 0.3 (no driver attached)
pci0: at device 22.0 (no driver attached)
pci0: at device 22.1 (no driver attached)
ehci0: mem 0xdfb23000-0xdfb233ff irq 16 at device 26.0 on pci0
ehci_init: start
usbus0: EHCI version 1.0
ehci_init: sparams=0x22
ehci_init: usbus0: resetting
QH(0xff9f2d9ff000) at 0x7e215000:
link=0x7e215002
endp=0xa000
addr=0x00 inact=0 endpt=0 eps=2 dtc=0 hrecl=1
mpl=0x0 ctl=0 nrl=0
endphub=0x4000
smask=0x00 cmask=0x00
huba=0x00 port=0 mult=1
curqtd=0x000
Re: USB Keyboard not worked on current (r251681)
Just an update.

10.0-CURRENT FreeBSD 10.0-CURRENT #7 r253358: Mon Jul 15 15:03:06 EEST 2013

USB keyboards are still not working (LiteOn controller in a Logitech keyboard).

I have no idea how to really locate the problem. Maybe someone can help?

Vitalij Satanivskij wrote:
VS> Hello.
VS>
VS> The system is:
VS>
VS> CPU: Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz (2400.05-MHz K8-class CPU)
VS> real memory = 137438953472 (131072 MB)
VS> avail memory = 132517056512 (126378 MB)
VS>
VS> Motherboard: X9DR3-F
VS>
VS> The keyboard is a Logitech, identified as
VS>
VS> ugen1.2: at usbus1
VS> ukbd0: on usbus1
VS> kbd2 at ukbd0
VS> kbd2: ukbd0, generic (0), config:0x0, flags:0x3d
VS>
VS> Messages from dmesg (with "-v" in boot.conf):
VS>
VS> Jan 1 03:21:31 fmst-test kernel: usbd_ctrl_transfer_setup: could not setup default USB transfer
VS> Jan 1 03:21:31 fmst-test kernel: usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_NOMEM
VS> Jan 1 03:21:31 fmst-test kernel: usbd_ctrl_transfer_setup: could not setup default USB transfer
VS> Jan 1 03:21:31 fmst-test kernel: usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_NOMEM
VS> Jan 1 03:21:31 fmst-test kernel: Root mount waiting for: usbus1 usbus0
VS> Jan 1 03:21:31 fmst-test kernel: usbd_ctrl_transfer_setup: could not setup default USB transfer
VS> Jan 1 03:21:31 fmst-test kernel: usbd_ctrl_transfer_setup: could not setup default USB transfer
VS> Jan 1 03:21:31 fmst-test kernel: usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_NOMEM, ignored)
VS> Jan 1 03:21:31 fmst-test kernel: usbd_ctrl_transfer_setup: could not setup default USB transfer
VS> Jan 1 03:21:31 fmst-test kernel: usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_NOMEM, ignored)
VS> Jan 1 03:21:31 fmst-test kernel: usbd_ctrl_transfer_setup: could not setup default USB transfer
VS> Jan 1 03:21:31 fmst-test last message repeated 4 times
VS> Jan 1 03:21:31 fmst-test kernel: Root mount waiting for: usbus1 usbus0
VS> Jan 1 03:21:31 fmst-test kernel: usbd_ctrl_transfer_setup: could not setup default USB transfer
VS> Jan 1 03:21:31 fmst-test last message repeated 8 times
VS>
VS> and at the end:
VS>
VS> Jan 1 03:21:31 fmst-test last message repeated 2 times
VS> Jan 1 03:21:31 fmst-test kernel: usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_NOMEM
VS> Jan 1 03:21:31 fmst-test kernel: ugen0.2: at usbus0 (disconnected)
VS> Jan 1 03:21:31 fmst-test kernel: uhub_reattach_port: could not allocate new device
VS> Jan 1 03:21:31 fmst-test kernel: usbd_ctrl_transfer_setup: could not setup default USB transfer
VS> Jan 1 03:21:31 fmst-test kernel: usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_NOMEM
VS> Jan 1 03:21:31 fmst-test kernel: ugen1.2: at usbus1 (disconnected)
VS> Jan 1 03:21:31 fmst-test kernel: uhub_reattach_port: could not allocate new device
VS>
VS> I saw earlier threads about the same problem, but the patch is already in head and nothing has changed.
VS>
VS> What can I debug, and how, to find where the problem is?
Mysql5.5 MariaDB 5.5 built fail after r253321
Hello.

After the changes committed in revision 253321 (diff:
http://svnweb.freebsd.org/base/head/lib/msun/src/math.h?r1=253319&r2=253321&sortby=rev),
the builds of databases/mysql55-server and databases/mariadb55-server fail, both with the same error.

[ 47%] Building CXX object sql/CMakeFiles/sql.dir/item_func.cc.o
In file included from /usr/ports/databases/mysql55-server/work/mysql-5.5.32/sql/item_func.cc:27:
In file included from /usr/ports/databases/mysql55-server/work/mysql-5.5.32/include/my_global.h:351:
/usr/include/include/sys/timeb.h:42:2: warning: "this file includes which is deprecated" [-W#warnings]
#warning "this file includes which is deprecated"
 ^
/usr/ports/databases/mysql55-server/work/mysql-5.5.32/sql/item_func.cc:2344:29: error: controlling expression type 'volatile double' not compatible with any generic association type
else if (!dec_negative && my_isinf(value_mul_tmp))
 ^~~
/usr/ports/databases/mysql55-server/work/mysql-5.5.32/include/my_global.h:814:21: note: expanded from macro 'my_isinf'
#define my_isinf(X) isinf(X)
 ^~~~
/usr/include/include/math.h:107:18: note: expanded from macro 'isinf'
#define isinf(x) __fp_type_select(x, __isinff, __isinf, __isinfl)
 ^~~~
/usr/include/include/math.h:86:49: note: expanded from macro '__fp_type_select'
#define __fp_type_select(x, f, d, ld) _Generic((0,(x)), \
 ^
1 warning and 1 error generated.
--- sql/CMakeFiles/sql.dir/item_func.cc.o ---
*** [sql/CMakeFiles/sql.dir/item_func.cc.o] Error code 1

make: stopped in /usr/ports/databases/mysql55-server/work/mysql-5.5.32
3 warnings generated.
1 warning generated.
1 warning generated.
1 error

make: stopped in /usr/ports/databases/mysql55-server/work/mysql-5.5.32
*** Error code 2

Stop.
make: stopped in /usr/ports/databases/mysql55-server/work/mysql-5.5.32
--- all ---
*** [all] Error code 1

make: stopped in /usr/ports/databases/mysql55-server/work/mysql-5.5.32
1 error

make: stopped in /usr/ports/databases/mysql55-server/work/mysql-5.5.32

After a manual rollback of that change, the build finished successfully.

Question: should the ports be fixed, or the base system?
Memory leack in Wired?
Hello.

Some time ago, after updating the system on several servers, I noticed strange memory behavior.

For example:

Mem: 1245M Active, 937M Inact, 4093M Wired, 13M Cache, 1670M Free
ARC: 495M Total, 50M MFU, 192M MRU, 17M Anon, 29M Header, 208M Other

ZFS is configured with

vm.kmem_size="3G"
vfs.zfs.arc_max="2G"
vfs.zfs.arc_min="1G"

and the prefetcher is disabled. All machines use ZFS as their only file system.

Wired memory slowly grows until processes begin to be killed with "out of swap".

Stopping all active software, then unmounting and exporting the main ZFS pools (with databases or other kinds of data), does not free any memory.

This behavior has been seen on CURRENT since May. An older system, e.g. r245701, works just fine. Up to 18/07/2013 (the last system upgrade) there was no change in behavior.

Memory usage grows very slowly: on a system with 32 GB it takes around two to three weeks. First it eats all free memory, then it begins eating ZFS ARC memory, and then we must reboot the machine.

First question: how can I detect whether this is really a problem (vmstat -m does not show any subsystem eating memory), and what is this memory allocated for?

Second: does anybody have a similar problem?
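As a rough sanity check on the figures in this report, the following Python sketch subtracts the ARC total from the Wired figure. It is not from the original message. It assumes that ARC memory is counted inside Wired, so the difference is wired memory that the ARC does not explain; the helper name parse_top_line is made up for illustration.

```python
import re

UNITS = {"K": 2**10, "M": 2**20, "G": 2**30}


def parse_top_line(line):
    """Hypothetical helper: turn 'Mem: 1245M Active, ...' into {'Active': bytes, ...}."""
    _, _, body = line.partition(":")
    fields = {}
    for size, unit, name in re.findall(r"(\d+)([KMG])\s+(\w+)", body):
        fields[name] = int(size) * UNITS[unit]
    return fields


# The two lines quoted in the message above.
mem = parse_top_line("Mem: 1245M Active, 937M Inact, 4093M Wired, 13M Cache, 1670M Free")
arc = parse_top_line("ARC: 495M Total, 50M MFU, 192M MRU, 17M Anon, 29M Header, 208M Other")

# Assumption: ARC is part of Wired, so this is wired memory not explained by the ARC.
unaccounted_wired = mem["Wired"] - arc["Total"]
unaccounted_mib = unaccounted_wired // 2**20

# vm.kmem_size="3G" from the message, for comparison with the Wired figure.
kmem_size = 3 * UNITS["G"]
wired_exceeds_kmem = mem["Wired"] > kmem_size

print(f"Wired not explained by ARC: {unaccounted_mib} MiB")
print(f"Wired larger than vm.kmem_size: {wired_exceeds_kmem}")
```

Tracking this difference over days, rather than the raw Wired value, would separate ARC growth from whatever else is holding wired pages.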
Re: devel/gettext build error in jail i386 environment on amd64 host
Hello.

I have the same problem, in a clean environment (a freshly installed system plus an i386 jail).

When building gettext and libiconv, I found that the system "uniq" crashes:

pid 88854 (uniq), uid 0: exited on signal 11 (core dumped)
pid 88859 (uniq), uid 0: exited on signal 11 (core dumped)
pid 88864 (uniq), uid 0: exited on signal 11 (core dumped)
pid 88869 (uniq), uid 0: exited on signal 11 (core dumped)

The core dump is useless because the jail was built without debug symbols:

gdb /usr/bin/uniq uniq-88869.core
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...(no debugging symbols found)...
Core was generated by `uniq'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
Loaded symbols for /libexec/ld-elf.so.1
#0 0x0001 in ?? ()
(gdb) bt
#0 0x0001 in ?? ()
#1 0x281a2434 in _CurrentRuneLocale () from /lib/libc.so.7
#2 0x2819f338 in ?? () from /lib/libc.so.7
#3 0xc7f8 in ?? ()
#4 0x28153321 in verrc () from /lib/libc.so.7
Previous frame identical to this frame (corrupt stack?)

Do you have the same problem?

Ivan Klymenko wrote:
IK> On Sat, 24 Aug 2013 13:26:01 +0200
IK> Hans Petter Selasky wrote:
IK>
IK> > On 08/23/13 23:14, Ivan Klymenko wrote:
IK> > > wing error:
IK> > > http://privatepaste.com/46f9477022
IK> >
IK> > Not sure if this helps:
IK> >
IK> > https://wiki.freebsd.org/PkgPrimer
IK> >
IK> > Using portbuilder inside a jail
IK> >
IK> > When building 9-stable ports in a 9-stable jail under -current you
IK> > might want to set the UNAME_r environment variable to fake the FreeBSD
IK> > version in /path_to_my_jail/root/.cshrc . Some examples:
IK> >
IK> > setenv UNAME_r 9-STABLE
IK> > setenv UNAME_r 8-STABLE
IK> > setenv UNAME_r 7-STABLE
IK> >
IK> > Else some ports won't build properly.
IK>
IK> Something tells me the intuition that the problem appeared after the
IK> addition of iconv in base...
I386 jail on amd64 CURRENT core dump in libc?
Hello.

On a freshly installed system:

10.0-CURRENT FreeBSD 10.0-CURRENT #3 r255173: Tue Sep 3 13:31:22 EEST 2013

with a freshly built i386 jail, I found a bug where uniq (/usr/bin/uniq) dumps core.

After recompiling the whole system with debug symbols, I got this trace:

gdb /usr/bin/uniq uniq-1676.core
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...
Core was generated by `uniq'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libc.so.7...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...done.
Loaded symbols for /libexec/ld-elf.so.1
#0 0x0001 in ?? ()
(gdb) bt
#0 0x0001 in ?? ()
#1 0x281a1e94 in __default_hash () from /lib/libc.so.7
#2 0xcba8 in ?? ()
#3 0x28153361 in verrc (eval=, fmt=, ap=) at /usr/src/lib/libc/gen/err.c:112
Previous frame identical to this frame (corrupt stack?)
(gdb)

Is there any chance of fixing this problem?
Re: I386 jail on amd64 CURRENT core dump in libc?
KB>
KB> Your installed libraries do not have proper debugging symbols.
KB> Since the issue seems to be in the compat32 layer, you may try to start
KB> with taking the ktrace of the failing program and see what syscall failed,
KB> if any.

For me the problem went away after disabling

options CAPABILITY_MODE # Capsicum capability mode
options CAPABILITIES # Capsicum capabilities

in the kernel config.

I found this while rolling the system back to previous revisions. On r254268, uniq inside the i386 jail reports "unable to limit rights for ".

So I decided to check without the Capsicum features...
Re: devel/gettext build error in jail i386 environment on amd64 host
IK> Thank you :)
IK> I watch the mailing list. ;)
IK> http://docs.freebsd.org/cgi/mid.cgi?20130903172529.GA9
IK> Unfortunately I did not have time to check the problem with uniq...

The gettext build failed because uniq crashed, so if you still have the problem, you know what to do.
Re: devel/gettext build error in jail i386 environment on amd64 host
Try disabling the options

options CAPABILITY_MODE # Capsicum capability mode
options CAPABILITIES # Capsicum capabilities

in the kernel config; for me that resolved the problem.

Ivan Klymenko wrote:
IK> On Sat, 24 Aug 2013 13:26:01 +0200
IK> Hans Petter Selasky wrote:
IK>
IK> > On 08/23/13 23:14, Ivan Klymenko wrote:
IK> > > wing error:
IK> > > http://privatepaste.com/46f9477022
IK> >
IK> > Not sure if this helps:
IK> >
IK> > https://wiki.freebsd.org/PkgPrimer
IK> >
IK> > Using portbuilder inside a jail
IK> >
IK> > When building 9-stable ports in a 9-stable jail under -current you
IK> > might want to set the UNAME_r environment variable to fake the FreeBSD
IK> > version in /path_to_my_jail/root/.cshrc . Some examples:
IK> >
IK> > setenv UNAME_r 9-STABLE
IK> > setenv UNAME_r 8-STABLE
IK> > setenv UNAME_r 7-STABLE
IK> >
IK> > Else some ports won't build properly.
IK>
IK> Something tells me the intuition that the problem appeared after the
IK> addition of iconv in base...
Current panic on boot on H11DSI motherboard with epyc cpu (nexus_add_irq: failed)
Hello.

We get a kernel panic when booting CURRENT or an 11.1 snapshot, both when booting from a USB stick and from an HDD/SSD with an installed system.

Kernel: GENERIC

The panic output can be found here: http://hell.ukr.net/panic/panic.jpg
or as a video recording of the screen: http://hell.ukr.net/panic/recorder.webm

Hardware: 2x AMD EPYC 7251 processors on a Supermicro H11DSI motherboard.

The only way to boot the system is to disable HPET in the BIOS and set

hw.pci.enable_msix=0
hw.pci.enable_msi=0

We have already tried different loader.conf settings, such as

machdep.disable_msix_migration=1
hint.hpet.0.clock=0
hint.hpet.0.per_cpu=0
#hw.pci.enable_msix=0
#hw.pci.enable_msi=0
#dev.igb.1.iflib.disable_msix=1
#dev.igb.0.iflib.disable_msix=1
#machdep.disable_msix_migration = 1
#hw.pci.msix_rewrite_table=1
#hw.pci.honor_msi_blacklist=0

in different combinations, with no success.

Are there any suggestions we can test? Is any additional information needed from our side?

Thank you.
Re: Current panic on boot on H11DSI motherboard with epyc cpu (nexus_add_irq: failed)
igb0@pci0:1:0:0: class=0x02 card=0x152115d9 chip=0x15218086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'I350 Gigabit Network Connection'
    class = network
    subclass = ethernet
    cap 01[40] = powerspec 3 supports D0 D3 current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 10 messages
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 512(512) FLR RO NS
                 link x4(x4) speed 5.0(5.0) ASPM L1(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 ac1f6b620e0c
    ecap 000e[150] = ARI 1
    ecap 0010[160] = SR-IOV 1 IOV disabled, Memory Space disabled, ARI disabled
                     0 VFs configured out of 8 supported
                     First VF RID Offset 0x0180, VF RID Stride 0x0004
                     VF Device ID 0x1520
                     Page Sizes: 4096 (enabled), 8192, 65536, 262144, 1048576, 4194304
    ecap 0017[1a0] = TPH Requester 1
    ecap 0018[1c0] = LTR 1
    ecap 000d[1d0] = ACS 1

This information is from the system booted with HPET disabled and

hw.pci.enable_msix: 0
hw.pci.enable_msi: 0

If any one of these parameters is not set as described, the system does not boot :(

Stephen Hurd wrote:
SH> Hrm, it should be trying to allocate three msi-x vectors there, and it
SH> appears that it's reported that 10 are available. What's the output of
SH> ``pciconf -lcv pci1:0:0''?
SH>
SH> On Mon, Apr 16, 2018 at 1:27 PM, Conrad Meyer wrote:
SH>
SH> > Hi Vitalij,
SH> >
SH> > On Mon, Apr 16, 2018 at 3:27 AM, Vitalij Satanivskij wrote:
SH> > > DUMP can be found here http://hell.ukr.net/panic/panic.jpg
SH> > > or even video record from screen http://hell.ukr.net/panic/recorder.webm
SH> >
SH> > Looks like the panic message is printed directly after: "igb0: using 2
SH> > rx queues 2 tx queues" (iflib_msix_init(), called by
SH> > iflib_device_register()).
SH> >
SH> > And stack is indeed coming from iflib in probe (0:17 in linked video):
SH> >
SH> > panic()
SH> > nexus_add_irq()
SH> > msix_alloc()
SH> > pci_alloc_msix_method()
SH> > iflib_device_register()
SH> > iflib_device_attach()
SH> > device_attach()
SH> > ...
SH> >
SH> > Stephen, Matt, or Sean might be able to help diagnose further.
SH> >
SH> > Best,
SH> > Conrad
SH>
SH> --
SH> Stephen Hurd, Principal Engineer
SH> www.limelight.com
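The comparison Stephen makes (three vectors requested, ten advertised) can be checked mechanically against the pciconf output. The Python sketch below is not from the thread. It only parses the two capability lines quoted above; the helper name advertised_vectors is made up for illustration, and the figure of three requested vectors is taken from Stephen's reply rather than derived from the driver source.

```python
import re

# Capability lines copied from the pciconf output in the message above.
pciconf_lines = [
    "cap 05[50] = MSI supports 1 message, 64 bit, vector masks",
    "cap 11[70] = MSI-X supports 10 messages",
]


def advertised_vectors(lines):
    """Hypothetical helper: map 'MSI' / 'MSI-X' to the advertised message count."""
    found = {}
    for line in lines:
        match = re.search(r"=\s*(MSI-X|MSI) supports (\d+) messages?", line)
        if match:
            found[match.group(1)] = int(match.group(2))
    return found


vectors = advertised_vectors(pciconf_lines)

# Figure quoted from Stephen Hurd's reply, not computed here.
requested = 3
fits = requested <= vectors["MSI-X"]

print(vectors)
print(f"{requested} MSI-X vectors requested, device advertises {vectors['MSI-X']}: fits={fits}")
```

Since the request fits within what the device advertises, the allocation failure is unlikely to come from the device's own MSI-X limit, which matches the later conclusion in the thread that this is a general MSI issue rather than an igb one.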
Re: Current panic on boot on H11DSI motherboard with epyc cpu (nexus_add_irq: failed)
Dear Stephen,

I disabled MSI-X on both igb1 and igb0 and enabled HPET in the BIOS, and got a panic in hpet_attach: http://hell.ukr.net/panic/recorder_hpet.webm

So I disabled HPET again and got a panic in msi_alloc and so on: http://hell.ukr.net/panic/recorder_msi.webm

Then, as a test, I set hw.pci.enable_msi=0 and got a panic in ccp_hw_attach, from a driver that is autoloaded later while the system runs the rc scripts.

The panic is here: http://hell.ukr.net/panic/recorder_ccp.webm

To me it looks like some kind of resource management problem?

Stephen Hurd wrote:
SH> If you disable msix just for igb0, does it crash somewhere else?
SH>
SH> On Mon, Apr 16, 2018 at 3:13 PM, Stephen Hurd wrote:
SH>
SH> > Oh, you may need to disable msix to boot...
SH> >
SH> > dev.igb.0.iflib.disable_msix=1
SH> >
SH> > On Mon, Apr 16, 2018 at 3:02 PM, Stephen Hurd wrote:
SH> >
SH> >> Hrm, it should be trying to allocate three msi-x vectors there, and it
SH> >> appears that it's reported that 10 are available. What's the output of
SH> >> ``pciconf -lcv pci1:0:0''?
SH> >>
SH> >> On Mon, Apr 16, 2018 at 1:27 PM, Conrad Meyer wrote:
SH> >>
SH> >>> Hi Vitalij,
SH> >>>
SH> >>> On Mon, Apr 16, 2018 at 3:27 AM, Vitalij Satanivskij wrote:
SH> >>> > DUMP can be found here http://hell.ukr.net/panic/panic.jpg
SH> >>> > or even video record from screen http://hell.ukr.net/panic/recorder.webm
SH> >>>
SH> >>> Looks like the panic message is printed directly after: "igb0: using 2
SH> >>> rx queues 2 tx queues" (iflib_msix_init(), called by
SH> >>> iflib_device_register()).
SH> >>>
SH> >>> And stack is indeed coming from iflib in probe (0:17 in linked video):
SH> >>>
SH> >>> panic()
SH> >>> nexus_add_irq()
SH> >>> msix_alloc()
SH> >>> pci_alloc_msix_method()
SH> >>> iflib_device_register()
SH> >>> iflib_device_attach()
SH> >>> device_attach()
SH> >>> ...
SH> >>>
SH> >>> Stephen, Matt, or Sean might be able to help diagnose further.
SH> >>>
SH> >>> Best,
SH> >>> Conrad
Re: Current panic on boot on H11DSI motherboard with epyc cpu (nexus_add_irq: failed)
Oh, the BIOS. It is already the latest BIOS available, with AGESA 1.0.0.5 in it. It is dated 2/14/2018, so most likely a new version will not appear soon.

Stephen Hurd wrote:
SH> Yeah, this looks like some sort of general MSI issue, not igb specific.
SH> I'm not familiar with that part of the kernel, but maybe check if there's a
SH> BIOS update available?
SH>
SH> On Mon, Apr 16, 2018 at 3:51 PM, Vitalij Satanivskij wrote:
SH>
SH> > Dear Stephen,
SH> >
SH> > I disabled MSI-X on both igb0 and igb1
SH> > and enabled HPET in the BIOS,
SH> >
SH> > and got a panic in hpet_attach: http://hell.ukr.net/panic/recorder_hpet.webm
SH> > So I disabled HPET again and got a panic in msi_alloc and so on:
SH> > http://hell.ukr.net/panic/recorder_msi.webm
SH> >
SH> > As a test I set hw.pci.enable_msi=0 and got a panic in ccp_hw_attach,
SH> > in a driver that is autoloaded later while the system runs the rc scripts.
SH> >
SH> > Panic here: http://hell.ukr.net/panic/recorder_ccp.webm
SH> >
SH> > To me it looks like some kind of resource management problem?
Re: Current panic on boot on H11DSI motherboard with epyc cpu (nexus_add_irq: failed)
Dear John,

I tried the patch, with no success:
http://hell.ukr.net/panic/recorder_patch165.webm

I also enabled verbose boot and recorded the boot process (HPET was disabled, so the crash is in another driver's attach):
http://hell.ukr.net/panic/recorder_patch_verbose.webm

root@test:/usr/src # svnlite diff
Index: sys/x86/x86/msi.c
===
--- sys/x86/x86/msi.c (revision 332650)
+++ sys/x86/x86/msi.c (working copy)
@@ -404,7 +404,7 @@
 	/* Do we need to create some new sources? */
 	if (cnt < count) {
 		/* If we would exceed the max, give up. */
-		if (i + (count - cnt) > FIRST_MSI_INT + NUM_MSI_INTS) {
+		if (i + (count - cnt) >= FIRST_MSI_INT + NUM_MSI_INTS) {
 			mtx_unlock(&msi_lock);
 			free(mirqs, M_MSI);
 			return (ENXIO);
@@ -645,7 +645,7 @@
 	/* Do we need to create a new source? */
 	if (msi == NULL) {
 		/* If we would exceed the max, give up. */
-		if (i + 1 > FIRST_MSI_INT + NUM_MSI_INTS) {
+		if (i + 1 >= FIRST_MSI_INT + NUM_MSI_INTS) {
 			mtx_unlock(&msi_lock);
 			return (ENXIO);
 		}
root@test:/usr/src

If you need any additional information, please let me know.

JB> > If one of these parameters is not set as described, the system does not boot ^(
JB>
JB> Please try the patch from here https://reviews.freebsd.org/P165
JB>
JB> --
JB> John Baldwin
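For readers following the diff, here is a minimal user-space sketch of the two limit checks it touches. It only illustrates that the original and patched conditions differ at exactly one boundary value, so the patch cannot change the outcome for an allocation that is far below the limit. The values 256 and 512 for FIRST_MSI_INT and NUM_MSI_INTS are assumptions made for the illustration, and the function names are hypothetical; this is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values, for illustration only. */
enum { FIRST_MSI_INT = 256, NUM_MSI_INTS = 512 };

/* Original test: give up only when the new total is above the limit. */
bool
exceeds_original(int i, int count, int cnt)
{
	assert(count >= cnt);	/* never asked to create a negative number */
	return (i + (count - cnt) > FIRST_MSI_INT + NUM_MSI_INTS);
}

/* Patched test: also give up when the new total lands on the limit. */
bool
exceeds_patched(int i, int count, int cnt)
{
	assert(count >= cnt);
	return (i + (count - cnt) >= FIRST_MSI_INT + NUM_MSI_INTS);
}
```

With these assumed constants, a request with i = 767 and one new source is accepted by the original test and rejected by the patched one, while a request with i = 256 is accepted by both.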
Re: Current panic on boot on H11DSI motherboard with epyc cpu (nexus_add_irq: failed)
JB> > If you need any additional information, please let me know.
JB>
JB> Can you perhaps turn off the stack trace on boot to not lose the panic messages
JB> (remove KDB_TRACE from kernel config) and maybe modify the panic message to
JB> include the IRQ number passed to nexus_add_irq?

Hm, it looks like it is always IRQ number 256, e.g.:

hpet - 256
igb - 256

The change I made for this was:

Index: sys/x86/x86/nexus.c
===
--- sys/x86/x86/nexus.c (revision 332663)
+++ sys/x86/x86/nexus.c (working copy)
@@ -698,7 +698,7 @@
 {

 	if (rman_manage_region(&irq_rman, irq, irq) != 0)
-		panic("%s: failed", __func__);
+		panic("%s: failed irq is: %lu", __func__, irq);
 }
Re: Current panic on boot on H11DSI motherboard with epyc cpu (nexus_add_irq: failed)
JB> O, this is a different issue. Sorry. As a hack, try changing
JB> 'FIRST_MSI_INT' to 512 in sys/amd64/include/intr_machdep.h. The issue
JB> is that some systems now include more than 256 interrupt pins on I/O
JB> APICs, so IRQ 256 is already reserved for use by one of those
JB> interrupt pins. The real fix is that I need to make FIRST_MSI_INT
JB> dynamic instead of a constant and just define it as the first free IRQ
JB> after the I/O APICs have probed.
JB>

Yep, that is it. But just one note:

irq585: ccp14:721 @cpu0(domain0): 0
irq586: ccp14:723 @cpu0(domain0): 0
irq587: ccp15:725 @cpu0(domain0): 0

If I understand correctly, the number of IRQs is even more than 512, so would it be better to change it to the real number in the system? Or is this another case?

Anyway, thank you for the help. Now I can use the system with MSI-X and MSI enabled.
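The "real fix" John describes, taking the first MSI IRQ from whatever the I/O APICs actually occupy instead of from a constant, could look roughly like the user-space sketch below. The struct ioapic_range type and the first_free_irq() function are hypothetical names invented for this illustration; the sketch does not reflect how the kernel actually records I/O APIC pins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one probed I/O APIC. */
struct ioapic_range {
	unsigned int first_irq;	/* IRQ number of pin 0 */
	unsigned int npins;	/* number of interrupt pins */
};

/*
 * Return the first IRQ number that lies above every I/O APIC pin.
 * MSI/MSI-X IRQs could then be numbered from this value upwards,
 * so they never collide with an IRQ already reserved for a pin.
 */
unsigned int
first_free_irq(const struct ioapic_range *apics, size_t n)
{
	unsigned int first = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned int end = apics[i].first_irq + apics[i].npins;

		assert(end >= apics[i].first_irq);	/* no wrap-around */
		if (end > first)
			first = end;
	}
	return (first);
}
```

For example, with two made-up I/O APICs covering IRQs 0-23 and 224-287, the function returns 288, whereas a fixed starting point of 256 would fall inside the second range.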
Re: Call for Testing: 12.0-CURRENT amd64 memstick installer boot-testing wanted
Hi,

I tested both images in both modes on a Supermicro X9SCL-F with an E3-1230 CPU. Both work perfectly.

Glen Barber wrote:
GB> Hi,
GB>
GB> Could folks please help boot-test the most recent 12.0-CURRENT amd64
GB> memstick images on various hardware? Note, this is not a request to
GB> install 12.0-CURRENT, only a boot-test with various system knobs
GB> tweaked.
GB>
GB> The most recent images are available at:
GB> https://download.freebsd.org/ftp/snapshots/amd64/amd64/ISO-IMAGES/12.0/FreeBSD-12.0-CURRENT-amd64-20180529-r334337-mini-memstick.img
GB> https://download.freebsd.org/ftp/snapshots/amd64/amd64/ISO-IMAGES/12.0/FreeBSD-12.0-CURRENT-amd64-20180529-r334337-memstick.img
GB>
GB> We are interested in testing both UEFI and CSM/BIOS/legacy mode, as we
GB> would like to get this included in the upcoming 11.2-RELEASE if the
GB> change that had been committed addresses several boot issues reported
GB> recently.
GB>
GB> Please help test, and report back (both successes and failures).
GB>
GB> Thanks,
GB>
GB> Glen