Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics
On 9/8/23 15:09, Alexander Motin wrote: Thank you, Martin. I was able to reproduce the issue with your script and found the cause. I first though the issue is triggered by the `cp`, but it appeared to be triggered by `cat`. It also got copy_file_range() support, but later than `cp`. That is probably why it slipped through testing. This patch fixes it for me: https://github.com/openzfs/zfs/pull/15251 . Mark, could you please try the patch? Thank you Alex for the fix! -- Pawel Jakub Dawidek
Re: another crash and going forward with zfs
On 4/18/23 05:14, Mateusz Guzik wrote: On 4/17/23, Pawel Jakub Dawidek wrote: Correct me if I'm wrong, but from my understanding there were zero problems with block cloning when it wasn't in use or now disabled. The reason I've introduced vfs.zfs.bclone_enabled sysctl, was to exactly avoid mess like this and give us more time to sort all the problems out while making it easy for people to try it. If there is no plan to revert the whole import, I don't see what value removing just block cloning will bring if it is now disabled by default and didn't cause any problems when disabled. The feature definitely was not properly stress tested and what not and trying to do it keeps running into panics. Given the complexity of the feature I would expect there are many bug lurking, some of which possibly related to the on disk format. Not having to deal with any of this is can be arranged as described above and is imo the most sensible route given the timeline for 14.0 Block cloning doesn't create, remove or modify any on-disk data until it is in use. Again, if we are not going to revert the whole merge, I see no point in reverting block cloning as until it is enabled, its code is not executed. This allow people who upgraded the pools to do nothing special and it will allow people to test it easily. -- Pawel Jakub Dawidek
Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
On 4/17/23 21:28, José Pérez wrote: Hi Pawel, thank you for your reply and for the fixes. I think there is a 4th issue that needs to be addressed: how do we recover from the worst case scenario which is a machine with a kernel > 2a58b312b62f and ZFS root upgraded with block cloning enabled. In particular, is it safe to turn such a machine on in the first place, and what are the risks involved in doing so? Any potential data loss? Would such a machine be able to fix itself by compiling a kernel, or would compilation fail and might data be corrupted in the process? I have two poudriere builders powered off (I am not alone in this situation) and I need to recover them, ideally minimizing data loss. The builders are also hosting current and used to build kernels and worlds for 13 and current: as of now all my production machines are stuck on the 13 they run, I cannot update binaries nor packages and I would like to be back online. José, I can only speak of block cloning in details, but I'll try to address everything. The easiest way to avoid block_cloning-related corruption on the kernel after the last OpenZFS merge, but before e0bb199925 is to set the compress property to 'off' and the sync property to something other than 'disabled'. This will avoid the block_cloning-related corruption and zil_replaying() panic. As for the other corruption, unfortunately I don't know the details, but my understanding is that it is happening under higher load. Not sure I'd trust a kernel built on a machine with this bug present. What I would do is to compile the kernel as of 068913e4ba somewhere else, boot the problematic machine in single-user mode and install the newly built kernel. As far as I can tell, contrary to some initial reports, none of the problems introduced by the recent OpenZFS merge corrupt the pool metadata, only file's data. You can locate the files modified with the bogus kernel using find(1) with a proper modification time, but you have to decide what to do with them (either throw them away, restore them from backup or inspect them). -- Pawel Jakub Dawidek
Re: another crash and going forward with zfs
On 4/18/23 03:51, Mateusz Guzik wrote: After bugfixes got committed I decided to zpool upgrade and sysctl vfs.zfs.bclone_enabled=1 vs poudriere for testing purposes. I very quickly got a new crash: panic: VERIFY(arc_released(db->db_buf)) failed cpuid = 9 time = 1681755046 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0a90b8e5f0 vpanic() at vpanic+0x152/frame 0xfe0a90b8e640 spl_panic() at spl_panic+0x3a/frame 0xfe0a90b8e6a0 dbuf_redirty() at dbuf_redirty+0xbd/frame 0xfe0a90b8e6c0 dmu_buf_will_dirty_impl() at dmu_buf_will_dirty_impl+0xa2/frame 0xfe0a90b8e700 dmu_write_uio_dnode() at dmu_write_uio_dnode+0xe9/frame 0xfe0a90b8e780 dmu_write_uio_dbuf() at dmu_write_uio_dbuf+0x42/frame 0xfe0a90b8e7b0 zfs_write() at zfs_write+0x672/frame 0xfe0a90b8e960 zfs_freebsd_write() at zfs_freebsd_write+0x39/frame 0xfe0a90b8e980 VOP_WRITE_APV() at VOP_WRITE_APV+0xdb/frame 0xfe0a90b8ea90 vn_write() at vn_write+0x325/frame 0xfe0a90b8eb20 vn_io_fault_doio() at vn_io_fault_doio+0x43/frame 0xfe0a90b8eb80 vn_io_fault1() at vn_io_fault1+0x161/frame 0xfe0a90b8ecc0 vn_io_fault() at vn_io_fault+0x1b5/frame 0xfe0a90b8ed40 dofilewrite() at dofilewrite+0x81/frame 0xfe0a90b8ed90 sys_write() at sys_write+0xc0/frame 0xfe0a90b8ee00 amd64_syscall() at amd64_syscall+0x157/frame 0xfe0a90b8ef30 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe0a90b8ef30 --- syscall (4, FreeBSD ELF64, write), rip = 0x103cddf7949a, rsp = 0x103cdc85dd48, rbp = 0x103cdc85dd80 --- KDB: enter: panic [ thread pid 95000 tid 135035 ] Stopped at kdb_enter+0x32: movq$0,0x9e4153(%rip) The posted 14.0 schedule which plans to branch stable/14 on May 12 and one cannot bet on the feature getting beaten up into production shape by that time. Given whatever non-block_clonning and not even zfs bugs which are likely to come out I think this makes the feature a non-starter for said release. I note: 1. the current problems did not make it into stable branches. 2. there was block_cloning-related data corruption (fixed) and there may be more 3. there was unrelated data corruption (see https://github.com/openzfs/zfs/issues/14753), sorted out by reverting the problematic commit in FreeBSD, not yet sorted out upstream As such people's data may be partially hosed as is. Consequently the proposed plan is as follows: 1. whack the block cloning feature for the time being, but make sure pools which upgraded to it can be mounted read-only 2. run ztest and whatever other stress testing on FreeBSD, along with restoring openzfs CI -- I can do the first part, I'm sure pho will not mind to run some tests of his own 3. recommend people create new pools and restore data from backup. if restoring from backup is not an option, tar or cp (not zfs send) from the read-only mount block cloning beaten into shape would use block_cloning_v2 or whatever else, key point that the current feature name would be considered bogus (not blocking RO import though) to prevent RW usage of the current pools with it enabled. Comments? Correct me if I'm wrong, but from my understanding there were zero problems with block cloning when it wasn't in use or now disabled. The reason I've introduced vfs.zfs.bclone_enabled sysctl, was to exactly avoid mess like this and give us more time to sort all the problems out while making it easy for people to try it. If there is no plan to revert the whole import, I don't see what value removing just block cloning will bring if it is now disabled by default and didn't cause any problems when disabled. -- Pawel Jakub Dawidek
Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
On 4/17/23 18:15, Pawel Jakub Dawidek wrote: There were three issues that I know of after the recent OpenZFS merge: 1. Data corruption unrelated to block cloning, so it can happen even with block cloning disabled or not in use. This was the problematic commit: https://github.com/openzfs/zfs/commit/519851122b1703b8445ec17bc89b347cea965bb9 It was reverted in 63ee747febbf024be0aace61161241b53245449e. 2. Data corruption with embedded blocks when block cloning is enabled. It can happen when compression is enabled and the block contains between 60 to 112 bytes (this might be hard to determine). Fix exists, it is merged to OpenZFS already, but isn't in FreeBSD yet. OpenZFS pull request: https://github.com/openzfs/zfs/pull/14739 3. Panic on VERIFY(zil_replaying(zfsvfs->z_log, tx)). This is triggered when block cloning is enabled, the sync property is set to disabled and copy_file_range(2) is used. Easy fix exists, it is not yet merged to OpenZFS and not yet in FreeBSD HEAD. OpenZFS pull request: https://github.com/openzfs/zfs/pull/14758 Block cloning was disabled in 46ac8f2e7d9601311eb9b3cd2fed138ff4a11a66, so 2 and 3 should not occur. As of 068913e4ba3dd9b3067056e832cefc5ed264b5cc all known issues are fixed, as far as I can tell. Block cloning remains disabled for now just to be on the safe side, but can be enabled by setting sysctl vfs.zfs.bclone_enabled to 1. Don't relay on this sysctl as it will be removed in 2-3 weeks. -- Pawel Jakub Dawidek
Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
On 4/16/23 01:07, Florian Smeets wrote: On the pool that has block_cloning enabled I see the above insta panic when poudriere starts building. I found a workaround though: --- /usr/local/share/poudriere/include/fs.sh.orig 2023-04-15 18:03:50.090823000 +0200 +++ /usr/local/share/poudriere/include/fs.sh 2023-04-15 18:04:04.144736000 +0200 @@ -295,7 +295,6 @@ fi zfs clone -o mountpoint=${mnt} \ - -o sync=disabled \ -o atime=off \ -o compression=off \ ${fs}@${snap} \ With this workaround I was able to build thousands of packages without panics or failures due to data corruption. Thank you, Florian, that was very helpful! This should fix the problem: https://github.com/openzfs/zfs/pull/14758 -- Pawel Jakub Dawidek
Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
On 4/14/23 09:23, Charlie Li wrote: Pawel Jakub Dawidek wrote: Here is the change that reverts most of the modifications and disables cloning new blocks. It does retain ability to free existing cloned blocks and keeps block_cloning feature around, so upgraded pools can be imported and existing cloned blocks freed. It does not handle replaying ZIL with block-cloning logs, so make sure you import pools that were cleanly exported. I'd appreciate if someone who can reproduce those corruptions could try it. https://github.com/pjd/openzfs/commit/f2cfbcf76a733c44e25cba8c649162ef68047103 Does not apply to sys/contrib/openzfs tip, conflicts in module/os/freebsd/zfs/zfs_vnops_os.c and module/zfs/dmu.c. This should work: https://people.freebsd.org/~pjd/patches/brt_revert.patch -- Pawel Jakub Dawidek
Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
On 4/14/23 07:52, Charlie Li wrote: Pawel Jakub Dawidek wrote: thank you for your testing and patience so far. I'm working on a patch to revert block cloning without affecting people who already upgraded their pools. Testing with mjg@ earlier today revealed that block_cloning was not the cause of poudriere bulk build (and similar cp(1)/install(1)-based) corruption, although may have exacerbated it. Can you please elaborate how were you testing and what exactly did you exclude? Thanks. -- Pawel Jakub Dawidek
Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
On 4/14/23 07:40, Pawel Jakub Dawidek wrote: On 4/13/23 22:56, Cy Schubert wrote: I'm in the process of building a branch reverting the merge altogether and will test it on my sandbox machine later today. Cy, thank you for your testing and patience so far. I'm working on a patch to revert block cloning without affecting people who already upgraded their pools. I'd also greatly appreciate if you could provide a procedure for me to reproduce the corruption, ideally without the internet access, as I'll be on the plane(s) for the next ~24h. Here is the change that reverts most of the modifications and disables cloning new blocks. It does retain ability to free existing cloned blocks and keeps block_cloning feature around, so upgraded pools can be imported and existing cloned blocks freed. It does not handle replaying ZIL with block-cloning logs, so make sure you import pools that were cleanly exported. I'd appreciate if someone who can reproduce those corruptions could try it. https://github.com/pjd/openzfs/commit/f2cfbcf76a733c44e25cba8c649162ef68047103 Thank you guys for your help! -- Pawel Jakub Dawidek
Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
On 4/13/23 22:56, Cy Schubert wrote: I'm in the process of building a branch reverting the merge altogether and will test it on my sandbox machine later today. Cy, thank you for your testing and patience so far. I'm working on a patch to revert block cloning without affecting people who already upgraded their pools. I'd also greatly appreciate if you could provide a procedure for me to reproduce the corruption, ideally without the internet access, as I'll be on the plane(s) for the next ~24h. -- Pawel Jakub Dawidek
Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
On 4/13/23 23:05, Shawn Webb wrote: I've learned over the years downstream that it's not really my place to tell upstream what to do or how to do it. However, I think given the seriousness of this, upstream might do well to revert the commit until a solid fix is in place. Upstream might want to consider the impacts this is having not just with downstream projects, but also regular users. Really bad timing to have a lot of new tax documentation that I really don't want to lose. I'd really like to have an up-to-date, security patched OS, but I guess I'll stay behind so that I don't risk losing critical financial documentation. Shawn, I'm working on a patch to safely revert this that would also work for people who already upgraded their pools. I'm sorry for this mess. -- Pawel Jakub Dawidek
Re: dumpdev in loader.conf vs rc.d/dumpon
On Thu, Sep 24, 2015 at 02:18:50PM +0300, Slawa Olhovchenkov wrote: > On Thu, Sep 24, 2015 at 11:28:05AM +0300, Andrey V. Elsukov wrote: > > > On 23.09.2015 19:57, Andriy Gapon wrote: > > > I do not have a strong opinion. Either option, rc.d/dumpon change or > > > geom_dev > > > change, is fine with me. > > > > I added the ability to set dumpdev via loader. But I wasn't aware that > > it was used in rc.d script. > > > > If you have set dumpdev kenv, it will be already enabled in the time > > when rc.d/dumpon will be run. So, I think it is useless to try to > > enable dumpdev again. I prefer remove this old code from rc.d script. > > rc.d script can redirect dump to device, not available at boot time, > iSCSI disk, for examle. No. Dump device is very special. It runs in an environment when kernel already paniced, there are no interrupt, so there is no networking. Storage controllers have special methods to handle dumping kernel memory - it doesn't go through GEOM, it cannot go through GEOM as the scheduler doesn't work too. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com pgpfeiUIcx0t9.pgp Description: PGP signature
Re: dumpdev in loader.conf vs rc.d/dumpon
On Fri, Sep 25, 2015 at 12:11:51AM +0300, Slawa Olhovchenkov wrote: > On Thu, Sep 24, 2015 at 10:58:00PM +0200, Pawel Jakub Dawidek wrote: > > > On Thu, Sep 24, 2015 at 02:18:50PM +0300, Slawa Olhovchenkov wrote: > > > On Thu, Sep 24, 2015 at 11:28:05AM +0300, Andrey V. Elsukov wrote: > > > > > > > On 23.09.2015 19:57, Andriy Gapon wrote: > > > > > I do not have a strong opinion. Either option, rc.d/dumpon change or > > > > > geom_dev > > > > > change, is fine with me. > > > > > > > > I added the ability to set dumpdev via loader. But I wasn't aware that > > > > it was used in rc.d script. > > > > > > > > If you have set dumpdev kenv, it will be already enabled in the time > > > > when rc.d/dumpon will be run. So, I think it is useless to try to > > > > enable dumpdev again. I prefer remove this old code from rc.d script. > > > > > > rc.d script can redirect dump to device, not available at boot time, > > > iSCSI disk, for examle. > > > > No. Dump device is very special. It runs in an environment when kernel > > already paniced, there are no interrupt, so there is no networking. > > Storage controllers have special methods to handle dumping kernel memory > > - it doesn't go through GEOM, it cannot go through GEOM as the scheduler > > doesn't work too. > > Can be ZFS VOL act as dump device? I don't think so. IIRC there was a hack in Illumos to allocate contiguous space for dump in one of the vdevs (then I think it was extended to multiple vdevs). I don't think any ZFS feature has worked for such a ZVOL (no checksumming, no compression, etc.). Others may have more up-to-date info about that. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com pgpSlioURbq3y.pgp Description: PGP signature
Re: Read-only /usr/obj/ no longer kosher?
On Tue, Aug 25, 2015 at 03:32:35PM -0700, NGie Cooper wrote: On Tue, Aug 25, 2015 at 3:21 PM, Xin Li delp...@delphij.net wrote: On 08/25/15 14:55, Pawel Jakub Dawidek wrote: Now that I think of it, it might have been that I did buildworld/buildkernel before -p1. Then freebsd-update updated newvers.sh and then I was trying to do installworld. Yes, I can now reproduce it with source updated to -p2. Yes, that's because freebsd-version.sh is generated from the files (but it's not clear to me whether if it's a bug or a feature that 'make install' checks if it's up-to-date and decides to regenerate it...). It's a quirk for sure. If you change the behavior, people will definitely complain as they will now need to go back and rebuild everything. What we have now is misleading. People should recompile. It is rather rare to see security advisory which bumps only patch level and something that doesn't require recompilation (eg. a shell script). Current behaviour would make people think they are running latest patch level because freebsd-version says so, eventhough they only did 'make installworld' without rebuilding affected binaries. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com pgpRsLnByGkaA.pgp Description: PGP signature
Re: Read-only /usr/obj/ no longer kosher?
On Tue, Aug 25, 2015 at 11:04:37PM +0200, Pawel Jakub Dawidek wrote: On Sun, Aug 23, 2015 at 03:29:01PM -0700, Xin Li wrote: On 8/23/15 14:55, Pawel Jakub Dawidek wrote: I used to build world and kernel on one machine and export both /usr/src/ and /usr/obj read-only to other machines. It doesn't work anymore (this is from 'make installworld'): === bin/freebsd-version (install) eval $(egrep '^(TYPE|REVISION|BRANCH)=' /usr/src/bin/freebsd-version/../../sys/conf/newvers.sh) ; if ! sed -e s/@@TYPE@@/${TYPE}/g; s/@@REVISION@@/${REVISION}/g; s/@@BRANCH@@/${BRANCH}/g; /usr/src/bin/freebsd-version/freebsd-version.sh.in freebsd-version.sh ; then rm -f freebsd-version.sh ; exit 1 ; fi cannot create freebsd-version.sh: Permission denied rm: freebsd-version.sh: Read-only file system *** Error code 1 What's the modification times of /usr/obj/usr/bin/freebsd-version/freebsd-version.sh, /usr/src/bin/freebsd-version/freebsd-version.sh and /usr/src/sys/conf/newvers.sh? I saw it twice, but cannot reproduce it anymore. This is 10.2-RELEASE, I've send it to current@ by mistake. All in all my expectation is that we shouldn't modify obj/ during installworld. Now that I think of it, it might have been that I did buildworld/buildkernel before -p1. Then freebsd-update updated newvers.sh and then I was trying to do installworld. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Read-only /usr/obj/ no longer kosher?
On Tue, Aug 25, 2015 at 11:53:47PM +0200, Pawel Jakub Dawidek wrote: On Tue, Aug 25, 2015 at 11:04:37PM +0200, Pawel Jakub Dawidek wrote: On Sun, Aug 23, 2015 at 03:29:01PM -0700, Xin Li wrote: On 8/23/15 14:55, Pawel Jakub Dawidek wrote: I used to build world and kernel on one machine and export both /usr/src/ and /usr/obj read-only to other machines. It doesn't work anymore (this is from 'make installworld'): === bin/freebsd-version (install) eval $(egrep '^(TYPE|REVISION|BRANCH)=' /usr/src/bin/freebsd-version/../../sys/conf/newvers.sh) ; if ! sed -e s/@@TYPE@@/${TYPE}/g; s/@@REVISION@@/${REVISION}/g; s/@@BRANCH@@/${BRANCH}/g; /usr/src/bin/freebsd-version/freebsd-version.sh.in freebsd-version.sh ; then rm -f freebsd-version.sh ; exit 1 ; fi cannot create freebsd-version.sh: Permission denied rm: freebsd-version.sh: Read-only file system *** Error code 1 What's the modification times of /usr/obj/usr/bin/freebsd-version/freebsd-version.sh, /usr/src/bin/freebsd-version/freebsd-version.sh and /usr/src/sys/conf/newvers.sh? I saw it twice, but cannot reproduce it anymore. This is 10.2-RELEASE, I've send it to current@ by mistake. All in all my expectation is that we shouldn't modify obj/ during installworld. Now that I think of it, it might have been that I did buildworld/buildkernel before -p1. Then freebsd-update updated newvers.sh and then I was trying to do installworld. Yes, I can now reproduce it with source updated to -p2. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Read-only /usr/obj/ no longer kosher?
On Sun, Aug 23, 2015 at 03:29:01PM -0700, Xin Li wrote: On 8/23/15 14:55, Pawel Jakub Dawidek wrote: I used to build world and kernel on one machine and export both /usr/src/ and /usr/obj read-only to other machines. It doesn't work anymore (this is from 'make installworld'): === bin/freebsd-version (install) eval $(egrep '^(TYPE|REVISION|BRANCH)=' /usr/src/bin/freebsd-version/../../sys/conf/newvers.sh) ; if ! sed -e s/@@TYPE@@/${TYPE}/g; s/@@REVISION@@/${REVISION}/g; s/@@BRANCH@@/${BRANCH}/g; /usr/src/bin/freebsd-version/freebsd-version.sh.in freebsd-version.sh ; then rm -f freebsd-version.sh ; exit 1 ; fi cannot create freebsd-version.sh: Permission denied rm: freebsd-version.sh: Read-only file system *** Error code 1 What's the modification times of /usr/obj/usr/bin/freebsd-version/freebsd-version.sh, /usr/src/bin/freebsd-version/freebsd-version.sh and /usr/src/sys/conf/newvers.sh? I saw it twice, but cannot reproduce it anymore. This is 10.2-RELEASE, I've send it to current@ by mistake. All in all my expectation is that we shouldn't modify obj/ during installworld. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Read-only /usr/obj/ no longer kosher?
I used to build world and kernel on one machine and export both /usr/src/ and /usr/obj read-only to other machines. It doesn't work anymore (this is from 'make installworld'): === bin/freebsd-version (install) eval $(egrep '^(TYPE|REVISION|BRANCH)=' /usr/src/bin/freebsd-version/../../sys/conf/newvers.sh) ; if ! sed -e s/@@TYPE@@/${TYPE}/g; s/@@REVISION@@/${REVISION}/g; s/@@BRANCH@@/${BRANCH}/g; /usr/src/bin/freebsd-version/freebsd-version.sh.in freebsd-version.sh ; then rm -f freebsd-version.sh ; exit 1 ; fi cannot create freebsd-version.sh: Permission denied rm: freebsd-version.sh: Read-only file system *** Error code 1 -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com pgp0DzHE4AU2t.pgp Description: PGP signature
Re: Memory modified after free, seemingly geli related
On Thu, Aug 06, 2015 at 04:06:40AM +0200, Pawel Jakub Dawidek wrote: On Wed, Aug 05, 2015 at 03:24:26AM +, Ed Maste wrote: I've encountered a few memory modified after free panics recently, which seem to be from geli. I don't yet have any debugging to completely confirm it's geli, but it has not happened on my other test laptop which configured similarly but without geli. This has a few local patches from my to-commit-to-HEAD queue. FreeBSD volta 11.0-CURRENT FreeBSD 11.0-CURRENT #10 r284409+6a002d9(staging): Tue Jul 7 17:57:01 EDT 2015 panic: Memory modified after free 0xf80009d504d8(248) val=0 @ 0xf80009d50518 I'm seeing it too. I tracked it down to ZFS. The bio was last owned by the ZFS::VDEV GEOM class, which is modyfing bio_error on freed bio. I'm investigating further and will let you know here once I find the cause. Ok. It was bio from ZFS in my case, but it was GELI which modified bio_error after delivering bio. This patch fixes the race: http://people.freebsd.org/~pjd/patches/geom_eli.patch Using bio after calling crypto_dispatch() is a bug. 'done' callbacks might have already called g_io_deliver() and upper layer might have already freed the bio. I'm not fully convinced that panic is the right response to crypto_dispatch() failure. It means that the driver failed our request and didn't call our callback, which is bad as we never complete the I/O. The crypto drivers tend to return errors only if the request itself is bogus, but that is program's bug and not a runtime condition. In other words panic should be fine here. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com pgpRu2X0EJLDP.pgp Description: PGP signature
Re: Memory modified after free, seemingly geli related
On Wed, Aug 05, 2015 at 03:24:26AM +, Ed Maste wrote: I've encountered a few memory modified after free panics recently, which seem to be from geli. I don't yet have any debugging to completely confirm it's geli, but it has not happened on my other test laptop which configured similarly but without geli. This has a few local patches from my to-commit-to-HEAD queue. FreeBSD volta 11.0-CURRENT FreeBSD 11.0-CURRENT #10 r284409+6a002d9(staging): Tue Jul 7 17:57:01 EDT 2015 panic: Memory modified after free 0xf80009d504d8(248) val=0 @ 0xf80009d50518 I'm seeing it too. I tracked it down to ZFS. The bio was last owned by the ZFS::VDEV GEOM class, which is modyfing bio_error on freed bio. I'm investigating further and will let you know here once I find the cause. cpuid = 1 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe011414a880 vpanic() at vpanic+0x189/frame 0xfe011414a900 panic() at panic+0x43/frame 0xfe011414a960 trash_ctor() at trash_ctor+0x48/frame 0xfe011414a970 uma_zalloc_arg() at uma_zalloc_arg+0x573/frame 0xfe011414a9e0 g_clone_bio() at g_clone_bio+0x1d/frame 0xfe011414aa00 g_eli_start() at g_eli_start+0xbd/frame 0xfe011414aa30 g_io_schedule_down() at g_io_schedule_down+0xe6/frame 0xfe011414aa60 g_down_procbody() at g_down_procbody+0x7d/frame 0xfe011414aa70 fork_exit() at fork_exit+0x84/frame 0xfe011414aab0 fork_trampoline() at fork_trampoline+0xe/frame 0xfe011414aab0 --- trap 0, rip = 0, rsp = 0xfe011414ab70, rbp = 0 --- -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com pgpMR9ZeEaVYN.pgp Description: PGP signature
Re: diskid documentation
On Mon, Jun 02, 2014 at 03:27:06PM -0700, John-Mark Gurney wrote: Pawel Jakub Dawidek wrote this message on Mon, Jun 02, 2014 at 22:26 +0200: The problem is that GPT labels (or GPT IDs for that matter) should not be implemented within GLABEL. This is wrong. It should be implemented as part of GPART, so that GPART would create ada0p1, gpt/label and gptid/whatever. Opening one of those should not make the others disappear then. Only opening ada0 for writting would make them disappear. even gpart would be wrong IMO... What happens if there is another provider like GPART, but different, do they need to implement diskid creation too to prevent the same issue? Shouldn't geom be updated to say, this ident is an alias, everything you do w/ this, it's exactly the same as the other one? This would also have the advantage of possibly removing one layer in the call chain when dealing w/ IO. (or does GEOM has a pass-through flag that says, I don't do anything, just skip me?) As for disk IDs it definitely shouldn't be implemented in GPART or GLABEL. IMHO the right place is the DISK class - both ada0 and diskid-of-ada0 should exist on the same rights (two providers of one geom). This also would address your concern about additional layer. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com pgpUBPO31HwbW.pgp Description: PGP signature
Re: diskid documentation
On Mon, Jun 02, 2014 at 11:01:08AM -0700, John-Mark Gurney wrote: Michael W. Lucas wrote this message on Mon, Jun 02, 2014 at 11:36 -0400: On Mon, Jun 02, 2014 at 10:45:52AM -0400, Ryan Stone wrote: On Mon, Jun 2, 2014 at 9:26 AM, Allan Jude allanj...@freebsd.org wrote: It also tends to sometimes hide the gpt label provider on me (not sure in which cases it does this, but it is annoying) This happens when something (e.g. zfs) happens to open the diskid provider instead of the gpt label. For me this ended up being a bit more than annoying; my swap was mounted in /etc/fstab via a gpt label so I silently lost my swap when I did an upgrade. Wait-- one type of one label can hide another? I thought a big point of labels was to remove ambiguity... Surprisingly, yes... I didn't think about this, but it's true... A disk will get exported via two different devices, diskid and normal da/ada... The tasting will go through and create all the necessary sub devices, but the problem is that we now have two different paths, and if something opens the diskid path, then the da/ada paths all disappear... This sounds like we need to fix geom to bind the two together so that when one opens, the other doesn't disappear... The problem is that geom views them as two separate disks when in fact they are the same... someone who knows geom well should think about how to solve this problem, as diskid isn't the first time this has happened, just most prevalent w/ ZFS and diskid. The problem is that GPT labels (or GPT IDs for that matter) should not be implemented within GLABEL. This is wrong. It should be implemented as part of GPART, so that GPART would create ada0p1, gpt/label and gptid/whatever. Opening one of those should not make the others disappear then. Only opening ada0 for writting would make them disappear. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com pgp8HNxiuiLpJ.pgp Description: PGP signature
Re: [PATCH RFC] Disable save-entropy in jails
On Tue, Dec 24, 2013 at 12:44:34PM -0800, Xin Li wrote: Hi, I think we shouldn't save entropy inside jails, as the data is not going to be used by rc script (pjd@126744). If there is no objections, I will commit this changeset on January 1, 2014. I agree we shouldn't do it. I have this line in my crontab that I wanted to commit at some point: 1,310-5 * * * root[ `sysctl -n security.jail.jailed` -eq 0 ] adjkerntz -a It prevents executing adjkerntz from within a jail, but allows to keep the same crontab in and outside jails. We could do the same for save-entropy. It would be even nicer to have some flag so that even sysctl(8) is not executed. Index: libexec/save-entropy/save-entropy.sh === --- libexec/save-entropy/save-entropy.sh (revision 259828) +++ libexec/save-entropy/save-entropy.sh (working copy) @@ -42,6 +42,10 @@ elif [ -r /etc/rc.conf ]; then . /etc/rc.conf 2/dev/null fi +if [ `/sbin/sysctl -n security.jail.jailed` -eq 1 ]; then + exit 0 +fi + case ${entropy_dir} in [Nn][Oo]) exit 0 -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com pgpiq6LgOgUJX.pgp Description: PGP signature
Re: dhclient can't limit bpf descriptor?
On Sat, Dec 14, 2013 at 12:12:23PM -0800, Tim Kientzle wrote: Opened up an old VM from a month or so ago (r257910) and dhclient won’t start. Specifically, dhclient complains (when run by root): “can’t limit bpf descriptor: Bad address” and then immediately exits. What does this mean? I don’t know anything about the capabilities framework and certainly haven’t configured it in any way. Maybe your userland and kernel are out of sync? There was backward-incompatible change in Capsicum that could result in error like that (capability rights are now passed as a pointer to a structure and not uint64_t bitfield). -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com pgpOTGPPh0ZvY.pgp Description: PGP signature
Re: r253070 and disappearing zpool
On Wed, Jul 24, 2013 at 02:47:11PM +0300, Andriy Gapon wrote: on 22/07/2013 23:38 Pawel Jakub Dawidek said the following: The /boot/ has to be unencrypted and can be stored on eg. USB pendrive which is never left unattended, unlike laptop which can be left in eg. a hotel room, but with entire disk encrypted. As we discussed elsewhere, there are many options of configuring full disk encryption. Including decisions whether root filesystem should be separate from boot filesystem, choice of filesystem type for boot fs, ways of tying various pieces together, and many more. I do not believe that my change is incompatible with full disk encryption in general. Maybe you can imagine many ways of configuring it, but definiately the most typical one is to have separate /boot/ from /, where /boot/ is unencrypted and where you use one file system type for both (UFS or ZFS). Let's also recall that the system was not created / configured by any of the existing official or semi-official tools and thus it does not represent any recommended way of setting up such systems. Glen configured it this way, but it doesn't mean that that is the way. Note that there are no official tools to install FreeBSD on ZFS. Is that enough reason to stop supporting it? What Glen did is the recommended way of setting up full disk encryption with ZFS. I'd do it the same way and I'd recommend this configuration to anyone who will (or did) ask me. I think that there are many of ways of changing configuration of that system to make behave as before again. Three I mentioned already. Another is to add rc script to import the boot pool, given that it is a special, designated pool. Yet another is to place zpool.cache onto the root pool and use nullfs (instead of a symlink) to make /boot be from the boot pool but /boot/zfs be from the root pool. Come on... BTW. If moving zpool.cache to /etc/zfs/ will work for both cases that's fine by me, although the migration might be tricky. Yes, that's migration that's scary to me too. Now, about the postponed points. I will reproduce a section from my email that you've snipped. P.S. ZFS/FreeBSD boot process is extremely flexible. For example zfsboot can take zfsloader from pool1/fsA, zfsloader can boot kernel from pool2/fsB and kernel can mount / from pool3/fsC. Of these 3 filesystems from where should zpool.cache be taken? My firm opinion is that it should be taken from / (pool3/fsC in the example above). Because it is the root filesystem that defines what a system is going to do ultimately: what daemons are started, with what configurations, etc. And thus it should also determine what pools to auto-import. We can say that zpool.cache is analogous to /etc/fstab in this respect. So do you or do you not agree with my reasoning about from where zpool.cache should be taken? If you do not, then please explain why. If you do, then please explain how this would be compatible with the old way of loading zpool.cache. I don't have a strong opinion about this. As I said above I'm fine with moving zpool.cache to /etc/zfs/ if we can ensure it won't break existing installations. Still I'm not sure this was your initial goal, because you weren't aware of systems with separate boot pool until recently (if you were aware of this I hope you wouldn't commit the change without prior discussion). Which means in your eyes zpool.cache was always part of the root pool, because /boot/ was. I think that ensuring that zpool.cache is always loaded from a root filesystem is the gain from my change. Were people complaining about zpool.cache being loaded from /boot/zfs/ and not from /etc/zfs/? I don't think so. But people do complain about boot pool not being autoimported. In my opinion for the end user it doesn't really matter if it is /etc/zfs/zpool.cache or /boot/zfs/zpool.cache, as both directories are available once the system is booted. For most people those two directories are placed on the same file system. For some people who actually care if this is /etc/zfs/ or /boot/zfs/, because those are separate file systems the latter works, the former doesn't. In my opinion the gain, if any, is only theoretical. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com pgpF_b3WkFXBB.pgp Description: PGP signature
Re: r253070 and disappearing zpool
On Mon, Jul 22, 2013 at 10:29:40AM +0300, Andriy Gapon wrote: I think that this setup (on ZFS level) is quite untypical, although not impossible on FreeBSD (and perhaps only FreeBSD). It's untypical because you have separate boot pool (where loader, loader.conf and kernel are taken from) and root pool (where / is mounted from). As I said elsewhere, it is pretty typical when full disk encryption is used. The /boot/ has to be unencrypted and can be stored on eg. USB pendrive which is never left unattended, unlike laptop which can be left in eg. a hotel room, but with entire disk encrypted. So, I see three ways of resolving the problem that my changes caused for your configuration. 1. [the easiest] Put zpool.cache loading instructions that used to be in defaults/loader.conf into your loader.conf. This way everything should work as before -- zpool.cache would be loaded from your boot pool. 2. Somehow (I don't want to go into any technical details here) arrange that your root pool has /boot/zfs/zpool.cache that describes your boot pool. This is probably hard given that your /boot is a symlink at the moment. This probably would be easier to achieve if zpool.cache lived in /etc/zfs. 3. [my favorite] Remove an artificial difference between your boot and root pools, so that they are a single root+boot pool (as zfs gods intended). As far as I understand your setup, you use GELI to protect some sensitive data. Apparently your kernel is not sensitive data, so I wonder if your /bin/sh or /sbin/init are really sensitive either. So perhaps you can arrange your unencrypted pool to hold all of the base system (boot + root) and put all your truly sensitive filesystems (like e.g. /home or /var/data or /opt/xyz) onto your encrypted pool. If all you care about is laptop being stolen, then that would work. If you however want to be protected from someone replacing your /sbin/init with something evil then you use encryption or even better integrity verification also supported by GELI. Remember, tools not policies. There is also option number 4 - backing out your commit. When I saw your commit removing those entries from defaults/loader.conf, I thought it is fine, as we now don't require zpool.cache to import the root pool, which was, BTW, very nice and handy improvement. Now that we know it breaks existing installations I'd prefer the commit to be backed out. This is because apart from breaking some existing installations it doesn't gain us anything. So I understand that my change causes a problem for a setup like yours, but I believe that the change is correct. The change is clearly incorrect or incomplete as it breaks existing installations and doesn't allow for full disk encryption configuration on ZFS-only systems. BTW. If moving zpool.cache to /etc/zfs/ will work for both cases that's fine by me, although the migration might be tricky. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com pgpG8GeaQjVQd.pgp Description: PGP signature
Re: HEADSUP! dhclient(8) sandboxing.
On Wed, Jul 03, 2013 at 11:04:21PM -0700, Alfred Perlstein wrote: On 7/3/13 3:52 PM, Pawel Jakub Dawidek wrote: Hi. I've just committed Capsicum sandboxing for the dhclient(8). Let me know (ideally by sending e-mail to current@ and CCing me) if you notice any weird behaviour. The work was sponsored by the FreeBSD Foundation. It broke running dhclient on igb0 for me. It says interface not found or something to that effect. Can I help somehow? Basically just ifconfig down igb0 then try to run dhclient. It will not work. If you up the interface and then run it, it is OK. See attached image. Thanks for the report. Could you try this patch? http://people.freebsd.org/~pjd/patches/dhclient.c.patch -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com pgp0Gs_Z1C8ZM.pgp Description: PGP signature
Re: HEADSUP! dhclient(8) sandboxing.
On Thu, Jul 04, 2013 at 04:55:14PM +0400, Andrey Chernov wrote: On 04.07.2013 2:52, Pawel Jakub Dawidek wrote: I've just committed Capsicum sandboxing for the dhclient(8). Let me know (ideally by sending e-mail to current@ and CCing me) if you notice any weird behaviour. I don't test one your very recent commit yet, but whole previous commits chain case dhclient broken: Starting dhclient. em0: no link .. got link em0: not found exiting. /etc/rc.d/dhclient: WARNING: failed to start dhclient and a bit later in rc Starting dhclient. em0: not found exiting. /etc/rc.d/dhclient: WARNING: failed to start dhclient It should be fixed in r252697. Could you give it a try? -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com pgpvt0u0Xs1b4.pgp Description: PGP signature
HEADSUP! dhclient(8) sandboxing.
Hi. I've just committed Capsicum sandboxing for the dhclient(8). Let me know (ideally by sending e-mail to current@ and CCing me) if you notice any weird behaviour. The work was sponsored by the FreeBSD Foundation. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com pgpG83WhmJLcx.pgp Description: PGP signature
Re: r248583 Kernel panic: negative refcount 0xfffffe0031b59168
On Sun, Jun 30, 2013 at 01:18:36PM +0200, Mateusz Guzik wrote: On Sun, Jun 30, 2013 at 05:21:42PM +1000, Kubilay Kocak wrote: I'm seeing what I believe is related panic, reliably being generated by the Python regression test suite on a newly created FreeBSD 10-CURRENT buildbot. Symptoms first seen in an freebsd.org FTP snapshot dated Thu May 30 20:01:46 UTC 2013 and also reproducible on a freshly updated r252400 It is additionally reproducible after checking out pure upstream python sources, using the following steps: hg clone http://hg.python.org/cpython cd cpython configure make buildbottest An interesting possible correlation is that it seems to drop out during/around test_socket Turns out the bug is quite funny ;) Try this: diff --git a/sys/kern/uipc_usrreq.c b/sys/kern/uipc_usrreq.c index 5d8e814..7a4db04 100644 --- a/sys/kern/uipc_usrreq.c +++ b/sys/kern/uipc_usrreq.c @@ -1764,8 +1764,8 @@ unp_externalize(struct mbuf *control, struct mbuf **controlp, int flags) } for (i = 0; i newfds; i++, fdp++) { fde = fdesc-fd_ofiles[*fdp]; - fde-fde_file = fdep[0]-fde_file; - filecaps_move(fdep[0]-fde_caps, + fde-fde_file = fdep[i]-fde_file; + filecaps_move(fdep[i]-fde_caps, fde-fde_caps); if ((flags MSG_CMSG_CLOEXEC) != 0) fde-fde_flags |= UF_EXCLOSE; Thanks for tracking it down before I had time to get to it! The change looks good. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com pgpHVKRcu5rHH.pgp Description: PGP signature
Re: r248583 Kernel panic: negative refcount 0xfffffe0031b59168
, v_mount = 0x0, v_nmntvnodes = { tqe_next = 0xfe014fd95760, tqe_prev = 0xfe011d500958}, v_un = {vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, le_prev = 0x0}, v_cache_src = { lh_first = 0x0}, v_cache_dst = {tqh_first = 0x0, tqh_last = 0xfe01967007b0}, v_cache_dd = 0x0, v_lock = {lock_object = {lo_name = 0x80dddbb1 zfs, lo_flags = 91881472, lo_data = 0, lo_witness = 0x0}, lk_lock = 1, lk_exslpfail = 0, lk_timo = 51, lk_pri = 96}, v_interlock = { lock_object = {lo_name = 0x807bfbb9 vnode interlock, lo_flags = 16908288, lo_data = 0, lo_witness = 0x0}, mtx_lock = 6}, v_vnlock = 0xfe01967007c8, v_actfreelist = { tqe_next = 0xfe0031985b10, tqe_prev = 0xfe014fd95820}, v_bufobj = {bo_mtx = {lock_object = { lo_name = 0x807bfbc9 bufobj interlock, lo_flags = 16908288, lo_data = 0, lo_witness = 0x0}, mtx_lock = 6}, bo_ops = 0x80a5af10, bo_object = 0x0, bo_synclist = { le_next = 0x0, le_prev = 0x0}, bo_private = 0xfe0196700760, __bo_vnode = 0xfe0196700760, bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 0xfe0196700880}, bv_root = 0x0, bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xfe01967008a0}, bv_root = 0x0, bv_cnt = 0}, bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072}, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = {tqh_first = 0x0, tqh_last = 0xfe01967008e8}, rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_holdcnt = 0, v_usecount = 0, v_iflag = 128, v_vflag = 4, v_writecount = 0, v_hash = 26636295, v_type = VBAD} # kgdb -n 0 GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type show copying to see the conditions. There is absolutely no warranty for GDB. Type show warranty for details. This GDB was configured as amd64-marcel-freebsd... Unread portion of the kernel message buffer: panic: negative refcount 0xfe0059a400c8 cpuid = 0 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xff823aff8770 kdb_backtrace() at kdb_backtrace+0x39/frame 0xff823aff8820 vpanic() at vpanic+0x127/frame 0xff823aff8860 kassert_panic() at kassert_panic+0x136/frame 0xff823aff88d0 closef() at closef+0x1ff/frame 0xff823aff8960 closefp() at closefp+0xa0/frame 0xff823aff89b0 amd64_syscall() at amd64_syscall+0x1f9/frame 0xff823aff8ab0 Xfast_syscall() at Xfast_syscall+0xfb/frame 0xff823aff8ab0 --- syscall (6, FreeBSD ELF64, sys_close), rip = 0x80aeaaa8a, rsp = 0x7fffbd28, rbp = 0x7fffbd40 --- Uptime: 21m3s [...] (kgdb) bt #0 doadump (textdump=1) at pcpu.h:231 #1 0x804f5827 in kern_reboot (howto=260) at /freebsd-src/local/sys/kern/kern_shutdown.c:447 #2 0x804f5d36 in vpanic (fmt=value optimized out, ap=value optimized out) at /freebsd-src/local/sys/kern/kern_shutdown.c:754 #3 0x804f5bc6 in kassert_panic (fmt=value optimized out) at /freebsd-src/local/sys/kern/kern_shutdown.c:642 #4 0x804b900f in closef (fp=value optimized out, td=value optimized out) at refcount.h:66 #5 0x804b7030 in closefp (fdp=0xfe018dc79800, fd=value optimized out, fp=0xfe0059a400a0, td=0xfe016dfca920, holdleaders=value optimized out) at /freebsd-src/local/sys/kern/kern_descrip.c:1136 #6 0x806e26c9 in amd64_syscall (td=0xfe016dfca920, traced=0) at subr_syscall.c:134 #7 0x806cb13b in Xfast_syscall () at exception.S:387 #8 0x00080aeaaa8a in ?? () Previous frame inner to this frame (corrupt stack?) Current language: auto; currently minimal (kgdb) Thanks, Shawn Webb ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com pgprAfmBcAPgt.pgp Description: PGP signature
Re: pidfile_open incorrectly returns EAGAIN when pidfile is locked
On Thu, Mar 14, 2013 at 08:28:25AM +0100, Dirk Engling wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 13.03.13 23:08, Pawel Jakub Dawidek wrote: I think I considered something similar at first, but the change I proposed was optimal, IMHO at the cost of producing pretty large diff, because of indentation change. But to be sure, can you send a patch of your proposed change? http://erdgeist.org/arts/software/Code/pidfile.c.diff Right. Your patch assumes EWOULDBLOCK is equal to EAGAIN, which is true on FreeBSD, but is not portable. Also in case pidptr is NULL you compare errno three times instead of just one (not a big deal of course, just something that could be done a bit more optimal:)). -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpdpnLR5mSxt.pgp Description: PGP signature
Re: pidfile_open incorrectly returns EAGAIN when pidfile is locked
On Thu, Mar 14, 2013 at 09:42:40AM -0400, John Baldwin wrote: On Thursday, March 14, 2013 4:44:20 am Pawel Jakub Dawidek wrote: On Thu, Mar 14, 2013 at 08:28:25AM +0100, Dirk Engling wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 13.03.13 23:08, Pawel Jakub Dawidek wrote: I think I considered something similar at first, but the change I proposed was optimal, IMHO at the cost of producing pretty large diff, because of indentation change. But to be sure, can you send a patch of your proposed change? http://erdgeist.org/arts/software/Code/pidfile.c.diff Right. Your patch assumes EWOULDBLOCK is equal to EAGAIN, which is true on FreeBSD, but is not portable. Also in case pidptr is NULL you compare errno three times instead of just one (not a big deal of course, just something that could be done a bit more optimal:)). Geeze, why not just add an else. That's the really short diff: Heh, I did consider that as well, but here you check errno twice, instead of once. Guys, is there anything wrong with the patch I proposed? Index: pidfile.c === --- pidfile.c (revision 248162) +++ pidfile.c (working copy) @@ -140,7 +140,8 @@ pidfile_open(const char *path, mode_t mode, pid_t *pidptr = -1; if (errno == 0 || errno == EAGAIN) errno = EEXIST; - } + } else if (errno == EWOULDBLOCK) + errno = EEXIST; free(pfh); return (NULL); } -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpXFp9L0bjdx.pgp Description: PGP signature
Re: pidfile_open incorrectly returns EAGAIN when pidfile is locked
On Thu, Mar 14, 2013 at 10:11:07AM -0700, Chuck Swiger wrote: Hi-- On Mar 14, 2013, at 9:50 AM, John Baldwin wrote: On Thursday, March 14, 2013 12:29:58 pm Pawel Jakub Dawidek wrote: [ ... ] Heh, I did consider that as well, but here you check errno twice, instead of once. Guys, is there anything wrong with the patch I proposed? I'm sure the compiler can work that out just fine and it should do whatever is most readable to the programmer. I don't care either way. Strong +1. Having the code be correct and readable is much more important then trying to hand-optimize a single-digit # of integer compares in startup code that usually runs ~once per process. Well, I think my version is more obvious, just the diff is larger. Anyway, I think enough has been said already about this crucial change:) -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpeAnHpmUc3i.pgp Description: PGP signature
Re: pidfile_open incorrectly returns EAGAIN when pidfile is locked
On Wed, Mar 13, 2013 at 11:18:36AM -0400, John Baldwin wrote: On Tuesday, March 12, 2013 4:16:32 pm Dirk Engling wrote: While debugging my own daemon I noticed that pidfile_open does not perform the appropriate checks for a running daemon if the caller does not provide a pidptr to pidfile_open fd = flopen(pfh-pf_path, O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC | O_NONBLOCK, mode); fails when another daemon holds the lock and flopen sets errno to EAGAIN, the check 4 lines below in if (errno == EWOULDBLOCK pidptr != NULL) { means that the pidfile_read is never executed. This results in my second daemon receiving an EAGAIN which clearly was meant to report a race condition between two daemons starting at the same time and the first one not yet finishing pidfile_write. The expected behavior would be to set errno to EEXIST, even if no pidptr was passed. Yes, I think it should actually perform the same logic even if pidptr is NULL of waiting for the other daemon to finish starting up. Something like this: Index: lib/libutil/pidfile.c === --- pidfile.c (revision 248162) +++ pidfile.c (working copy) @@ -100,6 +100,7 @@ pidfile_open(const char *path, mode_t mode, pid_t struct stat sb; int error, fd, len, count; struct timespec rqtp; + pid_t dummy; pfh = malloc(sizeof(*pfh)); if (pfh == NULL) @@ -126,7 +127,9 @@ pidfile_open(const char *path, mode_t mode, pid_t fd = flopen(pfh-pf_path, O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC | O_NONBLOCK, mode); if (fd == -1) { - if (errno == EWOULDBLOCK pidptr != NULL) { + if (errno == EWOULDBLOCK) { + if (pidptr == NULL) + pidptr = dummy; count = 20; rqtp.tv_sec = 0; rqtp.tv_nsec = 500; I agree EEXIST should be returned, but I don't like reading existing pidfile (including waiting for the other process to write its PID) just to throw read PID away. How about this patch? http://people.freebsd.org/~pjd/patches/pidfile.c.patch -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgp85B_RZoSMW.pgp Description: PGP signature
Re: pidfile_open incorrectly returns EAGAIN when pidfile is locked
On Wed, Mar 13, 2013 at 10:59:17PM +0100, Dirk Engling wrote: On Wed, 13 Mar 2013, Pawel Jakub Dawidek wrote: How about this patch? http://people.freebsd.org/~pjd/patches/pidfile.c.patch If you move the lines + if (errno == 0 || errno == EAGAIN) + errno = EEXIST; out of the else branch, you can get rid of the if branch, guard the else branch by a + if (pidptr) { and let the if (errno == 0 || errno == EAGAIN) fix the errno I think I considered something similar at first, but the change I proposed was optimal, IMHO at the cost of producing pretty large diff, because of indentation change. But to be sure, can you send a patch of your proposed change? -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgp1futIji1g8.pgp Description: PGP signature
Re: r247839: broken pipe - for top, sudo and ports
On Wed, Mar 06, 2013 at 08:04:57AM -0500, John Baldwin wrote: On Tuesday, March 05, 2013 2:35:48 pm Hartmann, O. wrote: On recent FreeBSD 10.0-CURRENT/amd64 (CLANG buildworld, serveral systems (3) the same symptoms)), many services drop a sporadic broken pipe This happesn to system's top (I have to type it several times to get finally a top), it happens to sudo su -, it happens to SSH (drops connection with broken pipe) and as I reported earlier, it seems to affect the entire port system, since I can not build any port, I receive *** [do-extract] Signal 13 This is dramatic for me, because several modules (rtc, linux_adobe ...) can not be recompiled as it is required by the last /usr/src/UPDATING entry 20130304. Since dbus fails to start and even the nVidia driver (which is a kernel module, it canot be built and therefore ... ). Dimitry, I put you into CC, just in case. It seems that the last commits (not only the new DRM2 mess) broke something. I hope that others using FreeBSD 10.0CURRENT with CLANG can confirm this.\ Have you tried backing up to just before all of pjd@'s file descriptor and capsicum commits? It broke some other stuff initially related to fd passing, so I don't think it is beyond imagination that it broke something with UNIX domain sockets in general. Is there a consensus already if this is result of my changes or davide's r247804? I just upgraded my laptop to today's HEAD and I don't see any weird behaviour yet. If someone can provide a way to reproduce the problem, I'd be happy to investigate. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgp7F62niw8U0.pgp Description: PGP signature
Re: HEADS UP: Capsicum overhaul.
On Sun, Mar 03, 2013 at 10:18:02PM +0300, Jan Beich wrote: Pawel Jakub Dawidek p...@freebsd.org writes: I just committed pretty large change that affects not only Capsicum, but also descriptor handling code in the kernel. If you will find some strange problems after r243611 (like panics, unexpected application errors, etc.) I may be at fault. I'll be looking at current@ mailing list closly, so report here if you find problems that look related to my change. tmux started to behave weirdly, sometimes failing to attach: $ printenv PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin OLDPWD=/ DISPLAY=:0 PWD=/home/foo TERM=xterm USER=foo HOME=/home/foo SHELL=/bin/sh $ ktrace -i tmux -L test -f /dev/null $ echo $? 1 $ kdump -r | pastebinit -a 'tmux fails to attach' http://pastebin.com/U3nCPrFY $ env -i TERM=$TERM ktrace -i /usr/local/bin/tmux -L test -f /dev/null $ ^D [exited] $ kdump -r | pastebinit -a 'tmux fails to attach (workaround)' http://pastebin.com/w1dsUAU4 I've tried so far: * booting allbsd.org snapshot - no joy * enabling capsicum options - no joy * reverting recent capsicum commits - works fine Yes, it was already reported to me and I'm investigating the problem. Thanks. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpg0QzEgXbEG.pgp Description: PGP signature
Re: HEADS UP: Capsicum overhaul.
On Sun, Mar 03, 2013 at 10:18:02PM +0300, Jan Beich wrote: Pawel Jakub Dawidek p...@freebsd.org writes: I just committed pretty large change that affects not only Capsicum, but also descriptor handling code in the kernel. If you will find some strange problems after r243611 (like panics, unexpected application errors, etc.) I may be at fault. I'll be looking at current@ mailing list closly, so report here if you find problems that look related to my change. tmux started to behave weirdly, sometimes failing to attach: I committed a work-around in r247740, but the root of the problem is yet to be found. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgp7tAe7RGpg5.pgp Description: PGP signature
Re: kernel build failure
On Sun, Mar 03, 2013 at 06:47:00PM -0500, Michael Butler wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 SVN r247736 prompts this .. cc -c -O2 -pipe -fno-strict-aliasing -march=pentium4 -std=c99 -Wall - -Wredundant-decls -Wnested-externs -Wstrict-prototypes - -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef - -Wno-pointer-sign -fformat-extensions -Wmissing-include-dirs - -fdiagnostics-show-option -Wno-error-tautological-compare - -Wno-error-empty-body -Wno-error-parentheses-equality -nostdinc -I. - -I/usr/src/sys -I/usr/src/sys/contrib/altq -D_KERNEL - -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -mno-aes -mno-avx - -mno-mmx -mno-sse -msoft-float -ffreestanding -fstack-protector -Werror /usr/src/sys/kern/uipc_usrreq.c /usr/src/sys/kern/uipc_usrreq.c:1689:18: error: use of undeclared identifier 'fdep'; did you mean 'fde'? filecaps_free(fdep-fde_caps); ^~~~ fde /usr/src/sys/kern/uipc_usrreq.c:1682:36: note: 'fde' declared here unp_freerights(struct filedescent *fde, int fdcount) ^ 1 error generated. This was because I divided larger change into smaller changes. r247738 should be fine. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgp3VnbiqBD_t.pgp Description: PGP signature
Re: HEADS UP: Capsicum overhaul.
On Fri, Mar 01, 2013 at 10:24:01PM -0500, Michael Butler wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 03/01/13 21:05, Pawel Jakub Dawidek wrote: I just committed pretty large change that affects not only Capsicum, but also descriptor handling code in the kernel. If you will find some strange problems after r243611 (like panics, unexpected application errors, etc.) I may be at fault. I'll be looking at current@ mailing list closly, so report here if you find problems that look related to my change. Two things I noted with a kernel @ SVN r247608: On reboot .. newsyslog: can't mv /var/log/debug.log.zGwQSBb to /var/log/debug.log: Capabilities insufficient .. and X/intel/KMS fails to start reverting to SVN r247497 (my previous compilation) performs as expected, I've fixed the rename problem in r247616. Not sure if this will fix X. Could you give it a try? Thank you for the report! -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpNcW5DpTt_W.pgp Description: PGP signature
Re: HEADS UP: Capsicum overhaul.
On Fri, Mar 01, 2013 at 09:45:02PM -0600, Larry Rosenman wrote: On Sat, 2 Mar 2013, Pawel Jakub Dawidek wrote: I just committed pretty large change that affects not only Capsicum, but also descriptor handling code in the kernel. If you will find some strange problems after r243611 (like panics, unexpected application errors, etc.) I may be at fault. I'll be looking at current@ mailing list closly, so report here if you find problems that look related to my change. Similar to another post: vn up Updating '.': Udatabases/py-sqlite3/Makefile Udatabases/py-sqlite3/files/setup.py Udatabases/py-sqlite3/files/setup3.py svn: E93: Can't move '/usr/ports/.svn/tmp/svn-X6U5KQ' to '/usr/ports/databases/py-sqlite3/Makefile': Capabilities insufficient # svn up svn: E155037: Previous operation has not finished; run 'cleanup' if it was interrupted # svn cleanup svn: E93: Can't move '/usr/ports/.svn/tmp/svn-Bb1iSM' to '/usr/ports/databases/py-sqlite3/Makefile': Capabilities insufficient This should be now fixed in r247616. Thank you for the report! -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpRi27mevghK.pgp Description: PGP signature
HEADS UP: Capsicum overhaul.
I just committed pretty large change that affects not only Capsicum, but also descriptor handling code in the kernel. If you will find some strange problems after r243611 (like panics, unexpected application errors, etc.) I may be at fault. I'll be looking at current@ mailing list closly, so report here if you find problems that look related to my change. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpD8eAHegq2k.pgp Description: PGP signature
Re: geli(8) breaks after a couple hours of uptime
On Sun, Feb 10, 2013 at 09:50:58AM +0200, Andriy Gapon wrote: on 10/02/2013 01:35 Pawel Jakub Dawidek said the following: geli(8) almost exclusively deals with sensitive data. Even mlocking MAXPHYS would fail with current limits, but this is bad idea. With mlockall() I am sure I didn't miss anything - be it forgetting about mlocking some buffer or zeroing it before munlock. I'm also sure someone else who can modify geli(8) in the future won't miss anything too. Well, the geli is not such a complex program really. It seems to use only two or so buffers for sensitive data. [...] Maybe it isn't very complex, but complex enough that you missed a dozen or so buffers that would need mlocking (almost everything that is bzero'ed), not to mention internal states for hash and encryption algorithms that operate on blocks, so they can keep plain data until their update method gather entire block. Encryption and HMAC calculation is done by API used by both userland and kernel parts, so it would need some ifdefs to make it work, thus further complicating entire thing. [...] As far as I can see geli deals only with some key management (reading keys, generating key from key material, etc). There is definitely no need to mlock the code, etc. I fully agree there is no need to mlock the code and I'd be happy to use mlockall(2) flag that protects only the data. Until such flag is introduced I'll keep mlocking everything. I think that PAGE_SIZE (or at most a small multiple of it) should be sufficient. I don't think that we currently have (or expect to see in the near future) algorithms where keys with more than 4096 size provide any additional security. geli(8) deals just fine with files that are larger than buffers, so even with smaller buffer it can read the data in few steps. The proposed patch is here if someone would like to give it a try: http://people.freebsd.org/~pjd/patches/geom_eli.c.patch -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpm_Gk3PBtuD.pgp Description: PGP signature
Re: geli(8) breaks after a couple hours of uptime
On Sat, Feb 09, 2013 at 06:00:53PM +0200, Andriy Gapon wrote: on 09/02/2013 17:04 Andrey Zonov said the following: On 2/9/13 5:07 PM, Fabian Keil wrote: This would at least prevent the segfault. I see two possibilities to get segfault: - no checking for result from memory allocation functions - too big stack I have no found any broken memory allocation checking, but I found two big objects on the stack. One is buf[MAXPHYS] in eli_genkey_files() and another is passbuf[MAXPHYS] in eli_genkey_passphrase(). If we change these two to malloc(), then we can handle error from malloc(), print some useful message and prevent segfault. I'd rather do what Kostik suggested and Fabian mentioned: instead of mlockall(MCL_FUTURE) the code should mlock only the (explicitly designated) buffers that can contain sensitive data. geli(8) almost exclusively deals with sensitive data. Even mlocking MAXPHYS would fail with current limits, but this is bad idea. With mlockall() I am sure I didn't miss anything - be it forgetting about mlocking some buffer or zeroing it before munlock. I'm also sure someone else who can modify geli(8) in the future won't miss anything too. geli(8) is relatively simple program, it doesn't allocate megabytes of memory for different pruposes and just needs few pages for sensitive data. As I said most of the memory it uses is for sensitive data. The obvious problem is allocating MAXPHYS on the stack. This has to be changed, especially that we may want to rise MAXPHYS in the future. Other than that I expect thing would be tuned properly so that geli(8) can work by default. I'm happy to use smaller buffers than MAXPHYS - keyfiles are far smaller usually than 128kB, so there shouldn't be any issue with this. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpjcefoSL7Mf.pgp Description: PGP signature
Re: acpi resume related patch
On Sat, Jan 26, 2013 at 04:18:57PM +0200, Andriy Gapon wrote: on 25/01/2013 18:08 Andriy Gapon said the following: on 25/01/2013 15:51 John Baldwin said the following: On Friday, January 25, 2013 3:43:33 am Andriy Gapon wrote: If you have ACPI suspend/resume working, if it used to work but stopped working at some time, if it never worked, but you are still hoping, could you please test the following patch and report back? http://svn.freebsd.by/files/acpi-apic-wakeup-final.patch This will break systems not using the local APIC since you unconditionally call lapic_setup() on resume.This was part of the feature of the previous code that by using a dummy pic it could register it only when the local APIC was used. Thank you for drawing my attention to this. I will try to fix this issue. The reason I want to remove lapic from 'pics' (and I already described it in a private email) is that Local APIC is a special kind of PIC. It's already explicitly initialized by APs. Putting it into 'pics' tailq just obfuscates the code. It should also be registered before any of the I/O APICs are by the design of the local_apic.c code. In fact, as I see in the code, Local APIC is always registered _after_ I/O APICs. And thus lapic_resume was called after ioapic_resume. Additionally, currently there is no synchronization between initialization of Local APICs on APs and initialization of I/O APICs at the wakeup/resume time. Here is an updated version of the patch: http://people.freebsd.org/~avg/acpi-apic-wakeup.2.patch FYI, it doesn't change anything for me. Resume seems to work, but suspend just reset my laptop. I unload all driver modules (including if_em, sound, nvidia, usb). The only driver I keep is ahci. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpeGRk3y8xZd.pgp Description: PGP signature
Re: acpi resume related patch
On Mon, Jan 28, 2013 at 03:07:59PM +0200, Andriy Gapon wrote: on 28/01/2013 14:45 Pawel Jakub Dawidek said the following: FYI, it doesn't change anything for me. Resume seems to work, but suspend just reset my laptop. Sorry, I am a little confused by this description. How can you know that resume works if suspend resets the machine? Because everything turns off, including display and the power led starts to throb and it does that until I try to resume it. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgp9FvUaKpvfH.pgp Description: PGP signature
Re: acpi resume related patch
On Mon, Jan 28, 2013 at 03:40:18PM +0200, Andriy Gapon wrote: on 28/01/2013 15:24 Pawel Jakub Dawidek said the following: On Mon, Jan 28, 2013 at 03:07:59PM +0200, Andriy Gapon wrote: on 28/01/2013 14:45 Pawel Jakub Dawidek said the following: FYI, it doesn't change anything for me. Resume seems to work, but suspend just reset my laptop. Sorry, I am a little confused by this description. How can you know that resume works if suspend resets the machine? Because everything turns off, including display and the power led starts to throb and it does that until I try to resume it. Ah, so suspend and resume got accidentally reverted... I see, thank you for the report. I just noticed that, yes, sorry. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpZdzy5Vr92A.pgp Description: PGP signature
Re: ACPI panic on unplugging the power cord.
On Fri, Jan 25, 2013 at 04:19:44PM -0500, Jung-uk Kim wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 2013-01-25 04:26:02 -0500, Pawel Jakub Dawidek wrote: On Thu, Jan 24, 2013 at 05:18:48PM +0100, Pawel Jakub Dawidek wrote: One is when I leave laptop idle for some time (few hours?): http://people.freebsd.org/~pjd/misc/acpi_idle_panic_0.jpg http://people.freebsd.org/~pjd/misc/acpi_idle_panic_1.jpg Small update. This panic doesn't happen when the system is idle, it happens we I close the lid to the point when display is turned off (which is almost closed, but not entirely closed). Closing lid doesn't trigger suspend or anything like that. It just turns off the display. Please try the attached patch (with my previous patch). Also, available from here: http://people.freebsd.org/~jkim/acpi_exit.diff Your patch fixes panic triggered by unplugging power cable as well the one triggered by closing the lid. I haven't tried booting from battery and connecting power cable, but I expect happy end there as well. Thank you very much for the fix. BTW. When I'm closing the lid and opening it again (not cosing it entirely, just to the point when display is turned off) I see this message to be printed twice on the console: CPU0: local APIC error 0x80 Although I must admit I see this message from time to time, but I haven't found a pattern. Closing and opening the lid always make it appear. Not sure how much it is related to ACPI, though. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpgOFfnWqmSv.pgp Description: PGP signature
Re: ACPI panic on unplugging the power cord.
On Thu, Jan 24, 2013 at 05:18:48PM +0100, Pawel Jakub Dawidek wrote: One is when I leave laptop idle for some time (few hours?): http://people.freebsd.org/~pjd/misc/acpi_idle_panic_0.jpg http://people.freebsd.org/~pjd/misc/acpi_idle_panic_1.jpg Small update. This panic doesn't happen when the system is idle, it happens we I close the lid to the point when display is turned off (which is almost closed, but not entirely closed). Closing lid doesn't trigger suspend or anything like that. It just turns off the display. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpt_6zGlZaPi.pgp Description: PGP signature
Re: ACPI panic on unplugging the power cord.
On Wed, Jan 23, 2013 at 07:54:57PM -0500, Jung-uk Kim wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 2013-01-22 12:56:29 -0500, Pawel Jakub Dawidek wrote: I just upgraded to HEAD today and was wondering what will explode. Now I know. When I unplug power cord from my laptop, ACPI panics. Pictures here: http://people.freebsd.org/~pjd/misc/acpi_panic_0.jpg http://people.freebsd.org/~pjd/misc/acpi_panic_1.jpg Let me know if you need more info. Can you please try the attached patch? It is also available from here: http://people.freebsd.org/~jkim/utcache.diff Please note the patch may or may not fix the problem but I think I found an ancient bug. :-( This patch didn't fix the panic: http://people.freebsd.org/~pjd/misc/acpi_unplug_panic_0.jpg http://people.freebsd.org/~pjd/misc/acpi_unplug_panic_1.jpg In the meantime I found two other panics. One is when I leave laptop idle for some time (few hours?): http://people.freebsd.org/~pjd/misc/acpi_idle_panic_0.jpg http://people.freebsd.org/~pjd/misc/acpi_idle_panic_1.jpg And when is when I boot laptop without power connected and I connect the power: http://people.freebsd.org/~pjd/misc/acpi_power_connect_panic_0.jpg http://people.freebsd.org/~pjd/misc/acpi_power_connect_panic_1.jpg BTW. On the acpi_power_connect_panic_0.JPG photo, at the top of the screen you can see error messages that are logged every second when I have power disconnected. This is not a new problem. I had this problem when I bought this laptop, but now that I'm reporting those bugs, I can as well let you know about this one. I'm running this on Thinkpad T530. I understand that those problems are specific to my laptop and that on your laptop all of the above work just fine? -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpEcz4SnSEh8.pgp Description: PGP signature
Re: ZFS + usb in trouble?
On Wed, Jan 23, 2013 at 09:11:23PM +0900, Alexander Nedotsukov wrote: And now both are failing, although zfs in a different way. Yes, but now we are sure this is not ZFS issue. Sequential read simply doesn't trigger the corruption, but ZFS access patterns do. The read errors you are seeing are due to GELI returning EIO on integrity verification error. # zpool status pool: testpool state: ONLINE status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: http://illumos.org/msg/ZFS-8000-9P scan: scrub repaired 1M in 0h2m with 0 errors on Wed Jan 23 20:57:11 2013 config: NAME STATE READ WRITE CKSUM testpool ONLINE 0 0 0 raidz1-0ONLINE 0 0 0 da5.elid ONLINE 0 0 0 da5.elie ONLINE 21 0 0 da5.elif ONLINE 0 0 0 # dmesg | tail -20 GEOM_ELI: da1.eli: Failed to authenticate 131072 bytes of data at offset 1010827264. ath0: bb hang detected (0x4), resetting GEOM_ELI: Device da5.eli created. GEOM_ELI: Encryption: AES-XTS 128 GEOM_ELI: Integrity: HMAC/SHA1 GEOM_ELI: Crypto: software GEOM_ELI: da5.eli: Failed to authenticate 4096 bytes of data at offset 4096. GEOM_ELI: da5.eli: Failed to authenticate 4096 bytes of data at offset 0. GEOM_ELI: da5.eli: Failed to authenticate 5792 bytes of data at offset 229288288. GEOM_ELI: da5.eli: Failed to authenticate 65536 bytes of data at offset 229429248. GEOM_ELI: da5.eli: Failed to authenticate 131072 bytes of data at offset 229298176. GEOM_ELI: da5.eli: Failed to authenticate 26496 bytes of data at offset 229494784. GEOM_ELI: da5.eli: Failed to authenticate 12288 bytes of data at offset 270299136. GEOM_ELI: da5.eli: Failed to authenticate 65536 bytes of data at offset 270319616. GEOM_ELI: da5.eli: Failed to authenticate 5792 bytes of data at offset 273095488. GEOM_ELI: da5.eli: Failed to authenticate 56864 bytes of data at offset 273105376. GEOM_ELI: da5.eli: Failed to authenticate 2400 bytes of data at offset 273326080. GEOM_ELI: da5.eli: Failed to authenticate 65536 bytes of data at offset 272560128. GEOM_ELI: da5.eli: Failed to authenticate 131072 bytes of data at offset 272429056. GEOM_ELI: da5.eli: Failed to authenticate 32768 bytes of data at offset 272396288. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpN4AvBRoAtD.pgp Description: PGP signature
ACPI panic on unplugging the power cord.
I just upgraded to HEAD today and was wondering what will explode. Now I know. When I unplug power cord from my laptop, ACPI panics. Pictures here: http://people.freebsd.org/~pjd/misc/acpi_panic_0.jpg http://people.freebsd.org/~pjd/misc/acpi_panic_1.jpg Let me know if you need more info. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpaejQVh5GkP.pgp Description: PGP signature
Re: ZFS + usb in trouble?
On Tue, Jan 22, 2013 at 11:47:29PM +0900, Alexander Nedotsukov wrote: Hi Pawel, Here what I did. # geli onetime -a hmac/sha1 -s 4096 /dev/da5 # dmesg | tail -15 wlan0: link state changed to UP ugen5.4: USBest Technology at usbus5 umass2: USBest Technology USB Mass Storage Device, class 0/0, rev 2.00/1.00, addr 4 on usbus5 umass2: SCSI over Bulk-Only; quirks = 0x0100 umass2:4:2:-1: Attached to scbus4 da5 at umass-sim2 bus 2 scbus4 target 0 lun 0 da5: USB2.0 FlashDisk 0.00 Removable Direct Access SCSI-2 device da5: 40.000MB/s transfers da5: 983MB (2015231 512 byte sectors: 64H 32S/T 983C) GEOM_ELI: Device da5.eli created. GEOM_ELI: Encryption: AES-XTS 128 GEOM_ELI: Integrity: HMAC/SHA1 GEOM_ELI: Crypto: software GEOM_ELI: da5.eli: Failed to authenticate 4096 bytes of data at offset 4096. GEOM_ELI: da5.eli: Failed to authenticate 4096 bytes of data at offset 0. # dd if=/dev/random of=/dev/da5.eli bs=1m dd: /dev/da5.eli: short write on character device dd: /dev/da5.eli: end of device 875+0 records in 874+1 records out 917151744 bytes transferred in 1530.831674 secs (599120 bytes/sec) # dd if=/dev/da5.eli of=/dev/null bs=1m 874+1 records in 874+1 records out 917151744 bytes transferred in 178.874312 secs (5127353 bytes/sec) All clear. No new errors. Zpool created on the same usb stick is still failing with cksum errors. # usbconfig -d 5.4 add_quirk UQ_MSC_NO_SYNC_CACHE umass2: SCSI over Bulk-Only; quirks = 0x4000 Re-created zpool is still failing with cksum errors. Every new scrub run is triggering another repairing. Interesting. Can you put ZFS on top of GELI with authentication after filling da0.eli with random random and see if GELI will report corruption when ZFS will or if only ZFS will report corruption? -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpwnenyZmG1u.pgp Description: PGP signature
Re: ZFS + usb in trouble?
On Sat, Jan 19, 2013 at 11:26:39PM +0900, Alexander Nedotsukov wrote: Let's use some space out of it. #dd if=/dev/zero of=/tank/foo ^C250939+0 records in 250938+0 records out 128480256 bytes transferred in 30.402453 secs (4225983 bytes/sec) Oops... #zpool status pool: tank state: ONLINE status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: http://illumos.org/msg/ZFS-8000-9P scan: scrub repaired 5K in 0h0m with 0 errors on Sat Jan 19 23:11:20 2013 config: NAMESTATE READ WRITE CKSUM tankONLINE 0 0 0 raidz1-0 ONLINE 0 0 0 da1 ONLINE 0 0 1 da2 ONLINE 0 0 0 da3 ONLINE 0 0 1 This seems like umass/usb issue (hps@ CCed). Can you also try GELI with data integrity verification? # geli onetime -a hmac/sha1 -s 4096 /dev/da0 It will report some problems due to GEOM tasting, but ignore them and run: # dd if=/dev/random of=/dev/da0.eli bs=1m Once that done try to read the data back and look at the logs to see if any corruptions are reported. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpRaNMtMVNc2.pgp Description: PGP signature
Re: kmem_map auto-sizing and size dependencies
On Fri, Jan 18, 2013 at 08:26:04AM -0800, m...@freebsd.org wrote: Should it be set to a larger initial value based on min(physical,KVM) space available? It needs to be smaller than the physical space, [...] Or larger, as the address space can get fragmented and you might not be able to allocate memory even if you have physical pages available. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgphr9pcEGGow.pgp Description: PGP signature
Re: ZFS + usb in trouble?
I used wrong Hans' login, resending with the proper e-mail. On Mon, Jan 21, 2013 at 09:58:52PM +0100, Pawel Jakub Dawidek wrote: On Sat, Jan 19, 2013 at 11:26:39PM +0900, Alexander Nedotsukov wrote: Let's use some space out of it. #dd if=/dev/zero of=/tank/foo ^C250939+0 records in 250938+0 records out 128480256 bytes transferred in 30.402453 secs (4225983 bytes/sec) Oops... #zpool status pool: tank state: ONLINE status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: http://illumos.org/msg/ZFS-8000-9P scan: scrub repaired 5K in 0h0m with 0 errors on Sat Jan 19 23:11:20 2013 config: NAMESTATE READ WRITE CKSUM tankONLINE 0 0 0 raidz1-0 ONLINE 0 0 0 da1 ONLINE 0 0 1 da2 ONLINE 0 0 0 da3 ONLINE 0 0 1 This seems like umass/usb issue (hps@ CCed). Can you also try GELI with data integrity verification? # geli onetime -a hmac/sha1 -s 4096 /dev/da0 It will report some problems due to GEOM tasting, but ignore them and run: # dd if=/dev/random of=/dev/da0.eli bs=1m Once that done try to read the data back and look at the logs to see if any corruptions are reported. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgphVtGYdmMmd.pgp Description: PGP signature
Re: auditdistd config file location
On Tue, Dec 04, 2012 at 04:38:13PM +0100, Johan Hendriks wrote: Hello all. I had some spare time so i thought it was a good moment to look at auditdistd . One thing i noticed was the default config file location. The man page and the wiki tells me it is /etc/security/auditdistd I enabled audistd by placing the following in the rc.conf file auditdistd_enable=YES However if i want to start the deamon, it tells me the config file is not present. And that is correct, because my config is in /etc/security/ and not /etc/ [root@smb-filer01 ~]# /etc/rc.d/auditdistd start /etc/rc.d/auditdistd: WARNING: /etc/auditdistd.conf is not readable. /etc/rc.d/auditdistd: WARNING: failed precmd routine for auditdistd [root@smb-filer01 ~]# I think the default location of the config file needs to be modified to match that of the wiki page and the man page. The one in the man page is correct (/etc/security/auditdistd.conf). This was last minute change, which I just fixed (r244181). Thank you for the report! If you have any other questions, don't hesitate to ask. Feel free to CC me when sending a question to the mailing list. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgp59eN4q5ojB.pgp Description: PGP signature
Re: [head tinderbox] failure on ia64/ia64
On Sat, Dec 01, 2012 at 03:17:57AM +, FreeBSD Tinderbox wrote: cc -c -O2 -pipe -fno-strict-aliasing -std=c99 -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef -Wno-pointer-sign -fformat-extensions -Wmissing-include-dirs -fdiagnostics-show-option -nostdinc -I. -I/src/sys -I/src/sys/contrib/altq -I/src/sys/contrib/ia64/libuwx/src -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common -finline-limit=15000 --param inline-unit-growth=100 --param large-function-growth=1000 -fno-builtin -mconstant-gp -ffixed-r13 -mfixed-range=f32-f127 -fpic -ffreestanding -Werror /src/sys/kern/vfs_lookup.c /src/sys/kern/vfs_lookup.c:214:54: error: macro AUDIT_ARG_UPATH1 passed 4 arguments, but takes just 3 /src/sys/kern/vfs_lookup.c: In function 'namei': /src/sys/kern/vfs_lookup.c:214: error: 'AUDIT_ARG_UPATH1' undeclared (first use in this function) /src/sys/kern/vfs_lookup.c:214: error: (Each undeclared identifier is reported only once /src/sys/kern/vfs_lookup.c:214: error: for each function it appears in.) /src/sys/kern/vfs_lookup.c:216:54: error: macro AUDIT_ARG_UPATH2 passed 4 arguments, but takes just 3 /src/sys/kern/vfs_lookup.c:216: error: 'AUDIT_ARG_UPATH2' undeclared (first use in this function) *** [vfs_lookup.o] Error code 1 [...] My recent commits broke the build in two ways, both should be fixed now, I'm very sorry about that. The code did compile for me on my development box, but it turned out source code on my laptop didn't have latest changes. I tried to compile it on my laptop before the commit, but run into problems with my environment (my laptop is running HEAD from before the switch to clang and the compilation was failing very early for other reasons), it is late at night to figure out the fix (which was trivial: CC=gcc), so I assumed that if the code builds on dev box it is fine. It wasn't. My mistake, I should've waited till moring. Once again, sorry about that! -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpMxZDeqDTng.pgp Description: PGP signature
Re: [HEADSUP] zfs root pool mounting
On Fri, Nov 30, 2012 at 08:51:48AM +0200, Daniel Braniss wrote: Recently some changes were made to how a root pool is opened for root filesystem mounting. Previously the root pool had to be present in zpool.cache. Now it is automatically discovered by probing available GEOM providers. The new scheme is believed to be more flexible. For example, it allows to prepare a new root pool at one system, then export it and then boot from it on a new system without doing any extra/magical steps with zpool.cache. It could also be convenient after zpool split and in some other situations. The change was introduced via multiple commits, the latest relevant revision in head is r243502. The changes are partially MFC-ed, the remaining parts are scheduled to be MFC-ed soon. I have received a report that the change caused a problem with booting on at least one system. The problem has been identified as an issue in local environment and has been fixed. Please read on to see if you might be affected when you upgrade, so that you can avoid any unnecessary surprises. You might be affected if you ever had a pool named the same as your current root pool. And you still have any disks connected to your system that belonged to that pool (in whole or via some partitions). And that pool was never properly destroyed using zpool destroy, but merely abandoned (its disks re-purposed/re-partitioned/reused). If all of the above are true, then I recommend that you run 'zdb -l disk' for all suspect disks and their partitions (or just all disks and partitions). If this command reports at least one valid ZFS label for a disk or a partition that do not belong to any current pool, then the problem may affect you. The best course is to remove the offending labels. If you are affected, please follow up to this email. GREATE in a diskless environment, /boot is read only, and the zpool.cache issue has been bothering me ever since, there was no way (and I tried) to re route it. I believe zpool.cache is not required only for root pool anymore and that you still need it if you want non-root pools to be automatically configured after reboot. Am I right, Andriy? Zpool.cache basically tells ZFS which pools should be automatically imported and file systems mounted. You can have disks in your system with ZFS pools that should not be auto-imported and zpool.cache is the way to tell the difference. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpbgMyrchJJv.pgp Description: PGP signature
Re: ZFS TRIM support committed to HEAD.
On Sun, Sep 23, 2012 at 08:20:41PM -0400, Glen Barber wrote: Hi Pawel, On Sun, Sep 23, 2012 at 09:53:58PM +0200, Pawel Jakub Dawidek wrote: FYI, I just committed TRIM support to ZFS, especially useful for SSD-only pools. This is something I implemented long time ago, but was now motivated to get back to it and commit it finally by some great fixes and improvements from the zfsonlinux project (made by Etienne Dechamps). Great! Thanks for this. Any chance you can document the following sysctls? None od the kstat sysctls are documented, they emulate kstat framework from Solaris. We would need to modify a lot of vendor code to document those in 'sysctl -d' output. It still would be good to have them documented even elsewhere, eventhough most are rather self-explanatory. root@kaos:/root # sysctl -d kstat.zfs.misc.zio_trim kstat.zfs.misc.zio_trim: kstat.zfs.misc.zio_trim.zio_trim_bytes: kstat.zfs.misc.zio_trim.zio_trim_success: kstat.zfs.misc.zio_trim.zio_trim_unsupported: kstat.zfs.misc.zio_trim.zio_trim_failed: -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpYMnkgz0Sv6.pgp Description: PGP signature
Re: ZFS TRIM support committed to HEAD.
On Mon, Sep 24, 2012 at 04:55:20PM +0100, Steven Hartland wrote: - Original Message - From: Bob Bishop r...@gid.co.uk To: Pawel Jakub Dawidek p...@freebsd.org Cc: freebsd-current@freebsd.org; freebsd...@freebsd.org; Steven Hartland kill...@multiplay.co.uk Sent: Monday, September 24, 2012 3:17 PM Subject: Re: ZFS TRIM support committed to HEAD. Hi, On 23 Sep 2012, at 23:25, Pawel Jakub Dawidek wrote: I have a patch against stable/8, but not stable/9: http://people.freebsd.org/~pjd/patches/zfstrim8.patch Running with that in an otherwise 8-STABLE GENERIC amd64 kernel, I'm getting: kstat.zfs.misc.zio_trim.zio_trim_bytes: 0 kstat.zfs.misc.zio_trim.zio_trim_success: 0 kstat.zfs.misc.zio_trim.zio_trim_unsupported: 0 kstat.zfs.misc.zio_trim.zio_trim_failed: 2742 which doesn't look like it's working. The SSDs are: ad4: 114473MB ADATA SSD S510 120GB 3.3.2 at ata2-master UDMA100 SATA 3Gb/s ad6: 114473MB ADATA SSD S510 120GB 3.3.2 at ata3-master UDMA100 SATA 3Gb/s Any suggestions? Thanks Don't think ad supports TRIM, switch to ada (ahci) and you should be good. Although I'm surprised your seeing that many reported failures as it should have disabled it on a pool level after the first few failures. Is it still increasing? Note that 'failed' count is increasing, not the 'unsupported' count. We disable TRIM automatically if we get EOPNOTSUPP and ATA is returning some other error(s). -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpNdLRO6o97M.pgp Description: PGP signature
Re: ZFS TRIM support committed to HEAD.
On Tue, Sep 25, 2012 at 12:14:24AM +0100, Bob Bishop wrote: Hi, Still seems to be working OK, but: seagoon# zpool status pool: m1 state: ONLINE status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: http://www.sun.com/msg/ZFS-8000-9P scan: scrub repaired 0 in 0h2m with 0 errors on Mon Sep 24 23:52:08 2012 config: NAME STATE READ WRITE CKSUM m1 ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 gpt/disk1 ONLINE109M 0 0 gpt/disk0 ONLINE109M 0 0 errors: No known data errors seagoon# sysctl -a |grep _trim kstat.zfs.misc.zio_trim.zio_trim_bytes: 228731904 kstat.zfs.misc.zio_trim.zio_trim_success: 19406 kstat.zfs.misc.zio_trim.zio_trim_unsupported: 0 kstat.zfs.misc.zio_trim.zio_trim_failed: 0 seagoon# No device errors logged in messages, and scrub comes up clean as you can see. The read error count is increasing, but otherwise everything appears to work OK. Are you sure your world and kernel are in sync? I remember seeing similar problem when my userland was updated. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpEsD6sK3yL7.pgp Description: PGP signature
ZFS TRIM support committed to HEAD.
FYI, I just committed TRIM support to ZFS, especially useful for SSD-only pools. This is something I implemented long time ago, but was now motivated to get back to it and commit it finally by some great fixes and improvements from the zfsonlinux project (made by Etienne Dechamps). Note that this functionality is turned off by default for now. To turn it on you need to add vfs.zfs.trim_disable=0 to /boot/loader.conf. You can see some statistics under kstat.zfs.misc.zio_trim sysctl. Big thanks to multiplay.co.uk for sponsoring this work! BTW. If you find this useful, so tell me about it during EuroBSDcon 2012 in a month that will be held this year in Warsaw, Poland: http://eurobsdcon.org :) On Sun, Sep 23, 2012 at 07:40:58PM +, Pawel Jakub Dawidek wrote: Author: pjd Date: Sun Sep 23 19:40:58 2012 New Revision: 240868 URL: http://svn.freebsd.org/changeset/base/240868 Log: Add TRIM support. The code builds a map of regions that were freed. On every write the code consults the map and eventually removes ranges that were freed before, but are now overwritten. Freed blocks are not TRIMed immediately. There is a tunable that defines how many txg we should wait with TRIMming freed blocks (64 by default). There is a low priority thread that TRIMs ranges when the time comes. During TRIM we keep in-flight ranges on a list to detect colliding writes - we have to delay writes that collide with in-flight TRIMs in case something will be reordered and write will reached the disk before the TRIM. We don't have to do the same for in-flight writes, as colliding writes just remove ranges to TRIM. Sponsored by: multiplay.co.uk This work includes some important fixes and some improvements obtained from the zfsonlinux project, including TRIMming entire vdevs on pool create/add/attach and on pool import for spare and cache vdevs. Obtained from: zfsonlinux Submitted by: Etienne Dechamps etienne.decha...@ovh.net -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpuE8lmhRoHn.pgp Description: PGP signature
Re: ZFS TRIM support committed to HEAD.
On Sun, Sep 23, 2012 at 10:24:53PM +0100, Bob Bishop wrote: Hi, On 23 Sep 2012, at 20:53, Pawel Jakub Dawidek wrote: FYI, I just committed TRIM support to ZFS, especially useful for SSD-only pools. [etc] Is any of this applicable to -STABLE or 8.x? I have a patch against stable/8, but not stable/9: http://people.freebsd.org/~pjd/patches/zfstrim8.patch -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpO35odYgQfi.pgp Description: PGP signature
Re: [CFC/CFT] large changes in the loader(8) code
On Thu, Jun 28, 2012 at 08:33:17AM -0700, Marcel Moolenaar wrote: On Jun 28, 2012, at 3:10 AM, Stefan Esser wrote: All of the above is ugly, U'm afraid :( Indeed. The only sane way is to put the metadata in a partition of its own. Every compliant OS will respect that and consequently will not scribble over the data unintentionally. Any other scheme that puts valuable data in some undocumented or unregistered location is violating the GPT spec right away and is susceptible to being clobbered unintentionally. If the user runs: # gpart create -s GPT /dev/mirror/foo for me it is obvious that he wants to partition the mirror device and not individual disks. Because the mirror was configured earlier, do you expect gmirror to somehow detect that someone is writting GPT metadata later and magically place GPT metadata on the raw disk and move mirror's metadata to some magic partition? Not to mention that the mirror itself doesn't have to be configured on top of raw disks. And not to mention that the mirror may never be partitioned. If GPT in your opinion is limited only to raw disks then I guess the best way to fix that is to refuse to configure GPT on anything except raw disks (which was already proposed by Andrey?). In my opinion this is unacceptable, but I think this is what you are suggesting. One of the GEOM design goals was to be flexible. Let the user decide in what order he wants to configure various layers. How do you know that in every possible scenerio software mirroring should come after partitioning and encryption after mirroring? Why can't we provide flexible tools to the user and let him decide? Maybe GPT nesting violates standards, but why can't we support it as an extention, really? I recognize the need to warn users if they use FreeBSD-specific features. We do that with non-standard APIs. So how about this. Let's modify gpart(8) to print a warning if GPT is configured on something else than raw disk. Let's the warning say that such configuration is non-standard and problems are expected if the disk is shared between other OSes. In my opinion that's fair. With such a warning in place, I think we can allow users to decide on their own if they really want that or not. Then, we can also improve FreeBSD boot loader to play nice with FreeBSD-specific extensions. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpk4jIfg3rJt.pgp Description: PGP signature
Re: [CFC/CFT] large changes in the loader(8) code
On Thu, Jun 28, 2012 at 02:54:43PM -0700, Marcel Moolenaar wrote: On Jun 28, 2012, at 12:49 PM, Alexander Leidinger wrote: Or are you suggesting to convince all BIOS vendors to include the ability to boot from some kind of FreeBSD private partitioning scheme (not MBR as it is not suitable, not GPT as you are not OK to use it on a gmirror)? I would be having less problems if the mirroring didn't force the backup GPT header in anything but the last sector. [...] GPT backup header is placed in the last sector of the mirror device, just like the user asked. Gmirror doesn't force anything. User decides to put GPT partitioning on the mirror device instead of raw disk. Gmirror doesn't even know and doesn't have to know how the user uses data area on the mirror device. [...] If the metadata was somewhere else, then we wouldn't need to kluge various places to deal with the ambiguity and visible interoperability problems of the various tools and OSes. [...] Where is somewhere else, exactly? If somewhere else on this disk, then where? At the begining of the disk? Then you would complain that it keeps metadata where the primary header should be located and also MBR metadata, BSDlabel metadata, etc. Somewhere in the middle of the disk? Some future GPTng may want to use the same spot, but also gmirror-unaware boot loader will see corrupted data (shifted by one sector). Come on... If somewhere else is not on this disk, then I'm sorry, but this is totally impractical. Disks are the place you store stuff. In 99% of the cases there is no other place to store it, but the disk itself. Should we ask users to use additional disk to keep mirror's metadata? [...] Thus, it's not that I object to the mirroring per se, just to the mirroring as it is currently implemented with gmirror. Do you know software RAID (=1) or volume manager that doesn't keep metadata on component disks? PS. We are discussing two totally different things here: 1. Is placing GPT on anything but raw disk violates the spec? I can agree that it does and I'm happy with gpart(8) growing a warning. 2. How to do software mirroring. Besides trying really hard I'm not sure what alternative are you proposing. Could you be more specific and describe how gmirror should be implemented in your opinion? What about multipathing? In case the disk is attached via two paths but multipath is not enabled, the OS sees the same disk (and the same identical unique disk identifier) multiple times. Is this a violation of the spec too? It's the same disk, isn't it? The OS can actually use the property of the ID to infer that it has already seen this disk and not create multiple device nodes. You cannot trust some id that is found on disk to be unique, as all your assumptions break when the user decides to dd(1)-copy content of this disk to another disk, for example. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgp4nneOO8jiW.pgp Description: PGP signature
Re: [CFC/CFT] large changes in the loader(8) code
On Wed, Jun 27, 2012 at 08:22:25AM -0400, John Baldwin wrote: I don't think so. Most common case is to configure partitions on top of a mirror. Mirroring partitions is less common. Mostly because of hardware RAIDs being popular. You don't expect hardware RAID vendor to mirror partitions. Partition editors for other OS's won't work, but only because they don't support gmirror. If they wouldn't recognize and support some hardware (or pseudo-hardware) RAIDs there will be the same problem. Hardware RAIDs hide the metadata from the disk that the BIOS (and disk editors) see. Thus, putting a GPT on a hardware RAID volume works fine as the logical volume is always seen by all OS's consistently. [...] Only if you won't connect this disk to a different controller. [...] The same is even true of the software RAID that graid supports since the metadata is defined by the vendor and thus the logical volume is always seen other OS's consistently. But is it seen without metadata by the boot loader? What I'm trying to say is that it is fair to expect from the user to not use gmirror-configured disk on different OS. If the user wants to use this disk in different OS then he has to use format that is recognized by both. Because gmirror is supported by FreeBSD we should improve the support by teaching boot loader about it. Pretending gmirror is special and recommending to mirror partitions with it instead of raw disks is not the solution. I really can't see how gmirror is different in this regard from any other software RAID or volume manager. If you try to use disk that contains unrecognized metadata the behaviour is undefined (but hopefully not a panic). -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpQmYWVuPgKs.pgp Description: PGP signature
Re: [CFC/CFT] large changes in the loader(8) code
On Wed, Jun 27, 2012 at 10:37:11AM -0700, Marcel Moolenaar wrote: On Jun 26, 2012, at 10:37 AM, John Baldwin wrote: GPT really wants the backup header at the last LBA. I know you can set it, but I've interpreted that as a way to see if the primary header is correct or not. It seems to me that GPT tables created in this fashion (inside a GEOM provider) will not work properly with partition editors for other OS's. I'm hesitant to encourage the use of this as I do think putting GPT inside of a gmirror violates the GPT spec. Agreed. Guys. This doesn't violate the GPT spec in any way. The spec is narrow-minded if it talks only about raw disks, but you should think about gmirror as pseudo-hardware RAID. That's all. If putting GPT on top of RAID array is spec violation, then I guess we just have to live with it. While it is a nice trick to use the last sector for meta data, it does create 2 problems. 1 is mentioned above. [...] It doesn't really matter where gmirror puts its metadata. If gmirror would keep its metadata in the first sector, gpart/gpt will find its metadata in the last sector and will complain about missing primary header. [...] The second is that when there's different metadata in the first *and* the last sector, you can't decide which is to take precedence without also looking at the other and know how to interpret it. We have not solved this second problem at all. We do get reports about the problems though. At best we're handwaving or kluging. This is different kind of problem. It took me a while to realize that, but now I know:) The real problem is that not all metadata formats are suitable for autodetection. That's all. The metadata I use in my GEOM classes play nice with autodetection. The solution is very easy - keep size of the disk device within metadata. This allows gmirror to figure out if it is configured on raw disk, last slice or last partition within last slice, etc. If GPT would keep disk size in its metadata the second problem you mentioned would not exist. And to be honest GPT kinda does that by having backup header's LBA stored in the primary header. And this is fine as long the primary header is valid. The same problem is with things like UFS labels. There is no way to properly support them using GEOM autodetection, because there is no provider size in UFS superblock. UFS superblock contains file system size, but it is not the same, as one can create smaller file system than the underlying disk device. I think it's unwise to depend on FreeBSD-specific extensions or features in industry-standard partitioning schemes and as such make the use of foreign tools hard if not impossible. If you plan to use the given disk with FreeBSD only, what's the problem? Partitioning is not the end of the world. Even if you use industry-standard partitioning schemes what file system are you going to use to actually access your data? FAT? Of course if you do share your disk between various OSes then probably your best bet is to use MBR or GPT on raw disk and FAT file system. But if you use your disk with FreeBSD only, then I see no reason to not to leverage FreeBSD-specific features (be it gmirror, geli or zfs). A much more flexible approach is to support out-of-band configuration data. This allows us to mirror GPT disks without having to become non- standard as it removes the need to use the last sector for meta-data. The ability to construct GEOM hierarchies unambiguously is very important and our current approach has proven to not deliver on that. This is actually impacting existing FreeBSD consumers already, like Juniper. So, se should not go deeper into this rabbit hole. We should finally solve this problem for real... Marcel, nothing stops anyone from implementing GEOM mirror class that uses no on-disk metadata. GEOM is not a limiting factor here. GEOM does provide mechanism for autoconfiguration, but it is totally optional and GEOM class might choose not to use it. As an example you can take a look at two other GEOM classes of mine: gconcat(8) and gstripe(8). You can use 'label' subcommand to store metadata on component disks, which will take advantage of GEOM autodetection and autoconfiguration. You can also use 'create' subcommand to create ad hoc provider that stores no metadata and makes use of entire disks, which also means it won't be automatically created on next boot. For Juniper it might be more handy to use out-of-band configuration as you know the hardware you are running on, so you know where the disks are exactly, etc. My company build appliances too, so I have been there. For most of our users automatic configuration is simply better, as they can shuffle disks around and not wonder if the system will boot or not. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http
Re: [CFC/CFT] large changes in the loader(8) code
On Wed, Jun 27, 2012 at 10:45:35AM -0700, Marcel Moolenaar wrote: On Jun 26, 2012, at 2:43 PM, Pawel Jakub Dawidek wrote: As for sharing disk with other OS. If you share the disk with OS that doesn't support gmirror, you shouldn't use gmirror in the first place. You probably want to use only formats that are recognized by all your OSes. This statement is ridicuous by virtue of not being in touch with reality and by making gmirror useless for such wide range of cases that one can question why we have it at all. Put differently: a mirroring class is a fairly basic and useful thing to have. Limiting it's use is nothing but artificial and follows from having to use the underlying provider to store metadata. This then changes the view of the underlying providing to consumers above gmirror in a way that makes the presence or absence of gmirror visible. Solving the visibility problem makes gmirror useful all the time. I see that as a better way of looking at it than simply blurting out that you shouldn't use gmirror when certain awkward and artifical conditions apply. I'm sorry, Marcel, but what you describe here has nothing to do with reality. To be able to implement realiable mirroring you have to use on-disk metadata. There is no way around that. You can implement non-redundant GEOM classes without using on-disk metadata, but out-of-band configuration in case of mirroring is simply naive. How do you detect that components are out of sync, for example? And when it comes to visablity. Are you suggesting that gmirror should present entire underlying provider to upper layers? Including its metadata? I hope not, because we went through that hell already (remember skipping first 16 sectors by UFS, as BSDlabel metadata might be there? The same for swap?). I think I did pretty good job by making the metadata as simple as possible - I use exactly one sector at the end of the target device. I'm really having a hard time to think of a simpler format. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpwcpFwrh1lk.pgp Description: PGP signature
Re: [CFC/CFT] large changes in the loader(8) code
On Tue, Jun 26, 2012 at 04:50:36PM +0400, Andrey V. Elsukov wrote: Hi All, Some time ago i have started reading the code in the sys/boot. Especially i'm interested in the partition tables handling. I found several problems: 1. There are several copies of the same code in the libi386/biosdisk.c and common/disk.c, and partially libpc98/biosdisk.c. 2. ZFS probing is very slow, because the ZFS code doesn't know how many disks and partitions the system has: http://www.freebsd.org/cgi/query-pr.cgi?pr=148296 http://www.freebsd.org/cgi/query-pr.cgi?pr=161897 3. The GPT support doesn't check CRC and even doesn't know anything about the secondary GPT header/table. Just a quick note here. At some point when I was adding GPT attributes to allow for test starts I greatly improved, at least parts of, the GPT implementation. I did implement support for both CRC checksum verification and fallback to backup GPT header when primary is broken. And the code is still in sys/boot/common/gpt.c. So my question would be what do you mean by this sentence? So, i have created the branch and committed the changes: http://svnweb.freebsd.org/base/user/ae/bootcode/ The patch is here: http://people.freebsd.org/~ae/boot.diff What i already did: 1. The partition tables handling now is machine independent, and it is compatible with the kernel's GEOM_PART implementation. There is new API for disk drivers in the loader to get information about partitions and tables: common/Makefile.inc common/part.c common/part.h 2. The similar and general code from the disk drivers merged in the disk.c: common/disk.c common/disk.h i386/libi386/libi386.h i386/libi386/biosdisk.c userboot/test/test.c userboot/userboot/userboot_disk.c userboot/userboot.h 3. ZFS code now uses new API and probing on the systems with many disks should be greatly increased: zfs/zfs.c i386/loader/main.c 4. The gptboot now searches the backup GPT header in the previous sectors, when it finds the GEOM:: signature in the last sector. PMBR code also tries to do the same: common/gpt.c i386/pmbr/pmbr.s 5. Also the pmbr image now contains one fake partition record. When several first sectors are damaged the kernel can't detect GPT (see RECOVERING section in the gpart(8)). We can restore PMBR with dd(1) command, but the old pmbr image has an empty partition table and loader doesn't able to boot from GPT, when there is no partition record in the PMBR. Now it will be able. When pmbr is installed via 'gpart bootcode' command, the kernel correctly modifies this partition record. So, this is only for the first rescue step. 6. I have changed userboot interface. I guess there is none consumers except the one test program. But if it isn't that, i can make it compatible. Any comments are welcome. -- WBR, Andrey V. Elsukov -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpWMg9tnH9tN.pgp Description: PGP signature
Re: [CFC/CFT] large changes in the loader(8) code
On Tue, Jun 26, 2012 at 06:01:26PM +0400, Andrey V. Elsukov wrote: On 26.06.2012 16:57, Pawel Jakub Dawidek wrote: On Tue, Jun 26, 2012 at 04:50:36PM +0400, Andrey V. Elsukov wrote: Hi All, Some time ago i have started reading the code in the sys/boot. Especially i'm interested in the partition tables handling. I found several problems: 1. There are several copies of the same code in the libi386/biosdisk.c and common/disk.c, and partially libpc98/biosdisk.c. 2. ZFS probing is very slow, because the ZFS code doesn't know how many disks and partitions the system has: http://www.freebsd.org/cgi/query-pr.cgi?pr=148296 http://www.freebsd.org/cgi/query-pr.cgi?pr=161897 3. The GPT support doesn't check CRC and even doesn't know anything about the secondary GPT header/table. Just a quick note here. At some point when I was adding GPT attributes to allow for test starts I greatly improved, at least parts of, the GPT implementation. I did implement support for both CRC checksum verification and fallback to backup GPT header when primary is broken. And the code is still in sys/boot/common/gpt.c. So my question would be what do you mean by this sentence? Yes, gptboot does that, but the loader/zfsloader doesn't. So there might be a situation when gptboot does boot, but loader(8) can't. I see. I don't know if I'll find time for a proper review, but it is really great that you are working on cleaning up this huge mess. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgp62kXgtS21u.pgp Description: PGP signature
Re: [CFC/CFT] large changes in the loader(8) code
On Tue, Jun 26, 2012 at 01:37:11PM -0400, John Baldwin wrote: 4. The gptboot now searches the backup GPT header in the previous sectors, when it finds the GEOM:: signature in the last sector. PMBR code also tries to do the same: common/gpt.c i386/pmbr/pmbr.s GPT really wants the backup header at the last LBA. I know you can set it, but I've interpreted that as a way to see if the primary header is correct or not. [...] My interpretation is different: The way to verify if the header is valid is to check its checksum, not to check if the backup header location in the primary header points at the last LBA. Of course if primary header's checksum is incorrect it is hard to trust that the backup header location is correct. And we need the backup header when the primary header is invalid... [...] It seems to me that GPT tables created in this fashion (inside a GEOM provider) will not work properly with partition editors for other OS's. I'm hesitant to encourage the use of this as I do think putting GPT inside of a gmirror violates the GPT spec. I don't think so. Most common case is to configure partitions on top of a mirror. Mirroring partitions is less common. Mostly because of hardware RAIDs being popular. You don't expect hardware RAID vendor to mirror partitions. Partition editors for other OS's won't work, but only because they don't support gmirror. If they wouldn't recognize and support some hardware (or pseudo-hardware) RAIDs there will be the same problem. In other words, IMHO, our problem is that FreeBSD's boot code doesn't recognize/support gmirror's metadata. What Andrey is proposing is to recognize the metadata and act accordingly - in case of a gmirror we simply need to skip it. In the future we will have the same problem with graid - until we add support for it to the boot code, we won't be able to boot from it. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpxIGC3e4HIp.pgp Description: PGP signature
Re: [CFC/CFT] large changes in the loader(8) code
On Tue, Jun 26, 2012 at 02:41:31PM -0700, Kevin Oberman wrote: Long ago I saw a proposal to create a dedicated partition on GPT to hold the metadata. With the large number of partitions available on GPT, tying up one just for GEOM seems like a low price and it moves the device GEOM out of the realm of FreeBSD unique and subject to serious issues when/if a disk is shared with some other OS. I have seen little comment on this and have never seen any argument that that it could not work. I think this is an issue that will continue to bite users unless it is fixed. I don't really see how dedicating a partition for metadata can work or is good idea, sorry. As for sharing disk with other OS. If you share the disk with OS that doesn't support gmirror, you shouldn't use gmirror in the first place. You probably want to use only formats that are recognized by all your OSes. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgprySQFgONfF.pgp Description: PGP signature
Re: trim/discard success story
On Tue, Apr 03, 2012 at 06:18:43PM -0700, Julian Elischer wrote: for flash drives this is great news.. Now if ZFS would get trim support, that too would be great. I'm working on it. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpnW1D0zD64a.pgp Description: PGP signature
Re: gptboot rewrite, bootonce, etc.
On Tue, Jan 31, 2012 at 06:49:51PM +0300, Andrey Fesenko wrote: This work if use ZFS? My issues Root on ZFS GPT and boot to ufs partition http://lists.freebsd.org/pipermail/freebsd-fs/2012-January/013514.html I test # gpart show = 34 625142381 ada0 GPT (298G) 34128 1 freebsd-boot (64k) 162 26621952 2 freebsd-ufs [bootonce,bootme] (12G) 266221148388608 3 freebsd-swap (4.0G) 35010722 590131693 4 freebsd-zfs (281G) system ada0p2 not boot. This functionality only works with UFS (gptboot). -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpgkHKPPasLZ.pgp Description: PGP signature
Snapshot listing speedup.
If you have many snapshots and you were complaining that listing them takes a lot of time, you may find the commit below useful. It only works if your listing is limited to snapshot names and you want to sort also by snapshot name (by default snapshots are sorted by creation time). On Sat, Jan 21, 2012 at 09:12:53PM +, Pawel Jakub Dawidek wrote: Author: pjd Date: Sat Jan 21 21:12:53 2012 New Revision: 230438 URL: http://svn.freebsd.org/changeset/base/230438 Log: Dramatically optimize listing snapshots when user requests only snapshot names and wants to sort them by name, ie. when executes: # zfs list -t snapshot -o name -s name Because only name is needed we don't have to read all snapshot properties. Below you can find how long does it take to list 34509 snapshots from a single disk pool before and after this change with cold and warm cache: before: # time zfs list -t snapshot -o name -s name /dev/null cold cache: 525s warm cache: 218s after: # time zfs list -t snapshot -o name -s name /dev/null cold cache: 1.7s warm cache: 1.1s -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl pgpCh8Ntx3o73.pgp Description: PGP signature
Re: 9.0-RC1 panic in tcp_input: negative winow.
On Thu, Dec 29, 2011 at 11:12:59AM -0500, John Baldwin wrote: On Sunday, December 25, 2011 11:01:33 am Коньков Евгений wrote: Здравствуйте, John. Вы писали 20 декабря 2011 г., 16:52:44: JB On Saturday, December 17, 2011 6:21:27 pm Pawel Jakub Dawidek wrote: On Mon, Dec 12, 2011 at 11:00:23AM -0500, John Baldwin wrote: An update. I've sent Pawel a testing patch to see if my hypothesis is correct (www.freebsd.org/~jhb/patches/tcp_negwin_test.patch). If it is then I intend to commit www.freebsd.org/~jhb/patches/tcp_negwin2.patch as the fix. Unfortunately it paniced today. Take a look at: http://people.freebsd.org/~pjd/misc/tcp_panic.jpg JB Ok, the one use case I was worried about is happening regularly before your JB panic, so that is good. Can you use gdb to figure out which call to JB tcp_output() is actually panic'ing? I wonder if it is this case: JB /* JB * Return any desired output. JB */ JB if (needoutput || (tp-t_flags TF_ACKNOW)) { JB (void) tcp_output(tp); JB /* XXX: Debug */ JB KASSERT(SEQ_GEQ(tp-rcv_adv, tp-rcv_nxt), JB (tcp_input: negative window after ACK)); JB And if 'needoutput' is true, but TF_ACKNOW is not set, and tcp_output() decides JB to not do anything. I've updated tcp_negwin_test.patch to not panic if that call JB to tcp_output() doesn't actually send a packet. Please re-test. # uname -a FreeBSD meta-up 9.0-PRERELEASE FreeBSD 9.0-PRERELEASE #4: Sat Dec 24 13:59:20 EET 2011 @:/usr/obj/usr/src/sys/KES_KERN_v10 i386 rebooting once per day. Now I compile kernel with debug options. Can you advice me which and where I find debug info when it will reboting next time? so I can help to debug problem Are you using the patch at the URL above (tcp_negwin_test.patch)? If not, can you try applying that patch and seeing if you still get any panics? I applied 1.5 days ago, so far now panics and no other messages. I modified the patch a bit to not panic, but print a message when panic was suppose to happen. This box is too valuable for me to panic it too often. Because there were no debug messages I understand that the scenerio didn't happen yet and not that the problem is fixed, right? -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgp9eZdmwuxI5.pgp Description: PGP signature
Re: 9.0-RC1 panic in tcp_input: negative winow.
On Mon, Dec 12, 2011 at 11:00:23AM -0500, John Baldwin wrote: An update. I've sent Pawel a testing patch to see if my hypothesis is correct (www.freebsd.org/~jhb/patches/tcp_negwin_test.patch). If it is then I intend to commit www.freebsd.org/~jhb/patches/tcp_negwin2.patch as the fix. Unfortunately it paniced today. Take a look at: http://people.freebsd.org/~pjd/misc/tcp_panic.jpg -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpdhigg3fyKF.pgp Description: PGP signature
Re: 9.0-RC1 panic in tcp_input: negative winow.
On Mon, Dec 12, 2011 at 11:00:23AM -0500, John Baldwin wrote: An update. I've sent Pawel a testing patch to see if my hypothesis is correct (www.freebsd.org/~jhb/patches/tcp_negwin_test.patch). If it is then I intend to commit www.freebsd.org/~jhb/patches/tcp_negwin2.patch as the fix. Sorry for the delay, but I'm just rebooting the box that triggered the panic with John's patch. I should know more in a day or two. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgp4LFFZCMTh1.pgp Description: PGP signature
Re: zfs-related(?) panic in cache_enter: wrong vnode type
On Wed, Dec 07, 2011 at 06:50:35PM +0200, Andriy Gapon wrote: (kgdb) bt #0 doadump (textdump=1) at pcpu.h:224 #1 0x804f6d3b in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:447 #2 0x804f63e9 in panic (fmt=0x104 Address 0x104 out of bounds) at /usr/src/sys/kern/kern_shutdown.c:635 #3 0x80585f46 in cache_enter (dvp=0xfe003d4763c0, vp=0xfe0142517000, cnp=0xff82393b3708) at /usr/src/sys/kern/vfs_cache.c:726 #4 0x81a90900 in zfs_lookup (dvp=0xfe003d4763c0, nm=0xff82393b3140 .., vpp=0xff82393b36e0, cnp=0xff82393b3708, nameiop=0, cr=0xfe0042e88100, td=0xfe000fdfa480, flags=0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1470 Which FreeBSD version is it? Because lines doesn't seem to match neither to HEAD nor to stable/8. It would be best of you could send me few lines around zfs_vnops.c:1470 and vfs_cache.c:726. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpI7RzOpvuwY.pgp Description: PGP signature
Re: [RFC] winbond watchdog driver for FreeBSD/i386 and FreeBSD/amd64
On Tue, Jun 28, 2011 at 03:32:41PM -0700, Xin LI wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Hi, I'd like to request for comments on the attached driver, which supports watchdogs on several Winbond super I/O chip models and have been tested on a few of recent Supermicro motherboards. Is there any reason this is not yet committed? Please commit, pretty please. One minor problem I found is described below. Other than that it works for me, thanks! [...] +static void +winbondwd_event(void *arg, unsigned int cmd, int *error) +{ + struct winbondwd_softc *sc = arg; + unsigned char rtimeout; + uint64_t timeout; + + if (cmd == 0) + winbondwd_set_timeout(sc, 0); + else { + timeout = (uint64_t)1 (cmd WD_INTERVAL); + if (timeout (uint64_t)0xff * 1000 * 1000 * 1000) { + rtimeout = timeout / (1000 * 1000 * 1000); + if (rtimeout == 0) + rtimeout = 0xff; + winbondwd_set_timeout(sc, rtimeout); + } else { + device_printf(sc-device, + Value %u too big, disabling\n, cmd WD_INTERVAL); + /* Proposed timeout can not be satisified */ + winbondwd_set_timeout(sc, 0); + } You should add '*error = 0;' right here. Without it we return some random error, in my case watchdgod is able to arm the watchdog but exits when trying to pat it, as it gets an error, which leads to a reset short aferwards. + } +} -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpOBGKr7YKAI.pgp Description: PGP signature
Re: zfs i/o hangs on 9-PRERELEASE
On Fri, Nov 25, 2011 at 01:20:01PM -0600, Mark Felder wrote: 13:14:32 nas:~ uname -a FreeBSD nas.feld.me 9.0-PRERELEASE FreeBSD 9.0-PRERELEASE #3 r227971M: Fri Nov 25 10:07:48 CST 2011 r...@nas.feld.me:/usr/obj/tank/svn/sys/GENERIC amd64 This seemed to start happening sometime after RC1. I tried 8-STABLE and it's happening there too right now. I think whatever caused this was MFC'd. I've also reproduced this on completely different hardware running a single disk ZFS pool. I'm getting this output in dmesg after these hangs I keep seeing. Mark, those backtrace are not related to ZFS, but to PF. Not sure if they are at all related to your hangs. Most cases where ZFS I/O seems to hang are hardware problems, where I/O requests are not completed. 'procstat -kk -a' output might be useful once the hang happens. uma_zalloc_arg: zone pfrktable with the following non-sleepable locks held: exclusive sleep mutex pf task mtx (pf task mtx) r = 0 (0x8199af20) locked @ /tank/svn/sys/modules/pf/../../contrib/pf/net/pf_ioctl.c:1589 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a kdb_backtrace() at kdb_backtrace+0x37 _witness_debugger() at _witness_debugger+0x2e witness_warn() at witness_warn+0x2c4 uma_zalloc_arg() at uma_zalloc_arg+0x335 pfr_create_ktable() at pfr_create_ktable+0xd8 pfr_ina_define() at pfr_ina_define+0x12b pfioctl() at pfioctl+0x1c5a devfs_ioctl_f() at devfs_ioctl_f+0x7a kern_ioctl() at kern_ioctl+0xcd sys_ioctl() at sys_ioctl+0xfd amd64_syscall() at amd64_syscall+0x3ac Xfast_syscall() at Xfast_syscall+0xf7 --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800da711c, rsp = 0x7fff9d28, rbp = 0x7fffa1f0 --- -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgp98vrUuromO.pgp Description: PGP signature
Are there any users of accelerated asymmetric cryptography?
Hi. I'm in the process of rewriting large parts of the opencrypto framework and was wondering if there are any users that actually make use of asymmetric cryptography. From what I see only ubsec(4) driver supports it and from what I remember those cards are hard to find. If there are no users of this functionality I plan to remove support for asym crypto from the opencrypto framework. It is just too hard to maintain this without test hardware and also quite pointless if nobody uses it. If some users can be found and they really need it, we have two options: 1. Keep asym crypto around, but whoever steps up have to be ready to test patches. 2. Use alternative way of accessing it (eg. via direct ioctl to ubsec device), at least until we start to support more cards. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpfM9n21ZWWb.pgp Description: PGP signature
Re: ZFS: invalid zap_type=134218628
On Tue, Nov 08, 2011 at 08:02:13PM +0100, Sebastian Chmielewski wrote: I've got following error after upgrading my 9.0-RC1 to RC2 using source and got this error at loader prompt: ZFS: invalid zap_type=134218628 when I tried zfsboottest from 9.0-STABLE it failed ./zfsboottest /boot/kernel/kernel /dev/ada0p4 returns pool status correctly but ends with the same message zfsboottest from 10.0-CURRENT returns pool status, zfsboottest.sh zroot printed OK message at the end (very useful zfsboottest.sh wasn't yet merged from current to stable). I've installed gptzfsboot, zfsloader, loader from -CURRENT and it boots fine. I'm running ZFS on GPT partitions, zfs version 28 with dedup on. File system can be successfully mount using FreeBSD memstick. I think that patches for boot should be merged to 9.0-RELEASE otherwise users may have the same problems I had. Yes, I plan to include those fixes in 9.0-RELEASE. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpWkozxezUrm.pgp Description: PGP signature
Re: RFC: GEOM MULTIPATH rewrite
On Mon, Oct 31, 2011 at 10:10:14PM +0200, Alexander Motin wrote: Hi. Attempt to fix some GEOM MULTIPATH issues made me almost rewrite it. So I would like to present my results and request for testing and feedback. The main changes: - Improved locking and destruction process to fix crashes in many cases. - Improved automatic configuration method to make it safe by reading metadata back from all specified paths after writing to one. - Added provider size check to reduce chance of conflict with other GEOM classes. - Added manual configuration method without using on-disk metadata. - Added add and remove commands to manage paths manually. - Failed paths no longer dropped from GEOM, but only marked as FAIL and excluded from I/O operations. - Automatically restore failed paths when all others paths are marked as failed, for example, because of device-caused (not transport) errors. - Added fail and restore commands to manually control FAIL flag. - GEOM is now destroyed on last provider disconnection. IMHO it is right to do if device was completely removed. - Added optional Active/Active mode support. Unlike Active/Passive mode, load evenly distributed between all working paths. If supported by device, it allows to significantly improve performance, utilizing bandwidth of all paths. It is controlled by -A option during creation. Disabled by default now. - Improved `status` and `list` commands output. Latest patch can be found here: http://people.freebsd.org/~mav/gmultipath4.patch Feedbacks are welcome! Sponsored by: iXsystems, Inc. There are two possible issues that comes to my mind, not sure if you address them. 1. When configuration is based on on-disk metadata, GEOM spoil/taste is not fully helpful - if you have two paths: da0 and da1 and I write to da0, gmultipath won't be informed by GEOM that da1 changed as well. One solution is to basically keep all paths open exclusively all the time, even if gmultipath provider is not open or emulate spoil/taste for other paths if any path was modified. 2. In active/active mode do you do anything to handle possible reordering? Ie. if you have overlapping writes and send both of them using different paths, you cannot be sure that order will be preserved. Most of the time that's not a problem, as file systems rarely if at all send overlapping writes to device, but this is weak assumption. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpJEqgC1I0SO.pgp Description: PGP signature
Re: umass(4) regression in 9.0-RC1.
On Fri, Oct 28, 2011 at 09:11:42AM +0200, Hans Petter Selasky wrote: On Thursday 27 October 2011 20:51:15 Pawel Jakub Dawidek wrote: On Thu, Oct 27, 2011 at 08:42:09PM +0200, Hans Petter Selasky wrote: This is the root HUB. Can you also show the actual device? Sorry, it wasn't connected, here it goes: ugen0.2: USB2.0-CRW Generic at usbus0, cfg=255 md=HOST spd=HIGH (480Mbps) pwr=ON bLength = 0x0012 bDescriptorType = 0x0001 bcdUSB = 0x0200 bDeviceClass = 0x bDeviceSubClass = 0x bDeviceProtocol = 0x bMaxPacketSize0 = 0x0008 idVendor = 0x0bda idProduct = 0x0119 bcdDevice = 0x1981 iManufacturer = 0x0001 retrieving string failed iProduct = 0x0002 retrieving string failed iSerialNumber = 0x0003 retrieving string failed bNumConfigurations = 0x0001 Hi, The control request in question is mandatory according to the UMASS specification, and I wonder why it times out and all other control requests aswell. Could you try setting the no-synchronize cache quirk instead, and then plug your device. I'm sorry, but this problem needs further investigation before we can make a patch. It wasn't immediately obvious for me how to set the no-synchronize cache quirk, but I think I found it: # usbconfig add_quirk UQ_MSC_NO_SYNC_CACHE And it seems to work: umass0: Generic USB2.0-CRW, class 0/0, rev 2.00/19.81, addr 2 on usbus0 (probe0:umass-sim0:0:0:0): TEST UNIT READY. CDB: 0 0 0 0 0 0 (probe0:umass-sim0:0:0:0): CAM status: SCSI Status Error (probe0:umass-sim0:0:0:0): SCSI status: Check Condition (probe0:umass-sim0:0:0:0): SCSI sense: UNIT ATTENTION asc:28,0 (Not ready to ready change, medium may have changed) da0 at umass-sim0 bus 0 scbus13 target 0 lun 0 da0: Generic- SD/MMC 1.00 Removable Direct Access SCSI-0 device da0: 40.000MB/s transfers da0: 30799MB (63076352 512 byte sectors: 255H 63S/T 3926C) -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpU6OrpvyK5i.pgp Description: PGP signature
umass(4) regression in 9.0-RC1.
Hi. My SDHC card (via adapter) is no longer being detected after upgrade to 9.0-RC1. The same card with this adapter works on my laptop with older HEAD. usbus0: EHCI version 1.0 usbus0: NVIDIA nForce MCP55 USB 2.0 controller on ehci0 usbus0: 480Mbps High Speed USB v2.0 ugen0.1: nVidia at usbus0 uhub0: nVidia EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1 on usbus0 uhub0: 10 ports with 10 removable, self powered # usbdevs -v usbdevs: no USB controllers found ehci0@pci0:0:10:1: class=0x0c0320 card=0x81fb1043 chip=0x036d10de rev=0xa2 hdr=0x00 vendor = 'nVidia Corporation' device = 'MCP55 USB Controller' class = serial bus subclass = USB After connecting the adapter with the card inside: kernel: ugen0.2: Generic at usbus0 -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpsXGKnZ7vPK.pgp Description: PGP signature
Re: umass(4) regression in 9.0-RC1.
On Thu, Oct 27, 2011 at 08:30:44PM +0200, Hans Petter Selasky wrote: What does usbconfig dump_device_desc, say about this device? ugen0.1: EHCI root HUB nVidia at usbus0, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE bLength = 0x0012 bDescriptorType = 0x0001 bcdUSB = 0x0200 bDeviceClass = 0x0009 bDeviceSubClass = 0x bDeviceProtocol = 0x0001 bMaxPacketSize0 = 0x0040 idVendor = 0x idProduct = 0x bcdDevice = 0x0100 iManufacturer = 0x0001 nVidia iProduct = 0x0002 EHCI root HUB iSerialNumber = 0x no string bNumConfigurations = 0x0001 -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpP7jchJhwkx.pgp Description: PGP signature
Re: umass(4) regression in 9.0-RC1.
On Thu, Oct 27, 2011 at 08:42:09PM +0200, Hans Petter Selasky wrote: This is the root HUB. Can you also show the actual device? Sorry, it wasn't connected, here it goes: ugen0.2: USB2.0-CRW Generic at usbus0, cfg=255 md=HOST spd=HIGH (480Mbps) pwr=ON bLength = 0x0012 bDescriptorType = 0x0001 bcdUSB = 0x0200 bDeviceClass = 0x bDeviceSubClass = 0x bDeviceProtocol = 0x bMaxPacketSize0 = 0x0008 idVendor = 0x0bda idProduct = 0x0119 bcdDevice = 0x1981 iManufacturer = 0x0001 retrieving string failed iProduct = 0x0002 retrieving string failed iSerialNumber = 0x0003 retrieving string failed bNumConfigurations = 0x0001 -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpR8r24wP9lu.pgp Description: PGP signature
Re: [PATCH] Default scrub interval to whole weeks
On Thu, Oct 27, 2011 at 12:02:22PM -0700, Xin LI wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Hi, I think it might be better if we set scrub interval to whole weeks (proposed changeset changes it to 5 weeks). The reason for this is to make it easier for system administrators to estimate when the scrub happens, for instance, if a scrub was done on Saturday, it always happen on Saturdays. [...] Sounds reasonable. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpRonlwMWRTb.pgp Description: PGP signature
Re: 9.0-RC1 panic in tcp_input: negative winow.
On Fri, Oct 28, 2011 at 11:29:34AM +1100, Lawrence Stewart wrote: On 10/26/11 22:53, John Baldwin wrote: The assertion would be triggered when the next packet arrives (as I said above). Try modifying your debugging output to also log if the ACK is delayed. I suspect it is not delayed until the last one. (Pushing out an ACK will reset rcv_adv to be beyond rcv_nxt in tcp_output(), so in the case of an immediate ACK, rcv_nxt rcv_adv is only a transient condition all under a single lock invocation so never visible to other consumers of the protocol control block.) If that is what you see, then that confirms what I guessed above and I will likely just remove the assertion in tcp_input() and patch the timewait code to handle this case. Pawel, have you been able to confirm John's hypothesis? [...] Yeah, sorry. I moved the debug to the points where we drop the t_inpcb lock and I still see rcv_nxt being greater than rcv_adv: tcp_do_segment:2970 negative window: tp 0xfe00685ee3d0 rcv_nxt 1312878324 rcv_adv 1312878187 This is just before the INP_WUNLOCK(tp-t_inpcb) under 'check_delack' label. I see this a lot (it was logged 545 times for 11 different tp pointers during 24h period). tcp_do_segment:3009 negative window: tp 0xfe005cfc6000 rcv_nxt 1442546453 rcv_adv 1442545722 This is just before calling tcp_output(). This one was logged 65 times for 3 different tp pointers. I placed a debug also after tcp_output() call, but it is not logged, so once we return from tcp_output() everything is fine. The panic would be triggered 115 times for 5 different tp pointers during that time. I write 'tp pointers' as I'm not 100% sure if the same pointer always represents the same connection or if it is reused. [...] What I don't quite get is why we haven't had a lot more reports of this issue... Maybe because my TCP/IP stack is heavly modified? ...not:) No idea to be honest. Ask Ken to turn on INVARIANTS in 9.0-RC2 and we will see:) -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpN4xRhoFGYE.pgp Description: PGP signature
Re: 9.0-RC1 panic in tcp_input: negative winow.
On Mon, Oct 24, 2011 at 08:14:22AM -0400, John Baldwin wrote: On Sunday, October 23, 2011 11:58:28 am Pawel Jakub Dawidek wrote: On Sun, Oct 23, 2011 at 11:44:45AM +0300, Kostik Belousov wrote: On Sun, Oct 23, 2011 at 08:10:38AM +0200, Pawel Jakub Dawidek wrote: My suggestion would be that if we won't be able to fix it before 9.0, we should turn this assertion off, as the system seems to be able to recover. Shipped kernels have all assertions turned off. Yes, I'm aware of that, but many people compile their production kernels with INVARIANTS/INVARIANT_SUPPORT to fail early instead of eg. corrupting data. I'd be fine in moving this under DIAGNOSTIC or changing it into a printf, so it will be visible. No, the kernel is corrupting things in other places when this is true, so if you are running with INVARIANTS, we want to know about it. Specifically, in several places in TCP we assume that rcv_adv = rcv_nxt, and depend on being able to do 'rcv_adv - rcv_nxt'. In this case, it looks like the difference is consistently less than one frame. I suspect the other end of the connection is sending just beyond the end of the advertised window (it probably assumes it is better to send a full frame if it has that much pending data even though part of it is beyond the window edge vs sending a truncated packet that just fills the window) and that that frame is accepted ok in the header prediction case and it's ACK is delayed, but the next packet to arrive then trips over this assumption. Since 'win' is guaranteed to be non-negative and we explicitly cast 'rcv_adv - rcv_nxt' to (int) in the following line that the assert is checking for: tp-rcv_wnd = imax(win, (int)(tp-rcv_adv - tp-rcv_nxt)); I think we already handle this case ok and perhaps the assertion can just be removed? Not sure if others feel that it warrants a comment to note that this is the case being handled. I added debug to the places where rcv_adv and rcv_nxt are modified. Here is what happens before the panic occurs: tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022361548 rcv_adv 4022360100 diff -1448 tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022362298 rcv_adv 4022361548 diff -750 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022363746 rcv_adv 4022362298 diff -1448 tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022364836 rcv_adv 4022363746 diff -1090 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022366284 rcv_adv 4022364836 diff -1448 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022370628 rcv_adv 4022369690 diff -938 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022379140 rcv_adv 4022377692 diff -1448 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022387792 rcv_adv 4022386344 diff -1448 tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022388890 rcv_adv 4022387792 diff -1098 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022390338 rcv_adv 4022388890 diff -1448 tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022394563 rcv_adv 4022394342 diff -221 panic: tcp_input negative window: tp 0xfe000dab1b70 rcv_nxt 4022394563 rcv_adv 4022394342 win=0 diff -221 I can send you the full log if you want, I've plenty of messages where rcv_adv rcv_nxt, not all of them trigger this assertion. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpjTSToeUp9v.pgp Description: PGP signature
Re: Problems with ada0: Previously was known as ad6 functionality
On Tue, Oct 25, 2011 at 05:01:31PM +0200, Jason Edwards wrote: On Tue, Oct 25, 2011 at 4:35 PM, Nikolay Denev nde...@gmail.com wrote: I was struck with this problem too yesterday. I was able to import the pool by specifying the directory where zpool should look for devices. Like : zpool import -d /dev/gptid Alternatively you can try putting kern.cam.ada.legacy_aliases=0 in your /boot/loader.conf Hello Nikolay, Thanks for your reply! The kern.cam.ada.legacy_aliases=0 indeed makes the /dev/ad6 device go away. However, unlike I suspected, it did not solve the zpool import problem. It still lists my pool as corrupted. If I use the zpool import -d /dev/gpt as you suggested, the pool displays as ONLINE and I can import it just fine. Using -d /dev works too, so I guess this works fine as a workaround. Hmm, that's really strange. 'zpool import -d /dev pool' works, but 'zpool import pool' doesn't? Does, eg. 'geom disk status' work or returns an error? -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpdz8xMIULHW.pgp Description: PGP signature
Re: 9.0-RC1 panic in tcp_input: negative winow.
On Sun, Oct 23, 2011 at 12:35:15PM +1100, Lawrence Stewart wrote: On 10/22/11 19:49, Pawel Jakub Dawidek wrote: The panic message says: panic: tcp_input negative window: tp 0xfe007763e000 rcv_nxt 3718269252 rcv_adv 3718268291 I only have picture of the backtrace: http://people.freebsd.org/~pjd/misc/panic_negative_window.jpg ewww that is not good. Can you give us any more information about the machine and what it's doing? Is it terminating TCP connections from the internet at large or only local LAN (i.e. is there likely to be packet loss happening)? Are you doing TSO or LRO? Do you have any non-default tuning in place? It is my local file server. It is doing NFS and AFP over LAN and also downloads files from the internet. It is triggered after few hours. I changed the KASSERT() into printf() and added printing 'win' variable and this is what got logged during the night: 05:16:24 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1107827269 rcv_adv 1107826256 win=242 05:16:29 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1107833451 rcv_adv 1107832977 win=880 05:16:41 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1107849563 rcv_adv 1107848860 win=639 05:20:02 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108108230 rcv_adv 1108107331 win=567 05:24:30 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108433302 rcv_adv 1108432272 win=974 05:24:46 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108450385 rcv_adv 1108450060 win=751 05:26:44 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108574818 rcv_adv 1108573851 win=71 05:28:03 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108654103 rcv_adv 1108653166 win=0 05:28:43 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108692396 rcv_adv 1108691451 win=0 05:30:06 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108781258 rcv_adv 1108780372 win=235 05:35:05 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1109067578 rcv_adv 1109067335 win=663 05:37:03 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1109180403 rcv_adv 1109179411 win=0 05:41:03 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1109428265 rcv_adv 1109427375 win=170 And the systems seems to be fine. I'm happy to test patches, but one round would take 24h. My suggestion would be that if we won't be able to fix it before 9.0, we should turn this assertion off, as the system seems to be able to recover. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpcUNlslP3cT.pgp Description: PGP signature
Re: 9.0-RC1 panic in tcp_input: negative winow.
On Sun, Oct 23, 2011 at 11:44:45AM +0300, Kostik Belousov wrote: On Sun, Oct 23, 2011 at 08:10:38AM +0200, Pawel Jakub Dawidek wrote: My suggestion would be that if we won't be able to fix it before 9.0, we should turn this assertion off, as the system seems to be able to recover. Shipped kernels have all assertions turned off. Yes, I'm aware of that, but many people compile their production kernels with INVARIANTS/INVARIANT_SUPPORT to fail early instead of eg. corrupting data. I'd be fine in moving this under DIAGNOSTIC or changing it into a printf, so it will be visible. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpe47qzpCChn.pgp Description: PGP signature
9.0-RC1 panic in tcp_input: negative winow.
The panic message says: panic: tcp_input negative window: tp 0xfe007763e000 rcv_nxt 3718269252 rcv_adv 3718268291 I only have picture of the backtrace: http://people.freebsd.org/~pjd/misc/panic_negative_window.jpg -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpo3XZOvEVzL.pgp Description: PGP signature
Re: incorrect use of pidfile(3)
On Thu, Oct 13, 2011 at 04:11:40PM +0200, Dag-Erling Smørgrav wrote: Pawel Jakub Dawidek p...@freebsd.org writes: I'm still in opinion that EWOULDBLOCK and EAGAIN (which is the same value on FreeBSD) should be converted to EEXIST on pidfile_open() return. The historical (and documented) behavior is to return EAGAIN. We don't want to duplicate the code of handling EAGAIN into every single pidfile(3) consumer. This is why we hav pidfile(3) API in the first place - to make it easy for people to use. Also if we now have for loop, why not to put count in there? Because if we do, there will be a nanosleep after the last pidfile_read() attempt. We need to break the loop after pidfile_read() failed but before nanosleep(). Right, ok. I'm not very happy about touching pidptr in case of error other than EEXIST. This is not documented, but a bit unexpected anyway. Well, it was your idea, I just moved it to before the loop :) In my patch *pidptr was set to -1 only in the case of EAGAIN from pidfile_read(), not for every other error. BTW. With your patch we will continue even when flopen(3) failed for other reasons, instead of returning NULL. Checking for fd being -1 should not be done in the same statement with other checks. After proposed changes it would look like this, what do you think? http://people.freebsd.org/~pjd/patches/pidfile.3.patch -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpNGouMOHFTt.pgp Description: PGP signature
Re: incorrect use of pidfile(3)
On Thu, Oct 13, 2011 at 12:54:38PM +0200, Dag-Erling Smørgrav wrote: I looked at some of the programs that use pidfile(3) in base, and they pretty much all get it wrong. Consider these two scenarios: 1) common case process A process B main() pidfile_open() - success perform_initialization() daemon() pidfile_write() - success perform_work() main() pidfile_open() - EEXIST exit() 2) very unlikely but still possible case process A process B main() pidfile_open() - success main() perform_initialization()pidfile_open() - EAGAIN daemon()perform_initialization() pidfile_write() - successdaemon() perform_work() perform_work() The problem is that most of them (at least the ones I checked) ignore a pidfile_open() failure unless errno == EEXIST. How do we fix this? My suggestion is to loop until pidfile_open() succeeds or errno != EAGAIN. Does anyone have any objections to that approach? I think we already do that internally in pidfile_open(). Can you take a look at the source and confirm that this is what you mean? -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgp8t7uJHg24p.pgp Description: PGP signature
Re: incorrect use of pidfile(3)
On Thu, Oct 13, 2011 at 02:54:16PM +0200, Dag-Erling Smørgrav wrote: After discussing this with pjd@ on IRC, I arrived at the attached patch, which increases the length of time pidfile_open() itself waits (I hadn't noticed that it already looped) and sets *pidptr to -1 if it fails to read a pid. I'm still in opinion that EWOULDBLOCK and EAGAIN (which is the same value on FreeBSD) should be converted to EEXIST on pidfile_open() return. Also if we now have for loop, why not to put count in there? I'm not very happy about touching pidptr in case of error other than EEXIST. This is not documented, but a bit unexpected anyway. I'm happy with increasing the timeout. Index: lib/libutil/pidfile.c === --- lib/libutil/pidfile.c (revision 226271) +++ lib/libutil/pidfile.c (working copy) @@ -118,22 +118,19 @@ */ fd = flopen(pfh-pf_path, O_WRONLY | O_CREAT | O_TRUNC | O_NONBLOCK, mode); - if (fd == -1) { - count = 0; + if (fd == -1 errno == EWOULDBLOCK pidptr != NULL) { + *pidptr = -1; + count = 20; rqtp.tv_sec = 0; rqtp.tv_nsec = 500; - if (errno == EWOULDBLOCK pidptr != NULL) { - again: + for (;;) { errno = pidfile_read(pfh-pf_path, pidptr); - if (errno == 0) - errno = EEXIST; - else if (errno == EAGAIN) { - if (++count = 3) { - nanosleep(rqtp, 0); - goto again; - } - } + if (errno != EAGAIN || --count == 0) + break; + nanosleep(rqtp, 0); } + if (errno == 0) + errno = EEXIST; free(pfh); return (NULL); } -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpIlSX6pcKvL.pgp Description: PGP signature