Re: veriexec(4) maintenance
Jan, I apologise for taking so long to get back to you. I do appreciate you taking the time to help with this. > > first it is difficult to read a diff that also introduces a "notyet" operation > on file pages. > OK, I will remove those sections to make things clearer. I still have a dream of getting per-page signatures in to the kernel some day... > Looks like you replace global and item rw-locks with refcounts/signals. > I tried to understand the rationale behind and failed. Your implementation > seems at least vulnerable to livelocks as the writer side is missing priority. > My aim was to improve multi-thread performance and remove some of the more horrible hacks I made to make things work - like looping on the acquisition of a rw lock. I also was lead to believe that rw locks are very heavy weight so I was trying to replace them with something less onerous. > In general, what problem(s) are you trying to solve here? > The general process for veriexec is when the kernel comes up the tables are blank. Early in the boot process the signatures are loaded into the kernel. After the signatures are loaded and a file is executed the signature for the file is evaluated and the result of the comparison between the loaded and calculated signatures is cached which improves the performance. The things that I think need to be protected are: 1) If the tables are being loaded then any evaluations should wait until the table is loaded. 2) If the signature for a file is being evaluated then any other accesses to that entry should wait until the evaulation is done. 3) At some securelevel settings, the signatures can be updated on the fly. The signature entry must not be updated while it is being accessed. 4) If the file system associated with a particular table of signatures is being unmounted then that table must be cleared. > What is wrong with a quite simple hashtable to hold fileid<->fingerprint > mapping accessed as: > > mutex_enter table lock > lookup fileid -> item > mutex_enter item > mutex_exit table lock > > while item state is busy > cv_wait item cv > > if item state is unknown > set item state busy > mutex_exit item > compute fingerprint > mutex_enter item > set item state valid/invalid > cv_broadcast item cv > > Here the item state is either valid or invalid > > mutex_exit item > well, that was pretty much what my intention was but clearly I failed in the implementation. > > lock_state is sometimes VERIEXEC_UNLOCKED, sometimes IO_NODELOCKED. > > 1629veriexec_dump(struct lwp *l, prop_array_t rarray) > 1630{ > 1631struct mount *mp; > 1632mount_iterator_t *mpiter; > 1633 > 1634mountlist_iterator_init(&mpiter); > 1635 > 1636while ((mp = mountlist_iterator_trynext(mpiter)) != > NULL) { > 1637/* If it fails, the file-system is [being] > unmounted. */ > 1638if (vfs_busy(mp) != 0) > 1639continue; > > This is completely wrong. Operation mountlist_iterator_trynext() already > returns the mount busy and skips mounts currently unmounting or remounting. > Operation mountlist_iterator_next() was right or does this function get > called from inside unmount? > No, not this one, when I first did this code there was no mountlist_iterator framework. This dump call should ignore file systems being unmounted. > 1653veriexec_flush(struct lwp *l) > > This is completely wrong, see above. > This one should be called on unmount. I will fix up and simplify the locking and make sure I use the mountlist_iterator stuff properly Thanks again for the feedback. -- Brett Lymn -- Sent from my NetBSD device. "We are were wolves", "You mean werewolves?", "No we were wolves, now we are something else entirely", "Oh"
Re: veriexec(4) maintenance
On Sat, Sep 02, 2023 at 08:57:03AM +0930, Brett Lymn wrote: > On Thu, Aug 31, 2023 at 10:24:07AM +0200, J. Hannken-Illjes wrote: > > > > I'm short on time, some remarks: > > > > Thanks, I will have a closer look at these issues. This could be where the > deadlock I was > seeing is coming from. > OK, I have, finally, updated the diff in line with the comments (thanks again for the good feedback). -- Brett Lymn -- Sent from my NetBSD device. "We are were wolves", "You mean werewolves?", "No we were wolves, now we are something else entirely", "Oh" Index: kern_veriexec.c === RCS file: /cvsroot/src/sys/kern/kern_veriexec.c,v retrieving revision 1.27 diff -u -r1.27 kern_veriexec.c --- kern_veriexec.c 9 Apr 2023 09:18:09 - 1.27 +++ kern_veriexec.c 18 Dec 2023 21:12:46 - @@ -1,4 +1,4 @@ -/* $NetBSD: kern_veriexec.c,v 1.27 2023/04/09 09:18:09 riastradh Exp $ */ +/* $NetBSD$*/ /*- * Copyright (c) 2005, 2006 Elad Efrat @@ -29,7 +29,7 @@ */ #include -__KERNEL_RCSID(0, "$NetBSD: kern_veriexec.c,v 1.27 2023/04/09 09:18:09 riastradh Exp $"); +__KERNEL_RCSID(0, "$NetBSD$"); #include "opt_veriexec.h" @@ -38,6 +38,7 @@ #include #include #include +#include #include #include #include @@ -49,6 +50,7 @@ #include #include #include +#include #include #include #include @@ -68,11 +70,6 @@ #define VERIEXEC_UNLOCKED 0x00/* Nothing locked, callee does it */ #define VERIEXEC_LOCKED0x01/* Global op lock held */ -/* state of file locking for veriexec_file_verify */ -#define VERIEXEC_FILE_UNLOCKED 0x02/* Nothing locked, callee does it */ -#define VERIEXEC_FILE_LOCKED 0x04/* File locked */ - -#define VERIEXEC_RW_UPGRADE(lock) while((rw_tryupgrade(lock)) == 0){}; struct veriexec_fpops { const char *type; @@ -86,11 +83,17 @@ /* Veriexec per-file entry data. */ struct veriexec_file_entry { - krwlock_t lock; /* r/w lock */ + kmutex_t vfe_lock; /* operations lock */ + int vfe_refcnt; /* number of refs */ + kcondvar_t vfe_cv; /* wait for no refs */ u_char *filename; /* File name. */ u_char type;/* Entry type. */ u_char status; /* Evaluation status. */ + u_char page_fp_status; /* Per-page FP status. */ u_char *fp; /* Fingerprint. */ + void *page_fp; /* Per-page fingerprints */ + size_t npages; /* Number of pages. */ + size_t last_page_size; /* To support < PAGE_SIZE */ struct veriexec_fpops *ops; /* Fingerprint ops vector*/ size_t filename_len;/* Length of filename. */ }; @@ -120,15 +123,69 @@ void *, void *, void *, void *); static struct veriexec_fpops *veriexec_fpops_lookup(const char *); static void veriexec_file_free(struct veriexec_file_entry *); +static void veriexec_busy(void); +static void veriexec_unbusy(void); +static void vfe_busy(struct veriexec_file_entry *vfe); +static void vfe_unbusy(struct veriexec_file_entry *vfe); static unsigned int veriexec_tablecount = 0; /* - * Veriexec operations global lock - most ops hold this as a read - * lock, it is upgraded to a write lock when destroying veriexec file - * table entries. + * Veriexec operations global lock. + */ +static kmutex_t veriexec_op_lock; +static kcondvar_t veriexec_cv; +static int veriexec_refcnt; + +/* + * Busy a file table entry. + */ +static void +vfe_busy(struct veriexec_file_entry *vfe) +{ + mutex_enter(&vfe->vfe_lock); + vfe->vfe_refcnt++; + mutex_exit(&vfe->vfe_lock); +} + +/* + * Drop a reference to a file table entry, if count zero then wake + * up any potential waiters. + */ +static void +vfe_unbusy(struct veriexec_file_entry *vfe) +{ + mutex_enter(&vfe->vfe_lock); + KASSERT(vfe->vfe_refcnt > 0); + if (--(vfe->vfe_refcnt) == 0) + cv_broadcast(&vfe->vfe_cv); + mutex_exit(&vfe->vfe_lock); +} + +/* + * Grab the oplock and bump the reference count. */ -static krwlock_t veriexec_op_lock; +static void +veriexec_busy(void) +{ + mutex_enter(&veriexec_op_lock); + veriexec_refcnt++; + mutex_exit(&veriexec_op_lock); +} + +/* + * Decrement the reference count, if we hit count zero then wake up + * any possible waiters. + */ +static void +veriexec_unbusy(void) +{ + mutex_enter(&veriexec_op_lock); + KASSERT(veriexec_refcnt > 0); + if (--veriexec_refcnt == 0) + cv_
Re: veriexec(4) maintenance
On Thu, Aug 31, 2023 at 10:24:07AM +0200, J. Hannken-Illjes wrote: > > I'm short on time, some remarks: > Thanks, I will have a closer look at these issues. This could be where the deadlock I was seeing is coming from. -- Brett Lymn -- Sent from my NetBSD device. "We are were wolves", "You mean werewolves?", "No we were wolves, now we are something else entirely", "Oh"
Re: veriexec(4) maintenance
On Tue, Aug 15, 2023 at 07:52:46AM +0930, Brett Lymn wrote: > > I tried sending this with the diff but I think it was a bit big, > here is the text, the diff is at ftp.netbsd.org under > /pub/NetBSD/misc/blymn/veriexec_update > OK, it has been a couple of weeks since I posted this and I have had zero responses. I have tested the changes and they seem ok on my daily usage pattern. Does anyone have _any_ comments or should I just commit the changes? -- Brett Lymn -- Sent from my NetBSD device. "We are were wolves", "You mean werewolves?", "No we were wolves, now we are something else entirely", "Oh"
Re: veriexec(4) maintenance
I tried sending this with the diff but I think it was a bit big, here is the text, the diff is at ftp.netbsd.org under /pub/NetBSD/misc/blymn/veriexec_update On Mon, Aug 14, 2023 at 07:51:50AM +0930, Brett Lymn wrote: > On Fri, Aug 04, 2023 at 11:24:22PM +, David Holland wrote: > > > > I was looking at this last weekend and for veriexec I have one fix and > > some comments/notes, which I should probably commit. > > > > OK, that sounds good. > > > fileassoc I would recommend flushing and replacing with a proper > > specificdata implementation for vnodes, which there should be no > > particular barrier to. fileassoc is trying to be that but it's both > > bust and doing it wrong. :-| > > > > I think fileassoc was added with the view for using it for other things > but in the end only veriexec is using it. We can revert it back to > it being veriexec specific. > > I have attached a diff that is a version of veriexec that uses condvars > and ref counters to handle the access control. It compiles but > hasn't been run tested on a current kernel (it worked when i used it > before but it has been a long time). > -- Brett Lymn -- Sent from my NetBSD device. "We are were wolves", "You mean werewolves?", "No we were wolves, now we are something else entirely", "Oh"
Re: veriexec(4) maintenance
On Fri, Aug 04, 2023 at 11:24:22PM +, David Holland wrote: > > I was looking at this last weekend and for veriexec I have one fix and > some comments/notes, which I should probably commit. > OK, that sounds good. > fileassoc I would recommend flushing and replacing with a proper > specificdata implementation for vnodes, which there should be no > particular barrier to. fileassoc is trying to be that but it's both > bust and doing it wrong. :-| > I think fileassoc was added with the view for using it for other things but in the end only veriexec is using it. We can revert it back to it being veriexec specific. I have attached a diff that is a version of veriexec that uses condvars and ref counters to handle the access control. It compiles but hasn't been run tested on a current kernel (it worked when i used it before but it has been a long time). -- Brett Lymn -- Sent from my NetBSD device. "We are were wolves", "You mean werewolves?", "No we were wolves, now we are something else entirely", "Oh" Index: kern_veriexec.c === RCS file: /cvsroot/src/sys/kern/kern_veriexec.c,v retrieving revision 1.27 diff -u -r1.27 kern_veriexec.c --- kern_veriexec.c 9 Apr 2023 09:18:09 - 1.27 +++ kern_veriexec.c 11 Aug 2023 07:17:50 - @@ -1,4 +1,4 @@ -/* $NetBSD: kern_veriexec.c,v 1.27 2023/04/09 09:18:09 riastradh Exp $ */ +/* $NetBSD$*/ /*- * Copyright (c) 2005, 2006 Elad Efrat @@ -29,7 +29,7 @@ */ #include -__KERNEL_RCSID(0, "$NetBSD: kern_veriexec.c,v 1.27 2023/04/09 09:18:09 riastradh Exp $"); +__KERNEL_RCSID(0, "$NetBSD$"); #include "opt_veriexec.h" @@ -38,6 +38,7 @@ #include #include #include +#include #include #include #include @@ -49,6 +50,7 @@ #include #include #include +#include #include #include #include @@ -68,11 +70,6 @@ #define VERIEXEC_UNLOCKED 0x00/* Nothing locked, callee does it */ #define VERIEXEC_LOCKED0x01/* Global op lock held */ -/* state of file locking for veriexec_file_verify */ -#define VERIEXEC_FILE_UNLOCKED 0x02/* Nothing locked, callee does it */ -#define VERIEXEC_FILE_LOCKED 0x04/* File locked */ - -#define VERIEXEC_RW_UPGRADE(lock) while((rw_tryupgrade(lock)) == 0){}; struct veriexec_fpops { const char *type; @@ -86,11 +83,17 @@ /* Veriexec per-file entry data. */ struct veriexec_file_entry { - krwlock_t lock; /* r/w lock */ + kmutex_t vfe_lock; /* operations lock */ + int vfe_refcnt; /* number of refs */ + kcondvar_t vfe_cv; /* wait for no refs */ u_char *filename; /* File name. */ u_char type;/* Entry type. */ u_char status; /* Evaluation status. */ + u_char page_fp_status; /* Per-page FP status. */ u_char *fp; /* Fingerprint. */ + void *page_fp; /* Per-page fingerprints */ + size_t npages; /* Number of pages. */ + size_t last_page_size; /* To support < PAGE_SIZE */ struct veriexec_fpops *ops; /* Fingerprint ops vector*/ size_t filename_len;/* Length of filename. */ }; @@ -120,15 +123,69 @@ void *, void *, void *, void *); static struct veriexec_fpops *veriexec_fpops_lookup(const char *); static void veriexec_file_free(struct veriexec_file_entry *); +static void veriexec_busy(void); +static void veriexec_unbusy(void); +static void vfe_busy(struct veriexec_file_entry *vfe); +static void vfe_unbusy(struct veriexec_file_entry *vfe); static unsigned int veriexec_tablecount = 0; /* - * Veriexec operations global lock - most ops hold this as a read - * lock, it is upgraded to a write lock when destroying veriexec file - * table entries. + * Veriexec operations global lock. + */ +static kmutex_t veriexec_op_lock; +static kcondvar_t veriexec_cv; +static int veriexec_refcnt; + +/* + * Busy a file table entry. + */ +static void +vfe_busy(struct veriexec_file_entry *vfe) +{ + mutex_enter(&vfe->vfe_lock); + vfe->vfe_refcnt++; + mutex_exit(&vfe->vfe_lock); +} + +/* + * Drop a reference to a file table entry, if count zero then wake + * up any potential waiters. */ -static krwlock_t veriexec_op_lock; +static void +vfe_unbusy(struct veriexec_file_entry *vfe) +{ + mutex_enter(&vfe->vfe_lock); + KASSERT(vfe->vfe_refcnt > 0); + if (--(vfe->vfe_refcnt) == 0) + cv_broadcast(&vfe->vfe_cv); + mutex_exit(&vfe->vfe_lock); +} + +/* + * Grab the oplock
Re: veriexec(4) maintenance
On Wed, Aug 02, 2023 at 07:47:48AM +, Taylor R Campbell wrote: > veriexec(4), and the fileassoc(9) API it uses internally, needs > maintenance and probably some serious rework to fix synchronization > problems on any MP and/or preemptible kernels (i.e., all x86 of the > past couple decades) with implications for any security properties it > is supposed to provide. > > If anyone cares about veriexec(4) and would like to volunteer to take > this on, we can discuss what needs to be done and how to proceed. > I will take this on since I was originally responsible for the mess. It does play fast and loose with synchronisation and should be fixed. I have some fixes in my private tree that went some way to addressing the issues but they have been languishing for many many years because I had an issue of a deadlock that would occaisionally occur. At the time I couldn't work out where the deadlock was coming from but maybe with some expert help and the improved tools I can nut it out. -- Brett Lymn -- Sent from my NetBSD device. "We are were wolves", "You mean werewolves?", "No we were wolves, now we are something else entirely", "Oh"
Re: protect pmf from network drivers that don't provide if_stop
On Thu, Jul 01, 2021 at 06:47:08AM -0300, Jared McNeill wrote: > Not really a fan of this as it doesn't protect other potential if_stop users > (and "temporary fix" rarely is..). How about something like this instead? > Yes, that is a much better idea, thanks. I have backed out my change and committed the change to if.c you suggested. Martin, I guess this make the PR I raised redundant. -- Brett Lymn -- Sent from my NetBSD device. "We are were wolves", "You mean werewolves?", "No we were wolves, now we are something else entirely", "Oh"
Re: protect pmf from network drivers that don't provide if_stop
On Tue, Jun 29, 2021 at 02:36:35PM +0200, Martin Husemann wrote: > > It is already merge hell and all drivers do it completley different on > the branch - so I'd prefer not to touch all drivers in head. > OK, I won't bash on the drivers then. > Not calling it if NULL is fine (we could do that now), but we should > make sure all drivers properly set it - only now is not a good time to > KASSERT for it, we should add that after the merge. > So, does this mean its ok to commit what I proposed to prevent the panic with a view to the drivers being fixed in the future? -- Brett Lymn -- Sent from my NetBSD device. "We are were wolves", "You mean werewolves?", "No we were wolves, now we are something else entirely", "Oh"
Re: protect pmf from network drivers that don't provide if_stop
On Tue, Jun 29, 2021 at 11:16:37AM +0200, Martin Husemann wrote: > On Tue, Jun 29, 2021 at 03:46:20PM +0930, Brett Lymn wrote: > > I turned up a fix I had put into my source tree a while back, I think at > > the time the wireless driver (urtwn IIRC) did not set an entry for > > if_stop. > > This is a driver bug, we should not work around it but catch it early > and fix it. > ok but I believe that it is a common issue with the usb wifi drivers which is a bit odd since they are more likely to be attached to platforms that sleep. > Side note: the wifi drivers fiddling with struct ifnet will stop > once the new wifi code gets into the main tree. > Is that going to be soon? I was thinking of doing a sweep of the usb network drivers to fix this. It is not affecting me at the moment because the driver for wireless I am using at the moment actually defines if_stop. -- Brett Lymn -- Sent from my NetBSD device. "We are were wolves", "You mean werewolves?", "No we were wolves, now we are something else entirely", "Oh"
protect pmf from network drivers that don't provide if_stop
Folks, I turned up a fix I had put into my source tree a while back, I think at the time the wireless driver (urtwn IIRC) did not set an entry for if_stop. This resulted in a kernel panic if you tried to sleep the machine pmf_class_network_suspend would try to call a null if_stop. I crafted the following to preserve my sanity, ok to commit? Index: kern_pmf.c === RCS file: /cvsroot/src/sys/kern/kern_pmf.c,v retrieving revision 1.45 diff -u -r1.45 kern_pmf.c --- kern_pmf.c 11 Jun 2020 02:30:21 - 1.45 +++ kern_pmf.c 29 Jun 2021 01:27:01 - @@ -892,11 +892,18 @@ struct ifnet *ifp = device_pmf_class_private(dev); int s; - s = splnet(); - IFNET_LOCK(ifp); - (*ifp->if_stop)(ifp, 0); - IFNET_UNLOCK(ifp); - splx(s); + if (ifp == NULL) + return true; + + if ((*ifp->if_stop) == NULL) + printf("device %s has no if_stop\n", ifp->if_xname); + else { + s = splnet(); + IFNET_LOCK(ifp); + (*ifp->if_stop)(ifp, 0); + IFNET_UNLOCK(ifp); + splx(s); + } return true; } -- Brett Lymn -- Sent from my NetBSD device. "We are were wolves", "You mean werewolves?", "No we were wolves, now we are something else entirely", "Oh"
Re: fixing coda in -current
On Sun, Nov 25, 2018 at 08:33:53PM +, David Holland wrote: > > So I have no immediate comment on the patch but I'd like to understand > better what it's doing -- the last time I crawled around in it > (probably 7-8 years ago) it appeared to among other things have an > incestuous relationship with ufs_readdir such that if you tried to use > anything under than ffs as the local container it would detonate > violently. But I never did figure out exactly what the deal was other > than it was confusing and seemed to violate a lot of abstractions. > The tricky thing about coda is that a lot of the processing that is usually done within the VFS kernel code is actually processed in userland the the coda daemon venus. What happens is a userland processes the coda fs, the coda kernel code "upcalls" that requests to the venus daemon which processess it and may ask the kernel module for information before returning results via the kernel module to the userlan process. So the FS sort of hangs half in and half out of the kernel - if both do not agree on structures and other details then nasal demons ensue. I haven't looked at how the VFS ops structure is filled in but I can easily see if the kernel module simply takes the underlying file system ops then things will lose because venus expects ufs style dirents. This is really the only thing at the moment that I can think of that would cause explosions IIRC the coda fs data is just kept in a single file blob sitting on the host fs (venus does the work to track writes that are pending server reconnection). > Can you clarify? It would be nice to have it working properly and > stuff like the above is only going to continue to fail in the future... > Now that you have made me think about this more I am thinking that the fix that has gone in is more or less a bandaid - I should track this back a bit further as it is more than likely that a more correct fix exists inside venus instead of the kernel code. What we have now won't hurt and at least stops the super annoying KASSERT when I touch /coda. I need to have a look at the coda pkgsrc anyway, my patches to support mount_coda are conflicting which I need to sort out. -- Brett Lymn Let go, or be dragged - Zen proverb.
Re: fixing coda in -current
On Wed, Nov 21, 2018 at 09:01:40AM +0100, Martin Husemann wrote: > > No, you can easily check for the presence of the userland binaries in ATF > and skip the test if they are missing. > OK, no problems > I can setup a server for my test environment and install the userland tools > on the test machines, so we could run the full fs tests suite against coda. > If you need any assistance with configuring coda then please let me know. It has been a long time but hopefully I can recall something useful... -- Brett Lymn Let go, or be dragged - Zen proverb.
Re: fixing coda in -current
On Tue, Nov 20, 2018 at 07:39:26PM +, m...@netbsd.org wrote: > On Tue, Nov 20, 2018 at 08:06:37PM +0100, Hauke Fath wrote: > > ISTR that somebody on the CODA mailing-list suggested a re-implementation > > as userland file-system, but I don't think much has happened on that front. > > note that you can cd /usr/tests/fs; atf-run as user, because > they're mostly not running on the kernel :-) You need the userland installed from pkgsrc and a coda server configured before it will work so setting up atf is challenging. I do have some private code that provides a mount_coda so you can have a fstab entry but it still relies on pkgsrc being there to work. -- Brett Lymn Let go, or be dragged - Zen proverb.
Re: fixing coda in -current
On Tue, Nov 20, 2018 at 10:18:56AM -0500, Greg Troxel wrote: > I used to use it, and may again. So I'd like to see it stay, partly > because I think it's good to keep NetBS relevant in the fileystem > research world. I am expecting to see new upstream activity. > I guess it is obvious that I use it too. There is still some activity going on at CMU around coda so it has not been abandoned by the upstream. It is difficult to automate the testing because you need the userland support there too - the kernel driver does a bunch of upcalls to a userland daemon to perform most of the work. I use coda to keep my work in progress stuff on so I can work on my laptop or my desktop, have the changes shared but then be able to take my laptop off network and keep working, when I get back home the changes can be synced without me having to do anything. Sure, there are other ways of doing this but trying to maintain a shared tree potentially changing either side is difficult when you don't have constant network. > But, I think it makes sense to remove it from GENERIC, and perhaps have > whatever don't-autoload treatment, so that people only get it if they > explicitly ask for it. That way it should not bother others. > Yes, this would be fine. Mind you, you have to explicitly install the userland from pkgsrc so the driver is dormant until that happens. -- Brett Lymn Let go, or be dragged - Zen proverb.
fixing coda in -current
I am guessing that not many people use coda (or they just haven't complained) as it seems like the coda kernel support has suffered some bit-rot. Trying to access a coda file system on -current results in a couple of KASSERTs firing - the first is easy, we need to lock the vnode on readdir but after doing that there is a check in ufs_readwrite.c to ensure the vnode is either a VDIR or VLNK type, neither of which is true for coda because the whole coda file system is contained in a regular file on the host file system so the vnode type is VREG. The following patch makes coda work for me. I don't think that manipulating the vnode type is really great but a lesser evil than trying to create a duplicate of the ufs readdir code and also not as bad as removing the KASSERTs which are useful sanity checks in the majority of use cases. Comments? Anyone really care? Index: coda_vnops.c === RCS file: /cvsroot/src/sys/coda/coda_vnops.c,v retrieving revision 1.106 diff -u -r1.106 coda_vnops.c --- coda_vnops.c26 May 2017 14:21:00 - 1.106 +++ coda_vnops.c20 Nov 2018 09:42:31 - @@ -1537,6 +1537,7 @@ /* upcall decl */ /* locals */ int error = 0; +enum vtype saved_type; MARK_ENTRY(CODA_READDIR_STATS); @@ -1569,7 +1570,13 @@ /* Have UFS handle the call. */ CODADEBUG(CODA_READDIR, myprintf(("%s: fid = %s, refcnt = %d\n", __func__, coda_f2s(&cp->c_fid), vp->v_usecount)); ) +saved_type = vp->v_type; + vn_lock(vp, LK_EXCLUSIVE | LK_RETRY); +vp->v_type = VDIR; /* pretend the container file is a dir */ error = VOP_READDIR(vp, uiop, cred, eofflag, cookies, ncookies); +vp->v_type = saved_type; + VOP_UNLOCK(vp); + if (error) MARK_INT_FAIL(CODA_READDIR_STATS); else -- Brett Lymn Let go, or be dragged - Zen proverb.
Re: 8.0 performance issue when running build.sh?
I would be interested to finish that off, I need to make some time to get to doing it though. I have been sitting on some changes to veriexec for ~ years that change it from locking everything to using reference counts and condition variables which removes some nasty hacks I did. I have not committed the changes because the kernel would sometimes deadlock and I was having trouble tracking down why. Perhaps I was looking in the wrong spot for the error and it was fileassoc all along that was causing the deadlock. - Original Message - From: "Mindaugas Rasiukevicius" To:"Jason Thorpe" Cc: Sent:Fri, 10 Aug 2018 00:12:23 +0100 Subject:Re: 8.0 performance issue when running build.sh? Jason Thorpe wrote: > > > > On Aug 9, 2018, at 10:40 AM, Thor Lancelot Simon wrote: > > > > Actually, I wonder if we could kill off the time spent by fileassoc. Is > > it still used only by veriexec? We can easily option that out of the > > build box kernels. > > Indeed. And there are better ways to do what veriexec does, in any case. > Many years ago I wrote a diff to make fileassoc MP-safe: http://www.netbsd.org/~rmind/fileassoc.diff If somebody wants to finish -- I am glad to help. -- Mindaugas
Re: veriexec
On Tue, Aug 29, 2017 at 10:36:57AM +0800, Paul Goyette wrote: > On Tue, 29 Aug 2017, Paul Goyette wrote: > > >While looking at Sevan's recent PR, I notice a couple of problems with the > >current code. > > > >In sys/kern_veriexec.c routine veriexec_file_add(), at line 1072 we > >allocate a vfe entry, and initialize the rw_lock contained therein. > > > >Then there are some error branches at lines 1082 and 1090 that simply > >"goto out" without ever destroying the rw_lock nor free()ing the vfe. > > > >(diffs removed, since they were wrong!) > > > >Comments or other suggestions? > > On IRC, riastradh@ pointed out that the above diffs are not quite right, > since in the success case we need to avoid destroying the rwlock and > free()ing the vfe entry. > LGTM. Thanks. -- Brett Lymn This email has been sent on behalf of one of the following companies within the BAE Systems Australia group of companies: BAE Systems Australia Limited - Australian Company Number 008 423 005 BAE Systems Australia Defence Pty Limited - Australian Company Number 006 870 846 BAE Systems Australia Logistics Pty Limited - Australian Company Number 086 228 864 Our registered office is Evans Building, Taranaki Road, Edinburgh Parks, Edinburgh, South Australia, 5111. If the identity of the sending company is not clear from the content of this email please contact the sender. This email and any attachments may contain confidential and legally privileged information. If you are not the intended recipient, do not copy or disclose its content, but please reply to this email immediately and highlight the error to the sender and then immediately delete the message.
Re: bin/52512: Duplicate files prevent veriexecctl from loading signature file
On Tue, Aug 29, 2017 at 06:40:15PM +0800, Paul Goyette wrote: > >>It is unclear whether we should modify veriexecctl to not set the error > >>for EEXIST, or if we should modify veriexecgen to not generate multiple > >>entries (with different names) for the same file. (It seems to me > >>unreasonable to expect the user to remove the duplicates.) > > > >A third option would be to modify the kernel code to not complain when > >attempting to add a new entry, as long as the new fingerprint type and > >value match the existing values. > > The attached patch implements this option. > Looks good to me. Thanks for fixing this. -- Brett Lymn This email has been sent on behalf of one of the following companies within the BAE Systems Australia group of companies: BAE Systems Australia Limited - Australian Company Number 008 423 005 BAE Systems Australia Defence Pty Limited - Australian Company Number 006 870 846 BAE Systems Australia Logistics Pty Limited - Australian Company Number 086 228 864 Our registered office is Evans Building, Taranaki Road, Edinburgh Parks, Edinburgh, South Australia, 5111. If the identity of the sending company is not clear from the content of this email please contact the sender. This email and any attachments may contain confidential and legally privileged information. If you are not the intended recipient, do not copy or disclose its content, but please reply to this email immediately and highlight the error to the sender and then immediately delete the message.
Re: Lockless IP input queue, the pktqueue interface
On Thu, May 29, 2014 at 08:12:35AM +1000, Darren Reed wrote: > > The method that I've seen used in Solaris (for example) is to use > foo_impl.h to providethe details of data structure that are essentially > private and those .h filesmay or may notbe shipped as part of the end > user system.Using pktqueue_private.h might be suitable. > curses.h already does this, for userland the WINDOW (and other) pointers are opaque handles. The actual definition is hidden in curses_private.h which is not installed. -- Brett Lymn This email has been sent on behalf of one of the following companies within the BAE Systems Australia group of companies: BAE Systems Australia Limited - Australian Company Number 008 423 005 BAE Systems Australia Defence Pty Limited - Australian Company Number 006 870 846 BAE Systems Australia Logistics Pty Limited - Australian Company Number 086 228 864 Our registered office is Evans Building, Taranaki Road, Edinburgh Parks, Edinburgh, South Australia, 5111. If the identity of the sending company is not clear from the content of this email please contact the sender. This email and any attachments may contain confidential and legally privileged information. If you are not the intended recipient, do not copy or disclose its content, but please reply to this email immediately and highlight the error to the sender and then immediately delete the message.
Re: Porting Linux SCSI ioctl stuff (was: Re: LTFS)
On Wed, Dec 18, 2013 at 03:34:31PM +, Emmanuel Dreyfus wrote: > > With sensedata defined as: unsigned char sensedata[128] > > What does that mean? i woudl have expected to have buffer > lengths at the same size. > Sounds like someone is not following the scsi specification. -- Brett Lymn This email has been sent on behalf of one of the following companies within the BAE Systems Australia group of companies: BAE Systems Australia Limited - Australian Company Number 008 423 005 BAE Systems Australia Defence Pty Limited - Australian Company Number 006 870 846 BAE Systems Australia Logistics Pty Limited - Australian Company Number 086 228 864 Our registered office is Evans Building, Taranaki Road, Edinburgh Parks, Edinburgh, South Australia, 5111. If the identity of the sending company is not clear from the content of this email please contact the sender. This email and any attachments may contain confidential and legally privileged information. If you are not the intended recipient, do not copy or disclose its content, but please reply to this email immediately and highlight the error to the sender and then immediately delete the message.
Re: bus_dmamap_destroy no longer callable from interrupt context?
On Fri, Nov 15, 2013 at 10:31:52PM +0100, Christoph Badura wrote: > > Bummer. It sure looks to me like bus_dmamap_destroy used to be callable > from interrupt context. And the Open/FreeBSD code being identical leads me > to believe they do support destroying in interrupt context. > You can't really go by what OpenBSD do, they still have the big kernel lock so their scope for race conditions is a lot smaller. -- Brett Lymn "Warning: The information contained in this email and any attached files is confidential to BAE Systems Australia. If you are not the intended recipient, any use, disclosure or copying of this email or any attachments is expressly prohibited. If you have received this email in error, please notify us immediately. VIRUS: Every care has been taken to ensure this email and its attachments are virus free, however, any loss or damage incurred in using this email is not the sender's responsibility. It is your responsibility to ensure virus checks are completed before installing any data sent in this email to your computer."
Re: NGROUPS/NGROUPS_MAX
On Wed, Oct 09, 2013 at 02:11:09PM +, Brian Ginsbach wrote: > > Recent versions of AIX, Solaris, and Linux contain work-arounds that > essentially by-pass the 16 group limitation (RFC 5531). The > degree of by-pass varies (the number of additional groups may not > necessarily be unlimited). this may well be but just FYI, samba will do an assert on a user in more that 16 unix groups. This may not be relevant in the current context but just putting it out there :) -- Brett Lymn "Warning: The information contained in this email and any attached files is confidential to BAE Systems Australia. If you are not the intended recipient, any use, disclosure or copying of this email or any attachments is expressly prohibited. If you have received this email in error, please notify us immediately. VIRUS: Every care has been taken to ensure this email and its attachments are virus free, however, any loss or damage incurred in using this email is not the sender's responsibility. It is your responsibility to ensure virus checks are completed before installing any data sent in this email to your computer."
Re: Netbsd /avr32 update.
On Mon, Sep 30, 2013 at 10:06:46AM +0200, Martin Husemann wrote: > On Sun, Sep 29, 2013 at 08:36:15PM -0300, Tomas Niño Kehoe wrote: > > The next step, besides continuing / fixing the actual port is to start > > working on the toolchain. Particularly an llvm backend for NetBSD/avr32 > > seems > > the way to go. > > For those of us not knowing the details: isn't avr32 supported by stock gcc? > If not, why? If only in later versions, which? > This may or may not help you, this is what I did to build gcc-4.7.2 so I could rebuild some arduino firmware: http://implementality.blogspot.com.au/2013/01/building-arduino-firmware-on-netbsd.html -- Brett Lymn "Warning: The information contained in this email and any attached files is confidential to BAE Systems Australia. If you are not the intended recipient, any use, disclosure or copying of this email or any attachments is expressly prohibited. If you have received this email in error, please notify us immediately. VIRUS: Every care has been taken to ensure this email and its attachments are virus free, however, any loss or damage incurred in using this email is not the sender's responsibility. It is your responsibility to ensure virus checks are completed before installing any data sent in this email to your computer."
Re: fixing the vnode lifecycle
On Wed, Sep 25, 2013 at 02:33:43PM +0200, J. Hannken-Illjes wrote: > On Sep 22, 2013, at 5:28 AM, David Holland wrote: > > > > First, obviously the vfs-level vnode cache code should provide vnode > > lookup so file systems don't need to maintain their own vnode > > tables. Killing off the fs-level vnode tables not only simplifies the > > world but also avoids exposing a number of states and operations > > required only to keep the fs-level table in sync with everything else. > > This will help a great deal to get rid of the bodgy locking and all > > the races. > > > > (If any file system does anything fancy with its vnode table, other > > than iterating it and doing lookups by inode number, I'd like to find > > out.) > > Expect some file systems to use a key size != sizeof(ino_t) -- nfs > for example uses file handles up to 64 bytes. > Right, as do a few other file systems. IIRC all file systems provide a filehandle generation routine, the vnodes should be looked up using the handle not the inode number. -- Brett Lymn "Warning: The information contained in this email and any attached files is confidential to BAE Systems Australia. If you are not the intended recipient, any use, disclosure or copying of this email or any attachments is expressly prohibited. If you have received this email in error, please notify us immediately. VIRUS: Every care has been taken to ensure this email and its attachments are virus free, however, any loss or damage incurred in using this email is not the sender's responsibility. It is your responsibility to ensure virus checks are completed before installing any data sent in this email to your computer."
Re: 4.0.1 i386 wedged
On Tue, May 21, 2013 at 12:25:40PM -0700, Brian Buhrow wrote: > hello. I was under the impression that vnconfigging an nfs-mounted > file continues to not work to today -- and that is why xen has such a hard > time supporting guests under NFS. In other words, "Don't do that". If you > need such a big file vnconfigged, use local disk. > I could be wrong on this, but I'm pretty sure I have it right. > It _used_ to work - in recent years I did a recovery of a linux lvm backup by nfs mounting the storage that had the file created by dd'ing the linux disk, doing a vnconfig and then a lvm change to get at the linux file systems. -- Brett Lymn "Warning: The information contained in this email and any attached files is confidential to BAE Systems Australia. If you are not the intended recipient, any use, disclosure or copying of this email or any attachments is expressly prohibited. If you have received this email in error, please notify us immediately. VIRUS: Every care has been taken to ensure this email and its attachments are virus free, however, any loss or damage incurred in using this email is not the sender's responsibility. It is your responsibility to ensure virus checks are completed before installing any data sent in this email to your computer."
Re: Questions about LVM compatibility
On Tue, Feb 26, 2013 at 04:07:40PM +0100, tech mailinglists wrote: > > Is the NetBSD implementation of LVM compatible with the Linux > implementation of LVM? > I was able to mount LVM volumes from a RHEL 5 system onto NetBSD without a problem, I didn't have success with ones from RHEL 6. If your LVM is one that NetBSD supports then you should be fine. -- Brett Lymn "Warning: The information contained in this email and any attached files is confidential to BAE Systems Australia. If you are not the intended recipient, any use, disclosure or copying of this email or any attachments is expressly prohibited. If you have received this email in error, please notify us immediately. VIRUS: Every care has been taken to ensure this email and its attachments are virus free, however, any loss or damage incurred in using this email is not the sender's responsibility. It is your responsibility to ensure virus checks are completed before installing any data sent in this email to your computer."
Re: fixing compat_12 getdents
On Mon, Dec 10, 2012 at 09:23:15PM +, David Laight wrote: > > Then people get upset because they say "function foo() isn't allowed > to set errno to 'bar'". > It is rather a shame that posix tries to list all errno a function > can return, not just those for explicit 'likely' (ie normal) > non-success returns froma function. > Then how do you sanely write error handling routines? The error returns form part of the interface and should be documented as such. -- Brett Lymn "Warning: The information contained in this email and any attached files is confidential to BAE Systems Australia. If you are not the intended recipient, any use, disclosure or copying of this email or any attachments is expressly prohibited. If you have received this email in error, please notify us immediately. VIRUS: Every care has been taken to ensure this email and its attachments are virus free, however, any loss or damage incurred in using this email is not the sender's responsibility. It is your responsibility to ensure virus checks are completed before installing any data sent in this email to your computer."
Re: Tips on programming usb devices?
On Mon, Sep 10, 2012 at 03:05:48PM +0100, Iain Hibbert wrote: > > > The first 4 interfaces present themselves as general communications ports, > > of which the first permits standard serial connections through which modem > > Hayes style AT commands can be sent to control the modem's behavior. The > > other 3 ports are available and, I believe, work, but I don't know how to > > use them. > > one of them may be a GPS port.. the uhso devices seem to also have these > (though mine does not) > One is probably a status monitor port for the HSDPA modem, this can be used to monitor signal strength and so forth (don't ask me how though...) -- Brett Lymn "Warning: The information contained in this email and any attached files is confidential to BAE Systems Australia. If you are not the intended recipient, any use, disclosure or copying of this email or any attachments is expressly prohibited. If you have received this email in error, please notify us immediately. VIRUS: Every care has been taken to ensure this email and its attachments are virus free, however, any loss or damage incurred in using this email is not the sender's responsibility. It is your responsibility to ensure virus checks are completed before installing any data sent in this email to your computer."
Re: SAS scsibus target numbering
On Fri, Jul 27, 2012 at 09:49:38AM +1000, Simon Burge wrote: > > > > ISTR SPARC machines used an initiator address 6 for hysterical reasons. I > > forget why. > > Likewise for some early models of DECstation (2100 and 3100?). I never > understood why. > Priorities. When negotiating for the bus the highest numbered device wins if there is an arbitration clash... iirc ids greater than 7 have a priority lower than id 0, I presume for compat reasons. So for a scsi ID 0 - 7 the highest wins, it makes sense that the controller is a high number (as Mouse said it is normally 7 on old sun hardware) so that it can grab the bus when it wants to talk. -- Brett Lymn "Warning: The information contained in this email and any attached files is confidential to BAE Systems Australia. If you are not the intended recipient, any use, disclosure or copying of this email or any attachments is expressly prohibited. If you have received this email in error, please notify us immediately. VIRUS: Every care has been taken to ensure this email and its attachments are virus free, however, any loss or damage incurred in using this email is not the sender's responsibility. It is your responsibility to ensure virus checks are completed before installing any data sent in this email to your computer."
Re: DIRBLKSIZ differs between userland & kernel
On Mon, Feb 27, 2012 at 07:50:51AM +, David Holland wrote: > > > > Right. > > What the vfs layer for coda does for coda_readdir is wrap some goop > > around doing a VOP_READDIR on the underlying filesystem. In my case > > since the containers lived on ufs that is what was used. > > Sure. But then, as long as syntactically valid dirents come back it > shouldn't matter what DIRBLKSIZ is. > Well, syntactically correct they may be but if a dirent straddles the DIRBLKSIZE boundary then things will break. I think this is what happens. Venus carefully aligns dirents to the 1024 byte boundary but ufs_readdir() rounds down to the nearest 512 byte boundary when it is doing its processing - if that happens to be part way through a dirent that venus has provided then the nasal demons happen. At the moment I am wondering if I shouldn't just do a statvfs() on the filesystem and use the device block size instead of DIRBLKSIZE. > > > > Well venus pads them to 1024 - there is a chance that venus won't fill > > the entire block, if venus less than half fills the block then > > ufs_readdir() gets garbage. > > This I still don't understand. ufs_readdir doesn't read anything from > venus. > Actually it does. The way that coda works currently is a bit complicated, it does part of its work in the kernel but part of it is passed up to venus. A lot of the file system semantics are handled within venus, the fs data is actually stored in a container file and venus is the one that handles getting data into the container from the server (if it is not already cached in there), pushing data from the container back to the server (when there is a connection), satisfying reads/writes from the local system and a lot of other things. When you do a getdents(2) call on a coda directory the readdir syscall eventually gets called. This is directed to coda_readdir() by VOP_READDIR, coda_readdir() checks if the container file is open already and opens it if it is not. Once the container file is open then coda_readdir() calls VOP_READDIR on the underlying fs, in my case this gets directed to ufs_readdir(). Inside ufs_readdir() a VOP_READ is performed to get the directory data. In this case this is handled by coda_read() which forms up a request does an upcall to userland for venus to handle the request. Venus looks at the request and sees that it is a request for a read of a directory so it takes the coda directory entries and on the fly forms up a series of dirents to satisfy the read request and hands this back to the kernel. Then ufs_readdir() processes this data. -- Brett Lymn "Warning: The information contained in this email and any attached files is confidential to BAE Systems Australia. If you are not the intended recipient, any use, disclosure or copying of this email or any attachments is expressly prohibited. If you have received this email in error, please notify us immediately. VIRUS: Every care has been taken to ensure this email and its attachments are virus free, however, any loss or damage incurred in using this email is not the sender's responsibility. It is your responsibility to ensure virus checks are completed before installing any data sent in this email to your computer."
Re: DIRBLKSIZ differs between userland & kernel
On Mon, Feb 27, 2012 at 02:43:51AM +, David Holland wrote: > > I don't think any of those is the right answer. Coda is not limited to > running on top of ffs, so it shouldn't be doing only filesystem- > independent things when talking to the filesystem it uses for storage. > (Right?) > Right. What the vfs layer for coda does for coda_readdir is wrap some goop around doing a VOP_READDIR on the underlying filesystem. In my case since the containers lived on ufs that is what was used. > Therefore it should be using the value from in both the > kernel and in venus. If it's running on top of ffs, ffs will provide > dirents with padding at 512-byte intervals that it would think > unnecessary, but I would think it shouldn't notice or care. > Well venus pads them to 1024 - there is a chance that venus won't fill the entire block, if venus less than half fills the block then ufs_readdir() gets garbage. > Then again, maybe I don't understand what's going on, as there > shouldn't be any way for ufs_readdir to see, much less trip on, > dirents generated by venus. As above, the coda_readdir() in coda_vnops.c wraps some goop around a VOP_READDIR - pushing the handling of the dirents to the underlying fs method. > > Note that ffs needs DIRBLKSIZ to be the same as the underlying atomic > I/O size, or various unspecified bad things can happen in crashes. So > you can't change what ffs is doing. > Absolutely no thought in my mind about changing the size of DIRBLKSIZ in the kernel, I do understand the implications of doing that and until (if?) everyone has 4k sectors it probably will not change. What I was talking about in relation to this is that ufs_readdir() pulls the directory block size from a "ufs-ified" version of the mount structure attached as private data to the mount struct. I had thought changing that would fix the problem but as you rightly point out we are not just dealing with ufs here so I could easily fix the problem for me but still leave a lurking bug :( > Also, I have no idea why the userland value diverged from the ffs > value, but I doubt it's safe to change it without adding a large pile > of compat wibble. > Yep, this is why I asked :) > Finally, we should not not not have duplicate symbol names like this. > I guess now that we've branched I can go clean it up... > That would be great - we will still have a problem with venus but at least others won't be bitten by this weirdness. I am thinking that maybe the best thing would be to try and get venus to read the directory block size from the underlying file system and use that, if I can. -- Brett Lymn "Warning: The information contained in this email and any attached files is confidential to BAE Systems Australia. If you are not the intended recipient, any use, disclosure or copying of this email or any attachments is expressly prohibited. If you have received this email in error, please notify us immediately. VIRUS: Every care has been taken to ensure this email and its attachments are virus free, however, any loss or damage incurred in using this email is not the sender's responsibility. It is your responsibility to ensure virus checks are completed before installing any data sent in this email to your computer."
DIRBLKSIZ differs between userland & kernel
I have been tracking through a bug with Coda where, basically, getdents(2) is not returning all the directory entries. The files exist in the directory but do not show up on a globbed listing. After some digging I found that things ended up in ufs_readdir() which was terminating early due to a bad dirent. What is happening is the userland part of Coda, venus, manufactures a the dirents for a directory when a read request comes up from the kernel. It has code to carefully avoid a dirent spanning a DIRBLKSIZ boundary by padding a dirent near the boundary. This data is returned to the kernel for processing. Where things come unstuck is DIRBLKSIZ is defined in /usr/include/dirent.h as 1024 bytes, inside the kernel ufs code sys/ufs/ufs/dir.h DIRBLKSIZE is set to DEV_BSIZE which is 512 bytes. This means that venus can produce a block of dirents that finish before the 512 byte mark, the kernel code tries to align back to a 512 byte boundary and fails to find a valid dirent - the fact that ufs_readdir() exits gracefully rather than causing a panic is more by luck than anything else. To fix it I think I have the following choices: 1) Patch venus so it ignores the userland DIRBLKSIZ and, instead, uses DEV_BSIZE (if available) or just hard code in 512 bytes as the dirent block size. 2) Change DIRBLKSIZ in dirent.h so it matches the kernel 3) Fix mount_coda so it updates the um_dirblksiz to match userland. I have done 1 to test and a previously misbehaving directory does seem to work a lot better. I see both 1 & 3 being a bit dangerous in the event that if the definitions change again. I am not entirely sure about the implications of 2 - a quick look only showed up initdir.c in libc using the define but there may be others. Does anyone have any views on which approach would be the most correct way of fixing this? -- Brett Lymn "Warning: The information contained in this email and any attached files is confidential to BAE Systems Australia. If you are not the intended recipient, any use, disclosure or copying of this email or any attachments is expressly prohibited. If you have received this email in error, please notify us immediately. VIRUS: Every care has been taken to ensure this email and its attachments are virus free, however, any loss or damage incurred in using this email is not the sender's responsibility. It is your responsibility to ensure virus checks are completed before installing any data sent in this email to your computer."
Re: A port of tpm(4) from FreeBSD or OpenBSD?
On Fri, Jan 13, 2012 at 10:10:31AM -0800, Brian Buhrow wrote: > Hello. I'm wondering if anyone has worked on a port of the tpm(4) > driver from FreeBSD or OpenBSD? I realize it's a litle late, but it seems > like a nice thing to have in NetBSD-6. > Yes, I think it would be good to support the tpm device. It would give us a secure certificate store & method of validating a certificate in the kernel. > Thoughts? IMHO we shouldn't be hurrying this into NetBSD 6. Not just this, I have seen another mention of "it would be nice to have in 6". I think we need to be wary of trying to crunch lots of new things into 6 right at the last minute and risk having a bunch of broken drivers due to inadequate testing. I don't mean to pick on you in particular, Brian, it is just that your message happened to be the second "nice to have in 6" I saw. -- Brett Lymn "Warning: The information contained in this email and any attached files is confidential to BAE Systems Australia. If you are not the intended recipient, any use, disclosure or copying of this email or any attachments is expressly prohibited. If you have received this email in error, please notify us immediately. VIRUS: Every care has been taken to ensure this email and its attachments are virus free, however, any loss or damage incurred in using this email is not the sender's responsibility. It is your responsibility to ensure virus checks are completed before installing any data sent in this email to your computer."
Re: Lost file-system story
On Tue, Dec 13, 2011 at 01:38:57PM +0100, Joerg Sonnenberger wrote: > > fsck is supposed to handle *all* corruptions to the file system that can > occur as part of normal file system operation in the kernel. It is doing > best effort for others. It's a bug if it doesn't do the former and a > potential missing feature for the latter. > There are a lot of slips twixt cup and lip. If you are really unlucky you can get an outage at just the wrong time that will cause the filesystem to be hosed so badly that fsck cannot recover it. Sure, fsck can run to completion but all you have is most of your FS in lost+found which you have to be really really desperate to sort through. I have been working with UNIX for over 20years now and I have only seen this happen once and it was with a commercial UNIX. -- Brett Lymn "Warning: The information contained in this email and any attached files is confidential to BAE Systems Australia. If you are not the intended recipient, any use, disclosure or copying of this email or any attachments is expressly prohibited. If you have received this email in error, please notify us immediately. VIRUS: Every care has been taken to ensure this email and its attachments are virus free, however, any loss or damage incurred in using this email is not the sender's responsibility. It is your responsibility to ensure virus checks are completed before installing any data sent in this email to your computer."
Re: RFC: New security model secmodel_securechroot(9)
On Mon, Jul 11, 2011 at 08:29:11AM +0100, David Laight wrote: > > One problem is that, historically, unix privileges have always been > based on a sledgehammer approach - if you don't want everybody to > be able to do something then only root can do it. > That can and has been fixed in other Unix operating systems. The problem is that a lot of people simply cannot see the use of such a facility not because they are dumb, just that they have not encountered a situation which could not be addressed with the normal unix permissions or, perhaps, sudo. Some things that have been problematic for me in the past with the unix security model that I have been able to solve using finer grain permissions are: 1) permitting an ordinary user to run apache on port 80, allowing them to perform restarts and kill misbehaving daemons without requiring root access. 2) permitting an ordinary user to run an ldap server on port 389 using the setuid daemon dance makes things awkward when it comes to trying to kill things off. Using sudo can be a pain because someone has to be around to type the password which means you cannot schedule restarts/kills as an unprivileged user. Other things it would be useful for would be winding back the number of setuid utilities on the system - ping could be permitted to open a raw network interface, at the moment it has to be setuid to do this. -- Brett Lymn "Warning: The information contained in this email and any attached files is confidential to BAE Systems Australia. If you are not the intended recipient, any use, disclosure or copying of this email or any attachments is expressly prohibited. If you have received this email in error, please notify us immediately. VIRUS: Every care has been taken to ensure this email and its attachments are virus free, however, any loss or damage incurred in using this email is not the sender's responsibility. It is your responsibility to ensure virus checks are completed before installing any data sent in this email to your computer."
Re: extent-patch and overview of what is supposed to follow
On Tue, Apr 05, 2011 at 10:37:37AM +0100, Mindaugas Rasiukevicius wrote: > > I am not convinced about statistics point. For intensive allocations, > constant-sized pool_cache(9) should/would be used, where it already > gathers statistics. If there is some particular need for statistics, > one can always collect it at the caller's level. Therefore, I do not > see the need to invade allocator's API for that. > We shouldn't just consider statistics anyway, there were some useful malloc debug facilities which allowed you to know what memory had been allocated to what process. I had extended that so you could tell what processes had allocated next to the region of interest - I used this to nail a rather nasty buffer overflow that was stomping on the memory allocations of innocent parties. I don't know how I could have tracked that bug down without the level of information available from the malloc debug logs. -- Brett Lymn "Warning: The information contained in this email and any attached files is confidential to BAE Systems Australia. If you are not the intended recipient, any use, disclosure or copying of this email or any attachments is expressly prohibited. If you have received this email in error, please notify us immediately. VIRUS: Every care has been taken to ensure this email and its attachments are virus free, however, any loss or damage incurred in using this email is not the sender's responsibility. It is your responsibility to ensure virus checks are completed before installing any data sent in this email to your computer."
Re: panic: ffs_valloc: dup alloc
On Wed, Oct 20, 2010 at 01:11:20PM -0500, KAMADA Ken'ichi wrote: > > I had no crash for a week (two weeks if the "skip-wdstart" test is > counted). Seems good. I'm back to -current. > It seems good to me too, no crashes. Prior to this I was getting a crash at least once a day possibly more frequently. I have been doing a fsck -fn on my cgd partition and that is always coming up clean too. Good stuff. -- Brett Lymn "Warning: The information contained in this email and any attached files is confidential to BAE Systems Australia. If you are not the intended recipient, any use, disclosure or copying of this email or any attachments is expressly prohibited. If you have received this email in error, please notify us immediately. VIRUS: Every care has been taken to ensure this email and its attachments are virus free, however, any loss or damage incurred in using this email is not the sender's responsibility. It is your responsibility to ensure virus checks are completed before installing any data sent in this email to your computer."
Re: panic: ffs_valloc: dup alloc
On Wed, Oct 13, 2010 at 11:26:05AM -0500, KAMADA Ken'ichi wrote: > > Yes, I was testing a similar patch to yours (skip calling wdstart() > when !device_is_active()) after my previous email. It did not crash > so far. I'm now running with your patch and will be back in a > week or so. It will take some time because the panic tends to occur > after an overnight sleep rather than a short nap. > I am seeing good results with David's patch too - multiple sleep/resumes over the course of the day without a kernel panic. For me, the panics were quite frequent so this is a big improvement. I will report back after a few days... or sooner if I have issues. -- Brett Lymn "Warning: The information contained in this email and any attached files is confidential to BAE Systems Australia. If you are not the intended recipient, any use, disclosure or copying of this email or any attachments is expressly prohibited. If you have received this email in error, please notify us immediately. VIRUS: Every care has been taken to ensure this email and its attachments are virus free, however, any loss or damage incurred in using this email is not the sender's responsibility. It is your responsibility to ensure virus checks are completed before installing any data sent in this email to your computer."
Re: Intel Centrino 6300 Ultimate-N adapter
On Wed, Apr 07, 2010 at 04:00:36PM -0700, B Harder wrote: > > Hi Sverre -- I am interested... I've actually already got the 6000-4 > firmware, got the pcidevs setup, and worked w/ the existing driver > code to get the device attached... at this point, it's failing w/ > eeprom errors (two of them, don't recall what there are, at the > moment). If you've got something in some state of "working", you're > further ahead than I am, though, and I'd love to see that code. > As you have discovered, it is highly unlikely that you can just add the device to pci devs and use the current iwn driver to make it work. I would expect that the interface to the firmware on the card is slightly different, requiring different command structures. This can happen even between versions of firmware on the same hardware. I am not sure how different our iwn driver is from the OpenBSD one these days, perhaps patching in the card specific code from their driver would be possible. -- Brett Lymn "Warning: The information contained in this email and any attached files is confidential to BAE Systems Australia. If you are not the intended recipient, any use, disclosure or copying of this email or any attachments is expressly prohibited. If you have received this email in error, please notify us immediately. VIRUS: Every care has been taken to ensure this email and its attachments are virus free, however, any loss or damage incurred in using this email is not the sender's responsibility. It is your responsibility to ensure virus checks are completed before installing any data sent in this email to your computer."
Re: panic: ffs_valloc: dup alloc
On Sun, Mar 21, 2010 at 04:49:05PM -0400, Steven Bellovin wrote: > > > That sounds like maybe the problem is not on the suspend side but on > > the resume side, that is, that stuff is being written out before (some > > layer of) the disk subsystem is ready to go again. With vanilla FFS > > such writes should be synchronous so it should be (relatively) easy to > > figure out what's going on. Do you feel like trying out dtrace? :-) > > If you'd asked a week ago, sure; now, I don't have the time. I need to update my -current to something more recent... > I don't think it's just named pipes anymore; the last crash I had > did pretty horrific things to the file system, probably because some > processes were busy creating files at the time I suspended. > I was seeing the corruption on plain files, mostly duplicate allocations or freeing a free frag. It looks to me like the kernel thinks that the FS operation has been committed to disk but the bits on the physical media have not changed. -- Brett Lymn "Warning: The information contained in this email and any attached files is confidential to BAE Systems Australia. If you are not the intended recipient, any use, disclosure or copying of this email or any attachments is expressly prohibited. If you have received this email in error, please notify us immediately. VIRUS: Every care has been taken to ensure this email and its attachments are virus free, however, any loss or damage incurred in using this email is not the sender's responsibility. It is your responsibility to ensure virus checks are completed before installing any data sent in this email to your computer."
Re: panic: ffs_valloc: dup alloc
On Fri, Mar 19, 2010 at 05:51:46PM -0500, KAMADA Ken'ichi wrote: > > I'm seeing a panic: ffs_valloc: dup alloc. > Does anyone have a similar panic? > I have seen various file system panics after suspend/resume for quite a while: NetBSD rover 5.99.18 NetBSD 5.99.18 (ROVER2) #10: Tue Sep 29 08:18:23 CST 2009 bl...@rover:/usr/src/sys/arch/i386/compile/ROVER2 i386 I have given up on suspending because my filesystems would be corrupted with monotonous regularity. The chances of a corruption seems to increase with the amount of disk activity happening on suspend. It seems like something is not being flushed (or not being marked as flushed) when the suspend happens. -- Brett Lymn "Warning: The information contained in this email and any attached files is confidential to BAE Systems Australia. If you are not the intended recipient, any use, disclosure or copying of this email or any attachments is expressly prohibited. If you have received this email in error, please notify us immediately. VIRUS: Every care has been taken to ensure this email and its attachments are virus free, however, any loss or damage incurred in using this email is not the sender's responsibility. It is your responsibility to ensure virus checks are completed before installing any data sent in this email to your computer."
Re: blocksizes
On Fri, Jan 22, 2010 at 01:09:10PM +0100, Michael van Elst wrote: > > Keeping DEV_SIZE at 512 bytes avoids lots of changes. > Won't that mean there is a chance there will be a lot of read/modify/write going on if the driver is pretending to have 512byte sectors? Would that not be really really bad for NFS write performance since the NFS spec says a write has to hit the media before being confirmed then wouldn't a streaming write of 512byte sectors would cause a lot of sequential read/modify/write as each 512byte chunk of the underlying disk block size was updated? -- Brett Lymn "Warning: The information contained in this email and any attached files is confidential to BAE Systems Australia. If you are not the intended recipient, any use, disclosure or copying of this email or any attachments is expressly prohibited. If you have received this email in error, please notify us immediately. VIRUS: Every care has been taken to ensure this email and its attachments are virus free, however, any loss or damage incurred in using this email is not the sender's responsibility. It is your responsibility to ensure virus checks are completed before installing any data sent in this email to your computer."