Re: veriexec(4) maintenance

2024-02-26 Thread Brett Lymn


Jan,

I apologise for taking so long to get back to you.  I do appreciate you
taking the time to help with this.

> 
> first it is difficult to read a diff that also introduces a "notyet" operation
> on file pages.
> 

OK, I will remove those sections to make things clearer.  I still have a
dream of getting per-page signatures in to the kernel some day...

> Looks like you replace global and item rw-locks with refcounts/signals.
> I tried to understand the rationale behind and failed.  Your implementation
> seems at least vulnerable to livelocks as the writer side is missing priority.
> 

My aim was to improve multi-thread performance and remove some of the
more horrible hacks I made to make things work - like looping on the
acquisition of a rw lock.  I also was lead to believe that rw locks are
very heavy weight so I was trying to replace them with something less
onerous.

> In general, what problem(s) are you trying to solve here?
> 

The general process for veriexec is when the kernel comes up the tables
are blank.  Early in the boot process the signatures are loaded into the
kernel.  After the signatures are loaded and a file is executed the
signature for the file is evaluated and the result of the comparison
between the loaded and calculated signatures is cached which improves
the performance.

The things that I think need to be protected are:

1) If the tables are being loaded then any evaluations should wait until
the table is loaded.

2) If the signature for a file is being evaluated then any other
accesses to that entry should wait until the evaulation is done.

3) At some securelevel settings, the signatures can be updated on the
fly.  The signature entry must not be updated while it is being
accessed.

4) If the file system associated with a particular table of signatures
is being unmounted then that table must be cleared.

> What is wrong with a quite simple hashtable to hold fileid<->fingerprint 
> mapping accessed as:
> 
>   mutex_enter table lock
>   lookup fileid -> item
>   mutex_enter item
>   mutex_exit table lock
> 
>   while item state is busy
>   cv_wait item cv
> 
>   if item state is unknown
>   set item state busy
>   mutex_exit item
>   compute fingerprint
>   mutex_enter item
>   set item state valid/invalid
>   cv_broadcast item cv
> 
>   Here the item state is either valid or invalid
> 
>   mutex_exit item
> 

well, that was pretty much what my intention was but clearly I failed in
the implementation.

> 
> lock_state is sometimes VERIEXEC_UNLOCKED, sometimes IO_NODELOCKED.
> 
>   1629veriexec_dump(struct lwp *l, prop_array_t rarray)
>   1630{
>   1631struct mount *mp;
>   1632mount_iterator_t *mpiter;
>   1633
>   1634mountlist_iterator_init(&mpiter);
>   1635
>   1636while ((mp = mountlist_iterator_trynext(mpiter)) != 
> NULL) {
>   1637/* If it fails, the file-system is [being] 
> unmounted. */
>   1638if (vfs_busy(mp) != 0)
>   1639continue;
> 
> This is completely wrong.  Operation mountlist_iterator_trynext() already
> returns the mount busy and skips mounts currently unmounting or remounting.
> Operation mountlist_iterator_next() was right or does this function get
> called from inside unmount?
> 

No, not this one, when I first did this code there was no
mountlist_iterator framework.  This dump call should ignore file systems
being unmounted.

>   1653veriexec_flush(struct lwp *l)
> 
> This is completely wrong, see above.
> 

This one should be called on unmount.

I will fix up and simplify the locking and make sure I use the
mountlist_iterator stuff properly

Thanks again for the feedback.

-- 
Brett Lymn
--
Sent from my NetBSD device.

"We are were wolves",
"You mean werewolves?",
"No we were wolves, now we are something else entirely",
"Oh"


Re: veriexec(4) maintenance

2023-12-18 Thread Brett Lymn
On Sat, Sep 02, 2023 at 08:57:03AM +0930, Brett Lymn wrote:
> On Thu, Aug 31, 2023 at 10:24:07AM +0200, J. Hannken-Illjes wrote:
> > 
> > I'm short on time, some remarks:
> > 
> 
> Thanks, I will have a closer look at these issues.  This could be where the 
> deadlock I was
> seeing is coming from.
> 

OK, I have, finally, updated the diff in line with the comments (thanks
again for the good feedback).

-- 
Brett Lymn
--
Sent from my NetBSD device.

"We are were wolves",
"You mean werewolves?",
"No we were wolves, now we are something else entirely",
"Oh"
Index: kern_veriexec.c
===
RCS file: /cvsroot/src/sys/kern/kern_veriexec.c,v
retrieving revision 1.27
diff -u -r1.27 kern_veriexec.c
--- kern_veriexec.c 9 Apr 2023 09:18:09 -   1.27
+++ kern_veriexec.c 18 Dec 2023 21:12:46 -
@@ -1,4 +1,4 @@
-/* $NetBSD: kern_veriexec.c,v 1.27 2023/04/09 09:18:09 riastradh Exp $ 
*/
+/* $NetBSD$*/
 
 /*-
  * Copyright (c) 2005, 2006 Elad Efrat 
@@ -29,7 +29,7 @@
  */
 
 #include 
-__KERNEL_RCSID(0, "$NetBSD: kern_veriexec.c,v 1.27 2023/04/09 09:18:09 
riastradh Exp $");
+__KERNEL_RCSID(0, "$NetBSD$");
 
 #include "opt_veriexec.h"
 
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -49,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -68,11 +70,6 @@
 #define VERIEXEC_UNLOCKED  0x00/* Nothing locked, callee does it */
 #define VERIEXEC_LOCKED0x01/* Global op lock held */
 
-/* state of file locking for veriexec_file_verify */
-#define VERIEXEC_FILE_UNLOCKED 0x02/* Nothing locked, callee does it */
-#define VERIEXEC_FILE_LOCKED   0x04/* File locked */
-
-#define VERIEXEC_RW_UPGRADE(lock)  while((rw_tryupgrade(lock)) == 0){};
 
 struct veriexec_fpops {
const char *type;
@@ -86,11 +83,17 @@
 
 /* Veriexec per-file entry data. */
 struct veriexec_file_entry {
-   krwlock_t lock; /* r/w lock */
+   kmutex_t vfe_lock;  /* operations lock */
+   int vfe_refcnt; /* number of refs */
+   kcondvar_t vfe_cv;  /* wait for no refs */
u_char *filename;   /* File name. */
u_char type;/* Entry type. */
u_char status;  /* Evaluation status. */
+   u_char page_fp_status;  /* Per-page FP status. */
u_char *fp; /* Fingerprint. */
+   void *page_fp;  /* Per-page fingerprints */
+   size_t npages;  /* Number of pages. */
+   size_t last_page_size;  /* To support < PAGE_SIZE */
struct veriexec_fpops *ops; /* Fingerprint ops vector*/
size_t filename_len;/* Length of filename. */
 };
@@ -120,15 +123,69 @@
 void *, void *, void *, void *);
 static struct veriexec_fpops *veriexec_fpops_lookup(const char *);
 static void veriexec_file_free(struct veriexec_file_entry *);
+static void veriexec_busy(void);
+static void veriexec_unbusy(void);
+static void vfe_busy(struct veriexec_file_entry *vfe);
+static void vfe_unbusy(struct veriexec_file_entry *vfe);
 
 static unsigned int veriexec_tablecount = 0;
 
 /*
- * Veriexec operations global lock - most ops hold this as a read
- * lock, it is upgraded to a write lock when destroying veriexec file
- * table entries.
+ * Veriexec operations global lock.
+ */
+static kmutex_t veriexec_op_lock;
+static kcondvar_t veriexec_cv;
+static int veriexec_refcnt;
+
+/*
+ * Busy a file table entry.
+ */
+static void
+vfe_busy(struct veriexec_file_entry *vfe)
+{
+   mutex_enter(&vfe->vfe_lock);
+   vfe->vfe_refcnt++;
+   mutex_exit(&vfe->vfe_lock);
+}
+
+/*
+ * Drop a reference to a file table entry, if count zero then wake
+ * up any potential waiters.
+ */
+static void
+vfe_unbusy(struct veriexec_file_entry *vfe)
+{
+   mutex_enter(&vfe->vfe_lock);
+   KASSERT(vfe->vfe_refcnt > 0);
+   if (--(vfe->vfe_refcnt) == 0)
+   cv_broadcast(&vfe->vfe_cv);
+   mutex_exit(&vfe->vfe_lock);
+}
+
+/*
+ * Grab the oplock and bump the reference count.
  */
-static krwlock_t veriexec_op_lock;
+static void
+veriexec_busy(void)
+{
+   mutex_enter(&veriexec_op_lock);
+   veriexec_refcnt++;
+   mutex_exit(&veriexec_op_lock);
+}
+
+/*
+ * Decrement the reference count, if we hit count zero then wake up
+ * any possible waiters.
+ */
+static void
+veriexec_unbusy(void)
+{
+   mutex_enter(&veriexec_op_lock);
+   KASSERT(veriexec_refcnt > 0);
+   if (--veriexec_refcnt == 0)
+   cv_

Re: veriexec(4) maintenance

2023-09-01 Thread Brett Lymn
On Thu, Aug 31, 2023 at 10:24:07AM +0200, J. Hannken-Illjes wrote:
> 
> I'm short on time, some remarks:
> 

Thanks, I will have a closer look at these issues.  This could be where the 
deadlock I was
seeing is coming from.

-- 
Brett Lymn
--
Sent from my NetBSD device.

"We are were wolves",
"You mean werewolves?",
"No we were wolves, now we are something else entirely",
"Oh"


Re: veriexec(4) maintenance

2023-08-29 Thread Brett Lymn
On Tue, Aug 15, 2023 at 07:52:46AM +0930, Brett Lymn wrote:
> 
> I tried sending this with the diff but I think it was a bit big,
> here is the text, the diff is at ftp.netbsd.org under
> /pub/NetBSD/misc/blymn/veriexec_update
> 

OK, it has been a couple of weeks since I posted this and I have
had zero responses.  I have tested the changes and they seem ok
on my daily usage pattern.

Does anyone have _any_ comments or should I just commit the
changes?

-- 
Brett Lymn
--
Sent from my NetBSD device.

"We are were wolves",
"You mean werewolves?",
"No we were wolves, now we are something else entirely",
"Oh"


Re: veriexec(4) maintenance

2023-08-14 Thread Brett Lymn


I tried sending this with the diff but I think it was a bit big,
here is the text, the diff is at ftp.netbsd.org under
/pub/NetBSD/misc/blymn/veriexec_update

On Mon, Aug 14, 2023 at 07:51:50AM +0930, Brett Lymn wrote:
> On Fri, Aug 04, 2023 at 11:24:22PM +, David Holland wrote:
> > 
> > I was looking at this last weekend and for veriexec I have one fix and
> > some comments/notes, which I should probably commit.
> > 
> 
> OK, that sounds good.
> 
> > fileassoc I would recommend flushing and replacing with a proper
> > specificdata implementation for vnodes, which there should be no
> > particular barrier to. fileassoc is trying to be that but it's both
> > bust and doing it wrong. :-|
> > 
> 
> I think fileassoc was added with the view for using it for other things
> but in the end only veriexec is using it.  We can revert it back to
> it being veriexec specific.
> 
> I have attached a diff that is a version of veriexec that uses condvars
> and ref counters to handle the access control.  It compiles but
> hasn't been run tested on a current kernel (it worked when i used it
> before but it has been a long time).
> 

-- 
Brett Lymn
--
Sent from my NetBSD device.

"We are were wolves",
"You mean werewolves?",
"No we were wolves, now we are something else entirely",
"Oh"


Re: veriexec(4) maintenance

2023-08-13 Thread Brett Lymn
On Fri, Aug 04, 2023 at 11:24:22PM +, David Holland wrote:
> 
> I was looking at this last weekend and for veriexec I have one fix and
> some comments/notes, which I should probably commit.
> 

OK, that sounds good.

> fileassoc I would recommend flushing and replacing with a proper
> specificdata implementation for vnodes, which there should be no
> particular barrier to. fileassoc is trying to be that but it's both
> bust and doing it wrong. :-|
> 

I think fileassoc was added with the view for using it for other things
but in the end only veriexec is using it.  We can revert it back to
it being veriexec specific.

I have attached a diff that is a version of veriexec that uses condvars
and ref counters to handle the access control.  It compiles but
hasn't been run tested on a current kernel (it worked when i used it
before but it has been a long time).

-- 
Brett Lymn
--
Sent from my NetBSD device.

"We are were wolves",
"You mean werewolves?",
"No we were wolves, now we are something else entirely",
"Oh"
Index: kern_veriexec.c
===
RCS file: /cvsroot/src/sys/kern/kern_veriexec.c,v
retrieving revision 1.27
diff -u -r1.27 kern_veriexec.c
--- kern_veriexec.c 9 Apr 2023 09:18:09 -   1.27
+++ kern_veriexec.c 11 Aug 2023 07:17:50 -
@@ -1,4 +1,4 @@
-/* $NetBSD: kern_veriexec.c,v 1.27 2023/04/09 09:18:09 riastradh Exp $ 
*/
+/* $NetBSD$*/
 
 /*-
  * Copyright (c) 2005, 2006 Elad Efrat 
@@ -29,7 +29,7 @@
  */
 
 #include 
-__KERNEL_RCSID(0, "$NetBSD: kern_veriexec.c,v 1.27 2023/04/09 09:18:09 
riastradh Exp $");
+__KERNEL_RCSID(0, "$NetBSD$");
 
 #include "opt_veriexec.h"
 
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -49,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -68,11 +70,6 @@
 #define VERIEXEC_UNLOCKED  0x00/* Nothing locked, callee does it */
 #define VERIEXEC_LOCKED0x01/* Global op lock held */
 
-/* state of file locking for veriexec_file_verify */
-#define VERIEXEC_FILE_UNLOCKED 0x02/* Nothing locked, callee does it */
-#define VERIEXEC_FILE_LOCKED   0x04/* File locked */
-
-#define VERIEXEC_RW_UPGRADE(lock)  while((rw_tryupgrade(lock)) == 0){};
 
 struct veriexec_fpops {
const char *type;
@@ -86,11 +83,17 @@
 
 /* Veriexec per-file entry data. */
 struct veriexec_file_entry {
-   krwlock_t lock; /* r/w lock */
+   kmutex_t vfe_lock;  /* operations lock */
+   int vfe_refcnt; /* number of refs */
+   kcondvar_t vfe_cv;  /* wait for no refs */
u_char *filename;   /* File name. */
u_char type;/* Entry type. */
u_char status;  /* Evaluation status. */
+   u_char page_fp_status;  /* Per-page FP status. */
u_char *fp; /* Fingerprint. */
+   void *page_fp;  /* Per-page fingerprints */
+   size_t npages;  /* Number of pages. */
+   size_t last_page_size;  /* To support < PAGE_SIZE */
struct veriexec_fpops *ops; /* Fingerprint ops vector*/
size_t filename_len;/* Length of filename. */
 };
@@ -120,15 +123,69 @@
 void *, void *, void *, void *);
 static struct veriexec_fpops *veriexec_fpops_lookup(const char *);
 static void veriexec_file_free(struct veriexec_file_entry *);
+static void veriexec_busy(void);
+static void veriexec_unbusy(void);
+static void vfe_busy(struct veriexec_file_entry *vfe);
+static void vfe_unbusy(struct veriexec_file_entry *vfe);
 
 static unsigned int veriexec_tablecount = 0;
 
 /*
- * Veriexec operations global lock - most ops hold this as a read
- * lock, it is upgraded to a write lock when destroying veriexec file
- * table entries.
+ * Veriexec operations global lock.
+ */
+static kmutex_t veriexec_op_lock;
+static kcondvar_t veriexec_cv;
+static int veriexec_refcnt;
+
+/*
+ * Busy a file table entry.
+ */
+static void
+vfe_busy(struct veriexec_file_entry *vfe)
+{
+   mutex_enter(&vfe->vfe_lock);
+   vfe->vfe_refcnt++;
+   mutex_exit(&vfe->vfe_lock);
+}
+
+/*
+ * Drop a reference to a file table entry, if count zero then wake
+ * up any potential waiters.
  */
-static krwlock_t veriexec_op_lock;
+static void
+vfe_unbusy(struct veriexec_file_entry *vfe)
+{
+   mutex_enter(&vfe->vfe_lock);
+   KASSERT(vfe->vfe_refcnt > 0);
+   if (--(vfe->vfe_refcnt) == 0)
+   cv_broadcast(&vfe->vfe_cv);
+   mutex_exit(&vfe->vfe_lock);
+}
+
+/*
+ * Grab the oplock 

Re: veriexec(4) maintenance

2023-08-02 Thread Brett Lymn
On Wed, Aug 02, 2023 at 07:47:48AM +, Taylor R Campbell wrote:
> veriexec(4), and the fileassoc(9) API it uses internally, needs
> maintenance and probably some serious rework to fix synchronization
> problems on any MP and/or preemptible kernels (i.e., all x86 of the
> past couple decades) with implications for any security properties it
> is supposed to provide.
> 
> If anyone cares about veriexec(4) and would like to volunteer to take
> this on, we can discuss what needs to be done and how to proceed.
> 

I will take this on since I was originally responsible for the mess.  It
does play fast and loose with synchronisation and should be fixed.

I have some fixes in my private tree that went some way to addressing
the issues but they have been languishing for many many years because I
had an issue of a deadlock that would occaisionally occur.  At the time
I couldn't work out where the deadlock was coming from but maybe with
some expert help and the improved tools I can nut it out.

-- 
Brett Lymn
--
Sent from my NetBSD device.

"We are were wolves",
"You mean werewolves?",
"No we were wolves, now we are something else entirely",
"Oh"


Re: protect pmf from network drivers that don't provide if_stop

2021-07-01 Thread Brett Lymn
On Thu, Jul 01, 2021 at 06:47:08AM -0300, Jared McNeill wrote:
> Not really a fan of this as it doesn't protect other potential if_stop users
> (and "temporary fix" rarely is..). How about something like this instead?
> 

Yes, that is a much better idea, thanks.  I have backed out my change
and committed the change to if.c you suggested.

Martin, I guess this make the PR I raised redundant.

-- 
Brett Lymn
--
Sent from my NetBSD device.

"We are were wolves",
"You mean werewolves?",
"No we were wolves, now we are something else entirely",
"Oh"


Re: protect pmf from network drivers that don't provide if_stop

2021-06-29 Thread Brett Lymn
On Tue, Jun 29, 2021 at 02:36:35PM +0200, Martin Husemann wrote:
> 
> It is already merge hell and all drivers do it completley different on
> the branch - so I'd prefer not to touch all drivers in head.
> 

OK, I won't bash on the drivers then.

> Not calling it if NULL is fine (we could do that now), but we should
> make sure all drivers properly set it - only now is not a good time to
> KASSERT for it, we should add that after the merge.
> 

So, does this mean its ok to commit what I proposed to prevent the panic
with a view to the drivers being fixed in the future?

-- 
Brett Lymn
--
Sent from my NetBSD device.

"We are were wolves",
"You mean werewolves?",
"No we were wolves, now we are something else entirely",
"Oh"


Re: protect pmf from network drivers that don't provide if_stop

2021-06-29 Thread Brett Lymn
On Tue, Jun 29, 2021 at 11:16:37AM +0200, Martin Husemann wrote:
> On Tue, Jun 29, 2021 at 03:46:20PM +0930, Brett Lymn wrote:
> > I turned up a fix I had put into my source tree a while back, I think at
> > the time the wireless driver (urtwn IIRC) did not set an entry for
> > if_stop.
> 
> This is a driver bug, we should not work around it but catch it early
> and fix it.
> 

ok but I believe that it is a common issue with the usb wifi drivers which is a 
bit odd
since they are more likely to be attached to platforms that sleep.

> Side note: the wifi drivers fiddling with struct ifnet will stop
> once the new wifi code gets into the main tree.
> 

Is that going to be soon? I was thinking of doing a sweep of the usb network 
drivers to fix
this.  It is not affecting me at the moment because the driver for wireless I 
am using at
the moment actually defines if_stop.

-- 
Brett Lymn
--
Sent from my NetBSD device.

"We are were wolves",
"You mean werewolves?",
"No we were wolves, now we are something else entirely",
"Oh"


protect pmf from network drivers that don't provide if_stop

2021-06-28 Thread Brett Lymn


Folks,

I turned up a fix I had put into my source tree a while back, I think at
the time the wireless driver (urtwn IIRC) did not set an entry for
if_stop.  This resulted in a kernel panic if you tried to sleep the
machine pmf_class_network_suspend would try to call a null if_stop.

I crafted the following to preserve my sanity, ok to commit?


Index: kern_pmf.c
===
RCS file: /cvsroot/src/sys/kern/kern_pmf.c,v
retrieving revision 1.45
diff -u -r1.45 kern_pmf.c
--- kern_pmf.c  11 Jun 2020 02:30:21 -  1.45
+++ kern_pmf.c  29 Jun 2021 01:27:01 -
@@ -892,11 +892,18 @@
struct ifnet *ifp = device_pmf_class_private(dev);
int s;
 
-   s = splnet();
-   IFNET_LOCK(ifp);
-   (*ifp->if_stop)(ifp, 0);
-   IFNET_UNLOCK(ifp);
-   splx(s);
+   if (ifp == NULL)
+   return true;
+
+   if ((*ifp->if_stop) == NULL)
+   printf("device %s has no if_stop\n", ifp->if_xname);
+   else {
+   s = splnet();
+   IFNET_LOCK(ifp);
+   (*ifp->if_stop)(ifp, 0);
+   IFNET_UNLOCK(ifp);
+   splx(s);
+   }
 
    return true;
 }

-- 
Brett Lymn
--
Sent from my NetBSD device.

"We are were wolves",
"You mean werewolves?",
"No we were wolves, now we are something else entirely",
"Oh"


Re: fixing coda in -current

2018-11-26 Thread Brett Lymn
On Sun, Nov 25, 2018 at 08:33:53PM +, David Holland wrote:
> 
> So I have no immediate comment on the patch but I'd like to understand
> better what it's doing -- the last time I crawled around in it
> (probably 7-8 years ago) it appeared to among other things have an
> incestuous relationship with ufs_readdir such that if you tried to use
> anything under than ffs as the local container it would detonate
> violently. But I never did figure out exactly what the deal was other
> than it was confusing and seemed to violate a lot of abstractions.
> 

The tricky thing about coda is that a lot of the processing that is
usually done within the VFS kernel code is actually processed in
userland the the coda daemon venus.  What happens is a userland
processes the coda fs, the coda kernel code "upcalls" that requests to
the venus daemon which processess it and may ask the kernel module for
information before returning results via the kernel module to the
userlan process.  So the FS sort of hangs half in and half out of the
kernel - if both do not agree on structures and other details then nasal
demons ensue.

I haven't looked at how the VFS ops structure is filled in but I can
easily see if the kernel module simply takes the underlying file system
ops then things will lose because venus expects ufs style dirents.  This
is really the only thing at the moment that I can think of that would
cause explosions IIRC the coda fs data is just kept in a single file
blob sitting on the host fs (venus does the work to track writes that
are pending server reconnection).

> Can you clarify? It would be nice to have it working properly and
> stuff like the above is only going to continue to fail in the future...
> 

Now that you have made me think about this more I am thinking that the
fix that has gone in is more or less a bandaid - I should track this
back a bit further as it is more than likely that a more correct fix
exists inside venus instead of the kernel code.  What we have now won't
hurt and at least stops the super annoying KASSERT when I touch /coda.
I need to have a look at the coda pkgsrc anyway, my patches to support
mount_coda are conflicting which I need to sort out.

-- 
Brett Lymn
Let go, or be dragged - Zen proverb.


Re: fixing coda in -current

2018-11-21 Thread Brett Lymn
On Wed, Nov 21, 2018 at 09:01:40AM +0100, Martin Husemann wrote:
> 
> No, you can easily check for the presence of the userland binaries in ATF
> and skip the test if they are missing.
> 

OK, no problems

> I can setup a server for my test environment and install the userland tools
> on the test machines, so we could run the full fs tests suite against coda.
> 

If you need any assistance with configuring coda then please let me
know.  It has been a long time but hopefully I can recall something
useful...

-- 
Brett Lymn
Let go, or be dragged - Zen proverb.


Re: fixing coda in -current

2018-11-20 Thread Brett Lymn
On Tue, Nov 20, 2018 at 07:39:26PM +, m...@netbsd.org wrote:
> On Tue, Nov 20, 2018 at 08:06:37PM +0100, Hauke Fath wrote:
> > ISTR that somebody on the CODA mailing-list suggested a re-implementation
> > as userland file-system, but I don't think much has happened on that front.
> 
> note that you can cd /usr/tests/fs; atf-run as user, because
> they're mostly not running on the kernel :-)

You need the userland installed from pkgsrc and a coda server configured
before it will work so setting up atf is challenging.

I do have some private code that provides a mount_coda so you can have a
fstab entry but it still relies on pkgsrc being there to work.

-- 
Brett Lymn
Let go, or be dragged - Zen proverb.


Re: fixing coda in -current

2018-11-20 Thread Brett Lymn
On Tue, Nov 20, 2018 at 10:18:56AM -0500, Greg Troxel wrote:
> I used to use it, and may again.  So I'd like to see it stay, partly
> because I think it's good to keep NetBS relevant in the fileystem
> research world.  I am expecting to see new upstream activity.
> 

I guess it is obvious that I use it too.  There is still some activity
going on at CMU around coda so it has not been abandoned by the
upstream.  It is difficult to automate the testing because you need the
userland support there too - the kernel driver does a bunch of upcalls
to a userland daemon to perform most of the work.

I use coda to keep my work in progress stuff on so I can work on my
laptop or my desktop, have the changes shared but then be able to take
my laptop off network and keep working, when I get back home the changes
can be synced without me having to do anything.  Sure, there are other
ways of doing this but trying to maintain a shared tree potentially
changing either side is difficult when you don't have constant network.

> But, I think it makes sense to remove it from GENERIC, and perhaps have
> whatever don't-autoload treatment, so that people only get it if they
> explicitly ask for it.  That way it should not bother others.
> 

Yes, this would be fine.  Mind you, you have to explicitly install the
userland from pkgsrc so the driver is dormant until that happens.

-- 
Brett Lymn
Let go, or be dragged - Zen proverb.


fixing coda in -current

2018-11-20 Thread Brett Lymn


I am guessing that not many people use coda (or they just haven't
complained) as it seems like the coda kernel support has suffered some
bit-rot.  Trying to access a coda file system on -current results in a
couple of KASSERTs firing - the first is easy, we need to lock the vnode
on readdir but after doing that there is a check in ufs_readwrite.c to
ensure the vnode is either a VDIR or VLNK type, neither of which is true
for coda because the whole coda file system is contained in a regular
file on the host file system so the vnode type is VREG.  The following
patch makes coda work for me.  I don't think that manipulating the vnode
type is really great but a lesser evil than trying to create a duplicate
of the ufs readdir code and also not as bad as removing the KASSERTs
which are useful sanity checks in the majority of use cases.

Comments? Anyone really care?

Index: coda_vnops.c
===
RCS file: /cvsroot/src/sys/coda/coda_vnops.c,v
retrieving revision 1.106
diff -u -r1.106 coda_vnops.c
--- coda_vnops.c26 May 2017 14:21:00 -  1.106
+++ coda_vnops.c20 Nov 2018 09:42:31 -
@@ -1537,6 +1537,7 @@
 /* upcall decl */
 /* locals */
 int error = 0;
+enum vtype saved_type;
 
 MARK_ENTRY(CODA_READDIR_STATS);
 
@@ -1569,7 +1570,13 @@
/* Have UFS handle the call. */
CODADEBUG(CODA_READDIR, myprintf(("%s: fid = %s, refcnt = %d\n",
__func__, coda_f2s(&cp->c_fid), vp->v_usecount)); )
+saved_type = vp->v_type;
+   vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
+vp->v_type = VDIR; /* pretend the container file is a dir */
error = VOP_READDIR(vp, uiop, cred, eofflag, cookies, ncookies);
+vp->v_type = saved_type;
+   VOP_UNLOCK(vp);
+
if (error)
MARK_INT_FAIL(CODA_READDIR_STATS);
else

-- 
Brett Lymn
Let go, or be dragged - Zen proverb.


Re: 8.0 performance issue when running build.sh?

2018-08-09 Thread Brett Lymn
 
 I would be interested to finish that off, I need to make some time to
get to doing it though.
I have been sitting on some changes to veriexec for ~ years that
change it from locking everything to using reference counts and
condition variables which removes some nasty hacks I did.  I have not
committed the changes because the kernel would sometimes deadlock and
I was having trouble tracking down why.  Perhaps I was looking in the
wrong spot for the error and it was fileassoc all along that was
causing the deadlock.

- Original Message -
From: "Mindaugas Rasiukevicius" 
To:"Jason Thorpe" 
Cc:
Sent:Fri, 10 Aug 2018 00:12:23 +0100
Subject:Re: 8.0 performance issue when running build.sh?

 Jason Thorpe  wrote:
 > 
 > 
 > > On Aug 9, 2018, at 10:40 AM, Thor Lancelot Simon  wrote:
 > > 
 > > Actually, I wonder if we could kill off the time spent by
fileassoc. Is
 > > it still used only by veriexec? We can easily option that out of
the
 > > build box kernels.
 > 
 > Indeed. And there are better ways to do what veriexec does, in any
case.
 > 

 Many years ago I wrote a diff to make fileassoc MP-safe:

 http://www.netbsd.org/~rmind/fileassoc.diff

 If somebody wants to finish -- I am glad to help.

 -- 
 Mindaugas




Re: veriexec

2017-08-29 Thread Brett Lymn
On Tue, Aug 29, 2017 at 10:36:57AM +0800, Paul Goyette wrote:
> On Tue, 29 Aug 2017, Paul Goyette wrote:
> 
> >While looking at Sevan's recent PR, I notice a couple of problems with the 
> >current code.
> >
> >In sys/kern_veriexec.c routine veriexec_file_add(), at line 1072 we 
> >allocate a vfe entry, and initialize the rw_lock contained therein.
> >
> >Then there are some error branches at lines 1082 and 1090 that simply 
> >"goto out" without ever destroying the rw_lock nor free()ing the vfe.
> >
> >(diffs removed, since they were wrong!)
> >
> >Comments or other suggestions?
> 
> On IRC, riastradh@ pointed out that the above diffs are not quite right, 
> since in the success case we need to avoid destroying the rwlock and 
> free()ing the vfe entry.
> 

LGTM.  Thanks.

-- 
Brett Lymn
This email has been sent on behalf of one of the following companies within the 
BAE Systems Australia group of companies:

BAE Systems Australia Limited - Australian Company Number 008 423 005
BAE Systems Australia Defence Pty Limited - Australian Company Number 006 
870 846
BAE Systems Australia Logistics Pty Limited - Australian Company Number 086 
228 864

Our registered office is Evans Building, Taranaki Road, Edinburgh Parks,
Edinburgh, South Australia, 5111. If the identity of the sending company is
not clear from the content of this email please contact the sender.

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy or
disclose its content, but please reply to this email immediately and highlight
the error to the sender and then immediately delete the message.



Re: bin/52512: Duplicate files prevent veriexecctl from loading signature file

2017-08-29 Thread Brett Lymn
On Tue, Aug 29, 2017 at 06:40:15PM +0800, Paul Goyette wrote:
> >>It is unclear whether we should modify veriexecctl to not set the error 
> >>for EEXIST, or if we should modify veriexecgen to not generate multiple 
> >>entries (with different names) for the same file.  (It seems to me 
> >>unreasonable to expect the user to remove the duplicates.)
> >
> >A third option would be to modify the kernel code to not complain when 
> >attempting to add a new entry, as long as the new fingerprint type and 
> >value match the existing values.
> 
> The attached patch implements this option.
> 

Looks good to me.  Thanks for fixing this.

-- 
Brett Lymn
This email has been sent on behalf of one of the following companies within the 
BAE Systems Australia group of companies:

BAE Systems Australia Limited - Australian Company Number 008 423 005
BAE Systems Australia Defence Pty Limited - Australian Company Number 006 
870 846
BAE Systems Australia Logistics Pty Limited - Australian Company Number 086 
228 864

Our registered office is Evans Building, Taranaki Road, Edinburgh Parks,
Edinburgh, South Australia, 5111. If the identity of the sending company is
not clear from the content of this email please contact the sender.

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy or
disclose its content, but please reply to this email immediately and highlight
the error to the sender and then immediately delete the message.



Re: Lockless IP input queue, the pktqueue interface

2014-05-28 Thread Brett Lymn
On Thu, May 29, 2014 at 08:12:35AM +1000, Darren Reed wrote:
> 
> The method that I've seen used in Solaris (for example) is to use
> foo_impl.h to providethe details of data structure that are essentially
> private and those .h filesmay or may notbe shipped as part of the end
> user system.Using pktqueue_private.h might be suitable.
> 

curses.h already does this, for userland the WINDOW (and other) pointers
are opaque handles.  The actual definition is hidden in curses_private.h
which is not installed.

-- 
Brett Lymn
This email has been sent on behalf of one of the following companies within the 
BAE Systems Australia group of companies:

BAE Systems Australia Limited - Australian Company Number 008 423 005
BAE Systems Australia Defence Pty Limited - Australian Company Number 006 
870 846
BAE Systems Australia Logistics Pty Limited - Australian Company Number 086 
228 864

Our registered office is Evans Building, Taranaki Road, Edinburgh Parks,
Edinburgh, South Australia, 5111. If the identity of the sending company is
not clear from the content of this email please contact the sender.

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy or
disclose its content, but please reply to this email immediately and highlight
the error to the sender and then immediately delete the message.



Re: Porting Linux SCSI ioctl stuff (was: Re: LTFS)

2013-12-18 Thread Brett Lymn
On Wed, Dec 18, 2013 at 03:34:31PM +, Emmanuel Dreyfus wrote:
> 
> With sensedata defined as:  unsigned char sensedata[128]
> 
> What does that mean? i woudl have expected to have buffer
> lengths at the same size.
> 

Sounds like someone is not following the scsi specification.

-- 
Brett Lymn
This email has been sent on behalf of one of the following companies within the 
BAE Systems Australia group of companies:

BAE Systems Australia Limited - Australian Company Number 008 423 005
BAE Systems Australia Defence Pty Limited - Australian Company Number 006 
870 846
BAE Systems Australia Logistics Pty Limited - Australian Company Number 086 
228 864

Our registered office is Evans Building, Taranaki Road, Edinburgh Parks,
Edinburgh, South Australia, 5111. If the identity of the sending company is
not clear from the content of this email please contact the sender.

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy or
disclose its content, but please reply to this email immediately and highlight
the error to the sender and then immediately delete the message.



Re: bus_dmamap_destroy no longer callable from interrupt context?

2013-11-15 Thread Brett Lymn
On Fri, Nov 15, 2013 at 10:31:52PM +0100, Christoph Badura wrote:
> 
> Bummer. It sure looks to me like bus_dmamap_destroy used to be callable
> from interrupt context.  And the Open/FreeBSD code being identical leads me
> to believe they do support destroying in interrupt context.
> 

You can't really go by what OpenBSD do, they still have the big kernel
lock so their scope for race conditions is a lot smaller.

-- 
Brett Lymn
"Warning:
The information contained in this email and any attached files is
confidential to BAE Systems Australia. If you are not the intended
recipient, any use, disclosure or copying of this email or any
attachments is expressly prohibited.  If you have received this email
in error, please notify us immediately. VIRUS: Every care has been
taken to ensure this email and its attachments are virus free,
however, any loss or damage incurred in using this email is not the
sender's responsibility.  It is your responsibility to ensure virus
checks are completed before installing any data sent in this email to
your computer."




Re: NGROUPS/NGROUPS_MAX

2013-10-09 Thread Brett Lymn
On Wed, Oct 09, 2013 at 02:11:09PM +, Brian Ginsbach wrote:
> 
> Recent versions of AIX, Solaris, and Linux contain work-arounds that
> essentially by-pass the 16 group limitation (RFC 5531).  The
> degree of by-pass varies (the number of additional groups may not
> necessarily be unlimited).

this may well be but just FYI, samba will do an assert on a user in more
that 16 unix groups.  This may not be relevant in the current context
but just putting it out there :)

-- 
Brett Lymn
"Warning:
The information contained in this email and any attached files is
confidential to BAE Systems Australia. If you are not the intended
recipient, any use, disclosure or copying of this email or any
attachments is expressly prohibited.  If you have received this email
in error, please notify us immediately. VIRUS: Every care has been
taken to ensure this email and its attachments are virus free,
however, any loss or damage incurred in using this email is not the
sender's responsibility.  It is your responsibility to ensure virus
checks are completed before installing any data sent in this email to
your computer."




Re: Netbsd /avr32 update.

2013-09-30 Thread Brett Lymn
On Mon, Sep 30, 2013 at 10:06:46AM +0200, Martin Husemann wrote:
> On Sun, Sep 29, 2013 at 08:36:15PM -0300, Tomas Niño Kehoe wrote:
> > The next step, besides continuing / fixing the actual port is to start
> > working on the toolchain. Particularly an llvm backend for NetBSD/avr32 
> > seems
> > the way to go.
> 
> For those of us not knowing the details: isn't avr32 supported by stock gcc?
> If not, why? If only in later versions, which?
> 

This may or may not help you, this is what I did to build gcc-4.7.2 so I
could rebuild some arduino firmware:

http://implementality.blogspot.com.au/2013/01/building-arduino-firmware-on-netbsd.html


-- 
Brett Lymn
"Warning:
The information contained in this email and any attached files is
confidential to BAE Systems Australia. If you are not the intended
recipient, any use, disclosure or copying of this email or any
attachments is expressly prohibited.  If you have received this email
in error, please notify us immediately. VIRUS: Every care has been
taken to ensure this email and its attachments are virus free,
however, any loss or damage incurred in using this email is not the
sender's responsibility.  It is your responsibility to ensure virus
checks are completed before installing any data sent in this email to
your computer."




Re: fixing the vnode lifecycle

2013-09-25 Thread Brett Lymn
On Wed, Sep 25, 2013 at 02:33:43PM +0200, J. Hannken-Illjes wrote:
> On Sep 22, 2013, at 5:28 AM, David Holland  wrote:
> 
> 
> > First, obviously the vfs-level vnode cache code should provide vnode
> > lookup so file systems don't need to maintain their own vnode
> > tables. Killing off the fs-level vnode tables not only simplifies the
> > world but also avoids exposing a number of states and operations
> > required only to keep the fs-level table in sync with everything else.
> > This will help a great deal to get rid of the bodgy locking and all
> > the races.
> > 
> > (If any file system does anything fancy with its vnode table, other
> > than iterating it and doing lookups by inode number, I'd like to find
> > out.)
> 
> Expect some file systems to use  a key size != sizeof(ino_t) -- nfs
> for example uses file handles up to 64 bytes.
> 

Right, as do a few other file systems.  IIRC all file systems provide a
filehandle generation routine, the vnodes should be looked up using the
handle not the inode number.

-- 
Brett Lymn
"Warning:
The information contained in this email and any attached files is
confidential to BAE Systems Australia. If you are not the intended
recipient, any use, disclosure or copying of this email or any
attachments is expressly prohibited.  If you have received this email
in error, please notify us immediately. VIRUS: Every care has been
taken to ensure this email and its attachments are virus free,
however, any loss or damage incurred in using this email is not the
sender's responsibility.  It is your responsibility to ensure virus
checks are completed before installing any data sent in this email to
your computer."




Re: 4.0.1 i386 wedged

2013-05-22 Thread Brett Lymn
On Tue, May 21, 2013 at 12:25:40PM -0700, Brian Buhrow wrote:
>   hello.  I was under the impression that vnconfigging an nfs-mounted
> file continues to not work to today -- and that is why xen has such a hard
> time supporting guests under NFS.  In other words, "Don't do that".  If you
> need such a big file  vnconfigged, use local disk.
>   I could be wrong on this, but I'm pretty sure I have it right.
> 

It _used_ to work - in recent years I did a recovery of a linux lvm
backup by nfs mounting the storage that had the file created by
dd'ing the linux disk, doing a vnconfig and then a lvm change to get at
the linux file systems.

-- 
Brett Lymn
"Warning:
The information contained in this email and any attached files is
confidential to BAE Systems Australia. If you are not the intended
recipient, any use, disclosure or copying of this email or any
attachments is expressly prohibited.  If you have received this email
in error, please notify us immediately. VIRUS: Every care has been
taken to ensure this email and its attachments are virus free,
however, any loss or damage incurred in using this email is not the
sender's responsibility.  It is your responsibility to ensure virus
checks are completed before installing any data sent in this email to
your computer."




Re: Questions about LVM compatibility

2013-02-26 Thread Brett Lymn
On Tue, Feb 26, 2013 at 04:07:40PM +0100, tech mailinglists wrote:
> 
> Is the NetBSD implementation of LVM compatible with the Linux 
> implementation of LVM?
> 

I was able to mount LVM volumes from a RHEL 5 system onto NetBSD without
a problem, I didn't have success with ones from RHEL 6.  If your LVM is
one that NetBSD supports then you should be fine.

-- 
Brett Lymn
"Warning:
The information contained in this email and any attached files is
confidential to BAE Systems Australia. If you are not the intended
recipient, any use, disclosure or copying of this email or any
attachments is expressly prohibited.  If you have received this email
in error, please notify us immediately. VIRUS: Every care has been
taken to ensure this email and its attachments are virus free,
however, any loss or damage incurred in using this email is not the
sender's responsibility.  It is your responsibility to ensure virus
checks are completed before installing any data sent in this email to
your computer."




Re: fixing compat_12 getdents

2012-12-10 Thread Brett Lymn
On Mon, Dec 10, 2012 at 09:23:15PM +, David Laight wrote:
> 
> Then people get upset because they say "function foo() isn't allowed
> to set errno to 'bar'".
> It is rather a shame that posix tries to list all errno a function
> can return, not just those for explicit 'likely' (ie normal)
> non-success returns froma function.
> 

Then how do you sanely write error handling routines?  The error returns
form part of the interface and should be documented as such.

-- 
Brett Lymn
"Warning:
The information contained in this email and any attached files is
confidential to BAE Systems Australia. If you are not the intended
recipient, any use, disclosure or copying of this email or any
attachments is expressly prohibited.  If you have received this email
in error, please notify us immediately. VIRUS: Every care has been
taken to ensure this email and its attachments are virus free,
however, any loss or damage incurred in using this email is not the
sender's responsibility.  It is your responsibility to ensure virus
checks are completed before installing any data sent in this email to
your computer."




Re: Tips on programming usb devices?

2012-09-10 Thread Brett Lymn
On Mon, Sep 10, 2012 at 03:05:48PM +0100, Iain Hibbert wrote:
> 
> > The first 4 interfaces present themselves as  general communications ports,
> > of which the first permits standard serial connections through which modem
> > Hayes style AT commands can be sent to control the modem's behavior.   The
> > other 3 ports are available and, I believe, work, but I don't know how to
> > use them.
> 
> one of them may be a GPS port.. the uhso devices seem to also have these
> (though mine does not)
> 

One is probably a status monitor port for the HSDPA modem, this can be
used to monitor signal strength and so forth (don't ask me how
though...)

-- 
Brett Lymn
"Warning:
The information contained in this email and any attached files is
confidential to BAE Systems Australia. If you are not the intended
recipient, any use, disclosure or copying of this email or any
attachments is expressly prohibited.  If you have received this email
in error, please notify us immediately. VIRUS: Every care has been
taken to ensure this email and its attachments are virus free,
however, any loss or damage incurred in using this email is not the
sender's responsibility.  It is your responsibility to ensure virus
checks are completed before installing any data sent in this email to
your computer."




Re: SAS scsibus target numbering

2012-07-27 Thread Brett Lymn
On Fri, Jul 27, 2012 at 09:49:38AM +1000, Simon Burge wrote:
> > 
> > ISTR SPARC machines used an initiator address 6 for hysterical reasons.  I 
> > forget why.
> 
> Likewise for some early models of DECstation (2100 and 3100?).  I never
> understood why.
> 

Priorities.  When negotiating for the bus the highest numbered device
wins if there is an arbitration clash... iirc ids greater than 7 have a
priority lower than id 0, I presume for compat reasons.  So for a scsi
ID 0 - 7 the highest wins, it makes sense that the controller is a high
number (as Mouse said it is normally 7 on old sun hardware) so that it
can grab the bus when it wants to talk.

-- 
Brett Lymn
"Warning:
The information contained in this email and any attached files is
confidential to BAE Systems Australia. If you are not the intended
recipient, any use, disclosure or copying of this email or any
attachments is expressly prohibited.  If you have received this email
in error, please notify us immediately. VIRUS: Every care has been
taken to ensure this email and its attachments are virus free,
however, any loss or damage incurred in using this email is not the
sender's responsibility.  It is your responsibility to ensure virus
checks are completed before installing any data sent in this email to
your computer."




Re: DIRBLKSIZ differs between userland & kernel

2012-02-27 Thread Brett Lymn
On Mon, Feb 27, 2012 at 07:50:51AM +, David Holland wrote:
>  > 
>  > Right.
>  > What the vfs layer for coda does for coda_readdir is wrap some goop
>  > around doing a VOP_READDIR on the underlying filesystem.  In my case
>  > since the containers lived on ufs that is what was used.
> 
> Sure. But then, as long as syntactically valid dirents come back it
> shouldn't matter what DIRBLKSIZ is.
> 

Well, syntactically correct they may be but if a dirent straddles the
DIRBLKSIZE boundary then things will break.  I think this is what
happens.  Venus carefully aligns dirents to the 1024 byte boundary but
ufs_readdir() rounds down to the nearest 512 byte boundary when it is
doing its processing - if that happens to be part way through a dirent
that venus has provided then the nasal demons happen.

At the moment I am wondering if I shouldn't just do a statvfs() on the
filesystem and use the device block size instead of DIRBLKSIZE.

>  > 
>  > Well venus pads them to 1024 - there is a chance that venus won't fill
>  > the entire block, if venus less than half fills the block then
>  > ufs_readdir() gets garbage.
> 
> This I still don't understand. ufs_readdir doesn't read anything from
> venus.
> 

Actually it does.  The way that coda works currently is a bit
complicated, it does part of its work in the kernel but part of it is
passed up to venus.  A lot of the file system semantics are handled
within venus, the fs data is actually stored in a container file and
venus is the one that handles getting data into the container from the
server (if it is not already cached in there), pushing data from the
container back to the server (when there is a connection), satisfying
reads/writes from the local system and a lot of other things.

When you do a getdents(2) call on a coda directory the readdir syscall
eventually gets called. This is directed to coda_readdir() by
VOP_READDIR, coda_readdir() checks if the container file is open
already and opens it if it is not.  Once the container file is open then
coda_readdir() calls VOP_READDIR on the underlying fs, in my case this
gets directed to ufs_readdir().  Inside ufs_readdir() a VOP_READ is
performed to get the directory data.  In this case this is handled by
coda_read() which forms up a request does an upcall to userland for
venus to handle the request.  Venus looks at the request and sees that
it is a request for a read of a directory so it takes the coda directory
entries and on the fly forms up a series of dirents to satisfy the read
request and hands this back to the kernel.  Then ufs_readdir() processes
this data.

-- 
Brett Lymn
"Warning:
The information contained in this email and any attached files is
confidential to BAE Systems Australia. If you are not the intended
recipient, any use, disclosure or copying of this email or any
attachments is expressly prohibited.  If you have received this email
in error, please notify us immediately. VIRUS: Every care has been
taken to ensure this email and its attachments are virus free,
however, any loss or damage incurred in using this email is not the
sender's responsibility.  It is your responsibility to ensure virus
checks are completed before installing any data sent in this email to
your computer."




Re: DIRBLKSIZ differs between userland & kernel

2012-02-26 Thread Brett Lymn
On Mon, Feb 27, 2012 at 02:43:51AM +, David Holland wrote:
> 
> I don't think any of those is the right answer. Coda is not limited to
> running on top of ffs, so it shouldn't be doing only filesystem-
> independent things when talking to the filesystem it uses for storage.
> (Right?)
> 

Right.
What the vfs layer for coda does for coda_readdir is wrap some goop
around doing a VOP_READDIR on the underlying filesystem.  In my case
since the containers lived on ufs that is what was used.

> Therefore it should be using the value from  in both the
> kernel and in venus. If it's running on top of ffs, ffs will provide
> dirents with padding at 512-byte intervals that it would think
> unnecessary, but I would think it shouldn't notice or care.
>

Well venus pads them to 1024 - there is a chance that venus won't fill
the entire block, if venus less than half fills the block then
ufs_readdir() gets garbage.
 
> Then again, maybe I don't understand what's going on, as there
> shouldn't be any way for ufs_readdir to see, much less trip on,
> dirents generated by venus.

As above, the coda_readdir() in coda_vnops.c wraps some goop around a
VOP_READDIR - pushing the handling of the dirents to the underlying fs
method.

> 
> Note that ffs needs DIRBLKSIZ to be the same as the underlying atomic
> I/O size, or various unspecified bad things can happen in crashes. So
> you can't change what ffs is doing.
> 

Absolutely no thought in my mind about changing the size of DIRBLKSIZ in
the kernel, I do understand the implications of doing that and until
(if?) everyone has 4k sectors it probably will not change.  What I was
talking about in relation to this is that ufs_readdir() pulls the
directory block size from a "ufs-ified" version of the mount structure
attached as private data to the mount struct.  I had thought changing
that would fix the problem but as you rightly point out we are not just
dealing with ufs here so I could easily fix the problem for me but still
leave a lurking bug :(

> Also, I have no idea why the userland value diverged from the ffs
> value, but I doubt it's safe to change it without adding a large pile
> of compat wibble.
> 

Yep, this is why I asked :)

> Finally, we should not not not have duplicate symbol names like this.
> I guess now that we've branched I can go clean it up...
> 

That would be great - we will still have a problem with venus but at
least others won't be bitten by this weirdness.

I am thinking that maybe the best thing would be to try and get venus to
read the directory block size from the underlying file system and use
that, if I can.

-- 
Brett Lymn
"Warning:
The information contained in this email and any attached files is
confidential to BAE Systems Australia. If you are not the intended
recipient, any use, disclosure or copying of this email or any
attachments is expressly prohibited.  If you have received this email
in error, please notify us immediately. VIRUS: Every care has been
taken to ensure this email and its attachments are virus free,
however, any loss or damage incurred in using this email is not the
sender's responsibility.  It is your responsibility to ensure virus
checks are completed before installing any data sent in this email to
your computer."




DIRBLKSIZ differs between userland & kernel

2012-02-26 Thread Brett Lymn

I have been tracking through a bug with Coda where, basically, getdents(2)
is not returning all the directory entries.  The files exist in the
directory but do not show up on a globbed listing.  After some digging I
found that things ended up in ufs_readdir() which was terminating early
due to a bad dirent.

What is happening is the userland part of Coda, venus, manufactures a
the dirents for a directory when a read request comes up from the
kernel.  It has code to carefully avoid a dirent spanning a DIRBLKSIZ
boundary by padding a dirent near the boundary.  This data is returned
to the kernel for processing.  Where things come unstuck is DIRBLKSIZ is
defined in /usr/include/dirent.h as 1024 bytes, inside the kernel ufs
code sys/ufs/ufs/dir.h DIRBLKSIZE is set to DEV_BSIZE which is 512
bytes.  This means that venus can produce a block of dirents that finish
before the 512 byte mark, the kernel code tries to align back to a 512
byte boundary and fails to find a valid dirent - the fact that ufs_readdir()
exits gracefully rather than causing a panic is more by luck than
anything else.

To fix it I think I have the following choices:

1) Patch venus so it ignores the userland DIRBLKSIZ and, instead, uses
DEV_BSIZE (if available) or just hard code in 512 bytes as the dirent
block size.

2) Change DIRBLKSIZ in dirent.h so it matches the kernel

3) Fix mount_coda so it updates the um_dirblksiz to match userland.

I have done 1 to test and a previously misbehaving directory does seem
to work a lot better.  I see both 1 & 3 being a bit dangerous in the
event that if the definitions change again.  I am not entirely sure
about the implications of 2 - a quick look only showed up initdir.c in
libc using the define but there may be others.

Does anyone have any views on which approach would be the most correct
way of fixing this?

-- 
Brett Lymn
"Warning:
The information contained in this email and any attached files is
confidential to BAE Systems Australia. If you are not the intended
recipient, any use, disclosure or copying of this email or any
attachments is expressly prohibited.  If you have received this email
in error, please notify us immediately. VIRUS: Every care has been
taken to ensure this email and its attachments are virus free,
however, any loss or damage incurred in using this email is not the
sender's responsibility.  It is your responsibility to ensure virus
checks are completed before installing any data sent in this email to
your computer."




Re: A port of tpm(4) from FreeBSD or OpenBSD?

2012-01-13 Thread Brett Lymn
On Fri, Jan 13, 2012 at 10:10:31AM -0800, Brian Buhrow wrote:
>   Hello.  I'm wondering if anyone has worked on a port of the tpm(4)
> driver from FreeBSD or OpenBSD?  I realize it's a litle late, but it seems
> like a nice thing to have in NetBSD-6.
> 

Yes, I think it would be good to support the tpm device.  It would give
us a secure certificate store & method of validating a certificate in
the kernel.

> Thoughts?

IMHO we shouldn't be hurrying this into NetBSD 6.  Not just this, I have
seen another mention of "it would be nice to have in 6".  I think we
need to be wary of trying to crunch lots of new things into 6 right at
the last minute and risk having a bunch of broken drivers due to
inadequate testing.  I don't mean to pick on you in particular, Brian,
it is just that your message happened to be the second "nice to have in
6" I saw.

-- 
Brett Lymn
"Warning:
The information contained in this email and any attached files is
confidential to BAE Systems Australia. If you are not the intended
recipient, any use, disclosure or copying of this email or any
attachments is expressly prohibited.  If you have received this email
in error, please notify us immediately. VIRUS: Every care has been
taken to ensure this email and its attachments are virus free,
however, any loss or damage incurred in using this email is not the
sender's responsibility.  It is your responsibility to ensure virus
checks are completed before installing any data sent in this email to
your computer."




Re: Lost file-system story

2011-12-13 Thread Brett Lymn
On Tue, Dec 13, 2011 at 01:38:57PM +0100, Joerg Sonnenberger wrote:
> 
> fsck is supposed to handle *all* corruptions to the file system that can
> occur as part of normal file system operation in the kernel. It is doing
> best effort for others. It's a bug if it doesn't do the former and a
> potential missing feature for the latter.
> 

There are a lot of slips twixt cup and lip.  If you are really unlucky
you can get an outage at just the wrong time that will cause the
filesystem to be hosed so badly that fsck cannot recover it.  Sure, fsck
can run to completion but all you have is most of your FS in lost+found
which you have to be really really desperate to sort through.  I have
been working with UNIX for over 20years now and I have only seen this
happen once and it was with a commercial UNIX.

-- 
Brett Lymn
"Warning:
The information contained in this email and any attached files is
confidential to BAE Systems Australia. If you are not the intended
recipient, any use, disclosure or copying of this email or any
attachments is expressly prohibited.  If you have received this email
in error, please notify us immediately. VIRUS: Every care has been
taken to ensure this email and its attachments are virus free,
however, any loss or damage incurred in using this email is not the
sender's responsibility.  It is your responsibility to ensure virus
checks are completed before installing any data sent in this email to
your computer."




Re: RFC: New security model secmodel_securechroot(9)

2011-07-11 Thread Brett Lymn
On Mon, Jul 11, 2011 at 08:29:11AM +0100, David Laight wrote:
> 
> One problem is that, historically, unix privileges have always been
> based on a sledgehammer approach - if you don't want everybody to
> be able to do something then only root can do it.
> 

That can and has been fixed in other Unix operating systems.  The
problem is that a lot of people simply cannot see the use of such a
facility not because they are dumb, just that they have not encountered
a situation which could not be addressed with the normal unix
permissions or, perhaps, sudo.

Some things that have been problematic for me in the past with the unix
security model that I have been able to solve using finer grain
permissions are:

1) permitting an ordinary user to run apache on port 80, allowing them
to perform restarts and kill misbehaving daemons without requiring root
access.

2) permitting an ordinary user to run an ldap server on port 389

using the setuid daemon dance makes things awkward when it comes to
trying to kill things off.  Using sudo can be a pain because someone has
to be around to type the password which means you cannot schedule
restarts/kills as an unprivileged user.

Other things it would be useful for would be winding back the number of
setuid utilities on the system - ping could be permitted to open a raw
network interface, at the moment it has to be setuid to do this.

-- 
Brett Lymn
"Warning:
The information contained in this email and any attached files is
confidential to BAE Systems Australia. If you are not the intended
recipient, any use, disclosure or copying of this email or any
attachments is expressly prohibited.  If you have received this email
in error, please notify us immediately. VIRUS: Every care has been
taken to ensure this email and its attachments are virus free,
however, any loss or damage incurred in using this email is not the
sender's responsibility.  It is your responsibility to ensure virus
checks are completed before installing any data sent in this email to
your computer."




Re: extent-patch and overview of what is supposed to follow

2011-04-05 Thread Brett Lymn
On Tue, Apr 05, 2011 at 10:37:37AM +0100, Mindaugas Rasiukevicius wrote:
> 
> I am not convinced about statistics point.  For intensive allocations,
> constant-sized pool_cache(9) should/would be used, where it already
> gathers statistics.  If there is some particular need for statistics,
> one can always collect it at the caller's level.  Therefore, I do not
> see the need to invade allocator's API for that.
> 

We shouldn't just consider statistics anyway, there were some useful
malloc debug facilities which allowed you to know what memory had been
allocated to what process.  I had extended that so you could tell what
processes had allocated next to the region of interest - I used this to
nail a rather nasty buffer overflow that was stomping on the memory
allocations of innocent parties.  I don't know how I could have tracked
that bug down without the level of information available from the malloc
debug logs.

-- 
Brett Lymn
"Warning:
The information contained in this email and any attached files is
confidential to BAE Systems Australia. If you are not the intended
recipient, any use, disclosure or copying of this email or any
attachments is expressly prohibited.  If you have received this email
in error, please notify us immediately. VIRUS: Every care has been
taken to ensure this email and its attachments are virus free,
however, any loss or damage incurred in using this email is not the
sender's responsibility.  It is your responsibility to ensure virus
checks are completed before installing any data sent in this email to
your computer."




Re: panic: ffs_valloc: dup alloc

2010-10-20 Thread Brett Lymn
On Wed, Oct 20, 2010 at 01:11:20PM -0500, KAMADA Ken'ichi wrote:
> 
> I had no crash for a week (two weeks if the "skip-wdstart" test is
> counted).  Seems good.  I'm back to -current.
> 

It seems good to me too, no crashes.  Prior to this I was getting a
crash at least once a day possibly more frequently.  I have been doing
a fsck -fn on my cgd partition and that is always coming up clean
too.  Good stuff.

-- 
Brett Lymn
"Warning:
The information contained in this email and any attached files is
confidential to BAE Systems Australia. If you are not the intended
recipient, any use, disclosure or copying of this email or any
attachments is expressly prohibited.  If you have received this email
in error, please notify us immediately. VIRUS: Every care has been
taken to ensure this email and its attachments are virus free,
however, any loss or damage incurred in using this email is not the
sender's responsibility.  It is your responsibility to ensure virus
checks are completed before installing any data sent in this email to
your computer."




Re: panic: ffs_valloc: dup alloc

2010-10-19 Thread Brett Lymn
On Wed, Oct 13, 2010 at 11:26:05AM -0500, KAMADA Ken'ichi wrote:
> 
> Yes, I was testing a similar patch to yours (skip calling wdstart()
> when !device_is_active()) after my previous email.  It did not crash
> so far.  I'm now running with your patch and will be back in a
> week or so.  It will take some time because the panic tends to occur
> after an overnight sleep rather than a short nap.
> 

I am seeing good results with David's patch too - multiple
sleep/resumes over the course of the day without a kernel panic.  For
me, the panics were quite frequent so this is a big improvement.  I
will report back after a few days... or sooner if I have issues.

-- 
Brett Lymn
"Warning:
The information contained in this email and any attached files is
confidential to BAE Systems Australia. If you are not the intended
recipient, any use, disclosure or copying of this email or any
attachments is expressly prohibited.  If you have received this email
in error, please notify us immediately. VIRUS: Every care has been
taken to ensure this email and its attachments are virus free,
however, any loss or damage incurred in using this email is not the
sender's responsibility.  It is your responsibility to ensure virus
checks are completed before installing any data sent in this email to
your computer."




Re: Intel Centrino 6300 Ultimate-N adapter

2010-04-08 Thread Brett Lymn
On Wed, Apr 07, 2010 at 04:00:36PM -0700, B Harder wrote:
> 
> Hi Sverre -- I am interested... I've actually already got the 6000-4
> firmware, got the pcidevs setup, and worked w/ the existing driver
> code to get the device attached... at this point, it's failing w/
> eeprom errors (two of them, don't recall what there are, at the
> moment). If you've got something in some state of "working", you're
> further ahead than I am, though, and I'd love to see that code.
> 

As you have discovered, it is highly unlikely that you can just add
the device to pci devs and use the current iwn driver to make it
work.  I would expect that the interface to the firmware on the card
is slightly different, requiring different command structures.  This
can happen even between versions of firmware on the same hardware.  I
am not sure how different our iwn driver is from the OpenBSD one these
days, perhaps patching in the card specific code from their driver
would be possible.

-- 
Brett Lymn
"Warning:
The information contained in this email and any attached files is
confidential to BAE Systems Australia. If you are not the intended
recipient, any use, disclosure or copying of this email or any
attachments is expressly prohibited.  If you have received this email
in error, please notify us immediately. VIRUS: Every care has been
taken to ensure this email and its attachments are virus free,
however, any loss or damage incurred in using this email is not the
sender's responsibility.  It is your responsibility to ensure virus
checks are completed before installing any data sent in this email to
your computer."




Re: panic: ffs_valloc: dup alloc

2010-03-21 Thread Brett Lymn
On Sun, Mar 21, 2010 at 04:49:05PM -0400, Steven Bellovin wrote:
> 
> > That sounds like maybe the problem is not on the suspend side but on
> > the resume side, that is, that stuff is being written out before (some
> > layer of) the disk subsystem is ready to go again. With vanilla FFS
> > such writes should be synchronous so it should be (relatively) easy to
> > figure out what's going on. Do you feel like trying out dtrace? :-)
> 
> If you'd asked a week ago, sure; now, I don't have the time.

I need to update my -current to something more recent...

>  I don't think it's just named pipes anymore; the last crash I had
> did pretty horrific things to the file system, probably because some
> processes were busy creating files at the time I suspended.  >

I was seeing the corruption on plain files, mostly duplicate
allocations or freeing a free frag.  It looks to me like the kernel
thinks that the FS operation has been committed to disk but the bits
on the physical media have not changed.

-- 
Brett Lymn
"Warning:
The information contained in this email and any attached files is
confidential to BAE Systems Australia. If you are not the intended
recipient, any use, disclosure or copying of this email or any
attachments is expressly prohibited.  If you have received this email
in error, please notify us immediately. VIRUS: Every care has been
taken to ensure this email and its attachments are virus free,
however, any loss or damage incurred in using this email is not the
sender's responsibility.  It is your responsibility to ensure virus
checks are completed before installing any data sent in this email to
your computer."




Re: panic: ffs_valloc: dup alloc

2010-03-20 Thread Brett Lymn
On Fri, Mar 19, 2010 at 05:51:46PM -0500, KAMADA Ken'ichi wrote:
> 
> I'm seeing a panic: ffs_valloc: dup alloc.
> Does anyone have a similar panic?
> 

I have seen various file system panics after suspend/resume for quite
a while:

NetBSD rover 5.99.18 NetBSD 5.99.18 (ROVER2) #10: Tue Sep 29 08:18:23 CST 2009  
bl...@rover:/usr/src/sys/arch/i386/compile/ROVER2 i386

I have given up on suspending because my filesystems would be
corrupted with monotonous regularity.  The chances of a corruption
seems to increase with the amount of disk activity happening on
suspend.   It seems like something is not being flushed (or not being
marked as flushed) when the suspend happens.

-- 
Brett Lymn
"Warning:
The information contained in this email and any attached files is
confidential to BAE Systems Australia. If you are not the intended
recipient, any use, disclosure or copying of this email or any
attachments is expressly prohibited.  If you have received this email
in error, please notify us immediately. VIRUS: Every care has been
taken to ensure this email and its attachments are virus free,
however, any loss or damage incurred in using this email is not the
sender's responsibility.  It is your responsibility to ensure virus
checks are completed before installing any data sent in this email to
your computer."




Re: blocksizes

2010-01-24 Thread Brett Lymn
On Fri, Jan 22, 2010 at 01:09:10PM +0100, Michael van Elst wrote:
> 
> Keeping DEV_SIZE at 512 bytes avoids lots of changes.
> 

Won't that mean there is a chance there will be a lot of
read/modify/write going on if the driver is pretending to have 512byte
sectors? Would that not be really really bad for NFS write performance
since the NFS spec says a write has to hit the media before being
confirmed then wouldn't a streaming write of 512byte sectors would
cause a lot of sequential read/modify/write as each 512byte chunk of
the underlying disk block size was updated?

-- 
Brett Lymn
"Warning:
The information contained in this email and any attached files is
confidential to BAE Systems Australia. If you are not the intended
recipient, any use, disclosure or copying of this email or any
attachments is expressly prohibited.  If you have received this email
in error, please notify us immediately. VIRUS: Every care has been
taken to ensure this email and its attachments are virus free,
however, any loss or damage incurred in using this email is not the
sender's responsibility.  It is your responsibility to ensure virus
checks are completed before installing any data sent in this email to
your computer."