Re: Finding an available fss device

2018-08-30 Thread J. Hannken-Illjes


> On 23. Aug 2018, at 11:59, J. Hannken-Illjes  wrote:
> 
> 
>> On 22. Aug 2018, at 08:50, Emmanuel Dreyfus  wrote:
>> 
>> On Mon, Aug 20, 2018 at 10:39:21AM +0200, J. Hannken-Illjes wrote:
>>>> I applied that to NetBSD-8.0, and it seems to behave much better.
>>> Good.
>> 
>> Will you commit and request a pullup? The change is valuable.
> 
> Sure ...

[pullup-8 #999] fss config update
[pullup-8 #1000] Fix deadlock with getnewbuf()

Should also fix the deadlock VFS_SNAPSHOT->ffs_copyonwrite->biowait.

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: Finding an available fss device

2018-08-23 Thread J. Hannken-Illjes


> On 22. Aug 2018, at 08:50, Emmanuel Dreyfus  wrote:
> 
> On Mon, Aug 20, 2018 at 10:39:21AM +0200, J. Hannken-Illjes wrote:
>>> I applied that to NetBSD-8.0, and it seems to behave much better.
>> Good.
> 
> Will you commit and request a pullup? The change is valuable.

Sure ...

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: Finding an available fss device

2018-08-21 Thread Emmanuel Dreyfus
On Mon, Aug 20, 2018 at 10:39:21AM +0200, J. Hannken-Illjes wrote:
> > I applied that to NetBSD-8.0, and it seems to behave much better.
> Good.

Will you commit and request a pullup? The change is valuable.
-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: Finding an available fss device

2018-08-20 Thread Emmanuel Dreyfus
On Mon, Aug 20, 2018 at 10:39:21AM +0200, J. Hannken-Illjes wrote:
> This patch will change nothing mentioned there.  As I already asked:
(...)

I was away from this machine; I will post the answer on the relevant thread.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: Finding an available fss device

2018-08-20 Thread J. Hannken-Illjes


> On 20. Aug 2018, at 10:34, Emmanuel Dreyfus  wrote:
> 
> On Thu, Aug 16, 2018 at 12:18:34PM +0200, J. Hannken-Illjes wrote:
>> - 001_add_sc_state replaces the flags FSS_ACTIVE and FSS_ERROR with
>> a state field.
>> 
>> - 002_extend_state adds states for construction or destruction of
>> a snapshot and fss_ioctl no longer blocks forever waiting for
>> construction or destruction of a snapshot to complete.
>> 
>> Opinions?
> 
> I applied that to NetBSD-8.0, and it seems to behave much better.

Good.

> What about the deadlock scenario you mentioned, e.g. taking 
> multiple snapshots of / at the same time, or a snapshot of / 
> and /home at the same time? Should they work with this patch?

I suppose you mean the "All processes go tstile" thread ...

This patch will change nothing mentioned there.  As I already asked:

- The first thirty lines of "dumpfs /home" please.

- Did you use "dump -x ..." or "dump -X"?

- Did the "dump" process hang?

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: Finding an available fss device

2018-08-20 Thread Emmanuel Dreyfus
On Thu, Aug 16, 2018 at 12:18:34PM +0200, J. Hannken-Illjes wrote:
> - 001_add_sc_state replaces the flags FSS_ACTIVE and FSS_ERROR with
> a state field.
> 
> - 002_extend_state adds states for construction or destruction of
> a snapshot and fss_ioctl no longer blocks forever waiting for
> construction or destruction of a snapshot to complete.
> 
> Opinions?

I applied that to NetBSD-8.0, and it seems to behave much better.

What about the deadlock scenario you mentioned, e.g. taking 
multiple snapshots of / at the same time, or a snapshot of / 
and /home at the same time? Should they work with this patch?



-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: Finding an available fss device

2018-08-16 Thread J. Hannken-Illjes

> On 14. Aug 2018, at 11:16, J. Hannken-Illjes  wrote:
> 
> 
>> On 13. Aug 2018, at 19:25, Emmanuel Dreyfus  wrote:
>> 
>> On Mon, Aug 13, 2018 at 11:56:45AM +, Taylor R Campbell wrote:
>>> Unless I misunderstand fss(4), this is an abuse of mutex(9): nothing
>>> should sleep while holding the lock, so that nothing trying to acquire
>>> the lock will wait for a long time.
>> 
>> Well, the cause is not yet completely clear to me, but the user 
>> experience is terrible. The first time I used it, I thought
>> the system crashed, because fssconfig -l was just hung for 
>> hours.
>> 
>> And it is very easy to achieve a situation where most processes
>> are in tstile awaiting a vnode lock for a name lookup.
> 
> I see two problems here.
> 
> 1) File system internal snapshots take a long time to create or destroy on
>   large file systems.  I have no solution to this problem.
> 
>   Using file system external snapshots for dumps should work fine.
> 
> 2) Fss devices block in ioctl while a snapshot gets created or
>   destroyed.  A possible fix is to replace the current
>   active/non-active state with idle/creating/active/destroying


The attached diffs implement this in a pullup-friendly way.

- 001_add_sc_state replaces the flags FSS_ACTIVE and FSS_ERROR with
a state field.

- 002_extend_state adds states for construction or destruction of
a snapshot and fss_ioctl no longer blocks forever waiting for
construction or destruction of a snapshot to complete.

Opinions?

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)


001_add_sc_state
Description: Binary data



002_extend_state
Description: Binary data


Re: Finding an available fss device

2018-08-14 Thread Taylor R Campbell
> Date: Tue, 14 Aug 2018 11:16:44 +0200
> From: "J. Hannken-Illjes" 
> 
>Problem here is backwards compatibility.  I have no idea what to
>return for FSSIOCGET when the state is creating or destroying.

1. It would be a small improvement if waiting to acquire the lock, at
   least, were interruptible, like I suggested in an earlier message.

2. For a test-and-set, we could just rename it to OFSSIOCGET and
   introduce a new number for FSSIOCGET in netbsd-9 and beyond?


Re: Finding an available fss device

2018-08-14 Thread J. Hannken-Illjes


> On 13. Aug 2018, at 19:25, Emmanuel Dreyfus  wrote:
> 
> On Mon, Aug 13, 2018 at 11:56:45AM +, Taylor R Campbell wrote:
>> Unless I misunderstand fss(4), this is an abuse of mutex(9): nothing
>> should sleep while holding the lock, so that nothing trying to acquire
>> the lock will wait for a long time.
> 
> Well, the cause is not yet completely clear to me, but the user 
> experience is terrible. The first time I used it, I thought
> the system crashed, because fssconfig -l was just hung for 
> hours.
> 
> And it is very easy to achieve a situation where most processes
> are in tstile awaiting a vnode lock for a name lookup.

I see two problems here.

1) File system internal snapshots take a long time to create or destroy on
   large file systems.  I have no solution to this problem.

   Using file system external snapshots for dumps should work fine.

2) Fss devices block in ioctl while a snapshot gets created or
   destroyed.  A possible fix is to replace the current
   active/non-active state with idle/creating/active/destroying
   and change FSSIOCSET to

   mutex_enter(&sc->sc_lock);
   if (sc->sc_state != FSS_IDLE) {
       mutex_exit(&sc->sc_lock);
       return EBUSY;
   }
   sc->sc_state = FSS_CREATING;
   mutex_exit(&sc->sc_lock);

   error = fss_create_snapshot();

   mutex_enter(&sc->sc_lock);
   if (error)
       sc->sc_state = FSS_IDLE;
   else
       sc->sc_state = FSS_ACTIVE;
   mutex_exit(&sc->sc_lock);

   return error;

   Problem here is backwards compatibility.  I have no idea what to
   return for FSSIOCGET when the state is creating or destroying.
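A minimal userland sketch of the idle/creating/active/destroying scheme, assuming POSIX threads: pthread_mutex_t stands in for kmutex_t, and the hypothetical callbacks passed to fss_set() and fss_clr() stand in for fss_create_snapshot() and the teardown work. The point it tries to show is that the mutex is held only while testing and changing sc_state, never across the slow snapshot work, so a second caller gets EBUSY at once instead of sleeping. This is not the actual fss.c code, and the error returned by fss_clr() on a non-active unit is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical userland model of one fss(4) unit, not the kernel code. */
enum fss_state { FSS_IDLE, FSS_CREATING, FSS_ACTIVE, FSS_DESTROYING };

struct fss_softc {
	pthread_mutex_t sc_lock;	/* protects sc_state only */
	enum fss_state sc_state;
};

/* Stand-in for the slow snapshot work; called without sc_lock held. */
typedef int (*fss_work_fn)(struct fss_softc *);

/* Analog of FSSIOCSET: claim the unit, work unlocked, publish the result. */
int
fss_set(struct fss_softc *sc, fss_work_fn create)
{
	int error;

	pthread_mutex_lock(&sc->sc_lock);
	if (sc->sc_state != FSS_IDLE) {
		pthread_mutex_unlock(&sc->sc_lock);
		return EBUSY;			/* fail fast, do not sleep */
	}
	sc->sc_state = FSS_CREATING;
	pthread_mutex_unlock(&sc->sc_lock);

	error = create(sc);			/* may take a long time */

	pthread_mutex_lock(&sc->sc_lock);
	assert(sc->sc_state == FSS_CREATING);	/* nobody else may change it */
	sc->sc_state = error ? FSS_IDLE : FSS_ACTIVE;
	pthread_mutex_unlock(&sc->sc_lock);
	return error;
}

/* Analog of FSSIOCCLR; EBUSY for a non-active unit is an assumption. */
int
fss_clr(struct fss_softc *sc, fss_work_fn destroy)
{
	pthread_mutex_lock(&sc->sc_lock);
	if (sc->sc_state != FSS_ACTIVE) {
		pthread_mutex_unlock(&sc->sc_lock);
		return EBUSY;
	}
	sc->sc_state = FSS_DESTROYING;
	pthread_mutex_unlock(&sc->sc_lock);

	(void)destroy(sc);			/* assume teardown always ends idle */

	pthread_mutex_lock(&sc->sc_lock);
	assert(sc->sc_state == FSS_DESTROYING);
	sc->sc_state = FSS_IDLE;
	pthread_mutex_unlock(&sc->sc_lock);
	return 0;
}

/* Example callbacks for the usage below. */
int nested_result = -1;

int
probe_while_creating(struct fss_softc *sc)
{
	/* A second caller arriving mid-construction gets EBUSY at once. */
	nested_result = fss_set(sc, probe_while_creating);
	return 0;
}

int
fail_with_eio(struct fss_softc *sc)
{
	(void)sc;
	return EIO;
}
```

A failed creation falls back to idle, and a caller that arrives while the unit is creating or active is refused without blocking; what FSSIOCGET should report in the transitional states remains the open question raised above.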

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: Finding an available fss device

2018-08-13 Thread Edgar Fuß
> Well, the cause is not yet completely clear to me, but the user 
> experience is terrible. The first time I used it, I thought
> the system crashed, because fssconfig -l was just hung for 
> hours.
My experience is that an external snapshot runs much, much faster.
It may even be the case that an "external" snapshot with the backup area 
on the same device is fast, too -- I don't recall.
I never ever understood this.


Re: Finding an available fss device

2018-08-13 Thread Emmanuel Dreyfus
On Mon, Aug 13, 2018 at 11:56:45AM +, Taylor R Campbell wrote:
> Unless I misunderstand fss(4), this is an abuse of mutex(9): nothing
> should sleep while holding the lock, so that nothing trying to acquire
> the lock will wait for a long time.

Well, the cause is not yet completely clear to me, but the user 
experience is terrible. The first time I used it, I thought
the system crashed, because fssconfig -l was just hung for 
hours.

And it is very easy to achieve a situation where most processes
are in tstile awaiting a vnode lock for a name lookup.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: Finding an available fss device

2018-08-13 Thread Taylor R Campbell
> Date: Fri, 10 Aug 2018 13:46:55 +
> From: Emmanuel Dreyfus 
> 
> Perhaps the right way is to add a FSSIOBUSY ioctl that would
> use mutex_tryenter and return EBUSY if the device is in use?

Unless I misunderstand fss(4), this is an abuse of mutex(9): nothing
should sleep while holding the lock, so that nothing trying to acquire
the lock will wait for a long time.  Instead, fss(4) should use an
interruptible lock built on a mutex and a condvar.  Something like
this:

struct fss_lock {
    kmutex_t lock;
    kcondvar_t cv;
    struct lwp *owner; /* maybe currently rendered as FSS_ACTIVE bit */
};

/* Acquire it with the opportunity to be interrupted.  */
mutex_enter(&sc->sc_fss_lock.lock);
while (sc->sc_fss_lock.owner != NULL) {
    if (wait) {
        /* Sleep until released; a signal makes this return an error. */
        error = cv_wait_sig(&sc->sc_fss_lock.cv,
            &sc->sc_fss_lock.lock);
    } else {
        error = EBUSY;
    }
    if (error) {
        mutex_exit(&sc->sc_fss_lock.lock);
        return error;
    }
}
sc->sc_fss_lock.owner = curlwp;
mutex_exit(&sc->sc_fss_lock.lock);

/* Create the snapshot.  */
error = fss_create_snapshot(...);

/* Release it and wake up anyone waiting for it.  */
mutex_enter(&sc->sc_fss_lock.lock);
KASSERT(sc->sc_fss_lock.owner == curlwp);
sc->sc_fss_lock.owner = NULL;
cv_broadcast(&sc->sc_fss_lock.cv);
mutex_exit(&sc->sc_fss_lock.lock);
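For experimenting outside the kernel, here is a rough userland analog of the lock sketched above, assuming POSIX threads. A userland thread cannot be interrupted the way cv_wait_sig(9) can, so an optional pthread_cond_timedwait() deadline stands in for that; the struct and function names are made up for illustration. Note that the release path has to wake waiters, otherwise they sleep until interrupted.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical "sleepable lock": a short-term mutex protects the owner
 * fields, and the condvar lets other threads wait for the owner to leave. */
struct sleeplock {
	pthread_mutex_t lock;
	pthread_cond_t cv;
	bool held;
	pthread_t owner;		/* valid only while held */
};

/*
 * Acquire the lock.  With wait == false, return EBUSY at once if it is
 * held.  With wait == true, sleep; a non-NULL abstime bounds the wait
 * and plays the role of a signal interrupting cv_wait_sig(9).
 */
int
sleeplock_enter(struct sleeplock *sl, bool wait, const struct timespec *abstime)
{
	int error = 0;

	pthread_mutex_lock(&sl->lock);
	while (sl->held) {
		if (!wait)
			error = EBUSY;
		else if (abstime == NULL)
			error = pthread_cond_wait(&sl->cv, &sl->lock);
		else
			error = pthread_cond_timedwait(&sl->cv, &sl->lock, abstime);
		if (error) {
			pthread_mutex_unlock(&sl->lock);
			return error;
		}
	}
	sl->held = true;
	sl->owner = pthread_self();
	pthread_mutex_unlock(&sl->lock);
	return 0;
}

/* Release the lock and wake one waiter, if any. */
void
sleeplock_exit(struct sleeplock *sl)
{
	pthread_mutex_lock(&sl->lock);
	assert(sl->held && pthread_equal(sl->owner, pthread_self()));
	sl->held = false;
	pthread_cond_signal(&sl->cv);
	pthread_mutex_unlock(&sl->lock);
}
```

The short-term mutex is only ever held for a few instructions, which is the property mutex(9) expects; the long wait happens on the condition variable instead.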


Re: Finding an available fss device

2018-08-13 Thread J. Hannken-Illjes


> On 13. Aug 2018, at 09:53, Emmanuel Dreyfus  wrote:
> 
> On Sun, Aug 12, 2018 at 10:16:48AM +0200, J. Hannken-Illjes wrote:
>> While creating a snapshot "/mount0" lookup "/mount0/file", it will block
>> as "/mount0" is suspended.  The lookup holds a lock on "/".
>> 
> Now snapshot "/" and trying to suspend "/" will block as the lookup
>> has the root vnode locked.
> 
> This scenario is not the same as the one I asked about, which
> was: performing a snapshot of the filesystem mounted on /mount0 
> using /dev/fss0 and a snapshot of the filesystem mounted on /mount1
> using /dev/fss1 while the first one is still active. Is there some
> deadlock in this case?

Still not sure we are talking about the same thing.

1) Create snapshot of /mount0 with fss0
1a) Open /dev/fss0
1b) Ioctl FSSIOCSET on /dev/fss0 to create the snapshot
1c) Read data from /dev/fss0
1d) Ioctl FSSIOCCLR on /dev/fss0 to delete the snapshot
1e) Close /dev/fss0

The same for a snapshot of /mount1 with fss1.

2) Create snapshot of /mount1 with fss1
2a) Open /dev/fss1
2b) Ioctl FSSIOCSET on /dev/fss1 to create the snapshot
2c) Read data from /dev/fss1
2d) Ioctl FSSIOCCLR on /dev/fss1 to delete the snapshot
2e) Close /dev/fss1

All operations are mutually exclusive: we always run exactly
one of 1), 1a) ... 2e) at a time, and a second operation will
block until it gets exclusive access.

Of these operations, 1b), 1d), 2b) and 2d) may take a long time
to run if the snapshot is file system internal.

> But you also raise a deadlock scenario for which there is no
> protection in current code. I already experienced it in the
> past and it would be fair to return EAGAIN rather than letting the
> administrator set a snapshot that will kill the system later.

This scenario is protected by mutual exclusion of fss device
operations as explained above.  Creating the snapshot of "/"
waits for the creation of the snapshot of "/mount0" to finish.

Additionally VFS_SUSPEND is always exclusive (see mutex
vfs_suspend_lock) as it gets used from mount() and unmount() too.
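The effect of a single global suspension lock can be illustrated with a small userland model, assuming POSIX threads: suspend_lock plays the role of vfs_suspend_lock, and each create_snapshot() call models suspending one file system, doing the snapshot work and resuming. Even when callers use different units for different file systems, at most one "suspension" is in progress at any time. All names here are hypothetical, and the counters exist only to make the serialization observable.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for vfs_suspend_lock: one global lock. */
static pthread_mutex_t suspend_lock = PTHREAD_MUTEX_INITIALIZER;

/* Instrumentation only: file systems "suspended" right now, and at most. */
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
int suspended_now, suspended_max;

static void
note_suspended(int delta)
{
	pthread_mutex_lock(&stats_lock);
	suspended_now += delta;
	assert(suspended_now >= 0);
	if (suspended_now > suspended_max)
		suspended_max = suspended_now;
	pthread_mutex_unlock(&stats_lock);
}

/* Model of creating one snapshot: suspend, work, resume. */
void
create_snapshot(int unit)
{
	(void)unit;			/* different units still serialize */
	pthread_mutex_lock(&suspend_lock);
	note_suspended(+1);
	/* ... snapshot work while the file system is suspended ... */
	note_suspended(-1);
	pthread_mutex_unlock(&suspend_lock);
}

/* Thread body: one "dump" repeatedly snapshotting through its own unit. */
void *
snapshot_worker(void *arg)
{
	int unit = *(int *)arg;

	for (int i = 0; i < 1000; i++)
		create_snapshot(unit);
	return NULL;
}
```

In the real system each suspension can last a long time, which is why a second snapshot request waits visibly rather than briefly.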

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: Finding an available fss device

2018-08-13 Thread Emmanuel Dreyfus
On Sun, Aug 12, 2018 at 10:16:48AM +0200, J. Hannken-Illjes wrote:
> While creating a snapshot "/mount0" lookup "/mount0/file", it will block
> as "/mount0" is suspended.  The lookup holds a lock on "/".
> 
> Now snapshot "/" and trying to suspend "/" will block as the lookup
> has the root vnode locked.

This scenario is not the same as the one I asked about, which
was: performing a snapshot of the filesystem mounted on /mount0 
using /dev/fss0 and a snapshot of the filesystem mounted on /mount1
using /dev/fss1 while the first one is still active. Is there some
deadlock in this case?

But you also raise a deadlock scenario for which there is no
protection in current code. I already experienced it in the
past and it would be fair to return EAGAIN rather than letting the
administrator set a snapshot that will kill the system later.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: Finding an available fss device

2018-08-12 Thread Robert Elz
Date:Sun, 12 Aug 2018 13:25:26 +
From:Emmanuel Dreyfus 
Message-ID:  <20180812132526.gh17...@homeworld.netbsd.org>

  | I was wondering about the FSS_ACTIVE test.

It is just one bit; either it is set, or it is not.  Since the code
is already referencing sc-> (the mutex lives inside it) it cannot
be concerned about the device structures being deleted, so
that bit and the field that contains it certainly exists, and can
be referenced.   Referencing it cannot hurt any other parallel
thread, the worst that can happen, is that we see a stale copy
which is just the race condition.

Perhaps if it is tested outside the lock, it should be tested again
after the lock is taken, I did not look carefully enough at
fss_create_snapshot() to see what would happen if it is
called with an already active device.   That second test
would rarely fail, but could.
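A minimal userland sketch of testing outside the lock and then again after taking it, assuming POSIX threads and C11 atomics (in C, an unlocked read of a plain field that another thread writes is a data race, hence the atomic_bool). The struct and function names are hypothetical. The sketch also shows the limit of the fast path: it avoids sleeping on the lock when the unit is already active, but while another caller is still creating a snapshot the flag is clear, so a second caller still blocks on the lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of one unit: "active" is written only under lock. */
struct fss_unit {
	pthread_mutex_t lock;
	atomic_bool active;
};

typedef int (*create_fn)(struct fss_unit *);

/* FSSIOCSET-like operation with an unlocked pre-check and a re-check. */
int
unit_set(struct fss_unit *u, create_fn create)
{
	int error;

	/* Fast path: may read a stale value; it only saves taking the lock. */
	if (atomic_load(&u->active))
		return EBUSY;

	pthread_mutex_lock(&u->lock);
	/* Re-check: another caller may have activated the unit meanwhile. */
	if (atomic_load(&u->active)) {
		error = EBUSY;
	} else {
		error = create(u);	/* lock held across the slow work */
		if (error == 0)
			atomic_store(&u->active, true);
	}
	pthread_mutex_unlock(&u->lock);
	return error;
}

/* Example creation callback that always succeeds. */
int
create_ok(struct fss_unit *u)
{
	assert(!atomic_load(&u->active));
	return 0;
}
```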

  | Do you suggest the mutex should be replaced by a rwlock?

Now you're asking stuff that's way beyond my pay grade...

kre



Re: Finding an available fss device

2018-08-12 Thread Emmanuel Dreyfus
On Sun, Aug 12, 2018 at 04:32:49PM +0700, Robert Elz wrote:
> Clearly there's no point locking before testing for FWRITE
> in flag, that's a local var (param) to this function, 

Right, that one is obvious, I was wondering about the 
FSS_ACTIVE test.

> but the "if it is locked it must be active" is not correct I think, there
> might just be some other process doing a FSSIOCGET
> or something at the same time as the attempt to FSSIOCSET.
> The GET needs to lock, to return consistent values, but that does
> not mean that this fss has an active snapshot.

Do you suggest the mutex should be replaced by a rwlock?

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: Finding an available fss device

2018-08-12 Thread Robert Elz
Date:Sun, 12 Aug 2018 08:05:26 +
From:Emmanuel Dreyfus 
Message-ID:  <20180812080526.gf17...@homeworld.netbsd.org>

  | Why would you test then lock?

Because it avoids the overheads of acquiring a lock for no
particularly good purpose, only to immediately release it
again?   It would be different if it were to make a difference
to anything, obviously.

Clearly there's no point locking before testing for FWRITE
in flag, that's a local var (param) to this function, but the
"if it is locked it must be active" is not correct I think, there
might just be some other process doing a FSSIOCGET
or something at the same time as the attempt to FSSIOCSET.
The GET needs to lock, to return consistent values, but that does
not mean that this fss has an active snapshot.
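This objection can be demonstrated with a small userland model, assuming POSIX threads: a probe that treats "the lock is held" as "the device is busy" reports EBUSY while a status query holds the lock, even though no snapshot is active. The names are hypothetical, and pthread_mutex_trylock() stands in for mutex_tryenter(9). The usage below simulates the concurrent FSSIOCGET by taking the lock from the same thread; for a default, non-recursive mutex, trylock then fails just as it would if another thread held it.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of one unit; "active" is protected by lock. */
struct fss_unit {
	pthread_mutex_t lock;
	bool active;
};

/* Probe in the style of the mutex_tryenter() patch. */
int
probe_busy(struct fss_unit *u)
{
	int error;

	if (pthread_mutex_trylock(&u->lock) != 0)
		return EBUSY;	/* held, but by whom and why is unknown */
	error = u->active ? EBUSY : 0;
	pthread_mutex_unlock(&u->lock);
	return error;
}

/* Analog of a status query such as FSSIOCGET: locks only to read. */
bool
status_begin(struct fss_unit *u)
{
	pthread_mutex_lock(&u->lock);
	return u->active;
}

void
status_end(struct fss_unit *u)
{
	pthread_mutex_unlock(&u->lock);
}
```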

The change you propose has also subtly altered the semantics in
another way (which probably does not matter): previously, if a
FSSIOCSET arrived just as the previous fss use was being closed down,
the ioctl would have waited on the lock and then succeeded; now it
will fail (so would my change).

Lastly, it is clear (well, I think) that you and hannken@ are
talking at cross purposes, though whether this alters his answer
I have no idea (nor do I know the answer), but when you said:

snapshot /mount0 on fss0 and /mount1 on fss1?

I am fairly sure that you meant "snapshot the filesystem which is
mounted on /mount0 (using fss0) and also snapshot the filesystem
which is mounted on /mount1 (using fss1)" where I believe your
words might have been interpreted as "make a snapshot of some
filesystem using fss0, and make that available as /mount0, and
make another snapshot of the same filesystem (using fss1) and
expose that as /mount1."

It is best to be very clear about exactly what you mean, not use
shorthand.

kre



Re: Finding an available fss device

2018-08-12 Thread J. Hannken-Illjes



> On 12. Aug 2018, at 10:07, Emmanuel Dreyfus  wrote:
> 
> On Sun, Aug 12, 2018 at 09:55:27AM +0200, J. Hannken-Illjes wrote:
>>> You mean you cannot at the same time snapshot /mount0 on fss0 and 
>>> /mount1 on fss1?
>> 
>> Yes, you have to create the snapshot on /mount0 and once it has been
>> created you create the snapshot on /mount1.
> 
> Where is that limitation? I would not expect to get such a limitation
> when working on both a different mount and a different fss device.

The simplest deadlock is:

While creating a snapshot "/mount0" lookup "/mount0/file", it will block
as "/mount0" is suspended.  The lookup holds a lock on "/".

Now snapshot "/" and trying to suspend "/" will block as the lookup
has the root vnode locked.

There are too many other deadlock scenarios for this to ever work.

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: Finding an available fss device

2018-08-12 Thread Emmanuel Dreyfus
On Sun, Aug 12, 2018 at 09:55:27AM +0200, J. Hannken-Illjes wrote:
> > You mean you cannot at the same time snapshot /mount0 on fss0 and 
> > /mount1 on fss1?
> 
> Yes, you have to create the snapshot on /mount0 and once it has been
> created you create the snapshot on /mount1.

Where is that limitation? I would not expect to get such a limitation
when working on both a different mount and a different fss device.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: Finding an available fss device

2018-08-12 Thread Emmanuel Dreyfus
On Sat, Aug 11, 2018 at 10:52:42AM +0700, Robert Elz wrote:
> I doubt that your new proposed ioctl() is a very good
> interface 

Indeed, the following change is enough to find a free fss without a 
hang, and it does not introduce a new ioctl.  It is quite close to 
your proposal, except I lock before testing FSS_ACTIVE. Why would you
test then lock?

Index: sys/dev/fss.c
===
RCS file: /cvsroot/src/sys/dev/fss.c,v
retrieving revision 1.98.2.2
diff -U4 -r1.98.2.2 fss.c
--- sys/dev/fss.c   13 Jan 2018 05:38:54 -  1.98.2.2
+++ sys/dev/fss.c   12 Aug 2018 08:00:52 -
@@ -336,9 +336,13 @@
fss->fss_csize = fss50->fss_csize;
fss->fss_flags = 0;
/* Fall through */
case FSSIOCSET:
-   mutex_enter(&sc->sc_lock);
+   if (mutex_tryenter(&sc->sc_lock) == 0) {
+   error = EBUSY;
+   break;
+   }
+
if ((flag & FWRITE) == 0)
error = EPERM;
else if ((sc->sc_flags & FSS_ACTIVE) != 0)
error = EBUSY;



-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: Finding an available fss device

2018-08-12 Thread J. Hannken-Illjes



> On 12. Aug 2018, at 03:58, Emmanuel Dreyfus  wrote:
> 
> On Sat, Aug 11, 2018 at 10:33:04AM +0200, J. Hannken-Illjes wrote:
>> When fssconfig "hangs" the dump is creating a snapshot.  Creating
>> a snapshot (and suspending a file system) is serialized.  Allowing
>> more than one file system suspension at a time will deadlock most
>> of the time.
> 
> You mean you cannot at the same time snapshot /mount0 on fss0 and 
> /mount1 on fss1?

Yes, you have to create the snapshot on /mount0 and once it has been
created you create the snapshot on /mount1.

The snapshot on /mount0 is already usable while you create the second
snapshot on /mount1.

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: Finding an available fss device

2018-08-11 Thread Emmanuel Dreyfus
On Sat, Aug 11, 2018 at 10:33:04AM +0200, J. Hannken-Illjes wrote:
> When fssconfig "hangs" the dump is creating a snapshot.  Creating
> a snapshot (and suspending a file system) is serialized.  Allowing
> more than one file system suspension at a time will deadlock most
> of the time.

You mean you cannot at the same time snapshot /mount0 on fss0 and 
/mount1 on fss1?

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: Finding an available fss device

2018-08-11 Thread J. Hannken-Illjes



> On 10. Aug 2018, at 15:46, Emmanuel Dreyfus  wrote:
> 
> Hello
> 
> How are user processes supposed to find an unused fss device?
> In dump(8) code, there is an iteration over /dev/rfss* trying to
> perform an ioctl FSSIOCSET. The code tests for EBUSY on failure,
> but in my experience that rarely happens: if the device is 
> already in use, the ioctl will sleep in the kernel for ages before 
> getting a reply.
> 
> This is something I can even experience with fssconfig -l, which 
> hangs for a while if dump is running.
> 
> Is there another way? I thought about searching vnodes in the kernel to 
> check if the device is already used by someone else, but that looks 
> overkill. 
> 
> Perhaps the right way is to add a FSSIOBUSY ioctl that would
> use mutex_tryenter and return EBUSY if the device is in use?

When fssconfig "hangs" the dump is creating a snapshot.  Creating
a snapshot (and suspending a file system) is serialized.  Allowing
more than one file system suspension at a time will deadlock most
of the time.

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: Finding an available fss device

2018-08-10 Thread Robert Elz
I doubt that your new proposed ioctl() is a very good
interface - to do what you really need you would require
the equivalent of a "test & set" otherwise all you are
doing is creating a race condition - even though it is,
because of the low number of users, one that is
unlikely to matter very often.

If the interface were useful, I see no point in taking and
releasing a mutex in order to read one bit - the bit is either
set or not, and is either stable set or not (if it is stable,
the mutex achieves nothing, if it is about to, or has just
changed, then that's just the race condition, the
ioctl would see the value either before, or after,
that change, the mutex doesn't help you know
which, so it doesn't really achieve anything).

If the mutex were useful, you've used the wrong one,
FSS_ACTIVE is set/cleared under the control of
sc_lock not sc_slock (which is kind of strange as
other flags in the same word are controlled by sc_slock
which does not look like it would be reliable to me).

What I think I'd do is change the code for FSSIOCSET
to be something like

case FSSIOCSET:
	error = 0;
	if ((flag & FWRITE) == 0)
		error = EPERM;
	else if ((sc->sc_flags & FSS_ACTIVE) != 0)
		error = EBUSY;
	if (error == 0) {
		mutex_enter(&sc->sc_lock);
		error = fss_create_snapshot(sc, fss, l);
		if (error == 0)
			sc->sc_uflags = fss->fss_flags;
		mutex_exit(&sc->sc_lock);
	}
	break;

(apologies for any indentation screwups, it looks OK
as I am typing it, but that might not be what it looks like
to anyone else...)

With that you have more the test & set operation, which
should not block if the device is active already.

You could also re-order the two initial tests, so you could
do a read-only open, attempt this operation, which would
always fail then, but with errno==EBUSY if the device is
active, and errno==EPERM otherwise - which would provide
the (racy) just see if it is available operation.
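The effect of re-ordering the two tests can be sketched as pure functions, assuming the unit's active state has already been read; MODEL_FWRITE is a hypothetical stand-in for the kernel's FWRITE flag, and neither function is real fss.c code. With the permission check first, a read-only caller always gets EPERM; with the busy check first, the error number itself tells a read-only prober whether the unit is in use, subject to the race described above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's FWRITE open flag. */
#define MODEL_FWRITE 0x2

/* Order used in the code above: permission first, so read-only opens
 * always see EPERM. */
int
set_check_perm_first(int flag, bool active)
{
	if ((flag & MODEL_FWRITE) == 0)
		return EPERM;
	if (active)
		return EBUSY;
	return 0;
}

/* Re-ordered: busy first, so a read-only open can probe availability. */
int
set_check_busy_first(int flag, bool active)
{
	if (active)
		return EBUSY;
	if ((flag & MODEL_FWRITE) == 0)
		return EPERM;
	return 0;
}
```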

But you need advice from someone who understands the locking
issues, and can see if they are used properly with this code
now, and what is really needed to get what you want - don't
just believe this because it looks right to me.

kre



Re: Finding an available fss device

2018-08-10 Thread Emmanuel Dreyfus
Emmanuel Dreyfus  wrote:

> Perhaps the right way is to add a FSSIOBUSY ioctl that would
> use mutex_tryenter and return EBUSY if the device is in use?

I propose the change below, so that we can find an available /dev/fss*
device without hanging:

--- sys/dev/fss.c.orig
+++ sys/dev/fss.c
@@ -427,8 +427,17 @@
mutex_exit(&sc->sc_slock);
error = 0;
break;
 
+   case FSSIOBUSY:
+   if (mutex_tryenter(&sc->sc_slock) == 0) {
+   error = EBUSY;
+   break;
+   }
+   error = (sc->sc_flags & FSS_ACTIVE) ? EBUSY : 0;
+   mutex_exit(&sc->sc_slock);
+   break;
+
default:
error = EINVAL;
break;
}
--- sys/dev/fssvar.h.orig
+++ sys/dev/fssvar.h
@@ -56,8 +56,10 @@
 #define FSSIOCGET  _IOR('F', 1, struct fss_get)/* Status */
 #define FSSIOCCLR  _IO('F', 2) /* Unconfigure */
 #define FSSIOFSET  _IOW('F', 3, int)   /* Set flags */
 #define FSSIOFGET  _IOR('F', 4, int)   /* Get flags */
+#define FSSIOBUSY  _IO('F', 6) /* Is busy? */
+
 #ifdef _KERNEL
 #include 
 
 struct fss_set50 {


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Finding an available fss device

2018-08-10 Thread Emmanuel Dreyfus
Hello

How are user processes supposed to find an unused fss device?
In dump(8) code, there is an iteration over /dev/rfss* trying to
perform an ioctl FSSIOCSET. The code tests for EBUSY on failure,
but in my experience that rarely happens: if the device is 
already in use, the ioctl will sleep in the kernel for ages before 
getting a reply.
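The scan can be modelled without touching real devices, as a rough sketch: the hypothetical try_set callback stands for open(2) plus ioctl(FSSIOCSET) on /dev/rfssN, and the busy bitmask in the example callback is made up. The loop only finds a free unit quickly if a busy unit answers EBUSY promptly; if the ioctl sleeps instead, the whole scan stalls on the first busy unit. Treating any error other than EBUSY as fatal is an assumption of this sketch, not necessarily what dump(8) does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical: attempt to configure unit N, returning 0 or an errno. */
typedef int (*try_set_fn)(int unit, void *ctx);

/*
 * Return the first unit that accepts the configuration, skipping units
 * that answer EBUSY.  On failure return -1 and store the error in *errp.
 */
int
find_free_unit(int nunits, try_set_fn try_set, void *ctx, int *errp)
{
	assert(errp != NULL);
	for (int unit = 0; unit < nunits; unit++) {
		int error = try_set(unit, ctx);

		if (error == 0) {
			*errp = 0;
			return unit;
		}
		if (error != EBUSY) {	/* a real failure: stop scanning */
			*errp = error;
			return -1;
		}
	}
	*errp = EBUSY;			/* every unit is in use */
	return -1;
}

/* Example callback: bit N of *(unsigned *)ctx set means unit N is busy. */
int
busy_mask_try_set(int unit, void *ctx)
{
	unsigned busy = *(const unsigned *)ctx;

	return ((busy >> unit) & 1u) ? EBUSY : 0;
}
```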

This is something I can even experience with fssconfig -l, which 
hangs for a while if dump is running.

Is there another way? I thought about searching vnodes in the kernel to 
check if the device is already used by someone else, but that looks 
overkill. 

Perhaps the right way is to add a FSSIOBUSY ioctl that would
use mutex_tryenter and return EBUSY if the device is in use?

-- 
Emmanuel Dreyfus
m...@netbsd.org