Re: [reiserfs-list] Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-31 Thread Hans Reiser

Daniel Phillips wrote:

> Graciously accepted.  Coming up with something sensible in a mere 6
> months would be a minor miracle. ;-)
> 
> - what happens if the user forgets to close the transaction?

then the user has branched into his own version, or at least that would be my
take on it.  Another possible method is to expire transactions by persons who
lack permission to keep them open indefinitely.  I suppose one could expire them
to abort, or expire them to commit, both being valid under some circumstances.  


> 
>I plan to set a checkpoint there (because the transaction got
>too big) and log the fact that it's open.
> 
> - issues of lock/transaction duration
> 
>Once again relying on checkpoints, when the transaction gets
>uncomfortably big for cache, set a checkpoint.  I haven't thought
>about locks
> 
> - transaction batching
> 
>1) Explicit transaction batch close 2) Cache gets past a certain
>fullness.  In both cases, no new transactions are allowed to start
>and as soon as all current ones are closed we close the batch.re6;
> 
> - of levels of isolation
> - concurrent transactions modifying global fs metadata
>and some but not all of those concurrent transactions receiving a
>rollback
> 
>First I was going to write 'huh?' here, then I realized you're
>talking about real database ops, not just filesystem ops.  I had
>in mind something more modest: transactions are 'mv', 'read/write'
>(if the 'atomic read/write' is set), other filesystem operations I've
>forgotten, and anything the user puts between open_xact and
>close_xact.  You are raising the ante a little ;-)
> 
>In my case (Tux2) I could do an efficient rollback to the beginning
>   of the batch (phase), then I would have had to have kept an
>in-memory log of the transactions for selective replay.  With a
>journal log you can obviously do the same thing, but perhaps more
>efficiently if your journal design supports undo/redo.
> 
>The above is a pure flight of fancy, we won't be seeing anything
>so fancy as an API across filesystems.

It is just a matter of time, and we will.  I think that the major release AFTER
2.6 will see it.  First we have to get a prototype done in time for 2.6

> 
> - permissions relating to keeping transactions open.
>We can see this one in the light of a simple filesystem
>transaction: what happens if we are in the middle of a mv and
>someone changes the permissions?  Go with the starting or
>ending permissions?
> 
> Well, the database side of this is really interesting, but to get
> something generic across filesystems, the scope pretty well has to be
> limited to journal-type transactions, don't you think?

don't know what a journal-type transaction is and how it differs from a database
transaction.

> 
> --
> Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [reiserfs-list] Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-31 Thread Hans Reiser

Daniel Phillips wrote:

 Graciously accepted.  Coming up with something sensible in a mere 6
 months would be a minor miracle. ;-)
 
 - what happens if the user forgets to close the transaction?

then the user has branched into his own version, or at least that would be my
take on it.  Another possible method is to expire transactions by persons who
lack permission to keep them open indefinitely.  I suppose one could expire them
to abort, or expire them to commit, both being valid under some circumstances.  


 
I plan to set a checkpoint there (because the transaction got
too big) and log the fact that it's open.
 
 - issues of lock/transaction duration
 
Once again relying on checkpoints, when the transaction gets
uncomfortably big for cache, set a checkpoint.  I haven't thought
about locks
 
 - transaction batching
 
1) Explicit transaction batch close 2) Cache gets past a certain
fullness.  In both cases, no new transactions are allowed to start
and as soon as all current ones are closed we close the batch.re6;
 
 - of levels of isolation
 - concurrent transactions modifying global fs metadata
and some but not all of those concurrent transactions receiving a
rollback
 
First I was going to write 'huh?' here, then I realized you're
talking about real database ops, not just filesystem ops.  I had
in mind something more modest: transactions are 'mv', 'read/write'
(if the 'atomic read/write' is set), other filesystem operations I've
forgotten, and anything the user puts between open_xact and
close_xact.  You are raising the ante a little ;-)
 
In my case (Tux2) I could do an efficient rollback to the beginning
   of the batch (phase), then I would have had to have kept an
in-memory log of the transactions for selective replay.  With a
journal log you can obviously do the same thing, but perhaps more
efficiently if your journal design supports undo/redo.
 
The above is a pure flight of fancy, we won't be seeing anything
so fancy as an API across filesystems.

It is just a matter of time, and we will.  I think that the major release AFTER
2.6 will see it.  First we have to get a prototype done in time for 2.6

 
 - permissions relating to keeping transactions open.
We can see this one in the light of a simple filesystem
transaction: what happens if we are in the middle of a mv and
someone changes the permissions?  Go with the starting or
ending permissions?
 
 Well, the database side of this is really interesting, but to get
 something generic across filesystems, the scope pretty well has to be
 limited to journal-type transactions, don't you think?

don't know what a journal-type transaction is and how it differs from a database
transaction.

 
 --
 Daniel
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-28 Thread Horst von Brand

Daniel Phillips <[EMAIL PROTECTED]> said:
> On Sunday 27 May 2001 15:32, Edgar Toernig wrote:

[...]

> > you break UNIX fundamentals.  But I'm quite relieved now because I'm
> > pretty sure that something like that will never go into the kernel.

> OK, I'll take that as "I couldn't find a piece of code that breaks, so 
> it's on to the legal issues".

It boggles my (perhaps underdeveloped) mind to have things that are files
_and_ directories at the same time. The last time this was discussed was
for handling forks (a la Mac et al) in files, and it was shot down.

> SUS doesn't seem to have a lot to say about this.  The nearest thing to 
> a ruling I found was "The special filename dot refers to the directory 
> specified by its predecessor".  Which is not the same thing as:
> 
>open("foo", O_RDONLY) == open ("foo/.", O_RDONLY)

It says "foo" and "foo/." are the same _directory_, where "foo" is a
directory as otherwise "foo/" makes no sense, AFAICS. Is there
any mention on a _file_ "bar" and going "bar/" or "bar/"?
-- 
Horst von Brand [EMAIL PROTECTED]
Casilla 9G, Vin~a del Mar, Chile   +56 32 672616
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-28 Thread Horst von Brand

Daniel Phillips [EMAIL PROTECTED] said:
 On Sunday 27 May 2001 15:32, Edgar Toernig wrote:

[...]

  you break UNIX fundamentals.  But I'm quite relieved now because I'm
  pretty sure that something like that will never go into the kernel.

 OK, I'll take that as I couldn't find a piece of code that breaks, so 
 it's on to the legal issues.

It boggles my (perhaps underdeveloped) mind to have things that are files
_and_ directories at the same time. The last time this was discussed was
for handling forks (a la Mac et al) in files, and it was shot down.

 SUS doesn't seem to have a lot to say about this.  The nearest thing to 
 a ruling I found was The special filename dot refers to the directory 
 specified by its predecessor.  Which is not the same thing as:
 
open(foo, O_RDONLY) == open (foo/., O_RDONLY)

It says foo and foo/. are the same _directory_, where foo is a
directory as otherwise foo/something makes no sense, AFAICS. Is there
any mention on a _file_ bar and going bar/ or bar/something?
-- 
Horst von Brand [EMAIL PROTECTED]
Casilla 9G, Vin~a del Mar, Chile   +56 32 672616
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-27 Thread Marko Kreen

On Sun, May 27, 2001 at 10:45:17PM +0200, Daniel Phillips wrote:
> On Sunday 27 May 2001 15:32, Edgar Toernig wrote:
> > Daniel Phillips wrote:
> > > I'm not claiming there isn't breakage somewhere,
> >
> > you break UNIX fundamentals.  But I'm quite relieved now because I'm
> > pretty sure that something like that will never go into the kernel.
> 
> OK, I'll take that as "I couldn't find a piece of code that breaks, so 
> it's on to the legal issues".
> 
> SUS doesn't seem to have a lot to say about this.  The nearest thing to 
> a ruling I found was "The special filename dot refers to the directory 
> specified by its predecessor".  Which is not the same thing as:
> 
>open("foo", O_RDONLY) == open ("foo/.", O_RDONLY)
> 
> I don't know about POSIX (I don't have it: a pox on standards 
> organizations that don't make their standards freely available) but SUS 
> doesn't seem to forbid this.

My question is: Is it needed?  You are advocating quite
non-obvious behaviour on a UNIX-like fs.  Cant the end result
achieved in more obvious manner?

I see at most 3 types of magic files:

1) regular file - nothing special.  Whether it has CHR/BLK set
   or not is irrelevant.

2) file with subdevs.  As 1) but you can acces dev/something
   for subdev 'something'.  Permissions should be probably taken
   from 'dev'.  Ofcourse you cant do 'ls' on the thing.

3) magicdev as directory.  Act as ordinary directory.  Only
   reason is to group devices.

And all those should be manageable by devfsd, so you can tell
devfsd to take subdev and create it as file somewhere else.
So 2) and 3) are more like 'defaults'.

So: is there additional type required with non-obvious file/dir
behaviour mix?

-- 
marko

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-27 Thread Daniel Phillips

On Sunday 27 May 2001 15:32, Edgar Toernig wrote:
> Daniel Phillips wrote:
> > It won't, the open for "." is handled in the VFS, not the
> > filesystem - it will open the directory.  (Without needing to be
> > told it's a directory via O_DIRECTORY.)  If you do open("magicdev")
> > you'll get the device, because that's handled by magicdevfs.
>
> You really mean that "magicdev" is a directory and:
>
>   open("magicdev/.", O_RDONLY);
>   open("magicdev", O_RDONLY);
>
> would both succeed but open different objects?

Yes, and:

open("magicdev/.", O_RDONLY | O_DIRECTORY);
open("magicdev", O_RDONLY | O_DIRECTORY);

will both succeed and open the same object.

> > I'm not claiming there isn't breakage somewhere,
>
> you break UNIX fundamentals.  But I'm quite relieved now because I'm
> pretty sure that something like that will never go into the kernel.

OK, I'll take that as "I couldn't find a piece of code that breaks, so 
it's on to the legal issues".

SUS doesn't seem to have a lot to say about this.  The nearest thing to 
a ruling I found was "The special filename dot refers to the directory 
specified by its predecessor".  Which is not the same thing as:

   open("foo", O_RDONLY) == open ("foo/.", O_RDONLY)

I don't know about POSIX (I don't have it: a pox on standards 
organizations that don't make their standards freely available) but SUS 
doesn't seem to forbid this.

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-27 Thread Edgar Toernig

Daniel Phillips wrote:
> 
> It won't, the open for "." is handled in the VFS, not the filesystem -
> it will open the directory.  (Without needing to be told it's a
> directory via O_DIRECTORY.)  If you do open("magicdev") you'll get the
> device, because that's handled by magicdevfs.

You really mean that "magicdev" is a directory and:

open("magicdev/.", O_RDONLY);
open("magicdev", O_RDONLY);

would both succeed but open different objects?

> I'm not claiming there isn't breakage somewhere,

you break UNIX fundamentals.  But I'm quite relieved now because I'm
pretty sure that something like that will never go into the kernel.

Ciao, ET.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-27 Thread Edgar Toernig

Daniel Phillips wrote:
 
 It won't, the open for . is handled in the VFS, not the filesystem -
 it will open the directory.  (Without needing to be told it's a
 directory via O_DIRECTORY.)  If you do open(magicdev) you'll get the
 device, because that's handled by magicdevfs.

You really mean that magicdev is a directory and:

open(magicdev/., O_RDONLY);
open(magicdev, O_RDONLY);

would both succeed but open different objects?

 I'm not claiming there isn't breakage somewhere,

you break UNIX fundamentals.  But I'm quite relieved now because I'm
pretty sure that something like that will never go into the kernel.

Ciao, ET.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-27 Thread Marko Kreen

On Sun, May 27, 2001 at 10:45:17PM +0200, Daniel Phillips wrote:
 On Sunday 27 May 2001 15:32, Edgar Toernig wrote:
  Daniel Phillips wrote:
   I'm not claiming there isn't breakage somewhere,
 
  you break UNIX fundamentals.  But I'm quite relieved now because I'm
  pretty sure that something like that will never go into the kernel.
 
 OK, I'll take that as I couldn't find a piece of code that breaks, so 
 it's on to the legal issues.
 
 SUS doesn't seem to have a lot to say about this.  The nearest thing to 
 a ruling I found was The special filename dot refers to the directory 
 specified by its predecessor.  Which is not the same thing as:
 
open(foo, O_RDONLY) == open (foo/., O_RDONLY)
 
 I don't know about POSIX (I don't have it: a pox on standards 
 organizations that don't make their standards freely available) but SUS 
 doesn't seem to forbid this.

My question is: Is it needed?  You are advocating quite
non-obvious behaviour on a UNIX-like fs.  Cant the end result
achieved in more obvious manner?

I see at most 3 types of magic files:

1) regular file - nothing special.  Whether it has CHR/BLK set
   or not is irrelevant.

2) file with subdevs.  As 1) but you can acces dev/something
   for subdev 'something'.  Permissions should be probably taken
   from 'dev'.  Ofcourse you cant do 'ls' on the thing.

3) magicdev as directory.  Act as ordinary directory.  Only
   reason is to group devices.

And all those should be manageable by devfsd, so you can tell
devfsd to take subdev and create it as file somewhere else.
So 2) and 3) are more like 'defaults'.

So: is there additional type required with non-obvious file/dir
behaviour mix?

-- 
marko

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-27 Thread Daniel Phillips

On Sunday 27 May 2001 15:32, Edgar Toernig wrote:
 Daniel Phillips wrote:
  It won't, the open for . is handled in the VFS, not the
  filesystem - it will open the directory.  (Without needing to be
  told it's a directory via O_DIRECTORY.)  If you do open(magicdev)
  you'll get the device, because that's handled by magicdevfs.

 You really mean that magicdev is a directory and:

   open(magicdev/., O_RDONLY);
   open(magicdev, O_RDONLY);

 would both succeed but open different objects?

Yes, and:

open(magicdev/., O_RDONLY | O_DIRECTORY);
open(magicdev, O_RDONLY | O_DIRECTORY);

will both succeed and open the same object.

  I'm not claiming there isn't breakage somewhere,

 you break UNIX fundamentals.  But I'm quite relieved now because I'm
 pretty sure that something like that will never go into the kernel.

OK, I'll take that as I couldn't find a piece of code that breaks, so 
it's on to the legal issues.

SUS doesn't seem to have a lot to say about this.  The nearest thing to 
a ruling I found was The special filename dot refers to the directory 
specified by its predecessor.  Which is not the same thing as:

   open(foo, O_RDONLY) == open (foo/., O_RDONLY)

I don't know about POSIX (I don't have it: a pox on standards 
organizations that don't make their standards freely available) but SUS 
doesn't seem to forbid this.

--
Daniel
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-26 Thread Daniel Phillips

On Saturday 26 May 2001 05:07, Edgar Toernig wrote:
> Daniel Phillips wrote:
> > Oops, oh wait, there's already another open point: your breakage
> > examples both rely on opening ".".  You're right, "." should always
> > be a directory and I believe that's enforced by the VFS.  So we
> > don't have an example of breakage yet.
>
> That's just because I did a simple "ls".  But it doesn't make a
> difference.  The magicdevs _are_ directories and
>
>   chdir("magicdev");
>   open(".", O_RDONLY);
>
> shouldn't open the device.

It won't, the open for "." is handled in the VFS, not the filesystem - 
it will open the directory.  (Without needing to be told it's a 
directory via O_DIRECTORY.)  If you do open("magicdev") you'll get the 
device, because that's handled by magicdevfs.

I'm not claiming there isn't breakage somewhere, just that we didn't 
find it on this attempt.

--
Daniel

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-26 Thread Daniel Phillips

On Saturday 26 May 2001 05:07, Edgar Toernig wrote:
 Daniel Phillips wrote:
  Oops, oh wait, there's already another open point: your breakage
  examples both rely on opening ..  You're right, . should always
  be a directory and I believe that's enforced by the VFS.  So we
  don't have an example of breakage yet.

 That's just because I did a simple ls.  But it doesn't make a
 difference.  The magicdevs _are_ directories and

   chdir(magicdev);
   open(., O_RDONLY);

 shouldn't open the device.

It won't, the open for . is handled in the VFS, not the filesystem - 
it will open the directory.  (Without needing to be told it's a 
directory via O_DIRECTORY.)  If you do open(magicdev) you'll get the 
device, because that's handled by magicdevfs.

I'm not claiming there isn't breakage somewhere, just that we didn't 
find it on this attempt.

--
Daniel

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-25 Thread Edgar Toernig

Daniel Phillips wrote:
> 
> Oops, oh wait, there's already another open point: your breakage
> examples both rely on opening ".".  You're right, "." should always be
> a directory and I believe that's enforced by the VFS.  So we don't have
> an example of breakage yet.

That's just because I did a simple "ls".  But it doesn't make a
difference.  The magicdevs _are_ directories and

chdir("magicdev");
open(".", O_RDONLY);

shouldn't open the device.

Ciao, ET.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-25 Thread Daniel Phillips

On Thursday 24 May 2001 22:59, Edgar Toernig wrote:
> Daniel Phillips wrote:
> > > > Readdir fills in a directory type, so ls sees it as a directory
> > > > and does the right thing.  On the other hand, we know we're on
> > > > a device filesystem so we will next open the name as a regular
> > > > file, and find ISCHR or ISBLK: good.
> > >
> > > ??? The kernel may know it, but the app?  Or do you really want
> > > to give different stat data on stat(2) and fstat(2)?  These flags
> > > are currently used by archive/backup prgs.  It's a hint that
> > > these files are not regular files and shouldn't be opened for
> > > reading. Having a 'd' would mean that they would really try to
> > > enter the directory and save it's contents.  Don't know what
> > > happens in this case to your "special" files ;-)
> >
> > I guess that's much like the question 'what happens in proc?'.
>
> And that's already bad enough.  Most of the "files" in proc should
> be fifos!  And using proc as an excuse to introduce another set of
> magic dirs?  No, thanks.

Wait a second, I thought proc was here to stay.  Wait another
second, device nodes are already magic.  Magic is magic, just
choose your color ;-)

This set of magic dirs is supposed to clean things up, not mess things 
up.  We already saw how the side-effects-on-open problem in ls -l goes 
away.  There's a much bigger problem I'd love to deal with: the 'no 
heirarchy can please everybody' problem.  In database terms, aheirarchy 
is an insufficiently general model for real-world problems, in other 
words, they never worked.  Tables work.  That's where I'm trying to go 
with this, so please bear with me.  This is not just a solution in 
search of a problem.

> > Correct me if I'm wrong, but what we learn from the proc example
> > is that tarring your whole source tree starting at / is not
> > something you want to do.
>
> IMHO it would be better to fix proc instead of adding more magic.  At
> the moment you have to exclude /proc.  You want to add /dev.

Well, actually no, ls -R, tar, zip, etc, work pretty well with the 
scheme I've described.

> And
> next? Exclude all $HOME/dev (in case process name spaces get added)? 
> Or make fifos magic too and add all of them to the exclude list?  But
> there's no central place for fifos.  So lets add more magic :-(

No, no, no, agreed and sometimes magic is good.  It's not deep magic.  
The only new thing here is the interpretation of the O_DIRECTORY flag, 
or rather, the lack of it.

> > What *won't* happen is, you won't get side effects from opening
> > your serial ports (you'd have to open them without O_DIRECTORY
> > to get that) so that seems like a little step forward.
>
> As already said: depending on O_DIRECTORY breaks POSIX compliance
> and that alone should kill this idea...

Thanks, two good points:
  - libc5 will get confused when doing ls in /magicdev
  - POSIX specifically forbids this

I'll put this away until I've specifically dug into both of them.  OK, 
over and out, thanks for your commentary.

/me peruses man pages

Oops, oh wait, there's already another open point: your breakage 
examples both rely on opening ".".  You're right, "." should always be 
a directory and I believe that's enforced by the VFS.  So we don't have 
an example of breakage yet.

--
Daniel

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-25 Thread Daniel Phillips

On Friday 25 May 2001 00:00, Hans Reiser wrote:
> Daniel Phillips wrote:
> > I suppose I'm just reiterating the obvious, but we should
> > eventually have a generic filesystem transaction API at the VFS
> > level, once we have enough data points to know what the One True
> > API should be.
>
> Daniel, implementing transactions is not a trivial thing as you
> probably know. It requires that you resolve such issues as, what
> happens if the user forgets to close the transaction, issues of
> lock/transaction duration, of transaction batching, of levels of
> isolation, of concurrent transactions modifying global fs metadata
> and some but not all of those concurrent transactions receiving a
> rollback, and of permissions relating to keeping transactions open. 
> I would encourage you to participate in the reiser4 design discussion
> we will be having over the next 6 months, and give us your opinions. 
> Josh will be leading that design effort for the ReiserFS team.

Graciously accepted.  Coming up with something sensible in a mere 6 
months would be a minor miracle. ;-)

- what happens if the user forgets to close the transaction?

   I plan to set a checkpoint there (because the transaction got
   too big) and log the fact that it's open.

- issues of lock/transaction duration

   Once again relying on checkpoints, when the transaction gets
   uncomfortably big for cache, set a checkpoint.  I haven't thought
   about locks

- transaction batching

   1) Explicit transaction batch close 2) Cache gets past a certain 
   fullness.  In both cases, no new transactions are allowed to start
   and as soon as all current ones are closed we close the batch.

- of levels of isolation
- concurrent transactions modifying global fs metadata
   and some but not all of those concurrent transactions receiving a
   rollback

   First I was going to write 'huh?' here, then I realized you're   
   talking about real database ops, not just filesystem ops.  I had
   in mind something more modest: transactions are 'mv', 'read/write'
   (if the 'atomic read/write' is set), other filesystem operations I've
   forgotten, and anything the user puts between open_xact and  
   close_xact.  You are raising the ante a little ;-)

   In my case (Tux2) I could do an efficient rollback to the beginning
  of the batch (phase), then I would have had to have kept an   
   in-memory log of the transactions for selective replay.  With a  
   journal log you can obviously do the same thing, but perhaps more
   efficiently if your journal design supports undo/redo.

   The above is a pure flight of fancy, we won't be seeing anything
   so fancy as an API across filesystems.

- permissions relating to keeping transactions open. 
   We can see this one in the light of a simple filesystem  
   transaction: what happens if we are in the middle of a mv and
   someone changes the permissions?  Go with the starting or
   ending permissions?

Well, the database side of this is really interesting, but to get 
something generic across filesystems, the scope pretty well has to be 
limited to journal-type transactions, don't you think?

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-25 Thread Daniel Phillips

On Friday 25 May 2001 00:00, Hans Reiser wrote:
 Daniel Phillips wrote:
  I suppose I'm just reiterating the obvious, but we should
  eventually have a generic filesystem transaction API at the VFS
  level, once we have enough data points to know what the One True
  API should be.

 Daniel, implementing transactions is not a trivial thing as you
 probably know. It requires that you resolve such issues as, what
 happens if the user forgets to close the transaction, issues of
 lock/transaction duration, of transaction batching, of levels of
 isolation, of concurrent transactions modifying global fs metadata
 and some but not all of those concurrent transactions receiving a
 rollback, and of permissions relating to keeping transactions open. 
 I would encourage you to participate in the reiser4 design discussion
 we will be having over the next 6 months, and give us your opinions. 
 Josh will be leading that design effort for the ReiserFS team.

Graciously accepted.  Coming up with something sensible in a mere 6 
months would be a minor miracle. ;-)

- what happens if the user forgets to close the transaction?

   I plan to set a checkpoint there (because the transaction got
   too big) and log the fact that it's open.

- issues of lock/transaction duration

   Once again relying on checkpoints, when the transaction gets
   uncomfortably big for cache, set a checkpoint.  I haven't thought
   about locks

- transaction batching

   1) Explicit transaction batch close 2) Cache gets past a certain 
   fullness.  In both cases, no new transactions are allowed to start
   and as soon as all current ones are closed we close the batch.

- of levels of isolation
- concurrent transactions modifying global fs metadata
   and some but not all of those concurrent transactions receiving a
   rollback

   First I was going to write 'huh?' here, then I realized you're   
   talking about real database ops, not just filesystem ops.  I had
   in mind something more modest: transactions are 'mv', 'read/write'
   (if the 'atomic read/write' is set), other filesystem operations I've
   forgotten, and anything the user puts between open_xact and  
   close_xact.  You are raising the ante a little ;-)

   In my case (Tux2) I could do an efficient rollback to the beginning
  of the batch (phase), then I would have had to have kept an   
   in-memory log of the transactions for selective replay.  With a  
   journal log you can obviously do the same thing, but perhaps more
   efficiently if your journal design supports undo/redo.

   The above is a pure flight of fancy, we won't be seeing anything
   so fancy as an API across filesystems.

- permissions relating to keeping transactions open. 
   We can see this one in the light of a simple filesystem  
   transaction: what happens if we are in the middle of a mv and
   someone changes the permissions?  Go with the starting or
   ending permissions?

Well, the database side of this is really interesting, but to get 
something generic across filesystems, the scope pretty well has to be 
limited to journal-type transactions, don't you think?

--
Daniel
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-25 Thread Daniel Phillips

On Thursday 24 May 2001 22:59, Edgar Toernig wrote:
 Daniel Phillips wrote:
Readdir fills in a directory type, so ls sees it as a directory
and does the right thing.  On the other hand, we know we're on
a device filesystem so we will next open the name as a regular
file, and find ISCHR or ISBLK: good.
  
   ??? The kernel may know it, but the app?  Or do you really want
   to give different stat data on stat(2) and fstat(2)?  These flags
   are currently used by archive/backup prgs.  It's a hint that
   these files are not regular files and shouldn't be opened for
   reading. Having a 'd' would mean that they would really try to
   enter the directory and save it's contents.  Don't know what
   happens in this case to your special files ;-)
 
  I guess that's much like the question 'what happens in proc?'.

 And that's already bad enough.  Most of the files in proc should
 be fifos!  And using proc as an excuse to introduce another set of
 magic dirs?  No, thanks.

Wait a second, I thought proc was here to stay.  Wait another
second, device nodes are already magic.  Magic is magic, just
choose your color ;-)

This set of magic dirs is supposed to clean things up, not mess things 
up.  We already saw how the side-effects-on-open problem in ls -l goes 
away.  There's a much bigger problem I'd love to deal with: the 'no 
heirarchy can please everybody' problem.  In database terms, aheirarchy 
is an insufficiently general model for real-world problems, in other 
words, they never worked.  Tables work.  That's where I'm trying to go 
with this, so please bear with me.  This is not just a solution in 
search of a problem.

  Correct me if I'm wrong, but what we learn from the proc example
  is that tarring your whole source tree starting at / is not
  something you want to do.

 IMHO it would be better to fix proc instead of adding more magic.  At
 the moment you have to exclude /proc.  You want to add /dev.

Well, actually no, ls -R, tar, zip, etc, work pretty well with the 
scheme I've described.

 And
 next? Exclude all $HOME/dev (in case process name spaces get added)? 
 Or make fifos magic too and add all of them to the exclude list?  But
 there's no central place for fifos.  So lets add more magic :-(

No, no, no, agreed and sometimes magic is good.  It's not deep magic.  
The only new thing here is the interpretation of the O_DIRECTORY flag, 
or rather, the lack of it.

  What *won't* happen is, you won't get side effects from opening
  your serial ports (you'd have to open them without O_DIRECTORY
  to get that) so that seems like a little step forward.

 As already said: depending on O_DIRECTORY breaks POSIX compliance
 and that alone should kill this idea...

Thanks, two good points:
  - libc5 will get confused when doing ls in /magicdev
  - POSIX specifically forbids this

I'll put this away until I've specifically dug into both of them.  OK, 
over and out, thanks for your commentary.

/me peruses man pages

Oops, oh wait, there's already another open point: your breakage 
examples both rely on opening ..  You're right, . should always be 
a directory and I believe that's enforced by the VFS.  So we don't have 
an example of breakage yet.

--
Daniel

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-25 Thread Edgar Toernig

Daniel Phillips wrote:
 
 Oops, oh wait, there's already another open point: your breakage
 examples both rely on opening ..  You're right, . should always be
 a directory and I believe that's enforced by the VFS.  So we don't have
 an example of breakage yet.

That's just because I did a simple ls.  But it doesn't make a
difference.  The magicdevs _are_ directories and

chdir(magicdev);
open(., O_RDONLY);

shouldn't open the device.

Ciao, ET.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-24 Thread Hans Reiser

Daniel Phillips wrote:
> 
> On Tuesday 22 May 2001 22:10, Andreas Dilger wrote:
> > Peter Braam writes:
> > > File system journal recovery can corrupt a snapshot, because it
> > > copies data that needs to be preserved in a snapshot. During
> > > journal replay such data may be copied again, but the source can
> > > have new data already.
> >
> > The way it is implemented in reiserfs is to wait for existing
> > transactions to complete, entirely flush the journal and block all
> > new transactions from starting.  Stephen implemented a journal flush
> > API to do this for ext3, but the hooks to call it from LVM are not in
> > place yet.  This way the journal is totally empty at the time the
> > snapshot is done, so the read-only copy does not need to do journal
> > recovery, so no problems can arise.
> 
> I suppose I'm just reiterating the obvious, but we should eventually
> have a generic filesystem transaction API at the VFS level, once we
> have enough data points to know what the One True API should be.
> 
> --
> Daniel
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


Daniel, implementing transactions is not a trivial thing as you probably know. 
It requires that you resolve such issues as, what happens if the user forgets to
close the transaction, issues of lock/transaction duration, of transaction
batching, of levels of isolation, of concurrent transactions modifying global fs
metadata and some but not all of those concurrent transactions receiving a
rollback, and of permissions relating to keeping transactions open.  I would
encourage you to participate in the reiser4 design discussion we will be having
over the next 6 months, and give us your opinions.  Josh will be leading that
design effort for the ReiserFS team.

Hans
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-24 Thread Alexander Viro



On Thu, 24 May 2001, Edgar Toernig wrote:

> > What *won't* happen is, you won't get side effects from opening
> > your serial ports (you'd have to open them without O_DIRECTORY
> > to get that) so that seems like a little step forward.
> 
> As already said: depending on O_DIRECTORY breaks POSIX compliance
> and that alone should kill this idea...

What really kills that idea is the fact that you can trick applications
into opening your serial ports _without_ O_DIRECTORY.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-24 Thread Daniel Phillips

On Tuesday 22 May 2001 22:10, Andreas Dilger wrote:
> Peter Braam writes:
> > File system journal recovery can corrupt a snapshot, because it
> > copies data that needs to be preserved in a snapshot. During
> > journal replay such data may be copied again, but the source can
> > have new data already.
>
> The way it is implemented in reiserfs is to wait for existing
> transactions to complete, entirely flush the journal and block all
> new transactions from starting.  Stephen implemented a journal flush
> API to do this for ext3, but the hooks to call it from LVM are not in
> place yet.  This way the journal is totally empty at the time the
> snapshot is done, so the read-only copy does not need to do journal
> recovery, so no problems can arise.

I suppose I'm just reiterating the obvious, but we should eventually
have a generic filesystem transaction API at the VFS level, once we
have enough data points to know what the One True API should be.

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-24 Thread Daniel Phillips

On Thursday 24 May 2001 02:23, Edgar Toernig wrote:
> Daniel Phillips wrote:
> > > > It's going to be marked 'd', it's a directory, not a file.
> > >
> > > Aha.  So you lose the S_ISCHR/BLK attribute.
> >
> > Readdir fills in a directory type, so ls sees it as a directory and
> > does the right thing.  On the other hand, we know we're on a device
> > filesystem so we will next open the name as a regular file, and
> > find ISCHR or ISBLK: good.
>
> ??? The kernel may know it, but the app?  Or do you really want to
> give different stat data on stat(2) and fstat(2)?  These flags are
> currently used by archive/backup prgs.  It's a hint that these files
> are not regular files and shouldn't be opened for reading.
> Having a 'd' would mean that they would really try to enter the
> directory and save it's contents.  Don't know what happens in this
> case to your "special" files ;-)

I guess that's much like the question 'what happens in proc?'.

Recursively entering the device directory is ok as long as everything
inside it is ok.  I tried zipping /proc/bus -r and what I got is what I'd
expect if I'd cat'ed every non-directory entry.  This is what I
expected.  Maybe it's not right - zipping /proc/kcore is kind of
interesting.  Regardless, we are no worse than proc here.  In fact,
since we don't anticipate putting an elephant like kcore in as a
device property, we're a little nicer to get along with.

Correct me if I'm wrong, but what we learn from the proc example
is that tarring your whole source tree starting at / is not something
you want to do.  Just extend that idea to /dev - however, if you do
it, it will produce pretty reasonable results.

What *won't* happen is, you won't get side effects from opening
your serial ports (you'd have to open them without O_DIRECTORY
to get that) so that seems like a little step forward.

I'm still thinking about some of your other comments.

--
Daniel

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-24 Thread Malcolm Beattie

[cc list reduced]

Andreas Dilger writes:
> PS - I used to think shrinking a filesystem online was useful, but there
>  are a huge amount of problems with this and very few real-life
>  benefits, as long as you can at least do offline shrinking.  With
>  proper LVM usage, the need to shrink a filesystem never really
>  happens in practise, unlike the partition case where you always
>  have to guess in advance how big a filesystem needs to be, and then
>  add 10% for a safety margin.  With LVM you just create the minimal
>  sized device you need now, and freely grow it in the future.

In an attempt to nudge you back towards your previous opinion: consider
a system-wide spool or tmp filesystem. It would be nice to be able to
add in a few extra volumes for a busy period but then shrink it down
again when usage returns to normal. In the absence of the ability to
shrink a live filesystem, storage management becomes a much harder job.
You can't throw in a spare volume or two where it's needed without
careful thought because you'll be ratchetting up the space on that one
filesystem without being able to change your mind and reduce it again
later. You'll end up with stingy storage admins who refuse to give you
a bunch of extra filesystem space for a while because they can't get it
back again afterwards.

--Malcolm

-- 
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-24 Thread Malcolm Beattie

[cc list reduced]

Andreas Dilger writes:
 PS - I used to think shrinking a filesystem online was useful, but there
  are a huge amount of problems with this and very few real-life
  benefits, as long as you can at least do offline shrinking.  With
  proper LVM usage, the need to shrink a filesystem never really
  happens in practise, unlike the partition case where you always
  have to guess in advance how big a filesystem needs to be, and then
  add 10% for a safety margin.  With LVM you just create the minimal
  sized device you need now, and freely grow it in the future.

In an attempt to nudge you back towards your previous opinion: consider
a system-wide spool or tmp filesystem. It would be nice to be able to
add in a few extra volumes for a busy period but then shrink it down
again when usage returns to normal. In the absence of the ability to
shrink a live filesystem, storage management becomes a much harder job.
You can't throw in a spare volume or two where it's needed without
careful thought because you'll be ratchetting up the space on that one
filesystem without being able to change your mind and reduce it again
later. You'll end up with stingy storage admins who refuse to give you
a bunch of extra filesystem space for a while because they can't get it
back again afterwards.

--Malcolm

-- 
Malcolm Beattie [EMAIL PROTECTED]
Unix Systems Programmer
Oxford University Computing Services
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-24 Thread Hans Reiser

Daniel Phillips wrote:
 
 On Tuesday 22 May 2001 22:10, Andreas Dilger wrote:
  Peter Braam writes:
   File system journal recovery can corrupt a snapshot, because it
   copies data that needs to be preserved in a snapshot. During
   journal replay such data may be copied again, but the source can
   have new data already.
 
  The way it is implemented in reiserfs is to wait for existing
  transactions to complete, entirely flush the journal and block all
  new transactions from starting.  Stephen implemented a journal flush
  API to do this for ext3, but the hooks to call it from LVM are not in
  place yet.  This way the journal is totally empty at the time the
  snapshot is done, so the read-only copy does not need to do journal
  recovery, so no problems can arise.
 
 I suppose I'm just reiterating the obvious, but we should eventually
 have a generic filesystem transaction API at the VFS level, once we
 have enough data points to know what the One True API should be.
 
 --
 Daniel
 -
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/


Daniel, implementing transactions is not a trivial thing as you probably know. 
It requires that you resolve such issues as, what happens if the user forgets to
close the transaction, issues of lock/transaction duration, of transaction
batching, of levels of isolation, of concurrent transactions modifying global fs
metadata and some but not all of those concurrent transactions receiving a
rollback, and of permissions relating to keeping transactions open.  I would
encourage you to participate in the reiser4 design discussion we will be having
over the next 6 months, and give us your opinions.  Josh will be leading that
design effort for the ReiserFS team.

Hans
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-24 Thread Alexander Viro



On Thu, 24 May 2001, Edgar Toernig wrote:

  What *won't* happen is, you won't get side effects from opening
  your serial ports (you'd have to open them without O_DIRECTORY
  to get that) so that seems like a little step forward.
 
 As already said: depending on O_DIRECTORY breaks POSIX compliance
 and that alone should kill this idea...

What really kills that idea is the fact that you can trick applications
into opening your serial ports _without_ O_DIRECTORY.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-24 Thread Daniel Phillips

On Thursday 24 May 2001 02:23, Edgar Toernig wrote:
 Daniel Phillips wrote:
It's going to be marked 'd', it's a directory, not a file.
  
   Aha.  So you lose the S_ISCHR/BLK attribute.
 
  Readdir fills in a directory type, so ls sees it as a directory and
  does the right thing.  On the other hand, we know we're on a device
  filesystem so we will next open the name as a regular file, and
  find ISCHR or ISBLK: good.

 ??? The kernel may know it, but the app?  Or do you really want to
 give different stat data on stat(2) and fstat(2)?  These flags are
 currently used by archive/backup prgs.  It's a hint that these files
 are not regular files and shouldn't be opened for reading.
 Having a 'd' would mean that they would really try to enter the
 directory and save it's contents.  Don't know what happens in this
 case to your special files ;-)

I guess that's much like the question 'what happens in proc?'.

Recursively entering the device directory is ok as long as everything
inside it is ok.  I tried zipping /proc/bus -r and what I got is what I'd
expect if I'd cat'ed every non-directory entry.  This is what I
expected.  Maybe it's not right - zipping /proc/kcore is kind of
interesting.  Regardless, we are no worse than proc here.  In fact,
since we don't anticipate putting an elephant like kcore in as a
device property, we're a little nicer to get along with.

Correct me if I'm wrong, but what we learn from the proc example
is that tarring your whole source tree starting at / is not something
you want to do.  Just extend that idea to /dev - however, if you do
it, it will produce pretty reasonable results.

What *won't* happen is, you won't get side effects from opening
your serial ports (you'd have to open them without O_DIRECTORY
to get that) so that seems like a little step forward.

I'm still thinking about some of your other comments.

--
Daniel

--
Daniel
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-24 Thread Daniel Phillips

On Tuesday 22 May 2001 22:10, Andreas Dilger wrote:
 Peter Braam writes:
  File system journal recovery can corrupt a snapshot, because it
  copies data that needs to be preserved in a snapshot. During
  journal replay such data may be copied again, but the source can
  have new data already.

 The way it is implemented in reiserfs is to wait for existing
 transactions to complete, entirely flush the journal and block all
 new transactions from starting.  Stephen implemented a journal flush
 API to do this for ext3, but the hooks to call it from LVM are not in
 place yet.  This way the journal is totally empty at the time the
 snapshot is done, so the read-only copy does not need to do journal
 recovery, so no problems can arise.

I suppose I'm just reiterating the obvious, but we should eventually
have a generic filesystem transaction API at the VFS level, once we
have enough data points to know what the One True API should be.

--
Daniel
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-23 Thread Edgar Toernig

Daniel Phillips wrote:
> On Wednesday 23 May 2001 06:19, Edgar Toernig wrote:
> > Daniel Phillips wrote:
> > > On Tuesday 22 May 2001 17:24, Oliver Xymoron wrote:
> > > > On Mon, 21 May 2001, Daniel Phillips wrote:
> > > > > On Monday 21 May 2001 19:16, Oliver Xymoron wrote:
> > > > > > What I'd like to see:
> > > > > >
> > > > > > - An interface for registering an array of related devices
> > > > > > (almost always two: raw and ctl) and their legacy device
> > > > > > numbers with a single userspace callout that does whatever
> > > > > > /dev/ creation needs to be done. Thus, naming and permissions
> > > > > > live in user space. No "device node is also a directory"
> > > > > > weirdness...
> > > > >
> > > > > Could you be specific about what is weird about it?
> > > >
> > > > *boggle*
> > > >
> > > >[general sense of unease]
> >
> > I fully agree with Oliver.  It's an abomination.
> 
> We are, or at least, I am, investigating this question purely on
> technical grounds - name calling is a noop.

Right.  But sometimes new ideas raise these kind of feelings ;)

> > > It's going to be marked 'd', it's a directory, not a file.
> >
> > Aha.  So you lose the S_ISCHR/BLK attribute.
> 
> Readdir fills in a directory type, so ls sees it as a directory and does
> the right thing.  On the other hand, we know we're on a device
> filesystem so we will next open the name as a regular file, and find
> ISCHR or ISBLK: good.

??? The kernel may know it, but the app?  Or do you really want to
give different stat data on stat(2) and fstat(2)?  These flags are
currently used by archive/backup prgs.  It's a hint that these files
are not regular files and shouldn't be opened for reading.
Having a 'd' would mean that they would really try to enter the
directory and save it's contents.  Don't know what happens in this
case to your "special" files ;-)

> The rule for this filesystem is: if you open with O_DIRECTORY then
> directory operations are permitted, nothing else.  If you open without
> O_DIRECTORY then directory operations are forbidden (as
> usual) and normal device semantics apply.

As usual?  I think you've just changed the rules for O_DIRECTORY.  Up
to now it's only a flag that tells open it should fail if the name
does not refer to a directory.  Nothing else.  It was introduced to
remove a race condition in user space applications.  Especially it
is optional - everything works the same whether you give the flag
or not (except the race avoidance of course).  And there are a lot
of programs that do not use O_DIRECTORY (it's a Linux private flag,
not even mentioned in POSIX).  Every program that does:

fd = open(foo, O_RDONLY);
fchdir(fd);
x = opendir(".")

will break.  And that is POSIX conform.  And I know that there are
programs that use this when recursively scanning directories (avoids
name mangling and repeated name lookups of the directory on later
stat calls).

> > Directories are not allowed to be read from/written to.  The VFS may
> > support it, but it's not (current) UNIX.
> 
> Here, we obey this rule: if you open it with O_DIRECTORY then you
> can't read from or write to it.

IMHO you've just invented opendir(2).

> Nothing breaks here, ls works as it always did.
> 
> This is what ls does:
> 
> open("foobar", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
> fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
> fcntl64(0x3, 0x2, 0x1, 0x2) = -1 ENOSYS (Function not implemented)
> fcntl(3, F_SETFD, FD_CLOEXEC)   = 0
> brk(0x805b000)  = 0x805b000
> getdents64(0x3, 0x8058270, 0x1000, 0x26) = -1 ENOSYS (Function not implemented)
> getdents(3, /* 2 entries */, 2980)  = 28
> getdents(3, /* 0 entries */, 2980)  = 0
> close(3)= 0
> 
> Note that ls doesn't do anything as inconvenient as opening
> foobar as a normal file first, expecting that operation to fail.

Well, your ls does not work "as it always did".  Here's an strace of
my libc5 system ls:

open(".", O_RDONLY) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)   = 0
getdents(3, /* 64 entries */, 4096) = 1216
getdents(3, /* 9 entries */, 4096)  = 168
getdents(3, /* 0 entries */, 4096)  = 0
close(3)= 0

And my find(1) does:

open(".", O_RDONLY) = 3
[scan all dirs]
fchdir(3)   = 0

to return to its initial dir.  Will break too.

> No, you would get side effects only if you open as a regular file.

IMHO your assumption that opening a dir _requires_ O_DIRECTORY is
wrong.  You've put in a new semantic that has not been there and
that will break programs and POSIX conformance.

> Please, if you know something that actually breaks, tell me.

Yeah, see above ;)

Ciao, ET.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-23 Thread Daniel Phillips

On Wednesday 23 May 2001 06:19, Edgar Toernig wrote:
> Daniel Phillips wrote:
> > On Tuesday 22 May 2001 17:24, Oliver Xymoron wrote:
> > > On Mon, 21 May 2001, Daniel Phillips wrote:
> > > > On Monday 21 May 2001 19:16, Oliver Xymoron wrote:
> > > > > What I'd like to see:
> > > > >
> > > > > - An interface for registering an array of related devices
> > > > > (almost always two: raw and ctl) and their legacy device
> > > > > numbers with a single userspace callout that does whatever
> > > > > /dev/ creation needs to be done. Thus, naming and permissions
> > > > > live in user space. No "device node is also a directory"
> > > > > weirdness...
> > > >
> > > > Could you be specific about what is weird about it?
> > >
> > > *boggle*
> > >
> > >[general sense of unease]
>
> I fully agree with Oliver.  It's an abomination.

We are, or at least, I am, investigating this question purely on
technical grounds - name calling is a noop.  I'd be happy to find a
real reason why this is a bad idea but so far none has been
presented.

Don't get me wrong, the fact that people I respect have reservations
about the idea does mean something to me, but this still needs to be
investigated properly.  Now on to the technical content...

> > > I don't think it's likely to be even workable. Just consider the
> > > directory entry for a moment - is it going to be marked d or
> > > [cb]?
> >
> > It's going to be marked 'd', it's a directory, not a file.
>
> Aha.  So you lose the S_ISCHR/BLK attribute.

Readdir fills in a directory type, so ls sees it as a directory and does
the right thing.  On the other hand, we know we're on a device 
filesystem so we will next open the name as a regular file, and find
ISCHR or ISBLK: good.

The rule for this filesystem is: if you open with O_DIRECTORY then
directory operations are permitted, nothing else.  If you open without
O_DIRECTORY then directory operations are forbidden (as
usual) and normal device semantics apply.

If there is weirdness anywhere, it's right here with this rule.  The
question is: what if anything breaks?

> > > If it doesn't have the directory bit set, Midnight commander
> > > won't let me look at it, and I wouldn't blame cd or ls for
> > > complaining. If it does have the 'd' bit set, I wouldn't blame
> > > cp, tar, find, or a million other programs if they did the wrong
> > > thing. They've had 30 years to expect that files aren't
> > > directories. They're going to act weird.
> >
> > No problem, it's a directory.
>
> Directories are not allowed to be read from/written to.  The VFS may
> support it, but it's not (current) UNIX.

Here, we obey this rule: if you open it with O_DIRECTORY then you
can't read from or write to it.

> > > Linus has been kicking this idea around for a couple years now
> > > and it's still a cute solution looking for a problem. It just
> > > doesn't belong in UNIX.
> >
> > Hmm, ok, do we still have any *technical* reasons?
>
> So with your definition, I have a fs-object that is marked as a
> directory but opening it opens a device.  Pretty nice..

No, you have to open it without O_DIRECTORY to get your device
fd handle.

> How I'm supposed to list it's contents?  open+readdir?

Nothing breaks here, ls works as it always did.

This is what ls does:

open("foobar", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
fcntl64(0x3, 0x2, 0x1, 0x2) = -1 ENOSYS (Function not implemented)
fcntl(3, F_SETFD, FD_CLOEXEC)   = 0
brk(0x805b000)  = 0x805b000
getdents64(0x3, 0x8058270, 0x1000, 0x26) = -1 ENOSYS (Function not implemented)
getdents(3, /* 2 entries */, 2980)  = 28
getdents(3, /* 0 entries */, 2980)  = 0
close(3)= 0

Note that ls doesn't do anything as inconvenient as opening 
foobar as a normal file first, expecting that operation to fail.

> But the open has nasty side effects.
> So you have a directory that you are not allowed
> to list (because of the possible side effects) but is allowed to be
> read from/written to maybe even issue ioctls to?. 

No, you would get side effects only if you open as a regular file.
I'd agree that that sucks, but that's not what we're trying to fix
just now.

> And you call that sane???

I would hope it seems saner now, after the clarification.
Please, if you know something that actually breaks, tell me.

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-23 Thread Daniel Phillips

On Wednesday 23 May 2001 06:19, Edgar Toernig wrote:
> IMO the whole idea of arguments following the device name is junk
> (incl a "/ctrl").

You know I didn't suggest that, right?  I find it pretty strange too, but
I'm listening to hear the technical arguments.

> Just think about the implications of the original "/dev/ttyS0/19200"
> suggestion.  It sounds nice and tempting.  But which programs will
> benefit.  Which gets confused.  What will be cleaned up.  After some
> thoughts you'll find out that it's useless ;-)

You know I didn't suggest that either, right?  But I'm with you, I don't
like it at'all, not least because we might change baud rate on the fly.

> And with special "ctrl" devices (ie /dev/ttyS0 and /dev/ttyS0ctrl):
> This _may_ work for some kind of devices.  But serial ports are one
> example where it simply will _not_.  It requires that you know the
> name of the device.  For ttys this is often not the case.
> Even if you manage to get some name for stdin for example - now I 
> should simply attach a "ctrl" to that name to get a control channel???
> At least dangerous.  If I'm lucky I only get an EPERM...

Again, I'll provide a sympathetic ear, but it wasn't my suggestion.

> Ciao, ET.

And you were referring to who?

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-23 Thread Daniel Phillips

On Wednesday 23 May 2001 06:19, Edgar Toernig wrote:
 Daniel Phillips wrote:
  On Tuesday 22 May 2001 17:24, Oliver Xymoron wrote:
   On Mon, 21 May 2001, Daniel Phillips wrote:
On Monday 21 May 2001 19:16, Oliver Xymoron wrote:
 What I'd like to see:

 - An interface for registering an array of related devices
 (almost always two: raw and ctl) and their legacy device
 numbers with a single userspace callout that does whatever
 /dev/ creation needs to be done. Thus, naming and permissions
 live in user space. No device node is also a directory
 weirdness...
   
Could you be specific about what is weird about it?
  
   *boggle*
  
  [general sense of unease]

 I fully agree with Oliver.  It's an abomination.

We are, or at least, I am, investigating this question purely on
technical grounds - name calling is a noop.  I'd be happy to find a
real reason why this is a bad idea but so far none has been
presented.

Don't get me wrong, the fact that people I respect have reservations
about the idea does mean something to me, but this still needs to be
investigated properly.  Now on to the technical content...

   I don't think it's likely to be even workable. Just consider the
   directory entry for a moment - is it going to be marked d or
   [cb]?
 
  It's going to be marked 'd', it's a directory, not a file.

 Aha.  So you lose the S_ISCHR/BLK attribute.

Readdir fills in a directory type, so ls sees it as a directory and does
the right thing.  On the other hand, we know we're on a device 
filesystem so we will next open the name as a regular file, and find
ISCHR or ISBLK: good.

The rule for this filesystem is: if you open with O_DIRECTORY then
directory operations are permitted, nothing else.  If you open without
O_DIRECTORY then directory operations are forbidden (as
usual) and normal device semantics apply.

If there is weirdness anywhere, it's right here with this rule.  The
question is: what if anything breaks?

   If it doesn't have the directory bit set, Midnight commander
   won't let me look at it, and I wouldn't blame cd or ls for
   complaining. If it does have the 'd' bit set, I wouldn't blame
   cp, tar, find, or a million other programs if they did the wrong
   thing. They've had 30 years to expect that files aren't
   directories. They're going to act weird.
 
  No problem, it's a directory.

 Directories are not allowed to be read from/written to.  The VFS may
 support it, but it's not (current) UNIX.

Here, we obey this rule: if you open it with O_DIRECTORY then you
can't read from or write to it.

   Linus has been kicking this idea around for a couple years now
   and it's still a cute solution looking for a problem. It just
   doesn't belong in UNIX.
 
  Hmm, ok, do we still have any *technical* reasons?

 So with your definition, I have a fs-object that is marked as a
 directory but opening it opens a device.  Pretty nice..

No, you have to open it without O_DIRECTORY to get your device
fd handle.

 How I'm supposed to list it's contents?  open+readdir?

Nothing breaks here, ls works as it always did.

This is what ls does:

open(foobar, O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
fcntl64(0x3, 0x2, 0x1, 0x2) = -1 ENOSYS (Function not implemented)
fcntl(3, F_SETFD, FD_CLOEXEC)   = 0
brk(0x805b000)  = 0x805b000
getdents64(0x3, 0x8058270, 0x1000, 0x26) = -1 ENOSYS (Function not implemented)
getdents(3, /* 2 entries */, 2980)  = 28
getdents(3, /* 0 entries */, 2980)  = 0
close(3)= 0

Note that ls doesn't do anything as inconvenient as opening 
foobar as a normal file first, expecting that operation to fail.

 But the open has nasty side effects.
 So you have a directory that you are not allowed
 to list (because of the possible side effects) but is allowed to be
 read from/written to maybe even issue ioctls to?. 

No, you would get side effects only if you open as a regular file.
I'd agree that that sucks, but that's not what we're trying to fix
just now.

 And you call that sane???

I would hope it seems saner now, after the clarification.
Please, if you know something that actually breaks, tell me.

--
Daniel
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-23 Thread Edgar Toernig

Daniel Phillips wrote:
 On Wednesday 23 May 2001 06:19, Edgar Toernig wrote:
  Daniel Phillips wrote:
   On Tuesday 22 May 2001 17:24, Oliver Xymoron wrote:
On Mon, 21 May 2001, Daniel Phillips wrote:
 On Monday 21 May 2001 19:16, Oliver Xymoron wrote:
  What I'd like to see:
 
  - An interface for registering an array of related devices
  (almost always two: raw and ctl) and their legacy device
  numbers with a single userspace callout that does whatever
  /dev/ creation needs to be done. Thus, naming and permissions
  live in user space. No device node is also a directory
  weirdness...

 Could you be specific about what is weird about it?
   
*boggle*
   
   [general sense of unease]
 
  I fully agree with Oliver.  It's an abomination.
 
 We are, or at least, I am, investigating this question purely on
 technical grounds - name calling is a noop.

Right.  But sometimes new ideas raise these kind of feelings ;)

   It's going to be marked 'd', it's a directory, not a file.
 
  Aha.  So you lose the S_ISCHR/BLK attribute.
 
 Readdir fills in a directory type, so ls sees it as a directory and does
 the right thing.  On the other hand, we know we're on a device
 filesystem so we will next open the name as a regular file, and find
 ISCHR or ISBLK: good.

??? The kernel may know it, but the app?  Or do you really want to
give different stat data on stat(2) and fstat(2)?  These flags are
currently used by archive/backup prgs.  It's a hint that these files
are not regular files and shouldn't be opened for reading.
Having a 'd' would mean that they would really try to enter the
directory and save it's contents.  Don't know what happens in this
case to your special files ;-)

 The rule for this filesystem is: if you open with O_DIRECTORY then
 directory operations are permitted, nothing else.  If you open without
 O_DIRECTORY then directory operations are forbidden (as
 usual) and normal device semantics apply.

As usual?  I think you've just changed the rules for O_DIRECTORY.  Up
to now it's only a flag that tells open it should fail if the name
does not refer to a directory.  Nothing else.  It was introduced to
remove a race condition in user space applications.  Especially it
is optional - everything works the same whether you give the flag
or not (except the race avoidance of course).  And there are a lot
of programs that do not use O_DIRECTORY (it's a Linux private flag,
not even mentioned in POSIX).  Every program that does:

fd = open(foo, O_RDONLY);
fchdir(fd);
x = opendir(.)

will break.  And that is POSIX conform.  And I know that there are
programs that use this when recursively scanning directories (avoids
name mangling and repeated name lookups of the directory on later
stat calls).

  Directories are not allowed to be read from/written to.  The VFS may
  support it, but it's not (current) UNIX.
 
 Here, we obey this rule: if you open it with O_DIRECTORY then you
 can't read from or write to it.

IMHO you've just invented opendir(2).

 Nothing breaks here, ls works as it always did.
 
 This is what ls does:
 
 open(foobar, O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
 fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
 fcntl64(0x3, 0x2, 0x1, 0x2) = -1 ENOSYS (Function not implemented)
 fcntl(3, F_SETFD, FD_CLOEXEC)   = 0
 brk(0x805b000)  = 0x805b000
 getdents64(0x3, 0x8058270, 0x1000, 0x26) = -1 ENOSYS (Function not implemented)
 getdents(3, /* 2 entries */, 2980)  = 28
 getdents(3, /* 0 entries */, 2980)  = 0
 close(3)= 0
 
 Note that ls doesn't do anything as inconvenient as opening
 foobar as a normal file first, expecting that operation to fail.

Well, your ls does not work as it always did.  Here's an strace of
my libc5 system ls:

open(., O_RDONLY) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)   = 0
getdents(3, /* 64 entries */, 4096) = 1216
getdents(3, /* 9 entries */, 4096)  = 168
getdents(3, /* 0 entries */, 4096)  = 0
close(3)= 0

And my find(1) does:

open(., O_RDONLY) = 3
[scan all dirs]
fchdir(3)   = 0

to return to its initial dir.  Will break too.

 No, you would get side effects only if you open as a regular file.

IMHO your assumption that opening a dir _requires_ O_DIRECTORY is
wrong.  You've put in a new semantic that has not been there and
that will break programs and POSIX conformance.

 Please, if you know something that actually breaks, tell me.

Yeah, see above ;)

Ciao, ET.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-23 Thread Daniel Phillips

On Wednesday 23 May 2001 06:19, Edgar Toernig wrote:
 IMO the whole idea of arguments following the device name is junk
 (incl a /ctrl).

You know I didn't suggest that, right?  I find it pretty strange too, but
I'm listening to hear the technical arguments.

 Just think about the implications of the original /dev/ttyS0/19200
 suggestion.  It sounds nice and tempting.  But which programs will
 benefit.  Which gets confused.  What will be cleaned up.  After some
 thoughts you'll find out that it's useless ;-)

You know I didn't suggest that either, right?  But I'm with you, I don't
like it at'all, not least because we might change baud rate on the fly.

 And with special ctrl devices (ie /dev/ttyS0 and /dev/ttyS0ctrl):
 This _may_ work for some kind of devices.  But serial ports are one
 example where it simply will _not_.  It requires that you know the
 name of the device.  For ttys this is often not the case.
 Even if you manage to get some name for stdin for example - now I 
 should simply attach a ctrl to that name to get a control channel???
 At least dangerous.  If I'm lucky I only get an EPERM...

Again, I'll provide a sympathetic ear, but it wasn't my suggestion.

 Ciao, ET.

And you were referring to who?

--
Daniel
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-22 Thread Alexander Viro



On Wed, 23 May 2001, Edgar Toernig wrote:

> And with special "ctrl" devices (ie /dev/ttyS0 and /dev/ttyS0ctrl):
> This _may_ work for some kind of devices.  But serial ports are one
> example where it simply will _not_.  It requires that you know the

That's quite funny, you know...


From: Dennis Ritchie ([EMAIL PROTECTED])
Subject: Re: Plan 9 (was Re: Rubouts)
Newsgroups: alt.folklore.computers
Date: 1998/10/12
   
Neil Franklin wrote:
>
> No ioctl()s?
>
> Something like:echo "38400,8,n,1" > /ioctrl/ttyS0?
>
> Now that would be cool.
>
Exactly like that, though it would be /dev/eia80ctl .
No ioctl().

> Is there anyone who has an URL about Plan 9. Code download?
>

 http://plan9.bell-labs.com/plan9


Dennis


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-22 Thread Edgar Toernig

Daniel Phillips wrote:
> 
> On Tuesday 22 May 2001 17:24, Oliver Xymoron wrote:
> > On Mon, 21 May 2001, Daniel Phillips wrote:
> > > On Monday 21 May 2001 19:16, Oliver Xymoron wrote:
> > > > What I'd like to see:
> > > >
> > > > - An interface for registering an array of related devices
> > > > (almost always two: raw and ctl) and their legacy device numbers
> > > > with a single userspace callout that does whatever /dev/ creation
> > > > needs to be done. Thus, naming and permissions live in user
> > > > space. No "device node is also a directory" weirdness...
> > >
> > > Could you be specific about what is weird about it?
> >
> > *boggle*
> >
> >[general sense of unease]

I fully agree with Oliver.  It's an abomination.

> > I don't think it's likely to be even workable. Just consider the
> > directory entry for a moment - is it going to be marked d or [cb]?
> 
> It's going to be marked 'd', it's a directory, not a file.

Aha.  So you lose the S_ISCHR/BLK attribute.

> > If it doesn't have the directory bit set, Midnight commander won't
> > let me look at it, and I wouldn't blame cd or ls for complaining. If it
> > does have the 'd' bit set, I wouldn't blame cp, tar, find, or a
> > million other programs if they did the wrong thing. They've had 30
> > years to expect that files aren't directories. They're going to act
> > weird.
> 
> No problem, it's a directory.

Directories are not allowed to be read from/written to.  The VFS may
support it, but it's not (current) UNIX.

> > Linus has been kicking this idea around for a couple years now and
> > it's still a cute solution looking for a problem. It just doesn't
> > belong in UNIX.
> 
> Hmm, ok, do we still have any *technical* reasons?

So with your definition, I have a fs-object that is marked as a directory
but opening it opens a device.  Pretty nice.  How I'm supposed to list
it's contents?  open+readdir?  But the open has nasty side effects.
So you have a directory that you are not allowed to list (because of the
possible side effects) but is allowed to be read from/written to maybe
even issue ioctls to?.  And you call that sane???

IMO the whole idea of arguments following the device name is junk (incl
a "/ctrl").

Just think about the implications of the original "/dev/ttyS0/19200"
suggestion.  It sounds nice and tempting.  But which programs will
benefit.  Which gets confused.  What will be cleaned up.  After some
thoughts you'll find out that it's useless ;-)

And with special "ctrl" devices (ie /dev/ttyS0 and /dev/ttyS0ctrl):
This _may_ work for some kind of devices.  But serial ports are one
example where it simply will _not_.  It requires that you know the
name of the device.  For ttys this is often not the case.  Even if
you manage to get some name for stdin for example - now I should
simply attach a "ctrl" to that name to get a control channel???
At least dangerous.  If I'm lucky I only get an EPERM...

Ciao, ET.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-22 Thread Peter J. Braam


Andreas,


I think that the issue is something different.  Suppose the snapshot has
been created. I know that this can be done safely with the API's you
allude to. Life goes on and the journal FS keeps changing the file system
and if the system doesn't crash, everything is fine: blocks get copied
correctly from the primary volume to the snapshot volume.

Now consider a crash -- not during snapshot creation, but way after that
when "life is going on".  Suppose there is a two block transaction that
has made it to the journal and after writing one block to the fs location
the system crashes.  The journal replay will try to write that block
again.

But during recovery, LVM cannot possibly know if the whole process of
copying out the data from the current to the snapshot area completed
during the previous run. Yes, LVM updates the redirection table first and
then copies, but, still, you don't know _where exactly_ the writes stopped
happening and in particular you don't know if the block was copied already
or not.

So during replay it is quite possible that LVM corrupts the snapshot.

It's better to keep the snapshot in the old volume and write the new data
to a separate area (that's what most commercial systems do I think).  It
avoid redirections and copying upon write.  When you delete the snapshot
you have to copy, but you can do that as a low priority process.
Finally, as you pointed out a full volume is handled better too in that
way, since you don't terminate the snapshot but you tell the current
volume that it is full.

Hmm, I was expecting a storm of email explaining what I have
misunderstood, but it has in fact been rather quiet...

- Peter -






On Tue, 22 May 2001, Andreas Dilger wrote:

> Peter Braam writes:
> > On Tue, 22 May 2001, Andreas Dilger wrote:
> > > Actually, the LVM snapshot
> > > interface has (optional) hooks into the filesystem to ensure that it
> > > is consistent at the time the snapshot is created.
> >
> > File system journal recovery can corrupt a snapshot, because it copies
> > data that needs to be preserved in a snapshot. During journal replay such
> > data may be copied again, but the source can have new data already.
>
> The way it is implemented in reiserfs is to wait for existing transactions
> to complete, entirely flush the journal and block all new transactions from
> starting.  Stephen implemented a journal flush API to do this for ext3, but
> the hooks to call it from LVM are not in place yet.  This way the journal is
> totally empty at the time the snapshot is done, so the read-only copy does
> not need to do journal recovery, so no problems can arise.
>
> Cheers, Andreas
>

-- 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-22 Thread Daniel Phillips

On Tuesday 22 May 2001 19:49, Oliver Xymoron wrote:
> On Tue, 22 May 2001, Daniel Phillips wrote:
> > > I don't think it's likely to be even workable. Just consider the
> > > directory entry for a moment - is it going to be marked d or
> > > [cb]?
> >
> > It's going to be marked 'd', it's a directory, not a file.
>
> Are we talking about the same proposal?  The one where I can open
> /dev/dsp and /dev/dsp/ctl? But I can still do 'cat /dev/hda >
> /dev/dsp'?

We already support read/write on directories in the VFS, that's not a
problem.

> It's still a file. If it's not a file anymore, it ain't UNIX.

It's a file with the directory bit set, I believe that's UNIX.

> > > If it doesn't have the directory bit set, Midnight commander
> > > won't let me look at it, and I wouldn't blame cd or ls for
> > > complaining. If it does have the 'd' bit set, I wouldn't blame
> > > cp, tar, find, or a million other programs if they did the wrong
> > > thing. They've had 30 years to expect that files aren't
> > > directories. They're going to act weird.
> >
> > No problem, it's a directory.
> >
> > > Linus has been kicking this idea around for a couple years now
> > > and it's still a cute solution looking for a problem. It just
> > > doesn't belong in UNIX.
> >
> > Hmm, ok, do we still have any *technical* reasons?
>
> If you define *technical* to not include design, sure.

Sorry, I don't see what you mean, do you mean the design is
difficult?

> Oh, did I mention unnecessary, solvable in userspace?

That's exactly the point: the generic filesystem allows all the
funny-shaped stuff to be dealt with in user space.  The
filesystem itself is lovely and clean.

BTW, I didn't realize I was reinventing Linus's wheel, this just
seemed very obvious and natural to me.  So I had to believe
there's a technical obstacle somewhere.

Has anyone written code to demonstrate the idea?

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-22 Thread Peter J. Braam


On Tue, 22 May 2001, Linus Torvalds wrote:
>


> On Tue, 22 May 2001, Andreas Dilger wrote:  Actually, the LVM snapshot
> interface has (optional) hooks into the filesystem to ensure that it
> is consistent at the time the snapshot is created.

But I think that LVM is implemented "the wrong way around".

File system journal recovery can corrupt a snapshot, because it copies
data that needs to be preserved in a snapshot. During journal replay such
data may be copied again, but the source can have new data already.

Most LVM snapshot systems write the new data in the separate volume and
don't copy the old data that eliminates this problem (and also eliminates
the copy of data but introduces data copy when a snapshot is removed).

- Peter -

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-22 Thread Linus Torvalds


On Tue, 22 May 2001, Andreas Dilger wrote:
> 
> Actually, the LVM snapshot interface has (optional) hooks into the filesystem
> to ensure that it is consistent at the time the snapshot is created.

Note that this is still fundamentally a broken interface: the filesystem
may not _have_ a block device underneath it, yet you might very well like
to do defragmentation and backup none-the-less.

Also, lvm snapshots are fundamentally limited to read-only data, which
means that the LVM interfaces cannot be used for defragmentation and lazy
fsck etc anyway. You _have_ to do those at a filesystem level.

disk snapshots are useful, but they are not the answer.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-22 Thread Daniel Phillips

On Tuesday 22 May 2001 17:24, Oliver Xymoron wrote:
> On Mon, 21 May 2001, Daniel Phillips wrote:
> > On Monday 21 May 2001 19:16, Oliver Xymoron wrote:
> > > What I'd like to see:
> > >
> > > - An interface for registering an array of related devices
> > > (almost always two: raw and ctl) and their legacy device numbers
> > > with a single userspace callout that does whatever /dev/ creation
> > > needs to be done. Thus, naming and permissions live in user
> > > space. No "device node is also a directory" weirdness...
> >
> > Could you be specific about what is weird about it?
>
> *boggle*
>
>[general sense of unease]
>
> I don't think it's likely to be even workable. Just consider the
> directory entry for a moment - is it going to be marked d or [cb]?

It's going to be marked 'd', it's a directory, not a file.

> If it doesn't have the directory bit set, Midnight commander won't
> let me look at it, and I wouldn't blame cd or ls for complaining. If it
> does have the 'd' bit set, I wouldn't blame cp, tar, find, or a
> million other programs if they did the wrong thing. They've had 30
> years to expect that files aren't directories. They're going to act
> weird.

No problem, it's a directory.

> Linus has been kicking this idea around for a couple years now and
> it's still a cute solution looking for a problem. It just doesn't
> belong in UNIX.

Hmm, ok, do we still have any *technical* reasons?

--
Daniel

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-22 Thread Daniel Phillips

On Monday 21 May 2001 19:16, Oliver Xymoron wrote:
> What I'd like to see:
>
> - An interface for registering an array of related devices (almost
> always two: raw and ctl) and their legacy device numbers with a
> single userspace callout that does whatever /dev/ creation needs to
> be done. Thus, naming and permissions live in user space. No "device
> node is also a directory" weirdness...

Could you be specific about what is weird about it?

> ...which is overkill in the vast majority of cases.

--
Daniel

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-22 Thread Daniel Phillips

On Tuesday 22 May 2001 17:24, Oliver Xymoron wrote:
 On Mon, 21 May 2001, Daniel Phillips wrote:
  On Monday 21 May 2001 19:16, Oliver Xymoron wrote:
   What I'd like to see:
  
   - An interface for registering an array of related devices
   (almost always two: raw and ctl) and their legacy device numbers
   with a single userspace callout that does whatever /dev/ creation
   needs to be done. Thus, naming and permissions live in user
   space. No device node is also a directory weirdness...
 
  Could you be specific about what is weird about it?

 *boggle*

[general sense of unease]

 I don't think it's likely to be even workable. Just consider the
 directory entry for a moment - is it going to be marked d or [cb]?

It's going to be marked 'd', it's a directory, not a file.

 If it doesn't have the directory bit set, Midnight commander won't
 let me look at it, and I wouldn't blame cd or ls for complaining. If it
 does have the 'd' bit set, I wouldn't blame cp, tar, find, or a
 million other programs if they did the wrong thing. They've had 30
 years to expect that files aren't directories. They're going to act
 weird.

No problem, it's a directory.

 Linus has been kicking this idea around for a couple years now and
 it's still a cute solution looking for a problem. It just doesn't
 belong in UNIX.

Hmm, ok, do we still have any *technical* reasons?

--
Daniel

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-22 Thread Alexander Viro



On Wed, 23 May 2001, Edgar Toernig wrote:

 And with special ctrl devices (ie /dev/ttyS0 and /dev/ttyS0ctrl):
 This _may_ work for some kind of devices.  But serial ports are one
 example where it simply will _not_.  It requires that you know the

That's quite funny, you know...


From: Dennis Ritchie ([EMAIL PROTECTED])
Subject: Re: Plan 9 (was Re: Rubouts)
Newsgroups: alt.folklore.computers
Date: 1998/10/12
   
Neil Franklin wrote:

 No ioctl()s?

 Something like:echo 38400,8,n,1  /ioctrl/ttyS0?

 Now that would be cool.

Exactly like that, though it would be /dev/eia80ctl .
No ioctl().

 Is there anyone who has an URL about Plan 9. Code download?


 http://plan9.bell-labs.com/plan9


Dennis


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-22 Thread Edgar Toernig

Daniel Phillips wrote:
 
 On Tuesday 22 May 2001 17:24, Oliver Xymoron wrote:
  On Mon, 21 May 2001, Daniel Phillips wrote:
   On Monday 21 May 2001 19:16, Oliver Xymoron wrote:
What I'd like to see:
   
- An interface for registering an array of related devices
(almost always two: raw and ctl) and their legacy device numbers
with a single userspace callout that does whatever /dev/ creation
needs to be done. Thus, naming and permissions live in user
space. No device node is also a directory weirdness...
  
   Could you be specific about what is weird about it?
 
  *boggle*
 
 [general sense of unease]

I fully agree with Oliver.  It's an abomination.

  I don't think it's likely to be even workable. Just consider the
  directory entry for a moment - is it going to be marked d or [cb]?
 
 It's going to be marked 'd', it's a directory, not a file.

Aha.  So you lose the S_ISCHR/BLK attribute.

  If it doesn't have the directory bit set, Midnight commander won't
  let me look at it, and I wouldn't blame cd or ls for complaining. If it
  does have the 'd' bit set, I wouldn't blame cp, tar, find, or a
  million other programs if they did the wrong thing. They've had 30
  years to expect that files aren't directories. They're going to act
  weird.
 
 No problem, it's a directory.

Directories are not allowed to be read from/written to.  The VFS may
support it, but it's not (current) UNIX.

  Linus has been kicking this idea around for a couple years now and
  it's still a cute solution looking for a problem. It just doesn't
  belong in UNIX.
 
 Hmm, ok, do we still have any *technical* reasons?

So with your definition, I have a fs-object that is marked as a directory
but opening it opens a device.  Pretty nice.  How I'm supposed to list
it's contents?  open+readdir?  But the open has nasty side effects.
So you have a directory that you are not allowed to list (because of the
possible side effects) but is allowed to be read from/written to maybe
even issue ioctls to?.  And you call that sane???

IMO the whole idea of arguments following the device name is junk (incl
a /ctrl).

Just think about the implications of the original /dev/ttyS0/19200
suggestion.  It sounds nice and tempting.  But which programs will
benefit.  Which gets confused.  What will be cleaned up.  After some
thoughts you'll find out that it's useless ;-)

And with special ctrl devices (ie /dev/ttyS0 and /dev/ttyS0ctrl):
This _may_ work for some kind of devices.  But serial ports are one
example where it simply will _not_.  It requires that you know the
name of the device.  For ttys this is often not the case.  Even if
you manage to get some name for stdin for example - now I should
simply attach a ctrl to that name to get a control channel???
At least dangerous.  If I'm lucky I only get an EPERM...

Ciao, ET.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-22 Thread Linus Torvalds


On Tue, 22 May 2001, Andreas Dilger wrote:
 
 Actually, the LVM snapshot interface has (optional) hooks into the filesystem
 to ensure that it is consistent at the time the snapshot is created.

Note that this is still fundamentally a broken interface: the filesystem
may not _have_ a block device underneath it, yet you might very well like
to do defragmentation and backup none-the-less.

Also, lvm snapshots are fundamentally limited to read-only data, which
means that the LVM interfaces cannot be used for defragmentation and lazy
fsck etc anyway. You _have_ to do those at a filesystem level.

disk snapshots are useful, but they are not the answer.

Linus

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-22 Thread Peter J. Braam


On Tue, 22 May 2001, Linus Torvalds wrote:



 On Tue, 22 May 2001, Andreas Dilger wrote:  Actually, the LVM snapshot
 interface has (optional) hooks into the filesystem to ensure that it
 is consistent at the time the snapshot is created.

But I think that LVM is implemented the wrong way around.

File system journal recovery can corrupt a snapshot, because it copies
data that needs to be preserved in a snapshot. During journal replay such
data may be copied again, but the source can have new data already.

Most LVM snapshot systems write the new data in the separate volume and
don't copy the old data that eliminates this problem (and also eliminates
the copy of data but introduces data copy when a snapshot is removed).

- Peter -

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-22 Thread Peter J. Braam


Andreas,


I think that the issue is something different.  Suppose the snapshot has
been created. I know that this can be done safely with the API's you
allude to. Life goes on and the journal FS keeps changing the file system
and if the system doesn't crash, everything is fine: blocks get copied
correctly from the primary volume to the snapshot volume.

Now consider a crash -- not during snapshot creation, but way after that
when life is going on.  Suppose there is a two block transaction that
has made it to the journal and after writing one block to the fs location
the system crashes.  The journal replay will try to write that block
again.

But during recovery, LVM cannot possibly know if the whole process of
copying out the data from the current to the snapshot area completed
during the previous run. Yes, LVM updates the redirection table first and
then copies, but, still, you don't know _where exactly_ the writes stopped
happening and in particular you don't know if the block was copied already
or not.

So during replay it is quite possible that LVM corrupts the snapshot.

It's better to keep the snapshot in the old volume and write the new data
to a separate area (that's what most commercial systems do I think).  It
avoid redirections and copying upon write.  When you delete the snapshot
you have to copy, but you can do that as a low priority process.
Finally, as you pointed out a full volume is handled better too in that
way, since you don't terminate the snapshot but you tell the current
volume that it is full.

Hmm, I was expecting a storm of email explaining what I have
misunderstood, but it has in fact been rather quiet...

- Peter -






On Tue, 22 May 2001, Andreas Dilger wrote:

 Peter Braam writes:
  On Tue, 22 May 2001, Andreas Dilger wrote:
   Actually, the LVM snapshot
   interface has (optional) hooks into the filesystem to ensure that it
   is consistent at the time the snapshot is created.
 
  File system journal recovery can corrupt a snapshot, because it copies
  data that needs to be preserved in a snapshot. During journal replay such
  data may be copied again, but the source can have new data already.

 The way it is implemented in reiserfs is to wait for existing transactions
 to complete, entirely flush the journal and block all new transactions from
 starting.  Stephen implemented a journal flush API to do this for ext3, but
 the hooks to call it from LVM are not in place yet.  This way the journal is
 totally empty at the time the snapshot is done, so the read-only copy does
 not need to do journal recovery, so no problems can arise.

 Cheers, Andreas


-- 

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-22 Thread Daniel Phillips

On Tuesday 22 May 2001 19:49, Oliver Xymoron wrote:
 On Tue, 22 May 2001, Daniel Phillips wrote:
   I don't think it's likely to be even workable. Just consider the
   directory entry for a moment - is it going to be marked d or
   [cb]?
 
  It's going to be marked 'd', it's a directory, not a file.

 Are we talking about the same proposal?  The one where I can open
 /dev/dsp and /dev/dsp/ctl? But I can still do 'cat /dev/hda 
 /dev/dsp'?

We already support read/write on directories in the VFS, that's not a
problem.

 It's still a file. If it's not a file anymore, it ain't UNIX.

It's a file with the directory bit set, I believe that's UNIX.

   If it doesn't have the directory bit set, Midnight commander
   won't let me look at it, and I wouldn't blame cd or ls for
   complaining. If it does have the 'd' bit set, I wouldn't blame
   cp, tar, find, or a million other programs if they did the wrong
   thing. They've had 30 years to expect that files aren't
   directories. They're going to act weird.
 
  No problem, it's a directory.
 
   Linus has been kicking this idea around for a couple years now
   and it's still a cute solution looking for a problem. It just
   doesn't belong in UNIX.
 
  Hmm, ok, do we still have any *technical* reasons?

 If you define *technical* to not include design, sure.

Sorry, I don't see what you mean, do you mean the design is
difficult?

 Oh, did I mention unnecessary, solvable in userspace?

That's exactly the point: the generic filesystem allows all the
funny-shaped stuff to be dealt with in user space.  The
filesystem itself is lovely and clean.

BTW, I didn't realize I was reinventing Linus's wheel, this just
seemed very obvious and natural to me.  So I had to believe
there's a technical obstacle somewhere.

Has anyone written code to demonstrate the idea?

--
Daniel
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-21 Thread Pavel Machek

Hi!

> Yes, and that is exactly the difference between having a side effect
> on the open(2), versus having the effect as a result of a write(2).
> 
> Unfortunately, there are already some cases where an open
> on a device can have unexpected results.  If you don't want
> to get blocked waiting for the carrier-detect signal from the
> modem when opening a tty device, you had better specify the
> O_NONBLOCK option on the open.  If you don't want this flag
> to be active during the actual I/O operations, then you would
> have to do an fcntl to clear the O_NONBLOCK again after the open.
> 
> So I guess things have already been a bit messy in this
> area for many years, even before linux even existed, and
> in some cases you can't really do anything about it because
> the behaviour is mandated by the applicable standards, like
> POSIX, SUS, or whatever.
> (The blocking of the open on a tty device is explicitly
>  documented in my copy of the X/Open specification.)
> 
> Fortunately, blocking the nightly backup program by making it
> accidentally open a tty is not quite as catastrophic as having
> it start a nuclear war, or format the disks, or something,
> just because a user was playing games with symlinks.

Maybe not *as* catastrophic, but security hole, anyway. User should
not be able to block system backups.

Small demonstration for bugtraq, anyone?
Pavel
-- 
I'm [EMAIL PROTECTED] "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH] device arguments from lookup)

2001-05-21 Thread Pavel Machek

Hi!

> A lot of stuff relies on the fact that close(open(foo, O_RDONLY)) is a
> no-op. Breaking that assumption is a Bad Thing(tm).

Then we have a problem. Just opening /dev/ttyS0 currently *has* side
effects (it is visible on modem lines from serial port; it can block
you forever). 

If this assumption is somewhere, we should fix that place... Or fix
serial ports.
Pavel
-- 
I'm [EMAIL PROTECTED] "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-21 Thread Pavel Machek

Hi!

> So I guess things have already been a bit messy in this
> area for many years, even before linux even existed, and
> in some cases you can't really do anything about it because
> the behaviour is mandated by the applicable standards, like
> POSIX, SUS, or whatever.
> (The blocking of the open on a tty device is explicitly
>  documented in my copy of the X/Open specification.)

If X/Open documents security hole, then, I guess, X/Open will have to
be changed.
Pavel
-- 
I'm [EMAIL PROTECTED] "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-21 Thread Oliver Xymoron

On Mon, 21 May 2001, David Lang wrote:

> what makes you think it's safe to say there's only one floppy drive?

Read as: it doesn't make sense to have per-fd state on a single floppy
device given that there's only one actual hardware instance associated
with it and multiple openers don't make sense. Opening a floppy at
different densities with magic filenames was an example Linus used earlier
in the thread. Surely there can be more than one drive and more than one
serial port.

> On Mon, 21 May 2001, Oliver Xymoron wrote:
>
> > On Sat, 19 May 2001, Alexander Viro wrote:
> >
> > > Let's distinguish between per-fd effects (that's what name in
> > > open(name, flags) is for - you are asking for descriptor and telling
> > > what behaviour do you want for IO on it) and system-wide side effects.
> > >
> > > IMO encoding the former into name is perfectly fine, and no write on
> > > another file can be sanely used for that purpose. For the latter, though,
> > > we need to write commands into files and here your miscdevices (or procfs
> > > files, or /dev/foo/ctl - whatever) is needed.
> >
> > I'm a little skeptical about the necessity of these per-fd effects in the
> > first place - after all, Plan 9 does without them.  There's only one
> > floppy drive, yes? No concurrent users of serial ports? The counter that
> > comes to mind is sound devices supporting multiple opens, but I think
> > esound and friends are a better solution to that problem.

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-21 Thread Oliver Xymoron

On Sat, 19 May 2001, Jeff Garzik wrote:

> Why are LVM and EVMS(competing LVM project) needed at all?
>
> Surely the same can be accomplished with
> * md
> * snapshot blkdev (attached in previous e-mail)
> * giving partitions and blkdevs the ability to grow and shrink
> * giving filesystems the ability to grow and shrink

You can migrate data off disks while the filesystems on top of them are
live. Add disk b, migrate a->b, remove disk a. Perhaps this is intrinsic
in the above somehow but I don't see it.

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-21 Thread Oliver Xymoron

On Sat, 19 May 2001, Alexander Viro wrote:

> Let's distinguish between per-fd effects (that's what name in
> open(name, flags) is for - you are asking for descriptor and telling
> what behaviour do you want for IO on it) and system-wide side effects.
>
> IMO encoding the former into name is perfectly fine, and no write on
> another file can be sanely used for that purpose. For the latter, though,
> we need to write commands into files and here your miscdevices (or procfs
> files, or /dev/foo/ctl - whatever) is needed.

I'm a little skeptical about the necessity of these per-fd effects in the
first place - after all, Plan 9 does without them.  There's only one
floppy drive, yes? No concurrent users of serial ports? The counter that
comes to mind is sound devices supporting multiple opens, but I think
esound and friends are a better solution to that problem.

What I'd like to see:

- An interface for registering an array of related devices (almost always
two: raw and ctl) and their legacy device numbers with a single userspace
callout that does whatever /dev/ creation needs to be done. Thus, naming
and permissions live in user space. No "device node is also a directory"
weirdness which is overkill in the vast majority of cases. No kernel names
or permissions leaking into userspace.

- An unregister_devices that does the same, giving userspace a
chance to persist permissions, etc.

- A userspace program that keeps a mapping of kernel names to /dev/ names,
permissions, etc.

- An autofs hook that does the reverse mapping for running with modules
(possibly calling modprobe directly)

Possible future extension:

- Allow exporting proc as a large collection of devices. Manage /proc in
userspace on a tmpfs.

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-21 Thread Oliver Xymoron

On Mon, 21 May 2001, David Lang wrote:

 what makes you think it's safe to say there's only one floppy drive?

Read as: it doesn't make sense to have per-fd state on a single floppy
device given that there's only one actual hardware instance associated
with it and multiple openers don't make sense. Opening a floppy at
different densities with magic filenames was an example Linus used earlier
in the thread. Surely there can be more than one drive and more than one
serial port.

 On Mon, 21 May 2001, Oliver Xymoron wrote:

  On Sat, 19 May 2001, Alexander Viro wrote:
 
   Let's distinguish between per-fd effects (that's what name in
   open(name, flags) is for - you are asking for descriptor and telling
   what behaviour do you want for IO on it) and system-wide side effects.
  
   IMO encoding the former into name is perfectly fine, and no write on
   another file can be sanely used for that purpose. For the latter, though,
   we need to write commands into files and here your miscdevices (or procfs
   files, or /dev/foo/ctl - whatever) is needed.
 
  I'm a little skeptical about the necessity of these per-fd effects in the
  first place - after all, Plan 9 does without them.  There's only one
  floppy drive, yes? No concurrent users of serial ports? The counter that
  comes to mind is sound devices supporting multiple opens, but I think
  esound and friends are a better solution to that problem.

--
 Love the dolphins, she advised him. Write by W.A.S.T.E..

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-21 Thread Oliver Xymoron

On Sat, 19 May 2001, Jeff Garzik wrote:

 Why are LVM and EVMS(competing LVM project) needed at all?

 Surely the same can be accomplished with
 * md
 * snapshot blkdev (attached in previous e-mail)
 * giving partitions and blkdevs the ability to grow and shrink
 * giving filesystems the ability to grow and shrink

You can migrate data off disks while the filesystems on top of them are
live. Add disk b, migrate a-b, remove disk a. Perhaps this is intrinsic
in the above somehow but I don't see it.

--
 Love the dolphins, she advised him. Write by W.A.S.T.E..

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-21 Thread Pavel Machek

Hi!

 So I guess things have already been a bit messy in this
 area for many years, even before linux even existed, and
 in some cases you can't really do anything about it because
 the behaviour is mandated by the applicable standards, like
 POSIX, SUS, or whatever.
 (The blocking of the open on a tty device is explicitly
  documented in my copy of the X/Open specification.)

If X/Open documents security hole, then, I guess, X/Open will have to
be changed.
Pavel
-- 
I'm [EMAIL PROTECTED] In my country we have almost anarchy and I don't care.
Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-21 Thread Oliver Xymoron

On Sat, 19 May 2001, Alexander Viro wrote:

 Let's distinguish between per-fd effects (that's what name in
 open(name, flags) is for - you are asking for descriptor and telling
 what behaviour do you want for IO on it) and system-wide side effects.

 IMO encoding the former into name is perfectly fine, and no write on
 another file can be sanely used for that purpose. For the latter, though,
 we need to write commands into files and here your miscdevices (or procfs
 files, or /dev/foo/ctl - whatever) is needed.

I'm a little skeptical about the necessity of these per-fd effects in the
first place - after all, Plan 9 does without them.  There's only one
floppy drive, yes? No concurrent users of serial ports? The counter that
comes to mind is sound devices supporting multiple opens, but I think
esound and friends are a better solution to that problem.

What I'd like to see:

- An interface for registering an array of related devices (almost always
two: raw and ctl) and their legacy device numbers with a single userspace
callout that does whatever /dev/ creation needs to be done. Thus, naming
and permissions live in user space. No device node is also a directory
weirdness which is overkill in the vast majority of cases. No kernel names
or permissions leaking into userspace.

- An unregister_devices that does the same, giving userspace a
chance to persist permissions, etc.

- A userspace program that keeps a mapping of kernel names to /dev/ names,
permissions, etc.

- An autofs hook that does the reverse mapping for running with modules
(possibly calling modprobe directly)

Possible future extension:

- Allow exporting proc as a large collection of devices. Manage /proc in
userspace on a tmpfs.

--
 Love the dolphins, she advised him. Write by W.A.S.T.E..

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-21 Thread Pavel Machek

Hi!

 Yes, and that is exactly the difference between having a side effect
 on the open(2), versus having the effect as a result of a write(2).
 
 Unfortunately, there are already some cases where an open
 on a device can have unexpected results.  If you don't want
 to get blocked waiting for the carrier-detect signal from the
 modem when opening a tty device, you had better specify the
 O_NONBLOCK option on the open.  If you don't want this flag
 to be active during the actual I/O operations, then you would
 have to do an fcntl to clear the O_NONBLOCK again after the open.
 
 So I guess things have already been a bit messy in this
 area for many years, even before linux even existed, and
 in some cases you can't really do anything about it because
 the behaviour is mandated by the applicable standards, like
 POSIX, SUS, or whatever.
 (The blocking of the open on a tty device is explicitly
  documented in my copy of the X/Open specification.)
 
 Fortunately, blocking the nightly backup program by making it
 accidentally open a tty is not quite as catastrophic as having
 it start a nuclear war, or format the disks, or something,
 just because a user was playing games with symlinks.

Maybe not *as* catastrophic, but security hole, anyway. User should
not be able to block system backups.

Small demonstration for bugtraq, anyone?
Pavel
-- 
I'm [EMAIL PROTECTED] In my country we have almost anarchy and I don't care.
Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH] device arguments from lookup)

2001-05-21 Thread Pavel Machek

Hi!

 A lot of stuff relies on the fact that close(open(foo, O_RDONLY)) is a
 no-op. Breaking that assumption is a Bad Thing(tm).

Then we have a problem. Just opening /dev/ttyS0 currently *has* side
effects (it is visible on modem lines from serial port; it can block
you forever). 

If this assumption is somewhere, we should fix that place... Or fix
serial ports.
Pavel
-- 
I'm [EMAIL PROTECTED] In my country we have almost anarchy and I don't care.
Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Jeff Garzik

Here's a dumb question, and I apologize if I am questioning computer
science dogma...

Why are LVM and EVMS(competing LVM project) needed at all?

Surely the same can be accomplished with
* md
* snapshot blkdev (attached in previous e-mail)
* giving partitions and blkdevs the ability to grow and shrink
* giving filesystems the ability to grow and shrink

On-line optimization (defrag, etc) shouldn't be hard once you have the
ability to move blocks and files around, which would come with the
ability to grow and shrink blkdevs and fs's.

-- 
Jeff Garzik  | "Do you have to make light of everything?!"
Building 1024| "I'm extremely serious about nailing your
MandrakeSoft |  step-daughter, but other than that, yes."
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Jeff Garzik

Linus Torvalds wrote:
> There are some strong arguments that we should have filesystem
> "backdoors" for maintenance purposes, including backup.

I think I agree with something Al said over IRC, that fs-level snapshots
are preferred over block level snapshots.

fs-level snapshots should become easy if you have a generic transaction
layer.  The OS spits out file ops, which get processed into a set of fs
transactions.  (remember that fs-level stuff like "change this block
bitmap" is also a transaction, just like the more generic "update this
inode's mtime")

Also, I think there should be generic block allocation strategies that
fs's can use.  Implementing fs-specific strategies such as ext2's
readahead or XFS's delayed allocation is not a solution, IMHO, but
working towards solving the real problem.



> You can, of course, so parts of this on a LVM level, and doing backups
> with "disk snapshots" may be a valid approach. However, even that is
> debatable: there is very little that says that the disk image has to be
> up-to-date at any particular point in time, so even with a disk snapshot
> capability (which is not necessarily reasonable under all circumstances)
> there are arguments for maintenance interfaces.

I've been hacking on the attached, a snapshot block device driver, which
doesn't require LVM at all.  (warning: compiled and updated per outside
review, but very alpha...  do not apply)

The point of the driver is to provide a sync point at snapshot time, at
which all metadata and data is flushed to the block device.

My question... is there a fundamental flaw in this plan?  Ideally when
userspace says "start snapshot", the fsync_dev occurs [a
simplification].  At that point, userspace can safely run dump or tar or
whatever on the virtual snapshot device.

-- 
Jeff Garzik  | "Do you have to make light of everything?!"
Building 1024| "I'm extremely serious about nailing your
MandrakeSoft |  step-daughter, but other than that, yes."

Index: linux_2_4/drivers/block/Config.in
diff -u linux_2_4/drivers/block/Config.in:1.1.1.44 
linux_2_4/drivers/block/Config.in:1.1.1.44.4.1
--- linux_2_4/drivers/block/Config.in:1.1.1.44  Tue May 15 04:43:24 2001
+++ linux_2_4/drivers/block/Config.in   Wed May 16 15:44:59 2001
@@ -46,4 +46,6 @@
 fi
 dep_bool '  Initial RAM disk (initrd) support' CONFIG_BLK_DEV_INITRD 
$CONFIG_BLK_DEV_RAM
 
+tristate 'Snapshot device support' CONFIG_BLK_DEV_SNAP
+
 endmenu
Index: linux_2_4/drivers/block/Makefile
diff -u linux_2_4/drivers/block/Makefile:1.1.1.46 
linux_2_4/drivers/block/Makefile:1.1.1.46.4.1
--- linux_2_4/drivers/block/Makefile:1.1.1.46   Tue May 15 04:43:24 2001
+++ linux_2_4/drivers/block/MakefileWed May 16 15:44:59 2001
@@ -31,6 +31,7 @@
 obj-$(CONFIG_BLK_DEV_DAC960)   += DAC960.o
 
 obj-$(CONFIG_BLK_DEV_NBD)  += nbd.o
+obj-$(CONFIG_BLK_DEV_SNAP) += snap.o
 
 subdir-$(CONFIG_PARIDE) += paride
 
Index: linux_2_4/drivers/block/snap.c
diff -u /dev/null linux_2_4/drivers/block/snap.c:1.1.6.10
--- /dev/null   Sat May 19 17:36:30 2001
+++ linux_2_4/drivers/block/snap.c  Thu May 17 11:48:54 2001
@@ -0,0 +1,1055 @@
+/*
+   Copyright 2001 Jeff Garzik <[EMAIL PROTECTED]>
+   Copyright (C) 2000 Jens Axboe <[EMAIL PROTECTED]>
+  
+   May be copied or modified under the terms of the GNU General Public
+   License.  See linux/COPYING for more information.
+  
+   Several ideas and some code taken from Jens Axboe's pktcdvd.c 0.0.2j.
+  
+   To-Do list:
+   * Write support.  It's easy, and might be useful in isolated circumstances.
+   * Convert MAX_SNAPDEVS to a module parameter.
+   * Wrap use of "%" operator, to prepare for 64-bit-sized blockdevs on 
+ 32-bit processors
+  
+ */
+
+#define VERSION_CODE   "v0.5.0-take6  17 May 2001  Jeff Garzik 
+<[EMAIL PROTECTED]>"
+#define MODNAME"snap"
+#define PFXMODNAME ": "
+#define MAX_SNAPDEVS   16 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static int *snap_sizes;
+static int *snap_blksize;
+static int *snap_readahead;
+static struct snap_device *snap_devs;
+static int snap_major = -1;
+static spinlock_t snap_lock = SPIN_LOCK_UNLOCKED;
+
+
+/*
+ * a bit of a kludge, but we want to be able to pass source, log,
+ * or snap dev and get the right one.
+ */
+static struct snap_device *snap_find_dev(kdev_t dev)
+{
+   int i, j;
+   struct snap_device *sd;
+
+   spin_lock(_lock);
+
+   for (i = 0; i < MAX_SNAPDEVS; i++) {
+   sd = _devs[i];
+   if ((sd->src.dev == dev) || (sd->snap_dev == dev))
+   goto out;
+   for (j = 0; j < sd->n_logs; j++)
+   if (sd->logs[j].dev == dev)
+   goto out;
+   }
+   sd = NULL;
+
+out:
+   spin_unlock(_lock);
+   return sd;
+}
+
+static request_queue_t *snap_get_queue(kdev_t dev)
+{
+   struct snap_device *sd = 

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Linus Torvalds


On Sat, 19 May 2001, Alexander Viro wrote:
> 
> On Sun, 20 May 2001, Edgar Toernig wrote:
> 
> > That assumption is totally bogus.  Even for regular files you have side
> > effects (atime); for anything else they're unpredictable.
> 
> That means only one thing: safe backups are possible only in single-user
> mode.

There are some strong arguments that we should have filesystem
"backdoors" for maintenance purposes, including backup. 

You can, of course, so parts of this on a LVM level, and doing backups
with "disk snapshots" may be a valid approach. However, even that is
debatable: there is very little that says that the disk image has to be
up-to-date at any particular point in time, so even with a disk snapshot
capability (which is not necessarily reasonable under all circumstances)
there are arguments for maintenance interfaces.

Thinks like "lazy fsck" (ie fsck while already running the filesystem) and
defragmentation simply is not feasible on a LVM level.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Alexander Viro



On Sun, 20 May 2001, Edgar Toernig wrote:

> That assumption is totally bogus.  Even for regular files you have side
> effects (atime); for anything else they're unpredictable.

That means only one thing: safe backups are possible only in single-user
mode. For values of safe being "not triggering these side effects on
arbitrary files outside of the area you are trying to backup". You can't
pin an object down until you open it. You can check that it's the same
object you think it is, but that will require fstat(). I.e. opening the
thing.

If all effects of open() either disappear on close() or are something you
don't care about - fine. Otherwise you have a problem. On any UNIX.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Edgar Toernig

nitpicking: a system call without side effects would be pretty useless.

Alexander Viro wrote:
> A lot of stuff relies on the fact that close(open(foo, O_RDONLY)) is a
> no-op. Breaking that assumption is a Bad Thing(tm).

That assumption is totally bogus.  Even for regular files you have side
effects (atime); for anything else they're unpredictable.

Ciao, ET.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Alexander Viro



On Sat, 19 May 2001, Jeff Garzik wrote:

> Are we talking about device arguments just for chrdevs and blkdevs? 
> (ie. drivers)  or for regular files too?

Let's distinguish between per-fd effects (that's what name in open(name, flags)
is for - you are asking for descriptor and telling what behaviour do you
want for IO on it) and system-wide side effects.

IMO encoding the former into name is perfectly fine, and no write on
another file can be sanely used for that purpose. For the latter, though,
we need to write commands into files and here your miscdevices (or procfs
files, or /dev/foo/ctl - whatever) is needed.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Jeff Garzik

Jeff Garzik wrote:
> Notice also a "metadata miscdev" solves the problem of passing options
> on open -- just pass those options to the miscdev before you open it...

to be more clear, "it" == the data device, not the metadata miscdev

-- 
Jeff Garzik  | "Do you have to make light of everything?!"
Building 1024| "I'm extremely serious about nailing your
MandrakeSoft |  step-daughter, but other than that, yes."
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Jeff Garzik

Are we talking about device arguments just for chrdevs and blkdevs? 
(ie. drivers)  or for regular files too?

Speaking about drivers specifically, a controlling miscdev, one per
device or one per group of devices depending on your needs, is a much
more clean solution for passing ioctl-type data.  You are free to come
up with whatever method of communication with the driver is most
efficient for your needs -- without perverting open(2).

Notice also a "metadata miscdev" solves the problem of passing options
on open -- just pass those options to the miscdev before you open it...

metadata miscdevs are a clean solution to what procfs hacks and ioctls
are trying to accomplish.

Jeff


-- 
Jeff Garzik  | "Do you have to make light of everything?!"
Building 1024| "I'm extremely serious about nailing your
MandrakeSoft |  step-daughter, but other than that, yes."
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Alexander Viro



On Sat, 19 May 2001, Matthew Wilcox wrote:

> On Sat, May 19, 2001 at 12:51:07PM -0400, Alexander Viro wrote:
> > clone(), walk(), clunk(), stat() and open() ;-) Basically, we can add
> > unopened descriptors. I.e. no IO until you open it (turning the thing into
> > opened one), but we can do lookups (move to child), we can clone and
> > kill them and we can stat them.
> 
> Those who would like a more detailed explanation can find one at
> http://plan9.bell-labs.com/sys/man/5/INDEX.html

Umm... Yes, it's an allusion to 9P, but no, I'm not serious about exporting
that to userland.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Linus Torvalds


On Sat, 19 May 2001, Alexander Viro wrote:
>
>   Folks, before you get all excited about cramming side effects into
> open(2), consider the following case:

Your argument is stupid, imnsho.

Side-effects are perfectly fine if they are _local_ to the file
descriptor. Your example is contrieved and idiotic.

Filename extensions would not replace ioctl's. But they are wonderful ways
to avoid unnecessary binary name-spaces, like the ones we have with
"callout" TTY names, and the one that the fb people had.

For example, do a "ls -l /dev/fd0*", and ponder. Also, realize that we
have these hard-coded names in _addition_ to the magic ioctl to set even
more parameters. These are all stupid and bad, and it would have been a
_lot_ cleaner to be able to do

open("/dev/fd0/H1440", O_RDWR)..

or

open("/dev/fd0/HD,18,85", O_RDWD)

to open special non-standard high-density modes.

We already did this, in a very limited and stupid way, by encoding the
minor number and generating a standard naming scheme. We can do the same
thing in a _much_ more generic way by just realizing that we wanted the
open to be name-based in the first place.

These are _not_ side effects. They are very much naming conventions. If I
want to open a the floppy in one of the special extended modes, it makes a
LOT more sense to just open it with the naming, than to open a "generic"
floppy device only to them use a magic and very unreadable ioctl to set
the mode of the device.

In short, I don't buy your arguments for one single second.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH] device arguments from lookup)

2001-05-19 Thread Matthew Wilcox

On Sat, May 19, 2001 at 12:51:07PM -0400, Alexander Viro wrote:
> clone(), walk(), clunk(), stat() and open() ;-) Basically, we can add
> unopened descriptors. I.e. no IO until you open it (turning the thing into
> opened one), but we can do lookups (move to child), we can clone and
> kill them and we can stat them.

Those who would like a more detailed explanation can find one at
http://plan9.bell-labs.com/sys/man/5/INDEX.html

-- 
Revolutions do not require corporate support.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH] device arguments from lookup)

2001-05-19 Thread Andries . Brouwer

>> Opening device files often has interesting side effects.

> Too bad. They can be triggered by similar races between attacker
> changing the type of object (file<->symlink) and backup.

Yes. This is a well-known security problem.
Doing
stat("file", );
if (action desired) {
action("file");
}
is no good because there is a race.
But doing
fd = open("file", flags);
fstat(fd, );
if (action desired) {
f_action(fd);
}
is no good either because the open() has unknown side effects.
It helps to add flags like O_NONBLOCK and perhaps O_NOCTTY,
but that is not quite good enough.

One would like to have a version of the open() call that was
guaranteed free of side effects, and gave a fd only -
perhaps for stat(), perhaps for ioctl().
This guarantee could perhaps be obtained by omitting the
f->f_op->open(inode,f);
call in dentry_open() when the open call is
open("file", O_FDONLY);
Of course it may be that we afterwards decide that fd must
be used, and then it needs upgrading:
fd = f_open(fd, O_RDWR);

Andries

[Such a construction allows various cleanups.
But no doubt it has problems that I have not yet thought of.]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Alexander Viro



On Sat, 19 May 2001, Abramo Bagnara wrote:

> Can't this easily avoided if the needed action is not
> 
> < /dev/zero/start_nuclear_war 
> or
> > /dev/zero/start_nuclear_war
> 
> but
> 
> echo "I'm evil" > /dev/zero/start_nuclear_war

Sure. And that's the right thing to do (not the implied action, that is -
_that_ would be too messy).

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Abramo Bagnara

Alexander Viro wrote:
> 
> Folks, before you get all excited about cramming side effects into
> open(2), consider the following case:
> 
> 1) opening "/dev/zero/start_nuclear_war" has a certain side effect.
> 
> 2) Local user does the following:
> ln -sf /dev/zero/start_nuclear_war bar
> while true; do
> mkdir foo
> rmdir foo
> ln -sf bar foo
> rm foo
> done
> 
> 3) Comes the night and root runs (from crontab) updatedb(8). Said beast
> includes find(1). With sufficiently bad timing find _will_ be tricked
> into attempt to open foo. It will honestly lstat() it, all right. But
> there's no way to make sure that subsequent open() on the found directory
> will get the same object.
> 
> 4) Side effect happens...
> 
> Similar scenarios can be found for other programs run by/as root, but I
> think that the point is obvious - side effects on open() are not a good
> idea. Yes, we can play with checking for O_DIRECTORY, yodda, yodda, but
> I wouldn't bet a dime on security of a system with such side effects.
> A lot of stuff relies on the fact that close(open(foo, O_RDONLY)) is a
> no-op. Breaking that assumption is a Bad Thing(tm).

Can't this easily avoided if the needed action is not

< /dev/zero/start_nuclear_war 
or
> /dev/zero/start_nuclear_war

but

echo "I'm evil" > /dev/zero/start_nuclear_war

?

-- 
Abramo Bagnara   mailto:[EMAIL PROTECTED]

Opera Unica  Phone: +39.546.656023
Via Emilia Interna, 140
48014 Castel Bolognese (RA) - Italy

ALSA project   http://www.alsa-project.org
It sounds good!
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Alexander Viro



On Sat, 19 May 2001 [EMAIL PROTECTED] wrote:

> > A lot of stuff relies on the fact that close(open(foo, O_RDONLY))
> > is a no-op. Breaking that assumption is a Bad Thing(tm).
> 
> Also here I would like to agree. Unfortunately this is false.
> Opening device files often has interesting side effects.

Too bad. They can be triggered by similar races between attacker
changing the type of object (file<->symlink) and backup.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH] device arguments from lookup)

2001-05-19 Thread Andries . Brouwer

Alexander Viro writes:

> Folks, before you get all excited about cramming side effects
> into open(2), consider ...

I agree completely.

> A lot of stuff relies on the fact that close(open(foo, O_RDONLY))
> is a no-op. Breaking that assumption is a Bad Thing(tm).

Also here I would like to agree. Unfortunately this is false.
Opening device files often has interesting side effects.

Andries
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Alexander Viro

Folks, before you get all excited about cramming side effects into
open(2), consider the following case:

1) opening "/dev/zero/start_nuclear_war" has a certain side effect.

2) Local user does the following:
ln -sf /dev/zero/start_nuclear_war bar
while true; do
mkdir foo
rmdir foo
ln -sf bar foo
rm foo
done

3) Comes the night and root runs (from crontab) updatedb(8). Said beast
includes find(1). With sufficiently bad timing find _will_ be tricked
into attempt to open foo. It will honestly lstat() it, all right. But
there's no way to make sure that subsequent open() on the found directory
will get the same object.

4) Side effect happens...

Similar scenarios can be found for other programs run by/as root, but I
think that the point is obvious - side effects on open() are not a good
idea. Yes, we can play with checking for O_DIRECTORY, yodda, yodda, but
I wouldn't bet a dime on security of a system with such side effects.
A lot of stuff relies on the fact that close(open(foo, O_RDONLY)) is a
no-op. Breaking that assumption is a Bad Thing(tm).

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Jeff Garzik

Here's a dumb question, and I apologize if I am questioning computer
science dogma...

Why are LVM and EVMS(competing LVM project) needed at all?

Surely the same can be accomplished with
* md
* snapshot blkdev (attached in previous e-mail)
* giving partitions and blkdevs the ability to grow and shrink
* giving filesystems the ability to grow and shrink

On-line optimization (defrag, etc) shouldn't be hard once you have the
ability to move blocks and files around, which would come with the
ability to grow and shrink blkdevs and fs's.

-- 
Jeff Garzik  | Do you have to make light of everything?!
Building 1024| I'm extremely serious about nailing your
MandrakeSoft |  step-daughter, but other than that, yes.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Alexander Viro



On Sun, 20 May 2001, Edgar Toernig wrote:

 That assumption is totally bogus.  Even for regular files you have side
 effects (atime); for anything else they're unpredictable.

That means only one thing: safe backups are possible only in single-user
mode. For values of safe being not triggering these side effects on
arbitrary files outside of the area you are trying to backup. You can't
pin an object down until you open it. You can check that it's the same
object you think it is, but that will require fstat(). I.e. opening the
thing.

If all effects of open() either disappear on close() or are something you
don't care about - fine. Otherwise you have a problem. On any UNIX.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH] device arguments from lookup)

2001-05-19 Thread Andries . Brouwer

Alexander Viro writes:

 Folks, before you get all excited about cramming side effects
 into open(2), consider ...

I agree completely.

 A lot of stuff relies on the fact that close(open(foo, O_RDONLY))
 is a no-op. Breaking that assumption is a Bad Thing(tm).

Also here I would like to agree. Unfortunately this is false.
Opening device files often has interesting side effects.

Andries
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Alexander Viro



On Sat, 19 May 2001 [EMAIL PROTECTED] wrote:

  A lot of stuff relies on the fact that close(open(foo, O_RDONLY))
  is a no-op. Breaking that assumption is a Bad Thing(tm).
 
 Also here I would like to agree. Unfortunately this is false.
 Opening device files often has interesting side effects.

Too bad. They can be triggered by similar races between attacker
changing the type of object (file-symlink) and backup.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Jeff Garzik

Linus Torvalds wrote:
 There are some strong arguments that we should have filesystem
 backdoors for maintenance purposes, including backup.

I think I agree with something Al said over IRC, that fs-level snapshots
are preferred over block level snapshots.

fs-level snapshots should become easy if you have a generic transaction
layer.  The OS spits out file ops, which get processed into a set of fs
transactions.  (remember that fs-level stuff like change this block
bitmap is also a transaction, just like the more generic update this
inode's mtime)

Also, I think there should be generic block allocation strategies that
fs's can use.  Implementing fs-specific strategies such as ext2's
readahead or XFS's delayed allocation is not a solution, IMHO, but
working towards solving the real problem.
/ramble


 You can, of course, so parts of this on a LVM level, and doing backups
 with disk snapshots may be a valid approach. However, even that is
 debatable: there is very little that says that the disk image has to be
 up-to-date at any particular point in time, so even with a disk snapshot
 capability (which is not necessarily reasonable under all circumstances)
 there are arguments for maintenance interfaces.

I've been hacking on the attached, a snapshot block device driver, which
doesn't require LVM at all.  (warning: compiled and updated per outside
review, but very alpha...  do not apply)

The point of the driver is to provide a sync point at snapshot time, at
which all metadata and data is flushed to the block device.

My question... is there a fundamental flaw in this plan?  Ideally when
userspace says start snapshot, the fsync_dev occurs [a
simplification].  At that point, userspace can safely run dump or tar or
whatever on the virtual snapshot device.

-- 
Jeff Garzik  | Do you have to make light of everything?!
Building 1024| I'm extremely serious about nailing your
MandrakeSoft |  step-daughter, but other than that, yes.

Index: linux_2_4/drivers/block/Config.in
diff -u linux_2_4/drivers/block/Config.in:1.1.1.44 
linux_2_4/drivers/block/Config.in:1.1.1.44.4.1
--- linux_2_4/drivers/block/Config.in:1.1.1.44  Tue May 15 04:43:24 2001
+++ linux_2_4/drivers/block/Config.in   Wed May 16 15:44:59 2001
@@ -46,4 +46,6 @@
 fi
 dep_bool '  Initial RAM disk (initrd) support' CONFIG_BLK_DEV_INITRD 
$CONFIG_BLK_DEV_RAM
 
+tristate 'Snapshot device support' CONFIG_BLK_DEV_SNAP
+
 endmenu
Index: linux_2_4/drivers/block/Makefile
diff -u linux_2_4/drivers/block/Makefile:1.1.1.46 
linux_2_4/drivers/block/Makefile:1.1.1.46.4.1
--- linux_2_4/drivers/block/Makefile:1.1.1.46   Tue May 15 04:43:24 2001
+++ linux_2_4/drivers/block/MakefileWed May 16 15:44:59 2001
@@ -31,6 +31,7 @@
 obj-$(CONFIG_BLK_DEV_DAC960)   += DAC960.o
 
 obj-$(CONFIG_BLK_DEV_NBD)  += nbd.o
+obj-$(CONFIG_BLK_DEV_SNAP) += snap.o
 
 subdir-$(CONFIG_PARIDE) += paride
 
Index: linux_2_4/drivers/block/snap.c
diff -u /dev/null linux_2_4/drivers/block/snap.c:1.1.6.10
--- /dev/null   Sat May 19 17:36:30 2001
+++ linux_2_4/drivers/block/snap.c  Thu May 17 11:48:54 2001
@@ -0,0 +1,1055 @@
+/*
+   Copyright 2001 Jeff Garzik [EMAIL PROTECTED]
+   Copyright (C) 2000 Jens Axboe [EMAIL PROTECTED]
+  
+   May be copied or modified under the terms of the GNU General Public
+   License.  See linux/COPYING for more information.
+  
+   Several ideas and some code taken from Jens Axboe's pktcdvd.c 0.0.2j.
+  
+   To-Do list:
+   * Write support.  It's easy, and might be useful in isolated circumstances.
+   * Convert MAX_SNAPDEVS to a module parameter.
+   * Wrap use of % operator, to prepare for 64-bit-sized blockdevs on 
+ 32-bit processors
+  
+ */
+
+#define VERSION_CODE   v0.5.0-take6  17 May 2001  Jeff Garzik 
+[EMAIL PROTECTED]
+#define MODNAMEsnap
+#define PFXMODNAME : 
+#define MAX_SNAPDEVS   16 
+
+#include linux/module.h
+#include linux/kernel.h
+#include linux/slab.h
+#include linux/errno.h
+#include linux/spinlock.h
+#include linux/interrupt.h
+#include linux/file.h
+#include linux/blk.h
+#include linux/blkpg.h
+#include linux/init.h
+#include linux/snap.h
+#include asm/uaccess.h
+
+static int *snap_sizes;
+static int *snap_blksize;
+static int *snap_readahead;
+static struct snap_device *snap_devs;
+static int snap_major = -1;
+static spinlock_t snap_lock = SPIN_LOCK_UNLOCKED;
+
+
+/*
+ * a bit of a kludge, but we want to be able to pass source, log,
+ * or snap dev and get the right one.
+ */
+static struct snap_device *snap_find_dev(kdev_t dev)
+{
+   int i, j;
+   struct snap_device *sd;
+
+   spin_lock(snap_lock);
+
+   for (i = 0; i  MAX_SNAPDEVS; i++) {
+   sd = snap_devs[i];
+   if ((sd-src.dev == dev) || (sd-snap_dev == dev))
+   goto out;
+   for (j = 0; j  sd-n_logs; j++)
+   if (sd-logs[j].dev == dev)
+   goto out;
+   }
+   sd = NULL;
+
+out:
+   

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Abramo Bagnara

Alexander Viro wrote:
 
 Folks, before you get all excited about cramming side effects into
 open(2), consider the following case:
 
 1) opening /dev/zero/start_nuclear_war has a certain side effect.
 
 2) Local user does the following:
 ln -sf /dev/zero/start_nuclear_war bar
 while true; do
 mkdir foo
 rmdir foo
 ln -sf bar foo
 rm foo
 done
 
 3) Comes the night and root runs (from crontab) updatedb(8). Said beast
 includes find(1). With sufficiently bad timing find _will_ be tricked
 into attempt to open foo. It will honestly lstat() it, all right. But
 there's no way to make sure that subsequent open() on the found directory
 will get the same object.
 
 4) Side effect happens...
 
 Similar scenarios can be found for other programs run by/as root, but I
 think that the point is obvious - side effects on open() are not a good
 idea. Yes, we can play with checking for O_DIRECTORY, yodda, yodda, but
 I wouldn't bet a dime on security of a system with such side effects.
 A lot of stuff relies on the fact that close(open(foo, O_RDONLY)) is a
 no-op. Breaking that assumption is a Bad Thing(tm).

Can't this easily avoided if the needed action is not

 /dev/zero/start_nuclear_war 
or
 /dev/zero/start_nuclear_war

but

echo I'm evil  /dev/zero/start_nuclear_war

?

-- 
Abramo Bagnara   mailto:[EMAIL PROTECTED]

Opera Unica  Phone: +39.546.656023
Via Emilia Interna, 140
48014 Castel Bolognese (RA) - Italy

ALSA project   http://www.alsa-project.org
It sounds good!
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Jeff Garzik

Jeff Garzik wrote:
 Notice also a metadata miscdev solves the problem of passing options
 on open -- just pass those options to the miscdev before you open it...

to be more clear, it == the data device, not the metadata miscdev

-- 
Jeff Garzik  | Do you have to make light of everything?!
Building 1024| I'm extremely serious about nailing your
MandrakeSoft |  step-daughter, but other than that, yes.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Alexander Viro



On Sat, 19 May 2001, Matthew Wilcox wrote:

 On Sat, May 19, 2001 at 12:51:07PM -0400, Alexander Viro wrote:
  clone(), walk(), clunk(), stat() and open() ;-) Basically, we can add
  unopened descriptors. I.e. no IO until you open it (turning the thing into
  opened one), but we can do lookups (move to child), we can clone and
  kill them and we can stat them.
 
 Those who would like a more detailed explanation can find one at
 http://plan9.bell-labs.com/sys/man/5/INDEX.html

Umm... Yes, it's an allusion to 9P, but no, I'm not serious about exporting
that to userland.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Linus Torvalds


On Sat, 19 May 2001, Alexander Viro wrote:
 
 On Sun, 20 May 2001, Edgar Toernig wrote:
 
  That assumption is totally bogus.  Even for regular files you have side
  effects (atime); for anything else they're unpredictable.
 
 That means only one thing: safe backups are possible only in single-user
 mode.

There are some strong arguments that we should have filesystem
backdoors for maintenance purposes, including backup. 

You can, of course, so parts of this on a LVM level, and doing backups
with disk snapshots may be a valid approach. However, even that is
debatable: there is very little that says that the disk image has to be
up-to-date at any particular point in time, so even with a disk snapshot
capability (which is not necessarily reasonable under all circumstances)
there are arguments for maintenance interfaces.

Thinks like lazy fsck (ie fsck while already running the filesystem) and
defragmentation simply is not feasible on a LVM level.

Linus

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH] device arguments from lookup)

2001-05-19 Thread Andries . Brouwer

 Opening device files often has interesting side effects.

 Too bad. They can be triggered by similar races between attacker
 changing the type of object (file-symlink) and backup.

Yes. This is a well-known security problem.
Doing
stat(file, s);
if (action desired) {
action(file);
}
is no good because there is a race.
But doing
fd = open(file, flags);
fstat(fd, s);
if (action desired) {
f_action(fd);
}
is no good either because the open() has unknown side effects.
It helps to add flags like O_NONBLOCK and perhaps O_NOCTTY,
but that is not quite good enough.

One would like to have a version of the open() call that was
guaranteed free of side effects, and gave a fd only -
perhaps for stat(), perhaps for ioctl().
This guarantee could perhaps be obtained by omitting the
f-f_op-open(inode,f);
call in dentry_open() when the open call is
open(file, O_FDONLY);
Of course it may be that we afterwards decide that fd must
be used, and then it needs upgrading:
fd = f_open(fd, O_RDWR);

Andries

[Such a construction allows various cleanups.
But no doubt it has problems that I have not yet thought of.]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Linus Torvalds


On Sat, 19 May 2001, Alexander Viro wrote:

   Folks, before you get all excited about cramming side effects into
 open(2), consider the following case:

Your argument is stupid, imnsho.

Side-effects are perfectly fine if they are _local_ to the file
descriptor. Your example is contrieved and idiotic.

Filename extensions would not replace ioctl's. But they are wonderful ways
to avoid unnecessary binary name-spaces, like the ones we have with
callout TTY names, and the one that the fb people had.

For example, do a ls -l /dev/fd0*, and ponder. Also, realize that we
have these hard-coded names in _addition_ to the magic ioctl to set even
more parameters. These are all stupid and bad, and it would have been a
_lot_ cleaner to be able to do

open(/dev/fd0/H1440, O_RDWR)..

or

open(/dev/fd0/HD,18,85, O_RDWD)

to open special non-standard high-density modes.

We already did this, in a very limited and stupid way, by encoding the
minor number and generating a standard naming scheme. We can do the same
thing in a _much_ more generic way by just realizing that we wanted the
open to be name-based in the first place.

These are _not_ side effects. They are very much naming conventions. If I
want to open a the floppy in one of the special extended modes, it makes a
LOT more sense to just open it with the naming, than to open a generic
floppy device only to them use a magic and very unreadable ioctl to set
the mode of the device.

In short, I don't buy your arguments for one single second.

Linus

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Jeff Garzik

Are we talking about device arguments just for chrdevs and blkdevs? 
(ie. drivers)  or for regular files too?

Speaking about drivers specifically, a controlling miscdev, one per
device or one per group of devices depending on your needs, is a much
more clean solution for passing ioctl-type data.  You are free to come
up with whatever method of communication with the driver is most
efficient for your needs -- without perverting open(2).

Notice also a metadata miscdev solves the problem of passing options
on open -- just pass those options to the miscdev before you open it...

metadata miscdevs are a clean solution to what procfs hacks and ioctls
are trying to accomplish.

Jeff


-- 
Jeff Garzik  | Do you have to make light of everything?!
Building 1024| I'm extremely serious about nailing your
MandrakeSoft |  step-daughter, but other than that, yes.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Edgar Toernig

nitpicking: a system call without side effects would be pretty useless.

Alexander Viro wrote:
 A lot of stuff relies on the fact that close(open(foo, O_RDONLY)) is a
 no-op. Breaking that assumption is a Bad Thing(tm).

That assumption is totally bogus.  Even for regular files you have side
effects (atime); for anything else they're unpredictable.

Ciao, ET.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Alexander Viro

Folks, before you get all excited about cramming side effects into
open(2), consider the following case:

1) opening /dev/zero/start_nuclear_war has a certain side effect.

2) Local user does the following:
ln -sf /dev/zero/start_nuclear_war bar
while true; do
mkdir foo
rmdir foo
ln -sf bar foo
rm foo
done

3) Comes the night and root runs (from crontab) updatedb(8). Said beast
includes find(1). With sufficiently bad timing find _will_ be tricked
into attempt to open foo. It will honestly lstat() it, all right. But
there's no way to make sure that subsequent open() on the found directory
will get the same object.

4) Side effect happens...

Similar scenarios can be found for other programs run by/as root, but I
think that the point is obvious - side effects on open() are not a good
idea. Yes, we can play with checking for O_DIRECTORY, yodda, yodda, but
I wouldn't bet a dime on security of a system with such side effects.
A lot of stuff relies on the fact that close(open(foo, O_RDONLY)) is a
no-op. Breaking that assumption is a Bad Thing(tm).

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Alexander Viro



On Sat, 19 May 2001, Abramo Bagnara wrote:

 Can't this easily avoided if the needed action is not
 
  /dev/zero/start_nuclear_war 
 or
  /dev/zero/start_nuclear_war
 
 but
 
 echo I'm evil  /dev/zero/start_nuclear_war

Sure. And that's the right thing to do (not the implied action, that is -
_that_ would be too messy).

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH] device arguments from lookup)

2001-05-19 Thread Matthew Wilcox

On Sat, May 19, 2001 at 12:51:07PM -0400, Alexander Viro wrote:
 clone(), walk(), clunk(), stat() and open() ;-) Basically, we can add
 unopened descriptors. I.e. no IO until you open it (turning the thing into
 opened one), but we can do lookups (move to child), we can clone and
 kill them and we can stat them.

Those who would like a more detailed explanation can find one at
http://plan9.bell-labs.com/sys/man/5/INDEX.html

-- 
Revolutions do not require corporate support.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Alexander Viro



On Sat, 19 May 2001, Jeff Garzik wrote:

 Are we talking about device arguments just for chrdevs and blkdevs? 
 (ie. drivers)  or for regular files too?

Let's distinguish between per-fd effects (that's what name in open(name, flags)
is for - you are asking for descriptor and telling what behaviour do you
want for IO on it) and system-wide side effects.

IMO encoding the former into name is perfectly fine, and no write on
another file can be sanely used for that purpose. For the latter, though,
we need to write commands into files and here your miscdevices (or procfs
files, or /dev/foo/ctl - whatever) is needed.


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/