Re: The lack of specification (was Re: [LONG RANT] Re: Linux stifles innovation... )

2001-02-19 Thread Eric W. Biederman

Mikulas Patocka <[EMAIL PROTECTED]> writes:

> Imagine that there is a specification of mark_buffer_dirty. That
> specification says either that
>   1. it must not block, or
>   2. it may block.
> 
> In case 1, implementors wouldn't change it to block in a stable kernel
>   release, because they wouldn't want to violate the specification.
> 
> In case 2, the implementors of ext2 wouldn't assume that it doesn't block,
>   even if it doesn't in the current implementation.

Whenever the question has been asked, the answer has always been: assume that
anything in the kernel outside of the current function blocks.
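A minimal user-space Python sketch of what that rule means for a caller. This is a hypothetical model, not kernel code and not the actual ext2 code: "blocking" is modelled as a callback during which another task mutates shared state, and `external_call`, `update_unsafe` and `update_safe` are illustrative names. The list stands in for something like the entries of a directory block.

```python
from typing import Callable, List


def external_call(may_block: bool, other_task: Callable[[], None]) -> None:
    """Hypothetical stand-in for any function outside the caller.

    Blocking is modelled by letting another task run before we return,
    which is what can happen in the kernel while the caller sleeps.
    """
    if may_block:
        other_task()


def update_unsafe(entries: List[str], name: str, new: str,
                  may_block: bool, other_task: Callable[[], None]) -> None:
    idx = entries.index(name)             # look the entry up once
    external_call(may_block, other_task)
    entries[idx] = new                    # assumes nothing moved: wrong if we blocked


def update_safe(entries: List[str], name: str, new: str,
                may_block: bool, other_task: Callable[[], None]) -> None:
    idx = entries.index(name)
    external_call(may_block, other_task)
    # Treat the call as if it blocked: revalidate before trusting idx.
    if idx >= len(entries) or entries[idx] != name:
        idx = entries.index(name)         # raises ValueError if the entry vanished
    entries[idx] = new
```

With `other_task=lambda: entries.insert(0, "x")`, the unsafe version overwrites the entry that moved into the old slot, while the safe version finds `"b"` again at its new index. When nothing blocks, both behave the same.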

> In both cases, the bug wouldn't be created.

Nope.  It looks like someone made a mistake in ext2...

> 
> Anytime you change the implementation of syscalls, you gotta check all
> applications that use them ;-) Luckily not, because there is a
> specification and you can check that the syscalls conform to the
> specification, not the apps.

Not normally.  The rule is that syscalls don't change, period.  The
internal kernel interface is different.  It is allowed to change.

As for syscalls changing: an audit of most apps did happen when the LFS
(Large File Summit) spec was put together, so that the resulting
implementation would keep most apps from failing on large files.
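One part of the LFS design that kept existing applications safe can be sketched as follows: the old interfaces built with a 32-bit off_t keep working for files they can describe and report EOVERFLOW, instead of a truncated size, for files they cannot, while new 64-bit variants such as stat64 carry the full value. This is a simplified Python model; `legacy_stat_size` and `lfs_stat_size` are hypothetical stand-ins, not real syscalls.

```python
import errno
import os

# A signed 32-bit off_t cannot describe sizes of 2 GiB or more.
OFF_T_32_MAX = 2**31 - 1
# A signed 64-bit off_t, as used by the LFS interfaces.
OFF_T_64_MAX = 2**63 - 1


def legacy_stat_size(size: int) -> int:
    """Hypothetical model of a non-LFS call compiled with a 32-bit off_t.

    Rather than silently truncating, it fails with EOVERFLOW, so an old
    application cannot be handed a wrong size for a large file.
    """
    if size > OFF_T_32_MAX:
        raise OSError(errno.EOVERFLOW, os.strerror(errno.EOVERFLOW))
    return size


def lfs_stat_size(size: int) -> int:
    """Hypothetical model of the LFS variant (e.g. stat64), 64-bit off_t."""
    if size > OFF_T_64_MAX:
        raise OSError(errno.EOVERFLOW, os.strerror(errno.EOVERFLOW))
    return size
```

The design choice being illustrated is that old binaries never receive silently wrong data: they either get the right answer or an explicit error.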

> > > Saying "code is the specification" is not good.
> > 
> > I'm not arguing against documentation.  That is dumb.  But the code is
> > ALWAYS canonical.  Not docs.
> 
> Let's see:

> Who is right, if there is no specification?

Hmm.  The developers should get together and powwow when the problem
is noticed.  Once they have talked out how it should work, the code
should get fixed accordingly.

It isn't about right and wrong; it is about working code.  Not that
documenting things doesn't help.  And 2.4 is going in that direction...

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: The lack of specification (was Re: [LONG RANT] Re: Linux stifles innovation... )

2001-02-19 Thread Mikulas Patocka

> One of these things must happen:
> 
> a. follow the specification, even if that makes code slow and contorted
> b. change the specification
> c. ignore the specification
> d. get rid of the specification
> 
> Option "a" will not be accepted around here. Sorry.

It should be followed in stable releases (and usually is, except for a few
cases, and except that there is no specification, just unwritten rules).

> The best you can
> hope for is option "b". Since that is hard work (want to help?) we
> often end up not using a specification... hopefully by just not
> having one, instead of by ignoring one.


> > Now the implementors of TCP will say: that driver is buggy. Everybody should
> > set state=TASK_RUNNING before calling schedule to yield the process.
> > 
> > The implementors of the driver will say: TCP is buggy; no one should call my
> > driver in TASK_[UN]INTERRUPTIBLE state.
> > 
> > Who is right, if there is no specification?
> 
> The driver is buggy, unless the TCP maintainer can be convinced
> that TCP is buggy. TCP is a big chunk of code that most people use,
> while the driver is not so huge or critical.
> 
> The TCP maintainers do not seem to be sadistic bastards hell-bent on
> breaking your drivers. API changes usually have a good reason.

Why should block device developers read TCP/IP code? Only after
reading a significant amount of it would they realize that they can be
called in TASK_INTERRUPTIBLE state.

They will most likely read other block drivers, find schedule being called
without the state being set, and use it the same way.

The only way to tell developers to always set the state before calling
schedule is to write it into the specification.

Mikulas



Re: The lack of specification (was Re: [LONG RANT] Re: Linux stifles innovation... )

2001-02-19 Thread Albert D. Cahalan

Mikulas Patocka writes:

> Imagine that there is a specification of mark_buffer_dirty. That
> specification says either that
>   1. it must not block, or
>   2. it may block.
> 
> In case 1, implementors wouldn't change it to block in a stable kernel
>   release, because they wouldn't want to violate the specification.

One of these things must happen:

a. follow the specification, even if that makes code slow and contorted
b. change the specification
c. ignore the specification
d. get rid of the specification

Option "a" will not be accepted around here. Sorry. The best you can
hope for is option "b". Since that is hard work (want to help?) we
often end up not using a specification... hopefully by just not
having one, instead of by ignoring one.

Not saying it doesn't suck to have things undocumented, but at least
you don't have to reverse-engineer a multi-megabyte binary kernel to
find out what is going on.

>> Anytime you change implementation, you gotta check all drivers that use
>> them.  I know, I'm one of the grunts that does such reviews and changes.
>
> Anytime you change the implementation of syscalls, you gotta check all
> applications that use them ;-) Luckily not, because there is a
> specification and you can check that the syscalls conform to the
> specification, not the apps.

Syscalls are more stable, but they may be changed after many years
of a transition period. The C library hides some of this from users.

> Now the implementors of TCP will say: that driver is buggy. Everybody should
> set state=TASK_RUNNING before calling schedule to yield the process.
> 
> The implementors of the driver will say: TCP is buggy; no one should call my
> driver in TASK_[UN]INTERRUPTIBLE state.
> 
> Who is right, if there is no specification?

The driver is buggy, unless the TCP maintainer can be convinced
that TCP is buggy. TCP is a big chunk of code that most people use,
while the driver is not so huge or critical.

The TCP maintainers do not seem to be sadistic bastards hell-bent on
breaking your drivers. API changes usually have a good reason.



The lack of specification (was Re: [LONG RANT] Re: Linux stifles innovation... )

2001-02-19 Thread Mikulas Patocka

> > > > I suspect part of the problem with commercial driver support on Linux is that
> > > > the Linux driver API (such as it is) is relatively poorly documented
> > > 
> > > In-kernel documentation, agreed.
> > > 
> > > _Linux Device Drivers_ is a good reference for 2.2 and below.
> > 
> > And do implementors of generic kernel functions and developers of device
> > drivers respect it? And how can they respect it if it's a commercial book?
> 
> _Linux Device Drivers_ documents the 2.2 (and previous) API, and
> thus refutes the argument that the kernel API is poorly documented.
> Since the publication of the book -succeeds- the publication of the
> APIs, your questions are not applicable.

What does it say about mark_buffer_dirty blocking or schedule and
TASK_[UN]INTERRUPTIBLE issues? If it says nothing, it is bad
documentation. If it says something, kernel developers do not respect it
and it is useless documentation...

> > > > and seems
> > > > to change almost on a week-by-week basis anyway. I've done my share of chasing
> > > > the current kernel revision with drivers that aren't part of the kernel tree:
> > > > by the time you update the driver to work with the current kernel revision,
> > > > there's a new one out, and the driver doesn't compile with it.
> > > 
> > > This is entirely in your imagination.  Driver APIs are stable across the
> > > stable series of kernels: 2.0.0 through 2.0.38, 2.2.0 through 2.2.18,
> > > 2.4.0 through whatever.
> > 
> > Not true. Do you remember, for example, the mark_buffer_dirty change in some
> > 2.2.x that triggered ext2 directory corruption? (mark_buffer_dirty was
> > changed so that it could block.)
> > 
> > Another example of a bug that comes from the lack of specification is
> > get_free_pages being called by non-running processes, which caused lockups
> > on all kernels < 2.2.15. And it is still not cleaned up; see tcp_recvmsg().
> > 
> > Having documentation could prevent this kind of bug.
> 
> Hardly.

Imagine that there is a specification of mark_buffer_dirty. That
specification says either that
1. it must not block, or
2. it may block.

In case 1, implementors wouldn't change it to block in a stable kernel
release, because they wouldn't want to violate the specification.

In case 2, the implementors of ext2 wouldn't assume that it doesn't block,
even if it doesn't in the current implementation.

In both cases, the bug wouldn't be created.
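To make the idea concrete, here is a minimal Python sketch of such a written contract, using the brief per-function description proposed in this thread (calling context, whether it may block, whether it may be called in TASK_[UN]INTERRUPTIBLE state, which locks it takes), plus a checker that compares it against a call site. It is a hypothetical model: `Contract`, `CallSite` and `violations` are illustrative names, not kernel APIs, and it makes no claim about what mark_buffer_dirty's real contract was. Under case 2 (`may_block=True`), a caller that assumes no blocking is flagged.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List


class Context(Enum):
    PROCESS = "process"
    BH = "bh"
    INTERRUPT = "interrupt"


@dataclass(frozen=True)
class Contract:
    """Hypothetical one-line spec for a kernel function."""
    contexts: FrozenSet[Context]        # where it may be called from
    may_block: bool                     # case 2 above if True, case 1 if False
    allows_sleeping_state: bool         # callable in TASK_[UN]INTERRUPTIBLE?
    locks_taken: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class CallSite:
    """Hypothetical description of what the caller is doing at the call."""
    context: Context
    assumes_no_block: bool
    state_running: bool = True
    locks_held: FrozenSet[str] = frozenset()


def violations(spec: Contract, site: CallSite) -> List[str]:
    """Return the ways this call site breaks the callee's written contract."""
    problems: List[str] = []
    if site.context not in spec.contexts:
        problems.append(f"not callable from {site.context.value} context")
    # Only process context may sleep, and the caller must not rely on it not sleeping.
    if spec.may_block and (site.assumes_no_block or site.context is not Context.PROCESS):
        problems.append("callee may block")
    if not site.state_running and not spec.allows_sleeping_state:
        problems.append("caller is not in TASK_RUNNING")
    for lock in sorted(spec.locks_taken & site.locks_held):
        problems.append(f"callee takes {lock}, which the caller already holds")
    return problems
```

For example, `violations(Contract(frozenset({Context.PROCESS}), True, False), CallSite(Context.PROCESS, assumes_no_block=True))` reports `["callee may block"]`, which is exactly the kind of mismatch the spec would have exposed.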

> No documentation is often -better- than bad documentation.

Of course. But good documentation is better than no documentation :-)

> > You don't need long texts, just a brief description: "this function may be
> > called from process/bh/interrupt context, it may/may not block, it may/may
> > not be called in TASK_[UN]INTERRUPTIBLE state, it may take these locks."
> > 
> > With documentation, developers would be able to change the implementation of
> > kernel functions without needing to recheck all the drivers that use them.
> 
> Anytime you change implementation, you gotta check all drivers that use
> them.  I know, I'm one of the grunts that does such reviews and changes.

Anytime you change the implementation of syscalls, you gotta check all
applications that use them ;-) Luckily not, because there is a
specification and you can check that the syscalls conform to the
specification, not the apps.

> > Saying "code is the specification" is not good.
> 
> I'm not arguing against documentation.  That is dumb.  But the code is
> ALWAYS canonical.  Not docs.

Let's see:

There are parts of the code (1) that set the state to TASK_[UN]INTERRUPTIBLE
and then call other complex functions, such as page fault handlers (for
example, TCP in 2.2).

There are parts of the code (2), including some drivers, that call schedule
to yield the process, assuming that the state is TASK_RUNNING.

Sooner or later, a subroutine called from part (1) will somehow reach
part (2), and the process will lock up.
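The scenario can be modelled in a few lines of user-space Python. This is a toy, not the 2.2 scheduler: `RunQueue`, `driver_yield`, `driver_yield_defensive` and `part_one` are hypothetical names, and "locks up" is reduced to "the task is no longer on the run queue and nothing is going to wake it". The get_free_pages lockups mentioned earlier presumably have the same shape, with the allocator in the role of part (2).

```python
from enum import Enum
from typing import Callable, Set


class State(Enum):
    RUNNING = "TASK_RUNNING"
    INTERRUPTIBLE = "TASK_INTERRUPTIBLE"


class Task:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state = State.RUNNING


class RunQueue:
    """Toy model: schedule() keeps only TASK_RUNNING tasks runnable."""

    def __init__(self) -> None:
        self.runnable: Set[Task] = set()

    def add(self, task: Task) -> None:
        self.runnable.add(task)

    def schedule(self, current: Task) -> None:
        # A task that calls schedule() in a sleeping state leaves the run
        # queue and stays off it until someone calls wake_up() on it.
        if current.state is not State.RUNNING:
            self.runnable.discard(current)

    def wake_up(self, task: Task) -> None:
        task.state = State.RUNNING
        self.runnable.add(task)


def driver_yield(rq: RunQueue, current: Task) -> None:
    """Part (2): hypothetical driver code that 'just yields', assuming TASK_RUNNING."""
    rq.schedule(current)


def driver_yield_defensive(rq: RunQueue, current: Task) -> None:
    """Part (2) following the rule 'set the state before calling schedule'."""
    current.state = State.RUNNING
    rq.schedule(current)


def part_one(rq: RunQueue, current: Task,
             helper: Callable[[RunQueue, Task], None]) -> bool:
    """Part (1): marks itself sleeping, then calls code that reaches part (2).

    Returns True if the task is still runnable afterwards.
    """
    current.state = State.INTERRUPTIBLE
    helper(rq, current)
    return current in rq.runnable
```

With `driver_yield`, `part_one` returns False: the task has silently gone to sleep inside what was meant to be a yield. The defensive helper avoids that, at the cost of leaving the caller in TASK_RUNNING; a caller that re-sets its state on every pass through its wait loop, which is the usual idiom, only pays an extra iteration.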


Now the implementors of TCP will say: that driver is buggy. Everybody should
set state=TASK_RUNNING before calling schedule to yield the process.

The implementors of the driver will say: TCP is buggy; no one should call my
driver in TASK_[UN]INTERRUPTIBLE state.

Who is right, if there is no specification?

Mikulas
