Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-07 Thread Evgeniy Polyakov
On Wed, Mar 07, 2007 at 03:21:19PM -0300, Kirk Kuchov ([EMAIL PROTECTED]) wrote:
> On 3/7/07, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> >
> >* Kirk Kuchov <[EMAIL PROTECTED]> wrote:
> >
> >> I don't believe I'm wasting my time explaining this. They don't exist
> >> as /dev/null, they are just fucking _LINKS_.
> >[...]
> >> > Either stop flaming kernel developers or become one. It is that
> >> > simple.
> >>
> >> If I were to become a kernel developer I would stick with FreeBSD.
> >> [...]
> >
> >Hey, really, this is an excellent idea: what a boon you could become to
> >FreeBSD, again! How much they must be longing for your insightful
> >feedback, how much they must be missing your charming style and tactful
> >approach! I bet they'll want to print your mails out, frame them and
> >hang them over their fireplace, to remember the good old days on cold
> >snowy winter days, with warmth in their hearts! Please?
> >
> 
> http://www.totallytom.com/thecureforgayness.html

Fonts are a bit bad in my browser :)

Kirk, I understand your frustration - yes, Linux is not the perfect
place to land startup ideas, and yes, it lacks some features that modern
(or old) systems have supported for years, but things change with time.

I posted a patch which allows polling for signals; it can be trivially
adapted to support timers and essentially any other events.
Kevent did that too, but some things are just too radical for immediate
support, especially when the majority of users do not require the
additional functionality.
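
[Editor's sketch] The idea of "polling for signals" can be illustrated from userspace. The following is not the patch mentioned above; it is a minimal stand-in that uses CPython's signal.set_wakeup_fd() so that a signal shows up as ordinary fd readiness in poll(). The function name poll_for_signal and the choice of SIGUSR1 are assumptions made for illustration only.

```python
import os
import select
import signal
import socket


def poll_for_signal(signum=signal.SIGUSR1, timeout_ms=1000):
    """Send `signum` to this process and observe it as fd readiness via poll().

    Returns the signal number read from the descriptor, or None on timeout.
    Must run in the main thread (a CPython restriction on signal setup).
    """
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    writer.setblocking(False)
    # A Python-level handler must be installed, otherwise the default
    # action for SIGUSR1 would terminate the process.
    old_handler = signal.signal(signum, lambda number, frame: None)
    # CPython writes the signal number as one byte to this descriptor.
    old_wakeup = signal.set_wakeup_fd(writer.fileno())
    try:
        poller = select.poll()
        poller.register(reader.fileno(), select.POLLIN)
        os.kill(os.getpid(), signum)
        if not poller.poll(timeout_ms):
            return None
        return reader.recv(1)[0]
    finally:
        # Restore the previous signal setup before closing the sockets.
        signal.set_wakeup_fd(old_wakeup)
        signal.signal(signum, old_handler if old_handler is not None else signal.SIG_DFL)
        reader.close()
        writer.close()
```

Because the signal arrives through a descriptor, the same poll() loop could watch sockets and pipes alongside it, which is the property the paragraph above is after.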

People do work, and a lot of them do really good work, so there is no
need for rude talk about how bad things are. Things change - even I
support that, although the way kevent was ignored should put me in the
front row with you :)

Be good, and be cool.

> --
> Kirk Kuchov

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-07 Thread Kirk Kuchov

On 3/7/07, Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> * Kirk Kuchov <[EMAIL PROTECTED]> wrote:
>
> > I don't believe I'm wasting my time explaining this. They don't exist
> > as /dev/null, they are just fucking _LINKS_.
> [...]
> > > Either stop flaming kernel developers or become one. It is that
> > > simple.
> >
> > If I were to become a kernel developer I would stick with FreeBSD.
> > [...]
>
> Hey, really, this is an excellent idea: what a boon you could become to
> FreeBSD, again! How much they must be longing for your insightful
> feedback, how much they must be missing your charming style and tactful
> approach! I bet they'll want to print your mails out, frame them and
> hang them over their fireplace, to remember the good old days on cold
> snowy winter days, with warmth in their hearts! Please?

http://www.totallytom.com/thecureforgayness.html

--
Kirk Kuchov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-07 Thread Jens Axboe
On Wed, Mar 07 2007, Kirk Kuchov wrote:
> On 3/7/07, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> >
> >* Kirk Kuchov <[EMAIL PROTECTED]> wrote:
> >
> >> I don't believe I'm wasting my time explaining this. They don't exist
> >> as /dev/null, they are just fucking _LINKS_.
> >[...]
> >> > Either stop flaming kernel developers or become one. It is that
> >> > simple.
> >>
> >> If I were to become a kernel developer I would stick with FreeBSD.
> >> [...]
> >
> >Hey, really, this is an excellent idea: what a boon you could become to
> >FreeBSD, again! How much they must be longing for your insightful
> >feedback, how much they must be missing your charming style and tactful
> >approach! I bet they'll want to print your mails out, frame them and
> >hang them over their fireplace, to remember the good old days on cold
> >snowy winter days, with warmth in their hearts! Please?
> >
> 
> http://www.totallytom.com/thecureforgayness.html

Dude, get a life. But more importantly, go waste somebody else's time
instead of lkml's.

-- 
Jens Axboe, updating killfile



Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-07 Thread Ingo Molnar

* Kirk Kuchov <[EMAIL PROTECTED]> wrote:

> I don't believe I'm wasting my time explaining this. They don't exist 
> as /dev/null, they are just fucking _LINKS_.
[...]
> > Either stop flaming kernel developers or become one. It is that 
> > simple.
> 
> If I were to become a kernel developer I would stick with FreeBSD. 
> [...]

Hey, really, this is an excellent idea: what a boon you could become to 
FreeBSD, again! How much they must be longing for your insightful 
feedback, how much they must be missing your charming style and tactful 
approach! I bet they'll want to print your mails out, frame them and 
hang them over their fireplace, to remember the good old days on cold 
snowy winter days, with warmth in their hearts! Please?

Ingo


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-07 Thread Linus Torvalds


On Wed, 7 Mar 2007, Kirk Kuchov wrote:
> 
> I don't believe I'm wasting my time explaining this. They don't exist
> as /dev/null, they are just fucking _LINKS_. I could even "ln -s
> /proc/self/fd/0 sucker". A real /dev/stdout can/could even exist, but
> that's not the point!

Actually, one large reason for /proc/self/ existing is exactly /dev/stdin 
and friends.

And yes, /proc/self looks like a link too, but that doesn't change the 
fact that it's a very special file. No different from /dev/null or 
friends.

Linus
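
[Editor's sketch] The point about links that are nonetheless special files can be checked from userspace. The sketch below only inspects symlinks; the helper names describe_link, survey and sucker_demo are hypothetical. It assumes a Linux system with /proc mounted, where reading the /proc/self link normally yields the PID of whichever process does the reading.

```python
import os
import tempfile


def describe_link(path):
    """Return 'path -> target' for a symlink, or a note that it is not one."""
    if os.path.islink(path):
        return f"{path} -> {os.readlink(path)}"
    return f"{path} (not a symlink)"


def survey(paths=("/dev/stdin", "/dev/stdout", "/dev/stderr",
                  "/dev/null", "/proc/self")):
    """Describe each path; results depend on the system being inspected."""
    return [describe_link(p) for p in paths]


def sucker_demo():
    """Equivalent of `ln -s /proc/self/fd/0 sucker` in a scratch directory."""
    with tempfile.TemporaryDirectory() as directory:
        link = os.path.join(directory, "sucker")
        os.symlink("/proc/self/fd/0", link)
        # Only the raw link target is reported; it is not opened.
        return os.readlink(link)
```

Calling survey() from two different processes would typically show the same /dev entries but a different /proc/self target in each, which is what makes that link unlike one created with ln -s.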


Re: Trading Places (was: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)

2007-03-07 Thread Kirk Kuchov

On 3/7/07, Al Boldi <[EMAIL PROTECTED]> wrote:
> Kirk Kuchov wrote:
> > > Either stop flaming kernel developers or become one. It is that
> > > simple.
> >
> > If I were to become a kernel developer I would stick with FreeBSD. At
> > least they have kqueue for about seven years now.
>
> I have been playing with this thought for quite some time.  The question is,
> can I just use FreeBSD as a drop-in kernel replacement for Linux, or do I
> have to leave all the GNU/Linux distributions behind as well?

http://www.debian.org/ports/kfreebsd-gnu/

--
Kirk Kuchov


Trading Places (was: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)

2007-03-07 Thread Al Boldi
Kirk Kuchov wrote:
> > Either stop flaming kernel developers or become one. It is  that
> > simple.
>
> If I were to become a kernel developer I would stick with FreeBSD. At
> least they have kqueue for about seven years now.

I have been playing with this thought for quite some time.  The question is, 
can I just use FreeBSD as a drop-in kernel replacement for Linux, or do I 
have to leave all the GNU/Linux distributions behind as well?


Thanks!

--
Al



Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-07 Thread Kirk Kuchov

On 3/6/07, Pavel Machek <[EMAIL PROTECTED]> wrote:
> > > >As for why common abstractions like file are a good thing, think about why
> > > >having "/dev/null" is cleaner that having a special plug DEVNULL_FD fd
> > > >value to be plugged everywhere,
> >
> > This is a stupid comparaison. By your logic we should also have /dev/stdin,
> > /dev/stdout and /dev/stderr.
>
> Bzzt, wrong. We have them.
>
> [EMAIL PROTECTED]:~$ ls -al /dev/std*
> lrwxrwxrwx 1 root root 4 Nov 12  2003 /dev/stderr -> fd/2
> lrwxrwxrwx 1 root root 4 Nov 12  2003 /dev/stdin -> fd/0
> lrwxrwxrwx 1 root root 4 Nov 12  2003 /dev/stdout -> fd/1
> [EMAIL PROTECTED]:~$ ls -al /proc/self/fd
> total 0
> dr-x------ 2 pavel users  0 Mar  6 09:18 .
> dr-xr-xr-x 4 pavel users  0 Mar  6 09:18 ..
> lrwx------ 1 pavel users 64 Mar  6 09:18 0 -> /dev/ttyp2
> lrwx------ 1 pavel users 64 Mar  6 09:18 1 -> /dev/ttyp2
> lrwx------ 1 pavel users 64 Mar  6 09:18 2 -> /dev/ttyp2
> lr-x------ 1 pavel users 64 Mar  6 09:18 3 -> /proc/2299/fd
> [EMAIL PROTECTED]:~$


I don't believe I'm wasting my time explaining this. They don't exist
as /dev/null, they are just fucking _LINKS_. I could even "ln -s
/proc/self/fd/0 sucker". A real /dev/stdout can/could even exist, but
that's not the point!

It remains a stupid comparison because stdin/stdout/stderr "must" be
plugged: how else could a process write to a stdout or stderr that it
couldn't open? Things are the way they are not because it's cleaner to
have them as files but because it's the only sane way. /dev/null is
not a must-have; it's mainly used for redirection. A
sys_nullify(fileno(stdout)) would rule out almost any use of
/dev/null.
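
[Editor's sketch] For comparison, here is roughly what the hypothetical sys_nullify() would have to do, built only from calls that already exist because /dev/null is a file: open it and dup2() it over the target descriptor. The name nullify is an assumption; no such system call exists.

```python
import os


def nullify(fd):
    """Point an already-open descriptor at /dev/null, like `cmd > /dev/null`."""
    null_fd = os.open(os.devnull, os.O_RDWR)
    try:
        # dup2() atomically closes whatever `fd` referred to and makes it
        # a second reference to the /dev/null open file description.
        os.dup2(null_fd, fd)
    finally:
        os.close(null_fd)
```

A shell redirection such as `cmd > /dev/null` does the same open-then-dup2 dance before exec, so no dedicated call is needed as long as the null device has a name in the filesystem.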


> > > >As for why common abstractions like file are a good thing, think about why
> > > >having "/dev/null" is cleaner that having a special plug DEVNULL_FD fd
> > > >value to be plugged everywhere,
>
> > > >But here the list could be almost endless.
> > > >And please don't start the, they don't scale or they need heavy file
> > > >binding tossfeast. They scale as well as the interface that will receive
> > > >them (poll, select, epoll). Heavy file binding what? 100 or so bytes for
> > > >the struct file? How many signal/timer fd are you gonna have? Like 100K?
> > > >Really moot argument when opposed to the benefit of being compatible with
> > > >existing POSIX interfaces and being more Unix friendly.
> >
> > So why the HELL don't we have those yet? Why haven't you designed
> > epoll with those in mind? Why don't you back your claims with patches?
> > (I'm not a kernel developer.)
>
> Either stop flaming kernel developers or become one. It is that
> simple.



If I were to become a kernel developer I would stick with FreeBSD. At
least they have kqueue for about seven years now.

--
Kirk Kuchov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-06 Thread Pavel Machek
> >As for why common abstractions like file are a good thing, think about why
> >having "/dev/null" is cleaner that having a special plug DEVNULL_FD fd
> >value to be plugged everywhere,
> 
> This is a stupid comparaison. By your logic we should also have /dev/stdin,
> /dev/stdout and /dev/stderr.

Bzzt, wrong. We have them.

[EMAIL PROTECTED]:~$ ls -al /dev/std*
lrwxrwxrwx 1 root root 4 Nov 12  2003 /dev/stderr -> fd/2
lrwxrwxrwx 1 root root 4 Nov 12  2003 /dev/stdin -> fd/0
lrwxrwxrwx 1 root root 4 Nov 12  2003 /dev/stdout -> fd/1
[EMAIL PROTECTED]:~$ ls -al /proc/self/fd
total 0
dr-x------ 2 pavel users  0 Mar  6 09:18 .
dr-xr-xr-x 4 pavel users  0 Mar  6 09:18 ..
lrwx------ 1 pavel users 64 Mar  6 09:18 0 -> /dev/ttyp2
lrwx------ 1 pavel users 64 Mar  6 09:18 1 -> /dev/ttyp2
lrwx------ 1 pavel users 64 Mar  6 09:18 2 -> /dev/ttyp2
lr-x------ 1 pavel users 64 Mar  6 09:18 3 -> /proc/2299/fd
[EMAIL PROTECTED]:~$

> >But here the list could be almost endless.
> >And please don't start the, they don't scale or they need heavy file
> >binding tossfeast. They scale as well as the interface that will receive
> >them (poll, select, epoll). Heavy file binding what? 100 or so bytes for
> >the struct file? How many signal/timer fd are you gonna have? Like 100K?
> >Really moot argument when opposed to the benefit of being compatible with
> >existing POSIX interfaces and being more Unix friendly.
> 
> So why the HELL don't we have those yet? Why haven't you designed
> epoll with those in mind? Why don't you back your claims with patches?
> (I'm not a kernel developer.)

Either stop flaming kernel developers or become one. It is  that
simple.

Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-04 Thread Michael K. Edwards

On 3/4/07, Kyle Moffett <[EMAIL PROTECTED]> wrote:
> Well, even this far into 2.6, Linus' patch from 2003 still (mostly)
> applies; the maintenance cost for this kind of code is virtually
> zilch.  If it matters that much to you clean it up and make it apply;
> add an alarmfd() syscall (another 100 lines of code at most?) and
> make a "read" return an architecture-independent siginfo-like
> structure and submit it for inclusion.  Adding epoll() support for
> random objects is as simple as a 75-line object-filesystem and a
> 25-line syscall to return an FD to a new inode.  Have fun!  Go wild!
> Something this trivially simple could probably spend a week in -mm
> and go to linus for 2.6.22.


Or, if you want to do slightly more work and produce something a great
deal more useful, you could implement additional netlink address
families for additional "event" sources.  The socket - setsockopt -
bind - sendmsg/recvmsg sequence is a well understood and well
documented UNIX paradigm for multiplexing non-blocking I/O to many
destinations over one socket.  Everyone who has read Stevens is
familiar with the basic UDP and "fd open server" techniques, and if
you look at Linux's IP_PKTINFO and NETLINK_W1 (bravo, Evgeniy!) you'll
see how easily they could be extended to file AIO and other kinds of
event sources.
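
[Editor's sketch] The socket - setsockopt - bind - sendmsg/recvmsg sequence can be tried against an existing netlink family. The sketch below uses NETLINK_ROUTE and a link dump request purely as a stand-in; the additional "event" address families proposed above are hypothetical and are not shown. The function name first_link_reply is an assumption, the numeric constants follow the kernel's netlink and rtnetlink headers, and the code needs a Linux system where netlink sockets are permitted.

```python
import socket
import struct

NLMSG_HDR = struct.Struct("=IHHII")   # struct nlmsghdr: len, type, flags, seq, pid
IFINFOMSG = struct.Struct("=BxHiII")  # struct ifinfomsg: family, type, index, flags, change
RTM_NEWLINK, RTM_GETLINK, NLMSG_DONE = 16, 18, 3
NLM_F_REQUEST, NLM_F_DUMP = 0x001, 0x300


def first_link_reply(seq=1):
    """Ask the kernel to dump network links; return (type, seq) of the first reply."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1 << 16)
        # Port id 0 lets the kernel pick one; group mask 0 means no multicast.
        sock.bind((0, 0))
        body = IFINFOMSG.pack(socket.AF_UNSPEC, 0, 0, 0, 0)
        header = NLMSG_HDR.pack(NLMSG_HDR.size + len(body), RTM_GETLINK,
                                NLM_F_REQUEST | NLM_F_DUMP, seq, 0)
        sock.sendmsg([header, body])
        data, _ancdata, _flags, _addr = sock.recvmsg(1 << 16)
        # Only the first message header in the datagram is decoded here.
        _length, msg_type, _msg_flags, reply_seq, _port = NLMSG_HDR.unpack_from(data)
        return msg_type, reply_seq
    finally:
        sock.close()
```

The sequence number plays the role of a per-request cookie: the kernel echoes it in its replies, which is what lets one socket carry many outstanding requests.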

For file AIO, you might have the application open one AIO socket per
mount point, open files indirectly via the SCM_RIGHTS mechanism, and
submit/retire read/write requests via sendmsg/recvmsg with ancillary
data consisting of an lseek64 tuple and a user-provided cookie.
Although the process still has to have one fd open per actual open
file (because trying to authenticate file accesses without opening fds
is madness), the only fds it has to manipulate directly are those
representing entire pools of outstanding requests.  This is usually a
small enough set that select() will do just fine, if you're careful
with fd allocation.  (You can simply punt indirectly opened fds up to
a high numerical range, where they can't be accessed directly from
userspace but still make fine cookies for use in lseek64 tuples within
cmsg headers).
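
[Editor's sketch] A userspace toy model of this request flow, under loudly stated assumptions: the "AIO socket" is an ordinary AF_UNIX datagram socketpair, the toy server lives in the same process, and the (cookie, offset, length) tuple travels in the message payload because userspace cannot define new ancillary message types. The wire format REQUEST and all function names are hypothetical. Only the SCM_RIGHTS descriptor passing is the real mechanism named above, and the direction in which the descriptor is passed is chosen only to keep the demo small.

```python
import array
import os
import socket
import struct
import tempfile

REQUEST = struct.Struct("=QqI")  # hypothetical wire format: cookie, offset, length
COOKIE = struct.Struct("=Q")


def send_fd(sock, fd):
    """Pass an open descriptor to the peer as SCM_RIGHTS ancillary data."""
    rights = array.array("i", [fd])
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, rights)])


def recv_fd(sock):
    """Receive one descriptor sent with send_fd()."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_SPACE(fds.itemsize))
    for level, kind, data in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            fds.frombytes(data[:fds.itemsize])
    return fds[0]


def serve_one_read(sock, fd):
    """Toy server side: answer one positional read, echoing the cookie."""
    cookie, offset, length = REQUEST.unpack(sock.recv(REQUEST.size))
    sock.send(COOKIE.pack(cookie) + os.pread(fd, length, offset))


def demo():
    """Submit one read request over the socket and retire its completion."""
    app, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with tempfile.TemporaryFile() as f, app, server:
        f.write(b"epoll and kqueue")
        f.flush()
        send_fd(app, f.fileno())           # hand the file to the server side
        remote_fd = recv_fd(server)
        app.send(REQUEST.pack(42, 10, 6))  # submit: cookie 42, offset 10, 6 bytes
        serve_one_read(server, remote_fd)
        reply = app.recv(4096)             # retire: the cookie returns with the data
        os.close(remote_fd)
        return COOKIE.unpack_from(reply)[0], reply[COOKIE.size:]
```

The application only ever polls the one socket; which file a completion belongs to is recovered from the cookie, not from a per-file descriptor in the poll set.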

The same basic approach will work for timers, signals, and just about
any other event source.  Userspace is of course still stuck doing its
own state machines / thread scheduling / however you choose to think
of it.  But all the important activity goes through socketcall(), and
the data and control parameters are all packaged up into a struct
msghdr instead of the bare buffer pointers of read/write.  So if
someone else does come along later and design an ultralight threading
mechanism that isn't a total botch, the actual data paths won't need
much rework; the exception handling will just get a lot simpler.

Cheers,
- Michael


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-04 Thread Magnus Naeslund(k)

Kirk Kuchov wrote:
[snip]
> This is a stupid comparaison. By your logic we should also have /dev/stdin,
> /dev/stdout and /dev/stderr.



Well, as a matter of fact (on my system):

# ls -l /dev/std*
lrwxrwxrwx  1 root root 4 Feb  1  2006 /dev/stderr -> fd/2
lrwxrwxrwx  1 root root 4 Feb  1  2006 /dev/stdin -> fd/0
lrwxrwxrwx  1 root root 4 Feb  1  2006 /dev/stdout -> fd/1

Please don't bother to respond to this mail, I just saw that you 
apparently needed the info.


Magnus

P.S.: *PLONK*


Discussing LKML community [OT from the Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3]

2007-03-04 Thread Oleg Verych
> From: "Michael K. Edwards" <[EMAIL PROTECTED]>
> Newsgroups: gmane.linux.kernel
> Subject: Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
> Date: Wed, 28 Feb 2007 09:01:07 -0800

Michael,

[]
> In this instance, there didn't seem to be any harm in sending my
> thoughts to LKML as I wrote them, on the off chance that Ingo or
> Davide would get some value out of them in this design cycle (which
> any code I eventually get around to producing will miss).  So far,
> I've gotten some rather dismissive pushback from Ingo and Alan (who
> seem to have no interest outside x86 and less understanding than I
> would have thought of what real userspace code looks like), a "why
> preach to people who know more than you do" from Davide,

This may be sad, unless you've spent the time and effort to make a patch,
i.e. read the source, understood why it is written the way it is, why it
is used that way now, and why it has to be updated in a new cycle of
kernel development.

> a brief aside on the dominance of x86 from Oleg,

I haven't had a chance, and probably will not have one, to communicate
with people like you and learn from your wisdom personally. That's why
I replied to you after you mentioned transputers. And I got a rather
different opinion than I expected. That shows my test-tube existence,
little experience, etc. As the discussion was about CPUs, it was
technical, and thus on-topic for LKML.

> and one off-list "keep up the good work".  Not a very rich harvest from
> (IMHO) pretty good seeds.

The off-list message shared my view on things that were off-topic and
clarified the LKML matter; it was not on-topic for LKML.

I'm pretty sure that there are libraries of books written on every single
bit of what Linux currently *implements* in asm/C.

(1) Thus, `return -ENOPATCH', man, regardless of what you are saying on
LKML. That is why the prominent people you have lumped me in with (:
replied in go-to-kernelnewbies style.

> In short, so far the "Linux kernel community" is upholding its
> reputation for insularity, arrogance, coding without prior design,
> lack of interest in userspace problems, and inability to learn from
> the mistakes of others.  (None of these characterizations depends on
> there being any real insight in anything I have written.)

You, as a person who has the right to be personally wrong, may think that
way. But do not forget, as I wrote to you off-list and in (1), that this
is a development community, sometimes a development-of-development one,
etc.; educated, enthusiastic, wise, Open Source, short on time (and money :).

> Happy hacking,
> - Michael

And you too. LKML *can* (sometimes may) show how useful this hacking is.

> P. S.  I do think "threadlets" are brilliant, though, and reading
> Ingo's patches gave me a much better idea of what would be involved in
> prototyping Asynchronously Executed I/O Unit opcodes.

You are discussing the on-topic thing in the P.S., and this is IMHO the
wrong approach.

Also note that I have changed the subject and stripped the Cc list; please
note too that I may be a young and naive boy barking up the wrong tree.

Kind regards.
--
-o--=O`C  /. .\
 #oo'L O  o
<___=E M^-- (Wuuf)


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-04 Thread Davide Libenzi
On Sun, 4 Mar 2007, Kirk Kuchov wrote:

> I don't give a shit.

Here's another good use of /dev/null:

*PLONK*



- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-04 Thread Kirk Kuchov

On 3/4/07, Davide Libenzi wrote:
> On Sun, 4 Mar 2007, Kirk Kuchov wrote:
>
> > On 3/3/07, Davide Libenzi wrote:
> > >
> > > Those *other* (tons?!?) interfaces can be created *when* the need comes
> > > (see Linus signalfd [1] example to show how urgent that was). *When*
> > > the need comes, they will work with existing POSIX interfaces, without
> > > requiring your own just-another event interface. Those other interfaces
> > > could also be more easily adopted by other Unix cousins, because of
> > > the fact that they rely on existing POSIX interfaces.
> >
> > Please stop with this crap, this chicken or the egg argument of yours is utter
> > BULLSHIT!
>
> Wow, wow, fella! You _definitely_ cannot afford rudeness here.


I don't give a shit.


> You started bad, and you end even worse, by listing some APIs that will
> work only with epoll. As I said already, and as it was listed in the
> thread I posted the link to, something like:
>
> int signalfd(...);  // Linus initial interface would be perfectly fine
> int timerfd(...);   // Open ...
> int eventfd(...);   // [1]
>
> Will work *even* with standard POSIX select/poll. 95% or more of the
> software does not have scalability issues, and select/poll are more
> portable and easy to use for simple stuff. On top of that, as I already
> said, they are *confined* interfaces that could be more easily adopted by
> other Unixes (if they are 100-200 lines on Linux, don't expect them to be
> a lot more on other Unixes) [2]. We *already* have the infrastructure
> inside Linux to deliver events (f_op->poll subsystem), how about we use
> that instead of just-another way? [3]


Man you're so full of shit, your eyes are brown. NOBODY cares about
select/poll or that the interfaces are going to be adopted by other
Unixes. This issue has already been solved by them YEARS ago.

What I want (and a ton of other users) is a SIMPLE and generic way to
receive events from _MULTIPLE_ sources. I don't care about
kernel-level portability, easiness or whatever, the linux kernel
developers are good at not knowing what their users want.


> As for why common abstractions like file are a good thing, think about why
> having "/dev/null" is cleaner than having a special plug DEVNULL_FD fd
> value to be plugged everywhere,


This is a stupid comparison. By your logic we should also have /dev/stdin,
/dev/stdout and /dev/stderr.


> or why I can use find/grep/cat/echo/... to
> look at/edit my configuration inside /proc, instead of using a frigging
> registry editor.


Yet another stupid comparison, /proc is a MESS! Almost as bad as
the registry. Linux now has three pieces of crap for
configuration/information: /proc, sysfs and sysctl. Nobody knows
exactly what should go into each one of those. Crap design at its
best.


> But here the list could be almost endless.
> And please don't start the "they don't scale" or "they need heavy file
> binding" tossfeast. They scale as well as the interface that will receive
> them (poll, select, epoll). Heavy file binding what? 100 or so bytes for
> the struct file? How many signal/timer fds are you gonna have? Like 100K?
> Really moot argument when opposed to the benefit of being compatible with
> existing POSIX interfaces and being more Unix friendly.


So why the HELL don't we have those yet? Why haven't you designed
epoll with those in mind? Why don't you back your claims with patches?
(I'm not a kernel developer.)


> As for the AIO stuff, if threadlets/syslets prove effective, you can
> host an epoll_wait over a syslet/threadlet. Or, if the 3 lines of
> userspace code needed to do that fall inside your definition of "kludge",
> we can even find a way to bridge the two.


I don't care about threadlets in this context, I just want to wait for
EVENTS from MULTIPLE sources WITHOUT mixing signals and other crap.
Your arrogance is amusing, stop pushing narrow-minded beliefs down the
throats of all Linux users. Kqueue, event ports,
WaitForMultipleObjects, epoll with multiple sources. That's what users
want, not yet another syscall/whatever hack.


> Now, how about we focus on the topic of this thread?
>
> [1] This could be an idea. People already use pipes for this, but pipes
> have some memory overhead inside the kernel (plus use two fds) that
> could, if really felt necessary, be avoided.


Yet another hack!! 64kiB of space just to push some user events
around. Great idea!



> [2] This is how those kind of interfaces should be designed. Modular,
> re-usable, file-based interfaces, whose acceptance is not linked into
> slurping-in a whole new interface with tens of sub, interface-only,
> objects. And from this POV, epoll is the friendlier.


Who said I want yet another interface? I just fucking want to receive
events from MULTIPLE sources through epoll. With or without a fd! My
anger and frustration is that we can't get past this SIMPLE need!


> [3] Notice the similarity between threadlets/syslets and epoll? They
> enable pretty darn good scalability, with *existing* infrastructure,
> and w/out special ad-hoc code to

Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-04 Thread Davide Libenzi
On Sun, 4 Mar 2007, Kirk Kuchov wrote:

> On 3/3/07, Davide Libenzi  wrote:
> > 
> > 
> > Those *other* (tons?!?) interfaces can be created *when* the need comes
> > (see Linus signalfd [1] example to show how urgent that was). *When*
> > the need comes, they will work with existing POSIX interfaces, without
> > requiring your own just-another event interface. Those other interfaces
> > could also be more easily adopted by other Unix cousins, because of
> > the fact that they rely on existing POSIX interfaces.
> 
> Please stop with this crap, this chicken or the egg argument of yours is utter
> BULLSHIT!

Wow, wow, fella! You _definitely_ cannot afford rudeness here.
You started bad, and you end even worse, by listing some APIs that will
work only with epoll. As I said already, and as it was listed in the
thread I posted the link to, something like:

int signalfd(...);  // Linus initial interface would be perfectly fine
int timerfd(...);   // Open ...
int eventfd(...);   // [1]

Will work *even* with standard POSIX select/poll. 95% or more of the 
software does not have scalability issues, and select/poll are more 
portable and easy to use for simple stuff. On top of that, as I already 
said, they are *confined* interfaces that could be more easily adopted by 
other Unixes (if they are 100-200 lines on Linux, don't expect them to be 
a lot more on other Unixes) [2]. We *already* have the infrastructure 
inside Linux to deliver events (f_op->poll subsystem), how about we use 
that instead of just-another way? [3]

As for why common abstractions like file are a good thing, think about why
having "/dev/null" is cleaner than having a special plug DEVNULL_FD fd
value to be plugged everywhere, or why I can use find/grep/cat/echo/... to
look at/edit my configuration inside /proc, instead of using a frigging
registry editor. But here the list could be almost endless.

And please don't start the "they don't scale" or "they need heavy file
binding" tossfeast. They scale as well as the interface that will receive
them (poll, select, epoll). Heavy file binding what? 100 or so bytes for
the struct file? How many signal/timer fds are you gonna have? Like 100K?
Really moot argument when opposed to the benefit of being compatible with
existing POSIX interfaces and being more Unix friendly.

As for the AIO stuff, if threadlets/syslets prove effective, you can
host an epoll_wait over a syslet/threadlet. Or, if the 3 lines of
userspace code needed to do that fall inside your definition of "kludge",
we can even find a way to bridge the two.

Now, how about we focus on the topic of this thread?




[1] This could be an idea. People already use pipes for this, but pipes
have some memory overhead inside the kernel (plus use two fds) that
could, if really felt necessary, be avoided.

[2] This is how those kind of interfaces should be designed. Modular,
re-usable, file-based interfaces, whose acceptance is not linked into
slurping-in a whole new interface with tens of sub, interface-only,
objects. And from this POV, epoll is the friendlier.

[3] Notice the similarity between threadlets/syslets and epoll? They
enable pretty darn good scalability, with *existing* infrastructure,
and w/out special ad-hoc code to be plugged everywhere. This translates
directly into easier to maintain code.



- Davide
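
The claim above, that fd-based event sources work even with plain POSIX
poll(), can be sketched roughly as follows. This is only an illustration
and makes an assumption the mail does not: it uses eventfd(2) as it later
appeared in mainline Linux, not the `eventfd(...)` placeholder written in
the message. The helper names post_event() and wait_events() are
hypothetical.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Post one user-defined event by adding 1 to the eventfd counter. */
int post_event(int efd)
{
    uint64_t one = 1;

    return write(efd, &one, sizeof one) == (ssize_t)sizeof one ? 0 : -1;
}

/*
 * Wait up to timeout_ms for events using plain poll(), no epoll needed.
 * Returns the number of events posted since the last read, 0 on
 * timeout, or -1 on error.
 */
long wait_events(int efd, int timeout_ms)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    uint64_t count;
    int n = poll(&pfd, 1, timeout_ms);

    if (n <= 0)
        return n;
    /* Reading returns the accumulated counter and resets it to zero. */
    if (read(efd, &count, sizeof count) != (ssize_t)sizeof count)
        return -1;
    return (long)count;
}
```

Compared with the pipe trick mentioned in footnote [1], this uses a single
descriptor and an 8-byte counter instead of two descriptors and a pipe
buffer.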




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-04 Thread Kyle Moffett

On Mar 04, 2007, at 11:23:37, Kirk Kuchov wrote:
> So here we are, 2007. epoll() works with files, pipes, sockets,
> inotify and anything pollable (file descriptors) but aio, timers,
> signals and user-defined event. Can we please get those working
> with epoll ? Something as simple as:
>
> [code snipped]
>
> Would this be acceptable? Can we finally move on?


Well, even this far into 2.6, Linus' patch from 2003 still (mostly)
applies; the maintenance cost for this kind of code is virtually
zilch.  If it matters that much to you, clean it up and make it apply;
add an alarmfd() syscall (another 100 lines of code at most?) and
make a "read" return an architecture-independent siginfo-like
structure and submit it for inclusion.  Adding epoll() support for
random objects is as simple as a 75-line object-filesystem and a
25-line syscall to return an FD to a new inode.  Have fun!  Go wild!
Something this trivially simple could probably spend a week in -mm
and go to Linus for 2.6.22.


Cheers,
Kyle Moffett



Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-04 Thread Kirk Kuchov

On 3/3/07, Davide Libenzi wrote:
>
> Those *other* (tons?!?) interfaces can be created *when* the need comes
> (see Linus signalfd [1] example to show how urgent that was). *When*
> the need comes, they will work with existing POSIX interfaces, without
> requiring your own just-another event interface. Those other interfaces
> could also be more easily adopted by other Unix cousins, because of
> the fact that they rely on existing POSIX interfaces.

Please stop with this crap, this chicken or the egg argument of yours is
utter BULLSHIT! Just because Linux doesn't have a decent kernel event
notification mechanism does not mean that users don't need one. Nobody
cared about Linus's signalfd because it wasn't mainline.

Look at any event notification library out there; it makes me sick how
many kludges they have to go through to get near the same functionality
as kqueue on Linux.

Solaris has had the Event Ports mechanism since 2003. FreeBSD, NetBSD,
OpenBSD and Mac OS X have supported kqueue since around 2000. Windows has
had event notification for ages now. These _facilities_ are all widely
used, given the platforms' popularity.

So here we are, 2007. epoll() works with files, pipes, sockets, inotify
and anything pollable (file descriptors) but not aio, timers, signals or
user-defined events. Can we please get those working with epoll?
Something as simple as:

struct epoll_event ev;

ev.events = EV_TIMER | EPOLLONESHOT;
ev.data.u64 = 1000; /* timeout */

epoll_ctl(epfd, EPOLL_CTL_ADD, 0 /* ignored */, &ev);

or

struct sigevent ev;

ev.sigev_notify = SIGEV_EPOLL;
ev.sigev_signo = epfd;
ev.sigev_value = &ev;

timer_create(CLOCK_MONOTONIC, &ev, &timerid);

AIO:

struct epoll_event ev;
int fd = io_setup(..); /* oh boy, I wish... but it works */

ev.events = EV_AIO | EPOLLONESHOT;
/* event.data.ptr returns pointer to the iocb */
epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);

or

struct iocb iocb;

iocb.aio_fildes = fileno(stdin);
iocb.aio_lio_opcode = IO_CMD_PREAD;
iocb.c.notify = IO_NOTIFY_EPOLL; /* __pad3/4 */

Would this be acceptable? Can we finally move on?

--
Kirk Kuchov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-04 Thread Michael K. Edwards

Please don't take this the wrong way, Ray, but I don't think _you_
understand the problem space that people are (or should be) trying to
address here.

Servers want to always, always block.  Not on a socket, not on a stat,
not on any _one_ thing, but in a condition where the optimum number of
concurrent I/O requests are outstanding (generally of several kinds
with widely varying expected latencies).  I have an embedded server I
wrote that avoids forking internally for any reason, although it
watches the damn serial port signals in parallel with handling network
I/O, audio, and child processes that handle VoIP signaling protocols
(which are separate processes because it was more practical to write
them in a different language with mediocre embeddability).  There's a
lot of things that can block out there, not just disk I/O, but the
only thing a genuinely scalable server process ever blocks on (apart
from the odd spinlock) is a wait-for-IO-from-somewhere mechanism like
select or epoll or kqueue (or even sleep() while awaiting SIGRT+n, or
if it genuinely doesn't suck, the thread scheduler).

Furthermore, not only do servers want to block rather than shove more
I/O into the plumbing than it can handle without backing up, they also
want to throttle the concurrency of requests at the kernel level *for
the kernel's benefit*.  In particular, a server wants to submit to the
kernel a ton of stats and I/O in parallel, far more than it makes
sense to actually issue concurrently, so that efficient sequencing of
these requests can be left to the kernel.  But the server wants to
guide the kernel with regard to the ratios of concurrency appropriate
to the various classes and the relative urgency of the individual
requests within each class.  The server also wants to be able to
reprioritize groups of requests or cancel them altogether based on new
information about hardware status and user behavior.

Finally, the biggest argument against syslets/threadlets AFAICS is
that -- if done incorrectly, as currently proposed -- they would unify
the AIO and normal IO paths in the kernel.  This would shackle AIO to
the current semantics of synchronous syscalls, in which buffers are
passed as bare pointers and exceptional results are tangled up with
programming errors.  This would, in turn, make it quite impossible for
future hardware to pipeline and speculatively execute chains of AIO
operations, leaving "syslets" to a few RDBMS programmers with time to
burn.  The unimproved ease of long term maintenance on the kernel (not
to mention the complete failure to make the writing of _correct_,
performant server code any easier) makes them unworthy of
consideration for inclusion.

So, while everybody has been talking about cached and non-cached
cases, those are really total irrelevancies.  The principal problem
that needs solving is to model the process's pool of in-flight I/O
requests, together with a much larger number of submitted but not yet
issued requests whose results are foreseeably likely to be needed
soon, using a data structure that efficiently supports _all_ of the
operations needed, including bulk cancellation, reprioritization, and
batch migration based on affinities among requests and locality to the
correct I/O resources.  Memory footprint and gentle-on-real-hardware
scheduling are secondary, but also important, considerations.  If you
happen to be able to service certain things directly from cache,
that's gravy -- but it's not very smart IMHO to put that central to
your design process.

Cheers,
- Michael
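
The request pool described above (many submitted requests, a capped
number actually in flight, bulk cancellation and reprioritization by
group) can be made concrete with a toy userspace model. Everything in
this sketch is hypothetical: the names, the fixed-size array and the
linear scans are illustration only, and a structure meeting the
efficiency goals stated in the mail would need something better than
O(n) searches.

```c
#include <stddef.h>

enum req_state { REQ_FREE, REQ_PENDING, REQ_INFLIGHT };

struct request {
    enum req_state state;
    int group;      /* hypothetical affinity/cancellation group */
    int priority;   /* larger means more urgent */
};

#define POOL_SIZE 64

struct req_pool {
    struct request reqs[POOL_SIZE];
    int inflight;       /* requests currently issued */
    int max_inflight;   /* concurrency cap */
};

/* Queue a request without issuing it; returns its slot, or -1 if full. */
int pool_submit(struct req_pool *p, int group, int priority)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        if (p->reqs[i].state == REQ_FREE) {
            p->reqs[i] = (struct request){ REQ_PENDING, group, priority };
            return i;
        }
    }
    return -1;
}

/* Issue the most urgent pending request if under the cap; slot or -1. */
int pool_issue_next(struct req_pool *p)
{
    int best = -1;

    if (p->inflight >= p->max_inflight)
        return -1;
    for (int i = 0; i < POOL_SIZE; i++) {
        if (p->reqs[i].state == REQ_PENDING &&
            (best < 0 || p->reqs[i].priority > p->reqs[best].priority))
            best = i;
    }
    if (best >= 0) {
        p->reqs[best].state = REQ_INFLIGHT;
        p->inflight++;
    }
    return best;
}

/* Mark an in-flight request as finished and free its slot. */
void pool_complete(struct req_pool *p, int slot)
{
    if (p->reqs[slot].state == REQ_INFLIGHT) {
        p->reqs[slot].state = REQ_FREE;
        p->inflight--;
    }
}

/* Cancel every not-yet-issued request in `group`; returns the count. */
int pool_cancel_group(struct req_pool *p, int group)
{
    int n = 0;

    for (int i = 0; i < POOL_SIZE; i++) {
        if (p->reqs[i].state == REQ_PENDING && p->reqs[i].group == group) {
            p->reqs[i].state = REQ_FREE;
            n++;
        }
    }
    return n;
}

/* Change the priority of all pending requests in `group`. */
void pool_reprioritize_group(struct req_pool *p, int group, int priority)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        if (p->reqs[i].state == REQ_PENDING && p->reqs[i].group == group)
            p->reqs[i].priority = priority;
    }
}
```

A zero-initialized struct req_pool with max_inflight set is a valid empty
pool, since REQ_FREE is 0.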


Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3

2007-03-04 Thread Kirk Kuchov

On 3/4/07, Davide Libenzi <davidel@xmailserver.org> wrote:
> On Sun, 4 Mar 2007, Kirk Kuchov wrote:
>
> > On 3/3/07, Davide Libenzi <davidel@xmailserver.org> wrote:
> > > <snip>
> > >
> > > Those *other* (tons?!?) interfaces can be created *when* the need comes
> > > (see Linus signalfd [1] example to show how urgent that was). *When*
> > > the need comes, they will work with existing POSIX interfaces, without
> > > requiring your own just-another event interface. Those other interfaces
> > > could also be more easily adopted by other Unix cousins, because of
> > > the fact that they rely on existing POSIX interfaces.
> >
> > Please stop with this crap, this chicken or the egg argument of yours is utter
> > BULLSHIT!
>
> Wow, wow, fella! You _definitely_ cannot afford rudeness here.

I don't give a shit.

> You started bad, and you end even worse. By listing some APIs that will
> work only with epoll. As I said already, and as it was listed in the
> thread I posted the link to, something like:
>
> int signalfd(...);  // Linus initial interface would be perfectly fine
> int timerfd(...);   // Open ...
> int eventfd(...);   // [1]
>
> Will work *even* with standard POSIX select/poll. 95% or more of the
> software does not have scalability issues, and select/poll are more
> portable and easy to use for simple stuff. On top of that, as I already
> said, they are *confined* interfaces that could be more easily adopted by
> other Unixes (if they are 100-200 lines on Linux, don't expect them to be
> a lot more on other Unixes) [2]. We *already* have the infrastructure
> inside Linux to deliver events (f_op->poll subsystem), how about we use
> that instead of just-another way? [3]
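
The claim quoted above, that an fd-based event source works with plain
POSIX poll(), can be sketched roughly as follows. This is only an
illustration, assuming the eventfd(2) call as it exists in current Linux
and glibc rather than the interface sketched in the thread; the helper
name post_and_poll is hypothetical.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: post the value n to an eventfd and collect it
 * through poll(). Returns the counter value read back, or -1 on error. */
long post_and_poll(uint64_t n)
{
    struct pollfd pfd;
    uint64_t got = 0;
    long result = -1;
    int efd = eventfd(0, 0);            /* kernel counter starts at 0 */

    if (efd < 0)
        return -1;

    /* Producer side: an 8-byte write adds n to the counter. */
    if (write(efd, &n, sizeof(n)) != (ssize_t)sizeof(n)) {
        close(efd);
        return -1;
    }

    /* Consumer side: standard poll(), no epoll needed. */
    pfd.fd = efd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll(&pfd, 1, 1000) == 1 && (pfd.revents & POLLIN)) {
        /* read() returns the accumulated counter and resets it. */
        if (read(efd, &got, sizeof(got)) == (ssize_t)sizeof(got))
            result = (long)got;
    }

    close(efd);
    return result;
}
```

Calling post_and_poll(3) is expected to return 3. The same descriptor
could equally be handed to select() or registered with epoll.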


Man you're so full of shit, your eyes are brown. NOBODY cares about
select/poll or that the interfaces are going to be adopted by other
Unixes. This issue has already been solved by them YEARS ago.

What I want (and a ton of other users) is a SIMPLE and generic way to
receive events from _MULTIPLE_ sources. I don't care about
kernel-level portability, easiness or whatever, the linux kernel
developers are good at not knowing what their users want.

> As for why common abstractions like file are a good thing, think about why
> having /dev/null is cleaner than having a special plug DEVNULL_FD fd
> value to be plugged everywhere,

This is a stupid comparison. By your logic we should also have /dev/stdin,
/dev/stdout and /dev/stderr.

> or why I can use find/grep/cat/echo/... to
> look/edit at my configuration inside /proc, instead of using a frigging
> registry editor.

Yet another stupid comparison, /proc is a MESS! Almost as bad as
the registry. Linux now has three pieces of crap for
configuration/information: /proc, sysfs and sysctl. Nobody knows
exactly what should go into each one of those. Crap design at its
best.

> But here the list could be almost endless.
> And please don't start the "they don't scale" or "they need heavy file
> binding" tossfeast. They scale as well as the interface that will receive
> them (poll, select, epoll). Heavy file binding what? 100 or so bytes for
> the struct file? How many signal/timer fd are you gonna have? Like 100K?
> Really moot argument when opposed to the benefit of being compatible with
> existing POSIX interfaces and being more Unix friendly.

So why the HELL don't we have those yet? Why haven't you designed
epoll with those in mind? Why don't you back your claims with patches?
(I'm not a kernel developer.)

> As for the AIO stuff, if threadlets/syslets will prove effective, you can
> host an epoll_wait over a syslet/threadlet. Or, if the 3 lines of
> userspace code needed to do that, fall inside your definition of kludge,
> we can even find a way to bridge the two.

I don't care about threadlets in this context, I just want to wait for
EVENTS from MULTIPLE sources WITHOUT mixing signals and other crap.
Your arrogance is amusing, stop pushing narrow-minded beliefs down the
throats of all Linux users. Kqueue, event ports,
WaitForMultipleObjects, epoll with multiple sources. That's what users
want, not yet another syscall/whatever hack.

> Now, how about we focus on the topic of this thread?
>
> [1] This could be an idea. People already use pipes for this, but pipes
> have some memory overhead inside the kernel (plus use two fds) that
> could, if really felt necessary, be avoided.

Yet another hack!! 64kiB of space just to push some user events
around. Great idea!

> [2] This is how those kinds of interfaces should be designed. Modular,
> re-usable, file-based interfaces, whose acceptance is not linked into
> slurping-in a whole new interface with tens of sub, interface-only,
> objects. And from this POV, epoll is the friendlier.

Who said I want yet another interface? I just fucking want to receive
events from MULTIPLE sources through epoll. With or without a fd! My
anger and frustration is that we can't get past this SIMPLE need!

> [3] Notice the similarity between threadlets/syslets and epoll? They
> enable pretty darn good scalability, with *existing* infrastructure,
> and

Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-04 Thread Davide Libenzi
On Sun, 4 Mar 2007, Kirk Kuchov wrote:

> I don't give a shit.

Here's another good use of /dev/null:

*PLONK*



- Davide




Discussing LKML community [OT from the Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3]

2007-03-04 Thread Oleg Verych
> From: Michael K. Edwards [EMAIL PROTECTED]
> Newsgroups: gmane.linux.kernel
> Subject: Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
> Date: Wed, 28 Feb 2007 09:01:07 -0800

Michael,

[]
> In this instance, there didn't seem to be any harm in sending my
> thoughts to LKML as I wrote them, on the off chance that Ingo or
> Davide would get some value out of them in this design cycle (which
> any code I eventually get around to producing will miss).  So far,
> I've gotten some rather dismissive pushback from Ingo and Alan (who
> seem to have no interest outside x86 and less understanding than I
> would have thought of what real userspace code looks like), a "why
> preach to people who know more than you do" from Davide,

this may be sad, unless you've spent time and effort to make a Patch,
i.e. read the source, understand why it's written so, why it's being used
that way now, and why it has to be updated in a new cycle of kernel
development.

> a brief aside on the dominance of x86 from Oleg,

I didn't have a chance, and probably I will not have one, to communicate
with people like you to learn from your wisdom personally. That's why
I replied to yours, after you mentioned transputers. And I've got
a rather different opinion than I expected. That shows my test-tube
being, little experience etc. As the discussion was about CPUs, it was
technical, thus on-topic for LKML.

> and one off-list "keep up the good work".  Not a very rich harvest from
> (IMHO) pretty good seeds.

The off-list message was my view on things that were off-topic, plus a
clarification about the lkml thing, and it wasn't on-topic for LKML.

I'm pretty sure that there are libraries of books written on every single
bit of the things Linux currently *implements* in asm/C.

(1) Thus, `return -ENOPATCH', man, regardless of what you are saying on
lkml. That's why the prominent people you've joined me with (:, replied in
go-to-kernelnewbie style.

> In short, so far the Linux kernel community is upholding its
> reputation for insularity, arrogance, coding without prior design,
> lack of interest in userspace problems, and inability to learn from
> the mistakes of others.  (None of these characterizations depends on
> there being any real insight in anything I have written.)

You, as a person who has the right to be personally wrong, may think that
way. But do not forget, as I wrote you off-list and in (1), this is a
development community, sometimes development of development one, etc;
educated, enthusiastic, wise, Open Source, poor on time (and money :).

> Happy hacking,
> - Michael

And you too. LKML *can* (sometimes may) show how useful this hacking is.

> P. S.  I do think threadlets are brilliant, though, and reading
> Ingo's patches gave me a much better idea of what would be involved in
> prototyping Asynchronously Executed I/O Unit opcodes.

You are discussing an on-topic thing in the P.S. And this is IMHO the wrong
approach.

Also, note that I've changed the subject and stripped the cc list; please
note that I can be a young and naive boy barking up the wrong tree.

Kind regards.
--
-o--=O`C  /. .\
 #oo'L O  o
___=E M^-- (Wuuf)


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-04 Thread Magnus Naeslund(k)

Kirk Kuchov wrote:
[snip]
> This is a stupid comparison. By your logic we should also have /dev/stdin,
> /dev/stdout and /dev/stderr.

Well, as a matter of fact (on my system):

# ls -l /dev/std*
lrwxrwxrwx  1 root root 4 Feb  1  2006 /dev/stderr -> fd/2
lrwxrwxrwx  1 root root 4 Feb  1  2006 /dev/stdin -> fd/0
lrwxrwxrwx  1 root root 4 Feb  1  2006 /dev/stdout -> fd/1

Please don't bother to respond to this mail, I just saw that you 
apparently needed the info.


Magnus

P.S.: *PLONK*


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-04 Thread Michael K. Edwards

On 3/4/07, Kyle Moffett [EMAIL PROTECTED] wrote:
> Well, even this far into 2.6, Linus' patch from 2003 still (mostly)
> applies; the maintenance cost for this kind of code is virtually
> zilch.  If it matters that much to you clean it up and make it apply;
> add an alarmfd() syscall (another 100 lines of code at most?) and
> make a read return an architecture-independent siginfo-like
> structure and submit it for inclusion.  Adding epoll() support for
> random objects is as simple as a 75-line object-filesystem and a
> 25-line syscall to return an FD to a new inode.  Have fun!  Go wild!
> Something this trivially simple could probably spend a week in -mm
> and go to linus for 2.6.22.

Or, if you want to do slightly more work and produce something a great
deal more useful, you could implement additional netlink address
families for additional event sources.  The socket -> setsockopt ->
bind -> sendmsg/recvmsg sequence is a well understood and well
documented UNIX paradigm for multiplexing non-blocking I/O to many
destinations over one socket.  Everyone who has read Stevens is
familiar with the basic UDP and fd open server techniques, and if
you look at Linux's IP_PKTINFO and NETLINK_W1 (bravo, Evgeniy!) you'll
see how easily they could be extended to file AIO and other kinds of
event sources.

For file AIO, you might have the application open one AIO socket per
mount point, open files indirectly via the SCM_RIGHTS mechanism, and
submit/retire read/write requests via sendmsg/recvmsg with ancillary
data consisting of an lseek64 tuple and a user-provided cookie.
Although the process still has to have one fd open per actual open
file (because trying to authenticate file accesses without opening fds
is madness), the only fds it has to manipulate directly are those
representing entire pools of outstanding requests.  This is usually a
small enough set that select() will do just fine, if you're careful
with fd allocation.  (You can simply punt indirectly opened fds up to
a high numerical range, where they can't be accessed directly from
userspace but still make fine cookies for use in lseek64 tuples within
cmsg headers).
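
The SCM_RIGHTS and cookie mechanics Michael refers to can be sketched
over an ordinary AF_UNIX socket pair. This is only an illustration under
stated assumptions: the netlink AIO address family he proposes is not
implemented here, the cookie travels as the message payload rather than
inside an lseek64 tuple, and send_fd, recv_fd and roundtrip_demo are
hypothetical helper names.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one passed descriptor. */
union fd_ctrl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send fd as SCM_RIGHTS ancillary data, with a user cookie as payload. */
int send_fd(int sock, int fd, unsigned long long cookie)
{
    union fd_ctrl ctrl;
    struct iovec iov = { .iov_base = &cookie, .iov_len = sizeof(cookie) };
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == (ssize_t)sizeof(cookie) ? 0 : -1;
}

/* Receive a descriptor and its cookie; returns the new fd or -1. */
int recv_fd(int sock, unsigned long long *cookie)
{
    union fd_ctrl ctrl;
    struct iovec iov = { .iov_base = cookie, .iov_len = sizeof(*cookie) };
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != (ssize_t)sizeof(*cookie))
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Pass the read end of a pipe across a socket pair and check that the
 * received descriptor refers to the same pipe. Returns 0 on success. */
int roundtrip_demo(void)
{
    int sv[2], p[2], got_fd, ok = -1;
    unsigned long long cookie = 0;
    char c = 0;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;
    if (pipe(p) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (send_fd(sv[0], p[0], 42ULL) == 0) {
        got_fd = recv_fd(sv[1], &cookie);
        if (got_fd >= 0) {
            if (cookie == 42ULL && write(p[1], "x", 1) == 1 &&
                read(got_fd, &c, 1) == 1 && c == 'x')
                ok = 0;
            close(got_fd);
        }
    }
    close(p[0]);
    close(p[1]);
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

In this sketch the only descriptor the event loop would have to watch is
the socket itself; the passed file descriptors merely ride along with
each request.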

The same basic approach will work for timers, signals, and just about
any other event source.  Userspace is of course still stuck doing its
own state machines / thread scheduling / however you choose to think
of it.  But all the important activity goes through socketcall(), and
the data and control parameters are all packaged up into a struct
msghdr instead of the bare buffer pointers of read/write.  So if
someone else does come along later and design an ultralight threading
mechanism that isn't a total botch, the actual data paths won't need
much rework; the exception handling will just get a lot simpler.

Cheers,
- Michael


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-03 Thread Ray Lee
Ihar `Philips` Filipau wrote:
> On 3/3/07, Ray Lee <[EMAIL PROTECTED]> wrote:
>> On 3/3/07, Ihar `Philips` Filipau <[EMAIL PROTECTED]> wrote:
>> > What I'm trying to get to: keep things simple. The proposed
>> > optimization by Ingo does nothing else but allowing AIO to probe file
>> > cache - if data there to go with fast path. So why not to implement
>> > what the people want - probing of cache? Because it sounds bad? But
>> > they are in fact proposing precisely that just masked with "fast
>> > threads".
>>
>>
>> Servers want to never, ever block. Not on a socket, not on a stat, not
>> on anything. (I have an embedded server I wrote that has to fork
>> internally just to watch the damn serial port signals in parallel with
>> handling network I/O, audio, and child processes that handle H323.)
>> There's a lot of things that can block out there, and it's not just
>> disk I/O.
>>
> 
> Why select/poll/epoll/friends do not work? I have programmed on both
> sides - user-space network servers and in-kernel network protocols -
> and "never blocking" thing was implemented in *nix in the times I was
> walking under table.
> 

Then you've never had to write something that watches serial port
signals. Google on TIOCMIWAIT to see what I'm talking about. The only
option for a userspace programmer to deal with that is to fork() or poll
the signals every so many milliseconds. There are probably more easy
examples, but that's the one off the top of my head that affected me.

In short, this isn't just about network IO, this isn't just about file IO.

> One can poll() more or less *any* device in system. With frigging
> exception of - right - files.

The problem is the "more or less." Say you're right, and 95% of the
system calls are either already asynchronous or non-blocking/poll()able.
One of the questions on the table is how to extend it to the last 5%.

> User-space-wise, check how squid (caching http proxy) does it: you
> have several (forked) instances to serve network requests and you have
> one/several disk I/O daemons. (So called "diskd storeio") Why? Because
> you cannot poll() file descriptors, but you can poll unix socket
> connected to diskd. If diskd blocks, squid still can serve requests.
> How threadlets are better then pool of diskd instances? All nastiness
> of shared memory set loose...

Samba/lighttpd/git want to issue dozens of stats in parallel so that the
kernel can have an opportunity to sort them better. Are you saying they
should fork() a process per stat that they want to issue in parallel?

> What I'm trying to get to. Threadlets wouldn't help existing
> single-threaded applications - what is about 95% of all applications.

Eh, I don't think that's right. Part of the reason threadlets and
syslets are on the table is that they may be a more efficient way to do
AIO. And the differences between the syslet API and the current kernel
Async IO API can be abstracted away by glibc, so that today's apps that
do AIO would immediately benefit.

> What's more, as having some limited experience of kernel programming,
> I fail to see what threadlets would simplify on kernel side.

You can yank the entire separate AIO path, and just treat them as
another blocking API that syslets makes nonblocking. Immediate reduction
of code, and everybody is now using the same code paths, which means
higher test coverage and reduced maintenance cost.

This last point is really important. Even if no extra functionality
eventually makes it to userspace, this last point would still be enough
to make the powers that be consider inclusion.

Ray


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-03 Thread Ihar `Philips` Filipau

On 3/3/07, Ray Lee <[EMAIL PROTECTED]> wrote:
> On 3/3/07, Ihar `Philips` Filipau <[EMAIL PROTECTED]> wrote:
> > What I'm trying to get to: keep things simple. The proposed
> > optimization by Ingo does nothing else but allowing AIO to probe file
> > cache - if data there to go with fast path. So why not to implement
> > what the people want - probing of cache? Because it sounds bad? But
> > they are in fact proposing precisely that just masked with "fast
> > threads".
>
> Servers want to never, ever block. Not on a socket, not on a stat, not
> on anything. (I have an embedded server I wrote that has to fork
> internally just to watch the damn serial port signals in parallel with
> handling network I/O, audio, and child processes that handle H323.)
> There's a lot of things that can block out there, and it's not just
> disk I/O.



Why do select/poll/epoll/friends not work? I have programmed on both
sides - user-space network servers and in-kernel network protocols -
and the "never blocking" thing was implemented in *nix in the times I was
walking under the table.

One can poll() more or less *any* device in the system. With the frigging
exception of - right - files. IOW for 75% of I/O the problem doesn't
exist since there is a proper interface - e.g. sockets - in place.

User-space-wise, check how squid (caching http proxy) does it: you
have several (forked) instances to serve network requests and you have
one/several disk I/O daemons. (So called "diskd storeio") Why? Because
you cannot poll() file descriptors, but you can poll a unix socket
connected to diskd. If diskd blocks, squid still can serve requests.
How are threadlets better than a pool of diskd instances? All the nastiness
of shared memory set loose...
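
The diskd-style arrangement described above can be sketched as a forked
helper that performs the blocking file read while the parent only ever
waits on a pollable Unix socket. This is a minimal illustration, not
squid's actual code; read_via_helper is a hypothetical name, and a real
server would add the socket to its main poll set instead of issuing a
single poll() with a timeout.

```c
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes (at most 4096) of path in a
 * forked child and hand them to the parent over a Unix socket.
 * Returns the number of bytes received, or -1 on error. */
ssize_t read_via_helper(const char *path, char *out, size_t len)
{
    int sv[2];
    int status = 0;
    pid_t pid;
    ssize_t got = -1;
    struct pollfd pfd;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: the only place allowed to block on the file. */
        char buf[4096];
        size_t want = len < sizeof(buf) ? len : sizeof(buf);
        ssize_t n;
        int fd;

        close(sv[0]);
        fd = open(path, O_RDONLY);
        if (fd < 0)
            _exit(1);
        n = read(fd, buf, want);
        if (n < 0 || write(sv[1], buf, (size_t)n) != n)
            _exit(1);
        _exit(0);
    }

    /* Parent: waits on the socket, never on the file itself. */
    close(sv[1]);
    pfd.fd = sv[0];
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll(&pfd, 1, 5000) == 1)
        got = read(sv[0], out, len);
    close(sv[0]);

    /* A non-zero exit status means the helper could not read the file. */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        return -1;
    return got;
}
```

The cost Ihar alludes to is visible even here: every request pays for a
process, a socket pair and an extra copy of the data.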

What I'm trying to get to: threadlets wouldn't help existing
single-threaded applications - which is about 95% of all applications.
And multi-threaded applications would gain little because few real
applications create threads dynamically: creation needs resources and
can fail, uncontrollable thread spawning hurts overall manageability
and additional care is needed regarding deadlock/lock-contention
proofing. (The category of applications which want the performance
gain are also the applications which need to ensure greater stability
over long non-stop runs. Uncontrollable dynamism helps nothing.)

Having implemented several "file servers" - daemons serving file I/O
to other daemons - I honestly hardly see any improvements. Now people
configure such file servers to issue e.g. 10 file operations
simultaneously - using pool of 10 threads. What threadlets change? In
the end just to keep in check with threadlets I would need to issue
pthread_join() after some number of threadlets created. And the latter
number is the former "e.g. 10". IOW, programmer-wise the
implementation remain same - and all the limitations remain the same.
And all overhead of user-space locking remain the same. (*)

What's more, as having some limited experience of kernel programming,
I fail to see what threadlets would simplify on kernel side. End
result as I see it: user space becomes bit more complicated because of
dynamic multi-threading and kernel-space becomes also more complicated
because of the same added dynamism.

(*) Hm... On the other side, if the application were able to tell the kernel
to limit the number of issued threadlets to N, then it might simplify the
job. The application can tell the kernel "I need at most 10 blocking
threadlets, block me if there are more" and then dumbly throw I/O
threadlets at the kernel as they are coming in. And the kernel would then put
the process to sleep if N+1 threadlets are blocking. That would definitely
simplify the job in user-space: it wouldn't need to call
pthread_join(). But it is still no replacement for a poll()able file
descriptor or truly async mmap().

--
Don't walk behind me, I may not lead.
Don't walk in front of me, I may not follow.
Just walk beside me and be my friend.
   -- Albert Camus (attributed to)


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-03 Thread Davide Libenzi
On Sat, 3 Mar 2007, Davide Libenzi wrote:

> Those *other* (tons?!?) interfaces can be created *when* the need comes 
> (see Linus signalfd [1] example to show how urgent that was). *When* 
> the need comes, they will work with existing POSIX interfaces, without 
> requiring your own just-another event interface. Those other interfaces 
> could also be more easily adopted by other Unix cousins, because of 
> the fact that they rely on existing POSIX interfaces. One of the reason 
> about the Unix file abstraction interfaces, is that you do *not* have to 
> plan and bloat interfaces before. As long as your new abstraction behave 
> in a file-fashion, it can be automatically used with existing interfaces. 
> And you create them *when* the need comes.

Now, if you don't mind, my spare time is really limited and I prefer to
spend it looking at stuff the topic of this thread talks about.
Not least because the whole epoll/kevent discussion depends heavily on
whether syslets/threadlets will or will not turn out to be a viable method
for generic AIO. Savvy?



- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-03 Thread Davide Libenzi
On Sat, 3 Mar 2007, Evgeniy Polyakov wrote:

> > I was referring to dropping an event directly to a userspace buffer, from 
> > the poll callback. If pages are not there, you might sleep, and you can't 
> > since the wakeup function holds a spinlock on the waitqueue head while 
> > looping through the waiters to issue the wakeup. Also, you don't know from 
> > where the poll wakeup is called.
> 
> Ugh, no, that is very limited solution - memory must be either pinned
> (which leads to dos and limited ring buffer), or callback must sleep.
> Actually in any way there _must_ exist a queue - if ring buffer is full
> event is not allowed to be dropped - it must be stored in some other
> place, for example in queue from where entries will be read (copied)
> which ring buffer will have entries (that is how it is implemented in
> kevent at least).

I was not advocating for that, if you read carefully. The fact that epoll 
does not do that, should be a clear hint. The old /dev/epoll IIRC was only 
10% faster than the current epoll under an *heavy* event frequency 
micro-bench like pipetest (and that version of epoll did not have the 
single pass over the ready set optimization). And /dev/epoll was 
delivering events *directly* on userspace visible (mmaped) memory in a 
zero-copy fashion.




> > BTW, Linus made a signalfd sketch code time ago, to deliver signals to an 
> > fd. Code remained there and nobody cared. Question: Was it because
> > 1) it had file bindings or 2) because nobody really cared to deliver 
> > signals to an event collector?
> > And *if* later requirements come, you don't need to change the API by 
> > adding an XXEVENT_SIGNAL_ADD or XXEVENT_TIMER_ADD, or creating a new 
> > XXEVENT-only submission structure. You create an API that automatically 
> > makes that new abstraction work with POSIX poll/select, and you get epoll 
> > support for free. Without even changing a bit in the epoll API.
> 
> Well, we get epoll support for free, but we need to create tons of other
> interfaces and infrastructure for kernel users, and we need to change 
> userspace anyway.

Those *other* (tons?!?) interfaces can be created *when* the need comes 
(see Linus signalfd [1] example to show how urgent that was). *When* 
the need comes, they will work with existing POSIX interfaces, without 
requiring your own just-another event interface. Those other interfaces 
could also be more easily adopted by other Unix cousins, because of 
the fact that they rely on existing POSIX interfaces. One of the reason 
about the Unix file abstraction interfaces, is that you do *not* have to 
plan and bloat interfaces before. As long as your new abstraction behave 
in a file-fashion, it can be automatically used with existing interfaces. 
And you create them *when* the need comes.




[1] That was like 100 lines of code or so. See here:

http://tinyurl.com/3yuna5



- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-03 Thread Ray Lee

On 3/3/07, Ihar `Philips` Filipau <[EMAIL PROTECTED]> wrote:
> What I'm trying to get to: keep things simple. The proposed
> optimization by Ingo does nothing else but allowing AIO to probe file
> cache - if data there to go with fast path. So why not to implement
> what the people want - probing of cache? Because it sounds bad? But
> they are in fact proposing precisely that just masked with "fast
> threads".


Please don't take this the wrong way, but I don't think you understand
the problem space that people are trying to address here.

Servers want to never, ever block. Not on a socket, not on a stat, not
on anything. (I have an embedded server I wrote that has to fork
internally just to watch the damn serial port signals in parallel with
handling network I/O, audio, and child processes that handle H323.)
There's a lot of things that can block out there, and it's not just
disk I/O.

Further, not only do servers not want to block, they also want to cram
a lot more requests into the kernel at once *for the kernel's
benefit*. In particular, a server wants to issue a ton of stats and
I/O in parallel so that the kernel can optimize which order to handle
the requests.

Finally, the biggest argument in favor of syslets/threadlets AFAICS is
that -- if done correctly -- it would unify the AIO and normal IO
paths in the kernel. The improved ease of long term maintenance on the
kernel (and more test coverage, and more directed optimization,
etc...) just for this point alone makes them worth considering for
inclusion.

So, while everybody has been talking about cached and non-cached
cases, those are really special cases of the entire package that the
rest of us want.

Ray


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-03 Thread Evgeniy Polyakov
On Sat, Mar 03, 2007 at 10:46:59AM -0800, Davide Libenzi 
(davidel@xmailserver.org) wrote:
> On Sat, 3 Mar 2007, Evgeniy Polyakov wrote:
> 
> > > You've to excuse me if my memory is bad, but IIRC the whole discussion 
> > > and loong benchmark feast born with you throwing a benchmark at Ingo 
> > > (with kevent showing a 1.9x performance boost WRT epoll), not with you 
> > > making any other point.
> > 
> > So, how does it sound?
> > "Threadlets are bad for IO because kevent is 2 times faster than epoll?"
> > 
> > I said threadlets are bad for IO (and we agreed that both approaches
> > should be used for the maximum performance) because of rescheduling overhead -
> > tasks are quite heavy structures to move around - even pt_regs copy
> > takes more than event structure, but not because there is something in other
> > galaxy which might work faster than another something in another galaxy.
> > That was stupid even to think about that.
> 
> Evgeny, other folks on this thread read what you said, so let's not drag 
> this over.
 
Sure, I was wrong to start this again, but try to get my position - I am
really tired of trying to prove that I'm not a camel just because we
had some misunderstanding at the start.

I do think that threadlets are a really cool solution and are indeed a very
good approach for the majority of parallel processing, but my point is
still that it is not a perfect solution for all tasks.

Just to draw a line: the kevent example is an extrapolation of what can be
achieved with the event-driven model, but that does not mean that it must be
_only_ used for the AIO model - threadlets _and_ the event-driven model (yes, I
accepted Ingo's point about its declining) is the best solution.
 
> > > And if you really feel raw about the single O(nready) loop that epoll
> > > currently does, a new epoll_wait2 (or whatever) API could be used to
> > > deliver the event directly into a userspace buffer [1], directly from the
> > > poll callback, w/out extra delivery loops 
> > > (IRQ/event->epoll_callback->event_buffer).
> >
> > > [1] From the epoll callback, we cannot sleep, so it's gonna be either an 
> > > mlocked userspace buffer, or some kernel pages mapped to userspace.
> > 
> > Callbacks never sleep - they add event into list just like current
> > implementation (maybe some lock must be changed from mutex to spinlock,
> > I do not rememeber), main problem is binding to the file structure,
> > which is heavy.
> 
> I was referring to dropping an event directly to a userspace buffer, from 
> the poll callback. If pages are not there, you might sleep, and you can't 
> since the wakeup function holds a spinlock on the waitqueue head while 
> looping through the waiters to issue the wakeup. Also, you don't know from 
> where the poll wakeup is called.

Ugh, no, that is very limited solution - memory must be either pinned
(which leads to dos and limited ring buffer), or callback must sleep.
Actually in any way there _must_ exist a queue - if ring buffer is full
event is not allowed to be dropped - it must be stored in some other
place, for example in queue from where entries will be read (copied)
which ring buffer will have entries (that is how it is implemented in
kevent at least).

> File binding heavy? The first, and by *far* biggest, source of events 
> inside an event collector, of someone that cares about scalability, are 
> sockets. And those are already files. Second would be AIO, and those (if 
> performance figures agrees) can be hosted inside syslets/threadlets.
> Then you fall into the no-care category, where the extra 100 bytes do not 
> make a case against the ability of using it with an existing POSIX 
> infrastructure (poll/select).

Well, sockets are files indeed, and sockets are already handled perfectly
by epoll - but there are other users of the potential interface - and
it must be designed to scale very well in _any_ situation.
Even if right now we do not have problems with some types of events, we
must scale with any new one.

> BTW, Linus made a signalfd sketch code time ago, to deliver signals to an 
> fd. Code remained there and nobody cared. Question: Was it because
> 1) it had file bindings or 2) because nobody really cared to deliver 
> signals to an event collector?
> And *if* later requirements come, you don't need to change the API by 
> adding an XXEVENT_SIGNAL_ADD or XXEVENT_TIMER_ADD, or creating a new 
> XXEVENT-only submission structure. You create an API that automatically 
> makes that new abstraction work with POSIX poll/select, and you get epoll 
> support for free. Without even changing a bit in the epoll API.

Well, we get epoll support for free, but we need to create tons of other
interfaces and infrastructure for kernel users, and we need to change 
userspace anyway.
But epoll support requires to have quite heavy bindings to file
structure, so why don't we want to design new interface (since we need
to change userspace anyway) so that it could allow to scale and be very
memory 

Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-03 Thread Davide Libenzi
On Sat, 3 Mar 2007, Evgeniy Polyakov wrote:

> > You've to excuse me if my memory is bad, but IIRC the whole discussion 
> > and loong benchmark feast born with you throwing a benchmark at Ingo 
> > (with kevent showing a 1.9x performance boost WRT epoll), not with you 
> > making any other point.
> 
> So, how does it sound?
> "Threadlets are bad for IO because kevent is 2 times faster than epoll?"
> 
> I said threadlets are bad for IO (and we agreed that both approaches
> should be used for the maximum performance) because of rescheduling overhead -
> tasks are quite heavy structures to move around - even pt_regs copy
> takes more than event structure, but not because there is something in other
> galaxy which might work faster than another something in another galaxy.
> That was stupid even to think about that.

Evgeniy, other folks on this thread have read what you said, so let's not
drag this out.



> > And if you really feel raw about the single O(nready) loop that epoll
> > currently does, a new epoll_wait2 (or whatever) API could be used to
> > deliver the event directly into a userspace buffer [1], directly from the
> > poll callback, w/out extra delivery loops 
> > (IRQ/event->epoll_callback->event_buffer).
>
> > [1] From the epoll callback, we cannot sleep, so it's gonna be either an 
> > mlocked userspace buffer, or some kernel pages mapped to userspace.
> 
> Callbacks never sleep - they add the event into a list just like the current
> implementation does (maybe some lock must be changed from mutex to spinlock,
> I do not remember); the main problem is the binding to the file structure,
> which is heavy.

I was referring to dropping an event directly to a userspace buffer, from 
the poll callback. If pages are not there, you might sleep, and you can't 
since the wakeup function holds a spinlock on the waitqueue head while 
looping through the waiters to issue the wakeup. Also, you don't know from 
where the poll wakeup is called.
File binding heavy? The first, and by *far* biggest, source of events 
inside an event collector, of someone that cares about scalability, are 
sockets. And those are already files. Second would be AIO, and those (if 
performance figures agree) can be hosted inside syslets/threadlets.
Then you fall into the no-care category, where the extra 100 bytes do not 
make a case against the ability of using it with an existing POSIX 
infrastructure (poll/select).
BTW, Linus made a signalfd code sketch some time ago, to deliver signals to an
fd. Code remained there and nobody cared. Question: Was it because
1) it had file bindings or 2) because nobody really cared to deliver 
signals to an event collector?
And *if* later requirements come, you don't need to change the API by 
adding an XXEVENT_SIGNAL_ADD or XXEVENT_TIMER_ADD, or creating a new 
XXEVENT-only submission structure. You create an API that automatically 
makes that new abstraction work with POSIX poll/select, and you get epoll 
support for free. Without even changing a bit in the epoll API.
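
A minimal sketch of what "signals delivered to an fd" looks like from
userspace. It assumes the signalfd(2) system call as it later appeared in
mainline Linux, which may well differ from the sketch mentioned above; the
helper name collect_signal_via_epoll() is made up for illustration. The
signal is bound to a descriptor and then collected through an unmodified
epoll:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/epoll.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: block `signo`, bind it to a file descriptor, send it
 * to ourselves and collect it through epoll.
 * Returns the signal number read back from the descriptor, or -1 on error. */
int collect_signal_via_epoll(int signo)
{
    sigset_t mask;
    struct signalfd_siginfo info;
    struct epoll_event ev = { .events = EPOLLIN };
    struct epoll_event ready;
    int sfd, epfd, got = -1;

    sigemptyset(&mask);
    sigaddset(&mask, signo);
    /* The signal has to stay blocked, otherwise its default action runs
     * instead of the signal being queued for the descriptor. */
    if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
        return -1;

    sfd = signalfd(-1, &mask, SFD_CLOEXEC);
    if (sfd < 0)
        return -1;

    epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd < 0) {
        close(sfd);
        return -1;
    }

    ev.data.fd = sfd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sfd, &ev) == 0 &&
        kill(getpid(), signo) == 0 &&
        epoll_wait(epfd, &ready, 1, 1000) == 1 &&
        read(sfd, &info, sizeof(info)) == (ssize_t)sizeof(info))
        got = (int)info.ssi_signo;

    close(epfd);
    close(sfd);
    return got;
}
```

Calling collect_signal_via_epoll(SIGUSR1) should hand back SIGUSR1. The
price is the one debated in this thread: every such source costs a full
file structure.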



- Davide


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-03 Thread Ihar `Philips` Filipau

On 3/3/07, Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> >Threadlets can work with any function as a base - if it would be
> >recv-like it will limit possible case for parallel programming, so you
> >can code anything in threadlets - it is not only about IO.
>
> What I'm trying to get to: keep things simple. The proposed
> optimization by Ingo does nothing else but allowing AIO to probe file
> cache - if data there to go with fast path. So why not to implement
> what the people want - probing of cache? Because it sounds bad? But
> they are in fact proposing precisely that just masked with "fast
> threads".

> There can be other parts than just plain recv/read syscalls - you can
> create a logical processing entity, and if it blocks (as a whole, no
> matter where), the whole processing will continue as a new thread.
> And having a separate syscall to warm the cache can end up with the cache
> being flushed between the warming and the processing itself.



I'm not talking about cache warm up. And if we do - and that is the whole
freaking point of AIO - Linux IIRC pins freshly loaded clean pages
anyway. So there would be a problem, but only under memory pressure. If
you are under memory pressure, you have already lost the game and do not
care about performance or what threads you are using.

It is the whole "threadlets to threads on blocking" thing that doesn't
sound convincing. It sounds more like "premature optimization". But
anyway, not that I'm an AIO specialist. For networking it is totally
unnecessary, since most applications which care already have rate
control and buffer management built in. Network connections/sockets
allow a greater level of application control over what they do and how.
Compare that to blockdev's plain dumb read()/write() going through the
global cache. And not that (judging from the interface) AIO changes that
much - it is still a dumb read(), which IMHO makes no sense whatsoever
for mmap()-oriented Linux.
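
For the mmap() side of that argument, a weak form of cache probing is
already available from userspace: map the file and ask mincore(2) which
pages are resident. The sketch below is only an illustration of probing,
not something taken from the syslet/threadlet patches, and the helper name
resident_pages() is an assumption. The answer is a snapshot and can be
stale by the time it is used.

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: count how many pages backing the first `len` bytes
 * of `fd` are resident in memory right now. Returns -1 on error. */
long resident_pages(int fd, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages, i;
    unsigned char *vec;
    void *map;
    long resident = 0;

    if (len == 0)
        return -1;
    npages = (len + page - 1) / page;

    vec = malloc(npages);
    if (!vec)
        return -1;

    map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        free(vec);
        return -1;
    }

    /* mincore() sets the low bit of vec[i] if page i is resident. */
    if (mincore(map, len, vec) == 0) {
        for (i = 0; i < npages; i++)
            resident += vec[i] & 1;
    } else {
        resident = -1;
    }

    munmap(map, len);
    free(vec);
    return resident;
}
```

A caller could compare the result with the number of pages it is about to
read and only take a synchronous path when everything is resident, with the
caveat that pages may be evicted between the probe and the read.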

--
Don't walk behind me, I may not lead.
Don't walk in front of me, I may not follow.
Just walk beside me and be my friend.
   -- Albert Camus (attributed to)


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-03 Thread Evgeniy Polyakov
On Sat, Mar 03, 2007 at 11:58:17AM +0100, Ihar `Philips` Filipau ([EMAIL 
PROTECTED]) wrote:
> On 3/3/07, Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> >On Fri, Mar 02, 2007 at 08:20:26PM +0100, Ihar `Philips` Filipau 
> >([EMAIL PROTECTED]) wrote:
> >> I'm not well versed in modern kernel development discussions, and
> >> since you have put the thing into the networked context anyway, can
> >> you please ask on lkml why (if they want threadlets solely for AIO)
> >> not to implement analogue of recv(*filedes*, b, l, MSG_DONTWAIT).
> >> Developers already know the interface, socket infrastructure is already
> >> in kernel, etc. And it might do precisely what they want: access file
> >> in disk cache - just like in case of socket it does access recv buffer
> >> of socket. Why bother with implicit threads/waiting/etc - if all they
> >> want some way to probe cache?
> >
> >Threadlets can work with any function as a base - if it would be
> >recv-like it will limit possible case for parallel programming, so you
> >can code anything in threadlets - it is not only about IO.
> >
> 
> Ingo defined them as "plain function calls as long as they do not block".
> 
> But when/what function could block?
> 
> (1) File descriptors. Read. If data are in cache it wouldn't block.
> Otherwise it would. Write. If there is space in cache it wouldn't
> block. Otherwise it would.
> 
> (2) Network sockets. Recv. If data are in buffer they wouldn't block.
> Otherwise they would. Send. If there is space in send buffer it
> wouldn't block. Otherwise it would.
> 
> (3) Pipes, fifos & unix sockets. Unfortunately gain nothing since the
> reliable local communication used mostly for control information
> passing. If you have to block on such socket it most likely important
> information anyway. (e.g. X server communication or sql query to SQL
> server). (Or even less important here case of shell pipes.) And most
> users here are all single threaded and I/O bound: they would gain
> nothing from multi-threading - only PITA of added locking.
> 
> What I'm trying to get to: keep things simple. The proposed
> optimization by Ingo does nothing else but allowing AIO to probe file
> cache - if data there to go with fast path. So why not to implement
> what the people want - probing of cache? Because it sounds bad? But
> they are in fact proposing precisely that just masked with "fast
> threads".

There can be other parts than just plain recv/read syscalls - you can
create a logical processing entity, and if it blocks (as a whole, no
matter where), the whole processing will continue as a new thread.
And having a separate syscall to warm the cache can end up with the cache
being flushed between the warming and the processing itself.
 
> -- 
> Don't walk behind me, I may not lead.
> Don't walk in front of me, I may not follow.
> Just walk beside me and be my friend.
>-- Albert Camus (attributed to)

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-03 Thread Ihar `Philips` Filipau

On 3/3/07, Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

On Fri, Mar 02, 2007 at 08:20:26PM +0100, Ihar `Philips` Filipau ([EMAIL 
PROTECTED]) wrote:
> I'm not well versed in modern kernel development discussions, and
> since you have put the thing into the networked context anyway, can
> you please ask on lkml why (if they want threadlets solely for AIO)
> not to implement analogue of recv(*filedes*, b, l, MSG_DONTWAIT).
> Developers already know the interface, socket infrastructure is already
> in kernel, etc. And it might do precisely what they want: access file
> in disk cache - just like in case of socket it does access recv buffer
> of socket. Why bother with implicit threads/waiting/etc - if all they
> want some way to probe cache?

> Threadlets can work with any function as a base - if it would be
> recv-like it will limit possible case for parallel programming, so you
> can code anything in threadlets - it is not only about IO.



Ingo defined them as "plain function calls as long as they do not block".

But when/what function could block?

(1) File descriptors. Read. If data are in cache it wouldn't block.
Otherwise it would. Write. If there is space in cache it wouldn't
block. Otherwise it would.

(2) Network sockets. Recv. If data are in buffer they wouldn't block.
Otherwise they would. Send. If there is space in send buffer it
wouldn't block. Otherwise it would.

(3) Pipes, fifos & unix sockets. Unfortunately there is nothing to gain
here, since reliable local communication is used mostly for passing
control information. If you have to block on such a socket, it is most
likely for important information anyway (e.g. X server communication or
an SQL query to an SQL server). (Even less important here is the case
of shell pipes.) And most users here are single threaded and I/O bound:
they would gain nothing from multi-threading - only the PITA of added
locking.

What I'm trying to get to: keep things simple. The optimization proposed
by Ingo does nothing else but allow AIO to probe the file cache and, if
the data is there, to take the fast path. So why not implement what the
people want - probing of the cache? Because it sounds bad? But they are
in fact proposing precisely that, just masked with "fast threads".
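
To make the socket half of the analogy concrete, here is a small sketch of
how recv() with MSG_DONTWAIT probes the receive buffer today. It only shows
existing socket behaviour; the same call on a regular file fails with
ENOTSOCK, so the file variant asked for above remains hypothetical. The
helper names probe_recv() and would_block() are made up for illustration.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: take whatever is already queued on `sock` without
 * ever blocking. Returns the number of bytes copied, 0 on end of stream,
 * or -1 with errno set. */
ssize_t probe_recv(int sock, void *buf, size_t len)
{
    return recv(sock, buf, len, MSG_DONTWAIT);
}

/* True if the return value `rc` of the immediately preceding probe_recv()
 * means "nothing buffered yet", i.e. a plain recv() would have blocked. */
int would_block(ssize_t rc)
{
    return rc < 0 && (errno == EAGAIN || errno == EWOULDBLOCK);
}
```

On an empty socket would_block(probe_recv(...)) is true and the caller can
go do something else; once the peer has sent data, the same call returns
it straight from the receive buffer.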


--
Don't walk behind me, I may not lead.
Don't walk in front of me, I may not follow.
Just walk beside me and be my friend.
   -- Albert Camus (attributed to)


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-03 Thread Evgeniy Polyakov
On Fri, Mar 02, 2007 at 09:28:10AM -0800, Davide Libenzi 
(davidel@xmailserver.org) wrote:
> On Fri, 2 Mar 2007, Evgeniy Polyakov wrote:
> 
> > do we really want to have per process signalfs, timerfs and so on - each 
> > simple structure must be bound to a file, which becomes too cost.
> 
> I may be old school, but if you ask me, and if you *really* want those 
> events, yes. Reason? Unix's everything-is-a-file rule, and being able to 
> use them with *existing* POSIX poll/select. Remember, not every app 
> requires huge scalability efforts, so working with simpler and familiar 
> APIs is always welcome.
> The *only* thing that was not practical to have as fd, was block requests. 
> But maybe threadlets/syslets will handle those just fine, and close the gap.
 
That means that we bind a very small object like a timer or a signal to the
whole file structure - yes, as I stated, it is doable, but do we really
have to create a file each time create_timer() or signal() is called?
Signals as a filesystem are limited in that we need to create additional
structures to keep the signal number<->private data relations.
I designed kevent to be as small as possible, so the file binding idea was
the first thing I removed. I do not say it is wrong, or that epoll (and
threadlets) are broken (fsck, I hope people do understand that), but as is
it can not handle that scenario, so it must be extended and/or a lot of
other stuff must be written to be compatible with the epoll design. Kevent
has a different design (which still allows working with the old one -
there is a patch to implement epoll over kevent).
 
> - Davide
> 

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-03 Thread Evgeniy Polyakov
On Fri, Mar 02, 2007 at 09:13:40AM -0800, Davide Libenzi 
(davidel@xmailserver.org) wrote:
> On Fri, 2 Mar 2007, Evgeniy Polyakov wrote:
> 
> > On Thu, Mar 01, 2007 at 11:31:14AM -0800, Davide Libenzi 
> > (davidel@xmailserver.org) wrote:
> > > On Thu, 1 Mar 2007, Evgeniy Polyakov wrote:
> > > 
> > > > Ingo, do you really think I will send mails with faked benchmarks? :))
> > > 
> > > I don't think he ever implied that. He was only suggesting that when you 
> > > post benchmarks, and even more when you make claims based on benchmarks, 
> > > you need to be extra careful about what you measure. Otherwise the
> > > external view that you give to others does not look good.
> > > Kevent can be really faster than epoll, but if you post broken benchmarks
> > > (that can be unreliable HTTP loaders, broken server implementations,
> > > etc..) and make claims based on that, the only effect that you have is to 
> > > lose your point.
> >  
> > So, I only talked that kevent is superior compared to epoll because (and
> > it is _main_ issue) of its ability to handle essentially any kind of
> > events with very small overhead (the same as epoll has in struct file -
> > list and spinlock) and without significant price of struct file binding
> > to event.
> 
> You've to excuse me if my memory is bad, but IIRC the whole discussion 
> and loong benchmark feast born with you throwing a benchmark at Ingo 
> (with kevent showing a 1.9x performance boost WRT epoll), not with you 
> making any other point.

So, how does it sound?
"Threadlets are bad for IO because kevent is 2 times faster than epoll?"

I said threadlets are bad for IO (and we agreed that both approaches
should be used for the maximum performance) because of rescheduling overhead -
tasks are quite heavy structures to move around - even a pt_regs copy
takes more than an event structure - and not because there is something in one
galaxy which might work faster than something else in another galaxy.
It would be stupid even to think that.

> As far as epoll not being able to handle other events. Said who? Of 
> course, with zero modifications, you can handle zero additional events. 
> With modifications, you can handle other events. But lets talk about those 
> other events. The *only* kind of event that ppl (and being the epoll 
> maintainer I tend to receive those requests) missed in epoll, was AIO 
> events, That's the *only* thing that was missed by real life application 
> developers. And if something like threadlets/syslets will prove effective, 
> the gap is closed WRT that requirement.
> Epoll handle already the whole class of pollable devices inside the 
> kernel, and if you exclude block AIO, that's a pretty wide class already. 
> The *existing* f_op->poll subsystem can be used to deliver events at the 
> poll-head wakeup time (by using the "key" member of the poll callback), so 
> that you don't even need the extra f_op->poll call to fetch events.
> And if you really feel raw about the single O(nready) loop that epoll 
> currently does, a new epoll_wait2 (or whatever) API could be used to 
> deliver the event directly into a userspace buffer [1], directly from the 
> poll callback, w/out extra delivery loops 
> (IRQ/event->epoll_callback->event_buffer).

I was asked to add signals, futexes, timers and userspace events to
kevent; so far only futexes are missing, because I was asked to freeze
development so that other hackers could check the project.

> 
> [1] From the epoll callback, we cannot sleep, so it's gonna be either an 
> mlocked userspace buffer, or some kernel pages mapped to userspace.

Callbacks never sleep - they add the event into a list just like the current
implementation does (maybe some lock must be changed from mutex to spinlock,
I do not remember); the main problem is the binding to the file structure,
which is heavy.
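
The O(nready) delivery loop under discussion is visible from userspace as
well. A minimal sketch of the consumer side, using only the existing
epoll_wait() call (the epoll_wait2 idea quoted above is not implemented
here, and drain_ready() is a name invented for this example):

```c
#include <sys/epoll.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_EVENTS 16

/* Hypothetical helper: fetch one batch of ready events without waiting and
 * read from every readable descriptor. The loop runs once per *ready*
 * descriptor, not once per registered descriptor.
 * Returns the number of bytes consumed, or -1 on error. */
long drain_ready(int epfd)
{
    struct epoll_event events[MAX_EVENTS];
    char buf[256];
    long total = 0;
    int i, n;

    n = epoll_wait(epfd, events, MAX_EVENTS, 0);
    if (n < 0)
        return -1;

    for (i = 0; i < n; i++) {
        ssize_t got;

        if (!(events[i].events & EPOLLIN))
            continue;
        got = read(events[i].data.fd, buf, sizeof(buf));
        if (got > 0)
            total += got;
    }
    return total;
}
```

With two pipes registered and three bytes written to only one of them, a
first call returns 3 and a second call returns 0, because the idle pipe
never shows up in the returned batch.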

> - Davide
> 

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-03 Thread Davide Libenzi
On Sat, 3 Mar 2007, Ingo Molnar wrote:

> * Davide Libenzi  wrote:
> 
> > [...] Status word and control bits should not be changed from 
> > underneath userspace AFAIK. [...]
> 
> Note that the control bits do not just magically change during normal 
> FPU use. It's a bit like sys_setsid()/iopl/etc., it makes little sense 
> to change those per-thread anyway. This is a non-issue anyway - what is 
> important is that the big bulk of 512 (or more) bytes of FPU state /are/ 
> callee-saved (both on 32-bit and on 64-bit), hence there's no need to 
> unlazy anything or to do expensive FPU state saves or other FPU juggling 
> around threadlet (or even syslet) use.

Well, the unlazy/sync happens in any case later when we switch (given
TS_USEDFPU set). We'd avoid a copy of it given the above conditions are true.
Wouldn't it make sense to carry over only the status word and the control
bits eventually?
Also, if the caller saves the whole context, and if we're scheduled while
inside a system call (a not totally infrequent case), can't we implement a
smarter unlazy_fpu that avoids fxsave during schedule-out and frstor after
schedule-in (do not do stts on this condition, so the newly scheduled
task doesn't get a fault at all)? If the above conditions are true (no need
for a context copy for the new head in async_exec), this should be possible too.


- Davide




Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3

2007-03-03 Thread Evgeniy Polyakov
On Sat, Mar 03, 2007 at 11:58:17AM +0100, Ihar `Philips` Filipau ([EMAIL 
PROTECTED]) wrote:
 On 3/3/07, Evgeniy Polyakov [EMAIL PROTECTED] wrote:
 On Fri, Mar 02, 2007 at 08:20:26PM +0100, Ihar `Philips` Filipau 
 ([EMAIL PROTECTED]) wrote:
  I'm not well versed in modern kernel development discussions, and
  since you have put the thing into the networked context anyway, can
  you please ask on lkml why (if they want threadlets solely for AIO)
  not to implement analogue of recv(*filedes*, b, l, MSG_DONTWAIT).
  Developers already know the inteface, socket infrastructure is already
  in kernel, etc. And it might do precisely what they want: access file
  in disk cache - just like in case of socket it does access recv buffer
  of socket. Why bother with implicit threads/waiting/etc - if all they
  want some way to probe cache?
 
 Threadlets can work with any functionas a base - if it would be
 recv-like it will limit possible case for parallel programming, so you
 can code anything in threadlets - it is not only about IO.
 
 
 Ingo defined them as plain function calls as long as they do not block.
 
 But when/what function could block?
 
 (1) File descriptors. Read. If data are in cache it wouldn't block.
 Otherwise it would. Write. If there is space in cache it wouldn't
 block. Otherwise it would.
 
 (2) Network sockets. Recv. If data are in buffer they wouldn't block.
 Otherwise they would. Send. If there is space in send buffer it
 wouldn't block. Otherwise it would.
 
 (3) Pipes, fifos  unix sockets. Unfortunately gain nothing since the
 reliable local communication used mostly for control information
 passing. If you have to block on such socket it most likely important
 information anyway. (e.g. X server communication or sql query to SQL
 server). (Or even less important here case of shell pipes.) And most
 users here are all single threaded and I/O bound: they would gain
 nothing from multi-threading - only PITA of added locking.
 
 What I'm trying to get to: keep things simple. The proposed
 optimization by Ingo does nothing else but allowing AIO to probe file
 cache - if data there to go with fast path. So why not to implement
 what the people want - probing of cache? Because it sounds bad? But
 they are in fact proposing precisely that just masked with fast
 threads.

There can be other parts than just plain recv/read syscalls - you can
create a logical processing entity and if it will block (as a whole, no
matter where), the whole processing will continue as a new thread.
And having different syscall to warm cache can end up in cache flush in
between warming and processing itself.
 
 -- 
 Don't walk behind me, I may not lead.
 Don't walk in front of me, I may not follow.
 Just walk beside me and be my friend.
-- Albert Camus (attributed to)

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3

2007-03-03 Thread Ihar `Philips` Filipau

On 3/3/07, Evgeniy Polyakov [EMAIL PROTECTED] wrote:

 Threadlets can work with any function as a base - if it would be
 recv-like it will limit possible case for parallel programming, so you
 can code anything in threadlets - it is not only about IO.

 What I'm trying to get to: keep things simple. The proposed
 optimization by Ingo does nothing else but allowing AIO to probe file
 cache - if data there to go with fast path. So why not to implement
 what the people want - probing of cache? Because it sounds bad? But
 they are in fact proposing precisely that just masked with fast
 threads.

There can be other parts than just plain recv/read syscalls - you can
create a logical processing entity and if it will block (as a whole, no
matter where), the whole processing will continue as a new thread.
And having different syscall to warm cache can end up in cache flush in
between warming and processing itself.



I'm not talking about cache warm up. And if we do - and that is the whole
freaking point of AIO - Linux IIRC pins freshly loaded clean pages
anyway. So there would be a problem, but only under memory pressure. If
you are under memory pressure, you have already lost the game and do not
care about performance or what threads you are using.

It is the whole "threadlets to threads on blocking" thing that doesn't
sound convincing. It sounds more like premature optimization. But
anyway, not that I'm an AIO specialist. For networking it is totally
unnecessary, since most applications which care already have rate
control and buffer management built in. Network connections/sockets
allow a greater level of application control over what they do and how,
compared to a blockdev's plain dumb read()/write() going through the
global cache. And not that (judging from the interface) AIO changes that
much - it is still a dumb read(), which IMHO makes no sense whatsoever
for mmap()-oriented Linux.

--
Don't walk behind me, I may not lead.
Don't walk in front of me, I may not follow.
Just walk beside me and be my friend.
   -- Albert Camus (attributed to)


Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3

2007-03-03 Thread Davide Libenzi
On Sat, 3 Mar 2007, Evgeniy Polyakov wrote:

  You've to excuse me if my memory is bad, but IIRC the whole discussion 
  and loong benchmark feast born with you throwing a benchmark at Ingo 
  (with kevent showing a 1.9x performance boost WRT epoll), not with you 
  making any other point.
 
 So, how does it sound?
 Threadlets are bad for IO because kevent is 2 times faster than epoll?
 
 I said threadlets are bad for IO (and we agreed that both approaches
 should be used for the maximum performance) because of rescheduling overhead -
 tasks are quite heavy structures to move around - even pt_regs copy
 takes more than event structure, but not because there is something in other
 galaxy which might work faster than another something in another galaxy.
 That was stupid even to think about that.

Evgeny, other folks on this thread read what you said, so let's not drag 
this over.



  And if you really feel raw about the single O(nready) loop that epoll
  currently does, a new epoll_wait2 (or whatever) API could be used to
  deliver the event directly into a userspace buffer [1], directly from the
  poll callback, w/out extra delivery loops 
  (IRQ/event->epoll_callback->event_buffer).

  [1] From the epoll callback, we cannot sleep, so it's gonna be either an 
  mlocked userspace buffer, or some kernel pages mapped to userspace.
 
 Callbacks never sleep - they add event into list just like current
 implementation (maybe some lock must be changed from mutex to spinlock,
 I do not remember), main problem is binding to the file structure,
 which is heavy.

I was referring to dropping an event directly to a userspace buffer, from 
the poll callback. If pages are not there, you might sleep, and you can't 
since the wakeup function holds a spinlock on the waitqueue head while 
looping through the waiters to issue the wakeup. Also, you don't know from 
where the poll wakeup is called.
File binding heavy? The first, and by *far* biggest, source of events 
inside an event collector, of someone that cares about scalability, are 
sockets. And those are already files. Second would be AIO, and those (if 
performance figures agree) can be hosted inside syslets/threadlets.
Then you fall into the no-care category, where the extra 100 bytes do not 
make a case against the ability of using it with an existing POSIX 
infrastructure (poll/select).
BTW, Linus wrote some signalfd sketch code a while ago, to deliver signals to an
fd. Code remained there and nobody cared. Question: Was it because
1) it had file bindings or 2) because nobody really cared to deliver 
signals to an event collector?
And *if* later requirements come, you don't need to change the API by 
adding an XXEVENT_SIGNAL_ADD or XXEVENT_TIMER_ADD, or creating a new 
XXEVENT-only submission structure. You create an API that automatically 
makes that new abstraction work with POSIX poll/select, and you get epoll 
support for free. Without even changing a bit in the epoll API.



- Davide




Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3

2007-03-03 Thread Evgeniy Polyakov
On Sat, Mar 03, 2007 at 10:46:59AM -0800, Davide Libenzi 
(davidel@xmailserver.org) wrote:
 On Sat, 3 Mar 2007, Evgeniy Polyakov wrote:
 
   You've to excuse me if my memory is bad, but IIRC the whole discussion 
   and loong benchmark feast born with you throwing a benchmark at Ingo 
   (with kevent showing a 1.9x performance boost WRT epoll), not with you 
   making any other point.
  
  So, how does it sound?
  Threadlets are bad for IO because kevent is 2 times faster than epoll?
  
  I said threadlets are bad for IO (and we agreed that both approaches
  should be used for the maximum performance) because of rescheduling overhead -
  tasks are quite heavy structures to move around - even pt_regs copy
  takes more than event structure, but not because there is something in other
  galaxy which might work faster than another something in another galaxy.
  That was stupid even to think about that.
 
 Evgeny, other folks on this thread read what you said, so let's not drag 
 this over.
 
Sure, I was wrong to start this again, but try to understand my position -
I'm really tired of trying to prove that I'm not a camel just because we
had some misunderstanding at the start.

I do think that threadlets are a really cool solution and are indeed a very
good approach for the majority of parallel processing, but my point is
still that it is not a perfect solution for all tasks.

Just to draw a line: kevent example is extrapolation of what can be
achieved with event-driven model, but that does not mean that it must be
_only_ used for AIO model - threadlets _and_ event driven model (yes, I
accepted Ingo's point about its declining) is the best solution.
 
   And if you really feel raw about the single O(nready) loop that epoll
   currently does, a new epoll_wait2 (or whatever) API could be used to
   deliver the event directly into a userspace buffer [1], directly from the
   poll callback, w/out extra delivery loops 
   (IRQ/event->epoll_callback->event_buffer).
 
   [1] From the epoll callback, we cannot sleep, so it's gonna be either an 
   mlocked userspace buffer, or some kernel pages mapped to userspace.
  
  Callbacks never sleep - they add event into list just like current
  implementation (maybe some lock must be changed from mutex to spinlock,
  I do not remember), main problem is binding to the file structure,
  which is heavy.
 
 I was referring to dropping an event directly to a userspace buffer, from 
 the poll callback. If pages are not there, you might sleep, and you can't 
 since the wakeup function holds a spinlock on the waitqueue head while 
 looping through the waiters to issue the wakeup. Also, you don't know from 
 where the poll wakeup is called.

Ugh, no, that is a very limited solution - memory must be either pinned
(which leads to DoS and a limited ring buffer), or the callback must sleep.
Actually, either way there _must_ exist a queue - if the ring buffer is
full, the event is not allowed to be dropped - it must be stored in some
other place, for example in a queue from which entries will be copied
once the ring buffer has free entries again (that is how it is
implemented in kevent at least).
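
The "ring buffer plus a backing queue so nothing is dropped" scheme can be
sketched roughly as follows. This is a single-threaded userspace toy with
made-up names (ev_collector, ev_post, ev_take), not kevent code; locking
and the kernel/user mapping are left out on purpose.

```c
#include <stdlib.h>

#define RING_SLOTS 4    /* deliberately tiny so overflow is easy to hit */

struct ev_node {
    int event;
    struct ev_node *next;
};

struct ev_collector {
    int ring[RING_SLOTS];
    unsigned head, tail;              /* free-running read/write counters */
    struct ev_node *q_head, *q_tail;  /* FIFO overflow queue */
};

static int ring_full(const struct ev_collector *c)
{
    return c->tail - c->head == RING_SLOTS;
}

/* Move queued events into the ring while there is room. */
static void refill(struct ev_collector *c)
{
    while (c->q_head && !ring_full(c)) {
        struct ev_node *n = c->q_head;
        c->ring[c->tail++ % RING_SLOTS] = n->event;
        c->q_head = n->next;
        if (!c->q_head)
            c->q_tail = NULL;
        free(n);
    }
}

/* Producer side: never drops an event; returns -1 only if out of memory. */
int ev_post(struct ev_collector *c, int event)
{
    /* If older events are still queued, new ones go behind them. */
    if (!c->q_head && !ring_full(c)) {
        c->ring[c->tail++ % RING_SLOTS] = event;
        return 0;
    }
    struct ev_node *n = malloc(sizeof *n);
    if (!n)
        return -1;
    n->event = event;
    n->next = NULL;
    if (c->q_tail)
        c->q_tail->next = n;
    else
        c->q_head = n;
    c->q_tail = n;
    return 0;
}

/* Consumer side: returns 1 and stores an event, or 0 if nothing is pending. */
int ev_take(struct ev_collector *c, int *event)
{
    if (c->head == c->tail)
        return 0;
    *event = c->ring[c->head++ % RING_SLOTS];
    refill(c);          /* a slot was freed, pull from the overflow queue */
    return 1;
}
```

A zero-initialised struct ev_collector is a valid empty collector; posting
ten events into the four-slot ring and taking them back yields all ten in
order.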

 File binding heavy? The first, and by *far* biggest, source of events 
 inside an event collector, of someone that cares about scalability, are 
 sockets. And those are already files. Second would be AIO, and those (if 
 performance figures agrees) can be hosted inside syslets/threadlets.
 Then you fall into the no-care category, where the extra 100 bytes do not 
 make a case against the ability of using it with an existing POSIX 
 infrastructure (poll/select).

Well, sockets are files indeed, and sockets already are perfectly
handled by epoll - but there are other users of a potential interface - and
it must be designed to scale very well in _any_ situation.
Even if we right now do not have problems with some types of events, we
must scale with any new one.

 BTW, Linus made a signalfd sketch code time ago, to deliver signals to an 
 fd. Code remained there and nobody cared. Question: Was it because
 1) it had file bindings or 2) because nobody really cared to deliver 
 signals to an event collector?
 And *if* later requirements come, you don't need to change the API by 
 adding an XXEVENT_SIGNAL_ADD or XXEVENT_TIMER_ADD, or creating a new 
 XXEVENT-only submission structure. You create an API that automatically 
 makes that new abstraction work with POSIX poll/select, and you get epoll 
 support for free. Without even changing a bit in the epoll API.

Well, we get epoll support for free, but we need to create tons of other
interfaces and infrastructure for kernel users, and we need to change 
userspace anyway.
But epoll support requires to have quite heavy bindings to file
structure, so why don't we want to design new interface (since we need
to change userspace anyway) so that it could allow to scale and be very
memory optimized from the beginning?

 - Davide
 

-- 
Evgeniy Polyakov

Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3

2007-03-03 Thread Ray Lee

On 3/3/07, Ihar `Philips` Filipau [EMAIL PROTECTED] wrote:

What I'm trying to get to: keep things simple. The proposed
optimization by Ingo does nothing else but allowing AIO to probe file
cache - if data there to go with fast path. So why not to implement
what the people want - probing of cache? Because it sounds bad? But
they are in fact proposing precisely that just masked with fast
threads.


Please don't take this the wrong way, but I don't think you understand
the problem space that people are trying to address here.

Servers want to never, ever block. Not on a socket, not on a stat, not
on anything. (I have an embedded server I wrote that has to fork
internally just to watch the damn serial port signals in parallel with
handling network I/O, audio, and child processes that handle H323.)
There's a lot of things that can block out there, and it's not just
disk I/O.

Further, not only do servers not want to block, they also want to cram
a lot more requests into the kernel at once *for the kernel's
benefit*. In particular, a server wants to issue a ton of stats and
I/O in parallel so that the kernel can optimize which order to handle
the requests.

Finally, the biggest argument in favor of syslets/threadlets AFAICS is
that -- if done correctly -- it would unify the AIO and normal IO
paths in the kernel. The improved ease of long term maintenance on the
kernel (and more test coverage, and more directed optimization,
etc...) just for this point alone makes them worth considering for
inclusion.

So, while everybody has been talking about cached and non-cached
cases, those are really special cases of the entire package that the
rest of us want.

Ray


Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3

2007-03-03 Thread Davide Libenzi
On Sat, 3 Mar 2007, Evgeniy Polyakov wrote:

  I was referring to dropping an event directly to a userspace buffer, from 
  the poll callback. If pages are not there, you might sleep, and you can't 
  since the wakeup function holds a spinlock on the waitqueue head while 
  looping through the waiters to issue the wakeup. Also, you don't know from 
  where the poll wakeup is called.
 
 Ugh, no, that is very limited solution - memory must be either pinned
 (which leads to dos and limited ring buffer), or callback must sleep.
 Actually in any way there _must_ exist a queue - if ring buffer is full
 event is not allowed to be dropped - it must be stored in some other
 place, for example in queue from where entries will be read (copied)
 which ring buffer will have entries (that is how it is implemented in
 kevent at least).

I was not advocating for that, if you read carefully. The fact that epoll 
does not do that, should be a clear hint. The old /dev/epoll IIRC was only 
10% faster than the current epoll under a *heavy* event frequency
micro-bench like pipetest (and that version of epoll did not have the 
single pass over the ready set optimization). And /dev/epoll was 
delivering events *directly* on userspace visible (mmaped) memory in a 
zero-copy fashion.




  BTW, Linus made a signalfd sketch code time ago, to deliver signals to an 
  fd. Code remained there and nobody cared. Question: Was it because
  1) it had file bindings or 2) because nobody really cared to deliver 
  signals to an event collector?
  And *if* later requirements come, you don't need to change the API by 
  adding an XXEVENT_SIGNAL_ADD or XXEVENT_TIMER_ADD, or creating a new 
  XXEVENT-only submission structure. You create an API that automatically 
  makes that new abstraction work with POSIX poll/select, and you get epoll 
  support for free. Without even changing a bit in the epoll API.
 
 Well, we get epoll support for free, but we need to create tons of other
 interfaces and infrastructure for kernel users, and we need to change 
 userspace anyway.

Those *other* (tons?!?) interfaces can be created *when* the need comes 
(see Linus signalfd [1] example to show how urgent that was). *When* 
the need comes, they will work with existing POSIX interfaces, without 
requiring your own just-another event interface. Those other interfaces 
could also be more easily adopted by other Unix cousins, because of 
the fact that they rely on existing POSIX interfaces. One of the reasons
for the Unix file abstraction interfaces is that you do *not* have to
plan and bloat interfaces beforehand. As long as your new abstraction behaves
in a file-like fashion, it can be automatically used with existing interfaces.
And you create them *when* the need comes.




[1] That was like 100 lines of code or so. See here:

http://tinyurl.com/3yuna5



- Davide




Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3

2007-03-03 Thread Davide Libenzi
On Sat, 3 Mar 2007, Davide Libenzi wrote:

 Those *other* (tons?!?) interfaces can be created *when* the need comes 
 (see Linus signalfd [1] example to show how urgent that was). *When* 
 the need comes, they will work with existing POSIX interfaces, without 
 requiring your own just-another event interface. Those other interfaces 
 could also be more easily adopted by other Unix cousins, because of 
 the fact that they rely on existing POSIX interfaces. One of the reason 
 about the Unix file abstraction interfaces, is that you do *not* have to 
 plan and bloat interfaces before. As long as your new abstraction behave 
 in a file-fashion, it can be automatically used with existing interfaces. 
 And you create them *when* the need comes.

Now, if you don't mind, my spare time is really limited and I prefer to 
spend it looking at stuff the topic of this thread talks about.
Not least because the whole epoll/kevent discussion depends heavily on
whether syslets/threadlets turn out to be a viable method for
generic AIO. Savvy?



- Davide




Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3

2007-03-03 Thread Ihar `Philips` Filipau

On 3/3/07, Ray Lee [EMAIL PROTECTED] wrote:

On 3/3/07, Ihar `Philips` Filipau [EMAIL PROTECTED] wrote:
 What I'm trying to get to: keep things simple. The proposed
 optimization by Ingo does nothing else but allowing AIO to probe file
 cache - if data there to go with fast path. So why not to implement
 what the people want - probing of cache? Because it sounds bad? But
 they are in fact proposing precisely that just masked with fast
 threads.


Servers want to never, ever block. Not on a socket, not on a stat, not
on anything. (I have an embedded server I wrote that has to fork
internally just to watch the damn serial port signals in parallel with
handling network I/O, audio, and child processes that handle H323.)
There's a lot of things that can block out there, and it's not just
disk I/O.



Why don't select/poll/epoll/friends work? I have programmed on both
sides - user-space network servers and in-kernel network protocols -
and the never-blocking thing was implemented in *nix back when I was
still walking under the table.

One can poll() more or less *any* device in the system. With the frigging
exception of - right - files. IOW for 75% of I/O the problem doesn't
exist, since there is a proper interface - e.g. sockets - in place.
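
The "files are the exception" point can be observed from userspace. A small
sketch, assuming Linux: poll() on a regular file reports it readable
straight away, whether or not the data is cached, and epoll refuses regular
files with EPERM. The helper names readable_now() and epoll_accepts() are
made up for the example.

```c
#include <errno.h>
#include <poll.h>
#include <sys/epoll.h>
#include <unistd.h>

/* 1 if poll() reports fd readable within timeout_ms, 0 on timeout, -1 on error. */
int readable_now(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int r = poll(&pfd, 1, timeout_ms);
    if (r < 0)
        return -1;
    return (r == 1 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* 0 if epoll accepts fd, otherwise the errno reported by epoll. */
int epoll_accepts(int fd)
{
    int ep = epoll_create(1);
    if (ep < 0)
        return errno;

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    int err = epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) == 0 ? 0 : errno;
    close(ep);
    return err;
}
```

On an empty pipe readable_now() times out as expected, while on a regular
file it returns 1 immediately and says nothing about whether a following
read() will have to wait for the disk.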

User-space-wise, check how squid (caching http proxy) does it: you
have several (forked) instances to serve network requests and you have
one or several disk I/O daemons (the so-called diskd storeio). Why? Because
you cannot poll() file descriptors, but you can poll a unix socket
connected to diskd. If diskd blocks, squid can still serve requests.
How are threadlets better than a pool of diskd instances? All the nastiness
of shared memory set loose...
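
The diskd pattern reduces to: push the blocking file read into a helper
process and wait on a poll()able unix socket instead. A minimal sketch of
that shape, not squid's actual code; read_via_helper() is a hypothetical
name, and a real server would keep one long-lived helper rather than
forking per request.

```c
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Read up to `len` bytes (at most 4096) of `path` in a forked helper and
 * receive them over a unix socket. Returns the byte count, or -1 on
 * failure or timeout.
 */
ssize_t read_via_helper(const char *path, char *buf, size_t len, int timeout_ms)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (pid == 0) {                 /* helper: the only place that blocks on disk */
        char tmp[4096];
        size_t want = len < sizeof tmp ? len : sizeof tmp;
        close(sv[0]);
        int fd = open(path, O_RDONLY);
        ssize_t n = fd < 0 ? -1 : read(fd, tmp, want);
        if (n > 0 && write(sv[1], tmp, (size_t)n) != n)
            n = -1;
        _exit(n < 0 ? 1 : 0);
    }

    close(sv[1]);
    struct pollfd pfd = { .fd = sv[0], .events = POLLIN };
    ssize_t got = -1;
    /* A real server would poll its network sockets in the same call. */
    if (poll(&pfd, 1, timeout_ms) == 1)
        got = read(sv[0], buf, len);
    close(sv[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        got = -1;
    return got;
}
```

The data is copied through the socket here, which avoids shared memory
altogether at the price of an extra copy.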

What I'm trying to get to: threadlets wouldn't help existing
single-threaded applications - which is about 95% of all applications.
And multi-threaded applications would gain little, because few real
applications create threads dynamically: creation needs resources and
can fail, uncontrolled thread spawning hurts overall manageability,
and additional care is needed to proof against deadlocks and lock
contention. (The category of applications which want the performance
gain are also the applications which need to ensure greater stability
over long non-stop runs. Uncontrollable dynamism helps nothing.)

Having implemented several "file servers" - daemons serving file I/O
to other daemons - I honestly hardly see any improvement. Today people
configure such file servers to issue e.g. 10 file operations
simultaneously - using a pool of 10 threads. What do threadlets change? In
the end, just to keep threadlets in check, I would need to issue
pthread_join() after some number of threadlets have been created. And the
latter number is the former, e.g. 10. IOW, programmer-wise the
implementation remains the same - and all the limitations remain the same.
And all the overhead of user-space locking remains the same. (*)

What's more, having some limited experience of kernel programming,
I fail to see what threadlets would simplify on the kernel side. The end
result as I see it: user space becomes a bit more complicated because of
dynamic multi-threading, and kernel space also becomes more complicated
because of the same added dynamism.

(*) Hm... On the other side, if an application were able to tell the kernel
to limit the number of issued threadlets to N, then it might simplify the
job. The application can tell the kernel "I need at most 10 blocking
threadlets, block me if there are more" and then dumbly throw I/O
threadlets at the kernel as they come in. And the kernel would then put
the process to sleep if N+1 threadlets are blocking. That would definitely
simplify the job in user-space: it wouldn't need to call
pthread_join(). But it is still no replacement for a poll()able file
descriptor or truly async mmap().
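
The "at most N in flight, block me if there are more" behaviour is what an
application has to hand-roll today. A rough sketch using forked workers as
a stand-in, since no such threadlet limit exists in the posted patches;
run_bounded() and demo_job() are made-up names.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Stand-in for one blocking I/O job: just sleeps for a millisecond. */
void demo_job(int id)
{
    struct timespec ts = { 0, 1000000L };
    (void)id;
    nanosleep(&ts, NULL);
}

/*
 * Run `njobs` jobs in child processes, never more than `limit` at once.
 * The submitter itself sleeps whenever the limit is reached. Returns the
 * peak number of jobs in flight, or -1 if fork() or wait() failed.
 */
int run_bounded(int njobs, int limit, void (*job)(int))
{
    int in_flight = 0, peak = 0;

    for (int i = 0; i < njobs; i++) {
        if (in_flight == limit) {   /* "block me if there are more" */
            if (wait(NULL) < 0)
                return -1;
            in_flight--;
        }
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            job(i);
            _exit(0);
        }
        if (++in_flight > peak)
            peak = in_flight;
    }
    while (in_flight > 0) {         /* drain the remaining workers */
        if (wait(NULL) < 0)
            return -1;
        in_flight--;
    }
    return peak;
}
```

With a kernel-side limit as proposed above, the explicit wait() in the
submission loop (the counterpart of the pthread_join() bookkeeping) is the
part that would go away.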

--
Don't walk behind me, I may not lead.
Don't walk in front of me, I may not follow.
Just walk beside me and be my friend.
   -- Albert Camus (attributed to)


Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3

2007-03-03 Thread Ray Lee
Ihar `Philips` Filipau wrote:
 On 3/3/07, Ray Lee [EMAIL PROTECTED] wrote:
 On 3/3/07, Ihar `Philips` Filipau [EMAIL PROTECTED] wrote:
  What I'm trying to get to: keep things simple. The proposed
  optimization by Ingo does nothing else but allowing AIO to probe file
  cache - if data there to go with fast path. So why not to implement
  what the people want - probing of cache? Because it sounds bad? But
  they are in fact proposing precisely that just masked with fast
  threads.


 Servers want to never, ever block. Not on a socket, not on a stat, not
 on anything. (I have an embedded server I wrote that has to fork
 internally just to watch the damn serial port signals in parallel with
 handling network I/O, audio, and child processes that handle H323.)
 There's a lot of things that can block out there, and it's not just
 disk I/O.

 
 Why select/poll/epoll/friends do not work? I have programmed on both
 sides - user-space network servers and in-kernel network protocols -
 and never blocking thing was implemented in *nix in the times I was
 walking under table.
 

Then you've never had to write something that watches serial port
signals. Google on TIOCMIWAIT to see what I'm talking about. The only
option for a userspace programmer to deal with that is to fork() or poll
the signals every so many milliseconds. There are probably more easy
examples, but that's the one off the top of my head that affected me.
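
The "poll the signals every so many milliseconds" workaround looks roughly
like this: a sketch assuming a Linux tty and the standard TIOCMGET ioctl,
with a hypothetical wrapper name wait_modem_change(). The blocking
alternative is a single ioctl(fd, TIOCMIWAIT, mask), which is what forces
the extra process.

```c
#include <sys/ioctl.h>
#include <time.h>

/*
 * Sample the modem-control lines of `fd` every `interval_ms` until one of
 * the bits in `mask` (e.g. TIOCM_CD | TIOCM_RNG | TIOCM_DSR | TIOCM_CTS)
 * changes, or `max_polls` samples have been taken.
 * Returns 1 and stores the new line state on a change, 0 if nothing
 * changed, -1 if TIOCMGET fails (for instance ENOTTY on a non-tty).
 */
int wait_modem_change(int fd, int mask, int interval_ms, int max_polls,
                      int *state)
{
    int before, now;
    if (ioctl(fd, TIOCMGET, &before) < 0)
        return -1;

    struct timespec ts = { interval_ms / 1000,
                           (interval_ms % 1000) * 1000000L };
    for (int i = 0; i < max_polls; i++) {
        nanosleep(&ts, NULL);       /* wasted wakeups while nothing happens */
        if (ioctl(fd, TIOCMGET, &now) < 0)
            return -1;
        if ((before ^ now) & mask) {
            *state = now;
            return 1;
        }
    }
    return 0;
}
```

The sampling interval trades latency against wakeups, and a pulse shorter
than the interval is missed entirely; neither problem exists with the
blocking TIOCMIWAIT call.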

In short, this isn't just about network IO, this isn't just about file IO.

 One can poll() more or less *any* device in system. With frigging
 exception of - right - files.

The problem is the "more or less". Say you're right, and 95% of the
system calls are either already asynchronous or non-blocking/poll()able.
One of the questions on the table is how to extend it to the last 5%.
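
For the calls that already have a non-blocking form, the pattern is short.
A minimal sketch with recv() and MSG_DONTWAIT; the wrapper try_recv() is a
made-up name. The open question is the calls with no such flag at all
(stat(), open(), TIOCMIWAIT and friends).

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Try to receive without sleeping. Returns the byte count (0 on EOF) or
 * -1 on error; *would_block is set to 1 when the only problem was that
 * no data was available yet.
 */
ssize_t try_recv(int fd, void *buf, size_t len, int *would_block)
{
    ssize_t n = recv(fd, buf, len, MSG_DONTWAIT);
    *would_block = (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK));
    return n;
}
```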

 User-space-wise, check how squid (caching http proxy) does it: you
 have several (forked) instances to serve network requests and you have
 one/several disk I/O daemons. (So called diskd storeio) Why? Because
 you cannot poll() file descriptors, but you can poll unix socket
 connected to diskd. If diskd blocks, squid still can serve requests.
 How threadlets are better then pool of diskd instances? All nastiness
 of shared memory set loose...

Samba/lighttpd/git want to issue dozens of stats in parallel so that the
kernel can have an opportunity to sort them better. Are you saying they
should fork() a process per stat that they want to issue in parallel?

 What I'm trying to get to. Threadlets wouldn't help existing
 single-threaded applications - what is about 95% of all applications.

Eh, I don't think that's right. Part of the reason threadlets and
syslets are on the table is that they may be a more efficient way to do
AIO. And the differences between the syslet API and the current kernel
Async IO API can be abstracted away by glibc, so that today's apps that
do AIO would immediately benefit.

 What's more, as having some limited experience of kernel programming,
 I fail to see what threadlets would simplify on kernel side.

You can yank the entire separate AIO path, and just treat them as
another blocking API that syslets makes nonblocking. Immediate reduction
of code, and everybody is now using the same code paths, which means
higher test coverage and reduced maintenance cost.

This last point is really important. Even if no extra functionality
eventually makes it to userspace, this last point would still be enough
to make the powers that be consider inclusion.

Ray


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> Note that the control bits do not just magically change during normal 
> FPU use. It's a bit like sys_setsid()/iopl/etc., it makes little sense 
> to change those per-thread anyway. This is a non-issue anyway - what is 
> important is that the big bulk of 512 (or more) bytes of FPU state /are/ 
> callee-saved (both on 32-bit and on 64-bit), hence there's no need to 
 ^ caller-saved
> unlazy anything or to do expensive FPU state saves or other FPU juggling 
> around threadlet (or even syslet) use.



Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Ingo Molnar

* Davide Libenzi  wrote:

> [...] Status word and control bits should not be changed from 
> underneath userspace AFAIK. [...]

Note that the control bits do not just magically change during normal 
FPU use. It's a bit like sys_setsid()/iopl/etc., it makes little sense 
to change those per-thread anyway. This is a non-issue anyway - what is 
important is that the big bulk of 512 (or more) bytes of FPU state /are/ 
callee-saved (both on 32-bit and on 64-bit), hence there's no need to 
unlazy anything or to do expensive FPU state saves or other FPU juggling 
around threadlet (or even syslet) use.

Ingo


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Davide Libenzi
On Fri, 2 Mar 2007, Nicholas Miell wrote:

> On Fri, 2007-03-02 at 16:52 -0800, Davide Libenzi wrote:
> > On Fri, 2 Mar 2007, Nicholas Miell wrote:
> > 
> > > The point Ingo was making is that the x86 ABI already requires the FPU
> > > context to be saved before *all* function calls.
> > 
> > I've not seen that among Ingo's points, but yeah some status is caller 
> > saved. But, aren't things like status word and control bits callee saved? 
> > If that's the case, it might require proper handling.
> > 
> 
> Ingo mentioned it in one of the parts you cut out of your reply:
> 
> > and here is where thinking about threadlets as a function call and not 
> > as an asynchronous context helps alot: the classic gcc convention for 
> > FPU use & function calls should apply: gcc does not call an external 
> > function with an in-use FPU stack/register, it always neatly unuses it, 
> > as no FPU register is callee-saved, all are caller-saved.
> 
> The i386 psABI is ancient (i.e. it predates SSE, so no mention of the
> XMM or MXCSR registers) and a bit vague (no mention at all of the FP
> status word), but I'm fairly certain that Ingo is right.

I'm not sure if that's the case. I'd be happy if it was, but I'm afraid 
it's not. Status word and control bits should not be changed from 
underneath userspace AFAIK. The ABI I remember tells me that those are 
callee saved. A quick gcc asm test tells me that too.
And assuming that's the case, why don't we have a smarter unlazy_fpu() 
then, that avoid FPU context sync if we're scheduled while inside a 
syscall (this is no different than an enter inside sys_async_exec - 
userspace should have taken care of it)?
IMO a syscall enter should not assume that userspace took care of saving 
the whole FPU context.



- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Benjamin LaHaise
On Fri, Mar 02, 2007 at 05:36:01PM -0800, Nicholas Miell wrote:
> > as an asynchronous context helps alot: the classic gcc convention for 
> > FPU use & function calls should apply: gcc does not call an external 
> > function with an in-use FPU stack/register, it always neatly unuses it, 
> > as no FPU register is callee-saved, all are caller-saved.
> 
> The i386 psABI is ancient (i.e. it predates SSE, so no mention of the
> XMM or MXCSR registers) and a bit vague (no mention at all of the FP
> status word), but I'm fairly certain that Ingo is right.

The FPU status word *must* be saved, as the rounding behaviour and error mode 
bits are assumed to be preserved.  Iow, yes, there is state which is required.

-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <[EMAIL PROTECTED]>.


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Nicholas Miell
On Fri, 2007-03-02 at 16:52 -0800, Davide Libenzi wrote:
> On Fri, 2 Mar 2007, Nicholas Miell wrote:
> 
> > The point Ingo was making is that the x86 ABI already requires the FPU
> > context to be saved before *all* function calls.
> 
> I've not seen that among Ingo's points, but yeah some status is caller 
> saved. But, aren't things like status word and control bits callee saved? 
> If that's the case, it might require proper handling.
> 

Ingo mentioned it in one of the parts you cut out of your reply:

> and here is where thinking about threadlets as a function call and not 
> as an asynchronous context helps alot: the classic gcc convention for 
> FPU use & function calls should apply: gcc does not call an external 
> function with an in-use FPU stack/register, it always neatly unuses it, 
> as no FPU register is callee-saved, all are caller-saved.

The i386 psABI is ancient (i.e. it predates SSE, so no mention of the
XMM or MXCSR registers) and a bit vague (no mention at all of the FP
status word), but I'm fairly certain that Ingo is right.


-- 
Nicholas Miell <[EMAIL PROTECTED]>



Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Davide Libenzi
On Fri, 2 Mar 2007, Nicholas Miell wrote:

> The point Ingo was making is that the x86 ABI already requires the FPU
> context to be saved before *all* function calls.

I've not seen that among Ingo's points, but yeah some status is caller 
saved. But, aren't things like status word and control bits callee saved? 
If that's the case, it might require proper handling.



- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Nicholas Miell
On Fri, 2007-03-02 at 12:53 -0800, Davide Libenzi wrote:
> On Fri, 2 Mar 2007, Ingo Molnar wrote:
> 
> > 
> > * Davide Libenzi  wrote:
> > 
> > > I think that the "dirty" FPU context must, at least, follow the new 
> > > head. That's what the userspace sees, and you don't want an async_exec 
> > > to re-emerge with a different FPU context.
> > 
> > well. I think there's some confusion about terminology, so please let me 
> > describe everything in detail. This is how execution goes:
> > 
> >   outer loop() {
> >   call_threadlet();
> >   }
> > 
> > this all runs in the 'head' context. call_threadlet() always switches to 
> > the 'threadlet stack'. The 'outer context' runs in the 'head stack'. If, 
> > while executing the threadlet function, we block, then the 
> > threadlet-thread gets to keep the task (the threadlet stack and also the 
> > FPU), and blocks - and we pick a 'new head' from the thread pool and 
> > continue executing in that context - right after the call_threadlet() 
> > function, in the 'old' head's stack. I.e. it's as if we returned 
> > immediately from call_threadlet(), with a return code that signals that 
> > the 'threadlet went async'.
> > 
> > now, the FPU state that was when the threadlet blocked is totally 
> > meaningless to the 'new head' - that FPU state is from the middle of the 
> > threadlet execution.
> 
> For threadlets, it might be. Now think about a task wanting to dispatch N 
> parallel AIO requests as N independent syslets.
> Think about this task having USEDFPU set, so the FPU context is dirty.
> When it returns from async_exec, with one of the requests being become 
> sleepy, it needs to have the same FPU context it had when it entered, 
> otherwise it won't prolly be happy.
> For the same reason a schedule() must preserve/sync the "prev" FPU 
> context, to be reloaded at the next FPU fault.

The point Ingo was making is that the x86 ABI already requires the FPU
context to be saved before *all* function calls.

Unfortunately, this isn't true of other ABIs -- looking over the psABI
specs I have lying around, AMD64, PPC64, and MIPS require at least part
of the FPU state to be preserved across function calls, and I'm sure
this is also true of others.

Then there are the other nasty details of new thread creation --
thankfully, the contents of the TLS aren't inherited from the parent
thread, but it still needs to be initialized; not to mention all the
other details involved in pthread creation and destruction.

I don't see any way around the pthread issues other than making a libc
upcall on return from the first system call that blocked.

-- 
Nicholas Miell <[EMAIL PROTECTED]>



Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Michael K. Edwards

On 3/2/07, Davide Libenzi  wrote:

> For threadlets, it might be. Now think about a task wanting to dispatch N
> parallel AIO requests as N independent syslets.
> Think about this task having USEDFPU set, so the FPU context is dirty.
> When it returns from async_exec, with one of the requests being become
> sleepy, it needs to have the same FPU context it had when it entered,
> otherwise it won't prolly be happy.
> For the same reason a schedule() must preserve/sync the "prev" FPU
> context, to be reloaded at the next FPU fault.


And if you actually think this through, I think you will arrive at (a
subset of) the conclusions I did a week ago: to keep the threadlets
lightweight enough to schedule and migrate cheaply, they can't be
allowed to "own" their own FPU and TLS context.  They have to be
allowed to _use_ the FPU (or they're useless) and to _use_ TLS (or
they can't use any glibc wrapper around a syscall, since they
practically all set the thread-local errno).  But they have to
"quiesce" the FPU and stash any thread-local state they want to keep
on their stack before entering the next syscall, or else it'll get
clobbered.

Keep thinking, especially about FPU flags, and you'll see why
threadlets spawned from the _same_ threadlet entrypoint should all run
in the same pool of threads, one per CPU, while threadlets from
_different_ entrypoints should never run in the same thread (FPU/TLS
context).  You'll see why threadlets in the same pool shouldn't be
permitted to preempt one another except at syscalls that block, and
the cost of preempting the real thread associated with one threadlet
pool with another real thread associated with a different threadlet
pool is the same as any other thread switch.  At which point,
threadlet pools are themselves first-class objects (to use the snake
oil phrase), and might as well be enhanced to a data structure that
has efficient operations for reprioritization, bulk cancellation, and
all that jazz.

Did I mention that there is actually quite a bit of prior art in this
area, which makes a much better guide to the design of round wheels
than micro-benchmarks do?

Cheers,
- Michael


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Davide Libenzi
On Fri, 2 Mar 2007, Ingo Molnar wrote:

> 
> * Davide Libenzi  wrote:
> 
> > I think that the "dirty" FPU context must, at least, follow the new 
> > head. That's what the userspace sees, and you don't want an async_exec 
> > to re-emerge with a different FPU context.
> 
> well. I think there's some confusion about terminology, so please let me 
> describe everything in detail. This is how execution goes:
> 
>   outer loop() {
>   call_threadlet();
>   }
> 
> this all runs in the 'head' context. call_threadlet() always switches to 
> the 'threadlet stack'. The 'outer context' runs in the 'head stack'. If, 
> while executing the threadlet function, we block, then the 
> threadlet-thread gets to keep the task (the threadlet stack and also the 
> FPU), and blocks - and we pick a 'new head' from the thread pool and 
> continue executing in that context - right after the call_threadlet() 
> function, in the 'old' head's stack. I.e. it's as if we returned 
> immediately from call_threadlet(), with a return code that signals that 
> the 'threadlet went async'.
> 
> now, the FPU state that was when the threadlet blocked is totally 
> meaningless to the 'new head' - that FPU state is from the middle of the 
> threadlet execution.

For threadlets, it might be. Now think about a task wanting to dispatch N
parallel AIO requests as N independent syslets.
Think about this task having USEDFPU set, so the FPU context is dirty.
When it returns from async_exec, with one of the requests having gone to
sleep, it needs to have the same FPU context it had when it entered,
otherwise it probably won't be happy.
For the same reason a schedule() must preserve/sync the "prev" FPU
context, to be reloaded at the next FPU fault.




> > So, IMO, if the USEDFPU bit is set, we need to sync the dirty FPU 
> > context with an early unlazy_fpu(), *and* copy the sync'd FPU context 
> > to the new head. This should really be a fork of the dirty FPU context 
> > IMO, and should only happen if the USEDFPU bit is set.
> 
> why? The only effect this will have is a slowdown :) The FPU context 
> from the middle of the threadlet function is totally meaningless to the 
> 'new head'. It might be anything. (although in practice system calls are 
> almost never called with a truly in-use FPU.)

See above ;)



- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Ingo Molnar

* Davide Libenzi  wrote:

> I think that the "dirty" FPU context must, at least, follow the new 
> head. That's what the userspace sees, and you don't want an async_exec 
> to re-emerge with a different FPU context.

well. I think there's some confusion about terminology, so please let me 
describe everything in detail. This is how execution goes:

  outer loop() {
          call_threadlet();
  }

this all runs in the 'head' context. call_threadlet() always switches to 
the 'threadlet stack'. The 'outer context' runs in the 'head stack'. If, 
while executing the threadlet function, we block, then the 
threadlet-thread gets to keep the task (the threadlet stack and also the 
FPU), and blocks - and we pick a 'new head' from the thread pool and 
continue executing in that context - right after the call_threadlet() 
function, in the 'old' head's stack. I.e. it's as if we returned 
immediately from call_threadlet(), with a return code that signals that 
the 'threadlet went async'.

now, the FPU state as it was when the threadlet blocked is totally
meaningless to the 'new head' - that FPU state is from the middle of the
threadlet execution.
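
To make the control flow concrete, here is a toy userspace model of the
outer loop as seen from the head. It is only a sketch: every name in it
(call_threadlet_model(), the THREADLET_MODEL_* codes,
request_would_block()) is invented for illustration, and the real
mechanism (stack switch, handing the head to a pool thread) is not
modelled at all.

```c
/* Hypothetical status codes; the text above only says that the return
 * code signals that the threadlet "went async". */
enum threadlet_status_model {
    THREADLET_MODEL_DONE  = 0,
    THREADLET_MODEL_ASYNC = 1
};

/* A modelled threadlet reports whether it would have blocked. */
typedef int (*threadlet_fn_model)(void *arg);

/* Stand-in for call_threadlet(): run fn and map "would block" to the
 * async return code.  No stacks or threads are switched here. */
static enum threadlet_status_model
call_threadlet_model(threadlet_fn_model fn, void *arg)
{
    return fn(arg) ? THREADLET_MODEL_ASYNC : THREADLET_MODEL_DONE;
}

/* Example threadlet: pretend that odd request ids hit a blocking syscall. */
static int request_would_block(void *arg)
{
    return *(int *)arg % 2;
}

/* The outer loop: dispatch n requests, count how many went async. */
int outer_loop_model(int n)
{
    int went_async = 0;
    int id;

    for (id = 0; id < n; id++) {
        if (call_threadlet_model(request_would_block, &id)
            == THREADLET_MODEL_ASYNC)
            went_async++;   /* the head carries on immediately */
    }
    return went_async;
}
```

In this model the head never inspects anything the threadlet left behind
except the return code.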

and here is where thinking about threadlets as a function call and not
as an asynchronous context helps a lot: the classic gcc convention for
FPU use & function calls should apply: gcc does not call an external 
function with an in-use FPU stack/register, it always neatly unuses it, 
as no FPU register is callee-saved, all are caller-saved.

> So, IMO, if the USEDFPU bit is set, we need to sync the dirty FPU 
> context with an early unlazy_fpu(), *and* copy the sync'd FPU context 
> to the new head. This should really be a fork of the dirty FPU context 
> IMO, and should only happen if the USEDFPU bit is set.

why? The only effect this will have is a slowdown :) The FPU context 
from the middle of the threadlet function is totally meaningless to the 
'new head'. It might be anything. (although in practice system calls are 
almost never called with a truly in-use FPU.)

Ingo


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Davide Libenzi
On Fri, 2 Mar 2007, Ingo Molnar wrote:

> 
> * Davide Libenzi  wrote:
> 
> > [...] We're still missing proper FPU context switch in the 
> > move_user_context(). [...]
> 
> yeah - i'm starting to be of the opinion that the FPU context should 
> stay with the threadlet, exclusively. I.e. when calling a threadlet, the 
> 'outer loop' (the event loop) should not leak FPU context into the 
> threadlet and then expect it to be replicated from whatever random point 
> the threadlet ended up sleeping at. It would be possible, but it just 
> makes no sense. What makes most sense is to just keep the FPU context 
> with the threadlet, and to let the 'new head' use an initial (unused) 
> FPU context. And it's in fact the threadlet that will most likely have 
> an acrive FPU context across a system call, not the outer loop. In other 
> words: no special FPU support needed at all for threadlets (i.e. no 
> flipping needed even) - this behavior just naturally happens in the 
> current implementation. Hm?

I think that the "dirty" FPU context must, at least, follow the new head. 
That's what the userspace sees, and you don't want an async_exec to 
re-emerge with a different FPU context.
I think it should also follow the async thread (old, going-to-sleep,
thread), since a threadlet might have dirtied it, and as a consequence
it'll want to get it back when it's re-scheduled.
So, IMO, if the USEDFPU bit is set, we need to sync the dirty FPU context
with an early unlazy_fpu(), *and* copy the sync'd FPU context to the new head.
This should really be a fork of the dirty FPU context IMO, and should only 
happen if the USEDFPU bit is set.



- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Ingo Molnar

* Davide Libenzi  wrote:

> [...] We're still missing proper FPU context switch in the 
> move_user_context(). [...]

yeah - i'm starting to be of the opinion that the FPU context should 
stay with the threadlet, exclusively. I.e. when calling a threadlet, the 
'outer loop' (the event loop) should not leak FPU context into the 
threadlet and then expect it to be replicated from whatever random point 
the threadlet ended up sleeping at. It would be possible, but it just 
makes no sense. What makes most sense is to just keep the FPU context 
with the threadlet, and to let the 'new head' use an initial (unused) 
FPU context. And it's in fact the threadlet that will most likely have
an active FPU context across a system call, not the outer loop. In other
words: no special FPU support needed at all for threadlets (i.e. no 
flipping needed even) - this behavior just naturally happens in the 
current implementation. Hm?

Ingo


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Davide Libenzi
On Fri, 2 Mar 2007, Davide Libenzi wrote:

> And if you really feel raw about the single O(nready) loop that epoll 
> currently does, a new epoll_wait2 (or whatever) API could be used to 
> deliver the event directly into a userspace buffer [1], directly from the 
> poll callback, w/out extra delivery loops 
> (IRQ/event->epoll_callback->event_buffer).

And if you ever wonder where the "epoll" name came from, it came from the
old /dev/epoll. The epoll predecessor, /dev/epoll, added plugs
everywhere events were needed and delivered those events in O(1)
*directly* into a user-visible (mmap'd) buffer, in a zero-copy fashion.
The old /dev/epoll was faster than the current epoll, but the latter was
chosen because, despite being slightly slower, it had support for every
pollable device, *without* adding more plugs into the existing code.
Performance and code maintenance are not to be considered separately
whenever you evaluate a solution. That's the reason I got excited about
this new generic AIO solution.



- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Davide Libenzi
On Fri, 2 Mar 2007, Ingo Molnar wrote:

> > After your changes epoll increased to 5k.
> 
> Can we please stop this pointless episode of benchmarketing, where every 
> mail of yours shows different results and you even deny having said 
> something which you clearly said just a few days ago? At this point i 
> simply cannot trust the numbers you are posting, nor is the discussion 
> style you are following productive in any way in my opinion.

Agreed. Can we focus on the topic here? We're still missing proper FPU 
context switch in the move_user_context(). In v6?


- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Davide Libenzi
On Fri, 2 Mar 2007, Evgeniy Polyakov wrote:

> do we really want to have per process signalfs, timerfs and so on - each 
> simple structure must be bound to a file, which becomes too cost.

I may be old school, but if you ask me, and if you *really* want those 
events, yes. Reason? Unix's everything-is-a-file rule, and being able to 
use them with *existing* POSIX poll/select. Remember, not every app 
requires huge scalability efforts, so working with simpler and familiar 
APIs is always welcome.
The *only* thing that was not practical to have as fd, was block requests. 
But maybe threadlets/syslets will handle those just fine, and close the gap.



- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Davide Libenzi
On Fri, 2 Mar 2007, Evgeniy Polyakov wrote:

> On Thu, Mar 01, 2007 at 11:31:14AM -0800, Davide Libenzi 
> (davidel@xmailserver.org) wrote:
> > On Thu, 1 Mar 2007, Evgeniy Polyakov wrote:
> > 
> > > Ingo, do you really think I will send mails with faked benchmarks? :))
> > 
> > I don't think he ever implied that. He was only suggesting that when you 
> > post benchmarks, and even more when you make claims based on benchmarks, 
> > you need to be extra carefull about what you measure. Otherwise the 
> > external view that you give to others does not look good.
> > Kevent can be really faster than epoll, but if you post broken benchmarks 
> > (that can be, unrealiable HTTP loaders, broken server implemenations, 
> > etc..) and make claims based on that, the only effect that you have is to 
> > lose your point.
>  
> So, I only talked that kevent is superior compared to epoll because (and
> it is _main_ issue) of its ability to handle essentially any kind of
> events with very small overhead (the same as epoll has in struct file -
> list and spinlock) and without significant price of struct file binding
> to event.

You'll have to excuse me if my memory is bad, but IIRC the whole discussion
and long benchmark feast was born with you throwing a benchmark at Ingo
(with kevent showing a 1.9x performance boost WRT epoll), not with you
making any other point.
As for epoll not being able to handle other events: says who? Of
course, with zero modifications, you can handle zero additional events.
With modifications, you can handle other events. But let's talk about those
other events. The *only* kind of event that people (and being the epoll
maintainer I tend to receive those requests) missed in epoll was AIO
events. That's the *only* thing that was missed by real-life application
developers. And if something like threadlets/syslets proves effective,
the gap is closed WRT that requirement.
Epoll already handles the whole class of pollable devices inside the
kernel, and if you exclude block AIO, that's a pretty wide class already.
The *existing* f_op->poll subsystem can be used to deliver events at
poll-head wakeup time (by using the "key" member of the poll callback), so
that you don't even need the extra f_op->poll call to fetch events.
And if you really feel raw about the single O(nready) loop that epoll
currently does, a new epoll_wait2 (or whatever) API could be used to
deliver the event directly into a userspace buffer [1], directly from the
poll callback, w/out extra delivery loops
(IRQ/event->epoll_callback->event_buffer).


[1] From the epoll callback, we cannot sleep, so it's gonna be either an 
mlocked userspace buffer, or some kernel pages mapped to userspace.
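
For readers less familiar with the existing interface, a minimal sketch of
the current epoll path on a pollable fd (a pipe here). It uses only the
standard epoll_create()/epoll_ctl()/epoll_wait() calls; the helper name
epoll_pipe_ready_count() is made up, and the hypothetical epoll_wait2
mentioned above is not shown because no such API exists.

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Register the read end of a pipe with epoll, make it readable, and
 * return how many ready events epoll_wait() reports (expected: 1).
 * Returns -1 on any failure.
 */
int epoll_pipe_ready_count(void)
{
    int fds[2], ep, n = -1;
    struct epoll_event ev = {0};
    struct epoll_event ready[4];

    if (pipe(fds) != 0)
        return -1;

    ep = epoll_create(1);        /* size is only a hint, must be > 0 */
    if (ep >= 0) {
        ev.events = EPOLLIN;
        ev.data.fd = fds[0];

        if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
            write(fds[1], "x", 1) == 1) {
            /* Only ready descriptors are copied out: the O(nready) loop. */
            n = epoll_wait(ep, ready, 4, 1000);
            if (n == 1 && ready[0].data.fd != fds[0])
                n = -1;
        }
        close(ep);
    }

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Any fd whose driver implements f_op->poll can be registered the same way,
which is the "whole class of pollable devices" referred to above.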


- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Evgeniy Polyakov
On Fri, Mar 02, 2007 at 11:57:13AM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
> 
> * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> 
> > > > > [...] The numbers are still highly suspect - and we are already 
> > > > > down from the prior claim of kevent being almost twice as fast 
> > > > > to a 25% difference.
> > > >
> > > > Btw, there were never almost twice perfromance increase - epoll in 
> > > > my tests always showed 4-5 thousands requests per second, kevent - 
> > > > up to 7 thausands.
> > > 
> > > i'm referring to your claim in this mail of yours from 4 days ago 
> > > for example:
> > > 
> > >   http://lkml.org/lkml/2007/2/25/116
> > > 
> > >  "But note, that on my athlon64 3500 test machine kevent is about 7900
> > >   requests per second compared to 4000+ epoll, so expect a challenge."
> > > 
> > > no matter how i look at it, but 7900 is 1.9 times 4000 - which is 
> > > "almost twice".
> > 
> > After your changes epoll increased to 5k.
> 
> Can we please stop this pointless episode of benchmarketing, where every 
> mail of yours shows different results and you even deny having said 
> something which you clearly said just a few days ago? At this point i 
> simply cannot trust the numbers you are posting, nor is the discussion 
> style you are following productive in any way in my opinion.

I just show what I see in tests - I do not perform deep analysis of
that, since I do not see why it should be done - it is not fake, it is
not fantasy - it is real behaviour observed on my test machine; if it
suddenly changes I will report it.
Btw, I showed cases where epoll behaved better than kevent and
performance was an unbeatable 9k requests per second - I do not know why
it happened - maybe some cache-related issues, other processes all sleeping
at once, increased radiation or a strong wind that blew away my bad aura -
it is not reproducible on demand either.

> (you are never ever wrong, and if you are proven wrong on topic A you 
> claim it is an irrelevant topic (without even admitting you were wrong 
> about it) and you point to topic B claiming it's the /real/ topic you 
> talked about all along. And along the way you are slandering other 
> projects like epoll and threadlets, distorting the discussion. This kind 
> of keep-the-ball-moving discussion style is effective in politics but 
> IMO it's a waste of time when developing a kernel.)

Heh - that is why I'm not subscribed to lkml@ - it too frequently ends
up with politics :)

What are we talking about - are we trying to insult each other over
something that was supposedly said, based on some assumption from a
theoretical mental exercise? I can only laugh at that :)

Ingo, I never ever tried to show that something is broken - that is
fantasy based on the literal words, not on the real intention.

I never said epoll is broken. Absolutely.

I never said threadlet is broken. Absolutely.

I just showed that it is not (in my opinion) the right decision to use
threadlets for an IO-based model instead of an event-driven one - it is not
based on kevent performance (I _never_ stated it as a main factor - kevent
was only an example of the event-driven model; you confused it with kevent
AIO, which is a different beast), but instead on experience with nptl
threads and linuxthreads, and the related rescheduling overhead compared to
the userspace one.

I showed kevent as a possible usage scenario - since it does support its own
AIO. And you started to fight against it in every detail, since you
think kevent is not a good idea for handling the AIO model - well, that can
be perfectly correct; I showed kevent AIO (please do not think that kevent
and kevent AIO are the same - the latter is just one of the possible
users I implemented, it only uses kevent to deliver completion events to
userspace) as a possible AIO implementation, but not _kevent_ itself.

But somehow we ended up with words attributed to me that I never said and
ideas I never based my assumptions on... I do not really think you even
remotely wanted to make any personal assumptions about what we had
discussed.

We even concluded that a perfect IO model should use both approaches to
really scale - both threadlets with their on-demand-only rescheduling, and
an event-driven ring.
You stated your opinion on kevents - well, I can not agree with it, but
it is your right not to like something.

Let's not continue the bad practice of kicking each other just because there
were some problematic roots which no one even remembers correctly - let's
not make the mistake of reading something personal into trivial bits
- if you are in Russia or around any time soon I will happily buy you
a beer or whatever you prefer :)

So, let's just draw a line:
kevent was shown to people, and its performance, although flaky, is a
bit faster than epoll. Threadlets bound to any event-driven ring do not
show any performance degradation in a network-driven setup with a small
number of reschedulings, with all the advantages of simpler programming.
So, repeating myself, both models (not kevent and 

Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Evgeniy Polyakov
On Fri, Mar 02, 2007 at 11:56:18AM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
> 
> * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> 
> > Even if kevent has the same speed, it still allows to handle _any_ 
> > kind of events without any major surgery - a very tiny structure of 
> > lock and list head and you can process your own kernel event in 
> > userspace with timers, signals, io events, private userspace events 
> > and others without races and invention of differnet hacks for 
> > different types - _this_ is main point.
> 
> did it ever occur to you to ... extend epoll? To speed it up? To add a 
> new wait syscall to it? Instead of introducing a whole new parallel 
> framework?

Yes, I thought about extending it more than a year ago, before I started
kevent, but epoll() is absolutely based on the file structure and its
file_operations with the poll method, so it is quite impossible to work
with sockets to implement network AIO. Eventually it had gathered a lot
of other systems - do we really want to have per-process signalfs, timerfs
and so on - each simple structure must be bound to a file, which becomes
too costly.

>   Ingo

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Ingo Molnar

* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> > > > [...] The numbers are still highly suspect - and we are already 
> > > > down from the prior claim of kevent being almost twice as fast 
> > > > to a 25% difference.
> > >
> > > Btw, there were never almost twice perfromance increase - epoll in 
> > > my tests always showed 4-5 thousands requests per second, kevent - 
> > > up to 7 thausands.
> > 
> > i'm referring to your claim in this mail of yours from 4 days ago 
> > for example:
> > 
> >   http://lkml.org/lkml/2007/2/25/116
> > 
> >  "But note, that on my athlon64 3500 test machine kevent is about 7900
> >   requests per second compared to 4000+ epoll, so expect a challenge."
> > 
> > no matter how i look at it, but 7900 is 1.9 times 4000 - which is 
> > "almost twice".
> 
> After your changes epoll increased to 5k.

Can we please stop this pointless episode of benchmarketing, where every 
mail of yours shows different results and you even deny having said 
something which you clearly said just a few days ago? At this point i 
simply cannot trust the numbers you are posting, nor is the discussion 
style you are following productive in any way in my opinion.

(you are never ever wrong, and if you are proven wrong on topic A you 
claim it is an irrelevant topic (without even admitting you were wrong 
about it) and you point to topic B claiming it's the /real/ topic you 
talked about all along. And along the way you are slandering other 
projects like epoll and threadlets, distorting the discussion. This kind 
of keep-the-ball-moving discussion style is effective in politics but 
IMO it's a waste of time when developing a kernel.)

Ingo


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Ingo Molnar

* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> Even if kevent has the same speed, it still allows to handle _any_ 
> kind of events without any major surgery - a very tiny structure of 
> lock and list head and you can process your own kernel event in 
> userspace with timers, signals, io events, private userspace events 
> and others without races and invention of differnet hacks for 
> different types - _this_ is main point.

did it ever occur to you to ... extend epoll? To speed it up? To add a 
new wait syscall to it? Instead of introducing a whole new parallel 
framework?

Ingo


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Evgeniy Polyakov
On Fri, Mar 02, 2007 at 11:27:14AM +0100, Pavel Machek ([EMAIL PROTECTED]) wrote:
> Maybe. It is not up to me to decide. But "it is faster" is _not_ the
> only merge criterium.

Of course not!
Even if kevent has the same speed, it still allows handling _any_ kind
of event without any major surgery - a very tiny structure of a lock and
a list head, and you can process your own kernel events in userspace along
with timers, signals, io events, private userspace events and others,
without races and without inventing different hacks for different types -
_this_ is the main point.

>   Pavel
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) 
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Pavel Machek
Hi!

> > > > If you can replace them with something simpler, and no worse than 10%
> > > > slower in worst case, then go ahead. (We actually tried to do that at
> > > > some point, only to realize that efence stresses vm subsystem in very
> > > > unexpected/unfriendly way).
> > > 
> > > Agh, only 10% in the worst case.
> > > I think you can not even imagine what tricks network uses to get at
> > > least additional 1% out of the box.
> > 
> > Yep? Feel free to rewrite networking to assembly on Eugenix. That
> > should get you 1% improvement. If you reserve few registers to be only
> > used by kernel (not allowed by userspace), you can speedup networking
> > 5%, too. Ouch and you could turn off MMU, that is sure way to get few
> > more percent improvement in your networking case.
> 
> It is not _my_ networking, but that one you use everyday in every Linux
> box. Notice which tricks are used to remove single byte from
> sk_buff.

Ok, so tricks were worth it in sk_buff case.

> It is called optimization, and if it does us a single plus it must be
> implemented. Not all people have magical fear of new things.

But that does not mean "every optimization must be
implemented". Only optimizations that are "worth it" are... 

> > > Using such logic you can just abandon any further development, since it
> > > work as is right now.
> > 
> > Stop trying to pervert my logic.
> 
> Ugh? :)
> I just say in simple words your 'we do not need something if adds 10%,
> but is complex to understand'.

Yes... but that does not mean "stop development". You are still free
to clean up the code _while_ making it faster.

> > If your code is so complex that it is almost impossible to use from
> > userspace, that is good enough reason not to be merged. "But it is 3%
> > faster if..." is not a good-enough argument.
> 
> Is it enough for you?
> 
> epoll   4794.23 req/sec
> kevent  6468.95 req/sec

Maybe. It is not up to me to decide. But "it is faster" is _not_ the
only merge criterion.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Evgeniy Polyakov
On Thu, Mar 01, 2007 at 11:31:14AM -0800, Davide Libenzi 
(davidel@xmailserver.org) wrote:
> On Thu, 1 Mar 2007, Evgeniy Polyakov wrote:
> 
> > Ingo, do you really think I will send mails with faked benchmarks? :))
> 
> I don't think he ever implied that. He was only suggesting that when you 
> post benchmarks, and even more when you make claims based on benchmarks, 
> you need to be extra careful about what you measure. Otherwise the 
> external view that you give to others does not look good.
> Kevent can be really faster than epoll, but if you post broken benchmarks 
> (that can be, unreliable HTTP loaders, broken server implementations, 
> etc..) and make claims based on that, the only effect that you have is to 
> lose your point.
 
We seem to have moved far away from the original topic - I never built
any assumptions on top of kevent _performance_ - kevent is a logical
extrapolation of epoll. I only showed that the event driven model can be
fast and that it outperforms the threadlet one - after we changed topic we
were unable to actually test threadlets in a networking environment, since
the only test I ran showed that threadlets do not reschedule at all, and
Ingo's tests showed a small number of reschedulings.

So, I only said that kevent is superior to epoll because (and
it is the _main_ issue) of its ability to handle essentially any kind of
event with very small overhead (the same as epoll has in struct file -
a list and a spinlock) and without the significant price of binding a
struct file to each event.

I did not want and do not want to hurt anyone (even Ingo, although he is 
against kevent :), but my opinion is that the thread moved from a nice 
discussion about threads and events with jokes and fun into quite angry 
word throwing, and that is not too good - let's make it fun again.
I'm not a native English speaker (and do not use a dictionary), so it is 
quite possible that some of my phrases were not exactly nice, but it was 
unintentional (at least not very) :)

Peace?

> - Davide
> 

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3

2007-03-02 Thread Ingo Molnar

* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> > > > [...] The numbers are still highly suspect - and we are already 
> > > > down from the prior claim of kevent being almost twice as fast 
> > > > to a 25% difference.
> > > 
> > > Btw, there were never almost twice performance increase - epoll in 
> > > my tests always showed 4-5 thousands requests per second, kevent - 
> > > up to 7 thousands.
> > 
> > i'm referring to your claim in this mail of yours from 4 days ago 
> > for example:
> > 
> >    http://lkml.org/lkml/2007/2/25/116
> > 
> >    But note, that on my athlon64 3500 test machine kevent is about 7900
> >    requests per second compared to 4000+ epoll, so expect a challenge.
> > 
> > no matter how i look at it, but 7900 is 1.9 times 4000 - which is 
> > almost twice.
> 
> After your changes epoll increased to 5k.

Can we please stop this pointless episode of benchmarketing, where every 
mail of yours shows different results and you even deny having said 
something which you clearly said just a few days ago? At this point i 
simply cannot trust the numbers you are posting, nor is the discussion 
style you are following productive in any way in my opinion.

(you are never ever wrong, and if you are proven wrong on topic A you 
claim it is an irrelevant topic (without even admitting you were wrong 
about it) and you point to topic B claiming it's the /real/ topic you 
talked about all along. And along the way you are slandering other 
projects like epoll and threadlets, distorting the discussion. This kind 
of keep-the-ball-moving discussion style is effective in politics but 
IMO it's a waste of time when developing a kernel.)
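
For reference, the ratios behind the figures traded in this thread can be
checked with a few lines of C. The helper names below are made up for this
illustration; the inputs are only the numbers quoted in the mails (7900 vs
4000 requests per second, and 6468.95 vs 4794.23 req/sec).

```c
#include <stdio.h>

/* How many times faster `fast` is than `slow`. */
double speedup(double fast, double slow)
{
    return fast / slow;
}

/* The same comparison expressed as a percentage gain over `slow`. */
double percent_gain(double fast, double slow)
{
    return (fast / slow - 1.0) * 100.0;
}

/* Prints both comparisons for the figures quoted in the thread. */
void print_benchmark_ratios(void)
{
    /* "about 7900 requests per second compared to 4000+ epoll" */
    printf("7900 vs 4000:       %.3fx\n", speedup(7900.0, 4000.0));
    /* "epoll 4794.23 req/sec, kevent 6468.95 req/sec" */
    printf("6468.95 vs 4794.23: +%.1f%%\n", percent_gain(6468.95, 4794.23));
}
```

The first ratio is 1.975, which is the "almost twice" figure; the second
pair of numbers corresponds to a gain of roughly 35%.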

Ingo


Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3

2007-03-02 Thread Evgeniy Polyakov
On Fri, Mar 02, 2007 at 11:56:18AM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
 
> * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> 
> > Even if kevent has the same speed, it still allows to handle _any_ 
> > kind of events without any major surgery - a very tiny structure of 
> > lock and list head and you can process your own kernel event in 
> > userspace with timers, signals, io events, private userspace events 
> > and others without races and invention of different hacks for 
> > different types - _this_ is main point.
> 
> did it ever occur to you to ... extend epoll? To speed it up? To add a 
> new wait syscall to it? Instead of introducing a whole new parallel 
> framework?

Yes, I thought about extending it more than a year ago, before I started 
kevent, but epoll() is entirely based on the file structure and its 
file_operations with the poll method, so it is quite impossible to make 
it work with sockets to implement network AIO. Eventually kevent gathered 
a lot of other subsystems - do we really want to have per-process signalfs, 
timerfs and so on - each simple structure must be bound to a file, which 
becomes too costly.

>   Ingo

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3

2007-03-02 Thread Evgeniy Polyakov
On Fri, Mar 02, 2007 at 11:57:13AM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
 
> * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> 
> > > > > [...] The numbers are still highly suspect - and we are already 
> > > > > down from the prior claim of kevent being almost twice as fast 
> > > > > to a 25% difference.
> > > > 
> > > > Btw, there were never almost twice performance increase - epoll in 
> > > > my tests always showed 4-5 thousands requests per second, kevent - 
> > > > up to 7 thousands.
> > > 
> > > i'm referring to your claim in this mail of yours from 4 days ago 
> > > for example:
> > > 
> > >    http://lkml.org/lkml/2007/2/25/116
> > > 
> > >    But note, that on my athlon64 3500 test machine kevent is about 7900
> > >    requests per second compared to 4000+ epoll, so expect a challenge.
> > > 
> > > no matter how i look at it, but 7900 is 1.9 times 4000 - which is 
> > > almost twice.
> > 
> > After your changes epoll increased to 5k.
> 
> Can we please stop this pointless episode of benchmarketing, where every 
> mail of yours shows different results and you even deny having said 
> something which you clearly said just a few days ago? At this point i 
> simply cannot trust the numbers you are posting, nor is the discussion 
> style you are following productive in any way in my opinion.

I just show what I see in tests - I do not perform deep analysis of
that, since I do not see why it should be done - it is not fake, it is
not fantasy - it is real behaviour observed on my test machine, and if it
suddenly changes I will report it.
Btw, I showed cases when epoll behaved better than kevent and
performance was an unbeatable 9k requests per second - I do not know why
it happened - maybe some cache related issues, other processes all slept at
once, increased radiation or a strong wind blew away my bad aura - it is
not reproducible on demand either.

> (you are never ever wrong, and if you are proven wrong on topic A you 
> claim it is an irrelevant topic (without even admitting you were wrong 
> about it) and you point to topic B claiming it's the /real/ topic you 
> talked about all along. And along the way you are slandering other 
> projects like epoll and threadlets, distorting the discussion. This kind 
> of keep-the-ball-moving discussion style is effective in politics but 
> IMO it's a waste of time when developing a kernel.)

Heh - that is why I'm not subscribed to lkml@ - it too frequently ends
up with politics :)

What are we talking about - are we trying to insult each other with something
that was supposedly said after some assumption in a theoretical mental
exercise? I can only laugh at that :)

Ingo, I never ever tried to show that something is broken - that is
fantasy based on straight words, not on the real intention.

I never said epoll is broken. Absolutely.

I never said threadlet is broken. Absolutely.

I just showed that it is not (in my opinion) the right decision to use
threadlets for an IO based model instead of an event driven one - this is
not based on kevent performance (I _never_ stated it as a main factor -
kevent was only an example of the event driven model, you confused it with
kevent AIO, which is a different beast), but on experience with nptl
threads and linuxthreads, and the related rescheduling overhead compared to 
the userspace one.

I showed kevent as a possible usage scenario - since it does support its own
AIO. And you started to fight against it in every detail, since you
think kevent is not a good idea for handling the AIO model - well, that can
be perfectly correct. I showed kevent AIO (please do not think that kevent
and kevent AIO are the same - the latter is just one of the possible
users I implemented, it only uses kevent to deliver completion events to 
userspace) as a possible AIO implementation, but not _kevent_ itself.

But somehow we ended up with words attributed to me that I never said and
ideas I never based my assumptions on... I do not really think you even
remotely wanted to make any personal assumptions about what we had
discussed.

We even concluded that a perfect IO model should use both approaches to
really scale - both threadlets with their on-demand-only rescheduling, and
an event driven ring.
You stated your opinion on kevent - well, I can not agree with it, but
it is your right not to like something.

Let's not continue the bad practice of kicking each other just because there
were some problematic roots which no one even remembers correctly - let's
not make the mistake of making something personal out of trivial bits
- if you are in Russia or around any time soon I will happily buy you 
a beer or whatever you prefer :)

So, let's just draw a line:
kevent was shown to people, and its performance, although flaky, is a
bit faster than epoll. Threadlets bound to any event driven ring do not
show any performance degradation in a network driven setup with a small
number of reschedulings, with all the advantages of simpler programming.
So, repeating myself, both models (not kevent and threadlet, but event
driven and thread based) should be used to achieve the maximum

Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3

2007-03-02 Thread Davide Libenzi
On Fri, 2 Mar 2007, Evgeniy Polyakov wrote:

> On Thu, Mar 01, 2007 at 11:31:14AM -0800, Davide Libenzi 
> (davidel@xmailserver.org) wrote:
> > On Thu, 1 Mar 2007, Evgeniy Polyakov wrote:
> > 
> > > Ingo, do you really think I will send mails with faked benchmarks? :))
> > 
> > I don't think he ever implied that. He was only suggesting that when you 
> > post benchmarks, and even more when you make claims based on benchmarks, 
> > you need to be extra careful about what you measure. Otherwise the 
> > external view that you give to others does not look good.
> > Kevent can be really faster than epoll, but if you post broken benchmarks 
> > (that can be, unreliable HTTP loaders, broken server implementations, 
> > etc..) and make claims based on that, the only effect that you have is to 
> > lose your point.
> 
> So, I only talked that kevent is superior compared to epoll because (and
> it is _main_ issue) of its ability to handle essentially any kind of
> events with very small overhead (the same as epoll has in struct file -
> list and spinlock) and without significant price of struct file binding
> to event.

You've to excuse me if my memory is bad, but IIRC the whole discussion 
and loong benchmark feast was born with you throwing a benchmark at Ingo 
(with kevent showing a 1.9x performance boost WRT epoll), not with you 
making any other point.
As far as epoll not being able to handle other events. Said who? Of 
course, with zero modifications, you can handle zero additional events. 
With modifications, you can handle other events. But let's talk about those 
other events. The *only* kind of event that ppl (and being the epoll 
maintainer I tend to receive those requests) missed in epoll was AIO 
events. That's the *only* thing that was missed by real life application 
developers. And if something like threadlets/syslets proves effective, 
the gap is closed WRT that requirement.
Epoll already handles the whole class of pollable devices inside the 
kernel, and if you exclude block AIO, that's a pretty wide class already. 
The *existing* f_op->poll subsystem can be used to deliver events at the 
poll-head wakeup time (by using the key member of the poll callback), so 
that you don't even need the extra f_op->poll call to fetch events.
And if you really feel raw about the single O(nready) loop that epoll 
currently does, a new epoll_wait2 (or whatever) API could be used to 
deliver the event directly into a userspace buffer [1], directly from the 
poll callback, w/out extra delivery loops 
(IRQ/event->epoll_callback->event_buffer).


[1] From the epoll callback, we cannot sleep, so it's gonna be either an 
mlocked userspace buffer, or some kernel pages mapped to userspace.
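
As a point of reference for the existing interface discussed above, here is
a minimal C sketch of plain epoll on a pollable descriptor (a pipe). It
uses only the standard epoll_create(), epoll_ctl() and epoll_wait() calls;
the function name epoll_pipe_demo() is made up for this illustration, and
the epoll_wait2 idea from the mail is not modelled here.

```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Registers the read end of a pipe with epoll, makes it readable, and
 * returns the number of ready events reported (1 on success, -1 on error).
 */
int epoll_pipe_demo(void)
{
    int pipefd[2];
    struct epoll_event ev, ready;
    int epfd, n = -1;

    if (pipe(pipefd) < 0)
        return -1;

    epfd = epoll_create(1);     /* size is only a hint, but must be > 0 */
    if (epfd < 0)
        goto out_pipe;

    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0)
        goto out_epfd;

    /* Make the read end readable. */
    if (write(pipefd[1], "x", 1) != 1)
        goto out_epfd;

    /* Wait up to one second; only ready descriptors are returned. */
    n = epoll_wait(epfd, &ready, 1, 1000);
    if (n == 1 && ready.data.fd != pipefd[0])
        n = -1;

out_epfd:
    close(epfd);
out_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return n;
}
```

The loop over the returned array (here a single entry) is the O(nready)
delivery step mentioned above: its cost scales with the number of ready
descriptors, not with the number registered.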


- Davide




Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3

2007-03-02 Thread Davide Libenzi
On Fri, 2 Mar 2007, Evgeniy Polyakov wrote:

> do we really want to have per process signalfs, timerfs and so on - each 
> simple structure must be bound to a file, which becomes too cost.

I may be old school, but if you ask me, and if you *really* want those 
events, yes. Reason? Unix's everything-is-a-file rule, and being able to 
use them with *existing* POSIX poll/select. Remember, not every app 
requires huge scalability efforts, so working with simpler and familiar 
APIs is always welcome.
The *only* thing that was not practical to have as fd, was block requests. 
But maybe threadlets/syslets will handle those just fine, and close the gap.



- Davide




Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3

2007-03-02 Thread Davide Libenzi
On Fri, 2 Mar 2007, Ingo Molnar wrote:

> > After your changes epoll increased to 5k.
> 
> Can we please stop this pointless episode of benchmarketing, where every 
> mail of yours shows different results and you even deny having said 
> something which you clearly said just a few days ago? At this point i 
> simply cannot trust the numbers you are posting, nor is the discussion 
> style you are following productive in any way in my opinion.

Agreed. Can we focus on the topic here? We're still missing proper FPU 
context switch in the move_user_context(). In v6?


- Davide




Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3

2007-03-02 Thread Davide Libenzi
On Fri, 2 Mar 2007, Davide Libenzi wrote:

> And if you really feel raw about the single O(nready) loop that epoll 
> currently does, a new epoll_wait2 (or whatever) API could be used to 
> deliver the event directly into a userspace buffer [1], directly from the 
> poll callback, w/out extra delivery loops 
> (IRQ/event->epoll_callback->event_buffer).

And if you ever wonder where the epoll name came from, it came from the 
old /dev/epoll. The epoll predecessor, /dev/epoll, was adding plugs 
everywhere events were needed and was delivering those events in O(1) 
*directly* on a user visible (mmap'd) buffer, in a zero-copy fashion.
The old /dev/epoll was faster than the current epoll, but the latter was 
chosen because, despite being slightly slower, it had support for every 
pollable device, *without* adding more plugs into the existing code.
Performance and code maintenance are not to be taken disjointly whenever 
you evaluate a solution. That's the reason I got excited about this new 
generic AIO solution.



- Davide




Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3

2007-03-02 Thread Ingo Molnar

* Davide Libenzi <davidel@xmailserver.org> wrote:

> [...] We're still missing proper FPU context switch in the 
> move_user_context(). [...]

yeah - i'm starting to be of the opinion that the FPU context should 
stay with the threadlet, exclusively. I.e. when calling a threadlet, the 
'outer loop' (the event loop) should not leak FPU context into the 
threadlet and then expect it to be replicated from whatever random point 
the threadlet ended up sleeping at. It would be possible, but it just 
makes no sense. What makes most sense is to just keep the FPU context 
with the threadlet, and to let the 'new head' use an initial (unused) 
FPU context. And it's in fact the threadlet that will most likely have 
an active FPU context across a system call, not the outer loop. In other 
words: no special FPU support needed at all for threadlets (i.e. no 
flipping needed even) - this behavior just naturally happens in the 
current implementation. Hm?

Ingo


Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3

2007-03-02 Thread Davide Libenzi
On Fri, 2 Mar 2007, Ingo Molnar wrote:

 
> * Davide Libenzi <davidel@xmailserver.org> wrote:
> 
> > [...] We're still missing proper FPU context switch in the 
> > move_user_context(). [...]
> 
> yeah - i'm starting to be of the opinion that the FPU context should 
> stay with the threadlet, exclusively. I.e. when calling a threadlet, the 
> 'outer loop' (the event loop) should not leak FPU context into the 
> threadlet and then expect it to be replicated from whatever random point 
> the threadlet ended up sleeping at. It would be possible, but it just 
> makes no sense. What makes most sense is to just keep the FPU context 
> with the threadlet, and to let the 'new head' use an initial (unused) 
> FPU context. And it's in fact the threadlet that will most likely have 
> an active FPU context across a system call, not the outer loop. In other 
> words: no special FPU support needed at all for threadlets (i.e. no 
> flipping needed even) - this behavior just naturally happens in the 
> current implementation. Hm?

I think that the dirty FPU context must, at least, follow the new head. 
That's what the userspace sees, and you don't want an async_exec to 
re-emerge with a different FPU context.
I think it should also follow the async thread (old, going-to-sleep, 
thread), since a threadlet might have that dirtied, and as a consequence 
it'll want to find it back when it's re-scheduled.
So, IMO, if the USEDFPU bit is set, we need to sync the dirty FPU context 
with an early unlazy_fpu(), *and* copy the sync'd FPU context to the new head.
This should really be a fork of the dirty FPU context IMO, and should only 
happen if the USEDFPU bit is set.



- Davide




Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3

2007-03-02 Thread Ingo Molnar

* Davide Libenzi <davidel@xmailserver.org> wrote:

> I think that the dirty FPU context must, at least, follow the new 
> head. That's what the userspace sees, and you don't want an async_exec 
> to re-emerge with a different FPU context.

well. I think there's some confusion about terminology, so please let me 
describe everything in detail. This is how execution goes:

  outer loop() {
      call_threadlet();
  }

this all runs in the 'head' context. call_threadlet() always switches to 
the 'threadlet stack'. The 'outer context' runs in the 'head stack'. If, 
while executing the threadlet function, we block, then the 
threadlet-thread gets to keep the task (the threadlet stack and also the 
FPU), and blocks - and we pick a 'new head' from the thread pool and 
continue executing in that context - right after the call_threadlet() 
function, in the 'old' head's stack. I.e. it's as if we returned 
immediately from call_threadlet(), with a return code that signals that 
the 'threadlet went async'.

now, the FPU state that was when the threadlet blocked is totally 
meaningless to the 'new head' - that FPU state is from the middle of the 
threadlet execution.

and here is where thinking about threadlets as a function call and not 
as an asynchronous context helps a lot: the classic gcc convention for 
FPU use and function calls should apply: gcc does not call an external 
function with an in-use FPU stack/register, it always neatly unuses it, 
as no FPU register is callee-saved, all are caller-saved.

> So, IMO, if the USEDFPU bit is set, we need to sync the dirty FPU 
> context with an early unlazy_fpu(), *and* copy the sync'd FPU context 
> to the new head. This should really be a fork of the dirty FPU context 
> IMO, and should only happen if the USEDFPU bit is set.

why? The only effect this will have is a slowdown :) The FPU context 
from the middle of the threadlet function is totally meaningless to the 
'new head'. It might be anything. (although in practice system calls are 
almost never called with a truly in-use FPU.)

Ingo


Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3

2007-03-02 Thread Davide Libenzi
On Fri, 2 Mar 2007, Ingo Molnar wrote:

 
> * Davide Libenzi <davidel@xmailserver.org> wrote:
> 
> > I think that the dirty FPU context must, at least, follow the new 
> > head. That's what the userspace sees, and you don't want an async_exec 
> > to re-emerge with a different FPU context.
> 
> well. I think there's some confusion about terminology, so please let me 
> describe everything in detail. This is how execution goes:
> 
>   outer loop() {
>       call_threadlet();
>   }
> 
> this all runs in the 'head' context. call_threadlet() always switches to 
> the 'threadlet stack'. The 'outer context' runs in the 'head stack'. If, 
> while executing the threadlet function, we block, then the 
> threadlet-thread gets to keep the task (the threadlet stack and also the 
> FPU), and blocks - and we pick a 'new head' from the thread pool and 
> continue executing in that context - right after the call_threadlet() 
> function, in the 'old' head's stack. I.e. it's as if we returned 
> immediately from call_threadlet(), with a return code that signals that 
> the 'threadlet went async'.
> 
> now, the FPU state that was when the threadlet blocked is totally 
> meaningless to the 'new head' - that FPU state is from the middle of the 
> threadlet execution.

For threadlets, it might be. Now think about a task wanting to dispatch N 
parallel AIO requests as N independent syslets.
Think about this task having USEDFPU set, so the FPU context is dirty.
When it returns from async_exec, with one of the requests having gone to 
sleep, it needs to have the same FPU context it had when it entered, 
otherwise it probably won't be happy.
For the same reason a schedule() must preserve/sync the prev FPU 
context, to be reloaded at the next FPU fault.
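
The hand-off argued for here can be modelled in user space with a few lines of C. This is a sketch under stated assumptions, not kernel code: every model_ name is hypothetical, the FPU context is reduced to a control word plus eight registers, and model_unlazy_fpu() only imitates the behaviour assumed of unlazy_fpu() (write the live registers back if the USEDFPU flag is set, then clear the flag).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy FPU context: a control word plus an eight-entry register stack. */
struct fpu_ctx {
    unsigned short cw;
    double st[8];
};

/* The live hardware registers of one CPU. */
struct model_cpu {
    struct fpu_ctx live;
};

struct model_task {
    bool used_fpu;          /* models the USEDFPU flag (context is dirty) */
    struct fpu_ctx saved;   /* in-memory copy of the task's FPU context */
};

/* Sync a dirty context to memory; assumed behaviour of unlazy_fpu(). */
void model_unlazy_fpu(struct model_cpu *cpu, struct model_task *t)
{
    if (t->used_fpu) {
        t->saved = cpu->live;
        t->used_fpu = false;
    }
}

/* When a syslet blocks: fork the dirty context over to the new head. */
void model_fork_fpu(struct model_cpu *cpu, struct model_task *old_head,
                    struct model_task *new_head)
{
    assert(old_head != new_head);

    if (!old_head->used_fpu)
        return;                         /* clean context: nothing to copy */

    model_unlazy_fpu(cpu, old_head);    /* sync first ... */
    new_head->saved = old_head->saved;  /* ... then copy */
}

/* Next FPU use by a task: reload its saved context, as on an FPU fault. */
void model_fpu_fault(struct model_cpu *cpu, struct model_task *t)
{
    cpu->live = t->saved;
    t->used_fpu = true;
}
```

If the old head enters with a dirty context (say cw = 0x0f7f and st[0] = 3.5), then after model_fork_fpu() the new head reloads exactly those values at its next model_fpu_fault(), whatever ran on the CPU in between. Without the copy it would reload whatever happened to be in its own saved area.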




> > So, IMO, if the USEDFPU bit is set, we need to sync the dirty FPU 
> > context with an early unlazy_fpu(), *and* copy the sync'd FPU context 
> > to the new head. This should really be a fork of the dirty FPU context 
> > IMO, and should only happen if the USEDFPU bit is set.
> 
> why? The only effect this will have is a slowdown :) The FPU context 
> from the middle of the threadlet function is totally meaningless to the 
> 'new head'. It might be anything. (although in practice system calls are 
> almost never called with a truly in-use FPU.)

See above ;)



- Davide



