Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Wed, Mar 07, 2007 at 03:21:19PM -0300, Kirk Kuchov ([EMAIL PROTECTED]) wrote:
> On 3/7/07, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> > * Kirk Kuchov <[EMAIL PROTECTED]> wrote:
> > > I don't believe I'm wasting my time explaining this. They don't exist
> > > as /dev/null, they are just fucking _LINKS_.
> > [...]
> > > > Either stop flaming kernel developers or become one. It is that
> > > > simple.
> > >
> > > If I were to become a kernel developer I would stick with FreeBSD.
> > > [...]
> >
> > Hey, really, this is an excellent idea: what a boon you could become to
> > FreeBSD, again! How much they must be longing for your insightful
> > feedback, how much they must be missing your charming style and tactful
> > approach! I bet they'll want to print your mails out, frame them and
> > hang them over their fireplace, to remember the good old days on cold
> > snowy winter days, with warmth in their hearts! Please?
>
> http://www.totallytom.com/thecureforgayness.html

Fonts are a bit bad in my browser :)

Kirk, I understand your frustration - yes, Linux is not the perfect place
for startup ideas, and yes, it lacks some features that modern (or old)
systems have supported for years, but things change with time. I posted a
patch which allows polling for signals; it can be trivially adapted to
support timers and essentially any other events. Kevent did that too, but
some things are just too radical for immediate support, especially when
the majority of users do not require the additional functionality.

People do work, and a lot of them do really good work, so there is no need
for rude talk about how bad things are. Things change - even I support
that, although kevent ignorance should put me in the first line with you :)

Be good, and be cool.
> --
> Kirk Kuchov

--
	Evgeniy Polyakov

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On 3/7/07, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> * Kirk Kuchov <[EMAIL PROTECTED]> wrote:
> > I don't believe I'm wasting my time explaining this. They don't exist
> > as /dev/null, they are just fucking _LINKS_.
> [...]
> > > Either stop flaming kernel developers or become one. It is that
> > > simple.
> >
> > If I were to become a kernel developer I would stick with FreeBSD.
> > [...]
>
> Hey, really, this is an excellent idea: what a boon you could become to
> FreeBSD, again! How much they must be longing for your insightful
> feedback, how much they must be missing your charming style and tactful
> approach! I bet they'll want to print your mails out, frame them and
> hang them over their fireplace, to remember the good old days on cold
> snowy winter days, with warmth in their hearts! Please?

http://www.totallytom.com/thecureforgayness.html

--
Kirk Kuchov
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Wed, Mar 07 2007, Kirk Kuchov wrote:
> On 3/7/07, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> > * Kirk Kuchov <[EMAIL PROTECTED]> wrote:
> > > I don't believe I'm wasting my time explaining this. They don't exist
> > > as /dev/null, they are just fucking _LINKS_.
> > [...]
> > > > Either stop flaming kernel developers or become one. It is that
> > > > simple.
> > >
> > > If I were to become a kernel developer I would stick with FreeBSD.
> > > [...]
> >
> > Hey, really, this is an excellent idea: what a boon you could become to
> > FreeBSD, again! How much they must be longing for your insightful
> > feedback, how much they must be missing your charming style and tactful
> > approach! I bet they'll want to print your mails out, frame them and
> > hang them over their fireplace, to remember the good old days on cold
> > snowy winter days, with warmth in their hearts! Please?
>
> http://www.totallytom.com/thecureforgayness.html

Dude, get a life. But more importantly, go waste somebody else's time
instead of lkml's.

--
Jens Axboe, updating killfile
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Kirk Kuchov <[EMAIL PROTECTED]> wrote:
> I don't believe I'm wasting my time explaining this. They don't exist
> as /dev/null, they are just fucking _LINKS_.
[...]
> > Either stop flaming kernel developers or become one. It is that
> > simple.
>
> If I were to become a kernel developer I would stick with FreeBSD.
> [...]

Hey, really, this is an excellent idea: what a boon you could become to
FreeBSD, again! How much they must be longing for your insightful
feedback, how much they must be missing your charming style and tactful
approach! I bet they'll want to print your mails out, frame them and
hang them over their fireplace, to remember the good old days on cold
snowy winter days, with warmth in their hearts! Please?

	Ingo
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Wed, 7 Mar 2007, Kirk Kuchov wrote:
> I don't believe I'm wasting my time explaining this. They don't exist
> as /dev/null, they are just fucking _LINKS_. I could even "ln -s
> /proc/self/fd/0 sucker". A real /dev/stdout can/could even exist, but
> that's not the point!

Actually, one large reason for /proc/self/ existing is exactly /dev/stdin
and friends.

And yes, /proc/self looks like a link too, but that doesn't change the
fact that it's a very special file. No different from /dev/null or
friends.

		Linus
Re: Trading Places (was: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
On 3/7/07, Al Boldi <[EMAIL PROTECTED]> wrote:
> Kirk Kuchov wrote:
> > > Either stop flaming kernel developers or become one. It is that
> > > simple.
> >
> > If I were to become a kernel developer I would stick with FreeBSD. At
> > least they have kqueue for about seven years now.
>
> I have been playing with this thought for quite some time. The question
> is, can I just use FreeBSD as a drop-in kernel replacement for Linux, or
> do I have to leave all the GNU/Linux distributions behind as well?

http://www.debian.org/ports/kfreebsd-gnu/

--
Kirk Kuchov
Trading Places (was: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
Kirk Kuchov wrote:
> > Either stop flaming kernel developers or become one. It is that
> > simple.
>
> If I were to become a kernel developer I would stick with FreeBSD. At
> least they have kqueue for about seven years now.

I have been playing with this thought for quite some time. The question
is, can I just use FreeBSD as a drop-in kernel replacement for Linux, or
do I have to leave all the GNU/Linux distributions behind as well?

Thanks!

--
Al
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On 3/6/07, Pavel Machek <[EMAIL PROTECTED]> wrote:
> > > As for why common abstractions like file are a good thing, think
> > > about why having "/dev/null" is cleaner that having a special plug
> > > DEVNULL_FD fd value to be plugged everywhere,
> >
> > This is a stupid comparison. By your logic we should also have
> > /dev/stdin, /dev/stdout and /dev/stderr.
>
> Bzzt, wrong. We have them.
>
> [EMAIL PROTECTED]:~$ ls -al /dev/std*
> lrwxrwxrwx 1 root root 4 Nov 12  2003 /dev/stderr -> fd/2
> lrwxrwxrwx 1 root root 4 Nov 12  2003 /dev/stdin -> fd/0
> lrwxrwxrwx 1 root root 4 Nov 12  2003 /dev/stdout -> fd/1
> [EMAIL PROTECTED]:~$ ls -al /proc/self/fd
> total 0
> dr-x------ 2 pavel users  0 Mar  6 09:18 .
> dr-xr-xr-x 4 pavel users  0 Mar  6 09:18 ..
> lrwx------ 1 pavel users 64 Mar  6 09:18 0 -> /dev/ttyp2
> lrwx------ 1 pavel users 64 Mar  6 09:18 1 -> /dev/ttyp2
> lrwx------ 1 pavel users 64 Mar  6 09:18 2 -> /dev/ttyp2
> lr-x------ 1 pavel users 64 Mar  6 09:18 3 -> /proc/2299/fd
> [EMAIL PROTECTED]:~$

I don't believe I'm wasting my time explaining this. They don't exist
as /dev/null, they are just fucking _LINKS_. I could even "ln -s
/proc/self/fd/0 sucker". A real /dev/stdout can/could even exist, but
that's not the point!

It remains a stupid comparison because /dev/stdin/stderr/whatever "must"
be plugged, else how could a process write to stdout/stderr if it
couldn't open them? The way things are is not because it's cleaner to
have it as a file but because it's the only sane way. /dev/null is not a
must-have, it's mainly used for redirecting purposes. A
sys_nullify(fileno(stdout)) would rule out almost any use of /dev/null.

> > > But here the list could be almost endless.
> > > And please don't start the "they don't scale" or "they need heavy
> > > file binding" tossfeast. They scale as well as the interface that
> > > will receive them (poll, select, epoll). Heavy file binding what?
> > > 100 or so bytes for the struct file? How many signal/timer fds are
> > > you gonna have? Like 100K? Really moot argument when opposed to the
> > > benefit of being compatible with existing POSIX interfaces and
> > > being more Unix friendly.
> >
> > So why the HELL don't we have those yet? Why haven't you designed
> > epoll with those in mind? Why don't you back your claims with patches?
> > (I'm not a kernel developer.)
>
> Either stop flaming kernel developers or become one. It is that simple.

If I were to become a kernel developer I would stick with FreeBSD. At
least they have had kqueue for about seven years now.

--
Kirk Kuchov
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
> > As for why common abstractions like file are a good thing, think about
> > why having "/dev/null" is cleaner that having a special plug DEVNULL_FD
> > fd value to be plugged everywhere,
>
> This is a stupid comparison. By your logic we should also have /dev/stdin,
> /dev/stdout and /dev/stderr.

Bzzt, wrong. We have them.

[EMAIL PROTECTED]:~$ ls -al /dev/std*
lrwxrwxrwx 1 root root 4 Nov 12  2003 /dev/stderr -> fd/2
lrwxrwxrwx 1 root root 4 Nov 12  2003 /dev/stdin -> fd/0
lrwxrwxrwx 1 root root 4 Nov 12  2003 /dev/stdout -> fd/1
[EMAIL PROTECTED]:~$ ls -al /proc/self/fd
total 0
dr-x------ 2 pavel users  0 Mar  6 09:18 .
dr-xr-xr-x 4 pavel users  0 Mar  6 09:18 ..
lrwx------ 1 pavel users 64 Mar  6 09:18 0 -> /dev/ttyp2
lrwx------ 1 pavel users 64 Mar  6 09:18 1 -> /dev/ttyp2
lrwx------ 1 pavel users 64 Mar  6 09:18 2 -> /dev/ttyp2
lr-x------ 1 pavel users 64 Mar  6 09:18 3 -> /proc/2299/fd
[EMAIL PROTECTED]:~$

> > But here the list could be almost endless.
> > And please don't start the "they don't scale" or "they need heavy file
> > binding" tossfeast. They scale as well as the interface that will
> > receive them (poll, select, epoll). Heavy file binding what? 100 or so
> > bytes for the struct file? How many signal/timer fds are you gonna
> > have? Like 100K? Really moot argument when opposed to the benefit of
> > being compatible with existing POSIX interfaces and being more Unix
> > friendly.
>
> So why the HELL don't we have those yet? Why haven't you designed
> epoll with those in mind? Why don't you back your claims with patches?
> (I'm not a kernel developer.)

Either stop flaming kernel developers or become one. It is that simple.

									Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On 3/4/07, Kyle Moffett <[EMAIL PROTECTED]> wrote:
> Well, even this far into 2.6, Linus' patch from 2003 still (mostly)
> applies; the maintenance cost for this kind of code is virtually zilch.
> If it matters that much to you, clean it up and make it apply; add an
> alarmfd() syscall (another 100 lines of code at most?) and make a "read"
> return an architecture-independent siginfo-like structure and submit it
> for inclusion. Adding epoll() support for random objects is as simple
> as a 75-line object-filesystem and a 25-line syscall to return an FD to
> a new inode. Have fun! Go wild! Something this trivially simple could
> probably spend a week in -mm and go to Linus for 2.6.22.

Or, if you want to do slightly more work and produce something a great
deal more useful, you could implement additional netlink address families
for additional "event" sources. The socket -> setsockopt -> bind ->
sendmsg/recvmsg sequence is a well understood and well documented UNIX
paradigm for multiplexing non-blocking I/O to many destinations over one
socket. Everyone who has read Stevens is familiar with the basic UDP and
"fd open server" techniques, and if you look at Linux's IP_PKTINFO and
NETLINK_W1 (bravo, Evgeniy!) you'll see how easily they could be extended
to file AIO and other kinds of event sources.

For file AIO, you might have the application open one AIO socket per
mount point, open files indirectly via the SCM_RIGHTS mechanism, and
submit/retire read/write requests via sendmsg/recvmsg with ancillary
data consisting of an lseek64 tuple and a user-provided cookie. Although
the process still has to have one fd open per actual open file (because
trying to authenticate file accesses without opening fds is madness),
the only fds it has to manipulate directly are those representing entire
pools of outstanding requests. This is usually a small enough set that
select() will do just fine, if you're careful with fd allocation. (You
can simply punt indirectly opened fds up to a high numerical range,
where they can't be accessed directly from userspace but still make fine
cookies for use in lseek64 tuples within cmsg headers.)

The same basic approach will work for timers, signals, and just about
any other event source. Userspace is of course still stuck doing its own
state machines / thread scheduling / however you choose to think of it.
But all the important activity goes through socketcall(), and the data
and control parameters are all packaged up into a struct msghdr instead
of the bare buffer pointers of read/write. So if someone else does come
along later and design an ultralight threading mechanism that isn't a
total botch, the actual data paths won't need much rework; the exception
handling will just get a lot simpler.

Cheers,
- Michael
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
Kirk Kuchov wrote:
[snip]
> This is a stupid comparison. By your logic we should also have /dev/stdin,
> /dev/stdout and /dev/stderr.

Well, as a matter of fact (on my system):

# ls -l /dev/std*
lrwxrwxrwx 1 root root 4 Feb  1  2006 /dev/stderr -> fd/2
lrwxrwxrwx 1 root root 4 Feb  1  2006 /dev/stdin -> fd/0
lrwxrwxrwx 1 root root 4 Feb  1  2006 /dev/stdout -> fd/1

Please don't bother to respond to this mail, I just saw that you
apparently needed the info.

Magnus

P.S.: *PLONK*
Discussing LKML community [OT from the Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3]
> From: "Michael K. Edwards" <[EMAIL PROTECTED]>
> Newsgroups: gmane.linux.kernel
> Subject: Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
> Date: Wed, 28 Feb 2007 09:01:07 -0800

Michael,

[]

> In this instance, there didn't seem to be any harm in sending my
> thoughts to LKML as I wrote them, on the off chance that Ingo or
> Davide would get some value out of them in this design cycle (which
> any code I eventually get around to producing will miss). So far,
> I've gotten some rather dismissive pushback from Ingo and Alan (who
> seem to have no interest outside x86 and less understanding than I
> would have thought of what real userspace code looks like), a "why
> preach to people who know more than you do" from Davide,

This may be sad, unless you've spent the time and effort to make a patch,
i.e. read the source, understood why it's written the way it is, why it's
being used that way now, and why it has to be updated in a new cycle of
kernel development.

> a brief aside on the dominance of x86 from Oleg,

I didn't have a chance, and probably will not have one, to communicate
with people like you to learn from your wisdom personally. That's why I
replied to you, after you mentioned transputers. And I got a rather
different opinion than I expected. That shows my test-tube being, little
experience etc. As the discussion was about CPUs, it was technical, thus
on-topic for LKML.

> and one off-list "keep up the good work". Not a very rich harvest from
> (IMHO) pretty good seeds.

The off-list message was my share of views about things that were
off-topic, and a clarification about the lkml thing, so it wasn't
on-topic for LKML. I'm pretty sure there are libraries of books written
on every single bit of the things Linux currently *implements* in
asm/C. (1)

Thus, `return -ENOPATCH', man, regardless of what you are saying on lkml.
That's why the prominent people you've joined me with (: replied in
go-to-kernelnewbies style.

> In short, so far the "Linux kernel community" is upholding its
> reputation for insularity, arrogance, coding without prior design,
> lack of interest in userspace problems, and inability to learn from
> the mistakes of others. (None of these characterizations depends on
> there being any real insight in anything I have written.)

You, as a person who has the right to be personally wrong, may think that
way. But do not forget, as I wrote you off-list and in (1), that this is
a development community, sometimes a development-of-development one, etc;
educated, enthusiastic, wise, Open Source, poor on time (and money :).

> Happy hacking,
> - Michael

And you too. LKML *can* (sometimes may) show how useful this hacking is.

> P. S. I do think "threadlets" are brilliant, though, and reading
> Ingo's patches gave me a much better idea of what would be involved in
> prototyping Asynchronously Executed I/O Unit opcodes.

You are discussing an on-topic thing in the P.S., and this is IMHO the
wrong approach.

Also, note that I've changed the subject and stripped the cc list.
Please note that I can be a young and naive boy barking up the wrong
tree.

Kind regards.

--
-o--=O`C  /. .\
 #oo'L O     o
<___=E  M ^--  (Wuuf)
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Sun, 4 Mar 2007, Kirk Kuchov wrote: > I don't give a shit. Here's another good use of /dev/null: *PLONK* - Davide
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On 3/4/07, Davide Libenzi wrote:
> On Sun, 4 Mar 2007, Kirk Kuchov wrote:
> > On 3/3/07, Davide Libenzi wrote:
> > > Those *other* (tons?!?) interfaces can be created *when* the need comes
> > > (see Linus signalfd [1] example to show how urgent that was). *When*
> > > the need comes, they will work with existing POSIX interfaces, without
> > > requiring your own just-another event interface. Those other interfaces
> > > could also be more easily adopted by other Unix cousins, because of
> > > the fact that they rely on existing POSIX interfaces.
> >
> > Please stop with this crap, this chicken or the egg argument of yours
> > is utter BULLSHIT!
>
> Wow, wow, fella! You _definitely_ cannot afford rudeness here.

I don't give a shit.

> You started bad, and you end even worse, by listing some APIs that will
> work only with epoll. As I said already, and as it was listed in the
> thread I posted the link, something like:
>
>   int signalfd(...); // Linus initial interface would be perfectly fine
>   int timerfd(...);  // Open ...
>   int eventfd(...);  // [1]
>
> will work *even* with standard POSIX select/poll. 95% or more of the
> software does not have scalability issues, and select/poll are more
> portable and easy to use for simple stuff. On top of that, as I already
> said, they are *confined* interfaces that could be more easily adopted
> by other Unixes (if they are 100-200 lines on Linux, don't expect them
> to be a lot more on other Unixes) [2]. We *already* have the
> infrastructure inside Linux to deliver events (the f_op->poll subsystem),
> how about we use that instead of just-another way? [3]

Man you're so full of shit, your eyes are brown. NOBODY cares about select/poll or that the interfaces are going to be adopted by other Unixes. This issue has already been solved by them YEARS ago. What I want (and a ton of other users) is a SIMPLE and generic way to receive events from _MULTIPLE_ sources.
I don't care about kernel-level portability, easiness or whatever; the Linux kernel developers are good at not knowing what their users want.

> As for why common abstractions like file are a good thing, think about
> why having "/dev/null" is cleaner than having a special plug DEVNULL_FD
> fd value to be plugged everywhere,

This is a stupid comparison. By your logic we should also have /dev/stdin, /dev/stdout and /dev/stderr.

> or why I can use find/grep/cat/echo/... to look/edit my configuration
> inside /proc, instead of using a frigging registry editor.

Yet another stupid comparison, /proc is a MESS! Almost as bad as the registry. Linux now has three pieces of crap for configuration/information: /proc, sysfs and sysctl. Nobody knows exactly what should go into each one of those. Crap design at its best.

> But here the list could be almost endless. And please don't start the
> "they don't scale" or "they need heavy file binding" tossfeast. They
> scale as well as the interface that will receive them (poll, select,
> epoll). Heavy file binding what? 100 or so bytes for the struct file?
> How many signal/timer fds are you gonna have? Like 100K? Really moot
> argument when opposed to the benefit of being compatible with existing
> POSIX interfaces and being more Unix friendly.

So why the HELL don't we have those yet? Why haven't you designed epoll with those in mind? Why don't you back your claims with patches? (I'm not a kernel developer.)

> As for the AIO stuff, if threadlets/syslets prove effective, you can
> host an epoll_wait over a syslet/threadlet. Or, if the 3 lines of
> userspace code needed to do that fall inside your definition of
> "kludge", we can even find a way to bridge the two.

I don't care about threadlets in this context, I just want to wait for EVENTS from MULTIPLE sources WITHOUT mixing signals and other crap. Your arrogance is amusing; stop pushing narrow-minded beliefs down the throats of all Linux users.
Kqueue, event ports, WaitForMultipleObjects, epoll with multiple sources. That's what users want, not yet another syscall/whatever hack.

> Now, how about we focus on the topic of this thread?
>
> [1] This could be an idea. People already use pipes for this, but pipes
> have some memory overhead inside the kernel (plus use two fds) that
> could, if really felt necessary, be avoided.

Yet another hack!! 64 KiB of space just to push some user events around. Great idea!

> [2] This is how those kinds of interfaces should be designed. Modular,
> re-usable, file-based interfaces, whose acceptance is not linked into
> slurping-in a whole new interface with tens of sub-, interface-only
> objects. And from this POV, epoll is the friendlier.

Who said I want yet another interface? I just fucking want to receive events from MULTIPLE sources through epoll. With or without an fd! My anger and frustration is that we can't get past this SIMPLE need!

> [3] Notice the similarity between threadlets/syslets and epoll? They
> enable pretty darn good scalability, with *existing* infrastructure,
> and w/out special ad-hoc code to be plugged everywhere. This translates
> directly into easier-to-maintain code.
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Sun, 4 Mar 2007, Kirk Kuchov wrote:
> On 3/3/07, Davide Libenzi wrote:
> > Those *other* (tons?!?) interfaces can be created *when* the need comes
> > (see Linus signalfd [1] example to show how urgent that was). *When*
> > the need comes, they will work with existing POSIX interfaces, without
> > requiring your own just-another event interface. Those other interfaces
> > could also be more easily adopted by other Unix cousins, because of
> > the fact that they rely on existing POSIX interfaces.
>
> Please stop with this crap, this chicken or the egg argument of yours
> is utter BULLSHIT!

Wow, wow, fella! You _definitely_ cannot afford rudeness here. You started bad, and you end even worse, by listing some APIs that will work only with epoll. As I said already, and as it was listed in the thread I posted the link, something like:

  int signalfd(...); // Linus initial interface would be perfectly fine
  int timerfd(...);  // Open ...
  int eventfd(...);  // [1]

will work *even* with standard POSIX select/poll. 95% or more of the software does not have scalability issues, and select/poll are more portable and easy to use for simple stuff. On top of that, as I already said, they are *confined* interfaces that could be more easily adopted by other Unixes (if they are 100-200 lines on Linux, don't expect them to be a lot more on other Unixes) [2]. We *already* have the infrastructure inside Linux to deliver events (the f_op->poll subsystem), how about we use that instead of just-another way? [3]

As for why common abstractions like file are a good thing, think about why having "/dev/null" is cleaner than having a special plug DEVNULL_FD fd value to be plugged everywhere, or why I can use find/grep/cat/echo/... to look/edit my configuration inside /proc, instead of using a frigging registry editor. But here the list could be almost endless. And please don't start the "they don't scale" or "they need heavy file binding" tossfeast.
They scale as well as the interface that will receive them (poll, select, epoll). Heavy file binding what? 100 or so bytes for the struct file? How many signal/timer fds are you gonna have? Like 100K? Really moot argument when opposed to the benefit of being compatible with existing POSIX interfaces and being more Unix friendly.

As for the AIO stuff, if threadlets/syslets prove effective, you can host an epoll_wait over a syslet/threadlet. Or, if the 3 lines of userspace code needed to do that fall inside your definition of "kludge", we can even find a way to bridge the two.

Now, how about we focus on the topic of this thread?

[1] This could be an idea. People already use pipes for this, but pipes have some memory overhead inside the kernel (plus use two fds) that could, if really felt necessary, be avoided.

[2] This is how those kinds of interfaces should be designed. Modular, re-usable, file-based interfaces, whose acceptance is not linked into slurping-in a whole new interface with tens of sub-, interface-only objects. And from this POV, epoll is the friendlier.

[3] Notice the similarity between threadlets/syslets and epoll? They enable pretty darn good scalability, with *existing* infrastructure, and w/out special ad-hoc code to be plugged everywhere. This translates directly into easier-to-maintain code.

- Davide
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Mar 04, 2007, at 11:23:37, Kirk Kuchov wrote:
> So here we are, 2007. epoll() works with files, pipes, sockets, inotify
> and anything pollable (file descriptors) but not aio, timers, signals
> or user-defined events. Can we please get those working with epoll?
> Something as simple as: [code snipped] Would this be acceptable? Can we
> finally move on?

Well, even this far into 2.6, Linus' patch from 2003 still (mostly) applies; the maintenance cost for this kind of code is virtually zilch. If it matters that much to you, clean it up and make it apply; add an alarmfd() syscall (another 100 lines of code at most?), make a "read" return an architecture-independent siginfo-like structure, and submit it for inclusion. Adding epoll() support for random objects is as simple as a 75-line object-filesystem and a 25-line syscall to return an FD to a new inode. Have fun! Go wild! Something this trivially simple could probably spend a week in -mm and go to Linus for 2.6.22.

Cheers, Kyle Moffett
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On 3/3/07, Davide Libenzi wrote:
> Those *other* (tons?!?) interfaces can be created *when* the need comes
> (see Linus signalfd [1] example to show how urgent that was). *When*
> the need comes, they will work with existing POSIX interfaces, without
> requiring your own just-another event interface. Those other interfaces
> could also be more easily adopted by other Unix cousins, because of
> the fact that they rely on existing POSIX interfaces.

Please stop with this crap, this chicken or the egg argument of yours is utter BULLSHIT! Just because Linux doesn't have a decent kernel event notification mechanism it does not mean that users don't need one. Nobody cared about Linus's signalfd because it wasn't mainline. Look at any of the event notification libraries out there; it makes me sick how much kludge they have to go thru to get near the same functionality of kqueue on Linux. Solaris has had the Event Ports mechanism since 2003. FreeBSD, NetBSD, OpenBSD and Mac OS X have supported kqueue since around 2000. Windows has had event notification for ages now. These _facilities_ are all widely used, given the platforms' popularity.

So here we are, 2007. epoll() works with files, pipes, sockets, inotify and anything pollable (file descriptors) but not aio, timers, signals or user-defined events. Can we please get those working with epoll? Something as simple as:

  struct epoll_event ev;
  ev.events = EV_TIMER | EPOLLONESHOT;
  ev.data.u64 = 1000; /* timeout */
  epoll_ctl(epfd, EPOLL_CTL_ADD, 0 /* ignored */, &ev);

or

  struct sigevent ev;
  ev.sigev_notify = SIGEV_EPOLL;
  ev.sigev_signo = epfd;
  ev.sigev_value.sival_ptr = &ev;
  timer_create(CLOCK_MONOTONIC, &ev, &timerid);

AIO:

  struct sigevent ev;
  int fd = io_setup(..); /* oh boy, I wish... but it works */
  ev.events = EV_AIO | EPOLLONESHOT;
  /* event.data.ptr returns pointer to the iocb */
  epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);

or

  struct iocb iocb;
  iocb.aio_fildes = fileno(stdin);
  iocb.aio_lio_opcode = IO_CMD_PREAD;
  iocb.c.notify = IO_NOTIFY_EPOLL; /* __pad3/4 */

Would this be acceptable?
Can we finally move on? -- Kirk Kuchov
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
Please don't take this the wrong way, Ray, but I don't think _you_ understand the problem space that people are (or should be) trying to address here. Servers want to always, always block. Not on a socket, not on a stat, not on any _one_ thing, but in a condition where the optimum number of concurrent I/O requests are outstanding (generally of several kinds with widely varying expected latencies).

I have an embedded server I wrote that avoids forking internally for any reason, although it watches the damn serial port signals in parallel with handling network I/O, audio, and child processes that handle VoIP signaling protocols (which are separate processes because it was more practical to write them in a different language with mediocre embeddability). There's a lot of things that can block out there, not just disk I/O, but the only thing a genuinely scalable server process ever blocks on (apart from the odd spinlock) is a wait-for-IO-from-somewhere mechanism like select or epoll or kqueue (or even sleep() while awaiting SIGRT+n, or if it genuinely doesn't suck, the thread scheduler).

Furthermore, not only do servers want to block rather than shove more I/O into the plumbing than it can handle without backing up, they also want to throttle the concurrency of requests at the kernel level *for the kernel's benefit*. In particular, a server wants to submit to the kernel a ton of stats and I/O in parallel, far more than it makes sense to actually issue concurrently, so that efficient sequencing of these requests can be left to the kernel. But the server wants to guide the kernel with regard to the ratios of concurrency appropriate to the various classes and the relative urgency of the individual requests within each class. The server also wants to be able to reprioritize groups of requests or cancel them altogether based on new information about hardware status and user behavior.
Finally, the biggest argument against syslets/threadlets AFAICS is that -- if done incorrectly, as currently proposed -- they would unify the AIO and normal IO paths in the kernel. This would shackle AIO to the current semantics of synchronous syscalls, in which buffers are passed as bare pointers and exceptional results are tangled up with programming errors. This would, in turn, make it quite impossible for future hardware to pipeline and speculatively execute chains of AIO operations, leaving "syslets" to a few RDBMS programmers with time to burn. The unimproved ease of long term maintenance on the kernel (not to mention the complete failure to make the writing of _correct_, performant server code any easier) makes them unworthy of consideration for inclusion.

So, while everybody has been talking about cached and non-cached cases, those are really total irrelevancies. The principal problem that needs solving is to model the process's pool of in-flight I/O requests, together with a much larger number of submitted but not yet issued requests whose results are foreseeably likely to be needed soon, using a data structure that efficiently supports _all_ of the operations needed, including bulk cancellation, reprioritization, and batch migration based on affinities among requests and locality to the correct I/O resources. Memory footprint and gentle-on-real-hardware scheduling are secondary, but also important, considerations. If you happen to be able to service certain things directly from cache, that's gravy -- but it's not very smart IMHO to put that central to your design process.

Cheers, - Michael
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
Please don't take this the wrong way, Ray, but I don't think _you_ understand the problem space that people are (or should be) trying to address here. Servers want to always, always block. Not on a socket, not on a stat, not on any _one_ thing, but in a condition where the optimum number of concurrent I/O requests are outstanding (generally of several kinds with widely varying expected latencies). I have an embedded server I wrote that avoids forking internally for any reason, although it watches the damn serial port signals in parallel with handling network I/O, audio, and child processes that handle VoIP signaling protocols (which are separate processes because it was more practical to write them in a different language with mediocre embeddability). There's a lot of things that can block out there, not just disk I/O, but the only thing a genuinely scalable server process ever blocks on (apart from the odd spinlock) is a wait-for-IO-from-somewhere mechanism like select or epoll or kqueue (or even sleep() while awaiting SIGRT+n, or if it genuinely doesn't suck, the thread scheduler). Furthermore, not only do servers want to block rather than shove more I/O into the plumbing than it can handle without backing up, they also want to throttle the concurrency of requests at the kernel level *for the kernel's benefit*. In particular, a server wants to submit to the kernel a ton of stats and I/O in parallel, far more than it makes sense to actually issue concurrently, so that efficient sequencing of these requests can be left to the kernel. But the server wants to guide the kernel with regard to the ratios of concurrency appropriate to the various classes and the relative urgency of the individual requests within each class. The server also wants to be able to reprioritize groups of requests or cancel them altogether based on new information about hardware status and user behavior. 
Finally, the biggest argument against syslets/threadlets AFAICS is that -- if done incorrectly, as currently proposed -- they would unify the AIO and normal IO paths in the kernel. This would shackle AIO to the current semantics of synchronous syscalls, in which buffers are passed as bare pointers and exceptional results are tangled up with programming errors. This would, in turn, make it quite impossible for future hardware to pipeline and speculatively execute chains of AIO operations, leaving syslets to a few RDBMS programmers with time to burn. The unimproved ease of long term maintenance on the kernel (not to mention the complete failure to make the writing of _correct_, performant server code any easier) makes them unworthy of consideration for inclusion. So, while everybody has been talking about cached and non-cached cases, those are really total irrelevancies. The principal problem that needs solving is to model the process's pool of in-flight I/O requests, together with a much larger number of submitted but not yet issued requests whose results are foreseeably likely to be needed soon, using a data structure that efficiently supports _all_ of the operations needed, including bulk cancellation, reprioritization, and batch migration based on affinities among requests and locality to the correct I/O resources. Memory footprint and gentle-on-real-hardware scheduling are secondary, but also important, considerations. If you happen to be able to service certain things directly from cache, that's gravy -- but it's not very smart IMHO to put that central to your design process. Cheers, - Michael - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
On 3/3/07, Davide Libenzi davidel@xmailserver.org wrote: snip Those *other* (tons?!?) interfaces can be created *when* the need comes (see Linus signalfd [1] example to show how urgent that was). *When* the need comes, they will work with existing POSIX interfaces, without requiring your own just-another event interface. Those other interfaces could also be more easily adopted by other Unix cousins, because of the fact that they rely on existing POSIX interfaces. Please stop with this crap, this chicken or the egg argument of yours is utter BULLSHIT! Just because Linux doesn't have a decent kernel event notification mechanism it does not mean that users don't need. Nobody cared about Linus's signalfd because it wasn't mainline. Look at any event notification libraries out there, it makes me sick how much kludge they have to go thru to get near the same functionality of kqueue on Linux. Solaris has the Event Ports mechanism since 2003. FreeBSD, NetBSD, OpenBSD and Mac OS X support kqueue since around 2000. Windows has had event notification for ages now. These _facilities_ are all widely used, given the platforms popularity. So here we are, 2007. epoll() works with files, pipes, sockets, inotify and anything pollable (file descriptors) but aio, timers, signals and user-defined event. Can we please get those working with epoll ? Something as simple as: struct epoll_event ev; ev.events = EV_TIMER | EPOLLONESHOT; ev.data.u64 = 1000; /* timeout */ epoll_ctl(epfd, EPOLL_CTL_ADD, 0 /* ignored */, ev); or struct sigevent ev; ev.sigev_notify = SIGEV_EPOLL; ev.sigev_signo = epfd; ev.sigev_value = ev; timer_create(CLOCK_MONOTONIC, ev, timerid); AIO: struct sigevent ev; int fd = io_setup(..); /* oh boy, I wish... 
but it works */ ev.events = EV_AIO | EPOLLONESHOT; /* event.data.ptr returns pointer to the iocb */ epoll_ctl(epfd, EPOLL_CTL_ADD, fd, ev); or struct iocb iocb; iocb.aio_fildes = fileno(stdin); iocb.aio_lio_opcode = IO_CMD_PREAD; iocb.c.notify = IO_NOTIFY_EPOLL; /* __pad3/4 */ Would this be acceptable? Can we finally move on? -- Kirk Kuchov - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
On Mar 04, 2007, at 11:23:37, Kirk Kuchov wrote: So here we are, 2007. epoll() works with files, pipes, sockets, inotify and anything pollable (file descriptors) but aio, timers, signals and user-defined event. Can we please get those working with epoll ? Something as simple as: [code snipped] Would this be acceptable? Can we finally move on? Well, even this far into 2.6, Linus' patch from 2003 still (mostly) applies; the maintenance cost for this kind of code is virtually zilch. If it matters that much to you clean it up and make it apply; add an alarmfd() syscall (another 100 lines of code at most?) and make a read return an architecture-independent siginfo-like structure and submit it for inclusion. Adding epoll() support for random objects is as simple as a 75-line object-filesystem and a 25- line syscall to return an FD to a new inode. Have fun! Go wild! Something this trivially simple could probably spend a week in -mm and go to linus for 2.6.22. Cheers, Kyle Moffett - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
On Sun, 4 Mar 2007, Kirk Kuchov wrote: On 3/3/07, Davide Libenzi davidel@xmailserver.org wrote: snip Those *other* (tons?!?) interfaces can be created *when* the need comes (see Linus signalfd [1] example to show how urgent that was). *When* the need comes, they will work with existing POSIX interfaces, without requiring your own just-another event interface. Those other interfaces could also be more easily adopted by other Unix cousins, because of the fact that they rely on existing POSIX interfaces. Please stop with this crap, this chicken or the egg argument of yours is utter BULLSHIT! Wow, wow, fella! You _deinitely_ cannot afford rudeness here. You started bad, and you end even worse. By listing a some APIs that will work only with epoll. As I said already, and as it was listed in the thread I posted the link, something like: int signalfd(...); // Linus initial interface would be perfectly fine int timerfd(...); // Open ... int eventfd(...); // [1] Will work *even* with standard POSIX select/poll. 95% or more of the software does not have scalability issues, and select/poll are more portable and easy to use for simple stuff. On top of that, as I already said, they are *confined* interfaces that could be more easily adopted by other Unixes (if they are 100-200 lines on Linux, don't expect them to be a lot more on other Unixes) [2]. We *already* have the infrastructure inside Linux to deliver events (f_op-poll subsystem), how about we use that instead of just-another way? [3] As for why common abstractions like file are a good thing, think about why having /dev/null is cleaner that having a special plug DEVNULL_FD fd value to be plugged everywhere, or why I can use find/grep/cat/echo/... to look/edit at my configuration inside /proc, instead of using a frigging registry editor. But here the list could be almost endless. And please don't start the, they don't scale or they need heavy file binding tossfeast. 
They scale as well as the interface that will receive them (poll, select, epoll). Heavy file binding what? 100 or so bytes for the struct file? How many signal/timer fd are you gonna have? Like 100K? Really moot argument when opposed to the benefit of being compatible with existing POSIX interfaces and being more Unix friendly. As for the AIO stuff, if threadlets/syslets will prove effective, you can host an epoll_wait over a syslet/threadlet. Or, if the 3 lines of userspace code needed to do that, fall inside your definition of kludge, we can even find a way to bridge the two. Now, how about we focus on the topic of this thread? [1] This could be an idea. People already uses pipes for this, but pipes has some memory overhead inside the kernel (plus use two fds) that could, if really felt necessary, be avoided. [2] This is how those kind of interfaces should be designed. Modular, re-usable, file-based interfaces, whose acceptance is not linked into slurping-in a whole new interface with tenths of sub, interface-only, objects. And from this POV, epoll is the friendlier. [3] Notice the similarity between threadlets/syslets and epoll? They enable pretty darn good scalability, with *existing* infrastructure, and w/out special ad-hoc code to be plugged everywhere. This translate directly in easier to maintain code. - Davide - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
On 3/4/07, Davide Libenzi davidel@xmailserver.org wrote: On Sun, 4 Mar 2007, Kirk Kuchov wrote: On 3/3/07, Davide Libenzi davidel@xmailserver.org wrote: snip Those *other* (tons?!?) interfaces can be created *when* the need comes (see Linus signalfd [1] example to show how urgent that was). *When* the need comes, they will work with existing POSIX interfaces, without requiring your own just-another event interface. Those other interfaces could also be more easily adopted by other Unix cousins, because of the fact that they rely on existing POSIX interfaces. Please stop with this crap, this chicken or the egg argument of yours is utter BULLSHIT! Wow, wow, fella! You _deinitely_ cannot afford rudeness here. I don't give a shit. You started bad, and you end even worse. By listing a some APIs that will work only with epoll. As I said already, and as it was listed in the thread I posted the link, something like: int signalfd(...); // Linus initial interface would be perfectly fine int timerfd(...); // Open ... int eventfd(...); // [1] Will work *even* with standard POSIX select/poll. 95% or more of the software does not have scalability issues, and select/poll are more portable and easy to use for simple stuff. On top of that, as I already said, they are *confined* interfaces that could be more easily adopted by other Unixes (if they are 100-200 lines on Linux, don't expect them to be a lot more on other Unixes) [2]. We *already* have the infrastructure inside Linux to deliver events (f_op-poll subsystem), how about we use that instead of just-another way? [3] Man you're so full of shit, your eyes are brown. NOBODY cares about select/poll or that the interfaces are going to be adopted by other Unixes. This issue has already been solved by then YEARS ago. What I want (and a ton of other users) is a SIMPLE and generic way to receive events from _MULTIPLE_multiple sources. 
I don't care about kernel-level portability, easiness or whatever, the linux kernel developers are good at not knowing what their users want. As for why common abstractions like file are a good thing, think about why having /dev/null is cleaner that having a special plug DEVNULL_FD fd value to be plugged everywhere, This is a stupid comparaison. By your logic we should also have /dev/stdin, /dev/stdout and /dev/stderr. or why I can use find/grep/cat/echo/... to look/edit at my configuration inside /proc, instead of using a frigging registry editor. Yet another stupid comparaison, /proc is a MESS! Almost as worse as the registry. Linux now has three pieces of crap for configuration/information: /proc, sysfs and sysctl. Nobody knows exactly what should go into each one of those. Crap design at it's best. But here the list could be almost endless. And please don't start the, they don't scale or they need heavy file binding tossfeast. They scale as well as the interface that will receive them (poll, select, epoll). Heavy file binding what? 100 or so bytes for the struct file? How many signal/timer fd are you gonna have? Like 100K? Really moot argument when opposed to the benefit of being compatible with existing POSIX interfaces and being more Unix friendly. So why the HELL don't we have those yet? Why haven't you designed epoll with those in mind? Why don't you back your claims with patches? (I'm not a kernel developer.) As for the AIO stuff, if threadlets/syslets will prove effective, you can host an epoll_wait over a syslet/threadlet. Or, if the 3 lines of userspace code needed to do that, fall inside your definition of kludge, we can even find a way to bridge the two. I don't care about threadlets in this context, I just want to wait for EVENTS from MULTIPLE sources WITHOUT mixing signals and other crap. Your arrogance is amusing, stop pushing narrow-minded beliefs down the throats of all Linux users. 
Kqueue, event ports, WaitForMultipleObjects, epoll with multiple sources. That's what users want, not yet another syscall/whatever hack. Now, how about we focus on the topic of this thread? [1] This could be an idea. People already uses pipes for this, but pipes has some memory overhead inside the kernel (plus use two fds) that could, if really felt necessary, be avoided. Yet another hack!! 64kiB of space just to push some user events around. Great idea! [2] This is how those kind of interfaces should be designed. Modular, re-usable, file-based interfaces, whose acceptance is not linked into slurping-in a whole new interface with tenths of sub, interface-only, objects. And from this POV, epoll is the friendlier. Who said I want yet another interface? I just fucking want to receive events from MULTIPLE sources through epoll. With or without a fd! My anger and frustration is that we can get past this SIMPLE need! [3] Notice the similarity between threadlets/syslets and epoll? They enable pretty darn good scalability, with *existing* infrastructure, and
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
On Sun, 4 Mar 2007, Kirk Kuchov wrote:
> I don't give a shit.

Here's another good use of /dev/null: *PLONK*

- Davide

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Discussing LKML community [OT from the Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3]
From: Michael K. Edwards [EMAIL PROTECTED] Newsgroups: gmane.linux.kernel Subject: Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3 Date: Wed, 28 Feb 2007 09:01:07 -0800

Michael, []

> In this instance, there didn't seem to be any harm in sending my
> thoughts to LKML as I wrote them, on the off chance that Ingo or
> Davide would get some value out of them in this design cycle (which
> any code I eventually get around to producing will miss). So far, I've
> gotten some rather dismissive pushback from Ingo and Alan (who seem to
> have no interest outside x86 and less understanding than I would have
> thought of what real userspace code looks like), a "why preach to
> people who know more than you do" from Davide,

This may be sad, unless you've spent the time and effort to make a patch, i.e. read the source, understood why it's written so, why it's being used that way now, and why it has to be updated in the new cycle of kernel development.

> a brief aside on the dominance of x86 from Oleg,

I didn't have a chance, and probably will not have one, to communicate with people like you to learn from your wisdom personally. That's why I replied to you, after you mentioned transputers. And I got a rather different opinion than I expected. That shows my test-tube being, little experience, etc. As the discussion was about CPUs, it was technical, thus on-topic for LKML.

> and one off-list "keep up the good work." Not a very rich harvest from
> (IMHO) pretty good seeds.

The off-list message was my share of view about things that were off-topic, and clarification about the lkml thing, and it wasn't on-topic for LKML. I'm pretty sure that there are libraries of books written on every single bit of the things Linux currently *implements* in asm/C. (1) Thus, `return -ENOPATCH', man, regardless of what you are saying on lkml. That's why the prominent people you've joined me with (: replied in go-to-kernelnewbies style.
> In short, so far the Linux kernel community is upholding its
> reputation for insularity, arrogance, coding without prior design,
> lack of interest in userspace problems, and inability to learn from
> the mistakes of others. (None of these characterizations depends on
> there being any real insight in anything I have written.)

You, as a person who has the right to be personally wrong, may think that way. But do not forget, as I wrote you off-list and in (1), this is a development community, sometimes a development-of-development one, etc.; educated, enthusiastic, wise, Open Source, poor on time (and money :).

> Happy hacking,
> - Michael

And you too. LKML *can* (sometimes may) show how useful this hacking is.

> P.S. I do think threadlets are brilliant, though, and reading Ingo's
> patches gave me a much better idea of what would be involved in
> prototyping Asynchronously Executed I/O Unit opcodes.

You are discussing an on-topic thing in the P.S., and this is IMHO the wrong approach. Also note that I've changed the subject and stripped the cc list; please note that I can be a young and naive boy barking up the wrong tree.

Kind regards.
--
-o--=O`C /. .\ #oo'L O o ___=E M^-- (Wuuf)
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
Kirk Kuchov wrote:
[snip]
> This is a stupid comparison. By your logic we should also have
> /dev/stdin, /dev/stdout and /dev/stderr.

Well, as a matter of fact (on my system):

# ls -l /dev/std*
lrwxrwxrwx 1 root root 4 Feb 1 2006 /dev/stderr -> fd/2
lrwxrwxrwx 1 root root 4 Feb 1 2006 /dev/stdin -> fd/0
lrwxrwxrwx 1 root root 4 Feb 1 2006 /dev/stdout -> fd/1

Please don't bother to respond to this mail, I just saw that you apparently needed the info.

Magnus

P.S.: *PLONK*
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
On 3/4/07, Kyle Moffett [EMAIL PROTECTED] wrote:
> Well, even this far into 2.6, Linus' patch from 2003 still (mostly)
> applies; the maintenance cost for this kind of code is virtually
> zilch. If it matters that much to you, clean it up and make it apply;
> add an alarmfd() syscall (another 100 lines of code at most?), make a
> read return an architecture-independent siginfo-like structure, and
> submit it for inclusion. Adding epoll() support for random objects is
> as simple as a 75-line object-filesystem and a 25-line syscall to
> return an FD to a new inode. Have fun! Go wild! Something this
> trivially simple could probably spend a week in -mm and go to Linus
> for 2.6.22.

Or, if you want to do slightly more work and produce something a great deal more useful, you could implement additional netlink address families for additional event sources. The socket -> setsockopt -> bind -> sendmsg/recvmsg sequence is a well understood and well documented UNIX paradigm for multiplexing non-blocking I/O to many destinations over one socket. Everyone who has read Stevens is familiar with the basic UDP and fd open server techniques, and if you look at Linux's IP_PKTINFO and NETLINK_W1 (bravo, Evgeniy!) you'll see how easily they could be extended to file AIO and other kinds of event sources.

For file AIO, you might have the application open one AIO socket per mount point, open files indirectly via the SCM_RIGHTS mechanism, and submit/retire read/write requests via sendmsg/recvmsg with ancillary data consisting of an lseek64 tuple and a user-provided cookie. Although the process still has to have one fd open per actual open file (because trying to authenticate file accesses without opening fds is madness), the only fds it has to manipulate directly are those representing entire pools of outstanding requests. This is usually a small enough set that select() will do just fine, if you're careful with fd allocation.
(You can simply punt indirectly opened fds up to a high numerical range, where they can't be accessed directly from userspace but still make fine cookies for use in lseek64 tuples within cmsg headers.) The same basic approach will work for timers, signals, and just about any other event source.

Userspace is of course still stuck doing its own state machines / thread scheduling / however you choose to think of it. But all the important activity goes through socketcall(), and the data and control parameters are all packaged up into a struct msghdr instead of the bare buffer pointers of read/write. So if someone else does come along later and design an ultralight threading mechanism that isn't a total botch, the actual data paths won't need much rework; the exception handling will just get a lot simpler.

Cheers,
- Michael
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
Ihar `Philips` Filipau wrote:
> On 3/3/07, Ray Lee <[EMAIL PROTECTED]> wrote:
>> On 3/3/07, Ihar `Philips` Filipau <[EMAIL PROTECTED]> wrote:
>> > What I'm trying to get to: keep things simple. The proposed
>> > optimization by Ingo does nothing else but allow AIO to probe the
>> > file cache - if the data is there, go with the fast path. So why
>> > not implement what the people want - probing of the cache? Because
>> > it sounds bad? But they are in fact proposing precisely that, just
>> > masked with "fast threads".
>>
>> Servers want to never, ever block. Not on a socket, not on a stat,
>> not on anything. (I have an embedded server I wrote that has to fork
>> internally just to watch the damn serial port signals in parallel
>> with handling network I/O, audio, and child processes that handle
>> H323.) There's a lot of things that can block out there, and it's
>> not just disk I/O.
>
> Why do select/poll/epoll and friends not work? I have programmed on
> both sides - user-space network servers and in-kernel network
> protocols - and the "never blocking" thing was implemented in *nix
> back when I was walking under the table.

Then you've never had to write something that watches serial port signals. Google TIOCMIWAIT to see what I'm talking about. The only options for a userspace programmer dealing with that are to fork() or to poll the signals every so many milliseconds. There are probably more easy examples, but that's the one off the top of my head that affected me. In short, this isn't just about network IO, and this isn't just about file IO.

> One can poll() more or less *any* device in the system. With the
> frigging exception of - right - files.

The problem is the "more or less." Say you're right, and 95% of the system calls are either already asynchronous or non-blocking/poll()able. One of the questions on the table is how to extend it to the last 5%.
> User-space-wise, check how squid (the caching http proxy) does it: you
> have several (forked) instances to serve network requests and you have
> one/several disk I/O daemons. (So called "diskd storeio".) Why?
> Because you cannot poll() file descriptors, but you can poll a unix
> socket connected to diskd. If diskd blocks, squid can still serve
> requests. How are threadlets better than a pool of diskd instances?
> All the nastiness of shared memory set loose...

Samba/lighttpd/git want to issue dozens of stats in parallel so that the kernel can have an opportunity to sort them better. Are you saying they should fork() a process per stat that they want to issue in parallel?

> What I'm trying to get to. Threadlets wouldn't help existing
> single-threaded applications - which is about 95% of all applications.

Eh, I don't think that's right. Part of the reason threadlets and syslets are on the table is that they may be a more efficient way to do AIO. And the differences between the syslet API and the current kernel async IO API can be abstracted away by glibc, so that today's apps that do AIO would immediately benefit.

> What's more, as having some limited experience of kernel programming,
> I fail to see what threadlets would simplify on the kernel side.

You can yank the entire separate AIO path, and just treat them as another blocking API that syslets make nonblocking. Immediate reduction of code, and everybody is now using the same code paths, which means higher test coverage and reduced maintenance cost.

This last point is really important. Even if no extra functionality eventually makes it to userspace, this last point would still be enough to make the powers that be consider inclusion.

Ray
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On 3/3/07, Ray Lee <[EMAIL PROTECTED]> wrote:
> On 3/3/07, Ihar `Philips` Filipau <[EMAIL PROTECTED]> wrote:
> > What I'm trying to get to: keep things simple. The proposed
> > optimization by Ingo does nothing else but allow AIO to probe the
> > file cache - if the data is there, go with the fast path. So why not
> > implement what the people want - probing of the cache? Because it
> > sounds bad? But they are in fact proposing precisely that, just
> > masked with "fast threads".
>
> Servers want to never, ever block. Not on a socket, not on a stat, not
> on anything. (I have an embedded server I wrote that has to fork
> internally just to watch the damn serial port signals in parallel with
> handling network I/O, audio, and child processes that handle H323.)
> There's a lot of things that can block out there, and it's not just
> disk I/O.

Why do select/poll/epoll and friends not work? I have programmed on both sides - user-space network servers and in-kernel network protocols - and the "never blocking" thing was implemented in *nix back when I was walking under the table.

One can poll() more or less *any* device in the system. With the frigging exception of - right - files. IOW, for 75% of I/O the problem doesn't exist, since there is a proper interface - e.g. sockets - in place.

User-space-wise, check how squid (the caching http proxy) does it: you have several (forked) instances to serve network requests and you have one/several disk I/O daemons. (So called "diskd storeio".) Why? Because you cannot poll() file descriptors, but you can poll a unix socket connected to diskd. If diskd blocks, squid can still serve requests. How are threadlets better than a pool of diskd instances? All the nastiness of shared memory set loose...

What I'm trying to get to: threadlets wouldn't help existing single-threaded applications - which is about 95% of all applications. And multi-threaded applications would gain little, because few real applications create threads dynamically: creation needs resources and can fail, uncontrollable thread spawning hurts overall manageability, and additional care is needed for deadlock/lock contention proofing. (The category of applications which want the performance gain are also the applications which need to ensure greater stability over long non-stop runs. Uncontrollable dynamism helps nothing.)

Having implemented several "file servers" - daemons serving file I/O to other daemons - I honestly hardly see any improvement. People now configure such file servers to issue e.g. 10 file operations simultaneously - using a pool of 10 threads. What do threadlets change? In the end, just to keep threadlets in check I would need to issue pthread_join() after some number of threadlets created. And the latter number is the former "e.g. 10". IOW, programmer-wise the implementation remains the same - and all the limitations remain the same. And all the overhead of user-space locking remains the same. (*)

What's more, having some limited experience of kernel programming, I fail to see what threadlets would simplify on the kernel side. The end result as I see it: user space becomes a bit more complicated because of dynamic multi-threading, and kernel space also becomes more complicated because of the same added dynamism.

(*) Hm... On the other side, if an application could tell the kernel to limit the number of issued threadlets to N, then that might simplify the job. The application could tell the kernel "I need at most 10 blocking threadlets, block me if there are more" and then dumbly throw I/O threadlets at the kernel as they come in. And the kernel would then put the process to sleep if N+1 threadlets are blocking. That would definitely simplify the job in user space: it wouldn't need to call pthread_join(). But it is still no replacement for a poll()able file descriptor or truly async mmap().

--
Don't walk behind me, I may not lead.
Don't walk in front of me, I may not follow. Just walk beside me and be my friend. -- Albert Camus (attributed to)
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Sat, 3 Mar 2007, Davide Libenzi wrote:
> Those *other* (tons?!?) interfaces can be created *when* the need
> comes (see Linus' signalfd [1] example to show how urgent that was).
> *When* the need comes, they will work with existing POSIX interfaces,
> without requiring your own just-another event interface. Those other
> interfaces could also be more easily adopted by other Unix cousins,
> because of the fact that they rely on existing POSIX interfaces. One
> of the reasons behind the Unix file abstraction interfaces is that you
> do *not* have to plan and bloat interfaces beforehand. As long as your
> new abstraction behaves in a file fashion, it can be automatically
> used with existing interfaces. And you create them *when* the need
> comes.

Now, if you don't mind, my spare time is really limited and I prefer to spend it looking at the stuff the topic of this thread talks about. Even because the whole epoll/kevent discussion is heavily dependent on whether syslets/threadlets will or will not prove a viable method for generic AIO. Savvy?

- Davide
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Sat, 3 Mar 2007, Evgeniy Polyakov wrote:
> > I was referring to dropping an event directly to a userspace buffer,
> > from the poll callback. If pages are not there, you might sleep, and
> > you can't since the wakeup function holds a spinlock on the waitqueue
> > head while looping through the waiters to issue the wakeup. Also, you
> > don't know from where the poll wakeup is called.
>
> Ugh, no, that is a very limited solution - memory must be either
> pinned (which leads to DoS and a limited ring buffer), or the callback
> must sleep. Actually, in any case there _must_ exist a queue - if the
> ring buffer is full an event is not allowed to be dropped - it must be
> stored in some other place, for example in a queue from which entries
> will be read (copied) into the ring buffer as it gains free entries
> (that is how it is implemented in kevent, at least).

I was not advocating for that, if you read carefully. The fact that epoll does not do that should be a clear hint. The old /dev/epoll IIRC was only 10% faster than the current epoll under a *heavy* event frequency micro-bench like pipetest (and that version of epoll did not have the single pass over the ready set optimization). And /dev/epoll was delivering events *directly* to userspace-visible (mmaped) memory in a zero-copy fashion.

> > BTW, Linus made a signalfd sketch code time ago, to deliver signals
> > to an fd. Code remained there and nobody cared. Question: Was it
> > because 1) it had file bindings or 2) because nobody really cared to
> > deliver signals to an event collector?
> > And *if* later requirements come, you don't need to change the API by
> > adding an XXEVENT_SIGNAL_ADD or XXEVENT_TIMER_ADD, or creating a new
> > XXEVENT-only submission structure. You create an API that
> > automatically makes that new abstraction work with POSIX poll/select,
> > and you get epoll support for free. Without even changing a bit in
> > the epoll API.
> Well, we get epoll support for free, but we need to create tons of
> other interfaces and infrastructure for kernel users, and we need to
> change userspace anyway.

Those *other* (tons?!?) interfaces can be created *when* the need comes (see Linus' signalfd [1] example to show how urgent that was). *When* the need comes, they will work with existing POSIX interfaces, without requiring your own just-another event interface. Those other interfaces could also be more easily adopted by other Unix cousins, because of the fact that they rely on existing POSIX interfaces. One of the reasons behind the Unix file abstraction interfaces is that you do *not* have to plan and bloat interfaces beforehand. As long as your new abstraction behaves in a file fashion, it can be automatically used with existing interfaces. And you create them *when* the need comes.

[1] That was like 100 lines of code or so. See here: http://tinyurl.com/3yuna5

- Davide
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On 3/3/07, Ihar `Philips` Filipau <[EMAIL PROTECTED]> wrote:
> What I'm trying to get to: keep things simple. The proposed
> optimization by Ingo does nothing else but allow AIO to probe the file
> cache - if the data is there, go with the fast path. So why not
> implement what the people want - probing of the cache? Because it
> sounds bad? But they are in fact proposing precisely that, just masked
> with "fast threads".

Please don't take this the wrong way, but I don't think you understand the problem space that people are trying to address here.

Servers want to never, ever block. Not on a socket, not on a stat, not on anything. (I have an embedded server I wrote that has to fork internally just to watch the damn serial port signals in parallel with handling network I/O, audio, and child processes that handle H323.) There's a lot of things that can block out there, and it's not just disk I/O.

Further, not only do servers not want to block, they also want to cram a lot more requests into the kernel at once *for the kernel's benefit*. In particular, a server wants to issue a ton of stats and I/O in parallel so that the kernel can optimize which order to handle the requests in.

Finally, the biggest argument in favor of syslets/threadlets AFAICS is that -- if done correctly -- they would unify the AIO and normal IO paths in the kernel. The improved ease of long-term maintenance on the kernel (and more test coverage, and more directed optimization, etc...) just for this point alone makes them worth considering for inclusion.

So, while everybody has been talking about cached and non-cached cases, those are really special cases of the entire package that the rest of us want.

Ray
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Sat, Mar 03, 2007 at 10:46:59AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote:
> On Sat, 3 Mar 2007, Evgeniy Polyakov wrote:
>
> > > You've to excuse me if my memory is bad, but IIRC the whole
> > > discussion and loong benchmark feast born with you throwing a
> > > benchmark at Ingo (with kevent showing a 1.9x performance boost
> > > WRT epoll), not with you making any other point.
> >
> > So, how does it sound?
> > "Threadlets are bad for IO because kevent is 2 times faster than epoll?"
> >
> > I said threadlets are bad for IO (and we agreed that both approaches
> > should be used for maximum performance) because of rescheduling
> > overhead - tasks are quite heavy structures to move around - even a
> > pt_regs copy takes more than an event structure - but not because
> > there is something in one galaxy which might work faster than another
> > something in another galaxy. That was stupid even to think about.
>
> Evgeniy, other folks on this thread read what you said, so let's not
> drag this over.

Sure, I was wrong to start this again, but try to see my position - I'm really tired of trying to prove that I'm not a camel just because we had some misunderstanding at the start. I do think that threadlets are a really cool solution and indeed a very good approach for the majority of parallel processing, but my point is still that they are not a perfect solution for all tasks. Just to draw a line: the kevent example is an extrapolation of what can be achieved with an event-driven model, but that does not mean it must be the _only_ model used for AIO - threadlets _and_ an event-driven model (yes, I accepted Ingo's point about its declining) is the best solution.
> > > And if you really feel raw about the single O(nready) loop that
> > > epoll currently does, a new epoll_wait2 (or whatever) API could be
> > > used to deliver the event directly into a userspace buffer [1],
> > > directly from the poll callback, w/out extra delivery loops
> > > (IRQ/event->epoll_callback->event_buffer).
> > >
> > > [1] From the epoll callback, we cannot sleep, so it's gonna be
> > > either an mlocked userspace buffer, or some kernel pages mapped to
> > > userspace.
> >
> > Callbacks never sleep - they add the event into a list just like the
> > current implementation (maybe some lock must be changed from a mutex
> > to a spinlock, I do not remember); the main problem is the binding
> > to the file structure, which is heavy.
>
> I was referring to dropping an event directly to a userspace buffer,
> from the poll callback. If pages are not there, you might sleep, and
> you can't since the wakeup function holds a spinlock on the waitqueue
> head while looping through the waiters to issue the wakeup. Also, you
> don't know from where the poll wakeup is called.

Ugh, no, that is a very limited solution - memory must be either pinned (which leads to DoS and a limited ring buffer), or the callback must sleep. Actually, in any case there _must_ exist a queue - if the ring buffer is full an event is not allowed to be dropped - it must be stored in some other place, for example in a queue from which entries will be read (copied) into the ring buffer as it gains free entries (that is how it is implemented in kevent, at least).

> File binding heavy? The first, and by *far* biggest, source of events
> inside an event collector, for someone that cares about scalability,
> are sockets. And those are already files. Second would be AIO, and
> those (if performance figures agree) can be hosted inside
> syslets/threadlets. Then you fall into the no-care category, where the
> extra 100 bytes do not make a case against the ability of using it
> with an existing POSIX infrastructure (poll/select).
Well, sockets are files indeed, and sockets are already perfectly handled by epoll - but there are other users of the potential interface - and it must be designed to scale very well in _any_ situation. Even if we do not have problems right now with some types of events, we must scale with any new one.

> BTW, Linus made a signalfd sketch code time ago, to deliver signals to
> an fd. Code remained there and nobody cared. Question: Was it because
> 1) it had file bindings or 2) because nobody really cared to deliver
> signals to an event collector?
> And *if* later requirements come, you don't need to change the API by
> adding an XXEVENT_SIGNAL_ADD or XXEVENT_TIMER_ADD, or creating a new
> XXEVENT-only submission structure. You create an API that
> automatically makes that new abstraction work with POSIX poll/select,
> and you get epoll support for free. Without even changing a bit in the
> epoll API.

Well, we get epoll support for free, but we need to create tons of other interfaces and infrastructure for kernel users, and we need to change userspace anyway. But epoll support requires quite heavy bindings to the file structure, so why don't we want to design a new interface (since we need to change userspace anyway) so that it could scale and be very memory
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Sat, 3 Mar 2007, Evgeniy Polyakov wrote:
> > You've to excuse me if my memory is bad, but IIRC the whole
> > discussion and loong benchmark feast born with you throwing a
> > benchmark at Ingo (with kevent showing a 1.9x performance boost WRT
> > epoll), not with you making any other point.
>
> So, how does it sound?
> "Threadlets are bad for IO because kevent is 2 times faster than epoll?"
>
> I said threadlets are bad for IO (and we agreed that both approaches
> should be used for maximum performance) because of rescheduling
> overhead - tasks are quite heavy structures to move around - even a
> pt_regs copy takes more than an event structure - but not because
> there is something in one galaxy which might work faster than another
> something in another galaxy. That was stupid even to think about.

Evgeniy, other folks on this thread read what you said, so let's not drag this over.

> > And if you really feel raw about the single O(nready) loop that
> > epoll currently does, a new epoll_wait2 (or whatever) API could be
> > used to deliver the event directly into a userspace buffer [1],
> > directly from the poll callback, w/out extra delivery loops
> > (IRQ/event->epoll_callback->event_buffer).
> >
> > [1] From the epoll callback, we cannot sleep, so it's gonna be
> > either an mlocked userspace buffer, or some kernel pages mapped to
> > userspace.
>
> Callbacks never sleep - they add the event into a list just like the
> current implementation (maybe some lock must be changed from a mutex
> to a spinlock, I do not remember); the main problem is the binding to
> the file structure, which is heavy.

I was referring to dropping an event directly to a userspace buffer, from the poll callback. If pages are not there, you might sleep, and you can't since the wakeup function holds a spinlock on the waitqueue head while looping through the waiters to issue the wakeup. Also, you don't know from where the poll wakeup is called.

File binding heavy?
The first, and by *far* biggest, source of events inside an event collector, for someone that cares about scalability, are sockets. And those are already files. Second would be AIO, and those (if performance figures agree) can be hosted inside syslets/threadlets. Then you fall into the no-care category, where the extra 100 bytes do not make a case against the ability of using it with an existing POSIX infrastructure (poll/select).

BTW, Linus made a signalfd sketch code time ago, to deliver signals to an fd. Code remained there and nobody cared. Question: Was it because 1) it had file bindings or 2) because nobody really cared to deliver signals to an event collector?

And *if* later requirements come, you don't need to change the API by adding an XXEVENT_SIGNAL_ADD or XXEVENT_TIMER_ADD, or creating a new XXEVENT-only submission structure. You create an API that automatically makes that new abstraction work with POSIX poll/select, and you get epoll support for free. Without even changing a bit in the epoll API.

- Davide
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On 3/3/07, Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> > > Threadlets can work with any function as a base - if it were
> > > recv-like it would limit the possible cases for parallel
> > > programming - so you can code anything in threadlets - it is not
> > > only about IO.
> >
> > What I'm trying to get to: keep things simple. The proposed
> > optimization by Ingo does nothing else but allow AIO to probe the
> > file cache - if the data is there, go with the fast path. So why not
> > implement what the people want - probing of the cache? Because it
> > sounds bad? But they are in fact proposing precisely that, just
> > masked with "fast threads".
>
> There can be other parts than just plain recv/read syscalls - you can
> create a logical processing entity, and if it blocks (as a whole, no
> matter where), the whole processing will continue as a new thread.
> And having a different syscall to warm the cache can end up in a
> cache flush between the warming and the processing itself.

I'm not talking about cache warm-up. And if we do - and that's the whole freaking point of AIO - Linux IIRC pins freshly loaded clean pages anyway. So there would be a problem, but only under memory pressure. If you are under memory pressure, you have already lost the game and do not care about performance or what threads you are using.

It is the whole "threadlets to threads on blocking" thing that doesn't sound convincing. It sounds more like "premature optimization". But anyway, not that I'm an AIO specialist.

For networking it is totally unnecessary, since most applications which care already have rate control and buffer management built in. Network connections/sockets allow a greater level of application control over what and how they do things. Compare that to blockdev's plain dumb read()/write() going through the global cache. And not that (judging from the interface) AIO changes that much - it is still a dumb read(), which IMHO makes no sense whatsoever in mmap()-oriented Linux.

--
Don't walk behind me, I may not lead. Don't walk in front of me, I may not follow.
Just walk beside me and be my friend. -- Albert Camus (attributed to) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Sat, Mar 03, 2007 at 11:58:17AM +0100, Ihar `Philips` Filipau ([EMAIL PROTECTED]) wrote:
> On 3/3/07, Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> >On Fri, Mar 02, 2007 at 08:20:26PM +0100, Ihar `Philips` Filipau
> >([EMAIL PROTECTED]) wrote:
> >> I'm not well versed in modern kernel development discussions, and
> >> since you have put the thing into the networked context anyway, can
> >> you please ask on lkml why (if they want threadlets solely for AIO)
> >> not to implement an analogue of recv(*filedes*, b, l, MSG_DONTWAIT).
> >> Developers already know the interface, the socket infrastructure is
> >> already in the kernel, etc. And it might do precisely what they
> >> want: access a file in the disk cache - just like in the socket case
> >> it accesses the socket's receive buffer. Why bother with implicit
> >> threads/waiting/etc if all they want is some way to probe the cache?
> >
> >Threadlets can work with any function as a base - if it were
> >recv-like it would limit the possible cases for parallel programming,
> >so you can code anything in threadlets - it is not only about IO.
>
> Ingo defined them as "plain function calls as long as they do not block".
>
> But when/what function could block?
>
> (1) File descriptors. Read: if data is in the cache it wouldn't block,
> otherwise it would. Write: if there is space in the cache it wouldn't
> block, otherwise it would.
>
> (2) Network sockets. Recv: if data is in the buffer it wouldn't block,
> otherwise it would. Send: if there is space in the send buffer it
> wouldn't block, otherwise it would.
>
> (3) Pipes, fifos & unix sockets. These unfortunately gain nothing,
> since reliable local communication is used mostly for passing control
> information. If you have to block on such a socket, it is most likely
> important information anyway (e.g. X server communication or a query to
> an SQL server, or, even less important here, shell pipes). And most
> users here are single threaded and I/O bound: they would gain nothing
> from multi-threading - only the PITA of added locking.
>
> What I'm trying to get to: keep things simple. The proposed
> optimization by Ingo does nothing else but allow AIO to probe the file
> cache - if data is there, go with the fast path. So why not implement
> what people want - probing of the cache? Because it sounds bad? But
> they are in fact proposing precisely that, just masked with "fast
> threads".

There can be other parts than just plain recv/read syscalls - you can create a logical processing entity, and if it blocks (as a whole, no matter where), the whole processing will continue as a new thread. And having a different syscall to warm the cache can end up in a cache flush between the warming and the processing itself.

> --
> Don't walk behind me, I may not lead.
> Don't walk in front of me, I may not follow.
> Just walk beside me and be my friend.
> -- Albert Camus (attributed to)

--
Evgeniy Polyakov
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On 3/3/07, Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> On Fri, Mar 02, 2007 at 08:20:26PM +0100, Ihar `Philips` Filipau
> ([EMAIL PROTECTED]) wrote:
> > I'm not well versed in modern kernel development discussions, and
> > since you have put the thing into the networked context anyway, can
> > you please ask on lkml why (if they want threadlets solely for AIO)
> > not to implement an analogue of recv(*filedes*, b, l, MSG_DONTWAIT).
> > Developers already know the interface, the socket infrastructure is
> > already in the kernel, etc. And it might do precisely what they want:
> > access a file in the disk cache - just like in the socket case it
> > accesses the socket's receive buffer. Why bother with implicit
> > threads/waiting/etc if all they want is some way to probe the cache?
>
> Threadlets can work with any function as a base - if it were recv-like
> it would limit the possible cases for parallel programming, so you can
> code anything in threadlets - it is not only about IO.

Ingo defined them as "plain function calls as long as they do not block".

But when/what function could block?

(1) File descriptors. Read: if data is in the cache it wouldn't block, otherwise it would. Write: if there is space in the cache it wouldn't block, otherwise it would.

(2) Network sockets. Recv: if data is in the buffer it wouldn't block, otherwise it would. Send: if there is space in the send buffer it wouldn't block, otherwise it would.

(3) Pipes, fifos & unix sockets. These unfortunately gain nothing, since reliable local communication is used mostly for passing control information. If you have to block on such a socket, it is most likely important information anyway (e.g. X server communication or a query to an SQL server, or, even less important here, shell pipes). And most users here are single threaded and I/O bound: they would gain nothing from multi-threading - only the PITA of added locking.

What I'm trying to get to: keep things simple. The proposed optimization by Ingo does nothing else but allow AIO to probe the file cache - if data is there, go with the fast path. So why not implement what people want - probing of the cache? Because it sounds bad? But they are in fact proposing precisely that, just masked with "fast threads".

--
Don't walk behind me, I may not lead. Don't walk in front of me, I may not follow. Just walk beside me and be my friend.
-- Albert Camus (attributed to)
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, Mar 02, 2007 at 09:28:10AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote:
> On Fri, 2 Mar 2007, Evgeniy Polyakov wrote:
>
> > do we really want to have per process signalfs, timerfs and so on -
> > each simple structure must be bound to a file, which becomes too
> > costly.
>
> I may be old school, but if you ask me, and if you *really* want those
> events, yes. Reason? Unix's everything-is-a-file rule, and being able
> to use them with the *existing* POSIX poll/select. Remember, not every
> app requires huge scalability efforts, so working with simpler and
> familiar APIs is always welcome.
> The *only* thing that was not practical to have as an fd was block
> requests. But maybe threadlets/syslets will handle those just fine, and
> close the gap.

That means that we bind a very small object like a timer or a signal to the whole file structure - yes, as I stated, it is doable, but do we really have to create a file each time create_timer() or signal() is called? Signals as a filesystem are limited in the regard that we need to create additional structures to hold the signal number<->private data relations.

I designed kevent to be as small as possible, so I dropped the file binding idea first. I do not say it is wrong or that epoll (and threadlets) are broken (fsck, I hope people do understand that), but as is it cannot handle that scenario, so it must be extended and/or a lot of other stuff written to be compatible with the epoll design. Kevent has a different design (which nevertheless allows working with the old one - there is a patch to implement epoll over kevent).

> - Davide

--
Evgeniy Polyakov
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, Mar 02, 2007 at 09:13:40AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote:
> On Fri, 2 Mar 2007, Evgeniy Polyakov wrote:
>
> > On Thu, Mar 01, 2007 at 11:31:14AM -0800, Davide Libenzi
> > (davidel@xmailserver.org) wrote:
> > > On Thu, 1 Mar 2007, Evgeniy Polyakov wrote:
> > >
> > > > Ingo, do you really think I will send mails with faked benchmarks? :))
> > >
> > > I don't think he ever implied that. He was only suggesting that when
> > > you post benchmarks, and even more when you make claims based on
> > > benchmarks, you need to be extra careful about what you measure.
> > > Otherwise the external view that you give to others does not look
> > > good. Kevent can be really faster than epoll, but if you post broken
> > > benchmarks (that can be: unreliable HTTP loaders, broken server
> > > implementations, etc.) and make claims based on that, the only
> > > effect that you have is to lose your point.
> >
> > So, I only said that kevent is superior compared to epoll because of
> > (and it is the _main_ issue) its ability to handle essentially any
> > kind of events with very small overhead (the same as epoll has in
> > struct file - a list and a spinlock) and without the significant
> > price of binding a struct file to each event.
>
> You've to excuse me if my memory is bad, but IIRC the whole discussion
> and long benchmark feast was born with you throwing a benchmark at Ingo
> (with kevent showing a 1.9x performance boost WRT epoll), not with you
> making any other point.

So, how does it sound? "Threadlets are bad for IO because kevent is 2 times faster than epoll"? I said threadlets are bad for IO (and we agreed that both approaches should be used for maximum performance) because of rescheduling overhead - tasks are quite heavy structures to move around; even a pt_regs copy takes more than an event structure - but not because something in one galaxy might work faster than some other something in another galaxy. That would have been stupid even to think about.

> As far as epoll not being able to handle other events. Said who? Of
> course, with zero modifications, you can handle zero additional events.
> With modifications, you can handle other events. But let's talk about
> those other events. The *only* kind of event that ppl (and being the
> epoll maintainer I tend to receive those requests) missed in epoll was
> AIO events. That's the *only* thing that was missed by real life
> application developers. And if something like threadlets/syslets proves
> effective, the gap is closed WRT that requirement.
> Epoll already handles the whole class of pollable devices inside the
> kernel, and if you exclude block AIO, that's a pretty wide class
> already. The *existing* f_op->poll subsystem can be used to deliver
> events at poll-head wakeup time (by using the "key" member of the poll
> callback), so that you don't even need the extra f_op->poll call to
> fetch events.
> And if you really feel raw about the single O(nready) loop that epoll
> currently does, a new epoll_wait2 (or whatever) API could be used to
> deliver the event directly into a userspace buffer [1], directly from
> the poll callback, w/out extra delivery loops
> (IRQ/event->epoll_callback->event_buffer).

Signals, futexes, timers and userspace events I was requested to add into kevent; so far only futexes are missing, because I was asked to freeze development so other hackers could check the project.

> [1] From the epoll callback, we cannot sleep, so it's gonna be either
> an mlocked userspace buffer, or some kernel pages mapped to userspace.

Callbacks never sleep - they add the event into a list, just like the current implementation (maybe some lock must be changed from a mutex to a spinlock, I do not remember); the main problem is the binding to the file structure, which is heavy.

> - Davide

--
Evgeniy Polyakov
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Sat, 3 Mar 2007, Ingo Molnar wrote:
> * Davide Libenzi wrote:
>
> > [...] Status word and control bits should not be changed from
> > underneath userspace AFAIK. [...]
>
> Note that the control bits do not just magically change during normal
> FPU use. It's a bit like sys_setsid()/iopl/etc., it makes little sense
> to change those per-thread anyway. This is a non-issue anyway - what is
> important is that the big bulk of 512 (or more) bytes of FPU state
> /are/ callee-saved (both on 32-bit and on 64-bit), hence there's no
> need to unlazy anything or to do expensive FPU state saves or other FPU
> juggling around threadlet (or even syslet) use.

Well, the unlazy/sync happens in any case later when we switch (given TS_USEDFPU is set). We'd avoid a copy of it given that the above conditions hold. Wouldn't it make sense to carry over only the status word and the control bits eventually?

Also, if the caller saves the whole context, and if we're scheduled while inside a system call (a not totally infrequent case), can't we implement a smarter unlazy_fpu that avoids the fxsave during schedule-out and the frstor after schedule-in (and does not do stts in this condition, so the newly scheduled task doesn't get a fault at all)? If the above conditions are true (no context copy needed for the new head in async_exec), this should be possible too.

- Davide
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
On Sat, 3 Mar 2007, Evgeniy Polyakov wrote:
> > You've to excuse me if my memory is bad, but IIRC the whole
> > discussion and long benchmark feast was born with you throwing a
> > benchmark at Ingo (with kevent showing a 1.9x performance boost WRT
> > epoll), not with you making any other point.
>
> So, how does it sound? "Threadlets are bad for IO because kevent is 2
> times faster than epoll"? I said threadlets are bad for IO (and we
> agreed that both approaches should be used for maximum performance)
> because of rescheduling overhead - tasks are quite heavy structures to
> move around - even a pt_regs copy takes more than an event structure,
> but not because something in one galaxy might work faster than some
> other something in another galaxy. That was stupid even to think about.

Evgeniy, other folks on this thread read what you said, so let's not drag this over.

> > And if you really feel raw about the single O(nready) loop that epoll
> > currently does, a new epoll_wait2 (or whatever) API could be used to
> > deliver the event directly into a userspace buffer [1], directly from
> > the poll callback, w/out extra delivery loops
> > (IRQ/event->epoll_callback->event_buffer).
> >
> > [1] From the epoll callback, we cannot sleep, so it's gonna be either
> > an mlocked userspace buffer, or some kernel pages mapped to
> > userspace.
>
> Callbacks never sleep - they add the event into a list just like the
> current implementation (maybe some lock must be changed from a mutex to
> a spinlock, I do not remember); the main problem is the binding to the
> file structure, which is heavy.

I was referring to dropping an event directly into a userspace buffer, from the poll callback. If pages are not there, you might sleep, and you can't, since the wakeup function holds a spinlock on the waitqueue head while looping through the waiters to issue the wakeup. Also, you don't know from where the poll wakeup is called.

File binding heavy? The first, and by *far* biggest, source of events inside an event collector, for someone who cares about scalability, is sockets. And those are already files. Second would be AIO, and those (if the performance figures agree) can be hosted inside syslets/threadlets. Then you fall into the no-care category, where the extra 100 bytes do not make a case against the ability to use it with the existing POSIX infrastructure (poll/select).

BTW, Linus made a signalfd sketch some time ago, to deliver signals to an fd. The code remained there and nobody cared. Question: was it because 1) it had file bindings, or 2) because nobody really cared to deliver signals to an event collector?

And *if* later requirements come, you don't need to change the API by adding an XXEVENT_SIGNAL_ADD or XXEVENT_TIMER_ADD, or by creating a new XXEVENT-only submission structure. You create an API that automatically makes the new abstraction work with POSIX poll/select, and you get epoll support for free. Without even changing a bit in the epoll API.

- Davide
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
On Sat, Mar 03, 2007 at 10:46:59AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote: On Sat, 3 Mar 2007, Evgeniy Polyakov wrote: You've to excuse me if my memory is bad, but IIRC the whole discussion and long benchmark feast was born with you throwing a benchmark at Ingo (with kevent showing a 1.9x performance boost WRT epoll), not with you making any other point. So, how does it sound? Threadlets are bad for IO because kevent is 2 times faster than epoll? I said threadlets are bad for IO (and we agreed that both approaches should be used for maximum performance) because of rescheduling overhead - tasks are quite heavy structures to move around - even a pt_regs copy takes more than an event structure - but not because there is something in one galaxy which might work faster than another something in another galaxy. That was stupid even to think about. Evgeniy, other folks on this thread read what you said, so let's not drag this over. Sure, I was wrong to start this again, but try to see my position - I'm really tired of trying to prove that I'm not a camel just because we had some misunderstanding at the start. I do think that threadlets are a really cool solution and indeed a very good approach for the majority of parallel processing, but my point is still that it is not a perfect solution for all tasks. Just to draw a line: the kevent example is an extrapolation of what can be achieved with the event-driven model, but that does not mean that it must be the _only_ model used for AIO - threadlets _and_ the event-driven model (yes, I accepted Ingo's point about its declining) is the best solution. And if you really feel raw about the single O(nready) loop that epoll currently does, a new epoll_wait2 (or whatever) API could be used to deliver the event directly into a userspace buffer [1], directly from the poll callback, w/out extra delivery loops (IRQ/event-epoll_callback-event_buffer).
[1] From the epoll callback, we cannot sleep, so it's gonna be either an mlocked userspace buffer, or some kernel pages mapped to userspace. Callbacks never sleep - they add the event into a list just like the current implementation (maybe some lock must be changed from mutex to spinlock, I do not remember) - the main problem is binding to the file structure, which is heavy. I was referring to dropping an event directly to a userspace buffer, from the poll callback. If pages are not there, you might sleep, and you can't since the wakeup function holds a spinlock on the waitqueue head while looping through the waiters to issue the wakeup. Also, you don't know from where the poll wakeup is called. Ugh, no, that is a very limited solution - memory must be either pinned (which leads to DoS and a limited ring buffer), or the callback must sleep. Actually, either way there _must_ exist a queue - if the ring buffer is full, an event is not allowed to be dropped - it must be stored in some other place, for example in a queue from which entries will be read (copied) into the ring buffer when it has free entries (that is how it is implemented in kevent, at least). File binding heavy? The first, and by *far* the biggest, source of events inside an event collector, for someone who cares about scalability, is sockets. And those are already files. Second would be AIO, and those (if the performance figures agree) can be hosted inside syslets/threadlets. Then you fall into the no-care category, where the extra 100 bytes do not make a case against the ability of using it with the existing POSIX infrastructure (poll/select). Well, sockets are indeed files, and sockets are already perfectly handled by epoll - but there are other users of the potential interface - and it must be designed to scale in _any_ situation very well. Even if we right now do not have problems with some types of events, we must scale with any new one. BTW, Linus made a signalfd sketch some time ago, to deliver signals to an fd.
Code remained there and nobody cared. Question: Was it because 1) it had file bindings or 2) because nobody really cared to deliver signals to an event collector? And *if* later requirements come, you don't need to change the API by adding an XXEVENT_SIGNAL_ADD or XXEVENT_TIMER_ADD, or creating a new XXEVENT-only submission structure. You create an API that automatically makes that new abstraction work with POSIX poll/select, and you get epoll support for free. Without even changing a bit in the epoll API. Well, we get epoll support for free, but we need to create tons of other interfaces and infrastructure for kernel users, and we need to change userspace anyway. But epoll support requires quite heavy bindings to the file structure, so why don't we design the new interface (since we need to change userspace anyway) so that it can scale and be very memory-optimized from the beginning? - Davide -- Evgeniy Polyakov
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
On 3/3/07, Ihar `Philips` Filipau [EMAIL PROTECTED] wrote: What I'm trying to get to: keep things simple. The proposed optimization by Ingo does nothing else but allow AIO to probe the file cache - if the data is there, go with the fast path. So why not implement what people want - probing of the cache? Because it sounds bad? But they are in fact proposing precisely that, just masked with fast threads. Please don't take this the wrong way, but I don't think you understand the problem space that people are trying to address here. Servers want to never, ever block. Not on a socket, not on a stat, not on anything. (I have an embedded server I wrote that has to fork internally just to watch the damn serial port signals in parallel with handling network I/O, audio, and child processes that handle H323.) There's a lot of things that can block out there, and it's not just disk I/O. Further, not only do servers not want to block, they also want to cram a lot more requests into the kernel at once *for the kernel's benefit*. In particular, a server wants to issue a ton of stats and I/O in parallel so that the kernel can optimize the order in which to handle the requests. Finally, the biggest argument in favor of syslets/threadlets AFAICS is that -- if done correctly -- it would unify the AIO and normal IO paths in the kernel. The improved ease of long-term maintenance on the kernel (and more test coverage, and more directed optimization, etc...) just for this point alone makes them worth considering for inclusion. So, while everybody has been talking about cached and non-cached cases, those are really special cases of the entire package that the rest of us want. Ray
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
On Sat, 3 Mar 2007, Evgeniy Polyakov wrote: I was referring to dropping an event directly to a userspace buffer, from the poll callback. If pages are not there, you might sleep, and you can't since the wakeup function holds a spinlock on the waitqueue head while looping through the waiters to issue the wakeup. Also, you don't know from where the poll wakeup is called. Ugh, no, that is a very limited solution - memory must be either pinned (which leads to DoS and a limited ring buffer), or the callback must sleep. Actually, either way there _must_ exist a queue - if the ring buffer is full, an event is not allowed to be dropped - it must be stored in some other place, for example in a queue from which entries will be read (copied) into the ring buffer when it has free entries (that is how it is implemented in kevent, at least). I was not advocating for that, if you read carefully. The fact that epoll does not do that should be a clear hint. The old /dev/epoll IIRC was only 10% faster than the current epoll under a *heavy* event frequency micro-bench like pipetest (and that version of epoll did not have the single-pass-over-the-ready-set optimization). And /dev/epoll was delivering events *directly* into userspace-visible (mmaped) memory in a zero-copy fashion. BTW, Linus made a signalfd sketch some time ago, to deliver signals to an fd. Code remained there and nobody cared. Question: Was it because 1) it had file bindings or 2) because nobody really cared to deliver signals to an event collector? And *if* later requirements come, you don't need to change the API by adding an XXEVENT_SIGNAL_ADD or XXEVENT_TIMER_ADD, or creating a new XXEVENT-only submission structure. You create an API that automatically makes that new abstraction work with POSIX poll/select, and you get epoll support for free. Without even changing a bit in the epoll API.
Well, we get epoll support for free, but we need to create tons of other interfaces and infrastructure for kernel users, and we need to change userspace anyway. Those *other* (tons?!?) interfaces can be created *when* the need comes (see the Linus signalfd [1] example to show how urgent that was). *When* the need comes, they will work with existing POSIX interfaces, without requiring your own just-another-event interface. Those other interfaces could also be more easily adopted by other Unix cousins, because of the fact that they rely on existing POSIX interfaces. One of the reasons behind the Unix file abstraction is that you do *not* have to plan and bloat interfaces beforehand. As long as your new abstraction behaves in a file fashion, it can be automatically used with existing interfaces. And you create them *when* the need comes. [1] That was like 100 lines of code or so. See here: http://tinyurl.com/3yuna5 - Davide
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
On Sat, 3 Mar 2007, Davide Libenzi wrote: Those *other* (tons?!?) interfaces can be created *when* the need comes (see the Linus signalfd [1] example to show how urgent that was). *When* the need comes, they will work with existing POSIX interfaces, without requiring your own just-another-event interface. Those other interfaces could also be more easily adopted by other Unix cousins, because of the fact that they rely on existing POSIX interfaces. One of the reasons behind the Unix file abstraction is that you do *not* have to plan and bloat interfaces beforehand. As long as your new abstraction behaves in a file fashion, it can be automatically used with existing interfaces. And you create them *when* the need comes. Now, if you don't mind, my spare time is really limited and I prefer to spend it looking at the stuff the topic of this thread talks about. Even because the whole epoll/kevent discussion is heavily dependent on whether syslets/threadlets will or will not turn out to be a viable method for generic AIO. Savvy? - Davide
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
On 3/3/07, Ray Lee [EMAIL PROTECTED] wrote: On 3/3/07, Ihar `Philips` Filipau [EMAIL PROTECTED] wrote: What I'm trying to get to: keep things simple. The proposed optimization by Ingo does nothing else but allow AIO to probe the file cache - if the data is there, go with the fast path. So why not implement what people want - probing of the cache? Because it sounds bad? But they are in fact proposing precisely that, just masked with fast threads. Servers want to never, ever block. Not on a socket, not on a stat, not on anything. (I have an embedded server I wrote that has to fork internally just to watch the damn serial port signals in parallel with handling network I/O, audio, and child processes that handle H323.) There's a lot of things that can block out there, and it's not just disk I/O. Why do select/poll/epoll/friends not work? I have programmed on both sides - user-space network servers and in-kernel network protocols - and non-blocking I/O was implemented in *nix back in the times when I was still walking under the table. One can poll() more or less *any* device in the system. With the frigging exception of - right - files. IOW, for 75% of the I/O the problem doesn't exist, since there is a proper interface - e.g. sockets - in place. User-space-wise, check how squid (the caching http proxy) does it: you have several (forked) instances to serve network requests and you have one/several disk I/O daemons. (So-called diskd storeio.) Why? Because you cannot poll() file descriptors, but you can poll a unix socket connected to diskd. If diskd blocks, squid can still serve requests. How are threadlets better than a pool of diskd instances? All the nastiness of shared memory set loose... What I'm trying to get to: threadlets wouldn't help existing single-threaded applications - which are about 95% of all applications.
And multi-threaded applications would gain little, because few real applications create threads dynamically: creation needs resources and can fail, uncontrollable thread spawning hurts overall manageability, and additional care is needed for deadlock/lock-contention proofing. (The category of applications which want the performance gain are also the applications which need to ensure greater stability over long non-stop runs. Uncontrollable dynamism helps nothing.) Having implemented several file servers - daemons serving file I/O to other daemons - I honestly hardly see any improvements. Right now people configure such file servers to issue e.g. 10 file operations simultaneously - using a pool of 10 threads. What do threadlets change? In the end, just to keep the threadlets in check, I would need to issue pthread_join() after some number of threadlets created. And the latter number is the former, e.g. 10. IOW, programmer-wise the implementation remains the same - and all the limitations remain the same. And all the overhead of user-space locking remains the same. (*) What's more, having some limited experience of kernel programming, I fail to see what threadlets would simplify on the kernel side. End result as I see it: user space becomes a bit more complicated because of dynamic multi-threading, and kernel space also becomes more complicated because of the same added dynamism. (*) Hm... On the other side, if an application were able to tell the kernel to limit the number of issued threadlets to N, then that might simplify the job. The application could tell the kernel "I need at most 10 blocking threadlets, block me if there are more" and then dumbly throw I/O threadlets at the kernel as they come in. And the kernel would then put the process to sleep if N+1 threadlets are blocking. That would definitely simplify the job in user space: it wouldn't need to call pthread_join(). But it is still no replacement for a poll()able file descriptor or a truly async mmap(). -- Don't walk behind me, I may not lead.
Don't walk in front of me, I may not follow. Just walk beside me and be my friend. -- Albert Camus (attributed to)
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
Ihar `Philips` Filipau wrote: On 3/3/07, Ray Lee [EMAIL PROTECTED] wrote: On 3/3/07, Ihar `Philips` Filipau [EMAIL PROTECTED] wrote: What I'm trying to get to: keep things simple. The proposed optimization by Ingo does nothing else but allow AIO to probe the file cache - if the data is there, go with the fast path. So why not implement what people want - probing of the cache? Because it sounds bad? But they are in fact proposing precisely that, just masked with fast threads. Servers want to never, ever block. Not on a socket, not on a stat, not on anything. (I have an embedded server I wrote that has to fork internally just to watch the damn serial port signals in parallel with handling network I/O, audio, and child processes that handle H323.) There's a lot of things that can block out there, and it's not just disk I/O. Why do select/poll/epoll/friends not work? I have programmed on both sides - user-space network servers and in-kernel network protocols - and non-blocking I/O was implemented in *nix back in the times when I was still walking under the table. Then you've never had to write something that watches serial port signals. Google TIOCMIWAIT to see what I'm talking about. The only option for a userspace programmer to deal with that is to fork() or to poll the signals every so many milliseconds. There are probably more easy examples, but that's the one off the top of my head that affected me. In short, this isn't just about network IO, and it isn't just about file IO. One can poll() more or less *any* device in the system. With the frigging exception of - right - files. The problem is the "more or less". Say you're right, and 95% of the system calls are either already asynchronous or non-blocking/poll()able. One of the questions on the table is how to extend that to the last 5%. User-space-wise, check how squid (the caching http proxy) does it: you have several (forked) instances to serve network requests and you have one/several disk I/O daemons. (So-called diskd storeio.) Why?
Because you cannot poll() file descriptors, but you can poll a unix socket connected to diskd. If diskd blocks, squid can still serve requests. How are threadlets better than a pool of diskd instances? All the nastiness of shared memory set loose... Samba/lighttpd/git want to issue dozens of stats in parallel so that the kernel has an opportunity to sort them better. Are you saying they should fork() a process per stat that they want to issue in parallel? What I'm trying to get to: threadlets wouldn't help existing single-threaded applications - which are about 95% of all applications. Eh, I don't think that's right. Part of the reason threadlets and syslets are on the table is that they may be a more efficient way to do AIO. And the differences between the syslet API and the current kernel async IO API can be abstracted away by glibc, so that today's apps that do AIO would immediately benefit. What's more, having some limited experience of kernel programming, I fail to see what threadlets would simplify on the kernel side. You can yank the entire separate AIO path, and just treat it as another blocking API that syslets make nonblocking. Immediate reduction of code, and everybody is now using the same code paths, which means higher test coverage and reduced maintenance cost. This last point is really important. Even if no extra functionality eventually makes it to userspace, this last point alone would be enough to make the powers that be consider inclusion. Ray
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > Note that the control bits do not just magically change during normal > FPU use. It's a bit like sys_setsid()/iopl/etc., it makes little sense > to change those per-thread anyway. This is a non-issue anyway - what is > important is that the big bulk of 512 (or more) bytes of FPU state /are/ > callee-saved (both on 32-bit and on 64-bit), hence there's no need to ^ caller-saved > unlazy anything or to do expensive FPU state saves or other FPU juggling > around threadlet (or even syslet) use.
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Davide Libenzi wrote: > [...] Status word and control bits should not be changed from > underneath userspace AFAIK. [...] Note that the control bits do not just magically change during normal FPU use. It's a bit like sys_setsid()/iopl/etc., it makes little sense to change those per-thread anyway. This is a non-issue anyway - what is important is that the big bulk of 512 (or more) bytes of FPU state /are/ callee-saved (both on 32-bit and on 64-bit), hence there's no need to unlazy anything or to do expensive FPU state saves or other FPU juggling around threadlet (or even syslet) use. Ingo
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, 2 Mar 2007, Nicholas Miell wrote: > On Fri, 2007-03-02 at 16:52 -0800, Davide Libenzi wrote: > > On Fri, 2 Mar 2007, Nicholas Miell wrote: > > > > > The point Ingo was making is that the x86 ABI already requires the FPU > > > context to be saved before *all* function calls. > > > > I've not seen that among Ingo's points, but yeah some status is caller > > saved. But, aren't things like status word and control bits callee saved? > > If that's the case, it might require proper handling. > > > > Ingo mentioned it in one of the parts you cut out of your reply: > > > and here is where thinking about threadlets as a function call and not > > as an asynchronous context helps a lot: the classic gcc convention for > > FPU use & function calls should apply: gcc does not call an external > > function with an in-use FPU stack/register, it always neatly unuses it, > > as no FPU register is callee-saved, all are caller-saved. > > The i386 psABI is ancient (i.e. it predates SSE, so no mention of the > XMM or MXCSR registers) and a bit vague (no mention at all of the FP > status word), but I'm fairly certain that Ingo is right. I'm not sure that's the case. I'd be happy if it was, but I'm afraid it's not. Status word and control bits should not be changed from underneath userspace AFAIK. The ABI I remember tells me that those are callee saved. A quick gcc asm test tells me that too. And assuming that's the case, why don't we have a smarter unlazy_fpu() then, one that avoids the FPU context sync if we're scheduled while inside a syscall (this is no different than an enter inside sys_async_exec - userspace should have taken care of it)? IMO a syscall enter should not assume that userspace took care of saving the whole FPU context. - Davide
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, Mar 02, 2007 at 05:36:01PM -0800, Nicholas Miell wrote: > > as an asynchronous context helps a lot: the classic gcc convention for > > FPU use & function calls should apply: gcc does not call an external > > function with an in-use FPU stack/register, it always neatly unuses it, > > as no FPU register is callee-saved, all are caller-saved. > > The i386 psABI is ancient (i.e. it predates SSE, so no mention of the > XMM or MXCSR registers) and a bit vague (no mention at all of the FP > status word), but I'm fairly certain that Ingo is right. The FPU control word *must* be saved, as the rounding behaviour and error mode bits are assumed to be preserved. IOW, yes, there is state which is required. -ben -- "Time is of no importance, Mr. President, only life is important." Don't Email: <[EMAIL PROTECTED]>.
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, 2007-03-02 at 16:52 -0800, Davide Libenzi wrote: > On Fri, 2 Mar 2007, Nicholas Miell wrote: > > > The point Ingo was making is that the x86 ABI already requires the FPU > > context to be saved before *all* function calls. > > I've not seen that among Ingo's points, but yeah some status is caller > saved. But, aren't things like status word and control bits callee saved? > If that's the case, it might require proper handling. > Ingo mentioned it in one of the parts you cut out of your reply: > and here is where thinking about threadlets as a function call and not > as an asynchronous context helps a lot: the classic gcc convention for > FPU use & function calls should apply: gcc does not call an external > function with an in-use FPU stack/register, it always neatly unuses it, > as no FPU register is callee-saved, all are caller-saved. The i386 psABI is ancient (i.e. it predates SSE, so no mention of the XMM or MXCSR registers) and a bit vague (no mention at all of the FP status word), but I'm fairly certain that Ingo is right. -- Nicholas Miell <[EMAIL PROTECTED]>
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, 2 Mar 2007, Nicholas Miell wrote: > The point Ingo was making is that the x86 ABI already requires the FPU > context to be saved before *all* function calls. I've not seen that among Ingo's points, but yeah some status is caller saved. But, aren't things like status word and control bits callee saved? If that's the case, it might require proper handling. - Davide
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, 2007-03-02 at 12:53 -0800, Davide Libenzi wrote: > On Fri, 2 Mar 2007, Ingo Molnar wrote: > > > > > * Davide Libenzi wrote: > > > > > I think that the "dirty" FPU context must, at least, follow the new > > > head. That's what the userspace sees, and you don't want an async_exec > > > to re-emerge with a different FPU context. > > > > well. I think there's some confusion about terminology, so please let me > > describe everything in detail. This is how execution goes: > > > > outer loop() { > > call_threadlet(); > > } > > > > this all runs in the 'head' context. call_threadlet() always switches to > > the 'threadlet stack'. The 'outer context' runs in the 'head stack'. If, > > while executing the threadlet function, we block, then the > > threadlet-thread gets to keep the task (the threadlet stack and also the > > FPU), and blocks - and we pick a 'new head' from the thread pool and > > continue executing in that context - right after the call_threadlet() > > function, in the 'old' head's stack. I.e. it's as if we returned > > immediately from call_threadlet(), with a return code that signals that > > the 'threadlet went async'. > > > > now, the FPU state that was there when the threadlet blocked is totally > > meaningless to the 'new head' - that FPU state is from the middle of the > > threadlet execution. > > For threadlets, it might be. Now think about a task wanting to dispatch N > parallel AIO requests as N independent syslets. > Think about this task having USEDFPU set, so the FPU context is dirty. > When it returns from async_exec, with one of the requests becoming > sleepy, it needs to have the same FPU context it had when it entered, > otherwise it won't prolly be happy. > For the same reason a schedule() must preserve/sync the "prev" FPU > context, to be reloaded at the next FPU fault. The point Ingo was making is that the x86 ABI already requires the FPU context to be saved before *all* function calls.
Unfortunately, this isn't true of other ABIs -- looking over the psABI specs I have lying around, AMD64, PPC64, and MIPS require at least part of the FPU state to be preserved across function calls, and I'm sure this is also true of others. Then there's the other nasty details of new thread creation -- thankfully, the contents of the TLS aren't inherited from the parent thread, but they still need to be initialized; not to mention all the other details involved in pthread creation and destruction. I don't see any way around the pthread issues other than making a libc upcall on return from the first system call that blocked. -- Nicholas Miell <[EMAIL PROTECTED]>
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On 3/2/07, Davide Libenzi wrote: For threadlets, it might be. Now think about a task wanting to dispatch N parallel AIO requests as N independent syslets. Think about this task having USEDFPU set, so the FPU context is dirty. When it returns from async_exec, with one of the requests becoming sleepy, it needs to have the same FPU context it had when it entered, otherwise it won't prolly be happy. For the same reason a schedule() must preserve/sync the "prev" FPU context, to be reloaded at the next FPU fault. And if you actually think this through, I think you will arrive at (a subset of) the conclusions I did a week ago: to keep the threadlets lightweight enough to schedule and migrate cheaply, they can't be allowed to "own" their own FPU and TLS context. They have to be allowed to _use_ the FPU (or they're useless) and to _use_ TLS (or they can't use any glibc wrapper around a syscall, since they practically all set the thread-local errno). But they have to "quiesce" the FPU and stash any thread-local state they want to keep on their stack before entering the next syscall, or else it'll get clobbered. Keep thinking, especially about FPU flags, and you'll see why threadlets spawned from the _same_ threadlet entrypoint should all run in the same pool of threads, one per CPU, while threadlets from _different_ entrypoints should never run in the same thread (FPU/TLS context). You'll see why threadlets in the same pool shouldn't be permitted to preempt one another except at syscalls that block, and the cost of preempting the real thread associated with one threadlet pool with another real thread associated with a different threadlet pool is the same as any other thread switch. At which point, threadlet pools are themselves first-class objects (to use the snake oil phrase), and might as well be enhanced to a data structure that has efficient operations for reprioritization, bulk cancellation, and all that jazz.
Did I mention that there is actually quite a bit of prior art in this area, which makes a much better guide to the design of round wheels than micro-benchmarks do? Cheers, - Michael
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, 2 Mar 2007, Ingo Molnar wrote: > > * Davide Libenzi wrote: > > > I think that the "dirty" FPU context must, at least, follow the new > > head. That's what the userspace sees, and you don't want an async_exec > > to re-emerge with a different FPU context. > > well. I think there's some confusion about terminology, so please let me > describe everything in detail. This is how execution goes: > > outer loop() { > call_threadlet(); > } > > this all runs in the 'head' context. call_threadlet() always switches to > the 'threadlet stack'. The 'outer context' runs in the 'head stack'. If, > while executing the threadlet function, we block, then the > threadlet-thread gets to keep the task (the threadlet stack and also the > FPU), and blocks - and we pick a 'new head' from the thread pool and > continue executing in that context - right after the call_threadlet() > function, in the 'old' head's stack. I.e. it's as if we returned > immediately from call_threadlet(), with a return code that signals that > the 'threadlet went async'. > > now, the FPU state that was there when the threadlet blocked is totally > meaningless to the 'new head' - that FPU state is from the middle of the > threadlet execution. For threadlets, it might be. Now think about a task wanting to dispatch N parallel AIO requests as N independent syslets. Think about this task having USEDFPU set, so the FPU context is dirty. When it returns from async_exec, with one of the requests becoming sleepy, it needs to have the same FPU context it had when it entered, otherwise it won't prolly be happy. For the same reason a schedule() must preserve/sync the "prev" FPU context, to be reloaded at the next FPU fault. > > So, IMO, if the USEDFPU bit is set, we need to sync the dirty FPU > > context with an early unlazy_fpu(), *and* copy the sync'd FPU context > > to the new head. This should really be a fork of the dirty FPU context > > IMO, and should only happen if the USEDFPU bit is set. > > why?
> The only effect this will have is a slowdown :) The FPU context > from the middle of the threadlet function is totally meaningless to the > 'new head'. It might be anything. (although in practice system calls are > almost never called with a truly in-use FPU.)

See above ;)

- Davide

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Davide Libenzi wrote: > I think that the "dirty" FPU context must, at least, follow the new > head. That's what the userspace sees, and you don't want an async_exec > to re-emerge with a different FPU context.

well. I think there's some confusion about terminology, so please let me describe everything in detail. This is how execution goes:

    outer loop() {
            call_threadlet();
    }

this all runs in the 'head' context. call_threadlet() always switches to the 'threadlet stack'. The 'outer context' runs in the 'head stack'. If, while executing the threadlet function, we block, then the threadlet-thread gets to keep the task (the threadlet stack and also the FPU), and blocks - and we pick a 'new head' from the thread pool and continue executing in that context - right after the call_threadlet() function, in the 'old' head's stack. I.e. it's as if we returned immediately from call_threadlet(), with a return code that signals that the 'threadlet went async'.

now, the FPU state that was live when the threadlet blocked is totally meaningless to the 'new head' - that FPU state is from the middle of the threadlet execution. and here is where thinking about threadlets as a function call and not as an asynchronous context helps a lot: the classic gcc convention for FPU use & function calls should apply: gcc does not call an external function with an in-use FPU stack/register, it always neatly unuses it, as no FPU register is callee-saved, all are caller-saved.

> So, IMO, if the USEDFPU bit is set, we need to sync the dirty FPU > context with an early unlazy_fpu(), *and* copy the sync'd FPU context > to the new head. This should really be a fork of the dirty FPU context > IMO, and should only happen if the USEDFPU bit is set.

why? The only effect this will have is a slowdown :) The FPU context from the middle of the threadlet function is totally meaningless to the 'new head'. It might be anything. (although in practice system calls are almost never called with a truly in-use FPU.)
Ingo
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, 2 Mar 2007, Ingo Molnar wrote: > > * Davide Libenzi wrote: > > > [...] We're still missing proper FPU context switch in the > > move_user_context(). [...] > > yeah - i'm starting to be of the opinion that the FPU context should > stay with the threadlet, exclusively. I.e. when calling a threadlet, the > 'outer loop' (the event loop) should not leak FPU context into the > threadlet and then expect it to be replicated from whatever random point > the threadlet ended up sleeping at. It would be possible, but it just > makes no sense. What makes most sense is to just keep the FPU context > with the threadlet, and to let the 'new head' use an initial (unused) > FPU context. And it's in fact the threadlet that will most likely have > an active FPU context across a system call, not the outer loop. In other > words: no special FPU support needed at all for threadlets (i.e. no > flipping needed even) - this behavior just naturally happens in the > current implementation. Hm?

I think that the "dirty" FPU context must, at least, follow the new head. That's what the userspace sees, and you don't want an async_exec to re-emerge with a different FPU context. I think it should also follow the async thread (old, going-to-sleep, thread), since a threadlet might have that dirtied, and as a consequence it'll want to find it back when it's re-scheduled.

So, IMO, if the USEDFPU bit is set, we need to sync the dirty FPU context with an early unlazy_fpu(), *and* copy the sync'd FPU context to the new head. This should really be a fork of the dirty FPU context IMO, and should only happen if the USEDFPU bit is set.

- Davide
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Davide Libenzi wrote: > [...] We're still missing proper FPU context switch in the > move_user_context(). [...]

yeah - i'm starting to be of the opinion that the FPU context should stay with the threadlet, exclusively. I.e. when calling a threadlet, the 'outer loop' (the event loop) should not leak FPU context into the threadlet and then expect it to be replicated from whatever random point the threadlet ended up sleeping at. It would be possible, but it just makes no sense. What makes most sense is to just keep the FPU context with the threadlet, and to let the 'new head' use an initial (unused) FPU context. And it's in fact the threadlet that will most likely have an active FPU context across a system call, not the outer loop. In other words: no special FPU support needed at all for threadlets (i.e. no flipping needed even) - this behavior just naturally happens in the current implementation. Hm?

Ingo
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, 2 Mar 2007, Davide Libenzi wrote: > And if you really feel raw about the single O(nready) loop that epoll > currently does, a new epoll_wait2 (or whatever) API could be used to > deliver the event directly into a userspace buffer [1], directly from the > poll callback, w/out extra delivery loops > (IRQ/event->epoll_callback->event_buffer).

And if you ever wonder where the "epoll" name came from: it came from the old /dev/epoll. The predecessor /dev/epoll added plugs everywhere events were needed, and delivered those events in O(1) *directly* into a user visible (mmap'd) buffer, in a zero-copy fashion. The old /dev/epoll was faster than the current epoll, but the latter was chosen because, despite being slightly slower, it had support for every pollable device, *without* adding more plugs into the existing code. Performance and code maintenance are not to be taken disjointly whenever you evaluate a solution. That's the reason I got excited about this new generic AIO solution.

- Davide
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, 2 Mar 2007, Ingo Molnar wrote: > > After your changes epoll increased to 5k. > > Can we please stop this pointless episode of benchmarketing, where every > mail of yours shows different results and you even deny having said > something which you clearly said just a few days ago? At this point i > simply cannot trust the numbers you are posting, nor is the discussion > style you are following productive in any way in my opinion.

Agreed. Can we focus on the topic here? We're still missing a proper FPU context switch in move_user_context(). In v6?

- Davide
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, 2 Mar 2007, Evgeniy Polyakov wrote: > do we really want to have per process signalfs, timerfs and so on - each > simple structure must be bound to a file, which becomes too costly.

I may be old school, but if you ask me, and if you *really* want those events, yes. Reason? Unix's everything-is-a-file rule, and being able to use them with the *existing* POSIX poll/select. Remember, not every app requires huge scalability efforts, so working with simpler and familiar APIs is always welcome. The *only* thing that was not practical to have as an fd was block requests. But maybe threadlets/syslets will handle those just fine, and close the gap.

- Davide
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, 2 Mar 2007, Evgeniy Polyakov wrote: > On Thu, Mar 01, 2007 at 11:31:14AM -0800, Davide Libenzi > (davidel@xmailserver.org) wrote: > > On Thu, 1 Mar 2007, Evgeniy Polyakov wrote: > > > > > Ingo, do you really think I will send mails with faked benchmarks? :)) > > > > I don't think he ever implied that. He was only suggesting that when you > > post benchmarks, and even more when you make claims based on benchmarks, > > you need to be extra careful about what you measure. Otherwise the > > external view that you give to others does not look good. > > Kevent can be really faster than epoll, but if you post broken benchmarks > > (that can be, unreliable HTTP loaders, broken server implementations, > > etc..) and make claims based on that, the only effect that you have is to > > lose your point. > > So, I only said that kevent is superior compared to epoll because (and > it is the _main_ issue) of its ability to handle essentially any kind of > events with very small overhead (the same as epoll has in struct file - > a list and spinlock) and without the significant price of binding struct > file to an event.

You'll have to excuse me if my memory is bad, but IIRC the whole discussion and long benchmark feast was born with you throwing a benchmark at Ingo (with kevent showing a 1.9x performance boost WRT epoll), not with you making any other point.

As for epoll not being able to handle other events: says who? Of course, with zero modifications, you can handle zero additional events. With modifications, you can handle other events. But let's talk about those other events. The *only* kind of event that ppl (and being the epoll maintainer I tend to receive those requests) missed in epoll was AIO events. That's the *only* thing that was missed by real life application developers. And if something like threadlets/syslets proves effective, the gap is closed WRT that requirement.
Epoll already handles the whole class of pollable devices inside the kernel, and if you exclude block AIO, that's a pretty wide class already. The *existing* f_op->poll subsystem can be used to deliver events at the poll-head wakeup time (by using the "key" member of the poll callback), so that you don't even need the extra f_op->poll call to fetch events. And if you really feel raw about the single O(nready) loop that epoll currently does, a new epoll_wait2 (or whatever) API could be used to deliver the event directly into a userspace buffer [1], directly from the poll callback, w/out extra delivery loops (IRQ/event->epoll_callback->event_buffer).

[1] From the epoll callback, we cannot sleep, so it's gonna be either an mlocked userspace buffer, or some kernel pages mapped to userspace.

- Davide
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, Mar 02, 2007 at 11:57:13AM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote: > > * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote: > > > > > > [...] The numbers are still highly suspect - and we are already > > > > > down from the prior claim of kevent being almost twice as fast > > > > > to a 25% difference. > > > > > > > > Btw, there was never an almost twice performance increase - epoll in > > > > my tests always showed 4-5 thousand requests per second, kevent - > > > > up to 7 thousand. > > > > > > i'm referring to your claim in this mail of yours from 4 days ago > > > for example: > > > > > > http://lkml.org/lkml/2007/2/25/116 > > > > > > "But note, that on my athlon64 3500 test machine kevent is about 7900 > > > requests per second compared to 4000+ epoll, so expect a challenge." > > > > > > no matter how i look at it, but 7900 is 1.9 times 4000 - which is > > > "almost twice". > > > > After your changes epoll increased to 5k. > > Can we please stop this pointless episode of benchmarketing, where every > mail of yours shows different results and you even deny having said > something which you clearly said just a few days ago? At this point i > simply cannot trust the numbers you are posting, nor is the discussion > style you are following productive in any way in my opinion.

I just show what I see in tests - I do not perform deep analysis of it, since I do not see why that should be done - it is not fake, it is not fantasy - it is real behaviour observed on my test machine, and if it suddenly changes I will report it. Btw, I showed cases when epoll behaved better than kevent and performance was an unbeatable 9k requests per second - I do not know why it happened - maybe some cache-related issues, all other processes sleeping at once, increased radiation or a strong wind that blew away my bad aura - and it is not reproducible on demand either.
> (you are never ever wrong, and if you are proven wrong on topic A you > claim it is an irrelevant topic (without even admitting you were wrong > about it) and you point to topic B claiming it's the /real/ topic you > talked about all along. And along the way you are slandering other > projects like epoll and threadlets, distorting the discussion. This kind > of keep-the-ball-moving discussion style is effective in politics but > IMO it's a waste of time when developing a kernel.)

Heh - that is why I'm not subscribed to lkml@ - it too frequently ends up in politics :) What are we talking about - are we trying to insult each other over something that was supposedly said as an assumption in a theoretical mental exercise? I can only laugh at that :)

Ingo, I never tried to show that something is broken - that is a fantasy built on bare words, not on real intention. I never said epoll is broken. Absolutely. I never said threadlets are broken. Absolutely. I just showed that it is not (in my opinion) the right decision to use threadlets for an IO-based model instead of an event driven one - that is not based on kevent performance (I _never_ stated it as a main factor - kevent was only an example of an event driven model; you confused it with kevent AIO, which is a different beast), but instead on experience with nptl threads and linuxthreads, and the related rescheduling overhead compared to a userspace one. I showed kevent as a possible usage scenario - since it does support its own AIO. And you started to fight against it in every detail, since you think kevent is not a good idea to handle the AIO model - well, that may be perfectly correct; I showed kevent AIO (please do not think that kevent and kevent AIO are the same - the latter is just one of the possible users I implemented, it only uses kevent to deliver the completion event to userspace) as a possible AIO implementation, but not _kevent_ itself.
But somehow we ended up with words attributed to me that I never said, and ideas I never based my assumptions on... I do not really think you ever meant to make anything personal out of what we discussed. We even concluded that a perfect IO model should use both approaches to really scale - both threadlets with their on-demand-only rescheduling, and an event driven ring. You stated your opinion on kevents - well, I can not agree with it, but it is your right not to like something. Let's not continue the bad practice of kicking each other just because there were some problematic roots which no one even remembers correctly - let's not make the mistake of turning trivial bits into something personal - if you are in Russia or around any time soon I will happily buy you a beer or whatever you prefer :)

So, let's just draw a line: kevent was shown to people, and its performance, although flaky, is a bit faster than epoll. Threadlets bound to any event driven ring do not show any performance degradation in a network driven setup with a small number of reschedulings, with all the advantages of simpler programming. So, repeating myself, both models (not kevent and
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, Mar 02, 2007 at 11:56:18AM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote: > > * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote: > > > Even if kevent has the same speed, it still allows to handle _any_ > > kind of events without any major surgery - a very tiny structure of > > lock and list head and you can process your own kernel event in > > userspace with timers, signals, io events, private userspace events > > and others without races and invention of different hacks for > > different types - _this_ is the main point. > > did it ever occur to you to ... extend epoll? To speed it up? To add a > new wait syscall to it? Instead of introducing a whole new parallel > framework?

Yes, I thought about extending it more than a year ago before I started kevent, but epoll() is absolutely based on the file structure and its file_operations with the poll method, so it is quite impossible to work with sockets to implement network AIO. Eventually it had gathered a lot of other systems - do we really want to have per process signalfs, timerfs and so on - each simple structure must be bound to a file, which becomes too costly.

> Ingo

-- Evgeniy Polyakov
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote: > > > > [...] The numbers are still highly suspect - and we are already > > > > down from the prior claim of kevent being almost twice as fast > > > > to a 25% difference. > > > > > > Btw, there was never an almost twice performance increase - epoll in > > > my tests always showed 4-5 thousand requests per second, kevent - > > > up to 7 thousand. > > > > i'm referring to your claim in this mail of yours from 4 days ago > > for example: > > > > http://lkml.org/lkml/2007/2/25/116 > > > > "But note, that on my athlon64 3500 test machine kevent is about 7900 > > requests per second compared to 4000+ epoll, so expect a challenge." > > > > no matter how i look at it, but 7900 is 1.9 times 4000 - which is > > "almost twice". > > After your changes epoll increased to 5k.

Can we please stop this pointless episode of benchmarketing, where every mail of yours shows different results and you even deny having said something which you clearly said just a few days ago? At this point i simply cannot trust the numbers you are posting, nor is the discussion style you are following productive in any way in my opinion.

(you are never ever wrong, and if you are proven wrong on topic A you claim it is an irrelevant topic (without even admitting you were wrong about it) and you point to topic B claiming it's the /real/ topic you talked about all along. And along the way you are slandering other projects like epoll and threadlets, distorting the discussion. This kind of keep-the-ball-moving discussion style is effective in politics but IMO it's a waste of time when developing a kernel.)

Ingo
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote: > Even if kevent has the same speed, it still allows to handle _any_ > kind of events without any major surgery - a very tiny structure of > lock and list head and you can process your own kernel event in > userspace with timers, signals, io events, private userspace events > and others without races and invention of different hacks for > different types - _this_ is the main point.

did it ever occur to you to ... extend epoll? To speed it up? To add a new wait syscall to it? Instead of introducing a whole new parallel framework?

Ingo
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, Mar 02, 2007 at 11:27:14AM +0100, Pavel Machek ([EMAIL PROTECTED]) wrote: > Maybe. It is not up to me to decide. But "it is faster" is _not_ the > only merge criterion.

Of course not! Even if kevent has the same speed, it still allows to handle _any_ kind of events without any major surgery - a very tiny structure of lock and list head and you can process your own kernel event in userspace with timers, signals, io events, private userspace events and others without races and invention of different hacks for different types - _this_ is the main point.

> Pavel > -- > (english) http://www.livejournal.com/~pavelmachek > (cesky, pictures) > http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

-- Evgeniy Polyakov
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
Hi!

> > > > If you can replace them with something simpler, and no worse than 10% > > > > slower in the worst case, then go ahead. (We actually tried to do that at > > > > some point, only to realize that efence stresses the vm subsystem in a very > > > > unexpected/unfriendly way). > > > > > > Agh, only 10% in the worst case. > > > I think you can not even imagine what tricks networking uses to get at > > > least an additional 1% out of the box. > > > > Yep? Feel free to rewrite networking in assembly on Eugenix. That > > should get you a 1% improvement. If you reserve a few registers to be only > > used by the kernel (not allowed to userspace), you can speed up networking > > 5%, too. Ouch, and you could turn off the MMU, that is a sure way to get a few > > more percent improvement in your networking case. > > It is not _my_ networking, but that one you use everyday in every Linux > box. Notice which tricks are used to remove a single byte from > sk_buff.

Ok, so the tricks were worth it in the sk_buff case.

> It is called optimization, and if it does us a single plus it must be > implemented. Not all people have a magical fear of new things.

But that does not mean "every optimization must be implemented". Only optimizations that are "worth it" are...

> > > Using such logic you can just abandon any further development, since it > > > works as is right now. > > > > Stop trying to pervert my logic. > > Ugh? :) > I just say in simple words your 'we do not need something if it adds 10%, > but is complex to understand'.

Yes... but that does not mean "stop development". You are still free to clean up the code _while_ making it faster.

> > If your code is so complex that it is almost impossible to use from > > userspace, that is good enough reason not to be merged. "But it is 3% > > faster if..." is not a good-enough argument. > > Is it enough for you? > > epoll 4794.23 req/sec > kevent 6468.95 req/sec

Maybe. It is not up to me to decide. But "it is faster" is _not_ the only merge criterion.
Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, Mar 01, 2007 at 11:31:14AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote: > On Thu, 1 Mar 2007, Evgeniy Polyakov wrote: > > > Ingo, do you really think I will send mails with faked benchmarks? :)) > > I don't think he ever implied that. He was only suggesting that when you > post benchmarks, and even more when you make claims based on benchmarks, > you need to be extra careful about what you measure. Otherwise the > external view that you give to others does not look good. > Kevent can be really faster than epoll, but if you post broken benchmarks > (that can be, unreliable HTTP loaders, broken server implementations, > etc..) and make claims based on that, the only effect that you have is to > lose your point.

We seem to have moved far away from the original topic - I never built any assumptions on top of kevent _performance_ - kevent is a logical extrapolation of epoll, I only showed that an event driven model can be fast and that it outperforms the threadlet one - after we changed topic we were unable to actually test threadlets in a networking environment, since the only test I ran showed that threadlets do not reschedule at all, and Ingo's tests showed a small number of reschedulings.

So, I only said that kevent is superior compared to epoll because (and it is the _main_ issue) of its ability to handle essentially any kind of events with very small overhead (the same as epoll has in struct file - a list and spinlock) and without the significant price of binding struct file to an event.

I did not want and do not want to hurt anyone (even Ingo, although he is against kevent :), but in my opinion the thread moved from a nice discussion about threads and events with jokes and fun into quite angry word throwing, and that is not good - let's make it fun again. I'm not a native english speaker (and do not use a dictionary), so it is quite possible that some of my phrases were not exactly nice, but it was unintentional (at least not very) :)

Peace?
> - Davide

-- Evgeniy Polyakov
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
On Thu, Mar 01, 2007 at 11:31:14AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote: On Thu, 1 Mar 2007, Evgeniy Polyakov wrote: Ingo, do you really think I will send mails with faked benchmarks? :)) I don't think he ever implied that. He was only suggesting that when you post benchmarks, and even more when you make claims based on benchmarks, you need to be extra carefull about what you measure. Otherwise the external view that you give to others does not look good. Kevent can be really faster than epoll, but if you post broken benchmarks (that can be, unrealiable HTTP loaders, broken server implemenations, etc..) and make claims based on that, the only effect that you have is to lose your point. We seems to move far away from original topic - I never built any assumptions on top of kevent _performance_ - kevent is a logical extrapolation of the epoll, I only showed that event driven model can be fast and it outperforms threadlet one - after we changed topic we were unable to actually test threadlets in networking environment, since the only test I ran showed that threadlest do not reschedule at all, and Ingo's tests showed small number of reschedulings. So, I only talked that kevent is superior compared to epoll because (and it is _main_ issue) of its ability to handle essentially any kind of events with very small overhead (the same as epoll has in struct file - list and spinlock) and without significant price of struct file binding to event. I did not want and do not want to hurt anyone (even Ingo, although he is against kevent :), but my opinion is that thread moved from nice discussion about threads and events with jokes and fun into quite angry word throwings, and that is too good - let's make it fun again. I'm not a native english speaker (and do not use a dictionary), so it is quite possible that some my phrases were not exactly nice, but it was unintentional (at least not very) :) Peace? 
- Davide -- Evgeniy Polyakov - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
Hi! If you can replace them with something simpler, and no worse than 10% slower in worst case, then go ahead. (We actually tried to do that at some point, only to realize that efence stresses vm subsystem in very unexpected/unfriendly way). Agh, only 10% in the worst case. I think you can not even imagine what tricks network uses to get at least aditional 1% out of the box. Yep? Feel free to rewrite networking to assembly on Eugenix. That should get you 1% improvement. If you reserve few registers to be only used by kernel (not allowed by userspace), you can speedup networking 5%, too. Ouch and you could turn off MMU, that is sure way to get few more percent improvement in your networking case. It is not _my_ networking, but taht one you use everyday in every Linux box. Notice which tricks are used to remove single byte from sk_buff. Ok, so tricks were worth it in sk_buff case. It is called optimization, and if it does us a single plus it must be implemented. Not all people have magical fear of new things. But that does not mean every optimalization must be implemented. Only optimalizations that are worth it are... Using such logic you can just abandon any further development, since it work as is right now. Stop trying to pervert my logic. Ugh? :) I just say in simple words your 'we do not need something if adds 10%, but is complex to understand'. Yes... but that does not mean stop development. You are still free to clean up the code _while_ making it faster. If your code is so complex that it is almost impossible to use from userspace, that is good enough reason not to be merged. But it is 3% faster if... is not a good-enough argument. Is it enough for you? epoll 4794.23 req/sec kevent 6468.95 req/sec Maybe. It is not up to me to decide. But it is faster is _not_ the only merge criterium. 
Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
On Fri, Mar 02, 2007 at 11:27:14AM +0100, Pavel Machek ([EMAIL PROTECTED]) wrote: Maybe. It is not up to me to decide. But it is faster is _not_ the only merge criterium. Of course not! Even if kevent has the same speed, it still allows to handle _any_ kind of events without any major surgery - a very tiny structure of lock and list head and you can process your own kernel event in userspace with timers, signals, io events, private userspace events and others without races and invention of differnet hacks for different types - _this_ is main point. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- Evgeniy Polyakov - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
* Evgeniy Polyakov [EMAIL PROTECTED] wrote: Even if kevent has the same speed, it still allows handling _any_ kind of event without any major surgery - a very tiny structure of a lock and a list head, and you can process your own kernel event in userspace with timers, signals, io events, private userspace events and others without races and without inventing different hacks for different types - _this_ is the main point. did it ever occur to you to ... extend epoll? To speed it up? To add a new wait syscall to it? Instead of introducing a whole new parallel framework? Ingo
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
* Evgeniy Polyakov [EMAIL PROTECTED] wrote: [...] The numbers are still highly suspect - and we are already down from the prior claim of kevent being almost twice as fast to a 25% difference. Btw, there never was an almost-twice performance increase - epoll in my tests always showed 4-5 thousand requests per second, kevent - up to 7 thousand. i'm referring to your claim in this mail of yours from 4 days ago for example: http://lkml.org/lkml/2007/2/25/116 But note, that on my athlon64 3500 test machine kevent is about 7900 requests per second compared to 4000+ epoll, so expect a challenge. no matter how i look at it, 7900 is 1.9 times 4000 - which is almost twice. After your changes epoll increased to 5k. Can we please stop this pointless episode of benchmarketing, where every mail of yours shows different results and you even deny having said something which you clearly said just a few days ago? At this point i simply cannot trust the numbers you are posting, nor is the discussion style you are following productive in any way in my opinion. (you are never ever wrong, and if you are proven wrong on topic A you claim it is an irrelevant topic (without even admitting you were wrong about it) and you point to topic B claiming it's the /real/ topic you talked about all along. And along the way you are slandering other projects like epoll and threadlets, distorting the discussion. This kind of keep-the-ball-moving discussion style is effective in politics but IMO it's a waste of time when developing a kernel.) Ingo
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
On Fri, Mar 02, 2007 at 11:56:18AM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote: * Evgeniy Polyakov [EMAIL PROTECTED] wrote: Even if kevent has the same speed, it still allows handling _any_ kind of event without any major surgery - a very tiny structure of a lock and a list head, and you can process your own kernel event in userspace with timers, signals, io events, private userspace events and others without races and without inventing different hacks for different types - _this_ is the main point. did it ever occur to you to ... extend epoll? To speed it up? To add a new wait syscall to it? Instead of introducing a whole new parallel framework? Yes, I thought about extending it more than a year ago, before I started kevent, but epoll() is absolutely based on the file structure and its file_operations with the poll method, so it is quite impossible to make it work with sockets to implement network AIO. Eventually it would have gathered a lot of other subsystems - do we really want to have per-process signalfs, timerfs and so on - each simple structure must be bound to a file, which becomes too costly. -- Evgeniy Polyakov
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
On Fri, Mar 02, 2007 at 11:57:13AM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote: * Evgeniy Polyakov [EMAIL PROTECTED] wrote: [...] The numbers are still highly suspect - and we are already down from the prior claim of kevent being almost twice as fast to a 25% difference. Btw, there never was an almost-twice performance increase - epoll in my tests always showed 4-5 thousand requests per second, kevent - up to 7 thousand. i'm referring to your claim in this mail of yours from 4 days ago for example: http://lkml.org/lkml/2007/2/25/116 But note, that on my athlon64 3500 test machine kevent is about 7900 requests per second compared to 4000+ epoll, so expect a challenge. no matter how i look at it, 7900 is 1.9 times 4000 - which is almost twice. After your changes epoll increased to 5k. Can we please stop this pointless episode of benchmarketing, where every mail of yours shows different results and you even deny having said something which you clearly said just a few days ago? At this point i simply cannot trust the numbers you are posting, nor is the discussion style you are following productive in any way in my opinion. I just show what I see in tests - I do not perform deep analysis of that, since I do not see why it should be done - it is not fake, it is not fantasy - it is real behaviour observed on my test machine, and if it suddenly changes I will report it. Btw, I showed cases when epoll behaved better than kevent and its performance was an unbeatable 9k requests per second - I do not know why it happened - maybe some cache-related issues, all other processes sleeping at once, increased radiation or a strong wind that blew away my bad aura - it is not reproducible on demand either. (you are never ever wrong, and if you are proven wrong on topic A you claim it is an irrelevant topic (without even admitting you were wrong about it) and you point to topic B claiming it's the /real/ topic you talked about all along.
And along the way you are slandering other projects like epoll and threadlets, distorting the discussion. This kind of keep-the-ball-moving discussion style is effective in politics but IMO it's a waste of time when developing a kernel.) Heh - that is why I'm not subscribed to lkml@ - it too frequently ends up in politics :) What are we talking about - we try to insult each other with something that was supposed to be said after some assumption in a theoretical mental exercise? I can only laugh at that :) Ingo, I never ever tried to show that something is broken - that is a fantasy based on bare words, not on the real intention. I never said epoll is broken. Absolutely. I never said threadlets are broken. Absolutely. I just showed that it is not (in my opinion) the right decision to use threadlets for an IO model instead of an event driven one - that is not based on kevent performance (I _never_ stated it as a main factor - kevent was only an example of an event driven model; you confused it with kevent AIO, which is a different beast), but instead on experience with nptl threads and linuxthreads, and the related rescheduling overhead compared to a userspace one. I showed kevent as a possible usage scenario - since it does support its own AIO. And you started to fight against it in every detail, since you think kevent is not a good idea to handle an AIO model - well, that can be perfectly correct; I showed kevent AIO (please do not think that kevent and kevent AIO are the same - the latter is just one of the possible users I implemented, and it only uses kevent to deliver completion events to userspace) as a possible AIO implementation, but not _kevent_ itself. But somehow we ended up with words bound to me that I never said and ideas I never based my assumptions on... I do not really think you even remotely wanted to make anything personal out of what we had discussed.
We even concluded that a perfect IO model should use both approaches to really scale - both threadlets with their on-demand-only rescheduling, and an event driven ring. You stated your opinion on kevents - well, I can not agree with it, but that is your right, not to like something. Let's not continue the bad practice of kicking each other just because there were some problematic roots which no one even remembers correctly - let's not make the mistake of reading something personal into trivial bits - if you are in Russia or nearby any time soon I will happily buy you a beer or whatever you prefer :) So, let's just draw a line: kevent was shown to people, and its performance, although flaky, is a bit faster than epoll. Threadlets bound to any event driven ring do not show any performance degradation in a network driven setup with a small number of reschedulings, with all the advantages of simpler programming. So, repeating myself, both models (not kevent and threadlet, but event driven and thread based) should be used to achieve the maximum
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
On Fri, 2 Mar 2007, Evgeniy Polyakov wrote: On Thu, Mar 01, 2007 at 11:31:14AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote: On Thu, 1 Mar 2007, Evgeniy Polyakov wrote: Ingo, do you really think I will send mails with faked benchmarks? :)) I don't think he ever implied that. He was only suggesting that when you post benchmarks, and even more when you make claims based on benchmarks, you need to be extra careful about what you measure. Otherwise the external view that you give to others does not look good. Kevent can really be faster than epoll, but if you post broken benchmarks (that can be unreliable HTTP loaders, broken server implementations, etc..) and make claims based on them, the only effect that you have is to lose your point. So, I only said that kevent is superior compared to epoll because (and it is the _main_ issue) of its ability to handle essentially any kind of event with very small overhead (the same as epoll has in struct file - a list and a spinlock) and without the significant price of binding a struct file to each event. You've to excuse me if my memory is bad, but IIRC the whole discussion and long benchmark feast was born with you throwing a benchmark at Ingo (with kevent showing a 1.9x performance boost WRT epoll), not with you making any other point. As far as epoll not being able to handle other events. Says who? Of course, with zero modifications, you can handle zero additional events. With modifications, you can handle other events. But let's talk about those other events. The *only* kind of event that ppl (and being the epoll maintainer I tend to receive those requests) missed in epoll was AIO events. That's the *only* thing that was missed by real life application developers. And if something like threadlets/syslets proves effective, the gap is closed WRT that requirement. Epoll already handles the whole class of pollable devices inside the kernel, and if you exclude block AIO, that's a pretty wide class already.
The *existing* f_op->poll subsystem can be used to deliver events at poll-head wakeup time (by using the key member of the poll callback), so that you don't even need the extra f_op->poll call to fetch events. And if you really feel raw about the single O(nready) loop that epoll currently does, a new epoll_wait2 (or whatever) API could be used to deliver the event directly into a userspace buffer [1], directly from the poll callback, w/out extra delivery loops (IRQ/event->epoll_callback->event_buffer). [1] From the epoll callback, we cannot sleep, so it's gonna be either an mlocked userspace buffer, or some kernel pages mapped to userspace. - Davide
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
On Fri, 2 Mar 2007, Evgeniy Polyakov wrote: do we really want to have per-process signalfs, timerfs and so on - each simple structure must be bound to a file, which becomes too costly. I may be old school, but if you ask me, and if you *really* want those events, yes. Reason? Unix's everything-is-a-file rule, and being able to use them with the *existing* POSIX poll/select. Remember, not every app requires huge scalability efforts, so working with simpler and familiar APIs is always welcome. The *only* thing that was not practical to have as an fd was block requests. But maybe threadlets/syslets will handle those just fine, and close the gap. - Davide
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
On Fri, 2 Mar 2007, Ingo Molnar wrote: After your changes epoll increased to 5k. Can we please stop this pointless episode of benchmarketing, where every mail of yours shows different results and you even deny having said something which you clearly said just a few days ago? At this point i simply cannot trust the numbers you are posting, nor is the discussion style you are following productive in any way in my opinion. Agreed. Can we focus on the topic here? We're still missing a proper FPU context switch in move_user_context(). In v6? - Davide
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
On Fri, 2 Mar 2007, Davide Libenzi wrote: And if you really feel raw about the single O(nready) loop that epoll currently does, a new epoll_wait2 (or whatever) API could be used to deliver the event directly into a userspace buffer [1], directly from the poll callback, w/out extra delivery loops (IRQ/event->epoll_callback->event_buffer). And if you ever wonder where the epoll name came from, it came from the old /dev/epoll. The epoll predecessor, /dev/epoll, added plugs everywhere events were needed and delivered those events in O(1) *directly* into a user visible (mmap'd) buffer, in a zero-copy fashion. The old /dev/epoll was faster than the current epoll, but the latter was chosen because, despite being slightly slower, it had support for every pollable device, *without* adding more plugs into the existing code. Performance and code maintenance are not to be taken disjointly whenever you evaluate a solution. That's the reason I got excited about this new generic AIO solution. - Davide
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
* Davide Libenzi davidel@xmailserver.org wrote: [...] We're still missing proper FPU context switch in the move_user_context(). [...] yeah - i'm starting to be of the opinion that the FPU context should stay with the threadlet, exclusively. I.e. when calling a threadlet, the 'outer loop' (the event loop) should not leak FPU context into the threadlet and then expect it to be replicated from whatever random point the threadlet ended up sleeping at. It would be possible, but it just makes no sense. What makes most sense is to just keep the FPU context with the threadlet, and to let the 'new head' use an initial (unused) FPU context. And it's in fact the threadlet that will most likely have an active FPU context across a system call, not the outer loop. In other words: no special FPU support needed at all for threadlets (i.e. no flipping needed even) - this behavior just naturally happens in the current implementation. Hm? Ingo
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
On Fri, 2 Mar 2007, Ingo Molnar wrote: * Davide Libenzi davidel@xmailserver.org wrote: [...] We're still missing proper FPU context switch in the move_user_context(). [...] yeah - i'm starting to be of the opinion that the FPU context should stay with the threadlet, exclusively. I.e. when calling a threadlet, the 'outer loop' (the event loop) should not leak FPU context into the threadlet and then expect it to be replicated from whatever random point the threadlet ended up sleeping at. It would be possible, but it just makes no sense. What makes most sense is to just keep the FPU context with the threadlet, and to let the 'new head' use an initial (unused) FPU context. And it's in fact the threadlet that will most likely have an active FPU context across a system call, not the outer loop. In other words: no special FPU support needed at all for threadlets (i.e. no flipping needed even) - this behavior just naturally happens in the current implementation. Hm? I think that the dirty FPU context must, at least, follow the new head. That's what the userspace sees, and you don't want an async_exec to re-emerge with a different FPU context. I think it should also follow the async thread (old, going-to-sleep, thread), since a threadlet might have dirtied it, and as a consequence it'll want to find it back when it's re-scheduled. So, IMO, if the USEDFPU bit is set, we need to sync the dirty FPU context with an early unlazy_fpu(), *and* copy the sync'd FPU context to the new head. This should really be a fork of the dirty FPU context IMO, and should only happen if the USEDFPU bit is set. - Davide
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
* Davide Libenzi davidel@xmailserver.org wrote: I think that the dirty FPU context must, at least, follow the new head. That's what the userspace sees, and you don't want an async_exec to re-emerge with a different FPU context. well. I think there's some confusion about terminology, so please let me describe everything in detail. This is how execution goes: outer loop() { call_threadlet(); } this all runs in the 'head' context. call_threadlet() always switches to the 'threadlet stack'. The 'outer context' runs in the 'head stack'. If, while executing the threadlet function, we block, then the threadlet-thread gets to keep the task (the threadlet stack and also the FPU), and blocks - and we pick a 'new head' from the thread pool and continue executing in that context - right after the call_threadlet() function, in the 'old' head's stack. I.e. it's as if we returned immediately from call_threadlet(), with a return code that signals that the 'threadlet went async'. now, the FPU state from when the threadlet blocked is totally meaningless to the 'new head' - that FPU state is from the middle of the threadlet execution. and here is where thinking about threadlets as a function call and not as an asynchronous context helps a lot: the classic gcc convention for FPU use across function calls should apply: gcc does not call an external function with an in-use FPU stack/register, it always neatly unuses it, as no FPU register is callee-saved, all are caller-saved. So, IMO, if the USEDFPU bit is set, we need to sync the dirty FPU context with an early unlazy_fpu(), *and* copy the sync'd FPU context to the new head. This should really be a fork of the dirty FPU context IMO, and should only happen if the USEDFPU bit is set. why? The only effect this will have is a slowdown :) The FPU context from the middle of the threadlet function is totally meaningless to the 'new head'. It might be anything. (although in practice system calls are almost never called with a truly in-use FPU.)
Ingo
Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3
On Fri, 2 Mar 2007, Ingo Molnar wrote: * Davide Libenzi davidel@xmailserver.org wrote: I think that the dirty FPU context must, at least, follow the new head. That's what the userspace sees, and you don't want an async_exec to re-emerge with a different FPU context. well. I think there's some confusion about terminology, so please let me describe everything in detail. This is how execution goes: outer loop() { call_threadlet(); } this all runs in the 'head' context. call_threadlet() always switches to the 'threadlet stack'. The 'outer context' runs in the 'head stack'. If, while executing the threadlet function, we block, then the threadlet-thread gets to keep the task (the threadlet stack and also the FPU), and blocks - and we pick a 'new head' from the thread pool and continue executing in that context - right after the call_threadlet() function, in the 'old' head's stack. I.e. it's as if we returned immediately from call_threadlet(), with a return code that signals that the 'threadlet went async'. now, the FPU state from when the threadlet blocked is totally meaningless to the 'new head' - that FPU state is from the middle of the threadlet execution. For threadlets, it might be. Now think about a task wanting to dispatch N parallel AIO requests as N independent syslets. Think about this task having USEDFPU set, so the FPU context is dirty. When it returns from async_exec, with one of the requests having gone to sleep, it needs to have the same FPU context it had when it entered, otherwise it probably won't be happy. For the same reason a schedule() must preserve/sync the prev FPU context, to be reloaded at the next FPU fault. So, IMO, if the USEDFPU bit is set, we need to sync the dirty FPU context with an early unlazy_fpu(), *and* copy the sync'd FPU context to the new head. This should really be a fork of the dirty FPU context IMO, and should only happen if the USEDFPU bit is set. why?
The only effect this will have is a slowdown :) The FPU context from the middle of the threadlet function is totally meaningless to the 'new head'. It might be anything. (although in practice system calls are almost never called with a truly in-use FPU.) See above ;) - Davide