Re: [ck] Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
On 3/12/07, jos poortvliet <[EMAIL PROTECTED]> wrote:

On Monday 12 March 2007, Con Kolivas wrote:
> On Tuesday 13 March 2007 01:14, Al Boldi wrote:
> > Con Kolivas wrote:
> > > > > The higher priority one always get 6-7ms whereas the lower priority
> > > > > one runs 6-7ms and then one larger perfectly bound expiration
> > > > > amount. Basically exactly as I'd expect. The higher priority task
> > > > > gets precisely RR_INTERVAL maximum latency whereas the lower
> > > > > priority task gets RR_INTERVAL min and full expiration (according
> > > > > to the virtual deadline) as a maximum. That's exactly how I intend
> > > > > it to work. Yes I realise that the max latency ends up being longer
> > > > > intermittently on the niced task but that's -in my opinion-
> > > > > perfectly fine as a compromise to ensure the nice 0 one always gets
> > > > > low latency.
> > > >
> > > > I think, it should be possible to spread this max expiration latency
> > > > across the rotation, should it not?
> > >
> > > There is a way that I toyed with of creating maps of slots to use for
> > > each different priority, but it broke the O(1) nature of the virtual
> > > deadline management. Minimising algorithmic complexity seemed more
> > > important to maintain than getting slightly better latency spreads for
> > > niced tasks. It also appeared to be less cache friendly in design. I
> > > could certainly try and implement it but how much importance are we to
> > > place on latency of niced tasks? Are you aware of any usage scenario
> > > where latency sensitive tasks are ever significantly niced in the real
> > > world?
> >
> > It only takes one negatively nice'd proc to affect X adversely.
>
> I have an idea. Give me some time to code up my idea. Lack of sleep is
> making me very unpleasant.

You're excited by RSDL and the positive comments, aren't you?
Well, don't forget to sleep, sleeping makes ppl smarter you know ;-)

IIRC, about two or three years ago (around the 2.6.10 timeframe), there
was a patch that passed interactivity from one app to another when there
was a pipe or UDP connection between them. A marked-as-interactive xterm
would, when blocked waiting for an X server response, transfer some of its
interactivity to the X server. Apparently it worked very well for desktop
workloads, so adapting it for this new scheduler might be worthwhile.

--
Greetz, Antonio Vargas aka winden of network
http://network.amigascne.org/
[EMAIL PROTECTED]
[EMAIL PROTECTED]

Every day, every year
you have to work
you have to study
you have to scene.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: State of Linux graphics
On 9/1/05, Alan Cox <[EMAIL PROTECTED]> wrote:
> On Thu, 2005-09-01 at 08:00 +0200, Antonio Vargas wrote:
> > 2. whole screen z-buffer, for depth comparison between the pixels
> > generated from each window.
>
> That one I question in part - if the rectangles are (as is typically the
> case) large then the Z buffer just ups the memory accesses. I guess for
> round windows it might be handy.
>

There are multiple ways to speed up a z-buffer:

1. Use a hierarchical z-buffer

Divide the screen into 16x16 pixel tiles and store a per-tile minimum
z value. When rendering a poly, first check the tile z against the poly z;
if the test fails you can skip 256 pixels in one go.

2. Use scanline-major rendering:

    for_each_scanline{
      clear_z_for_scanline();
      for_each_polygon{
        draw_pixels_for_current_polygon_and_scanline();
      }
    }

This is easily done by modeling the scanliner with a coroutine for each
polygon to be painted. The z-buffer is reduced to a single scanline and
is reused for all scanlines, so it's rather fast :)

--
Greetz, Antonio Vargas aka winden of network
http://wind.codepixel.com/

Things are not what they seem, except when they seem what they really are.
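The scanline-major loop above can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not code from the thread: windows are modeled as axis-aligned rectangles with a constant depth, smaller z means nearer, and the names `win_rect`, `render_scanline_major`, `SCREEN_W` and `SCREEN_H` are hypothetical. A plain loop stands in for the per-polygon coroutines.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SCREEN_W 8   /* tiny hypothetical screen, just for illustration */
#define SCREEN_H 4

/* Hypothetical window description: an axis-aligned rectangle with a
 * constant depth. Smaller z means closer to the viewer. */
struct win_rect {
    int x0, y0, x1, y1;   /* half-open: [x0, x1) x [y0, y1) */
    uint8_t z;            /* must be below 0xff, which marks "empty" */
    uint8_t color;
};

/* Render all windows scanline by scanline, reusing one scanline-sized
 * z-buffer instead of a full-screen one. */
void render_scanline_major(const struct win_rect *wins, int nwins,
                           uint8_t fb[SCREEN_H][SCREEN_W])
{
    uint8_t zline[SCREEN_W];

    assert(nwins >= 0);
    for (int y = 0; y < SCREEN_H; y++) {
        memset(zline, 0xff, sizeof zline);   /* clear z for this scanline */
        memset(fb[y], 0, SCREEN_W);          /* background color 0 */
        for (int i = 0; i < nwins; i++) {
            const struct win_rect *w = &wins[i];

            if (y < w->y0 || y >= w->y1)
                continue;                    /* window misses this line */
            for (int x = w->x0; x < w->x1; x++) {
                if (x < 0 || x >= SCREEN_W)
                    continue;                /* clip to the screen */
                if (w->z < zline[x]) {       /* depth test */
                    zline[x] = w->z;
                    fb[y][x] = w->color;
                }
            }
        }
    }
}
```

The depth buffer here costs `SCREEN_W` bytes regardless of screen height, which is the memory-traffic saving the message describes.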
Re: State of Linux graphics
On 9/1/05, Ian Romanick <[EMAIL PROTECTED]> wrote:
> Allen Akin wrote:
> > On Wed, Aug 31, 2005 at 02:06:54PM -0700, Keith Packard wrote:
> > |
> > | ...So far, 3D driver work has proceeded almost entirely on the
> > | newest documented hardware that people could get. Going back and
> > | spending months optimizing software 3D rendering code so that it works
> > | as fast as software 2D code seems like a thankless task.
> >
> > Jon's right about this: If you can accelerate a given simple function
> > (blending, say) for a 2D driver, you can accelerate that same function
> > in a Mesa driver for a comparable amount of effort, and deliver a
> > similar benefit to apps. (More apps, in fact, since it helps
> > OpenGL-based apps as well as Cairo-based apps.)
>
> The difference is that there is a much larger number of state
> combinations possible in OpenGL than in something stripped down for
> "just 2D". That can make it more difficult to know where to spend the
> time tuning. I've spent a fair amount of time looking at Mesa's texture
> blending code, so I know this to be true.
>
> The real route forward is to dig deeper into run-time code generation.
> There are a large number of possible combinations, but they all look
> pretty similar. This is ideal for run-time code gen. The problem is
> that writing correct, tuned assembly for this stuff takes a pretty
> experience developer, and writing correct, tuned code generation
> routines takes an even more experienced developer. Experienced and more
> experienced developers are, alas, in short supply.

Ian, the easy way would be to concentrate on 2D-like operations and
optimize them by hand. If _we_ are developing the OpenGL-using
application (xserver-over-opengl), we already know which OpenGL operations
and modes are needed, so we can concentrate on coding them in software.
And if this means that we have to detect the case when a triangle is
z-constant, then so be it.

Using an OSX desktop every day and having experience with software
graphics on small machines, and assuming OSX draws the screen just by
rendering each window to an offscreen buffer and then compositing,
our needs are:

1. Offscreen buffers that can be drawn into. We don't really need
anything fancy, just the ability to point the drawing operations at
another memory space. They should be any size, not just power-of-two.

2. Whole-screen z-buffer, for depth comparison between the pixels
generated from each window.

3. Texture+alpha (RGBA) triangles, using any offscreen buffer as a
texture. Texturing from a non-power-of-two texture has not been that
difficult since about '96 or '97.

4. Alpha blending, where the incoming alpha is used as a blending
factor with this equation:

    scr_color_new = scr_color * (1-alpha) + tex_color * alpha.

1+2+3 gives us a basic 3d-esque desktop. Adding 4 provides the
drop shadows ;)

But 3D software rendering is easily sped up by not using a z-buffer,
which is a PITA. Two approaches for solving this:

a. Just sort the polys (there are just 2 polys per window) back to front
and draw at screen-buffer flip. This is easy. Previous work I did suggests
you can reach 16fps for a 320x192x8bit screen with a 10 MIPS machine
([EMAIL PROTECTED]).

b. Implement a scanline z-buffer, where we paint by scanlines instead of
whole triangles. Drawing is delayed until screen-buffer flip, and then we
have an outer loop for each screen scanline, a middle loop for each poly
that is affected, and an inner loop for each pixel from that poly in that
scanline.

Software rendering is just detecting the common case and writing proper
code for it. It's not really that difficult to reach memory speed if you
simply forget about implementing all combinations of graphics modes.

> BTW, Alan, when are you going to start writing code again? >:)
>
> > So long as people are encouraged by word and deed to spend their time on
> > "2D" drivers, Mesa drivers will be further starved for resources and the
> > belief that OpenGL has nothing to offer "2D" apps will become
> > self-fulfilling.

--
Greetz, Antonio Vargas aka winden of network
http://wind.codepixel.com/
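The blending equation in item 4 can be sketched with 8-bit integer arithmetic as below. This is a minimal illustration under assumptions of my own: alpha is stored as 0..255 standing in for 0.0..1.0, the screen is packed RGB and the texture packed RGBA, and the names `blend_channel` and `blend_row` are hypothetical rather than taken from Mesa or any X server code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Blend one 8-bit channel: result = scr*(1-alpha) + tex*alpha,
 * with alpha in 0..255 standing in for 0.0..1.0. */
static inline uint8_t blend_channel(uint8_t scr, uint8_t tex, uint8_t alpha)
{
    unsigned v = scr * (255u - alpha) + tex * (unsigned)alpha;

    return (uint8_t)((v + 127u) / 255u);   /* rounded division by 255 */
}

/* Composite a row of RGBA texture pixels (4 bytes each) over a row of
 * RGB screen pixels (3 bytes each), in place. */
void blend_row(uint8_t *scr_rgb, const uint8_t *tex_rgba, size_t npixels)
{
    assert(scr_rgb != NULL && tex_rgba != NULL);
    for (size_t i = 0; i < npixels; i++) {
        uint8_t alpha = tex_rgba[4 * i + 3];

        for (int c = 0; c < 3; c++)
            scr_rgb[3 * i + c] = blend_channel(scr_rgb[3 * i + c],
                                               tex_rgba[4 * i + c], alpha);
    }
}
```

With approach (a), calling something like `blend_row` for each window row in back-to-front order would give the composited result without any depth buffer; the common fully opaque case (alpha 255) could be special-cased to a plain copy, in the spirit of "detect the common case".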
Re: [PATCH 0/5] Virtualization patches, set 4
On 8/24/05, Zachary Amsden <[EMAIL PROTECTED]> wrote:
> Transparent paravirtualization patches, set 4. This batch includes
> mostly MMU hooks that can be used by the hypervisor for page allocation,
> and allows the kernel to be compiled to step out of the way of the
> hypervisor by making a hole in linear address space.
>
> Patches are based off 2.6.13-rc6-mm2; I've tested i386 PAE and non-PAE
> as well as um-i386. Although these are mostly i386 specific, some of
> the concepts are starting to apply to virtualization of other
> architectures as well.

Zach, have you had a look at the mac-on-linux virtualizer? It's coded
for arch-ppc, but most probably some of these generic concepts could be
merged together...

--
Greetz, Antonio Vargas aka winden of network
http://wind.codepixel.com/
Re: a 15 GB file on tmpfs
On 7/20/05, Erik Mouw <[EMAIL PROTECTED]> wrote:
> On Wed, Jul 20, 2005 at 01:35:07PM +, Miquel van Smoorenburg wrote:
> > In article <[EMAIL PROTECTED]>,
> > Erik Mouw <[EMAIL PROTECTED]> wrote:
> > >On Wed, Jul 20, 2005 at 02:16:36PM +0200, Bastiaan Naber wrote:
> > >AFAIK you can't use a 15 GB tmpfs on i386 because large memory support
> > >is basically a hack to support multiple 4GB memory spaces (some VM guru
> > >correct me if I'm wrong).
> >
> > I'm no VM guru but I have a 32 bit machine here with 8 GB of
> > memory and 8 GB of swap:
> >
> > # mount -t tmpfs -o size=$((12*1024*1024*1024)) tmpfs /mnt
> > # df
> > Filesystem  1K-blocks     Used  Available  Use%  Mounted on
> > /dev/sda1    19228276  1200132   17051396    7%  /
> > tmpfs        12582912        0   12582912    0%  /mnt
> >
> > There you go, a 12 GB tmpfs. I haven't tried to create a 12 GB
> > file on it, though, since this is a production machine and it
> > needs the memory ..
>
> I stand corrected.
>
> > So yes that appears to work just fine.
>
> The question is if it's a good idea to use a 15GB tmpfs on a 32 bit
> i386 class machine. I guess a real 64 bit machine will be much faster
> in handling suchs amounts of data simply because you don't have to go
> through the hurdles needed to address such memory on i386.
>
>
> Erik
>

On 32-bit: you would have to use read() and write(), or mmap(), munmap()
and mremap(), to perform your own paging, since you can't fit 15 GB into
a 4 GB address space.

On 64-bit: you would simply mmap() the whole file and you are done.

Most probably, programming and debugging the hand-made paging on a 32-bit
machine will cost more than the price difference for a 64-bit machine.

--
Greetz, Antonio Vargas aka winden of network
http://wind.codepixel.com/
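To give a feel for the "hand-made paging" a 32-bit process would need, here is a minimal sketch that slides a fixed-size mmap() window over a large file. It is an illustration under assumptions, not a tested production design: the 1 MiB `WINDOW_SIZE`, the `file_window` struct and the `window_open`, `window_at` and `window_close` helpers are all hypothetical names, and only a single read-only window is kept mapped at a time.

```c
#define _FILE_OFFSET_BITS 64   /* ask for a 64-bit off_t on 32-bit builds */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical window size: 1 MiB, a multiple of common page sizes. */
#define WINDOW_SIZE ((off_t)1 << 20)

struct file_window {
    int fd;
    off_t file_size;
    off_t base;      /* file offset where the current window starts */
    size_t len;      /* length of the current mapping */
    uint8_t *map;    /* NULL while nothing is mapped */
};

/* Open `path` read-only and record its size. Returns 0 on success. */
int window_open(struct file_window *w, const char *path)
{
    struct stat st;

    assert(w != NULL);
    w->fd = open(path, O_RDONLY);
    if (w->fd < 0)
        return -1;
    if (fstat(w->fd, &st) != 0) {
        close(w->fd);
        return -1;
    }
    w->file_size = st.st_size;
    w->base = 0;
    w->len = 0;
    w->map = NULL;
    return 0;
}

/* Return a pointer to the byte at file offset `off`, sliding the mapped
 * window when `off` falls outside it. Returns NULL on error. */
const uint8_t *window_at(struct file_window *w, off_t off)
{
    if (off < 0 || off >= w->file_size)
        return NULL;
    if (w->map && off >= w->base && off - w->base < (off_t)w->len)
        return w->map + (size_t)(off - w->base);

    if (w->map) {
        munmap(w->map, w->len);          /* drop the old window */
        w->map = NULL;
    }
    off_t base = off - off % WINDOW_SIZE;
    off_t left = w->file_size - base;
    size_t len = (size_t)(left < WINDOW_SIZE ? left : WINDOW_SIZE);
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, w->fd, base);

    if (p == MAP_FAILED)
        return NULL;
    w->map = p;
    w->base = base;
    w->len = len;
    return w->map + (size_t)(off - base);
}

void window_close(struct file_window *w)
{
    if (w->map)
        munmap(w->map, w->len);
    close(w->fd);
}
```

On a 64-bit kernel and userland all of this collapses into one mmap() of the whole file, which is the cost argument made in the message.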
Re: a 15 GB file on tmpfs
On 7/20/05, Erik Mouw <[EMAIL PROTECTED]> wrote:
> On Wed, Jul 20, 2005 at 02:16:36PM +0200, Bastiaan Naber wrote:
> > I have a 15 GB file which I want to place in memory via tmpfs. I want to do
> > this because I need to have this data accessible with a very low seek time.
>
> That should be no problem on a 64 bit architecture.
>
> > I want to know if this is possible before spending 10,000 euros on a machine
> > that has 16 GB of memory.
>
> If you want to spend that amount of money on memory anyway, the extra
> cost for an AMD64 machine isn't that large.
>
> > The machine we plan to buy is a HP Proliant Xeon machine and I want to run a
> > 32 bit linux kernel on it (the xeon we want doesn't have the 64-bit stuff
> > yet)
>
> AFAIK you can't use a 15 GB tmpfs on i386 because large memory support
> is basically a hack to support multiple 4GB memory spaces (some VM guru
> correct me if I'm wrong). Just get an Athlon64 machine and run a 64 bit
> kernel on it. If compatibility is a problem, you can still run a 32 bit
> i386 userland on an x86_64 kernel.
>
>
> Erik
>
> --
> +-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
> | Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands

Bastiaan, Erik is dead-on on that one: go 64-bit and forget all worries
about your 15 GB file size. Just don't forget to look not only at x86-64
(Intel or AMD) but also at Itanium, ppc64 and s390 machines; you never
know about surprises!

--
Greetz, Antonio Vargas aka winden of network
http://wind.codepixel.com/
Re: prefetch on ppc64
On Wed, 30 Mar 2005 13:55:25 +1000, Paul Mackerras <[EMAIL PROTECTED]> wrote:
> Serge E. Hallyn writes:
>
> > While investigating the inordinate performance impact one of my patches
> > seemed to be having, we tracked it down to two hlist_for_each_entry
> > loops, and finally to the prefetch instruction in the loop.
>
> I would be interested to know what results you get if you leave the
> loops using hlist_for_each_entry but change prefetch() and prefetchw()
> to do the dcbt or dcbtst instruction only if the address is non-zero,
> like this:
>
> static inline void prefetch(const void *x)
> {
>         if (x)
>                 __asm__ __volatile__ ("dcbt 0,%0" : : "r" (x));
> }
>
> static inline void prefetchw(const void *x)
> {
>         if (x)
>                 __asm__ __volatile__ ("dcbtst 0,%0" : : "r" (x));
> }
>
> It seems that doing a prefetch on a NULL pointer, while it doesn't
> cause a fault, does waste time looking for a translation of the zero
> address.
>
> Paul.

I don't know exactly about POWER5, but the G5 processor is described in
IBM's docs as doing automatic whole-page prefetch read-ahead when it
detects linear accesses.

--
Greetz, Antonio Vargas aka winden of network
http://wind.codepixel.com/
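For readers without a PowerPC toolchain, the guarded prefetch quoted above can be approximated portably. The sketch below swaps the `dcbt` inline assembly for the `__builtin_prefetch` builtin that GCC and Clang provide, and uses a plain singly linked list in place of the kernel's hlist; `struct node`, `prefetch_nonnull`, `list_push` and `list_sum` are hypothetical names chosen for illustration. Whether the NULL check pays off depends on the CPU, as the thread discusses.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    struct node *next;
    int value;
};

/* Portable stand-in for the guarded dcbt quoted above: issue the hint
 * only for non-NULL addresses, so the end of a list costs nothing. */
static inline void prefetch_nonnull(const void *x)
{
    if (x)
        __builtin_prefetch(x, 0, 3);   /* 0 = read, 3 = high temporal locality */
}

/* Push `n` onto the front of the list and return the new head. */
struct node *list_push(struct node *head, struct node *n)
{
    assert(n != NULL);
    n->next = head;
    return n;
}

/* Walk the list, hinting the next node before using the current one. */
int list_sum(const struct node *head)
{
    int sum = 0;

    for (const struct node *n = head; n != NULL; n = n->next) {
        prefetch_nonnull(n->next);
        sum += n->value;
    }
    return sum;
}
```

The prefetch is only a hint, so `list_sum` returns the same result with or without it; any difference shows up in timing alone.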