Re: [ck] Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler

2007-03-12 Thread Antonio Vargas

On 3/12/07, jos poortvliet <[EMAIL PROTECTED]> wrote:

On Monday 12 March 2007, Con Kolivas wrote:
> On Tuesday 13 March 2007 01:14, Al Boldi wrote:
> > Con Kolivas wrote:
> > > > > The higher priority one always gets 6-7ms whereas the lower priority
> > > > > one runs 6-7ms and then one larger perfectly bound expiration
> > > > > amount. Basically exactly as I'd expect. The higher priority task
> > > > > gets precisely RR_INTERVAL maximum latency whereas the lower
> > > > > priority task gets RR_INTERVAL min and full expiration (according
> > > > > to the virtual deadline) as a maximum. That's exactly how I intend
> > > > > it to work. Yes I realise that the max latency ends up being longer
> > > > > intermittently on the niced task but that's -in my opinion-
> > > > > perfectly fine as a compromise to ensure the nice 0 one always gets
> > > > > low latency.
> > > >
> > > > I think, it should be possible to spread this max expiration latency
> > > > across the rotation, should it not?
> > >
> > > There is a way that I toyed with of creating maps of slots to use for
> > > each different priority, but it broke the O(1) nature of the virtual
> > > deadline management. Minimising algorithmic complexity seemed more
> > > important to maintain than getting slightly better latency spreads for
> > > niced tasks. It also appeared to be less cache friendly in design. I
> > > could certainly try and implement it but how much importance are we to
> > > place on latency of niced tasks? Are you aware of any usage scenario
> > > where latency sensitive tasks are ever significantly niced in the real
> > > world?
> >
> > It only takes one negatively nice'd proc to affect X adversely.
>
> I have an idea. Give me some time to code up my idea. Lack of sleep is
> making me very unpleasant.

You're excited by RSDL and the positive comments, aren't you? Well, don't
forget to sleep, sleeping makes ppl smarter you know ;-)




IIRC, about two or three years ago (maybe in the 2.6.10 timeframe),
there was a patch which managed to pass interactivity from one app
to another when there was a pipe or UDP connection between them. This
meant that a marked-as-interactive xterm would, when blocked waiting
for an X server response, transfer some of its interactiveness to the
X server. Apparently it worked very well for desktop workloads, so
adapting it for this new scheduler might be worthwhile.



--
Greetz, Antonio Vargas aka winden of network

http://network.amigascne.org/
[EMAIL PROTECTED]
[EMAIL PROTECTED]

Every day, every year
you have to work
you have to study
you have to scene.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: State of Linux graphics

2005-09-01 Thread Antonio Vargas
On 9/1/05, Alan Cox <[EMAIL PROTECTED]> wrote:
> On Thu, 2005-09-01 at 08:00 +0200, Antonio Vargas wrote:
> > 2. whole screen z-buffer, for depth comparison between the pixels
> > generated from each window.
> 
> That one I question in part - if the rectangles are (as is typically the
> case) large then the Z buffer just ups the memory accesses. I guess for
> round windows it might be handy.
> 

There are multiple ways to speed up a z-buffer:

1. Use a hierarchical z-buffer

Divide the screen into 16x16-pixel tiles and keep a per-tile minimum
depth value. When rendering a poly, first check the tile z against the
poly z; if the test fails you can skip 256 pixels in one go.

2. Use scanline-major rendering:

for_each_scanline{
  clear_z_for_scanline();
  for_each_polygon{
    draw_pixels_for_current_polygon_and_scanline();
  }
}

This is easily done by modeling the scanliner as a coroutine for each
polygon to be painted. The z-buffer is reduced to a single scanline and
is reused for all scanlines, so it's rather fast :)

-- 
Greetz, Antonio Vargas aka winden of network

http://wind.codepixel.com/

Things are not what they seem, except when they seem what they really are.


Re: State of Linux graphics

2005-09-01 Thread Antonio Vargas
On 9/1/05, Ian Romanick <[EMAIL PROTECTED]> wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> Allen Akin wrote:
> > On Wed, Aug 31, 2005 at 02:06:54PM -0700, Keith Packard wrote:
> > |
> > | ...So far, 3D driver work has proceeded almost entirely on the
> > | newest documented hardware that people could get. Going back and
> > | spending months optimizing software 3D rendering code so that it works
> > | as fast as software 2D code seems like a thankless task.
> >
> > Jon's right about this:  If you can accelerate a given simple function
> > (blending, say) for a 2D driver, you can accelerate that same function
> > in a Mesa driver for a comparable amount of effort, and deliver a
> > similar benefit to apps.  (More apps, in fact, since it helps
> > OpenGL-based apps as well as Cairo-based apps.)
> 
> The difference is that there is a much larger number of state
> combinations possible in OpenGL than in something stripped down for
> "just 2D".  That can make it more difficult to know where to spend the
> time tuning.  I've spent a fair amount of time looking at Mesa's texture
> blending code, so I know this to be true.
> 
> The real route forward is to dig deeper into run-time code generation.
> There are a large number of possible combinations, but they all look
> pretty similar.  This is ideal for run-time code gen.  The problem is
> that writing correct, tuned assembly for this stuff takes a pretty
> experience developer, and writing correct, tuned code generation
> routines takes an even more experienced developer.  Experienced and more
> experienced developers are, alas, in short supply.

Ian, the easy way would be to concentrate on 2D-like operations and
optimize them by hand. I mean, if _we_ are developing the OpenGL-using
application (X server over OpenGL), we already know which OpenGL
operations and modes are needed, so we can concentrate on coding them
in software. And if that means we have to detect the case where a
triangle is z-constant, then so be it.

Using an OS X desktop every day and having experience with software
graphics on small machines, and assuming OS X draws the screen just by
rendering each window to an offscreen buffer and then compositing, our
needs are:

1. Offscreen buffers that can be drawn into. We don't really need
anything fancy, just the ability to point the drawing operations at
another memory space. They should be any size, not just power-of-two.

2. A whole-screen z-buffer, for depth comparison between the pixels
generated from each window.

3. Textured alpha (RGBA) triangles, using any offscreen buffer as a
texture. Texturing from a non-power-of-two texture hasn't been
difficult since about '96 or '97.

4. Alpha blending, where the incoming alpha is used as a blending
factor with this equation: scr_color_new = scr_color * (1 - alpha) +
tex_color * alpha.

1+2+3 gives us a basic 3D-esque desktop; adding 4 provides the drop shadows ;)

But 3D software rendering is easily sped up by not using a z-buffer,
which is a PITA. Two approaches for solving this:

a. Just sort the polys (there are only 2 polys per window) back to
front and draw at screen-buffer flip. This is easy. Previous work I
did suggests you can reach 16fps for a 320x192x8bit screen on a 10
MIPS machine ([EMAIL PROTECTED]).

b. Implement a scanline z-buffer, where we paint by scanlines instead
of whole triangles. Drawing is delayed until screen-buffer flip, and
then we have an outer loop over each screen scanline, a middle loop
over each poly that touches it, and an inner loop over each pixel of
that poly in that scanline.

Software rendering is just a matter of detecting the common case and
coding a proper path for it. It's not really that difficult to reach
memory speed if you simply forget about implementing all combinations
of graphics modes.

> BTW, Alan, when are you going to start writing code again? >:)
> 
> > So long as people are encouraged by word and deed to spend their time on
> > "2D" drivers, Mesa drivers will be further starved for resources and the
> > belief that OpenGL has nothing to offer "2D" apps will become
> > self-fulfilling.
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.2.6 (GNU/Linux)
> 
> iD8DBQFDFnFQX1gOwKyEAw8RAgZsAJ9MoKf+JTX4OGrybrhD+i2axstONgCghwih
> /Bln/u55IJb3BMWBwVTA3sk=
> =k086
> -END PGP SIGNATURE-


-- 
Greetz, Antonio Vargas aka winden of network

http://wind.codepixel.com/

Things are not what they seem, except when they seem what they really are.


Re: [PATCH 0/5] Virtualization patches, set 4

2005-08-24 Thread Antonio Vargas
On 8/24/05, Zachary Amsden <[EMAIL PROTECTED]> wrote:
> Transparent paravirtualization patches, set 4.  This batch includes
> mostly MMU hooks that can be used by the hypervisor for page allocation,
> and allows the kernel to be compiled to step out of the way of the
> hypervisor by making a hole in linear address space.
> 
> Patches are based off 2.6.13-rc6-mm2; I've tested i386 PAE and non-PAE
> as well as um-i386.  Although these are mostly i386 specific, some of
> the concepts are starting to apply to virtualization of other
> architectures as well.

Zach, have you had a look at the mac-on-linux virtualizer? It's coded
for arch-ppc but most probably some of these generic concepts could
get merged together...

-- 
Greetz, Antonio Vargas aka winden of network

http://wind.codepixel.com/

Things are not what they seem, except when they seem what they really are.


Re: a 15 GB file on tmpfs

2005-07-20 Thread Antonio Vargas
On 7/20/05, Erik Mouw <[EMAIL PROTECTED]> wrote:
> On Wed, Jul 20, 2005 at 01:35:07PM +, Miquel van Smoorenburg wrote:
> > In article <[EMAIL PROTECTED]>,
> > Erik Mouw  <[EMAIL PROTECTED]> wrote:
> > >On Wed, Jul 20, 2005 at 02:16:36PM +0200, Bastiaan Naber wrote:
> > >AFAIK you can't use a 15 GB tmpfs on i386 because large memory support
> > >is basically a hack to support multiple 4GB memory spaces (some VM guru
> > >correct me if I'm wrong).
> >
> > I'm no VM guru but I have a 32 bit machine here with 8 GB of
> > memory and 8 GB of swap:
> >
> > # mount -t tmpfs -o size=$((12*1024*1024*1024)) tmpfs /mnt
> > # df
> > Filesystem   1K-blocks  Used Available Use% Mounted on
> > /dev/sda1 19228276   1200132  17051396   7% /
> > tmpfs 12582912 0  12582912   0% /mnt
> >
> > There you go, a 12 GB tmpfs. I haven't tried to create a 12 GB
> > file on it, though, since this is a production machine and it
> > needs the memory ..
> 
> I stand corrected.
> 
> > So yes that appears to work just fine.
> 
> The question is if it's a good idea to use a 15GB tmpfs on a 32 bit
> i386 class machine. I guess a real 64 bit machine will be much faster
> in handling suchs amounts of data simply because you don't have to go
> through the hurdles needed to address such memory on i386.
> 
> 
> Erik
> 

On 32-bit: you would have to use read() and write(), or
mmap()/munmap()/mremap(), to perform your own paging, since you can't
fit 15 GB into a 4 GB address space.

On 64-bit: you would simply mmap() the whole file and be done.

Most probably the cost of programming and debugging the hand-made
paging on 32-bit machines will exceed the price difference for a
64-bit machine.

-- 
Greetz, Antonio Vargas aka winden of network

http://wind.codepixel.com/

Things are not what they seem, except when they seem what they really are.


Re: a 15 GB file on tmpfs

2005-07-20 Thread Antonio Vargas
On 7/20/05, Erik Mouw <[EMAIL PROTECTED]> wrote:
> On Wed, Jul 20, 2005 at 02:16:36PM +0200, Bastiaan Naber wrote:
> > I have a 15 GB file which I want to place in memory via tmpfs. I want to do
> > this because I need to have this data accessible with a very low seek time.
> 
> That should be no problem on a 64 bit architecture.
> 
> > I want to know if this is possible before spending 10,000 euros on a machine
> > that has 16 GB of memory.
> 
> If you want to spend that amount of money on memory anyway, the extra
> cost for an AMD64 machine isn't that large.
> 
> > The machine we plan to buy is a HP Proliant Xeon machine and I want to run a
> > 32 bit linux kernel on it (the xeon we want doesn't have the 64-bit stuff
> > yet)
> 
> AFAIK you can't use a 15 GB tmpfs on i386 because large memory support
> is basically a hack to support multiple 4GB memory spaces (some VM guru
> correct me if I'm wrong). Just get an Athlon64 machine and run a 64 bit
> kernel on it. If compatibility is a problem, you can still run a 32 bit
> i386 userland on an x86_64 kernel.
> 
> 
> Erik
> 
> --
> +-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
> | Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands
> -

Bastiaan, Erik is dead-on on that one: go 64-bit and forget all worries
about your 15 GB file size. Just don't forget to look not only at
x86-64 (Intel or AMD) but also at Itanium, ppc64 and s390 machines; you
never know about surprises!

-- 
Greetz, Antonio Vargas aka winden of network

http://wind.codepixel.com/

Things are not what they seem, except when they seem what they really are.


Re: prefetch on ppc64

2005-03-29 Thread Antonio Vargas
On Wed, 30 Mar 2005 13:55:25 +1000, Paul Mackerras <[EMAIL PROTECTED]> wrote:
> Serge E. Hallyn writes:
> 
> > While investigating the inordinate performance impact one of my patches
> > seemed to be having, we tracked it down to two hlist_for_each_entry
> > loops, and finally to the prefetch instruction in the loop.
> 
> I would be interested to know what results you get if you leave the
> loops using hlist_for_each_entry but change prefetch() and prefetchw()
> to do the dcbt or dcbtst instruction only if the address is non-zero,
> like this:
> 
> static inline void prefetch(const void *x)
> {
> if (x)
> __asm__ __volatile__ ("dcbt 0,%0" : : "r" (x));
> }
> 
> static inline void prefetchw(const void *x)
> {
> if (x)
> __asm__ __volatile__ ("dcbtst 0,%0" : : "r" (x));
> }
> 
> It seems that doing a prefetch on a NULL pointer, while it doesn't
> cause a fault, does waste time looking for a translation of the zero
> address.
> 
> Paul.

Don't know exactly about POWER5, but the G5 processor is described in
IBM docs as doing automatic whole-page prefetch read-ahead when it
detects linear accesses.

-- 
Greetz, Antonio Vargas aka winden of network

http://wind.codepixel.com/

Things are not what they seem, except when they seem what they really are.

