Re: raid1 with nbd member hangs MD on SLES10 and RHEL5

2007-06-14 Thread Bill Davidsen
 timeout issue?


Just a quick update; it is really starting to look like there is
definitely an issue with the nbd kernel driver.  I booted the SLES10
2.6.16.46-0.12-smp kernel with maxcpus=1 to test the theory that the
nbd SMP fix that went into 2.6.16 was in some way causing this MD/NBD
hang.  But it _still_ occurs with the 4-step process I outlined above.

First, running an SMP kernel with maxcpus=1 is not the same as running a 
uniprocessor kernel, nor is the nosmp option. The code is different.


Second, AFAIK nbd hasn't worked in a while. I haven't tried it in ages, 
but was told it wouldn't work with smp and I kind of lost interest. If 
Neil thinks it should work in 2.6.21 or later I'll test it, since I have 
a machine which wants a fresh install soon, and is both backed up and 
available.

The nbd0 device _should_ feel an NBD_DISCONNECT because the nbd-server
is no longer running (the node it was running on was powered off)...
however the nbd-client is still connected to the kernel (meaning the
kernel didn't return an error back to userspace).
Also, MD is still blocking waiting to write the superblock (presumably
to nbd0). 
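
(For reference, the kind of configuration under discussion is a two-disk
mirror with one local member and one nbd member; a minimal sketch, with the
server address, port, and device names assumed rather than taken from the
original report:

  # on the client: attach the remote export, then mirror it with a local disk
  nbd-client 192.168.1.10 2000 /dev/nbd0
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/nbd0

The hang described above is what happens when the nbd-server side goes away
while md still has a superblock write outstanding to /dev/nbd0.)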


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: raid5: coding style cleanup / refactor

2007-06-14 Thread Bill Davidsen

Dan Williams wrote:

In other words, it seemed like a good idea at the time, but I am open
to suggestions.



I went ahead and added the cleanup patch to the front of the
git-md-accel.patch series.  A few more whitespace cleanups, but no
major changes from what I posted earlier.  The new rebased series is
still passing my tests and Neil's tests in mdadm. 
When you are ready for wider testing, a patch against a released kernel 
makes testing easy, since its characteristics are pretty well 
known already.


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: raid1 with nbd member hangs MD on SLES10 and RHEL5

2007-06-15 Thread Bill Davidsen

Paul Clements wrote:

Bill Davidsen wrote:

Second, AFAIK nbd hasn't worked in a while. I haven't tried it in 
ages, but was told it wouldn't work with smp and I kind of lost 
interest. If Neil thinks it should work in 2.6.21 or later I'll test 
it, since I have a machine which wants a fresh install soon, and is 
both backed up and available.


Please stop this. nbd is working perfectly fine, AFAIK. I use it every 
day, and so do 100s of our customers. What exactly is it that's not 
working? If there's a problem, please send the bug report.
Could you clarify what kernel, distribution, and mdadm versions are used, 
and how often the nbd server becomes unavailable to the clients? And 
your clients are SMP? By "working perfectly fine", I assume you do mean 
in the same way as described in the original posting, and not just with 
the client, server, and network all fully functional.


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: limits on raid

2007-06-17 Thread Bill Davidsen

Neil Brown wrote:

On Thursday June 14, [EMAIL PROTECTED] wrote:
  

On Fri, 15 Jun 2007, Neil Brown wrote:



On Thursday June 14, [EMAIL PROTECTED] wrote:
  

what is the limit for the number of devices that can be in a single array?

I'm trying to build a 45x750G array and want to experiment with the
different configurations. I'm trying to start with raid6, but mdadm is
complaining about an invalid number of drives

David Lang


man mdadm, and search for limits.  (forgive typos).
  

thanks.

why does it still default to the old format after so many new versions? 
(by the way, the documentation said 28 devices, but I couldn't get it to 
accept more than 27)



Dunno - maybe I can't count...

  

it's now churning away 'rebuilding' the brand new array.

a few questions/thoughts.

why does it need to do a rebuild when making a new array? couldn't it 
just zero all the drives instead? (or better still just record most of the 
space as 'unused' and initialize it as it starts using it?)



Yes, it could zero all the drives first.  But that would take the same
length of time (unless p/q generation was very very slow), and you
wouldn't be able to start writing data until it had finished.
You can dd /dev/zero onto all drives and then create the array with
--assume-clean if you want to.  You could even write a shell script to
do it for you.
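
(A minimal sketch of that shell script, with device names, level, and member
count assumed for illustration; zeroed members give an all-zero, and therefore
consistent, parity for RAID5:

  # zero every member first (dd stops with "no space left" at the end of
  # each device), then tell mdadm the parity is already consistent
  for d in /dev/sdb1 /dev/sdc1 /dev/sdd1; do
      dd if=/dev/zero of=$d bs=1M
  done
  mdadm --create /dev/md0 --level=5 --raid-devices=3 --assume-clean \
        /dev/sdb1 /dev/sdc1 /dev/sdd1
)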

Yes, you could record which space is used vs unused, but I really
don't think the complexity is worth it.

  
How about a simple solution which would get an array online and still 
be safe? All it would take is a flag which forced reconstruct writes for 
RAID-5. You could set it with an option, or automatically if someone 
puts --assume-clean with --create, and leave it in the superblock until the 
first repair runs to completion. And for repair you could make some 
assumptions about bad parity being caused not by an error but simply by 
unwritten stripes.


Thought 2: I think the unwritten bit is easier than you think, you only 
need it on parity blocks for RAID5, not on data blocks. When a write is 
done, if the bit is set do a reconstruct, write the parity block, and 
clear the bit. Keeping a bit per data block is madness, and appears to 
be unnecessary as well.
while I consider zfs to be ~80% hype, one advantage it could have (but I 
don't know if it has) is that since the filesystem and raid are integrated 
into one layer they can optimize the case where files are being written 
onto unallocated space and instead of reading blocks from disk to 
calculate the parity they could just put zeros in the unallocated space, 
potentially speeding up the system by reducing the amount of disk I/O.



Certainly.  But the raid doesn't need to be tightly integrated
into the filesystem to achieve this.  The filesystem need only know
the geometry of the RAID and when it comes to write, it tries to write
full stripes at a time.  If that means writing some extra blocks full
of zeros, it can try to do that.  This would require a little bit
better communication between filesystem and raid, but not much.  If
anyone has a filesystem that they want to be able to talk to raid
better, they need only ask...
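
(One way a filesystem can already be told the RAID geometry is at mkfs time;
a sketch assuming a RAID5 with a 64k chunk and four data disks, so writes can
be aligned to full stripes:

  # su = chunk size, sw = number of data disks in the stripe
  mkfs.xfs -d su=64k,sw=4 /dev/md0
)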
 
  
is there any way that linux would be able to do this sort of thing? or is 
it impossible due to the layering preventing the necessary knowledge from 
being in the right place?



Linux can do anything we want it to.  Interfaces can be changed.  All
it takes is a fairly well defined requirement, and the will to make it
happen (and some technical expertise, and lots of time  and
coffee?).
  
Well, I gave you two thoughts, one which would be slow until a repair 
but sounds easy to do, and one which is slightly harder but works better 
and minimizes performance impact.


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: limits on raid

2007-06-17 Thread Bill Davidsen

[EMAIL PROTECTED] wrote:

On Sat, 16 Jun 2007, Neil Brown wrote:


It would be possible to have a 'this is not initialised' flag on the
array, and if that is not set, always do a reconstruct-write rather
than a read-modify-write.  But the first time you have an unclean
shutdown you are going to resync all the parity anyway (unless you
have a bitmap) so you may as well resync at the start.

And why is it such a big deal anyway?  The initial resync doesn't stop
you from using the array.  I guess if you wanted to put an array into
production instantly and couldn't afford any slowdown due to resync,
then you might want to skip the initial resync but is that really
likely?


in my case it takes 2+ days to resync the array before I can do any 
performance testing with it. for some reason it's only doing the 
rebuild at ~5M/sec (even though I've increased the min and max rebuild 
speeds and a dd to the array seems to be ~44M/sec, even during the 
rebuild)
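
(For readers following along, the rebuild-speed limits being raised here are
the md throttles; a sketch, assuming the array is /dev/md0 and with values in
KB/sec:

  # system-wide limits
  echo 50000  > /proc/sys/dev/raid/speed_limit_min
  echo 200000 > /proc/sys/dev/raid/speed_limit_max
  # or per-array, on kernels with the md sysfs interface
  echo 50000  > /sys/block/md0/md/sync_speed_min
  echo 200000 > /sys/block/md0/md/sync_speed_max
)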


I want to test several configurations, from a 45 disk raid6 to a 45 
disk raid0. at 2-3 days per test (or longer, depending on the tests) 
this becomes a very slow process.


I've been doing stuff like this, but I just build the array on a 
partition per drive so the init is livable. For the stuff I'm doing a 
total of 500-100GB is ample to do performance testing.
also, when a rebuild is slow enough (and has enough of a performance 
impact) it's not uncommon to want to operate in degraded mode just 
long enough to get to a maintenance window and then recreate the array 
and reload from backup.


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: limits on raid

2007-06-21 Thread Bill Davidsen
I didn't get a comment on my suggestion for a quick and dirty fix for 
--assume-clean issues...


Bill Davidsen wrote:

Neil Brown wrote:

On Thursday June 14, [EMAIL PROTECTED] wrote:
 

it's now churning away 'rebuilding' the brand new array.

a few questions/thoughts.

why does it need to do a rebuild when making a new array? couldn't 
it just zero all the drives instead? (or better still just record 
most of the space as 'unused' and initialize it as it starts using 
it?)



Yes, it could zero all the drives first.  But that would take the same
length of time (unless p/q generation was very very slow), and you
wouldn't be able to start writing data until it had finished.
You can dd /dev/zero onto all drives and then create the array with
--assume-clean if you want to.  You could even write a shell script to
do it for you.

Yes, you could record which space is used vs unused, but I really
don't think the complexity is worth it.

  
How about a simple solution which would get an array online and still 
be safe? All it would take is a flag which forced reconstruct writes 
for RAID-5. You could set it with an option, or automatically if 
someone puts --assume-clean with --create, and leave it in the superblock 
until the first repair runs to completion. And for repair you could 
make some assumptions about bad parity being caused not by an error but 
simply by unwritten stripes.


Thought 2: I think the unwritten bit is easier than you think, you 
only need it on parity blocks for RAID5, not on data blocks. When a 
write is done, if the bit is set do a reconstruct, write the parity 
block, and clear the bit. Keeping a bit per data block is madness, and 
appears to be unnecessary as well.
while I consider zfs to be ~80% hype, one advantage it could have 
(but I don't know if it has) is that since the filesystem and raid 
are integrated into one layer they can optimize the case where files 
are being written onto unallocated space and instead of reading 
blocks from disk to calculate the parity they could just put zeros 
in the unallocated space, potentially speeding up the system by 
reducing the amount of disk I/O.



Certainly.  But the raid doesn't need to be tightly integrated
into the filesystem to achieve this.  The filesystem need only know
the geometry of the RAID and when it comes to write, it tries to write
full stripes at a time.  If that means writing some extra blocks full
of zeros, it can try to do that.  This would require a little bit
better communication between filesystem and raid, but not much.  If
anyone has a filesystem that they want to be able to talk to raid
better, they need only ask...
 
 
is there any way that linux would be able to do this sort of thing? 
or is it impossible due to the layering preventing the necessary 
knowledge from being in the right place?



Linux can do anything we want it to.  Interfaces can be changed.  All
it takes is a fairly well defined requirement, and the will to make it
happen (and some technical expertise, and lots of time  and
coffee?).
  
Well, I gave you two thoughts, one which would be slow until a repair 
but sounds easy to do, and one which is slightly harder but works 
better and minimizes performance impact.





--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: limits on raid

2007-06-22 Thread Bill Davidsen

David Greaves wrote:

[EMAIL PROTECTED] wrote:

On Fri, 22 Jun 2007, David Greaves wrote:

That's not a bad thing - until you look at the complexity it brings 
- and then consider the impact and exceptions when you do, eg 
hardware acceleration? md information fed up to the fs layer for 
xfs? simple long term maintenance?


Often the benefits of the feature are well worth these problems.

I _wonder_ if this is one where the right thing is to just say no :)
so for several reasons I don't see this as something that's deserving 
of an automatic 'no'


David Lang


Err, re-read it, I hope you'll see that I agree with you - I actually 
just meant the --assume-clean workaround stuff :)


If you end up 'fiddling' in md because someone specified 
--assume-clean on a raid5 [in this case just to save a few minutes 
*testing time* on system with a heavily choked bus!] then that adds 
*even more* complexity and exception cases into all the stuff you 
described.


A few minutes? Are you reading the times people are seeing with 
multi-TB arrays? Let's see, 5TB at a rebuild rate of 20MB/s... three days. 
And as soon as you believe that the array is actually usable you cut 
that rebuild rate, perhaps in half, and get dog-slow performance from 
the array. It's usable in the sense that reads and writes work, but for 
useful work it's pretty painful. You either fail to understand the 
magnitude of the problem or wish to trivialize it for some reason.


By delaying parity computation until the first write to a stripe only 
the growth of a filesystem is slowed, and all data are protected without 
waiting for the lengthy check. The rebuild speed can be set very low, 
because on-demand rebuild will do most of the work.


I'm very much for the fs layer reading the lower block structure so I 
don't have to fiddle with arcane tuning parameters - yes, *please* 
help make xfs self-tuning!


Keeping life as straightforward as possible low down makes the upwards 
interface more manageable and that goal more realistic... 


Those two paragraphs are mutually exclusive. The fs can be simple 
because it rests on a simple device, even if the simple device is 
provided by LVM or md. And LVM and md can stay simple because they rest 
on simple devices, even if they are provided by PATA, SATA, nbd, etc. 
Independent layers make each layer more robust. If you want to 
compromise the layer separation, some approach like ZFS with full 
integration would seem to be promising. Note that layers allow 
specialized features at each point, trading integration for flexibility.


My feeling is that full integration and independent layers each have 
benefits; as you connect the layers to expose operational details you 
need to handle changes in those details, which would seem to make layers 
more complex. What I'm looking for here is better performance in one 
particular layer, the md RAID5 layer. I like to avoid unnecessary 
complexity, but I feel that the current performance suggests room for 
improvement.


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-05-31 Thread Bill Davidsen

Neil Brown wrote:

On Monday May 28, [EMAIL PROTECTED] wrote:
  

There are two things I'm not sure you covered.

First, disks which don't support flush but do have a cache dirty 
status bit you can poll at times like shutdown. If there are no drivers 
which support these, it can be ignored.



There are really devices like that?  So to implement a flush, you have
to stop sending writes and wait and poll - maybe poll every
millisecond?
  


Yes, there really are (or were). But I don't think that there are 
drivers, so it's not an issue.

That wouldn't be very good for performance  maybe you just
wouldn't bother with barriers on that sort of device?
  


That is why there are no drivers...

Which reminds me:  What is the best way to turn off barriers?
Several filesystems have -o nobarriers or -o barriers=0,
or the inverse.
  


If they can function usefully without, the admin gets to make that choice.
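
(The per-filesystem switch already exists as a mount option; a sketch, with
the mount point assumed and option names that vary by filesystem, so treat
these as illustrative:

  # ext3: disable write barriers for this mount
  mount -o barrier=0 /dev/md0 /data
  # xfs: the equivalent switch
  mount -o nobarrier /dev/md0 /data
)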

md/raid currently uses barriers to write metadata, and there is no
way to turn that off.  I'm beginning to wonder if that is best.
  


I don't see how you can have reliable operation without it, particularly 
WRT bitmap.

Maybe barrier support should be a function of the device.  i.e. the
filesystem or whatever always sends barrier requests where it thinks
it is appropriate, and the block device tries to honour them to the
best of its ability, but if you run
   blockdev --enforce-barriers=no /dev/sda
then you lose some reliability guarantees, but gain some throughput (a
bit like the 'async' export option for nfsd).

  
Since this is device dependent, it really should be in the device 
driver, and requests should have status of success, failure, or feature 
unavailability.




Second, NAS (including nbd?). Is there enough information to handle this 
really right?



NAS means lots of things, including NFS and CIFS where this doesn't
apply.
  


Well, we're really talking about network attached devices rather than 
network filesystems. I guess people do lump them together.



For 'nbd', it is entirely up to the protocol.  If the protocol allows
a barrier flag to be sent to the server, then barriers should just
work.  If it doesn't, then either the server disables write-back
caching, or flushes every request, or you lose all barrier
guarantees. 
  


Pretty much agrees with what I said above, it's at a level closer to the 
device, and status should come back from the physical i/o request.

For 'iscsi', I guess it works just the same as SCSI...
  


Hopefully.

--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-05-31 Thread Bill Davidsen

Jens Axboe wrote:

On Thu, May 31 2007, David Chinner wrote:
  

On Thu, May 31, 2007 at 08:26:45AM +0200, Jens Axboe wrote:


On Thu, May 31 2007, David Chinner wrote:
  

IOWs, there are two parts to the problem:

1 - guaranteeing I/O ordering
2 - guaranteeing blocks are on persistent storage.

Right now, a single barrier I/O is used to provide both of these
guarantees. In most cases, all we really need to provide is 1); the
need for 2) is a much rarer condition but still needs to be
provided.


if I am understanding it correctly, the big win for barriers is that you 
do NOT have to stop and wait until the data is on persistent media before 
you can continue.
  

Yes, if we define a barrier to only guarantee 1), then yes this
would be a big win (esp. for XFS). But that requires all filesystems
to handle sync writes differently, and sync_blockdev() needs to
call blkdev_issue_flush() as well

So, what do we do here? Do we define a barrier I/O to only provide
ordering, or do we define it to also provide persistent storage
writeback? Whatever we decide, it needs to be documented


The block layer already has a notion of the two types of barriers, with
a very small amount of tweaking we could expose that. There's absolutely
zero reason we can't easily support both types of barriers.
  

That sounds like a good idea - we can leave the existing
WRITE_BARRIER behaviour unchanged and introduce a new WRITE_ORDERED
behaviour that only guarantees ordering. The filesystem can then
choose which to use where appropriate



Precisely. The current definition of barriers is what Chris and I came
up with many years ago, when solving the problem for reiserfs
originally. It is by no means the only feasible approach.

I'll add a WRITE_ORDERED command to the #barrier branch, it already
contains the empty-bio barrier support I posted yesterday (well a
slightly modified and cleaned up version).

  
Wait. Do filesystems expect (depend on) anything but ordering now? Does 
md? Having users of barriers as they currently behave suddenly getting 
SYNC behavior where they expect ORDERED is likely to have a negative 
effect on performance. Or do I misread what is actually guaranteed by 
WRITE_BARRIER now, and a flush is currently happening in all cases?


And will this also be available to user space f/s, since I just proposed 
a project which uses one? :-(
I think the goal is good, more choice is almost always better choice, I 
just want to be sure there won't be big disk performance regressions.


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-06-01 Thread Bill Davidsen

Jens Axboe wrote:

On Thu, May 31 2007, Bill Davidsen wrote:
  

Jens Axboe wrote:


On Thu, May 31 2007, David Chinner wrote:
 
  

On Thu, May 31, 2007 at 08:26:45AM +0200, Jens Axboe wrote:
   


On Thu, May 31 2007, David Chinner wrote:
 
  

IOWs, there are two parts to the problem:

1 - guaranteeing I/O ordering
2 - guaranteeing blocks are on persistent storage.

Right now, a single barrier I/O is used to provide both of these
guarantees. In most cases, all we really need to provide is 1); the
need for 2) is a much rarer condition but still needs to be
provided.

   

if I am understanding it correctly, the big win for barriers is that 
you do NOT have to stop and wait until the data is on persistent media 
before you can continue.
 
  

Yes, if we define a barrier to only guarantee 1), then yes this
would be a big win (esp. for XFS). But that requires all filesystems
to handle sync writes differently, and sync_blockdev() needs to
call blkdev_issue_flush() as well

So, what do we do here? Do we define a barrier I/O to only provide
ordering, or do we define it to also provide persistent storage
writeback? Whatever we decide, it needs to be documented
   


The block layer already has a notion of the two types of barriers, with
a very small amount of tweaking we could expose that. There's absolutely
zero reason we can't easily support both types of barriers.
 
  

That sounds like a good idea - we can leave the existing
WRITE_BARRIER behaviour unchanged and introduce a new WRITE_ORDERED
behaviour that only guarantees ordering. The filesystem can then
choose which to use where appropriate
   


Precisely. The current definition of barriers is what Chris and I came
up with many years ago, when solving the problem for reiserfs
originally. It is by no means the only feasible approach.

I'll add a WRITE_ORDERED command to the #barrier branch, it already
contains the empty-bio barrier support I posted yesterday (well a
slightly modified and cleaned up version).

 
  
Wait. Do filesystems expect (depend on) anything but ordering now? Does 
md? Having users of barriers as they currently behave suddenly getting 
SYNC behavior where they expect ORDERED is likely to have a negative 
effect on performance. Or do I misread what is actually guaranteed by 
WRITE_BARRIER now, and a flush is currently happening in all cases?



See the above stuff you quote, it's answered there. It's not a change,
this is how the Linux barrier write has always worked since I first
implemented it. What David and I are talking about is adding a more
relaxed version as well, that just implies ordering.
  


I was reading the documentation in block/biodoc.txt, which seems to just 
say ordered:


   1.2.1 I/O Barriers

   There is a way to enforce strict ordering for i/os through barriers.
   All requests before a barrier point must be serviced before the barrier
   request and any other requests arriving after the barrier will not be
   serviced until after the barrier has completed. This is useful for higher
   level control on write ordering, e.g flushing a log of committed updates
   to disk before the corresponding updates themselves.

   A flag in the bio structure, BIO_BARRIER is used to identify a barrier i/o.
   The generic i/o scheduler would make sure that it places the barrier
   request and all other requests coming after it after all the previous
   requests in the queue. Barriers may be implemented in different ways
   depending on the driver. A SCSI driver for example could make use of
   ordered tags to preserve the necessary ordering with a lower impact on
   throughput. For IDE this might be two sync cache flush: a pre and post
   flush when encountering a barrier write.

The flush comment is associated with IDE, so it wasn't clear that the 
device cache is always cleared to force the data to the platter.


And will this also be available to user space f/s, since I just proposed 
a project which uses one? :-(



I see several uses for that, so I'd hope so.

  
I think the goal is good, more choice is almost always better choice, I 
just want to be sure there won't be big disk performance regressions.



We can't get more heavy weight than the current barrier, it's about as
conservative as you can get.

  



--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-06-01 Thread Bill Davidsen

Neil Brown wrote:

On Friday June 1, [EMAIL PROTECTED] wrote:
  

On Thu, May 31, 2007 at 02:31:21PM -0400, Phillip Susi wrote:


David Chinner wrote:
  

That sounds like a good idea - we can leave the existing
WRITE_BARRIER behaviour unchanged and introduce a new WRITE_ORDERED
behaviour that only guarantees ordering. The filesystem can then
choose which to use where appropriate

So what if you want a synchronous write, but DON'T care about the order? 
  

submit_bio(WRITE_SYNC, bio);

Already there, already used by XFS, JFS and direct I/O.



Are you sure?

You seem to be saying that WRITE_SYNC causes the write to be safe on
media before the request returns.  That isn't my understanding.
I think (from comments near the definition and a quick grep through
the code) that WRITE_SYNC expedites the delivery of the request
through the elevator, but doesn't do anything special about getting it
onto the media.


My impression is that the sync will return when the i/o has been 
delivered to the device, and will get special treatment by the elevator 
code (I looked quickly, more is needed). I'm sure someone will tell me 
if I misread this. ;-)


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: [dm-devel] Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-06-01 Thread Bill Davidsen

Jens Axboe wrote:

On Thu, May 31 2007, Phillip Susi wrote:
  

Jens Axboe wrote:


No Stephan is right, the barrier is both an ordering and integrity
constraint. If a driver completes a barrier request before that request
and previously submitted requests are on STABLE storage, then it
violates that principle. Look at the code and the various ordering
options.
  
I am saying that is the wrong thing to do.  Barrier should be about 
ordering only.  So long as the order they hit the media is maintained, 
the order the requests are completed in can change.  barrier.txt bears 



But you can't guarantee ordering without flushing the data out as well.
It all depends on the type of cache on the device, of course. If you
look at the ordinary sata/ide drive with write back caching, you can't
just issue the requests in order and pray that the drive cache will make
it to platter.

If you don't have write back caching, or if the cache is battery backed
and thus guaranteed to never be lost, maintaining order is naturally
enough.
  


Do I misread this? If ordered doesn't reach all the way to the platter 
then there will be failure modes which result in order not being preserved. 
Battery-backed cache doesn't prevent failures between the cache and the 
platter.


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-06-02 Thread Bill Davidsen

Jens Axboe wrote:

On Fri, Jun 01 2007, Bill Davidsen wrote:
  

Jens Axboe wrote:


On Thu, May 31 2007, Bill Davidsen wrote:
 
  

Jens Axboe wrote:
   


On Thu, May 31 2007, David Chinner wrote:

 
  

On Thu, May 31, 2007 at 08:26:45AM +0200, Jens Axboe wrote:
  
   


On Thu, May 31 2007, David Chinner wrote:

 
  

IOWs, there are two parts to the problem:

1 - guaranteeing I/O ordering
2 - guaranteeing blocks are on persistent storage.

Right now, a single barrier I/O is used to provide both of these
guarantees. In most cases, all we really need to provide is 1); the
need for 2) is a much rarer condition but still needs to be
provided.

  
   

if I am understanding it correctly, the big win for barriers is that 
you do NOT have to stop and wait until the data is on persistent 
media before you can continue.

 
  

Yes, if we define a barrier to only guarantee 1), then yes this
would be a big win (esp. for XFS). But that requires all filesystems
to handle sync writes differently, and sync_blockdev() needs to
call blkdev_issue_flush() as well

So, what do we do here? Do we define a barrier I/O to only provide
ordering, or do we define it to also provide persistent storage
writeback? Whatever we decide, it needs to be documented
  
   


The block layer already has a notion of the two types of barriers, with
a very small amount of tweaking we could expose that. There's 
absolutely

zero reason we can't easily support both types of barriers.

 
  

That sounds like a good idea - we can leave the existing
WRITE_BARRIER behaviour unchanged and introduce a new WRITE_ORDERED
behaviour that only guarantees ordering. The filesystem can then
choose which to use where appropriate
  
   


Precisely. The current definition of barriers is what Chris and I came
up with many years ago, when solving the problem for reiserfs
originally. It is by no means the only feasible approach.

I'll add a WRITE_ORDERED command to the #barrier branch, it already
contains the empty-bio barrier support I posted yesterday (well a
slightly modified and cleaned up version).


 
  
Wait. Do filesystems expect (depend on) anything but ordering now? Does 
md? Having users of barriers as they currently behave suddenly getting 
SYNC behavior where they expect ORDERED is likely to have a negative 
effect on performance. Or do I misread what is actually guaranteed by 
WRITE_BARRIER now, and a flush is currently happening in all cases?
   


See the above stuff you quote, it's answered there. It's not a change,
this is how the Linux barrier write has always worked since I first
implemented it. What David and I are talking about is adding a more
relaxed version as well, that just implies ordering.
 
  
I was reading the documentation in block/biodoc.txt, which seems to just 
say ordered:


   1.2.1 I/O Barriers

   There is a way to enforce strict ordering for i/os through barriers.
   All requests before a barrier point must be serviced before the barrier
   request and any other requests arriving after the barrier will not be
   serviced until after the barrier has completed. This is useful for higher
   level control on write ordering, e.g flushing a log of committed updates
   to disk before the corresponding updates themselves.

   A flag in the bio structure, BIO_BARRIER is used to identify a barrier i/o.
   The generic i/o scheduler would make sure that it places the barrier
   request and all other requests coming after it after all the previous
   requests in the queue. Barriers may be implemented in different ways
   depending on the driver. A SCSI driver for example could make use of
   ordered tags to preserve the necessary ordering with a lower impact on
   throughput. For IDE this might be two sync cache flush: a pre and post
   flush when encountering a barrier write.

The flush comment is associated with IDE, so it wasn't clear that the 
device cache is always cleared to force the data to the platter.



The above should mention that the ordered tag comment for SCSI assumes
that the drive uses write through caching. If it does, then an ordered
tag is enough. If it doesn't, then you need a bit more than that (a post
flush, after the ordered tag has completed).

  

Thanks, got it.

--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: scheduling oddity on 2.6.20.3 stock

2007-06-02 Thread Bill Davidsen

David Schwartz wrote:

bunzip2 -c $file.bz2 | gzip -9 > $file.gz


So here are some actual results from a dual P3-1Ghz machine (2.6.21.1,
CFSv9). First lets time each operation individually:

$ time bunzip2 -k linux-2.6.21.tar.bz2

real    1m5.626s
user    1m2.240s
sys     0m3.144s


$ time gzip -9 linux-2.6.21.tar

real    1m17.652s
user    1m15.609s
sys     0m1.912s

The compress was the most complex (no surprise there) but they are close
enough that efficient overlap will definitely affect the total wall time. If
we can both decompress and compress in 1:17, we are optimal. First, let's
try the normal way:

$ time (bunzip2 -c linux-2.6.21.tar.bz2 | gzip -9 > test1)

real    1m45.051s
user    2m16.945s
sys     0m2.752s

1:45, or 1/3 over optimal. Now, with a 32MB non-blocking cache between the
two processes ('accel' creates a 32MB cache and uses 'select' to fill from
stdin and empty to stdout without blocking either direction):

$ time (bunzip2 -c linux-2.6.21.tar.bz2 | ./accel | gzip -9 > test2)

real    1m18.361s
user    2m19.589s
sys     0m6.356s

Within testing accuracy of optimal.

So it's not the scheduler. It's the fact that bunzip2/gzip have inadequate
input/output buffering. I don't think it's unreasonable to consider this a
defect in those programs.


They are hardly designed to optimize this operation...

For a tunable buffer program allowing the buffer size and buffers in the 
pool to be set, see www.tmr.com/~public/source program ptbuf. I wrote it 
as a proof of concept for a pthreads presentation I was giving, and it 
happened to be useful.
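
A similar effect can be had from any large pipe buffer; a sketch using
mbuffer, assuming it is installed, with the 32MB size chosen to match the
test above:

  bunzip2 -c linux-2.6.21.tar.bz2 | mbuffer -m 32M | gzip -9 > test3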


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot


glitch1 results - 2.6.21.3-cfs-v15

2007-06-04 Thread Bill Davidsen
I have added cfs15 to the chart at 
www.tmr.com/~davidsen/sched_smooth_05.html and updated the source of the 
test at www.tmr.com/~public/source if anyone wants to run the test on their 
hardware.


I feel that on my hardware cfs-13 was the smoothest for this test and 
for watching videos. Even a relatively light load:

  nice -10 make -j4 -s

of a kernel would cause jumps in the video, gears or youtube.

--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot


Re: [2.6.21.1] resume doesn't run suspended kernel?

2007-06-05 Thread Bill Davidsen

Stefan Seyfried wrote:

Hi,

On Sat, May 26, 2007 at 06:42:37PM -0400, Bill Davidsen wrote:
  

I was testing susp2disk in 2.6.21.1 under FC6, to support reliable computing
environment (RCE) needs. The idea is that if power fails, after some short
time on UPS the system does susp2disk with a time set, and boots back every
so often to see if power is stable.



Interesting use case.
 
  

No, I don't want susp2mem until I debug it; the console comes up in a useless mode,
and console as kaleidoscope is not what I need.



You probably need to reset the video mode. Try the s2ram workaround,
specifically -m.

  

Anyway, I pulled the plug on the UPS, and the system shut down. But when it
powered up, it booted the default kernel rather than the test kernel, decided
that it couldn't resume, and then did a cold boot.

I can bypass this by making the debug kernel the default, but WHY? Is the
kernel not saved such that any kernel can be rolled back into memory and run?



The Kernel does nothing to the bootloader during suspend. The kernel does not
even know that you are using a bootloader and how it might be configured.

  
What I really expected is that what I was running would be saved, and 
resume would restore what I was running and then jump back to where it 
suspended itself; without having to address the issue of booting the 
right kernel, any functional kernel which was booted would then 
restore what was originally suspended.


From discussion here, I conclude that it could work that way but doesn't.

Userland has to do this (and SUSE's pm-utils actually do. I thought the
Fedora pm-utils also did, but I cannot say for sure). Just find out which
entry in menu.lst corresponds to the currently running kernel, and preselect
it for the next boot. It is doable.

So it's a problem of your distro's userland (and if you did not use
pm-hibernate to suspend, it is your very own problem).

You could of course simply go for GRUB's "default saved" and "savedefault"
feature, to always boot the last-booted kernel unless changed in the menu.
  
I'm being very careful to avoid changing the default boot kernel. If the 
system suspends (ie. deliberately) I want to resume in the running 
kernel, but if it crashes I want the cold boot to bring up a known 
stable kernel, even though that may be lacking in features, have an old 
scheduler, etc.
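
(For the GRUB-legacy route Stefan describes, the pieces are a "default saved"
line plus "savedefault" in each stanza that should be remembered; a sketch of
the menu.lst fragment, with file location, kernel path, and title assumed:

  # boot whichever entry was recorded last:
  default saved
  title test kernel
      kernel /vmlinuz-2.6.21.1 ro root=/dev/sda2
      # record this entry once it has booted:
      savedefault
)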


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: [Patch 04/18] include/linux/logfs.h

2007-06-05 Thread Bill Davidsen

Segher Boessenkool wrote:

It would be better if GCC had a 'nopadding' attribute which gave us what
we need without the _extra_ implications about alignment.


That's impossible; removing the padding from a struct
_will_ make accesses to its members unaligned (think
about arrays of that struct).
And many platforms happily support unaligned CPU access in hardware at a 
price in performance, while others support it in software at great cost 
in performance. None of that maps into impossible. Some I/O hardware may 
not support it at all and may require some bounce buffering, at a cost in 
memory and CPU.


None of that equates with impossible. It is readily argued that it could 
mean inadvisable on some architectures, slow as government assistance 
and ugly as the north end of a south-bound hedgehog, but it's not 
impossible.


Do NOT take this to mean I think it would be a good thing in a Linux 
kernel, or that it should be added to gcc, but in some use like embedded 
applications where memory use is an important cost driver, people are 
probably doing it already by hand to pack struct arrays into minimal 
bytes. It's neither impossible nor totally useless.


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: [RFC] Extend Linux to support proportional-share scheduling

2007-06-06 Thread Bill Davidsen

Willy Tarreau wrote:

On Tue, Jun 05, 2007 at 09:31:33PM -0700, Li, Tong N wrote:
  

Willy,

These are all good comments. Regarding the cache penalty, I've done some
measurements using benchmarks like SPEC OMP on an 8-processor SMP and
the performance with this patch was nearly identical to that with the
mainline. I'm sure some apps may suffer from the potentially more
migrations with this design. In the end, I think what we want is to
balance fairness and performance. This design currently emphasizes on
fairness, but it could be changed to relax fairness when performance
does become an issue (which could even be a user-tunable knob depending
on which aspect the user cares more).



Maybe storing in each task a small list of the 2 or 4 last CPUs used would
help the scheduler in trying to place them. I mean, let's say you have 10
tasks and 8 CPUs. You first assign tasks 1..8 CPUs 1..8 for 1 timeslice.
Then you will give 9..10 a run on CPUs 1..2, and CPUs 3..8 will be usable
for other tasks. It will be optimal to run tasks 3..8 on them. Then you will
stop some of those because they are in advance, and run 9..10 and 1..2
again. You'll have to switch 1..2 to another group of CPUs to maintain hot
cache on CPUs 1..2 for tasks 9..10. But another possibility would be to
consider that 9..10 and 1..2 have performed the same amount of work, so
let 9..10 take some advance and benefit from the hot cache, then try to
place 1..2 there again. But it will mean that 3..8 will now have run 2
timeslices more than others. At this moment, it should be wise to make
them sleep and keep their CPU history for future use.

Maybe on end-user systems, the CPUs history is not that important because
of the often small caches, but on high-end systems with large L2/L3 caches,
I think that we can often keep several tasks in the cache, justifying the
ability to select one of the last CPUs used.

  
CPU affinity to preserve cache is a very delicate balance. It makes 
sense to try to run a process on the same CPU, but even a few ms of 
running some other process is long enough to refill the cache with new 
contents (depending on what it does, obviously), so a long delay in 
running a process just to get it back onto the right CPU is not always a 
saving; the benefit of the previous CPU fades rapidly.


Some omnipotent scheduler would have a count of pages evicted from cache 
as process A runs, and deduct that from the affinity of process B 
previously on the same CPU. Then make a perfect decision when it's 
better to migrate the task and how far. Since the schedulers now being 
advanced are fair rather than perfect, everyone is making educated 
guesses on optimal process migration policy, migrating all threads to 
improve cache hits vs. spreading them to better run threads in parallel, etc.


For a desktop I want a scheduler which doesn't suck at the things I do 
regularly. For a server I'm more concerned with overall tps than the 
latency of one transaction. Most users would trade a slowdown in kernel 
compiles for being able to watch youtube while the compile runs, and 
conversely people with heavily loaded servers would usually trade a 
slower transaction for more of them per second. Obviously within 
reason... what people will tolerate is a bounded value.

Not an easy thing to do, but probably very complementary to your work IMHO.
  

Agree, not easy at all.

--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: [md-accel PATCH 00/19] md raid acceleration and the async_tx api

2007-06-27 Thread Bill Davidsen

Dan Williams wrote:

Greetings,

Per Andrew's suggestion this is the md raid5 acceleration patch set
updated with more thorough changelogs to lower the barrier to entry for
reviewers.  To get started with the code I would suggest the following
order:
[md-accel PATCH 01/19] dmaengine: refactor dmaengine around 
dma_async_tx_descriptor
[md-accel PATCH 04/19] async_tx: add the async_tx api
[md-accel PATCH 07/19] md: raid5_run_ops - run stripe operations outside 
sh-lock
[md-accel PATCH 16/19] dmaengine: driver for the iop32x, iop33x, and iop13xx 
raid engines

The patch set can be broken down into three main categories:
1/ API (async_tx: patches 1 - 4)
2/ implementation (md changes: patches 5 - 15)
3/ driver (iop-adma: patches 16 - 19)

I have worked with Neil to get approval of the category 2 changes.
However for the category 1 and 3 changes there was no obvious
merge-path/maintainer to work through.  I have thus far extrapolated
Neil's comments about 2 out to 1 and 3, Jeff gave some direction on a
early revision about the scalability of the API, and the patch set has
picked up various fixes and suggestions from being in -mm for a few
releases.  Please help me ensure that this code is ready for Linus to
pull for 2.6.23.

git://lost.foo-projects.org/~dwillia2/git/iop md-accel-linus
  


Dan, I hope you will release these as a patchset against 2.6.22 when 
it's out, or against 2.6.21. I find I have a lot more confidence in results, good 
or bad, when comparing something I have run in production with just one 
patchset added. There are enough other changes in an -rc to confuse the 
issue, and I don't run them in production (at least not usually).


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: How would I do this? (expert tricks) OT

2007-06-27 Thread Bill Davidsen

Marc Perkel wrote:

I have a server with port 25 closed. I want to be able
to run a script every time someone tries to connect to
port 25, but from the outside the port remains closed.
I need the script that I'm going to run to get the IP
address that tried to connect.

I know it's off topic but it's part of an experiment
to stop spam. 


Put a rule in iptables to jump to a user chain to do a log and drop. You 
are doing it the wrong way: you want to set syslog to write the log 
message to a FIFO and have a permanently running program reading it (I do 
just this for other things).


Alternatively you can use redirect to send it to a program of your 
choosing, which can run a script if you really want to. Beware that rate 
limiting is desirable if you are going to start a process for ANY type 
of attack packets.
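
A minimal sketch of the first approach, with the chain name and log prefix
invented for illustration (the connecting IP then appears in the kernel log
line that syslog hands to the FIFO reader):

  # user-defined chain that logs (rate limited) and drops
  iptables -N SMTPTRAP
  iptables -A SMTPTRAP -m limit --limit 10/minute -j LOG --log-prefix "smtp-probe: "
  iptables -A SMTPTRAP -j DROP
  # send all inbound port-25 attempts through it
  iptables -A INPUT -p tcp --dport 25 -j SMTPTRAP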


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot


Re: Question about fair schedulers

2007-06-27 Thread Bill Davidsen

Alberto Gonzalez wrote:

On Saturday 23 June 2007, Tom Spink wrote:

Alberto,

If you're feeling adventurous, grab the latest kernel and patch it
with Ingo's scheduler: CFS.

You may be pleasantly surprised.


Thanks, I might if I have the courage to patch and compile my own kernel :)

However, I'd also need to change all my applications to set them with the 
right priority to see the good results, so I think I might just wait until it 
lands in mainline.


In general not the case. I generally don't diddle my priorities, there's 
rarely a need.


Just to check if I understood everything correctly:

The mainline scheduler tries to be smart and guess the priority of each task, 
and while it mostly hits the nail right on the head, sometimes it hits you 
right in the thumb.


Fair schedulers, on the contrary, forget about trying to be smart and just 
care about being fair, leaving the priority settings to where they belong: 
applications.


Is this more or less correct?


Incomplete. The CFS scheduler seems to do better with latency, so you 
may get less CPU to a process but it doesn't wind up waiting a long time 
to get a fair share. So it feels better without micro tuning.


Face it, if you have more jobs than CPUs no scheduler is going to make 
you really happy.


Alberto.




--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot



Re: New format Intel microcode...

2007-06-28 Thread Bill Davidsen

Andi Kleen wrote:

Daniel J Blueman [EMAIL PROTECTED] writes:


On 23/03/07, Shaohua Li [EMAIL PROTECTED] wrote:

On Thu, 2007-03-22 at 23:45 +, Daniel J Blueman wrote:

Hi Shao-hua,

Is the tool you mentioned last June [1] available for splitting up the
old firmware files to the new format (eg
/lib/firmware/intel-ucode/06-0d-06), or are updates available from
Intel (or otherwise) in this new format?

Yes, we are preparing the new format data files and maybe put it into a
new website. We will announce it when it's ready.

It's been a while; is there any sign of the ucode updates being
available, especially in light of the C2D/Q incorrect TLB invalidation
+ recent ucode to fix this?


That microcode update is not needed on any recent Linux kernel; it flushes
the TLBs in a way that is fine.

Slashdot carried an article this morning saying that an error in Intel 
microcode was being fixed. However, it listed only Windows related sites 
for the fix download. Is this the same TLB issue? And are these really 
fixes for Windows to flush the TLB properly the way Linux does?


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot


Re: New format Intel microcode...

2007-06-28 Thread Bill Davidsen

Andi Kleen wrote:
Slashdot carried an article this morning saying that an error in Intel 
microcode was being fixed. However, it listed only Windows related sites 



That's a little misleading. Always dangerous getting your information
from slashdot. Let's say Intel clarified some corner 
cases in TLB flushing that have changed with Core2, and not everybody
got that right. I wouldn't say it was an Intel bug though.

  
Given that the Slashdot note was a pointer to Microsoft and echo of 
their statements of a firmware fix, and that same information is on the 
Microsoft site, I find it hard to find fault with them as a source for 
pointers and some context on why they might be useful. If Intel has 
released new microcode to address the issue, then it seems the code 
didn't function as desired, and it doesn't matter what you call it.
for the fix download. Is this the same TLB issue? And are these really 



I think so.

  

That was one question.

fixes for Windows to flush the TLB properly the way Linux does?



On newer Linux 2.6 yes. On 2.4/x86-64 you would need in theory the microcode 
update too.  (it'll probably show up at some point at the usual place 
http://urbanmyth.org/microcode/).  Linux/i386 is always fine.


But the problem is very obscure and you can likely ignore it too. If your 
machine crashes it's very likely something else.
  


I don't ignore anything I can fix. An ounce of prevention is worth a 
pound of cure. My systems don't currently crash, and that's the intended 
behavior.


I was mainly concerned with this being a new issue, and curious if 
Microsoft was calling an O/S bug a microcode fix, given that the 
average Windows user doesn't know microcode from nanotech anyway. The 
non-answer from Arjan didn't answer either, and started by calling the 
report FUD, implying that Slashdot was wrong (not about this), and 
issuing so little answer and so much obfuscation that I thought he might 
be running for President. ;-)


I'd like the microcode update, some people elsewhere speculate that user 
level code could affect reliability if not security. I worry that an old 
2.4 kernel would be an issue, even in kvm, if that were the case.


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: man-pages-2.59 and man-pages-2.60 are released

2007-06-28 Thread Bill Davidsen

Michael Kerrisk wrote:

Alexander,


I just released man-pages-2.59 and man-pages-2.60.

These releases are now available for download at:

http://www.kernel.org/pub/linux/docs/manpages


Yes, just this morning I decided to tidy away some of the old
tarballs into a newly created old directory.
 
There is one little problem with this: there is no stable URL for a given 
version. 


Well, there never really was.  To date, most old tarballs have
had only a limited life on kernel.org.

Why? I'm not questioning the policy; it's just that if HUGE kernel 
tarballs are kept available forever, a tiny man-pages tarball would hardly 
be a disk space issue.


This hurts, e.g., automated Linux From Scratch rebuilds (the 
official script grabs the URL from the book, but it becomes invalid too

soon).

Could you please, in order to avoid this, do what SAMBA team does: place 
into http://www.kernel.org/pub/linux/docs/manpages/Old not only old 
versions, but also the current version? This way, LFS will be sure that
the  2.60 version is always available as 


As noted above old versions never were always available on
kernel.org...

http://www.kernel.org/pub/linux/docs/manpages/Old/man-pages-2.60.tar.bz2 
(even if it is in fact the latest version).


How about a link in /pub/linux/docs/manpages/ of the form 
LATEST-IS-m.xy?  Rob Landley was wanting something like this,

and I guess it would be easy for LFS to build a simple
script that looks for that link and deduces man-pages-m.xy 
from it.  (I've just now created such a link in the directory,

as an example.)

Why not just a link with a fixed name (LATEST?) which could be updated? 
I assume installing a new version is automated: create and install the 
tarball, any needed links, the push to mirrors, and so on. So it would just 
be a single step added to an automated procedure. You could have a link in 
Old as requested, and any other links as well.
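
Either way, deducing the tarball name from the link is trivial for a rebuild 
script. A hedged sketch of the idea; the LATEST-IS- prefix comes from 
Michael's example, the rest is made up for illustration:

#include <stdio.h>
#include <string.h>

int main(void)
{
	/* The link name would really come from an FTP/HTTP listing. */
	const char *link = "LATEST-IS-2.60";
	const char *prefix = "LATEST-IS-";
	char url[256];

	if (strncmp(link, prefix, strlen(prefix)) != 0)
		return 1;
	snprintf(url, sizeof(url),
		 "http://www.kernel.org/pub/linux/docs/manpages/man-pages-%s.tar.bz2",
		 link + strlen(prefix));
	puts(url);	/* what the rebuild script would fetch */
	return 0;
}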


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: New format Intel microcode...

2007-06-28 Thread Bill Davidsen

Chuck Ebbert wrote:

On 06/28/2007 11:27 AM, Andi Kleen wrote:
But the problem is very obscure and you can likely ignore it too. If your 
machine crashes it's very likely something else.


What about deliberate exploits of these bugs from userspace? Theo thinks
they are possible...

Do you have any details? One of the folks in a chat was saying something 
similar, but thought that causing a crash was the extent of it, rather 
than any access violation. Obviously I don't know the extent of that 
claim, so more information would be good.


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v18

2007-07-02 Thread Bill Davidsen

Vegard Nossum wrote:

Hello,

On 6/23/07, Ingo Molnar [EMAIL PROTECTED] wrote:

i'm pleased to announce release -v18 of the CFS scheduler patchset.



As usual, any sort of feedback, bugreport, fix and suggestion is more
than welcome!


I have been running cfs-v18 for a couple of days now, and today I
stumbled upon a rather strange problem. Consider the following short
program:

while(1)
   printf("%ld\r", 1000 * clock() / CLOCKS_PER_SEC);

Running this in an xterm makes the xterm totally unresponsive. Ctrl-C
takes about two seconds to terminate the program, during which the
program will keep running. In fact, it seems that the longer it runs,
the longer it takes to terminate (towards 5 seconds after running for
a couple of minutes). This is rather surprising, as the rest of the
system is quite responsive (even remarkably so). I think this is also
in contrast with the expected behaviour, that Ctrl-C/program
termination should be prioritized somehow.
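
For anyone who wants to reproduce it, the complete program is just the 
obvious wrapper around those two lines:

#include <stdio.h>
#include <time.h>

int main(void)
{
	while (1)
		printf("%ld\r", 1000 * clock() / CLOCKS_PER_SEC);
	return 0;
}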

This sounds as though it might be related to the issues I see with my 
glitch1 script, posted here a while ago. With cfs-v18 the effect of 
having multiple xterms scrolling is obvious: occasionally they behave as 
if they were owed more CPU and get paid back all at once. I've seen 
this effect to one degree or another since cfs-v13, which did NOT show 
the effect.



Some other observations: X.Org seems to be running at about 75% CPU on
CPU 1, the xterm at about 45% on CPU 0, and a.out at about 20% on CPU
0. (HT processor)

Killing with -2 or -9 from another terminal works immediately. Ctrl-Z
takes the same time as Ctrl-C.

I think this is because the shell reading the keypress is seeing high 
latency, rather than the process taking a long time to react. I have 
been wrong before...


I read Ingo's reply to this, I'll gather the same information when the 
test machine is available later this morning and send it off to Ingo.



Another thing to note is that simply looping with no output retains
the expected responsiveness of the xterm. Printing i++ is somewhere
halfway in between.


See http://www.tmr.com/~public/source  (note the tilde) for glitch1.


Is this behaviour expected or even intended? My main point is that
Ctrl-C is a safety fallback which suddenly doesn't work as usual. I
might even go so far as to call it a regression.

I'd also like to point out that [EMAIL PROTECTED] seems to draw more CPU
than it should. Or, at least, in top, it shows up as using 50% CPU
even though other processes are demanding as much as they can get. The
FAH program should be running with idle priority. I expect it to fall
to near 0% when other programs are running at full speed, but it keeps
trotting along. And I am pretty sure that this is not due to SMP/HT (I
made sure to utilize both CPUs).

Lastly, I'd like to mention that I got BUGs (soft lockups) with -v8,
though it has not been reproducible with -v18, so I suppose it must
have been fixed already.

Otherwise, I am satisfied with the performance of CFS. Especially the
desktop is noticably smoother. Thanks!

Kind regards,
Vegard Nossum



--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Old bug in tg3 driver unfixed?

2007-07-02 Thread Bill Davidsen

Tim Boneko wrote:

Hello!
I am not subscribed to this list so please CC answers to my mail
address. THX!

I recently replaced the mainboard of one of my servers with a Tyan
Tomcat K8E. The onboard gigabit NIC is a Broadcom BCM5721. After
compiling and loading the tg3 driver in Kernel 2.6.21.5, the interface
could not be configured: Device not found.
While searching the net i found a few other people with the same problem
but no solution.

By coincidence I found that a simple ifconfig eth1 worked OK and
afterwards the device could be configured and used as desired. After
searching this list, i found this posting


Probably unrelated, but what's eth0?


http://uwsg.iu.edu/hypermail/linux/kernel/0409.0/0224.html

by someone with obviously the same problem.
Has some patch of the driver been reversed or is the hardware buggy?

BTW the chip is connected via PCI Express.

I have that chip in a system, but I didn't find it quickly; it may be in 
another machine, unless the controller which shows up as a 3C940 on my 
ASUS P4P800 is the Broadcom.



--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [2.6 patch] the overdue eepro100 removal

2007-07-02 Thread Bill Davidsen

Adrian Bunk wrote:

This patch contains the overdue removal of the eepro100 driver.

Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

The hardware supported by this driver is still in use, thanks. It's 
probably easier to leave the eepro100 driver in than to find anyone who 
wants to investigate why the other driver (e100, from memory) doesn't 
work with some cards. As I recall this was suggested over a year ago and 
it was decided to leave the driver in; all of the reasons for doing so still 
seem valid. There really doesn't seem to be a benefit to removal; it's not 
as if people are working night and day to support new cards for this chip.


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Suspend2 is getting a new name.

2007-07-02 Thread Bill Davidsen

Nigel Cunningham wrote:

Hi all.

Suspend2's name is changing to TuxOnIce.

This is for a couple of reasons:

In recent discussions on LKML, the point was made that the word Suspend is 
confusing. It is used to refer to both suspending to disk and suspending to 
ram. Life will be simpler if we more clearly differentiate the two.


The name Suspend2 came about a couple of years ago when we made the 2.0 
release and started self-hosting. If we ever get to a 3.0 release, the name 
could become even more confusing! (And there are already problems with people 
confusing the name with swsusp and talking about uswsusp as version 3!).


http://www.suspend2.net is still working at the moment, but we'll shift to 
http://www.tuxonice.net over the next while. The wiki and bugzilla are 
already done; email will remain on suspend2.net for a little while and git 
trees will be renamed at the time of the next stable release.


I guess this is good news, bad news time. The good news is that the 
suspend-with-working-resume project is still active; the bad news is 
that making provisions for long-term out-of-mainline operation sounds as 
if you have no hope of getting this code into the mainline kernel. :-(


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ata: Add the SW NCQ support to sata_nv for MCP51/MCP55/MCP61

2007-07-05 Thread Bill Davidsen

Zoltan Boszormenyi wrote:

Hi,

Zoltan Boszormenyi írta:

Hi,

I am testing your current code with akpm's beautifying patches
for about an hour now. I have seen no problems with it so far.


Still using the patch on 2.6.22-rc6 and no problems so far.
It's really stable. I am looking forward to the next version and
the inclusion into mainstream kernels. Thanks!

I am going to hold off on any more -rc testing, but if there's a patch 
against 2.6.22 when it releases, I would certainly try it on a system 
which is about to be redeployed. I'm also scheduling testing of several 
RAID queueing patches there.


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: blink driver power saving

2007-07-05 Thread Bill Davidsen

Pavel Machek wrote:


...drivers are not expected to act on their own. I was expecting to
get nice /sys/class/led* interface to my keyboard leds.

What's the benefit of such an interface? If you're able to trigger
keyboard LEDs via that interface, you're also able to use the ioctl()
on /dev/console.


Well, at least it is a standardized interface... plus it can do stuff
like blink that LED on disk access.

One of many useful things for a system without blinking lights: disk, 
network, thermal alert, etc. And a cheap helper for handicapped folks 
who can't hear an audible alert.



I think the intention of the blink driver was to have a *early* blink,
i.e. before initrd (and on systems without intrd, before the first
init script runs).


...and yes, it can autoblink, too. It should be even possible to set
default behaviour of led to blink, doing what the blink driver does,
but in a clean way.


Endlessly useful: alarm clock, non-fatal errors on boot, etc.

it would be nice
If this were done, priority levels would be nice, so that an I'm taking a 
dump or panic indication would block lower-level system uses like disk or 
network lights, and user applications would have some policy to put them 
higher or lower than the pseudo disk light (or not).

/not nice
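
For the visual-alert case, the /dev/console ioctl() route mentioned above 
is already enough to experiment with. A minimal sketch (run as root; 
passing KDSETLED a value with higher-order bits set gives the LEDs back 
to the kernel):

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/kd.h>

int main(void)
{
	int fd = open("/dev/console", O_WRONLY);
	int i;

	if (fd < 0)
		return 1;
	for (i = 0; i < 10; i++) {	/* crude visual alert: blink all three LEDs */
		ioctl(fd, KDSETLED, LED_SCR | LED_NUM | LED_CAP);
		usleep(200000);
		ioctl(fd, KDSETLED, 0);
		usleep(200000);
	}
	ioctl(fd, KDSETLED, 0xFF);	/* revert to showing the real keyboard state */
	close(fd);
	return 0;
}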

--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Sched - graphic smoothness under load - cfs-v13 sd-0.48

2007-05-22 Thread Bill Davidsen

Miguel Figueiredo wrote:

Bill Davidsen wrote:
I generated a table of results from the latest glitch1 script, using 
an HTML postprocessor I'm not *quite* ready to foist on the world. In 
any case it has some numbers for frames per second, fairness of the 
processor time allocated to the compute-bound processes which 
generate a lot of other screen activity for X, and my subjective 
comments on how smooth it looked and felt.


The chart is at http://www.tmr.com/~davidsen/sched_smooth_01.html for 
your viewing pleasure. The only tuned result was with sd, since 
what I observed was so bad using the default settings. If any 
scheduler developers would like me to try other tunings or new 
versions let me know.




As I tryied myself kernels 2.6.21, 2.6.21-cfs-v13, and 2.6.21-ck2 on 
the same machine i found *very* odd those numbers you posted, so i 
tested myself those kernels to see the numbers I get instead of 
talking about the usage of kernel xpto feels like.


I did run glxgears with kernels 2.6.21, 2.6.21-cfs-v13 and 2.6.21-ck2 
inside Debian's GNOME environment. The hardware is an AMD Sempron64 
3.0 GHz, 1 GB RAM, Nvidia 6800XT.

Average and standard deviation from the gathered data:

* 2.6.21:           average = 11251.1; stdev = 0.172
* 2.6.21-cfs-v13:   average = 11242.8; stdev = 0.033
* 2.6.21-ck2:       average = 11257.8; stdev = 0.067

Keep in mind those numbers don't mean anything we all know glxgears is 
not a benchmark, their purpose is only to be used as comparison under 
the same conditions.


One odd thing i noticed, with 2.6.21-cfs-v13 the gnome's time applet 
in the bar skipped some minutes (e.g. 16:23 - 16:25) several times.


The data is available on: 
http://www.debianPT.org/~elmig/pool/kernel/20070520/



How did you get your data? I am affraid your data it's wrong, there's 
no  such big difference between the schedulers...


The glitch1 script starts multiple scrolling xterms at the same time as 
the glxgears, and allows observation of the smoothness of the gears. It's 
not a benchmark, although the fps is reported, since, fast or slow, any 
scheduler with fair aspirations should show similar results across the 
5 sec time slices, and between the multiple CPU-bound xterms scrolling with 
the same code. The comments column can be used to report the user 
impressions, since that's the important thing if you want to listen to 
music or watch video.


Perhaps my data appear wrong because you have failed to measure the 
same thing?


You can get the most recent info at http://www.tmr.com/~public/source/ 
if you want to duplicate the test on your hardware, or view the most 
recent tests at http://www.tmr.com/~davidsen/sched_smooth_03.html to see 
what the data look like when you run the same test.


Note: there have been some minor changes in the test and analysis 
resulting from suggestions; only the recent results are worth investigating.


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


lzo code

2007-05-22 Thread Bill Davidsen

It is derived from original LZO 2.02 code found at:
http://www.oberhumer.com/opensource/lzo/download/
The code has also been reformatted to match general kernel style.

--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Increased ipw2200 power usage with dynticks

2007-05-22 Thread Bill Davidsen

Björn Steinbrink wrote:

On 2007.05.20 20:55:35 +0200, Andi Kleen wrote:

Björn Steinbrink [EMAIL PROTECTED] writes:

Ok, it seems that ipw2200 is just a trigger for the problem here. AFAICT
the cause of the worse C state usage is that after ipw2200 has woken the
cpu, acpi_processor_idle() chooses C2 (due to dma? bm? I have no
idea...) as the prefered sleep state. Now without NO_HZ or when I hold
down a key, there are interrupts that wake up the CPU and when
acpi_processor_idle() is called again the promotion to C3/C4 happens.
But with NO_HZ, there are no such interrupts, most wakeups are caused by
ipw2200 and so the processor doesn't go any deeper than C2 most of the
time and thus wastes lots of power.

The cpuidle governour code Venki is working on is supposed to address this.
There have been also earlier prototype patches by Adam Belay and
Thomas Renninger.


Venki (at least I think it was him) also told me about cpuidle and the
menu governor on #powertop. Unfortunately, cpuidle seems to be gone from
acpi-test (or I'm simply still too stupid for git/gitweb). I manually
added the cpuidle and menu governor patches on top of my 2.6.22-rc1-hrt8
kernel, but that broke C-state duration accounting.

On the bright side of things is power usage though, which is down to an
incredible 13.9W in idle+ipw2200 :)

Very encouraging, hopefully that can get into mainline soon, as power 
usage is an issue with laptops. Until then, it sounds as if dynticks is 
a negative power save for ipw2200 (and probably many other things).


Dare we hope that this will allow use of USB on laptops without draining 
the battery?


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Sched - graphic smoothness under load - cfs-v13 sd-0.48

2007-05-22 Thread Bill Davidsen

Miguel Figueiredo wrote:

Ray Lee wrote:

On 5/20/07, Miguel Figueiredo [EMAIL PROTECTED] wrote:

As I tryied myself kernels 2.6.21, 2.6.21-cfs-v13, and 2.6.21-ck2 on the
same machine i found *very* odd those numbers you posted, so i tested
myself those kernels to see the numbers I get instead of talking about
the usage of kernel xpto feels like.

I did run glxgears with kernels 2.6.21, 2.6.21-cfs-v13 and 2.6.21-ck2
inside Debian's GNOME environment. The hardware is an AMD Sempron64 3.0
GHz, 1 GB RAM, Nvidia 6800XT.
Average and standard deviation from the gathered data:

* 2.6.21:   average = 11251.1; stdev = 0.172
* 2.6.21-cfs-v13:   average = 11242.8; stdev = 0.033
* 2.6.21-ck2:   average = 11257.8; stdev = 0.067

Keep in mind those numbers don't mean anything we all know glxgears is
not a benchmark, their purpose is only to be used as comparison under
the same conditions.


Uhm, then why are you trying to use them to compare against Bill's
numbers? You two have completely different hardware setups, and this
is a test that is dependent upon hardware. Stated differently, this is
a worthless comparison between your results and his as you are
changing multiple variables at the same time. (At minimum: the
scheduler, cpu, and video card.)


The only thing i want to see it's the difference between the behaviour 
of the different schedulers on the same test setup. In my test -ck2 was 
a bit better, not 200% worse as in Bill's measurements. I don't compare 
absolute values on different test setups.


Since I didn't test ck2 I'm sure your numbers are unique; I only tested 
the sd-0.48 patch set. I have the ck2 patch, I just haven't tried it 
yet... But since there are a lot of other things in it, I'm unsure how 
it relates to what I was testing.



One odd thing i noticed, with 2.6.21-cfs-v13 the gnome's time applet in
the bar skipped some minutes (e.g. 16:23 - 16:25) several times.

The data is available on:
http://www.debianPT.org/~elmig/pool/kernel/20070520/


How did you get your data? I am affraid your data it's wrong, there's no
  such big difference between the schedulers...


It doesn't look like you were running his glitch1 script which starts
several in glxgears parallel. Were you, or were you just running one?


No i'm not, i'm running only one instance of glxgears inside the GNOME's 
environment.



If you test the same conditions as I did let me know your results.

--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Software raid0 will crash the file-system, when each disk is 5TB

2007-05-22 Thread Bill Davidsen

Jeff Zheng wrote:

 Fix confirmed, filled the whole 11T hard disk, without crashing.
I presume this would go into 2.6.22

Since it results in a full loss of data, I would hope it goes into 
2.6.21.x -stable.



Thanks again.

Jeff


-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of Jeff Zheng

Sent: Thursday, 17 May 2007 5:39 p.m.
To: Neil Brown; [EMAIL PROTECTED]; Michal Piotrowski; Ingo 
Molnar; [EMAIL PROTECTED]; 
linux-kernel@vger.kernel.org; [EMAIL PROTECTED]
Subject: RE: Software raid0 will crash the file-system, when 
each disk is 5TB



Yeah, seems you've locked it down, :D. I've written 600GB of 
data now, and anything is still fine.
Will let it run overnight, and fill the whole 11T. I'll post 
the result tomorrow


Thanks a lot though.

Jeff 


-Original Message-
From: Neil Brown [mailto:[EMAIL PROTECTED]
Sent: Thursday, 17 May 2007 5:31 p.m.
To: [EMAIL PROTECTED]; Jeff Zheng; Michal Piotrowski; Ingo Molnar; 
[EMAIL PROTECTED]; linux-kernel@vger.kernel.org; 
[EMAIL PROTECTED]
Subject: RE: Software raid0 will crash the file-system, 
when each disk 

is 5TB

On Thursday May 17, [EMAIL PROTECTED] wrote:

Uhm, I just noticed something.
'chunk' is unsigned long, and when it gets shifted up, we might lose
bits.  That could still happen with the 4*2.75T arrangement, but is
much more likely in the 2*5.5T arrangement.

Actually, it cannot be a problem with the 4*2.75T arrangement.
  chunk << chunksize_bits
will not exceed the size of the underlying device *in kilobytes*.
In that case that is 0xAE9EC800, which will fit in a 32bit long.
We don't double it to make sectors until after we add
zone->dev_offset, which is sector_t and so 64bit arithmetic is used.
So I'm quite certain this bug will cause exactly the problems 
experienced!!
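
A quick user-space illustration of the overflow (the chunk index and shift
are made up to roughly match the 2*5.5T case; this is not the kernel code):

#include <stdio.h>

int main(void)
{
	unsigned int chunksize_bits = 6;	/* 64KB chunks, offsets kept in KB */
	unsigned int chunk = 80000000;		/* chunk index near the end of a 5.5TB member */
	unsigned int buggy;			/* stands in for a 32bit unsigned long */
	unsigned long long fixed;		/* what the (sector_t) cast buys you */

	buggy = chunk << chunksize_bits;			/* wraps past 4G KB */
	fixed = (unsigned long long)chunk << chunksize_bits;	/* does not */

	printf("32bit arithmetic: %u KB\n", buggy);
	printf("64bit arithmetic: %llu KB\n", fixed);
	return 0;
}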



Jeff, can you try this patch?

Don't bother about the other tests I mentioned, just try this one.
Thanks.

NeilBrown


Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/raid0.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/drivers/md/raid0.c ./drivers/md/raid0.c
--- .prev/drivers/md/raid0.c	2007-05-17 10:33:30.0 +1000
+++ ./drivers/md/raid0.c	2007-05-17 15:02:15.0 +1000
@@ -475,7 +475,7 @@ static int raid0_make_request (request_q
 		x = block >> chunksize_bits;
 		tmp_dev = zone->dev[sector_div(x, zone->nb_dev)];
 	}
-	rsect = (((chunk << chunksize_bits) + zone->dev_offset)<<1)
+	rsect = ((((sector_t)chunk << chunksize_bits) + zone->dev_offset)<<1)
 		+ sect_in_chunk;
 
 	bio->bi_bdev = tmp_dev->bdev;

-
To unsubscribe from this list: send the line unsubscribe 
linux-raid in the body of a message to 
[EMAIL PROTECTED] More majordomo info at  
http://vger.kernel.org/majordomo-info.html



-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html




--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: SideWinder GameVoice driver

2007-05-22 Thread Bill Davidsen

Tomas Carnecky wrote:

Despite being a Microsoft product, it's actually very nice and useful. A
little pad with a few buttons and connectors for a headset. It's a USB
device, but it doesn't represent itself as an input/HID device:
   HID device not claimed by input or hiddev

I plugged it into a windows box and the USB protocol it uses looks very
simple (see attachment): everytime I press one of the eight buttons, it
sends one byte, a bitmap of the pressed buttons.

What would be the best way to have this device appear in the system?
Having a separate driver/device node? Or is it possible to have a small
driver that would translate the gamevoice commands into evdev messages
and have a new /dev/input/eventX device appear?

I could write something like that myself, my C skills are good enough
for that, I'd just need some advice how to use the kernel USB/evdev
interfaces.

From your description it sounds as though it would be useful in 
applications where a voice connection works and a visual one doesn't, such 
as for blind users, and in embedded applications where a USB-pluggable 
interface might be useful in unusual situations.
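
As a starting point, here's a rough userspace-only sketch of the translate 
the byte into evdev events idea using uinput, so the pad would show up as 
/dev/input/eventX without writing a kernel driver. How the one-byte button 
bitmap is actually read from the USB device is left out, and the device 
name is made up:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/input.h>
#include <linux/uinput.h>

static void emit(int fd, int type, int code, int value)
{
	struct input_event ev;

	memset(&ev, 0, sizeof(ev));
	ev.type = type;
	ev.code = code;
	ev.value = value;
	write(fd, &ev, sizeof(ev));
}

int main(void)
{
	struct uinput_user_dev uidev;
	unsigned char bitmap = 0x05;	/* pretend buttons 0 and 2 are down */
	int fd, i;

	fd = open("/dev/uinput", O_WRONLY | O_NONBLOCK);
	if (fd < 0)
		return 1;

	ioctl(fd, UI_SET_EVBIT, EV_KEY);
	for (i = 0; i < 8; i++)
		ioctl(fd, UI_SET_KEYBIT, BTN_0 + i);

	memset(&uidev, 0, sizeof(uidev));
	snprintf(uidev.name, UINPUT_MAX_NAME_SIZE, "SideWinder GameVoice (sketch)");
	uidev.id.bustype = BUS_USB;
	write(fd, &uidev, sizeof(uidev));
	ioctl(fd, UI_DEV_CREATE);

	for (i = 0; i < 8; i++)		/* report the bitmap as eight buttons */
		emit(fd, EV_KEY, BTN_0 + i, (bitmap >> i) & 1);
	emit(fd, EV_SYN, SYN_REPORT, 0);

	ioctl(fd, UI_DEV_DESTROY);
	close(fd);
	return 0;
}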


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v13

2007-05-22 Thread Bill Davidsen

Anant Nitya wrote:

On Thursday 17 May 2007 23:15:33 Ingo Molnar wrote:

i'm pleased to announce release -v13 of the CFS scheduler patchset.

The CFS patch against v2.6.22-rc1, v2.6.21.1 or v2.6.20.10 can be
downloaded from the usual place:

 http://people.redhat.com/mingo/cfs-scheduler/

-v13 is a fixes-only release. It fixes a smaller accounting bug, so if
you saw small lags during desktop use under certain workloads then
please re-check that workload under -v13 too. It also tweaks SMP
load-balancing a bit. (Note: the load-balancing artifact reported by
Peter Williams is not a CFS-specific problem and he reproduced it in
v2.6.21 too. Nevertheless -v13 should be less prone to such artifacts.)

I know about no open CFS regression at the moment, so please re-test
-v13 and if you still see any problem please re-report it. Thanks!

Changes since -v12:

 - small tweak: made the fork flow of reniced tasks zero-sum

 - debugging update: /proc/PID/sched is now seqfile based and echoing
   0 to it clears the maximum-tracking counters.

 - more debugging counters

 - small rounding fix to make the statistical average of rounding errors
   zero

 - scale both the runtime limit and the granularity on SMP too, and make
   it dependent on HZ

 - misc cleanups

As usual, any sort of feedback, bugreport, fix and suggestion is more
than welcome,

Ingo
-

Hi
Been testing this version of CFS for the last hour or so and still facing the 
same lag problems while browsing sites with heavy JS and/or Flash usage. Mouse 
movement is pathetic and audio starts to skip. I hadn't faced this behavior 
with CFS up to v11.



I'm not seeing this; do you have a site or two as examples?

--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Scheduling tests on IPC methods, fc6, sd0.48, cfs12

2007-05-22 Thread Bill Davidsen

Ingo Molnar wrote:

* Bill Davidsen [EMAIL PROTECTED] wrote:

I have posted the results of my initial testing, measuring IPC rates 
using various schedulers under no load, limited nice load, and heavy 
load at nice 0.


http://www.tmr.com/~davidsen/ctxbench_testing.html


nice! For this to become really representative though i'd like to ask 
for a real workload function to be used after the task gets the 
lock/message. The reason is that there is an inherent balancing conflict 
in this area: should the scheduler 'spread' tasks to other CPUs or not? 
In general, for all workloads that matter, the answer is almost always: 
'yes, it should'.


Added to the short to-do list. Note that this was originally simply a 
check to see which IPC works best (or at all) in an o/s. It has been 
useful for some other things, and an option for work will be forthcoming.


But in your ctxbench results the work a task performs after doing IPC is 
not reflected (the benchmark goes about to do the next IPC - hence 
penalizing scheduling strategies that move tasks to other CPUs) - hence 
the bonus of a scheduler properly spreading out tasks is not measured 
fairly. A real-life IPC workload is rarely just about messaging around 
(a single task could do that itself) - some real workload function is 
used. You can see this effect yourself: do a taskset -p 01 $$ before 
running ctxbench and you'll see the numbers improve significantly on all 
of the schedulers.


As a solution i'd suggest to add a workload function with a 100 or 200 
usecs (or larger) cost (as a fixed-length loop or something like that) 
so that the 'spreading' effect/benefit gets measured fairly too.



Can do.
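
Something along these lines as a first cut; the iteration count is just a 
placeholder that would need calibrating (gettimeofday() before and after) 
to land in the 100-200 usec range Ingo suggests:

static volatile unsigned long sink;	/* keeps the compiler from deleting the loop */

/* Fixed-cost "work" done after each lock/message is received. */
static void workload(unsigned long iterations)
{
	unsigned long i, x = 1;

	for (i = 0; i < iterations; i++)
		x = x * 1664525UL + 1013904223UL;	/* cheap LCG step */
	sink = x;
}

int main(void)
{
	workload(20000);	/* calibrate this count per machine */
	return 0;
}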

--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Scheduling tests on IPC methods, fc6, sd0.48, cfs12

2007-05-22 Thread Bill Davidsen

William Lee Irwin III wrote:

On Thu, May 17, 2007 at 07:26:38PM -0400, Bill Davidsen wrote:
I have posted the results of my initial testing, measuring IPC rates 
using various schedulers under no load, limited nice load, and heavy 
load at nice 0.

http://www.tmr.com/~davidsen/ctxbench_testing.html


Kernel compiles are not how to stress these. The way to stress them is
to have multiple simultaneous independent chains of communicators and
deeper chains of communicators.

Kernel compiles are little but background cpu/memory load for these
sorts of tests.


Just so. What is being quantified is the rate of slowdown due to 
external load. I would hope that each IPC method would slow by some 
similar factor.



...  Something expected to have some sort of mutual
interference depending on quality of implementation would be a better
sort of competing load, one vastly more reflective of real workloads.
For instance, another set of processes communicating using the same
primitive.

The original intent was purely to measure IPC speed under no-load 
conditions; since fairness is in vogue I also attempted to look for 
surprising behavior. Corresponding values under equal load may be useful 
in relation to one another, but this isn't (and hopefully doesn't claim 
to be) a benchmark. It may or may not be useful viewed in that light, 
but that's not the target.



Perhaps best of all would be a macrobenchmark utilizing a variety of
the primitives under consideration. Unsurprisingly, major commercial
databases do so for major benchmarks.

And that's a very good point: either multiple copies or more forked 
processes might be useful, and I do intend to add threaded tests in the 
next upgrade, but perhaps a whole new program might be better for 
generating the load you suggest.
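
For what it's worth, one chain of communicators of the sort described above 
is only a few lines; a sketch (pipes only, no work function, the depth and 
round counts are arbitrary), and running several of these side by side 
would give the mutual-interference load:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

#define DEPTH  4	/* processes in the chain, including the parent */
#define ROUNDS 100000	/* token passes through the whole chain         */

int main(void)
{
	int link[DEPTH][2];
	char token = 'x';
	int i, r;

	for (i = 0; i < DEPTH; i++)
		if (pipe(link[i]) < 0) {
			perror("pipe");
			exit(1);
		}

	for (i = 1; i < DEPTH; i++) {
		if (fork() == 0) {	/* child i: read link[i-1], write link[i] */
			for (r = 0; r < ROUNDS; r++) {
				read(link[i - 1][0], &token, 1);
				write(link[i][1], &token, 1);
			}
			exit(0);
		}
	}
	/* parent: inject the token at the head, collect it at the tail */
	for (r = 0; r < ROUNDS; r++) {
		write(link[0][1], &token, 1);
		read(link[DEPTH - 1][0], &token, 1);
	}
	for (i = 1; i < DEPTH; i++)
		wait(NULL);
	return 0;
}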


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Sched - graphic smoothness under load - cfs-v13 sd-0.48

2007-05-22 Thread Bill Davidsen

Miguel Figueiredo wrote:

Bill Davidsen wrote:

Miguel Figueiredo wrote:

Ray Lee wrote:

On 5/20/07, Miguel Figueiredo [EMAIL PROTECTED] wrote:
As I tryied myself kernels 2.6.21, 2.6.21-cfs-v13, and 2.6.21-ck2 
on the

same machine i found *very* odd those numbers you posted, so i tested
myself those kernels to see the numbers I get instead of talking 
about

the usage of kernel xpto feels like.

I did run glxgears with kernels 2.6.21, 2.6.21-cfs-v13 and 2.6.21-ck2
inside Debian's GNOME environment. The hardware is an AMD 
Sempron64 3.0

GHz, 1 GB RAM, Nvidia 6800XT.
Average and standard deviation from the gathered data:

* 2.6.21:   average = 11251.1; stdev = 0.172
* 2.6.21-cfs-v13:   average = 11242.8; stdev = 0.033
* 2.6.21-ck2:   average = 11257.8; stdev = 0.067

Keep in mind those numbers don't mean anything we all know 
glxgears is

not a benchmark, their purpose is only to be used as comparison under
the same conditions.


Uhm, then why are you trying to use them to compare against Bill's
numbers? You two have completely different hardware setups, and this
is a test that is dependent upon hardware. Stated differently, this is
a worthless comparison between your results and his as you are
changing multiple variables at the same time. (At minimum: the
scheduler, cpu, and video card.)


The only thing i want to see it's the difference between the 
behaviour of the different schedulers on the same test setup. In my 
test -ck2 was a bit better, not 200% worse as in Bill's 
measurements. I don't compare absolute values on different test setups.


Since I didn't test ck2 I'm sure your numbers are unique, I only 
tested the sd-0.48 patch set. I have the ck2 patch, just haven't 
tried it yet... But since there are a lot of other things in it, I'm 
unsure how it relates to what I was testing.


One odd thing i noticed, with 2.6.21-cfs-v13 the gnome's time 
applet in

the bar skipped some minutes (e.g. 16:23 - 16:25) several times.

The data is available on:
http://www.debianPT.org/~elmig/pool/kernel/20070520/


How did you get your data? I am affraid your data it's wrong, 
there's no

  such big difference between the schedulers...


It doesn't look like you were running his glitch1 script which starts
several in glxgears parallel. Were you, or were you just running one?


No i'm not, i'm running only one instance of glxgears inside the 
GNOME's environment.



If you test the same conditions as I did let me know your results.



Hi Bill,

if i've understood correctly the script runs glxgears for 43 seconds 
and in that time generates random numbers in a random number of times 
(processes, fork and forget), is that it?


No, I haven't made it clear. A known number (default four) of xterms are 
started, each of which calculates random numbers and prints them, using 
much CPU time and causing a lot of scrolling. At the same time glxgears 
is running, and the smoothness (or not) is observed manually. The script 
records raw data on the number of frames per second and the number of 
random numbers calculated by each shell. Since these are FAIR 
schedulers, the variance between the scripts, and between multiple 
samples from glxgears is of interest. To avoid startup effects the 
glxgears value from the first sample is reported separately and not 
included in the statistics.
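
For reference, the summary the analysis step boils down to is nothing 
fancier than a mean and standard deviation over those per-xterm counts; a 
sketch with made-up numbers:

#include <math.h>
#include <stdio.h>

static void summarize(const double *v, int n)
{
	double sum = 0.0, var = 0.0, mean;
	int i;

	for (i = 0; i < n; i++)
		sum += v[i];
	mean = sum / n;
	for (i = 0; i < n; i++)
		var += (v[i] - mean) * (v[i] - mean);
	printf("mean %.1f  stddev %.1f\n", mean, sqrt(var / n));
}

int main(void)
{
	/* hypothetical loop counts from the four xterms under load */
	double loops[4] = { 48100.0, 47950.0, 48300.0, 47800.0 };

	summarize(loops, 4);
	return 0;
}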


I looked at your results, and they are disturbing to say the least, it 
appears that using the ck2 scheduler glxgears stopped for all practical 
purposes. You don't have quite the latest glitch1, the new one runs 
longer and allows reruns to get several datasets, but the results still 
show very slow gears and a large difference between the work done by the 
four shells. That's not a good result, how did the system feel?
You find the data, for 2.6.21-{cfs-v13, ck2} in 
http://www.debianpt.org/~elmig/pool/kernel/20070522/


Thank you, these results are very surprising, and I would not expect the 
system to be pleasant to use under load, based on this.

Here's the funny part...

Lets call:

a) to random number of processes run while glxgears is running, 
gl_fairloops file


It's really the relative work done by identical processes, hopefully 
they are all nearly the same, magnitude is interesting but related to 
responsiveness rather than fairness.
b) to generated frames while running a burst of processes aka 
massive and uknown amount of operations in one process, gl_gears file


Well, top or ps will give you a good idea of processing, but it tried to 
use all of one CPU if allowed. Again, similarity of samples reflects 
fairness and magnitude reflects work done.

kernel    2.6.21-cfs-v13    2.6.21-ck2
a)        194464            254669
b)        54159             124



Everyone seems to like ck2, this makes it look as if the video display 
would be really pretty unusable. While sd-0.48 does show an occasional 
video glitch when watching video under heavy load, it's annoying rather 
than unusable.


Your subjective impressions would be helpful, and you may find that 
the package in the www.tmr.com/~public/source is slightly easier to 
use and gives more stable results. The documentation suggests the way 
to take samples (the way I did it) but if you feel more or longer 
samples would help it is tunable.

Re: Sched - graphic smoothness under load - cfs-v13 sd-0.48

2007-05-23 Thread Bill Davidsen
I was unable to reproduce the numbers Miguel generated; comments below. 
The -ck2 patch seems to run nicely, although the memory repopulation 
from swap would be most useful on systems which have a lot of memory 
pressure.


Bill Davidsen wrote:

Miguel Figueiredo wrote:


Hi Bill,

if i've understood correctly the script runs glxgears for 43 seconds 
and in that time generates random numbers in a random number of times 
(processes, fork and forget), is that it?


No, I haven't made it clear. A known number (default four) of xterms 
are started, each of which calculates random numbers and prints them, 
using much CPU time and causing a lot of scrolling. At the same time 
glxgears is running, and the smoothness (or not) is observed manually. 
The script records raw data on the number of frames per second and the 
number of random numbers calculated by each shell. Since these are 
FAIR schedulers, the variance between the scripts, and between 
multiple samples from glxgears is of interest. To avoid startup 
effects the glxgears value from the first sample is reported 
separately and not included in the statistics.


I looked at your results, and they are disturbing to say the least, it 
appears that using the ck2 scheduler glxgears stopped for all 
practical purposes. You don't have quite the latest glitch1, the new 
one runs longer and allows reruns to get several datasets, but the 
results still show very slow gears and a large difference between the 
work done by the four shells. That's not a good result, how did the 
system feel?
You find the data, for 2.6.21-{cfs-v13, ck2} in 
http://www.debianpt.org/~elmig/pool/kernel/20070522/


Thank you, these results are very surprising, and I would not expect 
the system to be pleasant to use under load, based on this.

Here's the funny part...

Lets call:

a) to random number of processes run while glxgears is running, 
gl_fairloops file


It's really the relative work done by identical processes, hopefully 
they are all nearly the same, magnitude is interesting but related to 
responsiveness rather than fairness.
b) to generated frames while running a burst of processes aka 
massive and uknown amount of operations in one process, gl_gears file


Well, top or ps will give you a good idea of processing, but it tried 
to use all of one CPU if allowed. Again, similarity of samples 
reflects fairness and magnitude reflects work done.

kernel    2.6.21-cfs-v13    2.6.21-ck2
a)        194464            254669
b)        54159             124


Everyone seems to like ck2, this makes it look as if the video display 
would be really pretty unusable. While sd-0.48 does show an occasional 
video glitch when watching video under heavy load, it's annoying 
rather than unusable.


I spent a few hours running the -ck2 patch, and I didn't see any numbers 
like yours. What I did see is going up with my previous results as 
http://www.tmr.com/~davidsen/sched_smooth_04.html. While there were 
still some minor pauses in glxgears with my test, performance was very 
similar to the sd-0.48 results. And I did try watching video with high 
load, without problems. Only when I run a lot of other screen-changing 
processes can I see pauses in the display.
Your subjective impressions would be helpful, and you may find that 
the package in the www.tmr.com/~public/source is slightly easier to 
use and gives more stable results. The documentation suggests the way 
to take samples (the way I did it) but if you feel more or longer 
samples would help it is tunable.


I added Con to the cc list, he may have comments or suggestions 
(against the current versions, please). Or he may feel that video 
combined with other heavy screen updating is unrealistic or not his 
chosen load. I'm told the load is similar to games which use threads 
and do lots of independent action, if that's a reference.



I'll include the -ck2 patch in my testing on other hardware.

--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Sched - graphic smoothness under load - cfs-v13 sd-0.48

2007-05-23 Thread Bill Davidsen

Con Kolivas wrote:

On Wednesday 23 May 2007 10:28, Bill Davidsen wrote:
  

kernel    2.6.21-cfs-v13    2.6.21-ck2
a)        194464            254669
b)        54159             124
  

Everyone seems to like ck2, this makes it look as if the video display
would be really pretty unusable. While sd-0.48 does show an occasional
video glitch when watching video under heavy load, it's annoying rather
than unusable.



That's because the whole premise of your benchmark relies on a workload that 
yield()s itself to the eyeballs on most graphic card combinations when using 
glxgears. Your test remains a test of sched_yield in the presence of your 
workloads rather than anything else. If people like ck2 it's because in the 
real world with real workloads it is better, rather than on a yield() based 
benchmark. Repeatedly the reports are that 3d apps and games in normal usage 
under -ck are better than mainline and cfs.
  
I have to admit that I call in the teen reserves to actually get good 
feedback on games, but I do watch a fair number of videos and under high 
load I find sd acceptable and cfs totally smooth. The next time my game 
expert comes to visit I'll get some subjective feedback. My use of 
glxgears was mainly intended to use something readily available, and 
which gave me the ability to make both subjective and objective evaluations.


My -ck2 results certainly show no significant difference from sd-0.48; I 
suspect that on a machine with less memory the swap reload would be more 
beneficial.


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Sched - graphic smoothness under load - cfs-v13 sd-0.48

2007-05-23 Thread Bill Davidsen

Michael Gerdau wrote:
That's because the whole premise of your benchmark relies on a workload that 
yield()s itself to the eyeballs on most graphic card combinations when using 
glxgears. Your test remains a test of sched_yield in the presence of your 
workloads rather than anything else. If people like ck2 it's because in the 
real world with real workloads it is better, rather than on a yield() based 
benchmark. Repeatedly the reports are that 3d apps and games in normal usage 
under -ck are better than mainline and cfs.



While I can't comment on the technical/implementational details of
Con's claim I definitely have to agree from a users POV.

  
Any of the sd/ck/cfs schedulers is an improvement on the current 
mainline, and hopefully they will continue to cross-pollinate and 
evolve. Perhaps by 2.6.23 a clear best will emerge, or Linus will 
change his mind and make sd and cfs compile-time options.

All my recent CPU intensive benchmarks show that both ck/sd and cfs
are very decent schedulers and IMO superior to mainline for all _my_
usecases. In particular playing supertux while otherwise fully utilizing
both CPUs on a dualcore works without any glitch and better than
on mainline for both sd and cfs.

  
I did some kernel compile timing numbers as part of my work with 
ctxbench, and there is little to choose between the schedulers under 
load, although the special case for sched_yield makes some loads perform 
better with cfs. With large memory and a fast disk, a kernel make becomes 
a CPU benchmark; there's virtually no iowait not filled by another 
process.

For me the huge difference you have for sd compared to the others increases
the likelihood that the glxgears benchmark does not measure scheduling of
graphics but something else.

  
The glitch1 script generates a number of CPU bound processes updating 
the screen independently, which stresses both graphics performance and 
scheduler fairness. And once again I note that it's a *characterization* 
rather than a benchmark. The ability of the scheduler to deliver the 
same resources to multiple identical processes, and to keep another CPU 
bound process (glxgears) getting the processor at regular intervals is 
more revealing than the frames per second or loops run.


I would expect sd to be better at this, since it uses a deadline 
concept, but in practice the gears pause, and then move rapidly or 
appear to jump. My reading on this is that the process starves for some 
ms, then gets a lot of CPU because it is owed more. I think I see this 
in games, but not being a game player I can't tell from experience if 
it's an artifact or the games just suck. That's what my test rig, based on 
a 15-year-old boy and several cans of high-caffeine soda, is used for. ;-)

Anyway, I'm still in the process of collecting data or more precisely
until recently constantly refined what data to collect and how. I plan
to provide new benchmark results on CPU intensive tasks in a couple of
days.
  

--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Sched - graphic smoothness under load - cfs-v13 sd-0.48

2007-05-23 Thread Bill Davidsen

Miguel Figueiredo wrote:

Bill Davidsen wrote:
I was unable to reproduce the numbers Miguel generated; comments 
below. The -ck2 patch seems to run nicely, although the memory 
repopulation from swap would be most useful on systems which have a 
lot of memory pressure.


I spent a few hours running the -ck2 patch, and I didn't see any 
numbers like yours. What I did see is going up with my previous 
results as http://www.tmr.com/~davidsen/sched_smooth_04.html. While 
there were still some minor pauses in glxgears with my test, 
performance was very similar to the sd-0.48 results. And I did try 
watching video with high load, without problems. Only when I run a 
lot of other screen-changing processes can I see pauses in the display.
Your subjective impressions would be helpful, and you may find that 
the package in the www.tmr.com/~public/source is slightly easier to 
use and gives more stable results. The documentation suggests the 
way to take samples (the way I did it) but if you feel more or 
longer samples would help it is tunable.


I added Con to the cc list, he may have comments or suggestions 
(against the current versions, please). Or he may feel that video 
combined with other heavy screen updating is unrealistic or not his 
chosen load. I'm told the load is similar to games which use threads 
and do lots of independent action, if that's a reference.



I'll include the -ck2 patch in my testing on other hardware.



Hi Bill,

 the numbers i posted before are repeatable on that machine.

The numbers you posted in [EMAIL PROTECTED] are not the 
same... From my inbox I grab some very non-matching values:

=
Here's the funny part...

Lets call:

a) to random number of processes run while glxgears is running, 
gl_fairloops file


b) to generated frames while running a burst of processes aka massive 
and uknown amount of operations in one process, gl_gears file


kernel    2.6.21-cfs-v13    2.6.21-ck2
a)        194464            254669
b)        54159             124

=

The numbers in your glitch1.html file show a close correlation for cfs 
and -ck2, well within what I would expect. The stddev for the loops is 
larger for -ck2, but not out of line with what I see, and nothing like 
the numbers you originally sent me (which may have been from testing 
something else, or from an old version before I made improvements, or 
???). In any case thanks for testing.




I did run, again, glitch1 on my laptop (T2500 CoreDuo, also Nvidia) 
please check: http://www.debianpt.org/~elmig/pool/kernel/20070523/




Thanks, those data seem as expected.

--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Race free attributes in sysfs

2007-05-26 Thread Bill Davidsen

Greg KH wrote:

On Wed, May 23, 2007 at 09:27:12AM -0400, Mark Lord wrote:

 Greg KH wrote:

And yes, it only starts to look for things when it receives an event; it
does not scan sysfs at all.

 Does it look for only that one event, or does it scan at that point?


udev will act on that event, and as I mentioned, not read anything from
sysfs at all, unless a custom rule is in the rules file asking it to
read a specific sysfs file in the tree.

So no scanning happens unless specificically asked for.

And as mentioned, udev can work just fine without sysfs enabled at all
now, with the exception of some custom rules for some devices.

I think what Mark is asking is about the case where udev gets an event, 
is told to look in sysfs, and while looking encounters a partially 
described device.


Now, for the this won't happen unless... cases, could someone cover 
this and state either that it can't happen because {reason}, or that if 
it does happen the result will be {description}.


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.21.1 on Fedora Core 6 breaks LVM/vgscan

2007-05-26 Thread Bill Davidsen

Jonathan Woithe wrote:

On 21 May 2007 I wrote:


Attempting to compile a 2.6.21.1 kernel for use on a Fedora Core 6 box
results in a panic at boot because the root filesystem can't be found.


I have just compiled 2.6.22-rc2 with the configuration file given in my
previous post and the resulting kernel successfully boots on the machine
concerned.  Whatever broke LVM for this machine in between 2.6.18 and
2.6.21.1 has now been fixed.

I haven't had any problem booting with any of the kernels, but when I 
try to build a kernel with a Fedora config from /boot, it builds fine 
but doesn't boot after install. I started by building a very basic 
kernel for testing, and then started adding features to get everything I 
need. But just using the latest FC6 config file gets me a kernel which 
fails in just the way you mention.



There is still a problem with the CDROM but I will follow up in another
thread about that.

Happy to say I don't see that; I'm using PATA optical devices, and USB 
on some machines, and both work. I can't get scanning to work even after 
buying a supported scanner, so I may have to go back to Slackware and 
a 2.4 kernel on one machine, but booting and running are fine.


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: IDE/ATA: Intel i865-based mainboard, CDROM not detected

2007-05-26 Thread Bill Davidsen

Jonathan Woithe wrote:

A collegue of mine has an Intel mainboard with the i865 chipset onboard
(DQ965).  All kernels up to and including 2.6.22-rc2 do not detect the IDE
CDROM/DVDROM when booting.  The SATA hard drive is found without any
problems.

Let me belatedly ask if the device shows up in POST at cold boot. It may 
need some BIOS setting to be visible.



--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] Unify dma blacklist in ide-dma.c and libata-core.c

2007-05-26 Thread Bill Davidsen

Junio C Hamano wrote:

This introduces a shared header file that defines the entries
for two dma blacklists in ide-dma.c and libata-core.c to make it
easier to keep them in sync.

Why wasn't this done this way in the first place? Out of tree 
development for libata or something?


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[2.6.21.1] resume doesn't run suspended kernel?

2007-05-26 Thread Bill Davidsen
I was testing susp2disk in 2.6.21.1 under FC6, to support reliable 
computing environment (RCE) needs. The idea is that if power fails, 
after some short time on UPS the system does susp2disk with a time set, 
and boots back every so often to see if power is stable.
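
The suspend step itself is nothing exotic; stripped of the power-failure 
detection and the wakeup-timer setup (the platform-specific parts, left 
out here), it amounts to:

#include <stdio.h>

/* Write "disk" to /sys/power/state; the write (flushed at fclose)
 * blocks across the suspend and returns after resume. */
static int suspend_to_disk(void)
{
	FILE *f = fopen("/sys/power/state", "w");

	if (!f)
		return -1;
	fprintf(f, "disk\n");
	return fclose(f) ? -1 : 0;
}

int main(void)
{
	return suspend_to_disk() ? 1 : 0;
}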


No, I don't want susp2mem until I debug it; the console comes up in a 
useless mode, and a console as kaleidoscope is not what I need.


Anyway, I pulled the plug on the UPS, and the system shut down. But when 
it powered up, it booted the default kernel rather than the test kernel, 
decided that it couldn't resume, and then did a cold boot.


I can bypass this by making the debug kernel the default, but WHY? Is 
the kernel not saved such that any kernel can be rolled back into memory 
and run? Actually, the answer is HELL NO, so I really ask if this is the 
intended mode of operation, that only the default boot kernel will restore.


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Documentation on /sys/power/resume

2007-05-26 Thread Bill Davidsen
Not in the ABI doc; is there any doc at all? If not, could someone 
who knows where it's used give me a hint, as a quick look didn't 
bring enlightenment. Or is it a future hook which doesn't work yet?


--
Bill Davidsen [EMAIL PROTECTED]
 We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Documentation on /sys/power/resume

2007-05-27 Thread Bill Davidsen

Rafael J. Wysocki wrote:

Hi,

On Sunday, 27 May 2007 01:51, Bill Davidsen wrote:
Not in the ABI doc; is there any doc at all? If not, could someone 
who knows where it's used give me a hint, as a quick look didn't 
bring enlightenment. Or is it a future hook which doesn't work yet?


That's something that in theory may allow you to resume the system from
an initrd script.

Basically, you write your resume device's major and minor numbers
into it as the MAJ:MIN string (eg. 8:3 for /dev/sda3 on my box) and the
kernel will try to read the image from this device and restore it.

It only works with partitions and the use of it is discouraged, so it's
deliberately undocumented.

Thanks, that's just different enough from what little info I had to make 
what I have not work. I'm looking at resume from a non-swap location.


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [2.6.21.1] resume doesn't run suspended kernel?

2007-05-27 Thread Bill Davidsen

David Greaves wrote:

Bill Davidsen wrote:

Anyway, I pulled the plug on the UPS, and the system shut down. But when
it powered up, it booted the default kernel rather than the test kernel,
decided that it couldn't resume, and then did a cold boot.


Booting the machine isn't the kernel's job, it's the bootloader's job.

And resume is not the bootloader's job... if memory and registers 
are restored, and a jump is made to the resume address, a resumed system 
should result. Clearly some part of that didn't happen :-(



I can bypass this by making the debug kernel the default, but WHY? Is
the kernel not saved such that any kernel can be rolled back into memory
and run? Actually, the answer is HELL NO, so I really ask if this is the
intended mode of operation, that only the default boot kernel will restore.


Yes.

It is very dangerous to attempt a resume with a different kernel than the one
that has gone to sleep.
Different kernels may be compiled with different options that affect where or
how in-memory structures are saved.

If the mainline resume depends on that, no wonder resume is so 
fragile. User action can change the order of module loads, kmalloc calls 
move allocated structures, etc. Counting on anything to be locked in 
place seems naive.



So you suspend with a kernel which holds your filesystem data/cache/inodes at
0x1234000 and restore with a kernel that expects to see your filesystem data at
0x1235000.

Ouch.

I would hope that the data used by the resumed kernel would be the same 
data that was suspended, not something from another kernel.



Personally I think the kernel suspend should write a signature - similar to a
hash of the bzImage - into the suspend image so it won't even attempt a resume
if there's a mismatch. (Yes, I made this mistake once whilst playing with 
suspend).

Someone else dropped a note saying the FC kernels use suspend2, and work 
fine. I'm off to look at the FC source and see if that's the case. That 
would explain why suspend works and resume doesn't; hopefully there's a 
2.6.21 suspend2 patch in that case.
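
David's signature idea could be as small as the sketch below; the struct and 
field names here are made up for illustration, and the real swsusp image 
header is its own structure:

#include <stdio.h>
#include <string.h>

/* Hypothetical on-disk header, only to show the check being suggested. */
struct image_header {
	char kernel_release[64];	/* utsname release of the suspending kernel */
};

/* Refuse to resume unless the image was written by the same kernel build. */
static int image_matches_running_kernel(const struct image_header *hdr,
					const char *running_release)
{
	if (strncmp(hdr->kernel_release, running_release,
		    sizeof(hdr->kernel_release)) != 0) {
		fprintf(stderr, "image from %.64s, running %s, not resuming\n",
			hdr->kernel_release, running_release);
		return 0;
	}
	return 1;
}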


Thanks for the feedback in any case.

--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Documentation on /sys/power/resume

2007-05-27 Thread Bill Davidsen

Rafael J. Wysocki wrote:

On Sunday, 27 May 2007 14:53, Bill Davidsen wrote:
  

Rafael J. Wysocki wrote:


Hi,

On Sunday, 27 May 2007 01:51, Bill Davidsen wrote:
  
Not in the ABI doc; is there any doc at all? If not, could someone 
who knows where it's used give me a hint, as a quick look didn't 
bring enlightenment. Or is it a future hook which doesn't work yet?


That's something that in theory may allow you to resume the system from
an initrd script.

Basically, you write your resume device's major and minor numbers
into it as the MAJ:MIN string (eg. 8:3 for /dev/sda3 on my box) and the
kernel will try to read the image from this device and restore it.

It only works with partitions and the use of it is discouraged, so it's
deliberately undocumented.

  
Thanks, that's just different enough from what little info I had to make 
what I have not work. I'm looking at resume from a non-swap location.



Only suspend2 can do this right now.  The built-in swsusp can resume from a
swap file as long as it's not located on LVM.
  
Sounds like suspend2 is still needed. I haven't needed a suspending 
kernel in a few years, and I was hoping that with suspend working in 
mainline, resume would have been implemented. Sounds as if that's 
not the case; my swap is RAID1, and I was hoping to resume from one of the 
mirrors, since they are based on a partition. No joy with or without 
/sys/power/resume, so I'll look further.


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [2.6.21.1] resume doesn't run suspended kernel?

2007-05-27 Thread Bill Davidsen

Pavel Machek wrote:

On Sat 2007-05-26 18:42:37, Bill Davidsen wrote:
  
I was testing susp2disk in 2.6.21.1 under FC6, to 
support reliable computing environment (RCE) needs. The 
idea is that if power fails, after some short time on 
UPS the system does susp2disk with a time set, and boots 
back every so often to see if power is stable.


No, I don't want susp2mem until I debug it, console come 
up in useless mode, console as kalidescope is not what I 
need.


Anyway, I pulled the plug on the UPS, and the system 
shut down. But when it powered up, it booted the default 
kernel rather than the test kernel, decided that it 
couldn't resume, and then did a cold boot.


I can bypass this by making the debug kernel the 
default, but WHY? 



HELL YES :-). We do not save kernel code into image.

  
That's clear, I'll have to use xen or kvm or similar which restores the 
system as suspended. Thanks for the clarification of the limitations.


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [2.6.21.1] resume doesn't run suspended kernel?

2007-05-28 Thread Bill Davidsen

Bill Davidsen wrote:

Pavel Machek wrote:

On Sat 2007-05-26 18:42:37, Bill Davidsen wrote:
 
I was testing susp2disk in 2.6.21.1 under FC6, to support reliable 
computing environment (RCE) needs. The idea is that if power fails, 
after some short time on UPS the system does susp2disk with a time 
set, and boots back every so often to see if power is stable.


No, I don't want susp2mem until I debug it, console come up in 
useless mode, console as kalidescope is not what I need.


Anyway, I pulled the plug on the UPS, and the system shut down. But 
when it powered up, it booted the default kernel rather than the 
test kernel, decided that it couldn't resume, and then did a cold boot.


I can bypass this by making the debug kernel the default, but WHY? 


HELL YES :-). We do not save kernel code into image.

  
That's clear, I'll have to use xen or kvm or similar which restores 
the system as suspended. Thanks for the clarification of the limitations.


Sorry, I wrote that late at night and quickly. I should have said 
design decision rather than limitation. For systems which don't do 
multiple kernels it's not an issue.


I certainly would not have made the same decision, but I didn't write 
the code. It seems more robust to save everything than to try to 
identify what has and hasn't changed in a modular kernel.


--
Bill Davidsen [EMAIL PROTECTED]
 We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-05-28 Thread Bill Davidsen

Neil Brown wrote:

We can think of there being three types of devices:
 
1/ SAFE.  With a SAFE device, there is no write-behind cache, or if
  there is it is non-volatile.  Once a write completes it is 
  completely safe.  Such a device does not require barriers

  or ->issue_flush_fn, and can respond to them either by a
  no-op or with -EOPNOTSUPP (the former is preferred).

2/ FLUSHABLE.
  A FLUSHABLE device may have a volatile write-behind cache.
  This cache can be flushed with a call to blkdev_issue_flush.
  It may not support barrier requests.

3/ BARRIER.
  A BARRIER device supports both blkdev_issue_flush and
  BIO_RW_BARRIER.  Either may be used to synchronise any
  write-behind cache to non-volatile storage (media).

Handling of SAFE and FLUSHABLE devices is essentially the same and can
work on a BARRIER device.  The BARRIER device has the option of more
efficient handling.
  

There are two things I'm not sure you covered.

First, disks which don't support flush but do have a cache-dirty 
status bit you can poll at times like shutdown. If no drivers 
support these, they can be ignored.


Second, NAS (including nbd?). Is there enough information to handle this 
really right?


Otherwise looks good as a statement of issues. It seems to me that the 
filesystem should be able to pass the barrier request to the block layer 
and have it taken care of, rather than have code in each f/s to cope 
with odd behavior.
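
To make that concrete, here is a conceptual sketch of the single entry point 
I mean; the enum and function names are made up for illustration and are not 
the actual bio/blkdev interfaces:

#include <errno.h>
#include <stddef.h>

/* Neil's three classes of write-behind cache handling. */
enum cache_class { DEV_SAFE, DEV_FLUSHABLE, DEV_BARRIER };

struct blockdev {
	enum cache_class class;
	int (*flush_cache)(struct blockdev *dev);	/* drain volatile cache */
	int (*barrier_write)(struct blockdev *dev);	/* ordered write + flush */
};

/*
 * One call the filesystem makes when it needs data on media; the block
 * layer picks the cheapest method the device supports, so the per-fs
 * special cases for odd devices go away.
 */
static int commit_to_media(struct blockdev *dev)
{
	switch (dev->class) {
	case DEV_SAFE:		/* no volatile cache, nothing to do */
		return 0;
	case DEV_FLUSHABLE:	/* volatile cache, flush supported */
		return dev->flush_cache(dev);
	case DEV_BARRIER:	/* full barrier support */
		return dev->barrier_write(dev);
	}
	return -EOPNOTSUPP;
}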


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


What causes iowait other than waiting for i/o?

2007-05-28 Thread Bill Davidsen
I recently noted that my system was spending a lot of time in i/o wait 
when doing some tasks which I thought didn't involve i/o, as noted by 
the lack of disk light activity most of the time. I thought of network, 
certainly the NIC had no activity for this job. So I set up a little 
loop to capture all disk i/o and network activity (including loopback). 
That was no obvious help, and the program doesn't use pipes.


At this point I'm really curious, does someone have a good clue?

Note: I don't think this is a bug or performance issue, unless the 
kernel is doing something and charging time to iowait instead of system 
I don't see anything to fix, but I would like to understand.


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Linux v2.6.22-rc3

2007-05-28 Thread Bill Davidsen

Jeff Garzik wrote:


Several people have reported LITE-ON LTR-48246S detection failed
because SETXFER fails.  It seems the device raises IRQ too early after
SETXFER.  This is controller independent.  The same problem has been
reported for different controllers.

So, now we have pata_via where the controller raises IRQ before it's
ready after SETXFER and a device which does similar thing.  This patch
makes libata always execute SETXFER via polling.  As this only happens
during EH, performance impact is nil.  Setting ATA_TFLAG_POLLING is
also moved from issue hot path to ata_dev_set_xfermode() - the only
place where SETXFER can be issued.

Note that ATA_TFLAG_POLLING applies only to drivers which implement
SFF TF interface and use libata HSM.  More advanced controllers ignore
the flag.  This doesn't matter for this fix as SFF TF controllers are
the problematic ones.

Not only does this kill two birds with a single stone, but it will avoid having to 
re-solve the problem at some time in the future. That's good software!


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: stuff ready to be deleted?

2007-05-28 Thread Bill Davidsen

Oliver Pinter wrote:

+ open sound system


Why?

OSS supports some hardware ALSA doesn't, it's maintained by an 
independent commercial company (4Front) so maintenance isn't an issue, 
and it's portable to many other operating systems.


Functionality and low TCO, what could be better?

New Linux code, including x86_64 3D drivers, was released in April, so 
there's no lack of new features and activity.


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: What causes iowait other than waiting for i/o?

2007-05-29 Thread Bill Davidsen

Satyam Sharma wrote:

Hi Bill,

On 5/29/07, Bill Davidsen [EMAIL PROTECTED] wrote:

I recently noted that my system was spending a lot of time in i/o wait
when doing some tasks which I thought didn't involve i/o, as noted by
the lack of disk light activity most of the time. I thought of network,
certainly the NIC had no activity for this job. So I set up a little
loop to capture all disk i/o and network activity (including loopback).
That was no obvious help, and the program doesn't use pipes.

At this point I'm really curious, does someone have a good clue?

Note: I don't think this is a bug or performance issue, unless the
kernel is doing something and charging time to iowait instead of system
I don't see anything to fix, but I would like to understand.


What tool / kernel instrumentation / mechanism are you using to
determine that some task(s) are indeed blocked waiting for i/o? Perhaps
some userspace process accounting tools could be broken in the sense
that they generalize all uninterruptible sleep as waiting for i/o ...


I wouldn't expect /proc/stat and similar to be broken in that way, but 
if no one has a better idea I guess I will assume a check is needed 
of where time is added to iowait. I was hoping to avoid a full kernel 
search. I never thought of /proc data as a user-space tool, but I guess it is.
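
For reference, the number I'm watching is just the fifth field on the cpu 
line of /proc/stat (user, nice, system, idle, iowait, ...); a minimal sketch 
of reading it:

#include <stdio.h>

/* Print cumulative iowait ticks from the aggregate "cpu" line of /proc/stat.
 * Field order: user nice system idle iowait irq softirq ... */
int main(void)
{
	unsigned long long user, nice, sys, idle, iowait;
	FILE *f = fopen("/proc/stat", "r");

	if (!f)
		return 1;
	if (fscanf(f, "cpu %llu %llu %llu %llu %llu",
		   &user, &nice, &sys, &idle, &iowait) != 5) {
		fclose(f);
		return 1;
	}
	fclose(f);
	printf("iowait ticks since boot: %llu\n", iowait);
	return 0;
}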


--
Bill Davidsen [EMAIL PROTECTED]
 We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: What causes iowait other than waiting for i/o?

2007-05-29 Thread Bill Davidsen

Rik van Riel wrote:

Bill Davidsen wrote:
I recently noted that my system was spending a lot of time in i/o 
wait when doing some tasks which I thought didn't involve i/o, as 
noted by the lack of disk light activity most of the time. I thought 
of network, certainly the NIC had no activity for this job. So I set 
up a little loop to capture all disk i/o and network activity 
(including loopback). That was no obvious help, and the program 
doesn't use pipes.


At this point I'm really curious, does someone have a good clue?

Note: I don't think this is a bug or performance issue, unless the 
kernel is doing something and charging time to iowait instead of 
system I don't see anything to fix, but I would like to understand.


All filesystem IO and direct disk IO can cause iowait.

This includes NFS activity.

If I didn't note it before, I'm reading the data from /proc: cpustats, 
net/dev, and diskstats. I assume that all i/o would show up in one of 
those places. NFS isn't involved; although this machine is a fileserver 
as a side job, the modules weren't even loaded during testing.


A puzzlement for future consideration. If I get a chance later this week 
I'll make a pretty graphic of all the stuff going on when the iowait 
spiked, ctx rate, inq rate, hell the last time I even grabbed the CPU 
temp to see if it told me anything (didn't, thermal throttling NOT).


Thanks for the feedback, I think that lets out the obvious stuff.

--
Bill Davidsen [EMAIL PROTECTED]
 We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.21.1 - 97% wait time on IDE operations

2007-05-29 Thread Bill Davidsen

Tommy Vercetti wrote:
Hi folks, 


I was trying to get an answer to my question around here, but no one knows.
I do have DMA turned on, etc., yet on extensive hard drive operations the wait 
time is 90+%, which means that the machine is waiting rather than doing 
something meanwhile (I guess). 
Can someone describe to me, in more detail, why that is happening, and what 
steps I should consider to avoid it? I couldn't find any answers that would 
have helped me on the net.


thanks.


From later posts I suspect that your disk performance just sucks, but 
do use one of the monitoring tools and follow the actual disk work, in 
terms of seeks and transfer rate.


I've been looking at high iowait while disk and network are (nearly) 
idle, but it sounds as if you just have a bad match of CPU and disk speed.


Can you borrow a USB drive to use for some testing? USB 2 is needed to be 
faster than your old 4200 rpm drive.


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v19

2007-07-13 Thread Bill Davidsen

Ingo Molnar wrote:

* Bill Davidsen [EMAIL PROTECTED] wrote:

I've taken mainline git tree (freshly integrated CFS!) out for a 
multimedia spin.  I tested watching movies and listenign to music in 
the presence of various sleep/burn loads, pure burn loads, and mixed 
loads. All was peachy here.. I saw no frame drops or sound skips or 
other artifacts under any load where the processor could possibly 
meet demand.
I would agree with preliminary testing, save that if you get a lot of 
processes updating the screen at once, there seems to be a notable 
case of processes getting no CPU for 100-300ms, followed by a lot of 
CPU.


I see this clearly with the glitch1 test with four scrolling xterms 
and glxgears, but also watching videos with little busy processes on 
the screen. The only version where I never see this in test or with 
real use is cfs-v13.


just as a test, does this go away if you:

renice -20 `pidof Xorg`

i.e. is this connected to the way X is scheduled?

Doing this slows down the display rates, but doesn't significantly help 
the smoothness of the gears.


Another thing to check would be whether it goes away if you set the 
granularity to some really finegrained value:


echo 0 > /proc/sys/kernel/sched_wakeup_granularity_ns
echo 50 > /proc/sys/kernel/sched_granularity_ns

this really pushes things - but it tests the theory whether this is 
related to granularity.


I didn't test this with standard Xorg priority, I should go back and try 
that. But it didn't really make much difference. The gears and scrolling 
xterms ran slower with Xorg at -20 with any sched settings. I'll do that 
as soon as a build finishes and I can reboot.


I should really go back to 2.6.21.6, 2.6.22 has many bizarre behaviors 
with FC6. Automount starts taking 30% of CPU (unused at the moment), the 
sensors applet doesn't work, etc. I hope over the weekend I can get bug 
reports out on all this, but there are lots of non-critical oddities.

Ingo



--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v19

2007-07-16 Thread Bill Davidsen

Ingo Molnar wrote:

* Chuck Ebbert [EMAIL PROTECTED] wrote:

  

On 07/13/2007 05:19 PM, Bill Davidsen wrote:


I should really go back to 2.6.21.6, 2.6.22 has many bizarre behaviors
with FC6. Automount starts taking 30% of CPU (unused at the moment)
  
Can you confirm whether CFS is involved, i.e. does it spin like that 
even without the CFS patch applied?



  
I will try that, but not until Tuesday night. I've been here too long 
today and have an out-of-state meeting tomorrow. I'll take a look after 
dinner. Note that the latest 2.6.21 with cfs-v19 doesn't have any 
problems of any nature, other than suspend to RAM not working, and I may 
have the config wrong. Runs really well otherwise, but I'll test drive 
2.6.22 w/o the patch.


hmmm  could you take out the kernel/time.c (sys_time()) changes from 
the CFS patch, does that solve the automount issue? If yes, could 
someone take a look at automount and check whether it makes use of 
time(2) and whether it combines it with finer grained time sources?


  

Will do.

--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v19

2007-07-17 Thread Bill Davidsen

Ingo Molnar wrote:

* Ian Kent [EMAIL PROTECTED] wrote:

  
ah! It passes in a low-res time source into a high-res time interface 
(pthread_cond_timedwait()). Could you change the time(NULL) + 1 to 
time(NULL) + 2, or change it to:


gettimeofday(&wait, NULL);
wait.tv_sec++;
  

OK, I'm with you, hi-res timer.
But even so, how is the time in the past after adding a second?

Is it because I'm not setting tv_nsec when it's close to a second 
boundary, and hence your recommendation above?



yeah, it looks a bit suspicious: you create a +1 second timeout out of a 
1 second resolution timesource. I dont yet understand the failure mode 
though that results in that looping and in the 30% CPU time use - do you 
understand it perhaps? (and automount is still functional while this is 
happening, correct?)
  


Can't say, I have automount running because I get it by default, but I 
have nothing using it on my test machine. Why is it looping so fast when 
there are no mount points defined? If the config changes there's no 
requirement to notice right away, is there?
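
For anyone following along, here is a minimal sketch of the low-res-into-high-res 
pattern being discussed, plus a corrected form; the function names are 
hypothetical and this is not automount's actual code:

#include <pthread.h>
#include <sys/time.h>
#include <time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

/* Problem pattern: seconds-only deadline.  Called at, say, x.995s,
 * "time(NULL) + 1" is only ~5ms in the future as seen by the
 * high-resolution clock pthread_cond_timedwait() compares against.
 * Caller must hold `lock'. */
static int wait_about_a_second_lowres(void)
{
	struct timespec wait;

	wait.tv_sec = time(NULL) + 1;
	wait.tv_nsec = 0;
	return pthread_cond_timedwait(&cond, &lock, &wait);
}

/* Corrected pattern: carry the sub-second offset and keep the units
 * straight (tv_usec is microseconds, tv_nsec is nanoseconds).
 * Caller must hold `lock'. */
static int wait_one_second_hires(void)
{
	struct timeval now;
	struct timespec wait;

	gettimeofday(&now, NULL);
	wait.tv_sec = now.tv_sec + 1;
	wait.tv_nsec = now.tv_usec * 1000;
	return pthread_cond_timedwait(&cond, &lock, &wait);
}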


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v19

2007-07-18 Thread Bill Davidsen

Ingo Molnar wrote:

* Ian Kent [EMAIL PROTECTED] wrote:

  
ah! It passes in a low-res time source into a high-res time 
interface (pthread_cond_timedwait()). Could you change the 
time(NULL) + 1 to time(NULL) + 2, or change it to:


gettimeofday(&wait, NULL);
wait.tv_sec++;

does this solve the spinning?

Yes, adding in the offset within the current second appears to resolve 
the issue. Thanks Ingo.



i'm wondering how widespread this is. If automount is the only app 
doing this then _maybe_ we could get away with it by changing 
automount?

I don't think the change is unreasonable since I wasn't using an 
accurate time in the condition wait, so that's a coding mistake on my 
part which I will fix.



thanks Ian for taking care of this and for fixing it!

Linus, Thomas, what do you think, should we keep the time.c change? 
Automount is one app affected so far, and it's a borderline case: the 
increased (30%) CPU usage is annoying, but it does not prevent the 
system from working per se, and an upgrade to a fixed/enhanced automount 
version resolves it.


The temptation of using a really (and trivially) scalable low-resolution 
time-source (which is _easily_ vsyscall-able, on any platform) for DBMS 
use is really large, to me at least. Should i perhaps add a boot/config 
option that enables/disables this optimization, to allow distros finer-grained
control over this? And we've also got to wait and see whether there's
any other app affected.
  
Allow it to be selected by the features so that admins can evaluate 
the implications without a reboot?  That would be a convenient interface 
if you could provide it.


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v19

2007-07-18 Thread Bill Davidsen

Linus Torvalds wrote:

On Tue, 17 Jul 2007, Ingo Molnar wrote:
  

* Ian Kent [EMAIL PROTECTED] wrote:


In several places I have code similar to:

wait.tv_sec = time(NULL) + 1;
wait.tv_nsec = 0;
  


Ok, that definitely should work.

Does the patch below help?

  
Spectacularly no! With this patch the glitch1 script with multiple 
scrolling windows has all xterms and glxgears stop totally dead for 
~200ms once per second. I didn't properly test anything else after that. 
Since the automount issue doesn't seem to start until something kicks it 
off, I didn't see it but that doesn't mean it's fixed.
ah! It passes in a low-res time source into a high-res time interface 
(pthread_cond_timedwait()). Could you change the time(NULL) + 1 to 
time(NULL) + 2, or change it to:


gettimeofday(&wait, NULL);
wait.tv_sec++;



This is wrong. It's wrong for two reasons:

 - it really shouldn't be needed. I don't think time() has to be 
   *exactly* in sync, but I don't think it can be off by a third of a 
   second or whatever (as the 30% CPU load would seem to imply)


 - gettimeofday works on a timeval, pthread_cond_timedwait() works on a 
   timespec.


So if it actually makes a difference, it makes a difference for the 
*wrong* reason: the time is still totally nonsensical in the tv_nsec field 
(because it actually got filled in with msecs!), but now the tv_sec field 
is in sync, so it hides the bug.


Anyway, hopefully the patch below might help. But we probably should make 
this whole thing a much more generic routine (ie we have our internal 
getnstimeofday() that still is missing the second-overflow logic, and 
that is quite possibly the one that triggers the 30% off behaviour).


  

Hope that info helps.


Ingo, I'd suggest:
 - get rid of timespec_add_ns(), or at least make it return a return 
   value for when it overflows.
 - make all the people who overflow into tv_sec call a fix_up_seconds() 
   thing that does the xtime overflow handling.


Linus
  


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v19

2007-07-19 Thread Bill Davidsen

Ingo Molnar wrote:

* Bill Davidsen [EMAIL PROTECTED] wrote:


Does the patch below help?


Doesn't seem to apply against 2.6.22.1, I'm trying 2.6.22.6 as soon as I 
recreate it.


Spectacularly no! With this patch the glitch1 script with multiple 
scrolling windows has all xterms and glxgears stop totally dead for 
~200ms once per second. I didn't properly test anything else after 
that.


Bill, could you try the patch below - does it fix the automount problem, 
without introducing new problems?


Ingo

---
Subject: time: introduce xtime_seconds
From: Ingo Molnar [EMAIL PROTECTED]

introduce the xtime_seconds optimization. This is a read-mostly 
low-resolution time source available to sys_time() and kernel-internal 
use. This variable is kept uptodate atomically, and it's monotonically 
increased, every time some time interface constructs an xtime-alike time 
result that overflows the seconds value. (it's updated from the timer 
interrupt as well)


this way high-resolution time results update their seconds component at 
the same time sys_time() does it:


 118485883289000
 11848588320
 118485883292000
 11848588320
 118485883296000
 11848588320
 118485883299000
 11848588320
 118485883303000
 11848588330
 118485883306000
 11848588330
 118485883309000
 11848588330

 [ these are nsec time results from alternating calls to sys_time() and 
   sys_gettimeofday(), recorded at the seconds boundary. ]


instead of the previous (non-coherent) behavior:

 118484895087000
 11848489500
 11848489509
 11848489500
 118484895094000
 11848489500
 118484895097000
 11848489500
 118484895101000
 11848489500
 118484895105000
 11848489500
 118484895108000
 11848489500
 118484895111000
 11848489500
 118484895115000

Signed-off-by: Ingo Molnar [EMAIL PROTECTED]
---
 include/linux/time.h  |   13 +++--
 kernel/time.c |   25 ++---
 kernel/time/timekeeping.c |   28 
 3 files changed, 41 insertions(+), 25 deletions(-)

Index: linux/include/linux/time.h
===
--- linux.orig/include/linux/time.h
+++ linux/include/linux/time.h
@@ -91,19 +91,28 @@ static inline struct timespec timespec_s
 extern struct timespec xtime;
 extern struct timespec wall_to_monotonic;
 extern seqlock_t xtime_lock __attribute__((weak));
+extern unsigned long xtime_seconds;
 
 extern unsigned long read_persistent_clock(void);

 void timekeeping_init(void);
 
+extern void __update_xtime_seconds(unsigned long new_xtime_seconds);

+
+static inline void update_xtime_seconds(unsigned long new_xtime_seconds)
+{
+   if (unlikely((long)(new_xtime_seconds - xtime_seconds) > 0))
+   __update_xtime_seconds(new_xtime_seconds);
+}
+
 static inline unsigned long get_seconds(void)
 {
-   return xtime.tv_sec;
+   return xtime_seconds;
 }
 
 struct timespec current_kernel_time(void);
 
 #define CURRENT_TIME		(current_kernel_time())

-#define CURRENT_TIME_SEC   ((struct timespec) { xtime.tv_sec, 0 })
+#define CURRENT_TIME_SEC   ((struct timespec) { xtime_seconds, 0 })
 
 extern void do_gettimeofday(struct timeval *tv);

 extern int do_settimeofday(struct timespec *tv);
Index: linux/kernel/time.c
===
--- linux.orig/kernel/time.c
+++ linux/kernel/time.c
@@ -58,11 +58,10 @@ EXPORT_SYMBOL(sys_tz);
 asmlinkage long sys_time(time_t __user * tloc)
 {
/*
-* We read xtime.tv_sec atomically - it's updated
-* atomically by update_wall_time(), so no need to
-* even read-lock the xtime seqlock:
+* We read xtime_seconds atomically - it's updated
+* atomically by update_xtime_seconds():
 */
-   time_t i = xtime.tv_sec;
+   time_t i = xtime_seconds;
 
 	smp_rmb(); /* sys_time() results are coherent */
 
@@ -226,11 +225,11 @@ inline struct timespec current_kernel_ti
 
 	do {

	seq = read_seqbegin(&xtime_lock);
-   
+
now = xtime;
	} while (read_seqretry(&xtime_lock, seq));
 
-	return now; 
+	return now;

 }
 
 EXPORT_SYMBOL(current_kernel_time);

@@ -377,19 +376,7 @@ void do_gettimeofday (struct timeval *tv
	tv->tv_sec = sec;
	tv->tv_usec = usec;
 
-	/*

-* Make sure xtime.tv_sec [returned by sys_time()] always
-* follows the gettimeofday() result precisely. This
-* condition is extremely unlikely, it can hit at most
-* once per second:
-*/
-   if (unlikely(xtime.tv_sec != tv->tv_sec)) {
-   unsigned long flags;
-
-   write_seqlock_irqsave(&xtime_lock, flags);
-   update_wall_time();
-   write_sequnlock_irqrestore(xtime_lock, flags

Re: [patch] CFS scheduler, -v19

2007-07-19 Thread Bill Davidsen

Bill Davidsen wrote:

Ingo Molnar wrote:

* Bill Davidsen [EMAIL PROTECTED] wrote:


Does the patch below help?


Doesn't seem to apply against 2.6.22.1, I'm trying 2.6.22.6 as soon as 
I recreate it.


Applied to 2.6.22-git9, building now.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v19

2007-07-19 Thread Bill Davidsen

Ingo Molnar wrote:

* Bill Davidsen [EMAIL PROTECTED] wrote:


Does the patch below help?
Spectacularly no! With this patch the glitch1 script with multiple 
scrolling windows has all xterms and glxgears stop totally dead for 
~200ms once per second. I didn't properly test anything else after 
that.


Bill, could you try the patch below - does it fix the automount problem, 
without introducing new problems?


Okay, as noted off-list, after I exported the xtime_seconds it now 
builds and works. However, there are a *lot* of section mismatches 
which are not reassuring.


Boots, runs, glitch1 test runs reasonably smoothly. automount has not 
used significant CPU yet, but I don't know what triggers it, the bad 
behavior did not happen immediately without the patch. However, it looks 
very hopeful.


Warnings attached to save you the trouble...

--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
Script started on Thu 19 Jul 2007 05:29:08 PM EDT
Common profile 1.13 lastmod 2006-01-04 22:43:25-05
No common directory available
Session time 17:29:08 on 07/19/07
posidon:davidsen time nice -10 make -j4 -s; sleep 2; exit
  CHK include/linux/version.h
  CHK include/linux/utsrelease.h
  CHK include/linux/compile.h
  CHK include/linux/compile.h
  UPD include/linux/compile.h
  CHK include/linux/version.h
  Building modules, stage 2.
WARNING: vmlinux(.text+0xc1001183): Section mismatch: reference to 
.init.text:start_kernel (between 'is386' and 'check_x87')
WARNING: vmlinux(.text+0xc1213fb4): Section mismatch: reference to .init.text: 
(between 'rest_init' and 'kthreadd_setup')
WARNING: vmlinux(.text+0xc1218786): Section mismatch: reference to .init.text: 
(between 'iret_exc' and '_etext')
WARNING: vmlinux(.text+0xc1218792): Section mismatch: reference to .init.text: 
(between 'iret_exc' and '_etext')
WARNING: vmlinux(.text+0xc121879e): Section mismatch: reference to .init.text: 
(between 'iret_exc' and '_etext')
WARNING: vmlinux(.text+0xc12187aa): Section mismatch: reference to .init.text: 
(between 'iret_exc' and '_etext')
WARNING: vmlinux(.text+0xc1214071): Section mismatch: reference to 
.init.text:__alloc_bootmem_node (between 'alloc_node_mem_map' and 
'zone_wait_table_init')
WARNING: vmlinux(.text+0xc1214117): Section mismatch: reference to 
.init.text:__alloc_bootmem_node (between 'zone_wait_table_init' and 'schedule')
WARNING: vmlinux(.text+0xc10fbaae): Section mismatch: reference to 
.init.text:__alloc_bootmem (between 'vgacon_startup' and 'vgacon_scrolldelta')
WARNING: vmlinux(.text+0xc1218eda): Section mismatch: reference to .init.text: 
(between 'iret_exc' and '_etext')
Root device is (253, 0)
Setup is 11240 bytes (padded to 11264 bytes).
System is 1915 kB
Kernel: arch/i386/boot/bzImage is ready  (#3)

real4m11.024s
user2m5.121s
sys 0m30.952s
exit

Script done on Thu 19 Jul 2007 05:33:35 PM EDT


[RFC] what should 'uptime' be on suspend?

2007-07-19 Thread Bill Davidsen
I just found a machine which will resume after suspend to memory, using 
the mainline kernel (no suspend2 patch).


On resume I was looking at the uptime output, and it was about six 
minutes, FAR longer than the time since resume. So the topic for 
discussion is, should the uptime be

- time since the original boot
- total uptime since first boot, not counting the time suspended
- time since resume
- some other time around six minutes

Any of the first three could be useful and right for some cases, thus 
discussion is invited.


--
Bill Davidsen [EMAIL PROTECTED]
 We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Where did KVM go in 2.6.22-git9?

2007-07-19 Thread Bill Davidsen
I just built a 2.6.22-git9 kernel, and when I run oldconfig it (a) sets 
the processor type to pentium-pro, and (b) the KVM stuff simply isn't in 
the config. Before I spend a lot of time on this, was it disabled 
temporarily for some reason, or is it a known bug, or ???


Processor is a Core2 E6600, and the starting config has KVM.

Strong suggestion: put KVM in processor type and options, and if the CPU 
type selected supports the feature, let the builder turn it on in one 
place and have Kconfig turn on whatever voodoo is needed to allow it, 
rather than have people trying to find out what depends have changed 
with each release. I see KVM depends on X86_CMPXCHG64, which simply 
doesn't seem to be defined directly anywhere.


Going back to 2.6.21.6 until whatever changed is at least documented.

--
Bill Davidsen [EMAIL PROTECTED]
 We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: mkfs.ext2 triggerd RAM corruption

2007-05-07 Thread Bill Davidsen

Jan-Benedict Glaw wrote:

On Fri, 2007-05-04 16:59:51 +0200, Bernd Schubert [EMAIL PROTECTED] wrote:
To see whats going on, I copied the entire / (so the initrd) into a tmpfs 
root, chrooted into it, also bind mounted the main / into this chroot and 
compared several times /bin of chroot/bin and the bind-mounted /bin while the 
mkfs.ext2 command was running.


beo-05:/# diff -r /bin /oldroot/bin/
beo-05:/# diff -r /bin /oldroot/bin/
beo-05:/# diff -r /bin /oldroot/bin/
Binary files /bin/sleep and /oldroot/bin/sleep differ
beo-05:/# diff -r /bin /oldroot/bin/
Binary files /bin/bsd-csh and /oldroot/bin/bsd-csh differ
Binary files /bin/cat and /oldroot/bin/cat differ
...

Also tested different schedulers, at least happens with deadline and 
anticipatory.


The corruption does NOT happen on running the mkfs command on /dev/sda1, but 
happens with sda2, sda3 and sda3. Also doesn't happen with extended 
partitions of sda1.


Is sda2 the largest filesystem out of sda2, sda3 (and the logical
partitions within the extended sda1, if these get mkfs'ed, too)?

I'm not too sure that this is a kernel bug, but probably a bad RAM
chip. Did you run memtest86 for a while? ...and can you reproduce this
problem on different machines?

MfG, JBG

Was this missing from your copy of the original post, or did you delete 
it without reading? Note last sentence...


Summary: The system ramdisk (initrd) gets corrupted while running 
mkfs.ext2 on a local sata disk partition.


Reproduced on kernel versions: vanilla 2.6.16 - 2.6.20 (2.6.16 
doesn't run on any of the systems I can do tests with). Please note:

I could reproduce this on several systems; all of them use ECC
memory, and on most of them the memory is monitored using
EDAC.



--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Preempt of BKL and with tickless systems

2007-05-08 Thread Bill Davidsen
I think I have a reasonable grip on the voluntary and full preempt 
models, can anyone give me any wisdom on the preempt of the BKL? I know 
what it does, the question is where it might make a difference under 
normal loads. Define normal as servers and desktops.


I've been running some sched tests, and it seems to make little 
difference how that's set. Before I run a bunch of extra tests, I 
thought I'd ask.



New topic: I have found preempt, both voluntary and forced, seems to 
help more with response as the HZ gets smaller. How does that play with 
tickless operation, or are you-all waiting for me to run my numbers with 
all values of HZ and not, and tell the world what I found? ;-)


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Some CFS and sd04[68] results for kernel build

2007-05-08 Thread Bill Davidsen

I have a results page here, I will repeat tests with tuning if asked.

http://www.tmr.com/~davidsen/Kernel%20build%20time%20results.html

--
bill davidsen [EMAIL PROTECTED]
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Preempt of BKL and with tickless systems

2007-05-10 Thread Bill Davidsen

Lee Revell wrote:

On 5/8/07, Bill Davidsen [EMAIL PROTECTED] wrote:

I think I have a reasonable grip on the voluntary and full preempt
models, can anyone give me any wisdom on the preempt of the BKL? I know
what it does, the question is where it might make a difference under
normal loads. Define normal as servers and desktops.


This was introduced by Ingo to solve a real problem that I found,
where some codepath would hold the BKL for long enough to introduce
excessive scheduling latencies - search list archive for details.  But
I don't remember the code path (scrolling the FB console?  VT
switching? reiser3?  misc. ioctl()s?).  Basically, taking the BKL
disabled preemption which caused long latencies.

It's certainly possible that whatever issue led to this was solved in
another way since.

Anything is possible. I feel that using voluntary + bkl is probably good 
for most servers, forced preempt for desktop, although it really doesn't 
seem to do much beyond voluntary.


Thanks for the clarification.

--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] volatile considered harmful document

2007-05-10 Thread Bill Davidsen

Jonathan Corbet wrote:


+There are still a few rare situations where volatile makes sense in the
+kernel:
+
+  - The above-mentioned accessor functions might use volatile on
+architectures where direct I/O memory access does work.  Essentially,
+each accessor call becomes a little critical section on its own and
+ensures that the access happens as expected by the programmer.
+
+  - Inline assembly code which changes memory, but which has no other
+visible side effects, risks being deleted by GCC.  Adding the volatile
+keyword to asm statements will prevent this removal.
+
+  - The jiffies variable is special in that it can have a different value
+every time it is referenced, but it can be read without any special
+locking.  So jiffies can be volatile, but the addition of other
+variables of this type is frowned upon.  Jiffies is considered to be a
+stupid legacy issue in this regard.


It would seem that any variable which is (a) subject to change by other 
threads or hardware, and (b) the value of which is going to be used 
without writing the variable, would be a valid use for volatile.
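
A concrete userspace sketch of that pattern: the flag is written asynchronously 
(by a signal handler here, though a hardware status location read through an 
accessor has the same shape) and only ever read by the polling code:

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Changed by the handler, only read by the main loop: (a) and (b) above. */
static volatile sig_atomic_t got_signal;

static void handler(int sig)
{
	(void)sig;
	got_signal = 1;
}

int main(void)
{
	signal(SIGINT, handler);
	while (!got_signal)
		pause();	/* each test is a fresh load of got_signal */
	printf("signal seen, exiting\n");
	return 0;
}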


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] volatile considered harmful document

2007-05-13 Thread Bill Davidsen

Krzysztof Halasa wrote:

Robert Hancock [EMAIL PROTECTED] writes:

  

You don't need volatile in that case, rmb() can be used.



rmb() invalidates all compiler assumptions, it can be much slower.
  
Yes, why would you use rmb() when a read of a volatile generates optimal 
code?


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3 years since last 2.2 release, why still on kernel.org main page?

2007-05-13 Thread Bill Davidsen

Rob Landley wrote:
Out of curiosity, since 2.2 hasn't had a release in 3 years, and the last 
prepatch was 2 years ago, why is its status still on the kernel.org main 
page?


Not exactly something people are checking the status of on a daily basis...

Just wondering...

I assume because it's a handy place to get the current 2.2 kernel, which 
some people run for reasons which are valid to them.


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] swsusp: Use platform mode by default

2007-05-13 Thread Bill Davidsen

Rafael J. Wysocki wrote:

On Friday, 11 May 2007 18:30, Linus Torvalds wrote:

On Fri, 11 May 2007, Rafael J. Wysocki wrote:

We're working on fixing the breakage, but currently it's difficult, because
none of my testboxes has problems with the 'platform' hibernation and I
cannot reproduce the reported issues.

The rule for anything ACPI-related has been: no regressions.

It doesn't matter if something fixes 10 boxes, if it breaks a single one, 
it's going to get reverted.


[Well, I think I should stop explaining decisions that weren't mine.  Yet, I
feel responsible for patches that I sign-off.]

Just to clarify, the change in question isn't new.  It was introduced by the
commit 9185cfa92507d07ac787bc73d06c4eec7239 before 2.6.20, at Seife's
request and with Pavel's acceptance.

We had much too much of the two steps forward, one step back dance with 
ACPI a few years ago, which is the reason that rule got installed (and 
which is why it's ACPI-only: in some other subsystems we accept the fact 
that sometimes we don't know how to fix some hardware issue, but the new 
situation is at least better than the old one).


I agree that it can be aggravating to know that you can fix a problem for 
some people, but then being limited by the fact that it breaks for others. 
But beign able to *rely* on something that used to work is just too 
important, and with ACPI, you can never make a good judgement of which way 
works better (since it really just depends on some random firmware issues 
that we have zero visibility into).


Also, quite often, it may *seem* like something fixes more boxes than it 
breaks, but it's because people report *breakage* only, and then a few 
months later it turns out that it's exactly the other way around: now it's 
a hundred people who report breakage with the *new* code, and the reason 
people thought it fixed more than it broke was that the people for whom 
the old code worked fine obviously never reported it!


So this is why a single regression is considered more important than ten 
fixes - because a single regressionr report tends to actually be just the 
first indication of a lot of people who simply haven't tested the new code 
yet! People for whom the old code is broken are more likely to test new 
things.


So I'd just suggest changing the default back to PM_DISK_SHUTDOWN (but 
leave the pm_ops-enter testing in place - ie not reverting the other 
commits in the series).


The series actually preserves the 2.6.20/21 behavior.  By defaulting back to
PM_DISK_SHUTDOWN, we'll cause some users for whom 2.6.20 and 2.6.21 work to
report this change as a regression, so please let me avoid making this decision
(I'm not the maintainer of the hibernation code after all).

The problem is that we don't know about regressions until somebody reports them
and if that happens after two affected kernel releases, what should we do?

I think that one of the reasons people (guilty) don't report problems 
with suspend and hibernate is that it's been a problem on and off and 
when it breaks people don't bother to chase it, they just don't use it 
unless it's critical, or they install suspend2.


I only suggest that if 'platform' is more correct, use that and don't change 
it again. Then fix platform.


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: upgrade linux kernel

2007-05-13 Thread Bill Davidsen

[EMAIL PROTECTED] wrote:

I am upgrading the kernel from 2.4.20-8 (default in RH9) to 2.6.xx. When I do
the make command it gives some output and finally I get an error saying:

BFD: Warning: Writing section '.bss' to huge ( ie negative) file offset
0xc0244000. 

 Objcopy: arch/i386/boot/compressed/vmlinux.bin: file truncated

make[2]:***[ arch/i386/boot/compressed/vmlinux.bin] Error 1

 make[1]: ***[ arch/i386/boot/compressed/vmlinux] Error 2

 make: *** [bzImage] Error 2

You can upgrade a bunch of system utilities, but I'm not sure it's worth 
doing. The system I'm on is RH8.0 patched to run later kernels when my 
development machine went down. I got 2.5.52 to boot, last stable was 
2.5.47-ac6 and I gave up.


Unless you have some major need to upgrade the kernel without the 
distribution, grab the latest RH kernel, or use the latest 2.4 kernel 
available.


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: undeprecate raw driver.

2007-05-14 Thread Bill Davidsen

Robert P. J. Day wrote:

On Sun, 13 May 2007, Jan Engelhardt wrote:


On May 13 2007 12:32, Dave Jones wrote:


Despite repeated attempts over the last two and half years, this
driver seems somewhat persistant.  Remove its deprecated status as
it has existing users who may not be in a position to migrate their
apps

At least keep the it's obsolete Kconfig description. We don't want
new users/projects to jump on /dev/raw.


i just *know* this is a mistake, but i'm going to take one more shot
at distinguishing between deprecated and obsolete.

as i understand it, the raw driver is *deprecated*.  that is, it's
still there, it's still supported, people are still using it but its
use is *seriously* discouraged and everyone should be trying to move
off of it at their earliest possible convenience.

that is *not* the same as obsolete which should mean that that
feature is dead, dead, DEAD and *no one* should be using it anymore.

yes, i realize it sounds like splitting hairs, but it's this malleable
definition of deprecated that's causing all of this trouble in the
first place -- the fact that the raw driver is currently listed as
obsolete when it is, in fact, only deprecated.

in short, do *not* remove its deprecated status.  rather, remove its
obsolete status and *make* it deprecated.

Correct. Like the weird lady next door who fancies you, it's old, it's 
ugly, but it's not likely to go away any time soon.


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] volatile considered harmful document

2007-05-14 Thread Bill Davidsen

Jeff Garzik wrote:

On Sun, May 13, 2007 at 07:26:13PM -0400, Bill Davidsen wrote:

Krzysztof Halasa wrote:

Robert Hancock [EMAIL PROTECTED] writes:

You don't need volatile in that case, rmb() can be used.

rmb() invalidates all compiler assumptions, it can be much slower.

It does not invalidate /all/ assumptions.

Yes, why would you use rmb() when a read of a volatile generates optimal
code?

Read of a volatile is guaranteed to generate the least optimal code.
That's what volatile does, guarantee no optimization of that particular
access.

By optimal you seem to mean generating fewer CPU cycles by risking the use of 
an obsolete value, while by the same term I mean reading the correct and 
current value from the memory location without the overhead of locks. If your 
logic doesn't require the correct value, why read it at all? And if it does, 
what could take fewer cycles, or have less cache impact, than a single 
register load from memory?

Locks are useful when the value will be changed by another thread, or when the 
value must not change for a short interval. That's not always the case.
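
To make the disagreement concrete, here is a minimal sketch of the flag-polling 
case being argued about (illustrative only, not code from this thread; the 
function and variable names are invented):

/* Illustrative sketch only -- all names here are hypothetical. */

static volatile int ready;      /* set by another context, e.g. an interrupt */
static int payload;

int wait_for_data(void)
{
        /*
         * Without 'volatile' the compiler may load 'ready' once and spin
         * on a stale register copy; with it, each iteration performs a
         * real load.  What volatile does NOT do is order the later read
         * of 'payload' against the reads of 'ready' on a weakly ordered
         * CPU -- that is what a read barrier such as rmb() would add in
         * kernel code.
         */
        while (!ready)
                ;
        /* in kernel code an rmb() would go here */
        return payload;
}

So the two mechanisms answer different questions: volatile constrains the 
compiler, while the barrier also constrains the CPU.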


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: Software raid0 will crash the file-system, when each disk is 5TB

2007-05-16 Thread Bill Davidsen

Jeff Zheng wrote:

Here is the information of the created raid0. Hope it is enough.

  
If I read this correctly, the problem is with JFS rather than RAID? Have 
you tried not mounting the JFS filesystem, but just starting the array 
that crashes, so you can read bits of it, etc., and verify that the 
array itself is working?


And can you run an fsck on the filesystem, if that makes sense? I assume 
you actually got a filesystem written at one time. I've never used JFS 
under Linux, but I spent five-plus years using it on AIX; complex but 
robust.

The crashing one:
md: bind<sdd>
md: bind<sde>
md: raid0 personality registered for level 0
md0: setting max_sectors to 4096, segment boundary to 1048575
raid0: looking at sde
raid0:   comparing sde(5859284992) with sde(5859284992)
raid0:   END
raid0:   ==> UNIQUE
raid0: 1 zones
raid0: looking at sdd
raid0:   comparing sdd(5859284992) with sde(5859284992)
raid0:   EQUAL
raid0: FINAL 1 zones
raid0: done.
raid0 : md_size is 11718569984 blocks.
raid0 : conf->hash_spacing is 11718569984 blocks.
raid0 : nb_zone is 2.
raid0 : Allocating 8 bytes for hash.
JFS: nTxBlock = 8192, nTxLock = 65536

The working one:
md: bind<sde>
md: bind<sdf>
md: bind<sdg>
md: bind<sdd>
md0: setting max_sectors to 4096, segment boundary to 1048575
raid0: looking at sdd
raid0:   comparing sdd(2929641472) with sdd(2929641472)
raid0:   END
raid0:   ==> UNIQUE
raid0: 1 zones
raid0: looking at sdg
raid0:   comparing sdg(2929641472) with sdd(2929641472)
raid0:   EQUAL
raid0: looking at sdf
raid0:   comparing sdf(2929641472) with sdd(2929641472)
raid0:   EQUAL
raid0: looking at sde
raid0:   comparing sde(2929641472) with sdd(2929641472)
raid0:   EQUAL
raid0: FINAL 1 zones
raid0: done.
raid0 : md_size is 11718565888 blocks.
raid0 : conf->hash_spacing is 11718565888 blocks.
raid0 : nb_zone is 2.
raid0 : Allocating 8 bytes for hash.
JFS: nTxBlock = 8192, nTxLock = 65536

-----Original Message-----
From: Neil Brown [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, 16 May 2007 12:04 p.m.

To: Michal Piotrowski
Cc: Jeff Zheng; Ingo Molnar; [EMAIL PROTECTED];
linux-kernel@vger.kernel.org; [EMAIL PROTECTED]
Subject: Re: Software raid0 will crash the file-system, when each disk
is 5TB

On Wednesday May 16, [EMAIL PROTECTED] wrote:
  

Anybody have a clue?

  


No...
When a raid0 array is assembled, quite a lot of messages get printed
about the number of zones and hash_spacing etc.  Can you collect and post
those, both for the failing case (2*5.5T) and the working case
(4*2.55T) if possible.
  



--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Scheduler responsiveness under load

2007-05-17 Thread Bill Davidsen
I am just running a new series of response tests, which I expect to send 
to the list today or tomorrow. It includes operating at some high 
(LA > 20) loads, and gathering reproducible statistics. In the process I 
used the file completion feature while load was high, and noted that 
with sd0.48 typing the first characters and hitting tab was VERY slow 
compared to cfs12 or even the recent fc6 release. The directory had about a 
dozen files, there was only one match, and I was just saving myself typing a 
long filename.


This was tested over three boots of sd0.48, cfs12, and fc6, as well as 
one boot of cfs9, and the problem was only with the most recent sd 
kernel I have built.


Hardware: Intel Core2duo 6600, 2.4GHz, 2GB mem, 600GB RAID5; kernel 
2.6.21 built with make -j20 to generate the load.



Scheduling tests on IPC methods, fc6, sd0.48, cfs12

2007-05-17 Thread Bill Davidsen
I have posted the results of my initial testing, measuring IPC rates 
using various schedulers under no load, limited nice load, and heavy 
load at nice 0.


http://www.tmr.com/~davidsen/ctxbench_testing.html
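
For anyone who wants a feel for what an IPC round-trip measurement of this 
kind looks like, here is a trivial pipe ping-pong sketch (illustrative only, 
not the actual ctxbench code; the iteration count is arbitrary):

/* Hypothetical pipe ping-pong: parent and child bounce one byte back
 * and forth and report round trips per second. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/wait.h>

int main(void)
{
        int up[2], down[2];
        char c = 'x';
        long i, iters = 100000;
        struct timeval t0, t1;
        double secs;

        if (pipe(up) || pipe(down)) {
                perror("pipe");
                return 1;
        }
        if (fork() == 0) {              /* child: echo every byte back */
                for (i = 0; i < iters; i++) {
                        if (read(up[0], &c, 1) != 1 || write(down[1], &c, 1) != 1)
                                break;
                }
                _exit(0);
        }
        gettimeofday(&t0, NULL);
        for (i = 0; i < iters; i++) {   /* parent: send and wait for the echo */
                write(up[1], &c, 1);
                read(down[0], &c, 1);
        }
        gettimeofday(&t1, NULL);
        wait(NULL);
        secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
        printf("%ld round trips in %.3f s (%.0f/s)\n", iters, secs, iters / secs);
        return 0;
}

Running a copy of this while varying the background load gives a rough picture 
of how each scheduler treats the two cooperating processes.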

--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: [PATCH] volatile considered harmful, take 3

2007-05-17 Thread Bill Davidsen

Satyam Sharma wrote:


*Unfortunately* (the trouble with C itself, is that a *committee* has made
it into ... something ... that it should not have made it into) -- anyway,
unfortunately C took it upon itself to solve a problem that it did not
have (and does not even bring about) in the first place: and the
half-hearted (or vague, call it what you will) attempt _then_ ends up
being a problem -- by making people _feel_ as if they are doing things
right, when that is probably not the case.

[ And we've not even touched the issue of whether the _same_ compiler's
implementation of volatile across archs/platforms is consistent. ]

Pardon, I was GE's representative to the original X3J11 committee, and 
'volatile' was added to codify existing practice, which is one of the 
goals of a standard. The extension existed, in at least two forms, to 
allow handling of memory-mapped hardware. So the committee did not take 
it upon itself; it was part of the defined duty of the committee.


The intent was simple, clear, and limited: to tell the compiler that 
every read of a variable in source code should result in a read, at that 
point in the logic, and similarly for writes. In other words, the code 
should not be moved and should generate a real memory access every time. 
People have tried to do many things with that limited concept since, 
some with clarification and some by assuming the compiler knows when 
to ignore volatile.
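
As an illustration of that original intent, consider a hypothetical 
memory-mapped device (a sketch, not code from the thread; the register layout 
and address are invented):

/* Hypothetical memory-mapped device registers -- illustrative only. */
struct uart_regs {
        unsigned int status;
        unsigned int data;
};

#define UART     ((volatile struct uart_regs *)0xc0001000UL)
#define TX_READY 0x1u

void uart_putc(unsigned int c)
{
        /*
         * Because the pointed-to object is volatile-qualified, every read
         * of 'status' below is a real load from the device, in source
         * order, and the store to 'data' is really emitted.  Without
         * volatile the compiler could read 'status' once and spin forever
         * on the cached value, or reorder and combine the accesses.
         */
        while ((UART->status & TX_READY) == 0)
                ;
        UART->data = c;
}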


As someone noted, a committee is a poor way to get innovation, and a 
good way to have a bunch of knowledgeable people shoot down bad ideas.


It was a fun experience, where I first learned the modern equivalent of 
Occam's Razor, Plauger's Law of least astonishment, which compiler 
writers regularly violate :-(


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot


Re: [1/2] 2.6.22-rc1: known regressions

2007-05-17 Thread Bill Davidsen

Jean Delvare wrote:

Hi Michal,

On Sun, 13 May 2007 20:14:45 +0200, Michal Piotrowski wrote:

I2C

Subject: Sensors Applet give an error message No chip detected
References : http://lkml.org/lkml/2007/5/13/109
Submitter  : Antonino Ingargiola [EMAIL PROTECTED]
Status : Unknown


There is currently zero proof that this has anything to do with I2C.

I believe in another thread this has been traced to a change in the 
interface and can be solved with an upgrade for the applet.


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot


glitch1 v1.6 script update and results on cfs-v13

2007-05-18 Thread Bill Davidsen
The glitch1 script has been vastly updated, and now runs by itself after 
being started. It produces files with the fps from glxgears and a 
fairloops file which indicates the number of loops for each of the 
scrolling xterms. This gives a good indication of fairness; all 
processes should have about the same number of loops.
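
Conceptually, the fairness number is just an iteration count over a fixed 
wall-clock window, something like this hypothetical probe (not the script's 
actual xterm workload):

/* Hypothetical fairness probe: count loop iterations completed in a
 * fixed wall-clock window; fairly treated processes should report
 * roughly equal counts. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char **argv)
{
        int seconds = (argc > 1) ? atoi(argv[1]) : 40;
        time_t end = time(NULL) + seconds;
        unsigned long loops = 0;

        while (time(NULL) < end) {
                rand();                 /* stand-in for the real work */
                loops++;
        }
        printf("%lu loops in %d seconds\n", loops, seconds);
        return 0;
}

Run several copies concurrently over the same window and compare the reported 
counts.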


Testing 2.6.21.1-cfs-v13:

Using all default settings, all four processes ran the same number of 
loops over 40sec within about 8%. I'll have some neat results with 
standard deviation by the end of the weekend, it's supposed to rain. 
Visual inspection of the glxgears while running looked smooth as a 
baby's ass.


Current self-running script attached; I'm writing a doc, and hopefully 
the comments are clear enough if you want to tune it.


*Note*: these values make sense when various schedulers and tuning 
values are run on the same machine. So I'll be testing on three 
machines, with dual-core, hyperthreaded uni, and pure uni. Unless I see 
a hint that one of these cases is handled less well than the others I 
won't compare.


--
Bill Davidsen
  He was a full-time professional cat, not some moonlighting
ferret or weasel. He knew about these things.


glitch1.sh
Description: Bourne shell script


Re: [Bugme-new] [Bug 8479] New: gettimeofday returning 1000000 in tv_usec on core2duo

2007-05-19 Thread Bill Davidsen

Andrew Morton wrote:

On Tue, 15 May 2007 08:06:52 +0200 Eric Dumazet [EMAIL PROTECTED] wrote:


Andrew Morton wrote:

On Mon, 14 May 2007 21:17:47 -0700 [EMAIL PROTECTED] wrote:


http://bugzilla.kernel.org/show_bug.cgi?id=8479

   Summary: gettimeofday returning 1000000 in tv_usec on core2duo
Kernel Version: 2.6.21
Status: NEW
  Severity: normal
 Owner: [EMAIL PROTECTED]
 Submitter: [EMAIL PROTECTED]


Most recent kernel where this bug did *NOT* occur: 2.6.20
Distribution: Gentoo
Hardware Environment: core2duo T7200 (all reporters had this same CPU)
Software Environment: Linux 2.6.21, glibc 2.5
Problem Description:

gettimeofday returns 1 - 1000000 in tv_usec, not 0 - 999999 
This does not happen on any of my AMD-based 32 or 64 bit boxes, only on my

core2duo; I have 2 other reports of this problem, all on T7200's

Steps to reproduce:

call gettimeofday a lot.  Eventually, you'll get 1000000 returned in tv_usec. My
average is ~1 in 100 calls.  I've attached my test program, with output from
various boxes.  One of the other reporters tried the test program too, and got
similar output.  .config will be attached too.
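
For reference, the kind of loop described above is roughly the following; a 
hypothetical sketch, not the reporter's attached test program:

/* Hypothetical reproducer sketch: hammer gettimeofday() and flag any
 * tv_usec outside the documented 0..999999 range. */
#include <stdio.h>
#include <sys/time.h>

int main(void)
{
        struct timeval tv;
        unsigned long i;

        for (i = 0; i < 100000000UL; i++) {
                gettimeofday(&tv, NULL);
                if (tv.tv_usec < 0 || tv.tv_usec > 999999) {
                        printf("call %lu: tv_usec = %ld\n", i, (long)tv.tv_usec);
                        return 1;
                }
        }
        printf("no out-of-range tv_usec seen\n");
        return 0;
}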

err, whoops.

I remember I already hit this and corrected it

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blobdiff;f=arch/x86_64/kernel/vsyscall.c;h=dc32cef961950915fbaa185e36ab802d5f7cea3b;hp=ba330f87067996a17495f7d03466d646c718b52c;hb=c8118c6c07f2edfd697aaa0b93e08c3b65a5a675;hpb=272a3713bb9e302e0455c894c41180a482d2c8a3


Oh, OK.


Maybe a stable push is necessary ?


yup.  Please always think of -stable when preparing fixes.  I'm sure many
useful fixes are slipping past simply because those who _are_ looking out
for backportable fixes are missing things.

That makes me feel better; I have been occasionally suggesting fixes 
posted here as candidates for -stable, and I was afraid I was being a PITA. I 
forgot about the stable address and have been bugging Greg; I'll stop 
that.



Greg, Chris: please consider c8118c6c07f2edfd697aaa0b93e08c3b65a5a675
for -stable, if it isn't already there.



--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot



Re: [PATCH] LogFS take three

2007-05-19 Thread Bill Davidsen

Kevin Bowling wrote:

On 5/16/07, David Woodhouse [EMAIL PROTECTED] wrote:

On Wed, 2007-05-16 at 15:53 +0200, Jörn Engel wrote:

 My experience is that no matter which name I pick, people will
 complain
 anyway.  Previous suggestions included:
 jffs3
 jefs
 engelfs
 poofs
 crapfs
 sweetfs
 cutefs
 dynamic journaling fs - djofs
 tfsfkal - the file system formerly known as logfs

Can we call it jörnfs? :)


However if Jörn is accused of murder, it will have little chance of
being merged :-).


WRT that, it seems that Nina had a lover who is a confessed serial killer. 
I'm surprised the case hasn't been adapted for 'Boston Legal' and 'Law 
and Order' like other high-profile crimes.


I see nothing wrong with jörnfs, and there's room for numbers at the end...

--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot



Re: [PATCH] LogFS take three

2007-05-19 Thread Bill Davidsen

Dongjun Shin wrote:


There are so many flash-based storage and some disposable storages,
as you pointed out, have poor quality. I think it's mainly because these
are not designed for good quality, but for lowering the price.

The reliability seems to be appropriate to the common use. I'm dubious 
that computer storage was a big design factor until the last few years. 
A good argument for buying large sizes: they are more likely to be a 
recent design.



These kind of devices are not ready for things like power failure because
their use case is far from that. For example, removing flash card
while taking pictures using digital camera is not a common use case.
(there should be a written notice that this kind of action is against
the warranty)

They do well in such use, if you equate battery death to pulling the 
card (it may not be). I have tested that feature and not had a failure 
of any but the last item. Clearly not recommended, but sometimes 
unplanned needs arise.




- In contrast to the embedded environment where CPU and flash is directly
connected, the I/O path between CPU and flash in PC environment is longer.
The latency for SW handshaking between CPU and flash will also be longer,
which would make the performance optimization harder.

As I mentioned, some techniques like log-structured filesystem could
perform generally better on any kind of flash-based storage with FTL.
Although there are many kinds of FTL, it is commonly true that
it performs well under workload where sequential write is dominant.

I also expect that FTL for PC environment will have better quality spec
than the disposable storage.


The recent technology announcements from Intel are encouraging in that 
respect.


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot


Sched - graphic smoothness under load - cfs-v13 sd-0.48

2007-05-19 Thread Bill Davidsen
I generated a table of results from the latest glitch1 script, using an 
HTML postprocessor I'm not *quite* ready to foist on the world. In any case 
it has some numbers for frames per second, fairness of the processor 
time allocated to the compute-bound processes which generate a lot of 
other screen activity for X, and my subjective comments on how smooth it 
looked and felt.


The chart is at http://www.tmr.com/~davidsen/sched_smooth_01.html for 
your viewing pleasure. The only tuned result was with sd, since what I 
observed was so bad using the default settings. If any scheduler 
developers would like me to try other tunings or new versions let me know.


--
Bill Davidsen [EMAIL PROTECTED]
 We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot



Re: Sched - graphic smoothness under load - cfs-v13 sd-0.48

2007-05-19 Thread Bill Davidsen

Ray Lee wrote:

On 5/19/07, Bill Davidsen [EMAIL PROTECTED] wrote:

I generated a table of results from the latest glitch1 script, using an
HTML postprocessor I'm not *quite* ready to foist on the world. In any case
it has some numbers for frames per second, fairness of the processor
time allocated to the compute-bound processes which generate a lot of
other screen activity for X, and my subjective comments on how smooth it
looked and felt.

The chart is at http://www.tmr.com/~davidsen/sched_smooth_01.html for
your viewing pleasure.


Is the S.D. columns (immediately after the average) standard
deviation? If so, you may want to rename those 'stdev', as it's a
little confusing to have S.D. stand for that and Staircase Deadline.
Further, which standard deviation is it? (The standard deviation of
the values (stdev), or the standard deviation of the mean (sdom)?)

What's intended is the stddev from the average, and perl bit me on that 
one. If you spell a variable wrong the same way more than once it 
doesn't flag it as a possible spelling error.


Note on the math: even when coded as intended, the sum of the squared 
deviations is divided by N-1, not N. I found it both ways in online docs, 
but I learned it decades ago as N-1, so I used that.
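
For the record, the intended calculation is the sample standard deviation; a 
minimal sketch of the math (illustrative C, not the actual perl used in the 
analysis script):

#include <math.h>

/* Sample standard deviation: divide the sum of squared deviations from
 * the mean by N-1 (Bessel's correction), not by N, then take the root. */
double sample_stddev(const double *x, int n)
{
        double sum = 0.0, mean, ss = 0.0;
        int i;

        if (n < 2)
                return 0.0;
        for (i = 0; i < n; i++)
                sum += x[i];
        mean = sum / n;
        for (i = 0; i < n; i++)
                ss += (x[i] - mean) * (x[i] - mean);
        return sqrt(ss / (n - 1));
}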

Finally, if it is the standard deviation (of either), then I don't
really believe those numbers for the glxgears case. The deviation is
huge for all but one of those results.

I had the same feeling, but because of the code error above, what failed 
was zeroing the sum of the errors, so values after the first kept 
getting larger. When I debugged it against a calculation by hand, the 
first one matched, so I thought I had it right.



Regardless, it's good that you're doing measurements, and keep it up :-).


Okay, here's a bonus, http://www.tmr.com/~davidsen/sched_smooth_02.html 
not only has the right values, the labels are changed, and I included 
more data points from the fc6 recent kernel and the 2.6.21.1 kernel with 
the mainline scheduler.


The nice thing about this test and the IPC test I posted recently is 
that they are reasonably stable on the same hardware, so even if someone 
argues about what they show, they show the same thing each time and can 
therefore be used to compare changes.


As I told a manager at the old Prodigy after coding up some log analysis 
with pretty graphs, getting the data was the easy part; the hard part 
is figuring out what it means. If this data is useful in suggesting 
changes, then it has value. Otherwise it was a fun way to spend some time.


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot


Scheduler smoothness and fairness - results and package

2007-05-22 Thread Bill Davidsen
Thanks to the suggestions of several people and some encouragement, I've 
done another upgrade of the scheduler characterization package glitch1. 
For bottom-line folks, the results are at

http://www.tmr.com/~davidsen/sched_smooth_03.html

This package runs four fast-scrolling xterms and a copy of glxgears to 
produce both screen updates and CPU load. In addition to the human 
observation of smoothness, the glxgears speeds are characterized by the 
variance from sample to sample, and the number of random numbers 
generated by the xterm programs is also characterized.


Changes:

- Reruns for a given configuration are shown in a single row.
- The analysis has had minor output format and statistical tweaks for 
correctness as well as handling of multiple run output in a single row.
- The glxgears first value is shown as a separate item, since there is a 
large stoppage on the first sample after a cold boot with some schedulers.


The full source and doc is now available from
http://www.tmr.com/~public/source/
so people can do their own runs. Note that values between different 
machines are almost certainly not meaningful.


Having run all this on a dual core Core2duo in x86 (32 bit) mode, I'm 
now off to rerun in x86_64 mode, and on a single CPU hyperthreaded 
machine, and a pure uniprocessor. I'm going to create a page for all the 
results in one place for anyone who cares.


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot


[REPORT] 2.6.21 vs. 2.6.21-sd046 vs. 2.6.21-CFSv7

2007-04-30 Thread Bill Davidsen
System: Intel 6600 Core2duo, 2GB RAM, X nice 0 for all tests, display 
using i945G framebuffer


Test: playing a 'toon with mplayer while kernel build -j20 running.

Tuning: not yet, all scheduler parameters were default

Result: base 2.6.21 showed some pauses and after the pause the sound got 
louder for a short time (500ms). With sd-0.46 the playback had many 
glitches and finally just stopped with the display looping on a small 
number of frames and no sound. The skips were repeatable, the hang was 
only two of five runs, I didn't let them go until the make finished 
(todo list) but killed the mplayer after 10-15 sec. No glitches observed 
with cfsv7, I thought I saw one but repeating with granularity set to 
50 and then with no make running convinced me that it's just a 
crappy piece of animation at that point.


I ran glxgears, again sd-0.46 had frequent pauses and uneven fps 
reported. Stock 2.6.21 had a visible pause when the frame rate was 
output, otherwise minimal pauses. CFSv7 appeared smooth at about 250 fps.


All tests gave acceptable typing echo, it seems that X is getting enough 
time at that load to echo without major issues.


I will be doing tests with server load later this week, have to add disk 
for the database.


Hope this initial report is useful, I may be able to update ctxbench 
later today and try that.


--
Bill Davidsen [EMAIL PROTECTED]
 We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot



Re: [REPORT] 2.6.21 vs. 2.6.21-sd046 vs. 2.6.21-CFSv7

2007-04-30 Thread Bill Davidsen

Bill Davidsen wrote:
System: Intel 6600 Core2duo, 2GB RAM, X nice 0 for all tests, display 
using i945G framebuffer


Test: playing a 'toon with mplayer while kernel build -j20 running.

Tuning: not yet, all scheduler parameters were default

Result: base 2.6.21 showed some pauses and after the pause the sound got 
louder for a short time (500ms). With sd-0.46 the playback had many 
glitches and finally just stopped with the display looping on a small 
number of frames and no sound. The skips were repeatable, the hang was 
only two of five runs, I didn't let them go until the make finished 
(todo list) but killed the mplayer after 10-15 sec. No glitches observed 
with cfsv7, I thought I saw one but repeating with granularity set to 
50 and then with no make running convinced me that it's just a 
crappy piece of animation at that point.


I ran glxgears, again sd-0.46 had frequent pauses and uneven fps 
reported. Stock 2.6.21 had a visible pause when the frame rate was 
output, otherwise minimal pauses. CFSv7 appeared smooth at about 250 fps.


All tests gave acceptable typing echo, it seems that X is getting enough 
time at that load to echo without major issues.


I will be doing tests with server load later this week, have to add disk 
for the database.


Hope this initial report is useful, I may be able to update ctxbench 
later today and try that.


Followup: I reran with sd-0.46, setting rr_interval to 40, and then 5 
(default was 16). Neither appeared to give a useful video playback. I 
did try setting the make to nice 10, and that made the playback 
perfectly smooth, as well as response to skip forward and volume change 
happening when the key was pressed instead of eventually.


I also tried raising the nice of X to -10; that made things better on 
display, but I wonder if it will let X run ahead of the nice-0 raid threads.


Is this my hardware or is there a really odd behavior here? The sd seems 
to be too fair to cope well with this realistic load, and expecting 
users to nice things is probably morally correct but unrealistic.



Re: [REPORT] 2.6.21 vs. 2.6.21-sd046 vs. 2.6.21-CFSv7

2007-05-02 Thread Bill Davidsen

Bill Huey (hui) wrote:

On Mon, Apr 30, 2007 at 03:58:45PM -0400, Bill Davidsen wrote:
  
Followup: I reran with sd-0.46, setting rr_interval to 40, and then 5 
(default was 16). Neither appeared to give a useful video playback. I 
did try setting the make to nice 10, and that made the playback 
perfectly smooth, as well as response to skip forward and volume change 
happening when the key was pressed instead of eventually.


I also tried raising the nice of X to -10, that made things better on 
display, but I winder if it will let X run ahead of the nice-0 raid threads.


Is this my hardware or is there a really odd behavior here? The sd seems 
to be too fair to cope well with this realistic load, and expecting 
users to nice things is probably morally correct but unrealistic.



People have been reporting very good performance with regards to OpenGL
applications under SD. What is your video driver ? NVidia proprietary ?

  
My original post, the one I was following up, gave my config: built-in 
graphics using the 945G framebuffer. This is a server; I'm not a gamer. The 
only fancy graphics I have are on a system with no on-board video at all, 
where I picked up a moderately high-end Radeon card to drop in. And to give 
you an idea of what a gamer I am, that one uses the vesafb driver ;-)

OpenGL, X and direct frame buffer access (mplayer and friends) tend not
to interact with each other, which can result in very different scheduling
characteristics between them.


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: [REPORT] 2.6.21 vs. 2.6.21-sd046 vs. 2.6.21-CFSv7

2007-05-02 Thread Bill Davidsen

Con Kolivas wrote:

On Tuesday 01 May 2007 05:29, Bill Davidsen wrote:
  

System: Intel 6600 Core2duo, 2GB RAM, X nice 0 for all tests, display
using i945G framebuffer



Bill thanks for testing.
  

Test: playing a 'toon with mplayer while kernel build -j20 running.



Umm I don't think make -j20 is a realistic load on 2 cores. Not only does it 
raise your load to 20 but your I/O bandwidth will even be struggling. If 
video playback was to be smooth at that size a load it would suggest some 
serious unfairness. I'm not just pushing the fairness barrow here; I mean it 
would need to be really really unfair unless your combined X and video 
playback cpu combined added up to less than 1/20th of your total cpu power 
(which is possible but I kinda doubt it). Do you really use make -j20 to 
build regularly?


  
Yes, this is a compile and file server; I frequently build a raft of 
kernels when a security patch comes out. There doesn't seem to be an I/O 
issue: with 2GB RAM and RAID5 over a SATA array I have enough, and 
honestly the disk activity is minimal, even with a single drive.

Tuning: not yet, all scheduler parameters were default

Result: base 2.6.21 showed some pauses and after the pause the sound got
louder for a short time (500ms). With sd-0.46 the playback had many
glitches and finally just stopped with the display looping on a small
number of frames and no sound. The skips were repeatable, the hang was
only two of five runs, I didn't let them go until the make finished
(todo list) but killed the mplayer after 10-15 sec. No glitches observed
with cfsv7, I thought I saw one but repeating with granularity set to
50 and then with no make running convinced me that it's just a
crappy piece of animation at that point.



I did notice on your followup email that nice +10 of the 20 makes fixed the 
playback which sounds pretty good.


  

Yes, I can get around the load doing that.

I ran glxgears, again sd-0.46 had frequent pauses and uneven fps
reported. Stock 2.6.21 had a visible pause when the frame rate was
output, otherwise minimal pauses. CFSv7 appeared smooth at about 250 fps.



I assume you mean glxgears when you're running make -j20 again here.
  

Of course. ;-)

--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979


