Re: raid1 with nbd member hangs MD on SLES10 and RHEL5
timeout issue? Just a quick update; it is really starting to look like there is definitely an issue with the nbd kernel driver. I booted the SLES10 2.6.16.46-0.12-smp kernel with maxcpus=1 to test the theory that the nbd SMP fix that went into 2.6.16 was in some way causing this MD/NBD hang. But it _still_ occurs with the 4-step process I outlined above.

First, running an smp kernel with maxcpus=1 is not the same as running a uni kernel, nor is the nosmp option. The code is different. Second, AFAIK nbd hasn't been working for a while. I haven't tried it in ages, but was told it wouldn't work with smp and I kind of lost interest. If Neil thinks it should work in 2.6.21 or later I'll test it, since I have a machine which wants a fresh install soon, and is both backed up and available.

The nbd0 device _should_ feel an NBD_DISCONNECT because the nbd-server is no longer running (the node it was running on was powered off)... however the nbd-client is still connected to the kernel (meaning the kernel didn't return an error back to userspace). Also, MD is still blocking, waiting to write the superblock (presumably to nbd0).

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
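For reference, a minimal nbd-backed raid1 of the kind discussed above can be set up roughly as below. The export path, port, and device names are only examples, and the exact 4-step procedure referred to in the posting is not reproduced here.

# on the server node: export a file (or block device) over nbd on port 2000
nbd-server 2000 /srv/exports/mirror.img

# on the client node: attach the export and mirror a local partition against it
nbd-client server.example.com 2000 /dev/nbd0
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/nbd0

# to test the single-CPU theory, boot the same SMP kernel with maxcpus=1 on
# the kernel command line, then power off the server node and watch whether
# md blocks waiting to write the superblock to /dev/nbd0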
Re: raid5: coding style cleanup / refactor
Dan Williams wrote: In other words, it seemed like a good idea at the time, but I am open to suggestions. I went ahead and added the cleanup patch to the front of the git-md-accel.patch series. A few more whitespace cleanups, but no major changes from what I posted earlier. The new rebased series is still passing my tests and Neil's tests in mdadm.

When you are ready for wider testing, having a patch against a released kernel makes testing easy, since the characteristics of a released kernel are pretty well known already.

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
Re: raid1 with nbd member hangs MD on SLES10 and RHEL5
Paul Clements wrote: Bill Davidsen wrote: Second, AFAIK nbd hasn't been working for a while. I haven't tried it in ages, but was told it wouldn't work with smp and I kind of lost interest. If Neil thinks it should work in 2.6.21 or later I'll test it, since I have a machine which wants a fresh install soon, and is both backed up and available.

Please stop this. nbd is working perfectly fine, AFAIK. I use it every day, and so do 100s of our customers. What exactly is it that's not working? If there's a problem, please send the bug report.

Could you clarify what kernel, distribution, and mdadm version is used, and how often the nbd server becomes unavailable to the clients? And your clients are SMP? By working perfectly fine, I assume you do mean in the same way as described in the original posting, and not just with the client, server, and network all fully functional.

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
Re: limits on raid
Neil Brown wrote: On Thursday June 14, [EMAIL PROTECTED] wrote: On Fri, 15 Jun 2007, Neil Brown wrote: On Thursday June 14, [EMAIL PROTECTED] wrote:

what is the limit for the number of devices that can be in a single array? I'm trying to build a 45x750G array and want to experiment with the different configurations. I'm trying to start with raid6, but mdadm is complaining about an invalid number of drives. David Lang

man mdadm, search for limits. (forgive typos).

thanks. why does it still default to the old format after so many new versions? (by the way, the documentation said 28 devices, but I couldn't get it to accept more than 27)

Dunno - maybe I can't count...

it's now churning away 'rebuilding' the brand new array. a few questions/thoughts. why does it need to do a rebuild when making a new array? couldn't it just zero all the drives instead? (or better still just record most of the space as 'unused' and initialize it as it starts using it?)

Yes, it could zero all the drives first. But that would take the same length of time (unless p/q generation was very very slow), and you wouldn't be able to start writing data until it had finished. You can dd /dev/zero onto all drives and then create the array with --assume-clean if you want to. You could even write a shell script to do it for you. Yes, you could record which space is used vs unused, but I really don't think the complexity is worth it.

How about a simple solution which would get an array online and still be safe? All it would take is a flag which forced reconstruct writes for RAID-5. You could set it with an option, or automatically if someone puts --assume-clean with --create, leave it in the superblock until the first repair runs to completion. And for repair you could make some assumptions about bad parity not being caused by error but just unwritten.

Thought 2: I think the unwritten bit is easier than you think, you only need it on parity blocks for RAID5, not on data blocks. When a write is done, if the bit is set do a reconstruct, write the parity block, and clear the bit. Keeping a bit per data block is madness, and appears to be unnecessary as well.

while I consider zfs to be ~80% hype, one advantage it could have (but I don't know if it has) is that since the filesystem and raid are integrated into one layer they can optimize the case where files are being written onto unallocated space and instead of reading blocks from disk to calculate the parity they could just put zeros in the unallocated space, potentially speeding up the system by reducing the amount of disk I/O.

Certainly. But the raid doesn't need to be tightly integrated into the filesystem to achieve this. The filesystem need only know the geometry of the RAID and when it comes to write, it tries to write full stripes at a time. If that means writing some extra blocks full of zeros, it can try to do that. This would require a little bit better communication between filesystem and raid, but not much. If anyone has a filesystem that they want to be able to talk to raid better, they need only ask...

is there any way that linux would be able to do this sort of thing? or is it impossible due to the layering preventing the necessary knowledge from being in the right place?

Linux can do anything we want it to. Interfaces can be changed. All it takes is a fairly well defined requirement, and the will to make it happen (and some technical expertise, and lots of time and coffee?).
Well, I gave you two thoughts, one which would be slow until a repair but sounds easy to do, and one which is slightly harder but works better and minimizes performance impact.

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
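Neil's dd-then---assume-clean suggestion above would look roughly like the sketch below; device names, level, and member count are only examples.

# zero the members in parallel, then create with --assume-clean so no
# initial resync is needed
for d in /dev/sd[b-e]1; do
    dd if=/dev/zero of=$d bs=1M &
done
wait    # let all the zeroing finish

mdadm --create /dev/md0 --level=5 --raid-devices=4 --assume-clean /dev/sd[b-e]1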
Re: limits on raid
[EMAIL PROTECTED] wrote: On Sat, 16 Jun 2007, Neil Brown wrote:

It would be possible to have a 'this is not initialised' flag on the array, and if that is not set, always do a reconstruct-write rather than a read-modify-write. But the first time you have an unclean shutdown you are going to resync all the parity anyway (unless you have a bitmap) so you may as well resync at the start. And why is it such a big deal anyway? The initial resync doesn't stop you from using the array. I guess if you wanted to put an array into production instantly and couldn't afford any slowdown due to resync, then you might want to skip the initial resync but is that really likely?

in my case it takes 2+ days to resync the array before I can do any performance testing with it. for some reason it's only doing the rebuild at ~5M/sec (even though I've increased the min and max rebuild speeds and a dd to the array seems to be ~44M/sec, even during the rebuild). I want to test several configurations, from a 45 disk raid6 to a 45 disk raid0. at 2-3 days per test (or longer, depending on the tests) this becomes a very slow process.

I've been doing stuff like this, but I just build the array on a partition per drive so the init is livable. For the stuff I'm doing a total of 500-100GB is ample to do performance testing.

also, when a rebuild is slow enough (and has enough of a performance impact) it's not uncommon to want to operate in degraded mode just long enough to get to a maintenance window and then recreate the array and reload from backup.

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
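For a rebuild crawling at ~5MB/sec, the md rebuild throttles are the usual first thing to check; the values below are only examples (in KB/s), and the per-array sysfs knobs are shown only for kernels that expose them.

cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
echo 50000  > /proc/sys/dev/raid/speed_limit_min
echo 200000 > /proc/sys/dev/raid/speed_limit_max

# per-array equivalents, if present in sysfs
echo 50000 > /sys/block/md0/md/sync_speed_min
cat /sys/block/md0/md/sync_speed        # current rebuild speed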
Re: limits on raid
I didn't get a comment on my suggestion for a quick and dirty fix for --assume-clean issues...

Bill Davidsen wrote: Neil Brown wrote: On Thursday June 14, [EMAIL PROTECTED] wrote:

it's now churning away 'rebuilding' the brand new array. a few questions/thoughts. why does it need to do a rebuild when making a new array? couldn't it just zero all the drives instead? (or better still just record most of the space as 'unused' and initialize it as it starts using it?)

Yes, it could zero all the drives first. But that would take the same length of time (unless p/q generation was very very slow), and you wouldn't be able to start writing data until it had finished. You can dd /dev/zero onto all drives and then create the array with --assume-clean if you want to. You could even write a shell script to do it for you. Yes, you could record which space is used vs unused, but I really don't think the complexity is worth it.

How about a simple solution which would get an array online and still be safe? All it would take is a flag which forced reconstruct writes for RAID-5. You could set it with an option, or automatically if someone puts --assume-clean with --create, leave it in the superblock until the first repair runs to completion. And for repair you could make some assumptions about bad parity not being caused by error but just unwritten.

Thought 2: I think the unwritten bit is easier than you think, you only need it on parity blocks for RAID5, not on data blocks. When a write is done, if the bit is set do a reconstruct, write the parity block, and clear the bit. Keeping a bit per data block is madness, and appears to be unnecessary as well.

while I consider zfs to be ~80% hype, one advantage it could have (but I don't know if it has) is that since the filesystem and raid are integrated into one layer they can optimize the case where files are being written onto unallocated space and instead of reading blocks from disk to calculate the parity they could just put zeros in the unallocated space, potentially speeding up the system by reducing the amount of disk I/O.

Certainly. But the raid doesn't need to be tightly integrated into the filesystem to achieve this. The filesystem need only know the geometry of the RAID and when it comes to write, it tries to write full stripes at a time. If that means writing some extra blocks full of zeros, it can try to do that. This would require a little bit better communication between filesystem and raid, but not much. If anyone has a filesystem that they want to be able to talk to raid better, they need only ask...

is there any way that linux would be able to do this sort of thing? or is it impossible due to the layering preventing the necessary knowledge from being in the right place?

Linux can do anything we want it to. Interfaces can be changed. All it takes is a fairly well defined requirement, and the will to make it happen (and some technical expertise, and lots of time and coffee?).

Well, I gave you two thoughts, one which would be slow until a repair but sounds easy to do, and one which is slightly harder but works better and minimizes performance impact.

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
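For reference, the proposed "until the first repair runs to completion" flag does not exist; the existing repair machinery it would hook into is driven today like this (array name is an example):

echo check  > /sys/block/md0/md/sync_action   # read-only scrub
cat /sys/block/md0/md/mismatch_cnt            # parity mismatches found
echo repair > /sys/block/md0/md/sync_action   # rewrite bad parity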
Re: limits on raid
David Greaves wrote: [EMAIL PROTECTED] wrote: On Fri, 22 Jun 2007, David Greaves wrote:

That's not a bad thing - until you look at the complexity it brings - and then consider the impact and exceptions when you do, eg hardware acceleration? md information fed up to the fs layer for xfs? simple long term maintenance? Often these problems are well worth the benefits of the feature. I _wonder_ if this is one where the right thing is to just say no :)

so for several reasons I don't see this as something that's deserving of an automatic 'no' David Lang

Err, re-read it, I hope you'll see that I agree with you - I actually just meant the --assume-clean workaround stuff :) If you end up 'fiddling' in md because someone specified --assume-clean on a raid5 [in this case just to save a few minutes *testing time* on a system with a heavily choked bus!] then that adds *even more* complexity and exception cases into all the stuff you described.

A few minutes? Are you reading the times people are seeing with multi-TB arrays? Let's see, 5TB at a rebuild rate of 20MB/s... three days. And as soon as you believe that the array is actually usable you cut that rebuild rate, perhaps in half, and get dog-slow performance from the array. It's usable in the sense that reads and writes work, but for useful work it's pretty painful. You either fail to understand the magnitude of the problem or wish to trivialize it for some reason. By delaying parity computation until the first write to a stripe, only the growth of a filesystem is slowed, and all data are protected without waiting for the lengthy check. The rebuild speed can be set very low, because on-demand rebuild will do most of the work.

I'm very much for the fs layer reading the lower block structure so I don't have to fiddle with arcane tuning parameters - yes, *please* help make xfs self-tuning! Keeping life as straightforward as possible low down makes the upwards interface more manageable and that goal more realistic...

Those two paragraphs are mutually exclusive. The fs can be simple because it rests on a simple device, even if the simple device is provided by LVM or md. And LVM and md can stay simple because they rest on simple devices, even if they are provided by PATA, SATA, nbd, etc. Independent layers make each layer more robust. If you want to compromise the layer separation, some approach like ZFS with full integration would seem to be promising. Note that layers allow specialized features at each point, trading integration for flexibility.

My feeling is that full integration and independent layers each have benefits; as you connect the layers to expose operational details you need to handle changes in those details, which would seem to make layers more complex. What I'm looking for here is better performance in one particular layer, the md RAID5 layer. I like to avoid unnecessary complexity, but I feel that the current performance suggests room for improvement.

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
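Today the "fs knows the RAID geometry" part is done by the admin (or by mkfs probing md) rather than automatically; for a 5-drive raid5 with 64k chunks, i.e. 4 data disks per stripe, the xfs side would look something like the lines below (numbers are only examples).

mkfs.xfs -d su=64k,sw=4 /dev/md0

# the geometry can also be overridden at mount time if the array is reshaped
# (values are in 512-byte sectors here)
mount -o sunit=128,swidth=512 /dev/md0 /mnt/big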
Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.
Neil Brown wrote: On Monday May 28, [EMAIL PROTECTED] wrote:

There are two things I'm not sure you covered. First, disks which don't support flush but do have a cache dirty status bit you can poll at times like shutdown. If there are no drivers which support these, it can be ignored.

There are really devices like that? So to implement a flush, you have to stop sending writes and wait and poll - maybe poll every millisecond?

Yes, there really are (or were). But I don't think that there are drivers, so it's not an issue.

That wouldn't be very good for performance... maybe you just wouldn't bother with barriers on that sort of device?

That is why there are no drivers...

Which reminds me: What is the best way to turn off barriers?

Several filesystems have -o nobarrier or -o barrier=0, or the inverse. If they can function usefully without, the admin gets to make that choice.

md/raid currently uses barriers to write metadata, and there is no way to turn that off. I'm beginning to wonder if that is best.

I don't see how you can have reliable operation without it, particularly WRT the bitmap.

Maybe barrier support should be a function of the device. i.e. the filesystem or whatever always sends barrier requests where it thinks it is appropriate, and the block device tries to honour them to the best of its ability, but if you run blockdev --enforce-barriers=no /dev/sda then you lose some reliability guarantees, but gain some throughput (a bit like the 'async' export option for nfsd).

Since this is device dependent, it really should be in the device driver, and requests should have status of success, failure, or feature unavailability.

Second, NAS (including nbd?). Is there enough information to handle this really right? NAS means lots of things, including NFS and CIFS where this doesn't apply.

Well, we're really talking about network attached devices rather than network filesystems. I guess people do lump them together. For 'nbd', it is entirely up to the protocol. If the protocol allows a barrier flag to be sent to the server, then barriers should just work. If it doesn't, then either the server disables write-back caching, or flushes every request, or you lose all barrier guarantees.

Pretty much agrees with what I said above, it's at a level closer to the device, and status should come back from the physical i/o request.

For 'iscsi', I guess it works just the same as SCSI... Hopefully.

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
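The per-filesystem mount switches mentioned above look roughly like this; option spellings differ between filesystems and the mount points are only examples.

mount -o barrier=1 /dev/md0 /data        # ext3: enable barriers (off by default at the time)
mount -o nobarrier /dev/md1 /scratch     # xfs: disable barriers (on by default)
mount -o barrier=none /dev/hda3 /var     # reiserfs: disable barriers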
Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.
Jens Axboe wrote: On Thu, May 31 2007, David Chinner wrote: On Thu, May 31, 2007 at 08:26:45AM +0200, Jens Axboe wrote: On Thu, May 31 2007, David Chinner wrote:

IOWs, there are two parts to the problem: 1 - guaranteeing I/O ordering 2 - guaranteeing blocks are on persistent storage. Right now, a single barrier I/O is used to provide both of these guarantees. In most cases, all we really need to provide is 1); the need for 2) is a much rarer condition but still needs to be provided.

if I am understanding it correctly, the big win for barriers is that you do NOT have to stop and wait until the data is on persistent media before you can continue.

Yes, if we define a barrier to only guarantee 1), then yes this would be a big win (esp. for XFS). But that requires all filesystems to handle sync writes differently, and sync_blockdev() needs to call blkdev_issue_flush() as well. So, what do we do here? Do we define a barrier I/O to only provide ordering, or do we define it to also provide persistent storage writeback? Whatever we decide, it needs to be documented.

The block layer already has a notion of the two types of barriers, with a very small amount of tweaking we could expose that. There's absolutely zero reason we can't easily support both types of barriers.

That sounds like a good idea - we can leave the existing WRITE_BARRIER behaviour unchanged and introduce a new WRITE_ORDERED behaviour that only guarantees ordering. The filesystem can then choose which to use where appropriate.

Precisely. The current definition of barriers are what Chris and I came up with many years ago, when solving the problem for reiserfs originally. It is by no means the only feasible approach. I'll add a WRITE_ORDERED command to the #barrier branch, it already contains the empty-bio barrier support I posted yesterday (well a slightly modified and cleaned up version).

Wait. Do filesystems expect (depend on) anything but ordering now? Does md? Having users of barriers as they currently behave suddenly getting SYNC behavior where they expect ORDERED is likely to have a negative effect on performance. Or do I misread what is actually guaranteed by WRITE_BARRIER now, and a flush is currently happening in all cases?

And will this also be available to user space f/s, since I just proposed a project which uses one? :-(

I think the goal is good, more choice is almost always better choice, I just want to be sure there won't be big disk performance regressions.

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.
Jens Axboe wrote: On Thu, May 31 2007, Bill Davidsen wrote: Jens Axboe wrote: On Thu, May 31 2007, David Chinner wrote: On Thu, May 31, 2007 at 08:26:45AM +0200, Jens Axboe wrote: On Thu, May 31 2007, David Chinner wrote:

IOWs, there are two parts to the problem: 1 - guaranteeing I/O ordering 2 - guaranteeing blocks are on persistent storage. Right now, a single barrier I/O is used to provide both of these guarantees. In most cases, all we really need to provide is 1); the need for 2) is a much rarer condition but still needs to be provided.

if I am understanding it correctly, the big win for barriers is that you do NOT have to stop and wait until the data is on persistent media before you can continue.

Yes, if we define a barrier to only guarantee 1), then yes this would be a big win (esp. for XFS). But that requires all filesystems to handle sync writes differently, and sync_blockdev() needs to call blkdev_issue_flush() as well. So, what do we do here? Do we define a barrier I/O to only provide ordering, or do we define it to also provide persistent storage writeback? Whatever we decide, it needs to be documented.

The block layer already has a notion of the two types of barriers, with a very small amount of tweaking we could expose that. There's absolutely zero reason we can't easily support both types of barriers.

That sounds like a good idea - we can leave the existing WRITE_BARRIER behaviour unchanged and introduce a new WRITE_ORDERED behaviour that only guarantees ordering. The filesystem can then choose which to use where appropriate.

Precisely. The current definition of barriers are what Chris and I came up with many years ago, when solving the problem for reiserfs originally. It is by no means the only feasible approach. I'll add a WRITE_ORDERED command to the #barrier branch, it already contains the empty-bio barrier support I posted yesterday (well a slightly modified and cleaned up version).

Wait. Do filesystems expect (depend on) anything but ordering now? Does md? Having users of barriers as they currently behave suddenly getting SYNC behavior where they expect ORDERED is likely to have a negative effect on performance. Or do I misread what is actually guaranteed by WRITE_BARRIER now, and a flush is currently happening in all cases?

See the above stuff you quote, it's answered there. It's not a change, this is how the Linux barrier write has always worked since I first implemented it. What David and I are talking about is adding a more relaxed version as well, that just implies ordering.

I was reading the documentation in block/biodoc.txt, which seems to just say ordered:

1.2.1 I/O Barriers

There is a way to enforce strict ordering for i/os through barriers. All requests before a barrier point must be serviced before the barrier request and any other requests arriving after the barrier will not be serviced until after the barrier has completed. This is useful for higher level control on write ordering, e.g. flushing a log of committed updates to disk before the corresponding updates themselves. A flag in the bio structure, BIO_BARRIER is used to identify a barrier i/o. The generic i/o scheduler would make sure that it places the barrier request and all other requests coming after it after all the previous requests in the queue. Barriers may be implemented in different ways depending on the driver. A SCSI driver for example could make use of ordered tags to preserve the necessary ordering with a lower impact on throughput.
For IDE this might be two sync cache flush: a pre and post flush when encountering a barrier write.

The flush comment is associated with IDE, so it wasn't clear that the device cache is always cleared to force the data to the platter.

And will this also be available to user space f/s, since I just proposed a project which uses one? :-(

I see several uses for that, so I'd hope so.

I think the goal is good, more choice is almost always better choice, I just want to be sure there won't be big disk performance regressions.

We can't get more heavy weight than the current barrier, it's about as conservative as you can get.

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.
Neil Brown wrote: On Friday June 1, [EMAIL PROTECTED] wrote: On Thu, May 31, 2007 at 02:31:21PM -0400, Phillip Susi wrote: David Chinner wrote:

That sounds like a good idea - we can leave the existing WRITE_BARRIER behaviour unchanged and introduce a new WRITE_ORDERED behaviour that only guarantees ordering. The filesystem can then choose which to use where appropriate.

So what if you want a synchronous write, but DON'T care about the order? submit_bio(WRITE_SYNC, bio); Already there, already used by XFS, JFS and direct I/O.

Are you sure? You seem to be saying that WRITE_SYNC causes the write to be safe on media before the request returns. That isn't my understanding. I think (from comments near the definition and a quick grep through the code) that WRITE_SYNC expedites the delivery of the request through the elevator, but doesn't do anything special about getting it onto the media.

My impression is that the sync will return when the i/o has been delivered to the device, and will get special treatment by the elevator code (I looked quickly, more is needed). I'm sure someone will tell me if I misread this. ;-)

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
Re: [dm-devel] Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.
Jens Axboe wrote: On Thu, May 31 2007, Phillip Susi wrote: Jens Axboe wrote:

No Stephan is right, the barrier is both an ordering and integrity constraint. If a driver completes a barrier request before that request and previously submitted requests are on STABLE storage, then it violates that principle. Look at the code and the various ordering options.

I am saying that is the wrong thing to do. Barrier should be about ordering only. So long as the order they hit the media is maintained, the order the requests are completed in can change. barrier.txt bears

But you can't guarantee ordering without flushing the data out as well. It all depends on the type of cache on the device, of course. If you look at the ordinary sata/ide drive with write back caching, you can't just issue the requests in order and pray that the drive cache will make it to platter. If you don't have write back caching, or if the cache is battery backed and thus guaranteed to never be lost, maintaining order is naturally enough.

Do I misread this? If ordered doesn't reach all the way to the platter then there will be failure modes which result in order not being preserved. Battery backed cache doesn't prevent failures between the cache and the platter.

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
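The write-back cache Jens refers to can be taken out of the picture per drive, at a real cost in write throughput; the device name below is only an example.

hdparm -W 0 /dev/sda                          # disable the drive's write-back cache
hdparm -W 1 /dev/sda                          # re-enable it
hdparm -I /dev/sda | grep -i 'write cache'    # check the current state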
Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.
Jens Axboe wrote: On Fri, Jun 01 2007, Bill Davidsen wrote: Jens Axboe wrote: On Thu, May 31 2007, Bill Davidsen wrote: Jens Axboe wrote: On Thu, May 31 2007, David Chinner wrote: On Thu, May 31, 2007 at 08:26:45AM +0200, Jens Axboe wrote: On Thu, May 31 2007, David Chinner wrote:

IOWs, there are two parts to the problem: 1 - guaranteeing I/O ordering 2 - guaranteeing blocks are on persistent storage. Right now, a single barrier I/O is used to provide both of these guarantees. In most cases, all we really need to provide is 1); the need for 2) is a much rarer condition but still needs to be provided.

if I am understanding it correctly, the big win for barriers is that you do NOT have to stop and wait until the data is on persistent media before you can continue.

Yes, if we define a barrier to only guarantee 1), then yes this would be a big win (esp. for XFS). But that requires all filesystems to handle sync writes differently, and sync_blockdev() needs to call blkdev_issue_flush() as well. So, what do we do here? Do we define a barrier I/O to only provide ordering, or do we define it to also provide persistent storage writeback? Whatever we decide, it needs to be documented.

The block layer already has a notion of the two types of barriers, with a very small amount of tweaking we could expose that. There's absolutely zero reason we can't easily support both types of barriers.

That sounds like a good idea - we can leave the existing WRITE_BARRIER behaviour unchanged and introduce a new WRITE_ORDERED behaviour that only guarantees ordering. The filesystem can then choose which to use where appropriate.

Precisely. The current definition of barriers are what Chris and I came up with many years ago, when solving the problem for reiserfs originally. It is by no means the only feasible approach. I'll add a WRITE_ORDERED command to the #barrier branch, it already contains the empty-bio barrier support I posted yesterday (well a slightly modified and cleaned up version).

Wait. Do filesystems expect (depend on) anything but ordering now? Does md? Having users of barriers as they currently behave suddenly getting SYNC behavior where they expect ORDERED is likely to have a negative effect on performance. Or do I misread what is actually guaranteed by WRITE_BARRIER now, and a flush is currently happening in all cases?

See the above stuff you quote, it's answered there. It's not a change, this is how the Linux barrier write has always worked since I first implemented it. What David and I are talking about is adding a more relaxed version as well, that just implies ordering.

I was reading the documentation in block/biodoc.txt, which seems to just say ordered:

1.2.1 I/O Barriers

There is a way to enforce strict ordering for i/os through barriers. All requests before a barrier point must be serviced before the barrier request and any other requests arriving after the barrier will not be serviced until after the barrier has completed. This is useful for higher level control on write ordering, e.g. flushing a log of committed updates to disk before the corresponding updates themselves. A flag in the bio structure, BIO_BARRIER is used to identify a barrier i/o. The generic i/o scheduler would make sure that it places the barrier request and all other requests coming after it after all the previous requests in the queue. Barriers may be implemented in different ways depending on the driver.
A SCSI driver for example could make use of ordered tags to preserve the necessary ordering with a lower impact on throughput. For IDE this might be two sync cache flush: a pre and post flush when encountering a barrier write.

The flush comment is associated with IDE, so it wasn't clear that the device cache is always cleared to force the data to the platter.

The above should mention that the ordered tag comment for SCSI assumes that the drive uses write through caching. If it does, then an ordered tag is enough. If it doesn't, then you need a bit more than that (a post flush, after the ordered tag has completed).

Thanks, got it.

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
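Whether a SCSI/SATA disk is actually running write-through can be checked or changed from userspace; this assumes the sdparm utility is installed, and the device name is only an example.

sdparm --get=WCE /dev/sdb      # WCE = write cache enable bit; 0 means write-through
sdparm --clear=WCE /dev/sdb    # force write-through, so an ordered tag alone suffices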
Re: scheduling oddity on 2.6.20.3 stock
David Schwartz wrote:

bunzip2 -c $file.bz2 | gzip -9 > $file.gz

So here are some actual results from a dual P3-1GHz machine (2.6.21.1, CFSv9). First let's time each operation individually:

$ time bunzip2 -k linux-2.6.21.tar.bz2
real    1m5.626s
user    1m2.240s
sys     0m3.144s

$ time gzip -9 linux-2.6.21.tar
real    1m17.652s
user    1m15.609s
sys     0m1.912s

The compress was the most complex (no surprise there) but they are close enough that efficient overlap will definitely affect the total wall time. If we can both decompress and compress in 1:17, we are optimal. First, let's try the normal way:

$ time (bunzip2 -c linux-2.6.21.tar.bz2 | gzip -9 > test1)
real    1m45.051s
user    2m16.945s
sys     0m2.752s

1:45, or 1/3 over optimal. Now, with a 32MB non-blocking cache between the two processes ('accel' creates a 32MB cache and uses 'select' to fill from stdin and empty to stdout without blocking either direction):

$ time (bunzip2 -c linux-2.6.21.tar.bz2 | ./accel | gzip -9 > test2)
real    1m18.361s
user    2m19.589s
sys     0m6.356s

Within testing accuracy of optimal. So it's not the scheduler. It's the fact that bunzip2/gzip have inadequate input/output buffering. I don't think it's unreasonable to consider this a defect in those programs.

They are hardly designed to optimize this operation... For a tunable buffer program allowing the buffer size and the number of buffers in the pool to be set, see www.tmr.com/~public/source, program ptbuf. I wrote it as a proof of concept for a pthreads presentation I was giving, and it happened to be useful.

--
Bill Davidsen [EMAIL PROTECTED]
We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
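The 'accel' program itself is not shown in the thread; a roughly similar large pipe buffer can be approximated with pv if it is installed (treat the option spelling as an assumption for your version):

$ time (bunzip2 -c linux-2.6.21.tar.bz2 | pv -q -B 32m | gzip -9 > test3)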
glitch1 results - 2.6.21.3-cfs-v15
I have added cfs-v15 to the chart at www.tmr.com/~davidsen/sched_smooth_05.html and updated the source of the test at www.tmr.com/~public/source if anyone wants to run the test on their hardware. I feel that on my hardware cfs-13 was the smoothest for this test and for watching videos. Even a relatively light load, nice -10 make -j4 -s of a kernel, would cause jumps on the video, gears or youtube.

--
Bill Davidsen [EMAIL PROTECTED]
We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: [2.6.21.1] resume doesn't run suspended kernel?
Stefan Seyfried wrote: Hi, On Sat, May 26, 2007 at 06:42:37PM -0400, Bill Davidsen wrote:

I was testing susp2disk in 2.6.21.1 under FC6, to support reliable computing environment (RCE) needs. The idea is that if power fails, after some short time on UPS the system does susp2disk with a time set, and boots back every so often to see if power is stable.

Interesting use case.

No, I don't want susp2mem until I debug it, the console comes up in useless mode, console as kaleidoscope is not what I need.

You probably need to reset the video mode. Try the s2ram workaround, specifically -m.

Anyway, I pulled the plug on the UPS, and the system shut down. But when it powered up, it booted the default kernel rather than the test kernel, decided that it couldn't resume, and then did a cold boot. I can bypass this by making the debug kernel the default, but WHY? Is the kernel not saved such that any kernel can be rolled back into memory and run?

The kernel does nothing to the bootloader during suspend. The kernel does not even know that you are using a bootloader and how it might be configured.

What I really expected is that what I was running would be saved, and resume would restore what I was running and then jump back to where that suspended itself. Without having to address the issue of booting the right kernel, but having any functional kernel which was booted then restore what was originally suspended. From discussion here, I conclude that it could work that way but doesn't.

Userland has to do this (and SUSE's pm-utils actually do. I thought the Fedora pm-utils also did, but i cannot say for sure). Just find out which entry in menu.lst corresponds to the currently running kernel, and preselect it for the next boot. It is doable. So it's a problem of your distro's userland (and if you did not use pm-hibernate to suspend, it is your very own problem). You could of course simply go for GRUB's default saved and savedefault feature, to always boot the last-booted kernel unless changed in the menu.

I'm being very careful to avoid changing the default boot kernel. If the system suspends (ie. deliberately) I want to resume in the running kernel, but if it crashes I want the cold boot to bring up a known stable kernel, even though that may be lacking in features, have an old scheduler, etc.

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
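The userland approach Stefan describes can be sketched with GRUB legacy; entry numbers and titles below are only examples. With "default saved" in menu.lst, a suspend script preselects the running kernel for the resume boot, then restores the conservative default so a crash still cold-boots the stable kernel:

grub-set-default 1      # entry 1 = the test kernel we are about to suspend
pm-hibernate
grub-set-default 0      # after resume, point back at the known stable entry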
Re: [Patch 04/18] include/linux/logfs.h
Segher Boessenkool wrote:

It would be better if GCC had a 'nopadding' attribute which gave us what we need without the _extra_ implications about alignment.

That's impossible; removing the padding from a struct _will_ make accesses to its members unaligned (think about arrays of that struct).

And many platforms happily support unaligned CPU access in hardware at a price in performance, while others support it in software at great cost in performance. None of that maps into impossible. Some i/o hardware may not support it at all and may require some bounce buffering, at a cost in memory and CPU. None of that equates with impossible. It is readily argued that it could mean inadvisable on some architectures, slow as government assistance and ugly as the north end of a south-bound hedgehog, but it's not impossible.

Do NOT take this to mean I think it would be a good thing in a Linux kernel, or that it should be added to gcc, but in some uses like embedded applications where memory use is an important cost driver, people are probably doing it already by hand to pack struct arrays into minimal bytes. It's neither impossible nor totally useless.

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
Re: [RFC] Extend Linux to support proportional-share scheduling
Willy Tarreau wrote: On Tue, Jun 05, 2007 at 09:31:33PM -0700, Li, Tong N wrote:

Willy, These are all good comments. Regarding the cache penalty, I've done some measurements using benchmarks like SPEC OMP on an 8-processor SMP and the performance with this patch was nearly identical to that with the mainline. I'm sure some apps may suffer from the potentially more migrations with this design. In the end, I think what we want is to balance fairness and performance. This design currently emphasizes fairness, but it could be changed to relax fairness when performance does become an issue (which could even be a user-tunable knob depending on which aspect the user cares more about).

Maybe storing in each task a small list of the 2 or 4 last CPUs used would help the scheduler in trying to place them. I mean, let's say you have 10 tasks and 8 CPUs. You first assign tasks 1..8 CPUs 1..8 for 1 timeslice. Then you will give 9..10 a run on CPUs 1..2, and CPUs 3..8 will be usable for other tasks. It will be optimal to run tasks 3..8 on them. Then you will stop some of those because they are in advance, and run 9..10 and 1..2 again. You'll have to switch 1..2 to another group of CPUs to maintain hot cache on CPUs 1..2 for tasks 9..10. But another possibility would be to consider that 9..10 and 1..2 have performed the same amount of work, so let 9..10 take some advance and benefit from the hot cache, then try to place 1..2 there again. But it will mean that 3..8 will now have run 2 timeslices more than others. At this moment, it should be wise to make them sleep and keep their CPU history for future use. Maybe on end-user systems, the CPU history is not that important because of the often small caches, but on high-end systems with large L2/L3 caches, I think that we can often keep several tasks in the cache, justifying the ability to select one of the last CPUs used.

CPU affinity to preserve cache is a very delicate balance. It makes sense to try to run a process on the same CPU, but since even a few ms of running some other process is long enough to refill the cache with new contents (depending on what it does, obviously), long delays in running a process to get it on the right CPU are not always a saving; using the previous CPU becomes less beneficial rapidly. Some omnipotent scheduler would have a count of pages evicted from cache as process A runs, and deduct that from the affinity of process B previously on the same CPU, then make a perfect decision when it's better to migrate the task and how far. Since the schedulers now being advanced are fair rather than perfect, everyone is making educated guesses on optimal process migration policy, migrating all threads to improve cache hits vs. spreading them to better run threads in parallel, etc.

For a desktop I want a scheduler which doesn't suck at the things I do regularly. For a server I'm more concerned with overall tps than the latency of one transaction. Most users would trade a slowdown in kernel compiles for being able to watch youtube while the compile runs, and conversely people with heavily loaded servers would usually trade a slower transaction for more of them per second. Obviously within reason... what people will tolerate is a bounded value.

Not an easy thing to do, but probably very complementary to your work IMHO.

Agree, not easy at all.
--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
Re: [md-accel PATCH 00/19] md raid acceleration and the async_tx api
Dan Williams wrote: Greetings, Per Andrew's suggestion this is the md raid5 acceleration patch set updated with more thorough changelogs to lower the barrier to entry for reviewers. To get started with the code I would suggest the following order:

[md-accel PATCH 01/19] dmaengine: refactor dmaengine around dma_async_tx_descriptor
[md-accel PATCH 04/19] async_tx: add the async_tx api
[md-accel PATCH 07/19] md: raid5_run_ops - run stripe operations outside sh-lock
[md-accel PATCH 16/19] dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines

The patch set can be broken down into three main categories: 1/ API (async_tx: patches 1 - 4) 2/ implementation (md changes: patches 5 - 15) 3/ driver (iop-adma: patches 16 - 19). I have worked with Neil to get approval of the category 2 changes. However for the category 1 and 3 changes there was no obvious merge-path/maintainer to work through. I have thus far extrapolated Neil's comments about 2 out to 1 and 3, Jeff gave some direction on an early revision about the scalability of the API, and the patch set has picked up various fixes and suggestions from being in -mm for a few releases. Please help me ensure that this code is ready for Linus to pull for 2.6.23. git://lost.foo-projects.org/~dwillia2/git/iop md-accel-linus

Dan, I hope you will release these as a patchset against 2.6.22 when it's out, or against 2.6.21. I find I have a lot more confidence in results, good or bad, when comparing something I have run in production with just one patchset added. There are enough other changes in an -rc to confuse the issue, and I don't run them in production (at least not usually).

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
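One way to turn the git branch above into a plain patch series against a released kernel would be roughly the following; the branch name and URL are taken from the posting, the output directory is only an example.

git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
cd linux-2.6
git fetch git://lost.foo-projects.org/~dwillia2/git/iop md-accel-linus
git format-patch v2.6.21..FETCH_HEAD -o ../md-accel-patches/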
Re: How would I do this? (expert tricks) OT
Marc Perkel wrote: I have a server with port 25 closed. I want to be able to run a script every time someone tries to connect to port 25, but from the outside the port remains closed. I need the script that I'm going to run to get the IP address that tried to connect. I know it's off topic but it's part of an experiment to stop spam.

Put a rule in iptables to jump to a user table to do a log and drop. You are doing it the wrong way; you want to set syslog to write the log message to a FIFO and have a permanently running program reading it (I do just this for other things). Alternatively you can use redirect to send it to a program of your choosing, which can run a script if you really want to. Beware that rate limiting is desirable if you are going to start a process for ANY type of attack packets.

--
Bill Davidsen [EMAIL PROTECTED]
We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
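A sketch of the log-and-drop approach; the chain name, log prefix, log file location, and the handle_probe script are all assumptions, not anything from the thread.

iptables -N SMTP_PROBE
iptables -A INPUT -p tcp --dport 25 -j SMTP_PROBE
iptables -A SMTP_PROBE -m limit --limit 10/minute -j LOG --log-prefix "SMTP probe: "
iptables -A SMTP_PROBE -j DROP

# simplest reader: follow the kernel log and pull out the source address
tail -F /var/log/messages | grep --line-buffered 'SMTP probe:' | \
    sed -un 's/.*SRC=\([0-9.]*\).*/\1/p' | while read ip; do
        ./handle_probe "$ip"    # hypothetical handler script
    done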
Re: Question about fair schedulers
Alberto Gonzalez wrote: On Saturday 23 June 2007, Tom Spink wrote: Alberto, If you're feeling adventurous, grab the latest kernel and patch it with Ingo's scheduler: CFS. You may be pleasantly surprised.

Thanks, I might if I have the courage to patch and compile my own kernel :) However, I'd also need to change all my applications to set them with the right priority to see the good results, so I think I might just wait until it lands in mainline.

In general that is not the case. I generally don't diddle my priorities, there's rarely a need.

Just to check if I understood everything correctly: The mainline scheduler tries to be smart and guess the priority of each task, and while it mostly hits the nail right on the head, sometimes it hits you right in the thumb. Fair schedulers, on the contrary, forget about trying to be smart and just care about being fair, leaving the priority settings to where they belong: applications. Is this more or less correct?

Incomplete. The CFS scheduler seems to do better with latency, so you may get less CPU for a process but it doesn't wind up waiting a long time to get a fair share. So it feels better without micro tuning. Face it, if you have more jobs than CPUs no scheduler is going to make you really happy.

Alberto.

--
Bill Davidsen [EMAIL PROTECTED]
We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: New format Intel microcode...
Andi Kleen wrote: Daniel J Blueman [EMAIL PROTECTED] writes: On 23/03/07, Shaohua Li [EMAIL PROTECTED] wrote: On Thu, 2007-03-22 at 23:45 +, Daniel J Blueman wrote: Hi Shao-hua, Is the tool you mentioned last June [1] available for splitting up the old firmware files to the new format (eg /lib/firmware/intel-ucode/06-0d-06), or are updates available from Intel (or otherwise) in this new format? Yes, we are preparing the new format data files and maybe put it into a new website. We will announce it when it's ready. It's been a while; is there any sign of the ucode updates being available, especially in light of the C2D/Q incorrect TLB invalidation + recent ucode to fix this? That microcode update is not needed on any recent Linux kernel; it flushes the TLBs in a way that is fine.

Slashdot carried an article this morning saying that an error in Intel microcode was being fixed. However, it listed only Windows related sites for the fix download. Is this the same TLB issue? And are these really fixes for Windows to flush the TLB properly the way Linux does?

--
Bill Davidsen [EMAIL PROTECTED]
We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: New format Intel microcode...
Andi Kleen wrote:

Slashdot carried an article this morning saying that an error in Intel microcode was being fixed. However, it listed only Windows related sites

That's a little misleading. Always dangerous getting your information from slashdot. Let's say Intel clarified some corner cases in TLB flushing that have changed with Core2 and not everybody got that right. I wouldn't say it was an Intel bug though.

Given that the Slashdot note was a pointer to Microsoft and an echo of their statements about a firmware fix, and that same information is on the Microsoft site, I find it hard to find fault with them as a source for pointers and some context on why they might be useful. If Intel has released new microcode to address the issue, then it seems the code didn't function as desired, and it doesn't matter what you call it.

for the fix download. Is this the same TLB issue? And are these really

I think so. That was one question.

fixes for Windows to flush the TLB properly the way Linux does?

On newer Linux 2.6 yes. On 2.4/x86-64 you would need in theory the microcode update too. (it'll probably show up at some point at the usual place http://urbanmyth.org/microcode/). Linux/i386 is always fine. But the problem is very obscure and you can likely ignore it too. If your machine crashes it's very likely something else.

I don't ignore anything I can fix. An ounce of prevention is worth a pound of cure. My systems don't currently crash, and that's the intended behavior. I was mainly concerned with this being a new issue, and curious if Microsoft was calling an O/S bug a microcode fix, given that the average Windows user doesn't know microcode from nanotech anyway. The non-answer from Arjan didn't answer either, and started by calling the report FUD, implying that Slashdot was wrong (not about this), and issuing so little answer and so much obfuscation that I thought he might be running for President. ;-)

I'd like the microcode update; some people elsewhere speculate that user level code could affect reliability if not security. I worry that an old 2.4 kernel would be an issue, even in kvm, if that were the case.

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
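For reference, applying a downloaded microcode data file on a kernel of that era usually went through the microcode driver plus the microcode_ctl utility; the file locations below are common defaults and assumptions, not something from this thread.

modprobe microcode
microcode_ctl -u                        # upload the default data file (often /etc/microcode.dat)
microcode_ctl -f /tmp/microcode.dat -u  # or point it at a freshly downloaded file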
Re: man-pages-2.59 and man-pages-2.60 are released
Michael Kerrisk wrote: Alexander, I just released man-pages-2.59 and man-pages-2.60. These releases are now available for download at: http://www.kernel.org/pub/linux/docs/manpages Yes, just this morning I decided to tidy away some of the old tarballs into a newly created old directory. There is one little problem with this: there is no stable URL for a given version. Well, there never really was. To date, most old tarballs have had only a limited life on kernel.org. Why? I'm not questioning the policy, it's just that if HUGE kernel versions are kept available forever, a tiny man page tar would not seem to be a disk space issue. This hurts, e.g., automated Linux From Scratch rebuilds (the official script grabs the URL from the book, but it becomes invalid too soon). Could you please, in order to avoid this, do what SAMBA team does: place into http://www.kernel.org/pub/linux/docs/manpages/Old not only old versions, but also the current version? This way, LFS will be sure that the 2.60 version is always available as As noted above old versions never were always available on kernel.org... http://www.kernel.org/pub/linux/docs/manpages/Old/man-pages-2.60.tar.bz2 (even if it is in fact the latest version). How about a link in /pub/linux/docs/manpages/ of the form LATEST-IS-m.xy? Rob Landley was wanting something like this, and I guess it would be easy for LFS to build a simple script that looks for that link and deduces man-pages-m.xy from it. (I've just now created such a link in the directory, as an example.) Why not just a link with a fixed name (LATEST?) which could be updated? I assume installing a new version is automated to create and install the tar, any needed links, the push to mirrors, etc. So it would just be a single step added to an automated procedure. You could have a link in Old as requested, and any other links as well.

--
Bill Davidsen [EMAIL PROTECTED]
We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
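The fixed-name link idea amounts to something like the following; the file names are only examples, and whether the web server follows the symlink is an assumption about the hosting setup.

# publishing side, when a new release goes up:
ln -sfn man-pages-2.60.tar.bz2 LATEST

# consumer side (e.g. an LFS build script):
wget -q http://www.kernel.org/pub/linux/docs/manpages/LATEST
# or resolve the real version first:
readlink LATEST      # -> man-pages-2.60.tar.bz2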
Re: New format Intel microcode...
Chuck Ebbert wrote: On 06/28/2007 11:27 AM, Andi Kleen wrote:

But the problem is very obscure and you can likely ignore it too. If your machine crashes it's very likely something else.

What about deliberate exploits of these bugs from userspace? Theo thinks they are possible...

Do you have any details? One of the folks in a chat was saying something similar, but thought that causing a crash was the extent of it, rather than any access violation. Obviously I don't know the extent of that claim, so more information would be good.

--
Bill Davidsen [EMAIL PROTECTED]
We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: [patch] CFS scheduler, -v18
Vegard Nossum wrote: Hello, On 6/23/07, Ingo Molnar [EMAIL PROTECTED] wrote: i'm pleased to announce release -v18 of the CFS scheduler patchset. As usual, any sort of feedback, bugreport, fix and suggestion is more than welcome! I have been running cfs-v18 for a couple of days now, and today I stumbled upon a rather strange problem. Consider the following short program: while(1) printf(%ld\r, 1000 * clock() / CLOCKS_PER_SEC); Running this in an xterm makes the xterm totally unresponsive. Ctrl-C takes about two seconds to terminate the program, during which the program will keep running. In fact, it seems that the longer it runs, the longer it takes to terminate (towards 5 seconds after running for a couple of minutes). This is rather surprising, as the rest of the system is quite responsive (even remarkably so). I think this is also in contrast with the expected behaviour, that Ctrl-C/program termination should be prioritized somehow. This sounds as though it might be related to the issues I see with my glitch1 script, posted here a while ago. With cfs-v18 the effect of having multiple xterms scrolling is obvious, occasionally they behave as if they were owed more CPU and get paid back all at once. I've seen this effect to one degree or another since cfs-v13, which did NOT show the effect. Some other observations: X.Org seems to be running at about 75% CPU on CPU 1, the xterm at about 45% on CPU 0, and a.out at about 20% on CPU 0. (HT processor) Killing with -2 or -9 from another terminal works immediately. Ctrl-Z takes the same time as Ctrl-C. I think this is because the shell to read the keypress is getting high latency, rather than the process taking a long time to react. I have been wrong before... I read Ingo's reply to this, I'll gather the same information when the test machine is available later this morning and send it off to Ingo. Another thing to note is that simply looping with no output retains the expected responsiveness of the xterm. Printing i++ is somewhere halfway in between. See http://www.tmr.com/~public/source (note the tilde) for glitch1. Is this behaviour expected or even intended? My main point is that Ctrl-C is a safety fallback which suddenly doesn't work as usual. I might even go so far as to call it a regression. I'd also like to point out that [EMAIL PROTECTED] seems to draw more CPU than it should. Or, at least, in top, it shows up as using 50% CPU even though other processes are demanding as much as they can get. The FAH program should be running with idle priority. I expect it to fall to near 0% when other programs are running at full speed, but it keeps trotting along. And I am pretty sure that this is not due to SMP/HT (I made sure to utilize both CPUs). Lastly, I'd like to mention that I got BUGs (soft lockups) with -v8, though it has not been reproducible with -v18, so I suppose it must have been fixed already. Otherwise, I am satisfied with the performance of CFS. Especially the desktop is noticably smoother. Thanks! Kind regards, Vegard Nossum -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
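For reference, the test program quoted above lost its quoting in transit; this is my reading of what was intended, as a self-contained file, not Vegard's exact source. Run it in an xterm: the endless stream of carriage-return-terminated updates is what generates the heavy redraw load described.

#include <stdio.h>
#include <time.h>

int main(void)
{
	/* busy-loop printing elapsed CPU time in milliseconds, '\r' only,
	 * so the terminal is flooded with redraw work on one line */
	while (1)
		printf("%ld\r", (long)(1000 * clock() / CLOCKS_PER_SEC));
	return 0;
}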
Re: Old bug in tg3 driver unfixed?
Tim Boneko wrote: Hello! I am not subscribed to this list so please CC answers to my mail address. THX! I recently replaced the mainboard of one of my servers with a Tyan Tomcat K8E. The onboard gigabit NIC is a Broadcom BCM5721. After compiling and loading the tg3 driver in Kernel 2.6.21.5, the interface could not be configured: Device not found. While searching the net I found a few other people with the same problem but no solution. By coincidence I found that a simple ifconfig eth1 worked OK and afterwards the device could be configured and used as desired. After searching this list, I found this posting Probably unrelated, but what's eth0? http://uwsg.iu.edu/hypermail/linux/kernel/0409.0/0224.html by someone with obviously the same problem. Has some patch of the driver been reverted or is the hardware buggy? BTW the chip is connected via PCI Express. I have that chip in a system, but I didn't find it quickly, it may be at another location, unless the controller which shows up as 3C940 on my ASUS P4P800 is the Broadcom. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6 patch] the overdue eepro100 removal
Adrian Bunk wrote: This patch contains the overdue removal of the eepro100 driver. Signed-off-by: Adrian Bunk [EMAIL PROTECTED] The hardware supported by this driver is still in use, thanks. It's probably easier to leave the eepro100 driver in than find anyone who wants to investigate why the other driver (e100? from memory) doesn't work with some cards. As I recall this was suggested over a year ago and it was decided to leave it in, all of the reasons for doing so still seem valid. There really doesn't seem to be a benefit, it's not like people are working night and day to support new cards for this chip. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Suspend2 is getting a new name.
Nigel Cunningham wrote: Hi all. Suspend2's name is changing to TuxOnIce. This is for a couple of reasons: In recent discussions on LKML, the point was made that the word Suspend is confusing. It is used to refer to both suspending to disk and suspending to ram. Life will be simpler if we more clearly differentiate the two. The name Suspend2 came about a couple of years ago when we made the 2.0 release and started self-hosting. If we ever get to a 3.0 release, the name could become even more confusing! (And there are already problems with people confusing the name with swsusp and talking about uswsusp as version 3!). http://www.suspend2.net is still working at the moment, but we'll shift to http://www.tuxonice.net over the next while. The wiki and bugzilla are already done; email will remain on suspend2.net for a little while and git trees will be renamed at the time of the next stable release. I guess this is good news, bad news time. The good news is that the suspend with working resume project is still active, the bad news is that making provisions for long term out of mainline operation sounds as if you have no hope of getting this code into the mainline kernel. :-( -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ata: Add the SW NCQ support to sata_nv for MCP51/MCP55/MCP61
Zoltan Boszormenyi wrote: Hi, Zoltan Boszormenyi wrote: Hi, I am testing your current code with akpm's beautifying patches for about an hour now. I have seen no problems with it so far. Still using the patch on 2.6.22-rc6 and no problems so far. It's really stable. I am looking forward to the next version and the inclusion into mainstream kernels. Thanks! I am going to hold off any more -rc testing, but if there's a patch against 2.6.22 when it releases, I would certainly try it on a system which is about to be redeployed. I'm also scheduling testing of several RAID queueing patches there as well. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: blink driver power saving
Pavel Machek wrote: ...drivers are not expected to act on their own. I was expecting to get nice /sys/class/led* interface to my keyboard leds. What's the benefit of such an interface? If you're able to trigger keyboard LEDs via that interface, you're also able to use the ioctl() on /dev/console. Well, at least it is a standardized interface... plus it can do stuff like blink that led on disk access. One of many useful things for systems without blinking lights: disk, network, thermal alert, etc. And a cheap helper for handicapped folks who can't hear an audible alert. I think the intention of the blink driver was to have an *early* blink, i.e. before initrd (and on systems without initrd, before the first init script runs). ...and yes, it can autoblink, too. It should be even possible to set default behaviour of led to blink, doing what the blink driver does, but in a clean way. Endlessly useful: alarm clock, non-fatal errors on boot, etc. It would be nice if this were done; priority levels would be nice too, so the "I'm taking a dump" or panic use would block lower level system use like disk or network lights, and user applications would have some policy to put them higher or lower than the pseudo disk light (or not). -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
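For what it's worth, here is a minimal userspace sketch of driving a LED through the /sys/class/leds interface being discussed. "myled" is a placeholder name, and which LEDs and triggers actually exist depends entirely on the LED class and trigger drivers built into the kernel.

#include <stdio.h>

static int led_write(const char *led, const char *attr, const char *val)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/class/leds/%s/%s", led, attr);
	f = fopen(path, "w");
	if (!f) {
		perror(path);
		return -1;
	}
	fputs(val, f);
	return fclose(f);
}

int main(void)
{
	/* hand the LED to the IDE-activity trigger ("disk light")... */
	led_write("myled", "trigger", "ide-disk");
	/* ...or take it back and drive it by hand */
	led_write("myled", "trigger", "none");
	led_write("myled", "brightness", "255");
	return 0;
}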
Re: Sched - graphic smoothness under load - cfs-v13 sd-0.48
Miguel Figueiredo wrote: Bill Davidsen wrote: I generated a table of results from the latest glitch1 script, using an HTML postprocessor I not *quite* ready to foist on the word. In any case it has some numbers for frames per second, fairness of the processor time allocated to the compute bound processes which generate a lot of other screen activity for X, and my subjective comments on how smooth it looked and felt. The chart is at http://www.tmr.com/~davidsen/sched_smooth_01.html for your viewing pleasure. The only tuned result was with sd, since what I observed was so bad using the default settings. If any scheduler developers would like me to try other tunings or new versions let me know. As I tryied myself kernels 2.6.21, 2.6.21-cfs-v13, and 2.6.21-ck2 on the same machine i found *very* odd those numbers you posted, so i tested myself those kernels to see the numbers I get instead of talking about the usage of kernel xpto feels like. I did run glxgears with kernels 2.6.21, 2.6.21-cfs-v13 and 2.6.21-ck2 inside Debian's GNOME environment. The hardware is an AMD Sempron64 3.0 GHz, 1 GB RAM, Nvidia 6800XT. Average and standard deviation from the gathered data: * 2.6.21: average = 11251.1; stdev = 0.172 * 2.6.21-cfs-v13:average = 11242.8; stdev = 0.033 * 2.6.21-ck2:average = 11257.8; stdev = 0.067 Keep in mind those numbers don't mean anything we all know glxgears is not a benchmark, their purpose is only to be used as comparison under the same conditions. One odd thing i noticed, with 2.6.21-cfs-v13 the gnome's time applet in the bar skipped some minutes (e.g. 16:23 - 16:25) several times. The data is available on: http://www.debianPT.org/~elmig/pool/kernel/20070520/ How did you get your data? I am affraid your data it's wrong, there's no such big difference between the schedulers... The glitch1 script starts multiple scrolling xterms at the same time as the glxgears, and allows observation of smoothness of the gears. It's not a benchmark, although the fps is reported since fast or slow and scheduler with fair aspirations should have similar results in 5 sec time slices, and between multiple CPU-bound xterms scrolling with the same code. The comments column can be used to report the user impressions, since that's the important thing if you want to listen to music or watch video. Perhaps my data appear wrong because you have failed to measure the same thing? You can get the most recent info at http://www.tmr.com/~public/source/ if you want to duplicate the test on your hardware, or view the most recent tests at http://www.tmr.com/~davidsen/sched_smooth_03.html to see what the data look like when you run the same test. Note: there have been some minor changes in the test and analysis resulting from suggestions, only the recent results are worth investigating. -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
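For anyone who wants the flavor of glitch1 without X, here is a stripped-down sketch of the fairness half of it. The real thing is the shell script at the URL above, using xterms and glxgears; this just forks a few CPU-bound workers and compares how much work each was allowed to do over a fixed interval. With a fair scheduler the per-worker counts should come out nearly equal.

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <sys/wait.h>

#define NCHILD 4	/* same default as the script's xterm count */
#define SECS   30

static volatile sig_atomic_t stop;

static void on_alarm(int sig)
{
	(void)sig;
	stop = 1;
}

int main(void)
{
	int i;

	for (i = 0; i < NCHILD; i++) {
		if (fork() == 0) {
			unsigned long count = 0;

			signal(SIGALRM, on_alarm);
			alarm(SECS);
			while (!stop)
				count++;	/* stand-in for the xterm's random-number loop */
			printf("worker %d: %lu loops\n", i, count);
			exit(0);
		}
	}
	for (i = 0; i < NCHILD; i++)
		wait(NULL);
	return 0;
}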
lzo code
It is derived from original LZO 2.02 code found at: http://www.oberhumer.com/opensource/lzo/download/ The code has also been reformatted to match general kernel style. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Increased ipw2200 power usage with dynticks
Björn Steinbrink wrote: On 2007.05.20 20:55:35 +0200, Andi Kleen wrote: Björn Steinbrink [EMAIL PROTECTED] writes: Ok, it seems that ipw2200 is just a trigger for the problem here. AFAICT the cause of the worse C state usage is that after ipw2200 has woken the cpu, acpi_processor_idle() chooses C2 (due to dma? bm? I have no idea...) as the prefered sleep state. Now without NO_HZ or when I hold down a key, there are interrupts that wake up the CPU and when acpi_processor_idle() is called again the promotion to C3/C4 happens. But with NO_HZ, there are no such interrupts, most wakeups are caused by ipw2200 and so the processor doesn't go any deeper than C2 most of the time and thus wastes lots of power. The cpuidle governour code Venki is working on is supposed to address this. There have been also earlier prototype patches by Adam Belay and Thomas Renninger. Venki (at least I think it was him) also told me about cpuidle and the menu governor on #powertop. Unfortunately, cpuidle seems to be gone from acpi-test (or I'm simply still too stupid for git/gitweb). I manually added the cpuidle and menu governor patches on top of my 2.6.22-rc1-hrt8 kernel, but that broke C-state duration accounting. On the bright side of things is power usage though, which is down to an incredible 13.9W in idle+ipw2200 :) Very encouraging, hopefully that can get into mainline soon, as power usage is an issue with laptops. Until then, it sounds as if dynticks is a negative power save for ipw2200 (and probably many other things). Dare we hope that this will allow use of USB on laptops without draining the battery? -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Sched - graphic smoothness under load - cfs-v13 sd-0.48
Miguel Figueiredo wrote: Ray Lee wrote: On 5/20/07, Miguel Figueiredo [EMAIL PROTECTED] wrote: As I tryied myself kernels 2.6.21, 2.6.21-cfs-v13, and 2.6.21-ck2 on the same machine i found *very* odd those numbers you posted, so i tested myself those kernels to see the numbers I get instead of talking about the usage of kernel xpto feels like. I did run glxgears with kernels 2.6.21, 2.6.21-cfs-v13 and 2.6.21-ck2 inside Debian's GNOME environment. The hardware is an AMD Sempron64 3.0 GHz, 1 GB RAM, Nvidia 6800XT. Average and standard deviation from the gathered data: * 2.6.21: average = 11251.1; stdev = 0.172 * 2.6.21-cfs-v13: average = 11242.8; stdev = 0.033 * 2.6.21-ck2: average = 11257.8; stdev = 0.067 Keep in mind those numbers don't mean anything we all know glxgears is not a benchmark, their purpose is only to be used as comparison under the same conditions. Uhm, then why are you trying to use them to compare against Bill's numbers? You two have completely different hardware setups, and this is a test that is dependent upon hardware. Stated differently, this is a worthless comparison between your results and his as you are changing multiple variables at the same time. (At minimum: the scheduler, cpu, and video card.) The only thing i want to see it's the difference between the behaviour of the different schedulers on the same test setup. In my test -ck2 was a bit better, not 200% worse as in Bill's measurements. I don't compare absolute values on different test setups. Since I didn't test ck2 I'm sure your numbers are unique, I only tested the sd-0.48 patch set. I have the ck2 patch, just haven't tried it yet... But since there are a lot of other things in it, I'm unsure how it relates to what I was testing. One odd thing i noticed, with 2.6.21-cfs-v13 the gnome's time applet in the bar skipped some minutes (e.g. 16:23 - 16:25) several times. The data is available on: http://www.debianPT.org/~elmig/pool/kernel/20070520/ How did you get your data? I am affraid your data it's wrong, there's no such big difference between the schedulers... It doesn't look like you were running his glitch1 script which starts several in glxgears parallel. Were you, or were you just running one? No i'm not, i'm running only one instance of glxgears inside the GNOME's environment. If you test the same conditions as I did let me know your results. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Software raid0 will crash the file-system, when each disk is 5TB
Jeff Zheng wrote: Fix confirmed, filled the whole 11T hard disk, without crashing. I presume this would go into 2.6.22 Since it results in a full loss of data, I would hope it goes into 2.6.21.x -stable. Thanks again. Jeff -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeff Zheng Sent: Thursday, 17 May 2007 5:39 p.m. To: Neil Brown; [EMAIL PROTECTED]; Michal Piotrowski; Ingo Molnar; [EMAIL PROTECTED]; linux-kernel@vger.kernel.org; [EMAIL PROTECTED] Subject: RE: Software raid0 will crash the file-system, when each disk is 5TB Yeah, seems you've locked it down, :D. I've written 600GB of data now, and everything is still fine. Will let it run overnight, and fill the whole 11T. I'll post the result tomorrow. Thanks a lot though. Jeff -Original Message- From: Neil Brown [mailto:[EMAIL PROTECTED] Sent: Thursday, 17 May 2007 5:31 p.m. To: [EMAIL PROTECTED]; Jeff Zheng; Michal Piotrowski; Ingo Molnar; [EMAIL PROTECTED]; linux-kernel@vger.kernel.org; [EMAIL PROTECTED] Subject: RE: Software raid0 will crash the file-system, when each disk is 5TB On Thursday May 17, [EMAIL PROTECTED] wrote: Uhm, I just noticed something. 'chunk' is unsigned long, and when it gets shifted up, we might lose bits. That could still happen with the 4*2.75T arrangement, but is much more likely in the 2*5.5T arrangement. Actually, it cannot be a problem with the 4*2.75T arrangement. chunk << chunksize_bits will not exceed the size of the underlying device *in*kilobytes*. In that case that is 0xAE9EC800 which will fit in a 32bit long. We don't double it to make sectors until after we add zone->dev_offset, which is sector_t and so 64bit arithmetic is used. So I'm quite certain this bug will cause exactly the problems experienced!! Jeff, can you try this patch? Don't bother about the other tests I mentioned, just try this one. Thanks. NeilBrown Signed-off-by: Neil Brown [EMAIL PROTECTED] ### Diffstat output ./drivers/md/raid0.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/drivers/md/raid0.c ./drivers/md/raid0.c
--- .prev/drivers/md/raid0.c	2007-05-17 10:33:30.0 +1000
+++ ./drivers/md/raid0.c	2007-05-17 15:02:15.0 +1000
@@ -475,7 +475,7 @@ static int raid0_make_request (request_q
 		x = block >> chunksize_bits;
 		tmp_dev = zone->dev[sector_div(x, zone->nb_dev)];
 	}
-	rsect = (((chunk << chunksize_bits) + zone->dev_offset)<<1)
+	rsect = ((((sector_t)chunk << chunksize_bits) + zone->dev_offset)<<1)
 		+ sect_in_chunk;
 	bio->bi_bdev = tmp_dev->bdev;

- To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
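The overflow Neil describes is easy to reproduce in isolation. A small userspace illustration follows; the values are made up but shaped like the 2x5.5TB case, and the truncation only shows on a build where long is 32 bits (e.g. gcc -m32). On 64-bit both results are the same.

#include <stdio.h>

typedef unsigned long long sector_t;	/* 64-bit, as with CONFIG_LBD */

int main(void)
{
	unsigned long chunk = 0x5300000UL;	/* chunk number roughly 5.2TB into the array */
	int chunksize_bits = 6;			/* 64KB chunks, counted in KB */

	unsigned long wrong = chunk << chunksize_bits;		/* wraps when long is 32-bit */
	sector_t right = (sector_t)chunk << chunksize_bits;	/* widened first, keeps the high bits */

	printf("wrong: %#lx\n", wrong);
	printf("right: %#llx\n", right);
	return 0;
}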
Re: SideWinder GameVoice driver
Tomas Carnecky wrote: Despite it's a Microsoft product, it's actually very nice and useful. A little pad with a few buttons and connectors for a headset. It's an USB device, but it doesn't represent itself as an input/HID device: HID device not claimed by input or hiddev I plugged it into a windows box and the USB protocol it uses looks very simple (see attachment): everytime I press one of the eight buttons, it sends one byte, a bitmap of the pressed buttons. What would be the best way to have this device appear in the system? Having a separate driver/device node? Or is it possible to have a small driver that would translate the gamevoice commands into evdev messages and have a new /dev/input/eventX device appear? I could write something like that myself, my C skills are good enough for that, I'd just need some advice how to use the kernel USB/evdev interfaces. From your description it sounds as though it would be useful in applications where voice connect was useful and visual wasn't, such as blind users and embedded applications where a USB pluggable interface might be useful in unusual situations. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
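One low-effort way to get the pad into the input layer without writing a kernel driver would be a tiny userspace bridge through /dev/uinput, reading the one-byte button bitmap described above and replaying button changes as key events. A rough sketch, with /dev/hidraw0 standing in for wherever the raw report can actually be read from (that part is an assumption), and BTN_0..BTN_7 chosen arbitrarily as the key codes:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/input.h>
#include <linux/uinput.h>

int main(void)
{
	struct uinput_user_dev ud;
	struct input_event ev;
	unsigned char prev = 0, cur;
	int ufd, dfd, i;

	ufd = open("/dev/uinput", O_WRONLY);
	dfd = open("/dev/hidraw0", O_RDONLY);	/* placeholder for the pad's raw node */
	if (ufd < 0 || dfd < 0) {
		perror("open");
		return 1;
	}

	ioctl(ufd, UI_SET_EVBIT, EV_KEY);
	for (i = 0; i < 8; i++)
		ioctl(ufd, UI_SET_KEYBIT, BTN_0 + i);

	memset(&ud, 0, sizeof(ud));
	strcpy(ud.name, "GameVoice bridge");
	write(ufd, &ud, sizeof(ud));
	ioctl(ufd, UI_DEV_CREATE);

	while (read(dfd, &cur, 1) == 1) {	/* one byte = bitmap of the eight buttons */
		for (i = 0; i < 8; i++) {
			if (!(((cur ^ prev) >> i) & 1))
				continue;	/* this button didn't change */
			memset(&ev, 0, sizeof(ev));
			ev.type = EV_KEY;
			ev.code = BTN_0 + i;
			ev.value = (cur >> i) & 1;
			write(ufd, &ev, sizeof(ev));
			ev.type = EV_SYN;
			ev.code = SYN_REPORT;
			ev.value = 0;
			write(ufd, &ev, sizeof(ev));
		}
		prev = cur;
	}
	ioctl(ufd, UI_DEV_DESTROY);
	return 0;
}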
Re: [patch] CFS scheduler, -v13
Anant Nitya wrote: On Thursday 17 May 2007 23:15:33 Ingo Molnar wrote: i'm pleased to announce release -v13 of the CFS scheduler patchset. The CFS patch against v2.6.22-rc1, v2.6.21.1 or v2.6.20.10 can be downloaded from the usual place: http://people.redhat.com/mingo/cfs-scheduler/ -v13 is a fixes-only release. It fixes a smaller accounting bug, so if you saw small lags during desktop use under certain workloads then please re-check that workload under -v13 too. It also tweaks SMP load-balancing a bit. (Note: the load-balancing artifact reported by Peter Williams is not a CFS-specific problem and he reproduced it in v2.6.21 too. Nevertheless -v13 should be less prone to such artifacts.) I know about no open CFS regression at the moment, so please re-test -v13 and if you still see any problem please re-report it. Thanks! Changes since -v12: - small tweak: made the fork flow of reniced tasks zero-sum - debugging update: /proc/PID/sched is now seqfile based and echoing 0 to it clears the maximum-tracking counters. - more debugging counters - small rounding fix to make the statistical average of rounding errors zero - scale both the runtime limit and the granularity on SMP too, and make it dependent on HZ - misc cleanups As usual, any sort of feedback, bugreport, fix and suggestion is more than welcome, Ingo - Hi Been testing this version of CFS from last an hour or so and still facing same lag problems while browsing sites with heavy JS and or flash usage. Mouse movement is pathetic and audio starts to skip. I haven't face this behavior with CFS till v11. 'm not seeing this, do have a site or two as examples? -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Scheduling tests on IPC methods, fc6, sd0.48, cfs12
Ingo Molnar wrote: * Bill Davidsen [EMAIL PROTECTED] wrote: I have posted the results of my initial testing, measuring IPC rates using various schedulers under no load, limited nice load, and heavy load at nice 0. http://www.tmr.com/~davidsen/ctxbench_testing.html nice! For this to become really representative though i'd like to ask for a real workload function to be used after the task gets the lock/message. The reason is that there is an inherent balancing conflict in this area: should the scheduler 'spread' tasks to other CPUs or not? In general, for all workloads that matter, the answer is almost always: 'yes, it should'. Added to the short to-do list. Note that this was originally simply a check to see which IPC works best (or at all) in an o/s. It has been useful for some other things, and an option for work will be forthcoming. But in your ctxbench results the work a task performs after doing IPC is not reflected (the benchmark goes about to do the next IPC - hence penalizing scheduling strategies that move tasks to other CPUs) - hence the bonus of a scheduler properly spreading out tasks is not measured fairly. A real-life IPC workload is rarely just about messaging around (a single task could do that itself) - some real workload function is used. You can see this effect yourself: do a taskset -p 01 $$ before running ctxbench and you'll see the numbers improve significantly on all of the schedulers. As a solution i'd suggest to add a workload function with a 100 or 200 usecs (or larger) cost (as a fixed-length loop or something like that) so that the 'spreading' effect/benefit gets measured fairly too. Can do. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
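A sketch of the fixed-cost workload function Ingo asks for: calibrate a busy loop once, then burn roughly N microseconds of CPU after each lock/message is received, so that spreading tasks across CPUs actually pays off in the numbers. The names here are mine, not ctxbench's.

#include <stdio.h>
#include <sys/time.h>

static unsigned long loops_per_usec;

static void burn(unsigned long loops)
{
	volatile unsigned long i;

	for (i = 0; i < loops; i++)
		;
}

static void calibrate(void)
{
	struct timeval a, b;
	unsigned long usec;

	gettimeofday(&a, NULL);
	burn(10 * 1000 * 1000);
	gettimeofday(&b, NULL);
	usec = (b.tv_sec - a.tv_sec) * 1000000UL + (b.tv_usec - a.tv_usec);
	loops_per_usec = (10 * 1000 * 1000) / (usec ? usec : 1);
}

/* call this after each lock/message is received */
static void workload(unsigned long usecs)
{
	burn(usecs * loops_per_usec);
}

int main(void)
{
	calibrate();
	printf("calibrated: %lu loops/usec\n", loops_per_usec);
	workload(200);	/* ~200us of work per IPC round, as suggested */
	return 0;
}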
Re: Scheduling tests on IPC methods, fc6, sd0.48, cfs12
William Lee Irwin III wrote: On Thu, May 17, 2007 at 07:26:38PM -0400, Bill Davidsen wrote: I have posted the results of my initial testing, measuring IPC rates using various schedulers under no load, limited nice load, and heavy load at nice 0. http://www.tmr.com/~davidsen/ctxbench_testing.html Kernel compiles are not how to stress these. The way to stress them is to have multiple simultaneous independent chains of communicators and deeper chains of communicators. Kernel compiles are little but background cpu/memory load for these sorts of tests. Just so. What is being quantified is the rate of slowdown due to external load. I would hope that each IPC method would slow by some similar factor. ... Something expected to have some sort of mutual interference depending on quality of implementation would be a better sort of competing load, one vastly more reflective of real workloads. For instance, another set of processes communicating using the same primitive. The original intent was purely to measure IPC speed under no load conditions, since fairness is in vogue I also attempted to look for surprising behavior. Corresponding values under equal load may be useful in relation to one another, but this isn't (and hopefully doesn't claim to be) a benchmark. It may or may not be useful viewed in that light, but that's not the target. Perhaps best of all would be a macrobenchmark utilizing a variety of the primitives under consideration. Unsurprisingly, major commercial databases do so for major benchmarks. And that's a very good point, either multiple copies or more forked processes might be useful, and I do intend to add threaded tests on the next upgrade, but perhaps a whole new code might be better for generating the load you suggest. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
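And a sketch of the kind of competing load wli suggests: several independent pairs of processes ping-ponging a byte over pipes, so the background load exercises the same IPC paths rather than just burning CPU the way a kernel compile does. This is an illustration of the idea, not part of ctxbench.

#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

#define PAIRS  4
#define ROUNDS 100000

static void pingpong(int rfd, int wfd)
{
	char c = 0;
	int i;

	for (i = 0; i < ROUNDS; i++) {
		if (write(wfd, &c, 1) != 1 || read(rfd, &c, 1) != 1)
			exit(1);
	}
	exit(0);
}

int main(void)
{
	int p;

	for (p = 0; p < PAIRS; p++) {
		int ab[2], ba[2];

		if (pipe(ab) || pipe(ba))
			return 1;
		if (fork() == 0)
			pingpong(ba[0], ab[1]);	/* writes on ab, reads ba */
		if (fork() == 0)
			pingpong(ab[0], ba[1]);	/* writes on ba, reads ab */
	}
	for (p = 0; p < 2 * PAIRS; p++)
		wait(NULL);
	return 0;
}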
Re: Sched - graphic smoothness under load - cfs-v13 sd-0.48
Miguel Figueiredo wrote: Bill Davidsen wrote: Miguel Figueiredo wrote: Ray Lee wrote: On 5/20/07, Miguel Figueiredo [EMAIL PROTECTED] wrote: As I tryied myself kernels 2.6.21, 2.6.21-cfs-v13, and 2.6.21-ck2 on the same machine i found *very* odd those numbers you posted, so i tested myself those kernels to see the numbers I get instead of talking about the usage of kernel xpto feels like. I did run glxgears with kernels 2.6.21, 2.6.21-cfs-v13 and 2.6.21-ck2 inside Debian's GNOME environment. The hardware is an AMD Sempron64 3.0 GHz, 1 GB RAM, Nvidia 6800XT. Average and standard deviation from the gathered data: * 2.6.21: average = 11251.1; stdev = 0.172 * 2.6.21-cfs-v13: average = 11242.8; stdev = 0.033 * 2.6.21-ck2: average = 11257.8; stdev = 0.067 Keep in mind those numbers don't mean anything we all know glxgears is not a benchmark, their purpose is only to be used as comparison under the same conditions. Uhm, then why are you trying to use them to compare against Bill's numbers? You two have completely different hardware setups, and this is a test that is dependent upon hardware. Stated differently, this is a worthless comparison between your results and his as you are changing multiple variables at the same time. (At minimum: the scheduler, cpu, and video card.) The only thing i want to see it's the difference between the behaviour of the different schedulers on the same test setup. In my test -ck2 was a bit better, not 200% worse as in Bill's measurements. I don't compare absolute values on different test setups. Since I didn't test ck2 I'm sure your numbers are unique, I only tested the sd-0.48 patch set. I have the ck2 patch, just haven't tried it yet... But since there are a lot of other things in it, I'm unsure how it relates to what I was testing. One odd thing i noticed, with 2.6.21-cfs-v13 the gnome's time applet in the bar skipped some minutes (e.g. 16:23 - 16:25) several times. The data is available on: http://www.debianPT.org/~elmig/pool/kernel/20070520/ How did you get your data? I am affraid your data it's wrong, there's no such big difference between the schedulers... It doesn't look like you were running his glitch1 script which starts several in glxgears parallel. Were you, or were you just running one? No i'm not, i'm running only one instance of glxgears inside the GNOME's environment. If you test the same conditions as I did let me know your results. Hi Bill, if i've understood correctly the script runs glxgears for 43 seconds and in that time generates random numbers in a random number of times (processes, fork and forget), is that it? No, I haven't made it clear. A known number (default four) of xterms are started, each of which calculates random numbers and prints them, using much CPU time and causing a lot of scrolling. At the same time glxgears is running, and the smoothness (or not) is observed manually. The script records raw data on the number of frames per second and the number of random numbers calculated by each shell. Since these are FAIR schedulers, the variance between the scripts, and between multiple samples from glxgears is of interest. To avoid startup effects the glxgears value from the first sample is reported separately and not included in the statistics. I looked at your results, and they are disturbing to say the least, it appears that using the ck2 scheduler glxgears stopped for all practical purposes. 
You don't have quite the latest glitch1, the new one runs longer and allows reruns to get several datasets, but the results still show very slow gears and a large difference between the work done by the four shells. That's not a good result, how did the system feel? You find the data, for 2.6.21-{cfs-v13, ck2} in http://www.debianpt.org/~elmig/pool/kernel/20070522/ Thank you, these results are very surprising, and I would not expect the system to be pleasing the use under load, based on this. Here's the funny part... Lets call: a) to random number of processes run while glxgears is running, gl_fairloops file It's really the relative work done by identical processes, hopefully they are all nearly the same, magnitude is interesting but related to responsiveness rather than fairness. b) to generated frames while running a burst of processes aka massive and uknown amount of operations in one process, gl_gears file Well, top or ps will give you a good idea of processing, but it tried to use all of one CPU if allowed. Again, similarity of samples reflects fairness and magnitude reflects work done. kernel2.6.21-cfs-v132.6.21-ck2 a)194464254669 b)54159124 Everyone seems to like ck2, this makes it look as if the video display would be really pretty unusable. While sd-0.48 does show an occasional video glitch when watching video under heavy load, it's annoying rather than unusable. Your subjective
Re: Sched - graphic smoothness under load - cfs-v13 sd-0.48
I was unable to reproduce the numbers Miguel generated, comments below. The -ck2 patch seems to run nicely, although the memory repopulation from swap would be most useful on system which have a lot of memory pressure. Bill Davidsen wrote: Miguel Figueiredo wrote: Hi Bill, if i've understood correctly the script runs glxgears for 43 seconds and in that time generates random numbers in a random number of times (processes, fork and forget), is that it? No, I haven't made it clear. A known number (default four) of xterms are started, each of which calculates random numbers and prints them, using much CPU time and causing a lot of scrolling. At the same time glxgears is running, and the smoothness (or not) is observed manually. The script records raw data on the number of frames per second and the number of random numbers calculated by each shell. Since these are FAIR schedulers, the variance between the scripts, and between multiple samples from glxgears is of interest. To avoid startup effects the glxgears value from the first sample is reported separately and not included in the statistics. I looked at your results, and they are disturbing to say the least, it appears that using the ck2 scheduler glxgears stopped for all practical purposes. You don't have quite the latest glitch1, the new one runs longer and allows reruns to get several datasets, but the results still show very slow gears and a large difference between the work done by the four shells. That's not a good result, how did the system feel? You find the data, for 2.6.21-{cfs-v13, ck2} in http://www.debianpt.org/~elmig/pool/kernel/20070522/ Thank you, these results are very surprising, and I would not expect the system to be pleasing the use under load, based on this. Here's the funny part... Lets call: a) to random number of processes run while glxgears is running, gl_fairloops file It's really the relative work done by identical processes, hopefully they are all nearly the same, magnitude is interesting but related to responsiveness rather than fairness. b) to generated frames while running a burst of processes aka massive and uknown amount of operations in one process, gl_gears file Well, top or ps will give you a good idea of processing, but it tried to use all of one CPU if allowed. Again, similarity of samples reflects fairness and magnitude reflects work done. kernel2.6.21-cfs-v132.6.21-ck2 a)194464254669 b)54159124 Everyone seems to like ck2, this makes it look as if the video display would be really pretty unusable. While sd-0.48 does show an occasional video glitch when watching video under heavy load, it's annoying rather than unusable. I spent a few hours running the -ck2 patch, and I didn't see any numbers like yours. What I did see is going up with my previous results as http://www.tmr.com/~davidsen/sched_smooth_04.html. While there were still some minor pauses in glxgears with my test, performance was very similar to the sd-0.48 results. And I did try watching video with high load, without problems. Only when I run a lot of other screen-changing processes can I see pauses in the display. Your subjective impressions would be helpful, and you may find that the package in the www.tmr.com/~public/source is slightly easier to use and gives more stable results. The documentation suggests the way to take samples (the way I did it) but if you feel more or longer samples would help it is tunable. I added Con to the cc list, he may have comments or suggestions (against the current versions, please). 
Or he may feel that video combined with other heavy screen updating is unrealistic or not his chosen load. I'm told the load is similar to games which use threads and do lots of independent action, if that's a reference. I'll include the -ck2 patch in my testing on other hardware. -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Sched - graphic smoothness under load - cfs-v13 sd-0.48
Con Kolivas wrote: On Wednesday 23 May 2007 10:28, Bill Davidsen wrote: kernel2.6.21-cfs-v132.6.21-ck2 a)194464254669 b)54159124 Everyone seems to like ck2, this makes it look as if the video display would be really pretty unusable. While sd-0.48 does show an occasional video glitch when watching video under heavy load, it's annoying rather than unusable. That's because the whole premise of your benchmark relies on a workload that yield()s itself to the eyeballs on most graphic card combinations when using glxgears. Your test remains a test of sched_yield in the presence of your workloads rather than anything else. If people like ck2 it's because in the real world with real workloads it is better, rather than on a yield() based benchmark. Repeatedly the reports are that 3d apps and games in normal usage under -ck are better than mainline and cfs. I have to admit that I call in the teen reserves to actually get good feedback on games, but I do watch a fair number of videos and under high load I find sd acceptable and cfs totally smooth. The next time my game expert comes to visit I'll get some subjective feedback. My use of glxgears was mainly intended to use something readily available, and which gave me the ability to make both subjective and objective evaluations. My -ck2 results certainly show no significant difference from sd-0.48, I suspect that on a machine with less memory the swap reload would be more beneficial. -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Sched - graphic smoothness under load - cfs-v13 sd-0.48
Michael Gerdau wrote: That's because the whole premise of your benchmark relies on a workload that yield()s itself to the eyeballs on most graphic card combinations when using glxgears. Your test remains a test of sched_yield in the presence of your workloads rather than anything else. If people like ck2 it's because in the real world with real workloads it is better, rather than on a yield() based benchmark. Repeatedly the reports are that 3d apps and games in normal usage under -ck are better than mainline and cfs. While I can't comment on the technical/implementational details of Con's claim I definitely have to agree from a users POV. Any of the sd/ck/cfs schedulers are an improvement on the current mainline, and hopefully they will continue to cross pollinate and evolve. Perhaps by 2.6.23 a clear best will emerge, or Linus will change his mind and make sd and cfs be compile options at build time. All my recent CPU intensive benchmarks show that both ck/sd and cfs are very decent scheduler and IMO superior to mainline for all _my_ usecases. In particular playing supertux while otherwise fully utilizing both CPUs on a dualcore works without any glitch and better than on mainline for both sd and cfs. I did some kernel compile timing numbers as part of my work with ctxbench, and there is little to choose between the schedulers under load, although the special case for sched_yield makes some loads perform better with cfs. With large memory and fast disk, a kernel make becomes a CPU benchmark, there's virtually no iowait not filled with another process. For me the huge difference you have for sd to the others increases the likelyhood the glxgears benchmark does not measure scheduling of graphic but something else. The glitch1 script generates a number of CPU bound processes updating the screen independently, which stresses both graphics performance and scheduler fairness. And once again I note that it's a *characterization* rather than a benchmark. The ability of the scheduler to deliver the same resources to multiple identical processes, and to keep another CPU bound process (glxgears) getting the processor at regular intervals is more revealing than the frames per second or loops run. I would expect sd to be better at this, since it uses a deadline concept, but in practice the gears pause, and then move rapidly or appear to jump. My reading on this is that the process starves for some ms, then gets a lot of CPU because it is owed more. I think I see this in games, but not being a game player I can't tell from experience if it's artifact or the games suck. That's what my test rig, based on a 15 year old boy and several cans of high caffeine soda, is used for. ;-) Anyway, I'm still in the process of collecting data or more precisely until recently constantly refined what data to collect and how. I plan to provide new benchmark results on CPU intensive tasks in a couple of days. -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Sched - graphic smoothness under load - cfs-v13 sd-0.48
Miguel Figueiredo wrote: Bill Davidsen wrote: I was unable to reproduce the numbers Miguel generated, comments below. The -ck2 patch seems to run nicely, although the memory repopulation from swap would be most useful on system which have a lot of memory pressure. I spent a few hours running the -ck2 patch, and I didn't see any numbers like yours. What I did see is going up with my previous results as http://www.tmr.com/~davidsen/sched_smooth_04.html. While there were still some minor pauses in glxgears with my test, performance was very similar to the sd-0.48 results. And I did try watching video with high load, without problems. Only when I run a lot of other screen-changing processes can I see pauses in the display. Your subjective impressions would be helpful, and you may find that the package in the www.tmr.com/~public/source is slightly easier to use and gives more stable results. The documentation suggests the way to take samples (the way I did it) but if you feel more or longer samples would help it is tunable. I added Con to the cc list, he may have comments or suggestions (against the current versions, please). Or he may feel that video combined with other heavy screen updating is unrealistic or not his chosen load. I'm told the load is similar to games which use threads and do lots of independent action, if that's a reference. I'll include the -ck2 patch in my testing on other hardware. Hi Bill, the numbers i posted before are repeatable on that machine. The numbers you posted in [EMAIL PROTECTED] are not the same... From my inbox I grab some very non-matching values: = Here's the funny part... Lets call: a) to random number of processes run while glxgears is running, gl_fairloops file b) to generated frames while running a burst of processes aka massive and uknown amount of operations in one process, gl_gears file kernel2.6.21-cfs-v132.6.21-ck2 a)194464254669 b)54159124 = The numbers in your glitch1.html file show a close correlation for cfs and -ck2, well within what I would expect. The stddev for the loops is larger for -cf2, but not out of line with what I see, and nothing like the numbers you originally sent me (which may have been testing something else, or from an old version before I made improvements, or ???). In any case thanks for testing. I did run, again, glitch1 on my laptop (T2500 CoreDuo, also Nvidia) please check: http://www.debianpt.org/~elmig/pool/kernel/20070523/ Thanks, those data seem as expected. -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Race free attributes in sysfs
Greg KH wrote: On Wed, May 23, 2007 at 09:27:12AM -0400, Mark Lord wrote: Greg KH wrote: And yes, it only starts to look for things when it receives an event, it does not scan sysfs at all. Does it look for only that one event, or does it scan at that point? udev will act on that event, and as I mentioned, not read anything from sysfs at all, unless a custom rule is in the rules file asking it to read a specific sysfs file in the tree. So no scanning happens unless specifically asked for. And as mentioned, udev can work just fine without sysfs enabled at all now, with the exception of some custom rules for some devices. I think what Mark is asking is about the case where udev gets an event, is told to look in sysfs, and while looking encounters a partially described device. Given all the "this won't happen unless..." cases, could someone cover this one and state either that it can't happen because {reason} or that if it does happen the result will be {description}. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
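Until someone does spell out the can't-happen-because reasons, a userspace consumer can at least read attributes defensively. A sketch: the path is only an example, and the retry policy is a guess on my part, not anything udev actually promises.

#include <errno.h>
#include <stdio.h>
#include <unistd.h>

static int read_attr(const char *path, char *buf, size_t len, int retries)
{
	while (retries-- > 0) {
		FILE *f = fopen(path, "r");

		if (f) {
			size_t n = fread(buf, 1, len - 1, f);

			buf[n] = '\0';
			fclose(f);
			return 0;
		}
		if (errno != ENOENT)	/* a real error, not "not there yet" */
			return -1;
		usleep(10000);		/* 10ms, then look again */
	}
	return -1;
}

int main(void)
{
	char model[64];

	if (read_attr("/sys/block/sda/device/model", model, sizeof(model), 10) == 0)
		printf("model: %s", model);
	return 0;
}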
Re: 2.6.21.1 on Fedora Core 6 breaks LVM/vgscan
Jonathan Woithe wrote: On 21 May 2007 I wrote: Attempting to compile a 2.6.21.1 kernel for use on a Fedora Core 6 box results in a panic at boot because the root filesystem can't be found. I have just compiled 2.6.22-rc2 with the configuration file given in my previous post and the resulting kernel successfully boots on the machine concerned. Whatever broke LVM for this machine in between 2.6.18 and 2.6.21.1 has now been fixed. I haven't had any problem booting with any of the kernels, but when I try to build a kernel with a Fedora config from /boot, it builds fine but doesn't boot after install. I started by building a very basic kernel for testing, and then started adding features to get everything I need. But just using the latest FC6 config file gets me a kernel which fails in just the way you mention. There is still a problem with the CDROM but I will follow up in another thread about that. Happy to say I don't see that, I'm using PATA optical devices, and USB on some machines, both work. I can't get scanning to work even after buying a supported scanner, so I may have to go back to Slackware and a 2.4 kernel on one machine, but boot and run does fine. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: IDE/ATA: Intel i865-based mainboard, CDROM not detected
Jonathan Woithe wrote: A collegue of mine has an Intel mainboard with the i865 chipset onboard (DQ965). All kernels up to and including 2.6.22-rc2 do not detect the IDE CDROM/DVDROM when booting. The SATA hard drive is found without any problems. Let me belatedly ask if the device shows up in POST at cold boot. It may need some BIOS setting to be visible. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/3] Unify dma blacklist in ide-dma.c and libata-core.c
Junio C Hamano wrote: This introduces a shared header file that defines the entries for two dma blacklists in ide-dma.c and libata-core.c to make it easier to keep them in sync. Why wasn't this done this way in the first place? Out of tree development for libata or something? -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
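Roughly the shape such a shared header takes, so both ide-dma.c and libata-core.c pull the same table in and apply their own match logic. The names and entries below are illustrative rather than copied from the actual patch.

/* shared-blacklist.h (name is illustrative) */
struct dma_blacklist_entry {
	const char *model;
	const char *firmware;	/* NULL matches any firmware revision */
};

#define DMA_BLACKLIST(m, f)	{ .model = (m), .firmware = (f) }

static const struct dma_blacklist_entry dma_blacklist[] = {
	DMA_BLACKLIST("WDC AC11000H",            NULL),
	DMA_BLACKLIST("SAMSUNG CD-ROM SC-148C",  NULL),
	DMA_BLACKLIST("TOSHIBA CD-ROM XM-6202B", NULL),
	DMA_BLACKLIST(NULL, NULL)	/* terminator */
};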
[2.6.21.1] resume doesn't run suspended kernel?
I was testing susp2disk in 2.6.21.1 under FC6, to support reliable computing environment (RCE) needs. The idea is that if power fails, after some short time on UPS the system does susp2disk with a time set, and boots back every so often to see if power is stable. No, I don't want susp2mem until I debug it; the console comes up in a useless mode, and console-as-kaleidoscope is not what I need. Anyway, I pulled the plug on the UPS, and the system shut down. But when it powered up, it booted the default kernel rather than the test kernel, decided that it couldn't resume, and then did a cold boot. I can bypass this by making the debug kernel the default, but WHY? Is the kernel not saved such that any kernel can be rolled back into memory and run? Actually, the answer is HELL NO, so I really ask if this is the intended mode of operation, that only the default boot kernel will restore. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
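For the record, the suspend half of that cycle can be scripted without anything exotic. A sketch that arms an RTC wake alarm and then suspends to disk, assuming a kernel that exposes /sys/class/rtc/rtc0/wakealarm (older kernels used /proc/acpi/alarm instead) and a BIOS that honours RTC wakeups; error handling is minimal.

#include <stdio.h>
#include <time.h>

static int put(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return -1;
	}
	fprintf(f, "%s\n", val);
	return fclose(f);
}

int main(void)
{
	char when[32];

	/* wake in 30 minutes to see whether line power is back */
	snprintf(when, sizeof(when), "%ld", (long)time(NULL) + 30 * 60);
	if (put("/sys/class/rtc/rtc0/wakealarm", when))
		return 1;
	return put("/sys/power/state", "disk") ? 1 : 0;
}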
Documentation on /sys/power/resume
Not in the ABI doc; is there any doc at all? If not, could someone who knows where it's used give me a hint, as a quick look didn't bring enlightenment. Or is it a future hook which doesn't work yet? -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Documentation on /sys/power/resume
Rafael J. Wysocki wrote: Hi, On Sunday, 27 May 2007 01:51, Bill Davidsen wrote: Not in the ABI doc; is there any doc at all? If not, could someone who knows where it's used give me a hint, as a quick look didn't bring enlightenment. Or is it a future hook which doesn't work yet? That's something that in theory may allow you to resume the system from an initrd script. Basically, you write your resume device's major and minor numbers into it as the MAJ:MIN string (eg. 8:3 for /dev/sda3 on my box) and the kernel will try to read the image from this device and restore it. It only works with partitions and the use of it is discouraged, so it's deliberately undocumented. Thanks, that's just different enough from what little info I had to make what I have not work. I'm looking at resume from a non-swap location. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
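What Rafael describes, as it would look from an early-boot/initrd helper: write the resume partition's MAJ:MIN to /sys/power/resume and let the kernel try to restore the image. 8:3 is only the example from his mail; if no image is found, the write simply returns and boot continues.

#include <stdio.h>

int main(int argc, char **argv)
{
	const char *majmin = argc > 1 ? argv[1] : "8:3";	/* Rafael's example device */
	FILE *f = fopen("/sys/power/resume", "w");

	if (!f) {
		perror("/sys/power/resume");
		return 1;
	}
	fprintf(f, "%s\n", majmin);
	/* if a valid image is found the kernel resumes the saved system and
	 * this process, along with the rest of early userspace, goes away */
	return fclose(f) ? 1 : 0;
}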
Re: [2.6.21.1] resume doesn't run suspended kernel?
David Greaves wrote: Bill Davidsen wrote: Anyway, I pulled the plug on the UPS, and the system shut down. But when it powered up, it booted the default kernel rather than the test kernel, decided that it couldn't resume, and then did a cold boot. Booting the machine isn't the kernel's job, it's the bootloader's job. And resume is not the the bootloader's job... if memory and registers are restored, and a jump is made to the resume address, a resumed system should result. clearly some part of that didn't happen :-( I can bypass this by making the debug kernel the default, but WHY? Is the kernel not saved such that any kernel can be rolled back into memory and run? Actually, the answer is HELL NO, so I really ask if this is the intended mode of operation, that only the default boot kernel will restore. Yes. It is very dangerous to attempt a resume with a different kernel than the one that has gone to sleep. Different kernels may be compiled with different options that affect where or how in-memory structures are saved. If the mainline resume is depending on that no wonder resume is so fragile. User action can change order of module loads, kmalloc calls move allocated structures, etc. Counting on anything to be locked in place seems naive. So you suspend with a kernel which holds your filesystem data/cache/inodes at 0x1234000 and restore with a kernel that expects to see your filesystem data at 0x1235000. Ouch. I would hope that the data used by the resumed kernel would be the same data that was suspended, not something from another kernel. Personally I think the kernel suspend should write a signature - similar to a hash of the bzImage - into the suspend image so it won't even attempt a resume if there's a mismatch. (Yes, I made this mistake once whilst playing with suspend). Someone else dropped a note saying the FC kernels use suspend2, and work fine. I'm off to look at the FC source and see if that's the case. That would explain why suspend works and resume doesn't, hopefully there's a 2.6.21 suspend2 patch in that case. Thanks for the feedback in any case. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Documentation on /sys/power/resume
Rafael J. Wysocki wrote: On Sunday, 27 May 2007 14:53, Bill Davidsen wrote: Rafael J. Wysocki wrote: Hi, On Sunday, 27 May 2007 01:51, Bill Davidsen wrote: Not in the ABI doc, is there any doc at all, and if not could someone who knows where it's used give me a hint, as a quick look didn't bring enlightenment. Or is it a future hook which doesn't work yet? That's something that in theory may allow you to resume the system from an initrd script. Basically, you write your resume device's major and minor numbers into it as the MAJ:MIN string (eg. 8:3 for /dev/sda3 on my box) and the kernel will try to read the image from this device and restore it. It only works with partitions and the use of it is discouraged, so it's deliberately undocumented. Thanks, that's just different enough from what little info I had to make what I have not work. I'm looking at resume from a non-swap location. Only suspend2 can do this right now. The built-in swsusp can resume from a swap file as long as it's not located on LVM. Sounds like suspend2 is still needed. I haven't needed a suspending kernel in a few years, and I was hoping that with suspend working in mainline, resume would have been implemented. Sounds as if that's not the case; my swap is RAID1, and I was hoping to resume from one of the mirrors, since they are based on a partition. No joy with or without /sys/power/resume, so I'll look further. -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] resume doesn't run suspended kernel?
Pavel Machek wrote: On Sat 2007-05-26 18:42:37, Bill Davidsen wrote: I was testing susp2disk in 2.6.21.1 under FC6, to support reliable computing environment (RCE) needs. The idea is that if power fails, after some short time on UPS the system does susp2disk with a time set, and boots back every so often to see if power is stable. No, I don't want susp2mem until I debug it; the console comes up in a useless mode, and a console as kaleidoscope is not what I need. Anyway, I pulled the plug on the UPS, and the system shut down. But when it powered up, it booted the default kernel rather than the test kernel, decided that it couldn't resume, and then did a cold boot. I can bypass this by making the debug kernel the default, but WHY? HELL YES :-). We do not save kernel code into image. That's clear; I'll have to use xen or kvm or similar which restores the system as suspended. Thanks for the clarification of the limitations. -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] resume doesn't run suspended kernel?
Bill Davidsen wrote: Pavel Machek wrote: On Sat 2007-05-26 18:42:37, Bill Davidsen wrote: I was testing susp2disk in 2.6.21.1 under FC6, to support reliable computing environment (RCE) needs. The idea is that if power fails, after some short time on UPS the system does susp2disk with a time set, and boots back every so often to see if power is stable. No, I don't want susp2mem until I debug it; the console comes up in a useless mode, and a console as kaleidoscope is not what I need. Anyway, I pulled the plug on the UPS, and the system shut down. But when it powered up, it booted the default kernel rather than the test kernel, decided that it couldn't resume, and then did a cold boot. I can bypass this by making the debug kernel the default, but WHY? HELL YES :-). We do not save kernel code into image. That's clear; I'll have to use xen or kvm or similar which restores the system as suspended. Thanks for the clarification of the limitations. Sorry, I wrote that late at night and quickly. I should have said design decision rather than limitation. For systems which don't do multiple kernels it's not an issue. I certainly would not have made the same decision, but I didn't write the code. It seems more robust to save everything than to try to identify what has and hasn't changed in a modular kernel. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.
Neil Brown wrote: We can think of there being three types of devices: 1/ SAFE. With a SAFE device, there is no write-behind cache, or if there is it is non-volatile. Once a write completes it is completely safe. Such a device does not require barriers or ->issue_flush_fn, and can respond to them either by a no-op or with -EOPNOTSUPP (the former is preferred). 2/ FLUSHABLE. A FLUSHABLE device may have a volatile write-behind cache. This cache can be flushed with a call to blkdev_issue_flush. It may not support barrier requests. 3/ BARRIER. A BARRIER device supports both blkdev_issue_flush and BIO_RW_BARRIER. Either may be used to synchronise any write-behind cache to non-volatile storage (media). Handling of SAFE and FLUSHABLE devices is essentially the same and can work on a BARRIER device. The BARRIER device has the option of more efficient handling. There are two things I'm not sure you covered. First, disks which don't support flush but do have a cache dirty status bit you can poll at times like shutdown. If there are no drivers which support these, it can be ignored. Second, NAS (including nbd?). Is there enough information to handle this really right? Otherwise looks good as a statement of issues. It seems to me that the filesystem should be able to pass the barrier request to the block layer and have it taken care of, rather than have code in each f/s to cope with odd behavior. -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
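To make the taxonomy concrete, here is a minimal userspace-style sketch of the decision logic described above; the names device_class, flush_cache and submit_barrier_write are invented for illustration and are not the real block-layer API.

/* Sketch of the three device classes discussed above.  The helper names
 * here are invented -- they are not the real block-layer interfaces. */
#include <stdio.h>

enum device_class { DEV_SAFE, DEV_FLUSHABLE, DEV_BARRIER };

/* Hypothetical helpers a caller would supply. */
static void flush_cache(void)          { puts("flush write-behind cache"); }
static void submit_barrier_write(void) { puts("submit write with barrier flag"); }
static void submit_plain_write(void)   { puts("submit ordinary write"); }

/* Make one write durable, depending on what the device offers. */
static void durable_write(enum device_class c)
{
    switch (c) {
    case DEV_SAFE:
        /* No volatile cache: completion already means "on media". */
        submit_plain_write();
        break;
    case DEV_FLUSHABLE:
        /* Volatile cache, no barriers: write, then flush explicitly. */
        submit_plain_write();
        flush_cache();
        break;
    case DEV_BARRIER:
        /* Barrier-capable: let the device order and flush for us. */
        submit_barrier_write();
        break;
    }
}

int main(void)
{
    durable_write(DEV_FLUSHABLE);
    return 0;
}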
What causes iowait other than waiting for i/o?
I recently noted that my system was spending a lot of time in i/o wait when doing some tasks which I thought didn't involve i/o, as noted by the lack of disk light activity most of the time. I thought of the network; certainly the NIC had no activity for this job. So I set up a little loop to capture all disk i/o and network activity (including loopback). That was no obvious help, and the program doesn't use pipes. At this point I'm really curious; does someone have a good clue? Note: I don't think this is a bug or performance issue, unless the kernel is doing something and charging time to iowait instead of system. I don't see anything to fix, but I would like to understand. -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
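For reference, one way to watch the iowait counter directly is to sample the aggregate cpu line of /proc/stat (field order per proc(5): user, nice, system, idle, iowait, ...). A small sketch, with values in USER_HZ ticks:

/* Sample the aggregate iowait counter from /proc/stat twice and print
 * the delta.  Field order on the "cpu" line (per proc(5)) is:
 * user nice system idle iowait irq softirq ...  Values are USER_HZ ticks. */
#include <stdio.h>
#include <unistd.h>

static long long read_iowait(void)
{
    long long user, nice, sys, idle, iowait;
    FILE *f = fopen("/proc/stat", "r");

    if (!f)
        return -1;
    if (fscanf(f, "cpu %lld %lld %lld %lld %lld",
               &user, &nice, &sys, &idle, &iowait) != 5)
        iowait = -1;
    fclose(f);
    return iowait;
}

int main(void)
{
    long long before = read_iowait();

    if (before < 0)
        return 1;
    sleep(5);    /* run the suspect workload meanwhile */
    printf("iowait ticks in 5s: %lld\n", read_iowait() - before);
    return 0;
}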
Re: Linux v2.6.22-rc3
Jeff Garzik wrote: Several people have reported LITE-ON LTR-48246S detection failed because SETXFER fails. It seems the device raises IRQ too early after SETXFER. This is controller independent. The same problem has been reported for different controllers. So, now we have pata_via where the controller raises IRQ before it's ready after SETXFER and a device which does similar thing. This patch makes libata always execute SETXFER via polling. As this only happens during EH, performance impact is nil. Setting ATA_TFLAG_POLLING is also moved from issue hot path to ata_dev_set_xfermode() - the only place where SETXFER can be issued. Note that ATA_TFLAG_POLLING applies only to drivers which implement SFF TF interface and use libata HSM. More advanced controllers ignore the flag. This doesn't matter for this fix as SFF TF controllers are the problematic ones. Not only kills two birds with a single store, but will avoid having to re-solve the problem at sometime in the future. That's good software! -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: stuff ready to be deleted?
Oliver Pinter wrote: + open sound system Why? OSS supports some hardware ALSA doesn't, it's maintained by an independent commercial company (4Front) so maintenance isn't an issue, and it's portable to many other operating systems. Functionality and low TCO, what could be better? New Linux code, including x86_64 3D drivers, was released in April, so there's no lack of new features and activity. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: What causes iowait other than waiting for i/o?
Satyam Sharma wrote: Hi Bill, On 5/29/07, Bill Davidsen [EMAIL PROTECTED] wrote: I recently noted that my system was spending a lot of time in i/o wait when doing some tasks which I thought didn't involve i/o, as noted by the lack of disk light activity most of the time. I thought of network, certainly the NIC had no activity for this job. So I set up a little loop to capture all disk i/o and network activity (including loopback). That was no obvious help, and the program doesn't use pipes. At this point I'm really curious, does someone have a good clue? Note: I don't think this is a bug or performance issue, unless the kernel is doing something and charging time to iowait instead of system I don't see anything to fix, but I would like to understand. What tool / kernel instrumentation / mechanism are you using to determine that some task(s) are indeed blocked waiting for i/o? Perhaps some userspace process accounting tools could be broken in the sense that they generalize all uninterruptible sleep as waiting for i/o ... I wouldn't expect /proc/stat and similar to be broken in that way, but If no one has a better idea I guess I will assume there's a check needed of where time is added to iowait. I was hoping to avoid a full kernel search. Never thought of /proc data as a user space tool, but I guess. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: What causes iowait other than waiting for i/o?
Rik van Riel wrote: Bill Davidsen wrote: I recently noted that my system was spending a lot of time in i/o wait when doing some tasks which I thought didn't involve i/o, as noted by the lack of disk light activity most of the time. I thought of network, certainly the NIC had no activity for this job. So I set up a little loop to capture all disk i/o and network activity (including loopback). That was no obvious help, and the program doesn't use pipes. At this point I'm really curious, does someone have a good clue? Note: I don't think this is a bug or performance issue, unless the kernel is doing something and charging time to iowait instead of system I don't see anything to fix, but I would like to understand. All filesystem IO and direct disk IO can cause iowait. This includes NFS activity. If I didn't note it before, I'm reading the data from /proc, cpustats, net/dev, and diskstats. I assume that all i/o would show up in one of those places. NFS isn't involved; although this machine is a fileserver as a side job, the modules weren't even loaded during testing. A puzzlement for future consideration. If I get a chance later this week I'll make a pretty graphic of all the stuff going on when the iowait spiked, ctx rate, inq rate; hell, the last time I even grabbed the CPU temp to see if it told me anything (it didn't; thermal throttling NOT). Thanks for the feedback, I think that lets out the obvious stuff. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21.1 - 97% wait time on IDE operations
Tommy Vercetti wrote: Hi folks, I was trying to get an answer to my question, but no one knows. I do have DMA turned on, etc, yet on extensive hard drive operations the wait time is 90+%, which means the machine is waiting, rather than doing something meanwhile (I guess). Can someone describe to me, in more detail, why that is happening, and what steps I should consider to avoid it? I couldn't find any answers that would have helped me on the net. Thanks. From later posts I suspect that your disk performance just sucks, but do use one of the monitoring tools and follow the actual disk work, in terms of seeks and transfer rate. I've been looking at high iowait while disk and network are (nearly) idle, but it sounds as if you just have a bad match of CPU and disk speed. Can you borrow a USB drive to use for some testing? A USB 2 drive should be faster than your old 4200 rpm drive. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] CFS scheduler, -v19
Ingo Molnar wrote: * Bill Davidsen [EMAIL PROTECTED] wrote: I've taken mainline git tree (freshly integrated CFS!) out for a multimedia spin. I tested watching movies and listening to music in the presence of various sleep/burn loads, pure burn loads, and mixed loads. All was peachy here.. I saw no frame drops or sound skips or other artifacts under any load where the processor could possibly meet demand. I would agree with preliminary testing, save that if you get a lot of processes updating the screen at once, there seems to be a notable case of processes getting no CPU for 100-300ms, followed by a lot of CPU. I see this clearly with the glitch1 test with four scrolling xterms and glxgears, but also watching videos with little busy processes on the screen. The only version where I never see this in test or with real use is cfs-v13. just as a test, does this go away if you: renice -20 `pidof Xorg` i.e. is this connected to the way X is scheduled? Doing this slows down the display rates, but doesn't significantly help the smoothness of the gears. Another thing to check would be whether it goes away if you set the granularity to some really fine-grained value: echo 0 > /proc/sys/kernel/sched_wakeup_granularity_ns; echo 50 > /proc/sys/kernel/sched_granularity_ns this really pushes things - but it tests the theory whether this is related to granularity. I didn't test this with standard Xorg priority, I should go back and try that. But it didn't really make much difference. The gears and scrolling xterms ran slower with Xorg at -20 with any sched settings. I'll do that as soon as a build finishes and I can reboot. I should really go back to 2.6.21.6, 2.6.22 has many bizarre behaviors with FC6. Automount starts taking 30% of CPU (unused at the moment), the sensors applet doesn't work, etc. I hope over the weekend I can get bug reports out on all this, but there are lots of non-critical oddities. Ingo -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] CFS scheduler, -v19
Ingo Molnar wrote: * Chuck Ebbert [EMAIL PROTECTED] wrote: On 07/13/2007 05:19 PM, Bill Davidsen wrote: I should really go back to 2.6.21.6, 2.6.22 has many bizarre behaviors with FC6. Automount starts taking 30% of CPU (unused at the moment) Can you confirm whether CFS is involved, i.e. does it spin like that even without the CFS patch applied? I will try that, but not until Tuesday night. I've been here too long today and have an out-of-state meeting tomorrow. I'll take a look after dinner. Note that the latest 2.6.21 with cfs-v19 doesn't have any problems of any nature, other than suspend to RAM not working, and I may have the config wrong. Runs really well otherwise, but I'll test drive 2.6.22 w/o the patch. hmmm could you take out the kernel/time.c (sys_time()) changes from the CFS patch, does that solve the automount issue? If yes, could someone take a look at automount and check whether it makes use of time(2) and whether it combines it with finer grained time sources? Will do. -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] CFS scheduler, -v19
Ingo Molnar wrote: * Ian Kent [EMAIL PROTECTED] wrote: ah! It passes in a low-res time source into a high-res time interface (pthread_cond_timedwait()). Could you change the time(NULL) + 1 to time(NULL) + 2, or change it to: gettimeofday(&wait, NULL); wait.tv_sec++; OK, I'm with you, hi-res timer. But even so, how is the time in the past after adding a second? Is it because I'm not setting tv_nsec when it's close to a second boundary, and hence your recommendation above? yeah, it looks a bit suspicious: you create a +1 second timeout out of a 1 second resolution timesource. I don't yet understand the failure mode though that results in that looping and in the 30% CPU time use - do you understand it perhaps? (and automount is still functional while this is happening, correct?) Can't say, I have automount running because I get it by default, but I have nothing using it on my test machine. Why is it looping so fast when there are no mount points defined? If the config changes there's no requirement to notice right away, is there? -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] CFS scheduler, -v19
Ingo Molnar wrote: * Ian Kent [EMAIL PROTECTED] wrote: ah! It passes in a low-res time source into a high-res time interface (pthread_cond_timedwait()). Could you change the time(NULL) + 1 to time(NULL) + 2, or change it to: gettimeofday(wait, NULL); wait.tv_sec++; does this solve the spinning? Yes, adding in the offset within the current second appears to resolve the issue. Thanks Ingo. i'm wondering how widespread this is. If automount is the only app doing this then _maybe_ we could get away with it by changing automount? I don't think the change is unreasonable since I wasn't using an accurate time in the condition wait, so that's a coding mistake on my part which I will fix. thanks Ian for taking care of this and for fixing it! Linus, Thomas, what do you think, should we keep the time.c change? Automount is one app affected so far, and it's a borderline case: the increased (30%) CPU usage is annoying, but it does not prevent the system from working per se, and an upgrade to a fixed/enhanced automount version resolves it. The temptation of using a really (and trivially) scalable low-resolution time-source (which is _easily_ vsyscall-able, on any platform) for DBMS use is really large, to me at least. Should i perhaps add a boot/config option that enables/disables this optimization, to allow distros finer grained control about this? And we've also got to wait whether there's any other app affected. Allow it to be selected by the features so that admins can evaluate the implications without a reboot? That would be a convenient interface if you could provide it. -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] CFS scheduler, -v19
Linus Torvalds wrote: On Tue, 17 Jul 2007, Ingo Molnar wrote: * Ian Kent [EMAIL PROTECTED] wrote: In several places I have code similar to: wait.tv_sec = time(NULL) + 1; wait.tv_nsec = 0; Ok, that definitely should work. Does the patch below help? Spectacularly no! With this patch the glitch1 script with multiple scrolling windows has all xterms and glxgears stop totally dead for ~200ms once per second. I didn't properly test anything else after that. Since the automount issue doesn't seem to start until something kicks it off, I didn't see it but that doesn't mean it's fixed. ah! It passes in a low-res time source into a high-res time interface (pthread_cond_timedwait()). Could you change the time(NULL) + 1 to time(NULL) + 2, or change it to: gettimeofday(&wait, NULL); wait.tv_sec++; This is wrong. It's wrong for two reasons: - it really shouldn't be needed. I don't think time() has to be *exactly* in sync, but I don't think it can be off by a third of a second or whatever (as the 30% CPU load would seem to imply) - gettimeofday works on a timeval, pthread_cond_timedwait() works on a timespec. So if it actually makes a difference, it makes a difference for the *wrong* reason: the time is still totally nonsensical in the tv_nsec field (because it actually got filled in with msecs!), but now the tv_sec field is in sync, so it hides the bug. Anyway, hopefully the patch below might help. But we probably should make this whole thing a much more generic routine (ie we have our internal getnstimeofday() that still is missing the second-overflow logic, and that is quite possibly the one that triggers the 30% off behaviour). Hope that info helps. Ingo, I'd suggest: - get rid of timespec_add_ns(), or at least make it return a return value for when it overflows. - make all the people who overflow into tv_sec call a fix_up_seconds() thing that does the xtime overflow handling. Linus - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
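As a standalone illustration of the timeval/timespec point (this is not the automount patch, just a sketch of building a correct absolute timeout for pthread_cond_timedwait(), which by default measures against CLOCK_REALTIME; build with -lpthread):

/* Build an absolute one-second timeout for pthread_cond_timedwait().
 * The key detail is converting tv_usec (microseconds) to tv_nsec
 * (nanoseconds) instead of copying the raw value across. */
#include <pthread.h>
#include <stdio.h>
#include <sys/time.h>
#include <time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

int main(void)
{
    struct timeval now;
    struct timespec wait;
    int ret;

    gettimeofday(&now, NULL);
    wait.tv_sec  = now.tv_sec + 1;
    wait.tv_nsec = now.tv_usec * 1000;  /* usec -> nsec */

    pthread_mutex_lock(&lock);
    /* Nothing ever signals cond, so this times out after ~1 second. */
    ret = pthread_cond_timedwait(&cond, &lock, &wait);
    pthread_mutex_unlock(&lock);

    printf("pthread_cond_timedwait returned %d\n", ret);
    return 0;
}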
Re: [patch] CFS scheduler, -v19
Ingo Molnar wrote: * Bill Davidsen [EMAIL PROTECTED] wrote: Does the patch below help? Doesn't seem to apply against 2.6.22.1, I'm trying 2.6.22.6 as soon as I recreate it. Spectacularly no! With this patch the glitch1 script with multiple scrolling windows has all xterms and glxgears stop totally dead for ~200ms once per second. I didn't properly test anything else after that. Bill, could you try the patch below - does it fix the automount problem, without introducing new problems? Ingo --- Subject: time: introduce xtime_seconds From: Ingo Molnar [EMAIL PROTECTED] introduce the xtime_seconds optimization. This is a read-mostly low-resolution time source available to sys_time() and kernel-internal use. This variable is kept uptodate atomically, and it's monotically increased, every time some time interface constructs an xtime-alike time result that overflows the seconds value. (it's updated from the timer interrupt as well) this way high-resolution time results update their seconds component at the same time sys_time() does it: 118485883289000 11848588320 118485883292000 11848588320 118485883296000 11848588320 118485883299000 11848588320 118485883303000 11848588330 118485883306000 11848588330 118485883309000 11848588330 [ these are nsec time results from alternating calls to sys_time() and sys_gettimeofday(), recorded at the seconds boundary. ] instead of the previous (non-coherent) behavior: 118484895087000 11848489500 11848489509 11848489500 118484895094000 11848489500 118484895097000 11848489500 118484895101000 11848489500 118484895105000 11848489500 118484895108000 11848489500 118484895111000 11848489500 118484895115000 Signed-off-by: Ingo Molnar [EMAIL PROTECTED] --- include/linux/time.h | 13 +++-- kernel/time.c | 25 ++--- kernel/time/timekeeping.c | 28 3 files changed, 41 insertions(+), 25 deletions(-) Index: linux/include/linux/time.h === --- linux.orig/include/linux/time.h +++ linux/include/linux/time.h @@ -91,19 +91,28 @@ static inline struct timespec timespec_s extern struct timespec xtime; extern struct timespec wall_to_monotonic; extern seqlock_t xtime_lock __attribute__((weak)); +extern unsigned long xtime_seconds; extern unsigned long read_persistent_clock(void); void timekeeping_init(void); +extern void __update_xtime_seconds(unsigned long new_xtime_seconds); + +static inline void update_xtime_seconds(unsigned long new_xtime_seconds) +{ + if (unlikely((long)(new_xtime_seconds - xtime_seconds) 0)) + __update_xtime_seconds(new_xtime_seconds); +} + static inline unsigned long get_seconds(void) { - return xtime.tv_sec; + return xtime_seconds; } struct timespec current_kernel_time(void); #define CURRENT_TIME (current_kernel_time()) -#define CURRENT_TIME_SEC ((struct timespec) { xtime.tv_sec, 0 }) +#define CURRENT_TIME_SEC ((struct timespec) { xtime_seconds, 0 }) extern void do_gettimeofday(struct timeval *tv); extern int do_settimeofday(struct timespec *tv); Index: linux/kernel/time.c === --- linux.orig/kernel/time.c +++ linux/kernel/time.c @@ -58,11 +58,10 @@ EXPORT_SYMBOL(sys_tz); asmlinkage long sys_time(time_t __user * tloc) { /* -* We read xtime.tv_sec atomically - it's updated -* atomically by update_wall_time(), so no need to -* even read-lock the xtime seqlock: +* We read xtime_seconds atomically - it's updated +* atomically by update_xtime_seconds(): */ - time_t i = xtime.tv_sec; + time_t i = xtime_seconds; smp_rmb(); /* sys_time() results are coherent */ @@ -226,11 +225,11 @@ inline struct timespec current_kernel_ti do { seq = read_seqbegin(xtime_lock); - + now = xtime; 
} while (read_seqretry(xtime_lock, seq)); - return now; + return now; } EXPORT_SYMBOL(current_kernel_time); @@ -377,19 +376,7 @@ void do_gettimeofday (struct timeval *tv tv-tv_sec = sec; tv-tv_usec = usec; - /* -* Make sure xtime.tv_sec [returned by sys_time()] always -* follows the gettimeofday() result precisely. This -* condition is extremely unlikely, it can hit at most -* once per second: -*/ - if (unlikely(xtime.tv_sec != tv-tv_sec)) { - unsigned long flags; - - write_seqlock_irqsave(xtime_lock, flags); - update_wall_time(); - write_sequnlock_irqrestore(xtime_lock, flags
Re: [patch] CFS scheduler, -v19
Bill Davidsen wrote: Ingo Molnar wrote: * Bill Davidsen [EMAIL PROTECTED] wrote: Does the patch below help? Doesn't seem to apply against 2.6.22.1, I'm trying 2.6.22.6 as soon as I recreate it. Applied to 2.6.22-git9, building now. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] CFS scheduler, -v19
Ingo Molnar wrote: * Bill Davidsen [EMAIL PROTECTED] wrote: Does the patch below help? Spectacularly no! With this patch the glitch1 script with multiple scrolling windows has all xterms and glxgears stop totally dead for ~200ms once per second. I didn't properly test anything else after that. Bill, could you try the patch below - does it fix the automount problem, without introducing new problems? Okay, as noted off-list, after I exported the xtime_seconds it now builds and works. However, there are a *lot* of section mismatches which are not reassuring. Boots, runs, glitch1 test runs reasonably smoothly. automount has not used significant CPU yet, but I don't know what triggers it, the bad behavior did not happen immediately without the patch. However, it looks very hopeful. Warnings attached to save you the trouble... -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot Script started on Thu 19 Jul 2007 05:29:08 PM EDT Common profile 1.13 lastmod 2006-01-04 22:43:25-05 No common directory available Session time 17:29:08 on 07/19/07 posidon:davidsen time nice -10 make -j4 -s; sleep 2; exit CHK include/linux/version.h CHK include/linux/utsrelease.h CHK include/linux/compile.h CHK include/linux/compile.h UPD include/linux/compile.h CHK include/linux/version.h Building modules, stage 2. WARNING: vmlinux(.text+0xc1001183): Section mismatch: reference to .init.text:start_kernel (between 'is386' and 'check_x87') WARNING: vmlinux(.text+0xc1213fb4): Section mismatch: reference to .init.text: (between 'rest_init' and 'kthreadd_setup') WARNING: vmlinux(.text+0xc1218786): Section mismatch: reference to .init.text: (between 'iret_exc' and '_etext') WARNING: vmlinux(.text+0xc1218792): Section mismatch: reference to .init.text: (between 'iret_exc' and '_etext') WARNING: vmlinux(.text+0xc121879e): Section mismatch: reference to .init.text: (between 'iret_exc' and '_etext') WARNING: vmlinux(.text+0xc12187aa): Section mismatch: reference to .init.text: (between 'iret_exc' and '_etext') WARNING: vmlinux(.text+0xc1214071): Section mismatch: reference to .init.text:__alloc_bootmem_node (between 'alloc_node_mem_map' and 'zone_wait_table_init') WARNING: vmlinux(.text+0xc1214117): Section mismatch: reference to .init.text:__alloc_bootmem_node (between 'zone_wait_table_init' and 'schedule') WARNING: vmlinux(.text+0xc10fbaae): Section mismatch: reference to .init.text:__alloc_bootmem (between 'vgacon_startup' and 'vgacon_scrolldelta') WARNING: vmlinux(.text+0xc1218eda): Section mismatch: reference to .init.text: (between 'iret_exc' and '_etext') Root device is (253, 0) Setup is 11240 bytes (padded to 11264 bytes). System is 1915 kB Kernel: arch/i386/boot/bzImage is ready (#3) real4m11.024s user2m5.121s sys 0m30.952s exit Script done on Thu 19 Jul 2007 05:33:35 PM EDT
[RFC] what should 'uptime' be on suspend?
I just found a machine which will resume after suspend to memory, using the mainline kernel (no suspend2 patch). On resume I was looking at the uptime output, and it was about six minutes, FAR longer than the time since resume. So the topic for discussion is, should the uptime be - time since the original boot - total uptime since first boot, not counting the time suspended - time since resume - some other time around six minutes. Any of the first three could be useful and right for some cases, thus discussion invited. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Where did KVM go in 2.6.22-git9?
I just built a 2.6.22-git9 kernel, and when I run oldconfig it (a) sets the processor type to pentium-pro, and (b) the KVM stuff simply isn't in the config. Before I spend a lot of time on this, was it disabled temporarily for some reason, or is it a known bug, or ??? Processor is a Core2 E6600, and the starting config has KVM. Strong suggestion: put KVM in processor type and options, and if the CPU type selected supports the feature, let the builder turn it on in one place and have Kconfig turn on whatever voodoo is needed to allow it, rather than have people trying to find out what dependencies have changed with each release. I see KVM depends on X86_CMPXCHG64, which simply doesn't seem to be defined directly anywhere. Going back to 2.6.21.6 until whatever changed is at least documented. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: mkfs.ext2 triggerd RAM corruption
Jan-Benedict Glaw wrote: On Fri, 2007-05-04 16:59:51 +0200, Bernd Schubert [EMAIL PROTECTED] wrote: To see what's going on, I copied the entire / (so the initrd) into a tmpfs root, chrooted into it, also bind mounted the main / into this chroot and compared several times /bin of chroot/bin and the bind-mounted /bin while the mkfs.ext2 command was running. beo-05:/# diff -r /bin /oldroot/bin/ beo-05:/# diff -r /bin /oldroot/bin/ beo-05:/# diff -r /bin /oldroot/bin/ Binary files /bin/sleep and /oldroot/bin/sleep differ beo-05:/# diff -r /bin /oldroot/bin/ Binary files /bin/bsd-csh and /oldroot/bin/bsd-csh differ Binary files /bin/cat and /oldroot/bin/cat differ ... Also tested different schedulers, at least happens with deadline and anticipatory. The corruption does NOT happen on running the mkfs command on /dev/sda1, but happens with sda2, sda3 and sda3. Also doesn't happen with extended partitions of sda1. Is sda2 the largest filesystem out of sda2, sda3 (and the logical partitions within the extended sda1, if these get mkfs'ed, too)? I'm not too sure that this is a kernel bug, but probably a bad RAM chip. Did you run memtest86 for a while? ...and can you reproduce this problem on different machines? MfG, JBG Was this missing from your copy of the original post, or did you delete it without reading? Note last sentence... Summary: The system ramdisk (initrd) gets corrupted while running mkfs.ext2 on a local sata disk partition. Reproduced on kernel versions: vanilla 2.6.16 - 2.6.20 (2.6.16 doesn't run on any of the systems I can do tests with). Please note: I could reproduce this on several systems, all of them use ECC memory, and on most of them the memory is monitored using EDAC. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Preempt of BKL and with tickless systems
I think I have a reasonable grip on the voluntary and full preempt models, can anyone give me any wisdom on the preempt of the BKL? I know what it does, the question is where it might make a difference under normal loads. Define normal as servers and desktops. I've been running some sched tests, and it seems to make little difference how that's set. Before I run a bunch of extra tests, I thought I'd ask. New topic: I have found preempt, both voluntary and forced, seems to help more with response as the HZ gets smaller. How does that play with tickless operation, or are you-all waiting for me to run my numbers with all values of HZ and not, and tell the world what I found? ;-) -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Some CFS and sd04[68] results for kernel build
I have a results page here, I will repeat tests with tuning if asked. http://www.tmr.com/~davidsen/Kernel%20build%20time%20results.html -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Preempt of BKL and with tickless systems
Lee Revell wrote: On 5/8/07, Bill Davidsen [EMAIL PROTECTED] wrote: I think I have a reasonable grip on the voluntary and full preempt models, can anyone give me any wisdom on the preempt of the BKL? I know what it does, the question is where it might make a difference under normal loads. Define normal as servers and desktops. This was introduced by Ingo to solve a real problem that I found, where some codepath would hold the BKL for long enough to introduce excessive scheduling latencies - search list archive for details. But I don't remember the code path (scrolling the FB console? VT switching? reiser3? misc. ioctl()s?). Basically, taking the BKL disabled preemption which caused long latencies. It's certainly possible that whatever issue led to this was solved in another way since. Anything is possible. I feel that using voluntary + bkl is probably good for most servers, forced preempt for desktop, although it really doesn't seem to do much beyond voluntary. Thanks for the clarification. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] volatile considered harmful document
Jonathan Corbet wrote: +There are still a few rare situations where volatile makes sense in the +kernel: + + - The above-mentioned accessor functions might use volatile on +architectures where direct I/O memory access does work. Essentially, +each accessor call becomes a little critical section on its own and +ensures that the access happens as expected by the programmer. + + - Inline assembly code which changes memory, but which has no other +visible side effects, risks being deleted by GCC. Adding the volatile +keyword to asm statements will prevent this removal. + + - The jiffies variable is special in that it can have a different value +every time it is referenced, but it can be read without any special +locking. So jiffies can be volatile, but the addition of other +variables of this type is frowned upon. Jiffies is considered to be a +stupid legacy issue in this regard. It would seem that any variable which is (a) subject to change by other threads or hardware, and (b) the value of which is going to be used without writing the variable, would be a valid use for volatile. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
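As a compilable illustration of the accessor-function case mentioned above (the register here is faked with a plain variable; real kernel code would use readl()/writel() rather than open-coding the access):

/* Illustration of the "accessor function" case.  The register address
 * and layout are invented; in real kernel code you would use the
 * readl()/writel() accessors rather than open-coding this. */
#include <stdint.h>
#include <stdio.h>

/* Pretend this points at a memory-mapped status register. */
static volatile uint32_t fake_status_reg;

static uint32_t read_status(const volatile uint32_t *reg)
{
    /* The volatile qualifier forces a real load on every call; the
     * compiler may not cache a previous value or drop the access. */
    return *reg;
}

int main(void)
{
    /* Each call performs a fresh load, which is the whole point. */
    printf("status: %u\n", (unsigned)read_status(&fake_status_reg));
    printf("status: %u\n", (unsigned)read_status(&fake_status_reg));
    return 0;
}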
Re: [PATCH] volatile considered harmful document
Krzysztof Halasa wrote: Robert Hancock [EMAIL PROTECTED] writes: You don't need volatile in that case, rmb() can be used. rmb() invalidates all compiler assumptions, it can be much slower. Yes, why would you use rmb() when a read of a volatile generates optimal code? -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 3 years since last 2.2 release, why still on kernel.org main page?
Rob Landley wrote: Out of curiosity, since 2.2 hasn't had a release in 3 years, and the last prepatch was 2 years ago, why is its status still on the kernel.org main page? Not exactly something people are checking the status of on a daily basis... Just wondering... I assume because it's a handy place to get the current 2.2 kernel, which some people run for reasons which are valid to them. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] swsusp: Use platform mode by default
Rafael J. Wysocki wrote: On Friday, 11 May 2007 18:30, Linus Torvalds wrote: On Fri, 11 May 2007, Rafael J. Wysocki wrote: We're working on fixing the breakage, but currently it's difficult, because none of my testboxes has problems with the 'platform' hibernation and I cannot reproduce the reported issues. The rule for anything ACPI-related has been: no regressions. It doesn't matter if something fixes 10 boxes, if it breaks a single one, it's going to get reverted. [Well, I think I should stop explaining decisions that weren't mine. Yet, I feel responsible for patches that I sign-off.] Just to clarify, the change in question isn't new. It was introduced by the commit 9185cfa92507d07ac787bc73d06c4eec7239 before 2.6.20, at Seife's request and with Pavel's acceptance. We had much too much of the two steps forward, one step back dance with ACPI a few years ago, which is the reason that rule got installed (and which is why it's ACPI-only: in some other subsystems we accept the fact that sometimes we don't know how to fix some hardware issue, but the new situation is at least better than the old one). I agree that it can be aggravating to know that you can fix a problem for some people, but then being limited by the fact that it breaks for others. But being able to *rely* on something that used to work is just too important, and with ACPI, you can never make a good judgement of which way works better (since it really just depends on some random firmware issues that we have zero visibility into). Also, quite often, it may *seem* like something fixes more boxes than it breaks, but it's because people report *breakage* only, and then a few months later it turns out that it's exactly the other way around: now it's a hundred people who report breakage with the *new* code, and the reason people thought it fixed more than it broke was that the people for whom the old code worked fine obviously never reported it! So this is why a single regression is considered more important than ten fixes - because a single regression report tends to actually be just the first indication of a lot of people who simply haven't tested the new code yet! People for whom the old code is broken are more likely to test new things. So I'd just suggest changing the default back to PM_DISK_SHUTDOWN (but leave the pm_ops->enter testing in place - ie not reverting the other commits in the series). The series actually preserves the 2.6.20/21 behavior. By defaulting back to PM_DISK_SHUTDOWN, we'll cause some users for whom 2.6.20 and 2.6.21 work to report this change as a regression, so please let me avoid making this decision (I'm not the maintainer of the hibernation code after all). The problem is that we don't know about regressions until somebody reports them and if that happens after two affected kernel releases, what should we do? I think that one of the reasons people (guilty) don't report problems with suspend and hibernate is that it's been a problem on and off, and when it breaks people don't bother to chase it; they just don't use it unless it's critical, or they install suspend2. I only suggest that if 'platform' is more correct, use that and don't change it again. Then fix platform. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked.
- from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: upgrade linux kernel
[EMAIL PROTECTED] wrote: I am upgrading the kernel from 2.4.20-8 (default in RH9) to 2.6.xx. When I run the make command it gives some output and finally an error saying: BFD: Warning: Writing section '.bss' to huge ( ie negative) file offset 0xc0244000. Objcopy: arch/i386/boot/compressed/vmlinux.bin: file truncated make[2]:***[ arch/i386/boot/compressed/vmlinux.bin] Error 1 make[1]: ***[ arch/i386/boot/compressed/vmlinux] Error 2 make: *** [bzImage] Error 2 You can upgrade a bunch of system utilities, but I'm not sure it's worth doing. The system I'm on is RH8.0 patched to run later kernels when my development machine went down. I got 2.5.52 to boot, the last stable one was 2.5.47-ac6, and I gave up. Unless you have some major need to upgrade the kernel without the distribution, grab the latest RH kernel, or use the latest 2.4 kernel available. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: undeprecate raw driver.
Robert P. J. Day wrote: On Sun, 13 May 2007, Jan Engelhardt wrote: On May 13 2007 12:32, Dave Jones wrote: Despite repeated attempts over the last two and half years, this driver seems somewhat persistant. Remove its deprecated status as it has existing users who may not be in a position to migrate their apps At least keep the it's obsolete Kconfig description. We don't want new users/projects to jump on /dev/raw. i just *know* this is a mistake, but i'm going to take one more shot at distinguishing between deprecated and obsolete. as i understand it, the raw driver is *deprecated*. that is, it's still there, it's still supported, people are still using it but its use is *seriously* discouraged and everyone should be trying to move off of it at their earliest possible convenience. that is *not* the same as obsolete which should mean that that feature is dead, dead, DEAD and *no one* should be using it anymore. yes, i realize it sounds like splitting hairs, but it's this malleable definition of deprecated that's causing all of this trouble in the first place -- the fact that the raw driver is currently listed as obsolete when it is, in fact, only deprecated. in short, do *not* remove its deprecated status. rather, remove its obsolete status and *make* it deprecated. Correct. Like the weird lady next door who fancies you, it's old, it's ugly, but it's not likely to go away any time soon. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] volatile considered harmful document
Jeff Garzik wrote: On Sun, May 13, 2007 at 07:26:13PM -0400, Bill Davidsen wrote: Krzysztof Halasa wrote: Robert Hancock [EMAIL PROTECTED] writes: You don't need volatile in that case, rmb() can be used. rmb() invalidates all compiler assumptions; it can be much slower. It does not invalidate /all/ assumptions. Yes, why would you use rmb() when a read of a volatile generates optimal code? Read of a volatile is guaranteed to generate the least optimal code. That's what volatile does, guarantee no optimization of that particular access. By optimal you seem to mean generating fewer CPU cycles by risking use of an obsolete value, while by the same term I mean reading the correct and current value from the memory location without the overhead of locks. If your logic doesn't require the correct value, why read it at all? And if it does, how can anything use fewer cycles or have less cache impact than a single register load from memory? Locks are useful when the value will be changed by a thread, or when the value must briefly not be changed. That's not always the case. -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Software raid0 will crash the file-system, when each disk is 5TB
Jeff Zheng wrote: Here is the information of the created raid0. Hope it is enough. If I read this correctly, the problem is with JFS rather than RAID? Have you tried not mounting the JFS filesystem but just starting the array which crashes, so you can read bits of it, etc, and verify that the array itself is working? And can you run an fsck on the filesystem, if that makes sense? I assume you got to actually write a f/s at one time, and I've never used JFS under Linux. I spent five+ years using it on AIX, though, complex but robust. The crashing one: md: bindsdd md: bindsde md: raid0 personality registered for level 0 md0: setting max_sectors to 4096, segment boundary to 1048575 raid0: looking at sde raid0: comparing sde(5859284992) with sde(5859284992) raid0: END raid0: == UNIQUE raid0: 1 zones raid0: looking at sdd raid0: comparing sdd(5859284992) with sde(5859284992) raid0: EQUAL raid0: FINAL 1 zones raid0: done. raid0 : md_size is 11718569984 blocks. raid0 : conf-hash_spacing is 11718569984 blocks. raid0 : nb_zone is 2. raid0 : Allocating 8 bytes for hash. JFS: nTxBlock = 8192, nTxLock = 65536 The working one: md: bindsde md: bindsdf md: bindsdg md: bindsdd md0: setting max_sectors to 4096, segment boundary to 1048575 raid0: looking at sdd raid0: comparing sdd(2929641472) with sdd(2929641472) raid0: END raid0: == UNIQUE raid0: 1 zones raid0: looking at sdg raid0: comparing sdg(2929641472) with sdd(2929641472) raid0: EQUAL raid0: looking at sdf raid0: comparing sdf(2929641472) with sdd(2929641472) raid0: EQUAL raid0: looking at sde raid0: comparing sde(2929641472) with sdd(2929641472) raid0: EQUAL raid0: FINAL 1 zones raid0: done. raid0 : md_size is 11718565888 blocks. raid0 : conf-hash_spacing is 11718565888 blocks. raid0 : nb_zone is 2. raid0 : Allocating 8 bytes for hash. JFS: nTxBlock = 8192, nTxLock = 65536 -Original Message- From: Neil Brown [mailto:[EMAIL PROTECTED] Sent: Wednesday, 16 May 2007 12:04 p.m. To: Michal Piotrowski Cc: Jeff Zheng; Ingo Molnar; [EMAIL PROTECTED]; linux-kernel@vger.kernel.org; [EMAIL PROTECTED] Subject: Re: Software raid0 will crash the file-system, when each disk is 5TB On Wednesday May 16, [EMAIL PROTECTED] wrote: Anybody have a clue? No... When a raid0 array is assemble, quite a lot of message get printed about number of zones and hash_spacing etc. Can you collect and post those. Both for the failing case (2*5.5T) and the working case (4*2.55T) is possible. -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Scheduler responsiveness under load
I am just running a new series of response tests, which I expect to send to the list today or tomorrow. It includes operating at some high (LA 20) loads, and gathering reproducible statistics. In the process I used the file completion feature while load was high, and noted that with sd0.48 typing the first characters and hitting tab was VERY slow compared to cfs12 or even the recent fc6 release. The directory had about a dozen files; there was only one match, and I was just saving typing a long filename. This was tested over three boots of sd0.48, cfs12, and fc6, as well as one boot of cfs9, and the problem was only with the most recent sd kernel I have built. Hardware: Intel Core2duo 6600, 2.4GHz, 2GB mem, 600GB RAID5, kernel 2.6.21 built with make -j20 to generate the load. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Scheduling tests on IPC methods, fc6, sd0.48, cfs12
I have posted the results of my initial testing, measuring IPC rates using various schedulers under no load, limited nice load, and heavy load at nice 0. http://www.tmr.com/~davidsen/ctxbench_testing.html -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] volatile considered harmful, take 3
Satyam Sharma wrote: *Unfortunately* (the trouble with C itself, is that a *committee* has made it into ... something ... that it should not have made it into) -- anyway, unfortunately C took it upon itself to solve a problem that it did not have (and does not even bring about) in the first place: and the half-hearted (or vague, call it what you will) attempt _then_ ends up being a problem -- by making people _feel_ as if they are doing things right, when that is probably not the case. [ And we've not even touched the issue of whether the _same_ compiler's implementation of volatile across archs/platforms is consistent. ] Pardon, I was GE's representative to the original X3J11 committee, and 'volatile' was added to codify existing practice, which is one of the goals of a standard. The extension existed, in at least two forms, to allow handling of memory mapped hardware. So the committee did not take it upon itself; it was part of the defined duty of the committee. The intent was simple, clear, and limited: to tell the compiler that every read of a variable in source code should result in a read, at that point in the logic, and similar for writes. In other words, the code should not be moved and should generate a real memory access every time. People have tried to do many things with that limited concept since, some with clarification and some with assuming the compiler knows when to ignore volatile. As someone noted about a committee, a committee is a poor way to get innovation, and a good way to have a bunch of knowledgeable people shoot down bad ideas. It was a fun experience, where I first learned the modern equivalent of Occam's Razor, Plauger's Law of least astonishment, which compiler writers regularly violate :-( -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [1/2] 2.6.22-rc1: known regressions
Jean Delvare wrote: Hi Michal, On Sun, 13 May 2007 20:14:45 +0200, Michal Piotrowski wrote: I2C Subject: Sensors Applet give an error message No chip detected References : http://lkml.org/lkml/2007/5/13/109 Submitter : Antonino Ingargiola [EMAIL PROTECTED] Status : Unknown There is currently zero proof that this has anything to do with I2C. I believe in another thread this has been traced to a change in the interface and can be solved with an upgrade for the applet. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
glitch1 v1.6 script update and results on cfs-v13
The glitch1 script has been vastly updated, and now runs by itself after being started. It produces files with the fps from glxgears and a fairloops file which indicates the number of loops for each of the scrolling xterms. This gives a good indication of fairness; all processes should have about the same number of loops. Testing 2.6.21.1-cfs-v13: using all default settings, all four processes ran the same number of loops over 40 sec to within about 8%. I'll have some neat results with standard deviation by the end of the weekend; it's supposed to rain. Visual inspection of the glxgears while running looked smooth as a baby's ass. The current self-running script is attached; I'm writing a doc, and hopefully the comments are clear enough if you want to tune it. *Note*: these values only make sense when various schedulers and tuning values are run on the same machine. So I'll be testing on three machines: dual-core, hyperthreaded uni, and pure uni. Unless I see a hint that one of these cases is handled less well than the others I won't compare across them. -- Bill Davidsen He was a full-time professional cat, not some moonlighting ferret or weasel. He knew about these things. Attachment: glitch1.sh (Bourne shell script)
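For the curious, the fairness figure is nothing fancier than comparing those per-process loop counts. Here is a quick sketch of that kind of comparison in C; it is an illustration only, not the glitch1 analysis code, and the input format (one count per line on stdin) is just an assumption for the example.

/* Read one loop count per line on stdin (one per scrolling xterm) and
 * report the spread between the busiest and least-busy process.
 * Illustration of the fairness measure, not the glitch1 analysis. */
#include <stdio.h>

int main(void)
{
	double v, lo = 0, hi = 0, sum = 0;
	int n = 0;

	while (scanf("%lf", &v) == 1) {
		if (n == 0 || v < lo)
			lo = v;
		if (n == 0 || v > hi)
			hi = v;
		sum += v;
		n++;
	}
	if (n == 0 || lo <= 0) {
		fprintf(stderr, "no usable counts\n");
		return 1;
	}
	printf("%d processes, mean %.0f loops, spread %.1f%% of the minimum\n",
	       n, sum / n, 100.0 * (hi - lo) / lo);
	return 0;
}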
Re: [Bugme-new] [Bug 8479] New: gettimeofday returning 1000000 in tv_usec on core2duo
Andrew Morton wrote: On Tue, 15 May 2007 08:06:52 +0200 Eric Dumazet [EMAIL PROTECTED] wrote: Andrew Morton wrote: On Mon, 14 May 2007 21:17:47 -0700 [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=8479
Summary: gettimeofday returning 1000000 in tv_usec on core2duo
Kernel Version: 2.6.21
Status: NEW
Severity: normal
Owner: [EMAIL PROTECTED]
Submitter: [EMAIL PROTECTED]
Most recent kernel where this bug did *NOT* occur: 2.6.20
Distribution: Gentoo
Hardware Environment: core2duo T7200 (all reporters had this same CPU)
Software Environment: Linux 2.6.21, glibc 2.5
Problem Description: gettimeofday returns 1 - 1000000 in tv_usec, not 0 - 999999. This does not happen on any of my AMD-based 32 or 64 bit boxes, only on my core2duo; I have 2 other reports of this problem, all on T7200's.
Steps to reproduce: call gettimeofday a lot. Eventually, you'll get 1000000 returned in tv_usec. My average is ~1 in 100 calls. I've attached my test program, with output from various boxes. One of the other reporters tried the test program too, and got similar output. .config will be attached too.
err, whoops. I remember I already hit this and corrected it: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blobdiff;f=arch/x86_64/kernel/vsyscall.c;h=dc32cef961950915fbaa185e36ab802d5f7cea3b;hp=ba330f87067996a17495f7d03466d646c718b52c;hb=c8118c6c07f2edfd697aaa0b93e08c3b65a5a675;hpb=272a3713bb9e302e0455c894c41180a482d2c8a3 Oh, OK. Maybe a stable push is necessary? yup. Please always think of -stable when preparing fixes. I'm sure many useful fixes are slipping past simply because those who _are_ looking out for backportable fixes are missing things. That makes me feel better; I have been occasionally suggesting fixes posted here as candidates for stable, and I was afraid I was being a PITA. I forgot about the stable address and have been bugging Greg directly; I'll stop that. Greg, Chris: please consider c8118c6c07f2edfd697aaa0b93e08c3b65a5a675 for -stable, if it isn't already there. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
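For anyone who wants to check their own box without digging the attachment out of bugzilla, the test amounts to hammering gettimeofday() and flagging any tv_usec outside 0..999999. A minimal sketch follows; it is an illustration, not the submitter's actual test program, and the call count is arbitrary.

/* Spin on gettimeofday() and report any tv_usec outside the legal
 * 0..999999 range.  Illustration only, not the program attached to
 * bugzilla #8479. */
#include <stdio.h>
#include <sys/time.h>

int main(void)
{
	struct timeval tv;
	unsigned long i, bad = 0;
	const unsigned long calls = 100000000UL;

	for (i = 0; i < calls; i++) {
		gettimeofday(&tv, NULL);
		if (tv.tv_usec < 0 || tv.tv_usec > 999999) {
			bad++;
			printf("bad tv_usec %ld at call %lu\n",
			       (long)tv.tv_usec, i);
		}
	}
	printf("%lu out-of-range results in %lu calls\n", bad, calls);
	return 0;
}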
Re: [PATCH] LogFS take three
Kevin Bowling wrote: On 5/16/07, David Woodhouse [EMAIL PROTECTED] wrote: On Wed, 2007-05-16 at 15:53 +0200, Jörn Engel wrote: My experience is that no matter which name I pick, people will complain anyway. Previous suggestions included: jffs3 jefs engelfs poofs crapfs sweetfs cutefs dynamic journaling fs - djofs tfsfkal - the file system formerly known as logfs Can we call it jörnfs? :) However if Jörn is accused of murder, it will have little chance of being merged :-). WRT that, it seems that Nina had a lover who is a confessed serial killer. I'm surprised the case hasn't been adapted for 'Boston Legal' and 'Law and Order' like other high-profile crimes. I see nothing wrong with jörnfs, and there's room for numbers at the end... -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: [PATCH] LogFS take three
Dongjun Shin wrote: There are so many flash-based storage devices, and some disposable storage, as you pointed out, has poor quality. I think it's mainly because these are not designed for good quality, but for lowering the price. The reliability seems to be appropriate to the common use. I'm dubious that computer storage was a big design factor until the last few years. A good argument for buying large sizes: they are more likely to be a recent design. These kinds of devices are not ready for things like power failure because their use case is far from that. For example, removing the flash card while taking pictures with a digital camera is not a common use case. (there should be a written notice that this kind of action is against the warranty) They do well in such use, if you equate battery death to pulling the card (it may not be equivalent). I have tested that and not had a failure of anything but the last item. Clearly not recommended, but sometimes unplanned needs arise. - In contrast to the embedded environment where CPU and flash are directly connected, the I/O path between CPU and flash in the PC environment is longer. The latency for SW handshaking between CPU and flash will also be longer, which would make performance optimization harder. As I mentioned, some techniques like a log-structured filesystem could perform generally better on any kind of flash-based storage with FTL. Although there are many kinds of FTL, it is commonly true that it performs well under a workload where sequential write is dominant. I also expect that FTL for the PC environment will have a better quality spec than the disposable storage. The recent technology announcements from Intel are encouraging in that respect. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Sched - graphic smoothness under load - cfs-v13 sd-0.48
I generated a table of results from the latest glitch1 script, using an HTML postprocessor I'm not *quite* ready to foist on the world. In any case it has some numbers for frames per second, the fairness of the processor time allocated to the compute-bound processes (which generate a lot of other screen activity for X), and my subjective comments on how smooth it looked and felt. The chart is at http://www.tmr.com/~davidsen/sched_smooth_01.html for your viewing pleasure. The only tuned result was with sd, since what I observed was so bad using the default settings. If any scheduler developers would like me to try other tunings or new versions, let me know. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: Sched - graphic smoothness under load - cfs-v13 sd-0.48
Ray Lee wrote: On 5/19/07, Bill Davidsen [EMAIL PROTECTED] wrote: I generated a table of results from the latest glitch1 script, using an HTML postprocessor I'm not *quite* ready to foist on the world. In any case it has some numbers for frames per second, the fairness of the processor time allocated to the compute-bound processes which generate a lot of other screen activity for X, and my subjective comments on how smooth it looked and felt. The chart is at http://www.tmr.com/~davidsen/sched_smooth_01.html for your viewing pleasure. Are the S.D. columns (immediately after the average) standard deviation? If so, you may want to rename those 'stdev', as it's a little confusing to have S.D. stand for that and Staircase Deadline. Further, which standard deviation is it? (The standard deviation of the values (stdev), or the standard deviation of the mean (sdom)?) What's intended is the stddev from the average, and Perl bit me on that one: if you spell a variable wrong the same way more than once it doesn't flag it as a possible spelling error. Note on the math: even when coded as intended, the sum of the squared errors is divided by N-1, not N. I found it both ways in online docs, but I learned it decades ago as N-1, so I used that. Finally, if it is the standard deviation (of either), then I don't really believe those numbers for the glxgears case. The deviation is huge for all but one of those results. I had the same feeling, but because of the code error above what actually failed was zeroing the sum of the errors, so values after the first kept getting larger; when I debugged it against a calculation by hand the first one matched, so I thought I had it right. Regardless, it's good that you're doing measurements, and keep it up :-). Okay, here's a bonus: http://www.tmr.com/~davidsen/sched_smooth_02.html not only has the right values, the labels are changed, and I included more data points from the recent fc6 kernel and the 2.6.21.1 kernel with the mainline scheduler. The nice thing about this test and the IPC test I posted recently is that they are reasonably stable on the same hardware, so even if someone argues about what they show, they show the same thing each time and can therefore be used to compare changes. As I told a manager at the old Prodigy after coding up some log analysis with pretty graphs, getting the data was the easy part; the hard part is figuring out what it means. If this data is useful in suggesting changes, then it has value. Otherwise it was a fun way to spend some time. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
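For the record, the statistic intended for those columns is the sample standard deviation, dividing the sum of squared deviations by N-1. A minimal C sketch of that computation follows, with made-up fps values standing in for one row of glxgears samples; the real analysis is a Perl postprocessor, so this is only an illustration of the formula.

/* Sample standard deviation about the mean, dividing by N-1.
 * Two-pass version to keep the intent obvious; build with -lm. */
#include <math.h>
#include <stdio.h>

double sample_stddev(const double *x, int n)
{
	/* The bug described above was failing to re-zero an accumulator
	 * like ss between data sets; here it is local, so each call
	 * starts from zero. */
	double mean = 0, ss = 0;
	int i;

	if (n < 2)
		return 0;
	for (i = 0; i < n; i++)
		mean += x[i];
	mean /= n;
	for (i = 0; i < n; i++)
		ss += (x[i] - mean) * (x[i] - mean);
	return sqrt(ss / (n - 1));
}

int main(void)
{
	/* Made-up sample values, purely illustrative. */
	double fps[] = { 251, 248, 256, 244, 252 };
	int n = (int)(sizeof(fps) / sizeof(fps[0]));

	printf("stdev of %d samples = %.2f\n", n, sample_stddev(fps, n));
	return 0;
}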
Scheduler smoothness and fairness - results and package
Thanks to the suggestions of several people and some encouragement, I've done another upgrade of the scheduler characterization package glitch1. For bottom-line folks, the results are at http://www.tmr.com/~davidsen/sched_smooth_03.html This package runs four fast-scrolling xterms and a copy of glxgears to produce both screen update and CPU load. In addition to the human observation of smoothness, the glxgears speeds are characterized by variance from sample to sample, and the number of random numbers generated by the xterm programs is also characterized. Changes:
- Reruns for a given configuration are shown in a single row.
- The analysis has had minor output format and statistical tweaks for correctness, as well as handling of multiple-run output in a single row.
- The glxgears first value is shown as a separate item, since there is a large stoppage on the first sample after cold boot with some schedulers.
The full source and doc are now available from http://www.tmr.com/~public/source/ so people can do their own runs. Note that values between different machines are almost certainly not meaningful. Having run all this on a dual-core Core2duo in x86 (32 bit) mode, I'm now off to rerun in x86_64 mode, on a single-CPU hyperthreaded machine, and on a pure uniprocessor. I'm going to create a page for all the results in one place for anyone who cares. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
[REPORT] 2.6.21 vs. 2.6.21-sd046 vs. 2.6.21-CFSv7
System: Intel 6600 Core2duo, 2GB RAM, X nice 0 for all tests, display using i945G framebuffer
Test: playing a 'toon with mplayer while a kernel build -j20 was running.
Tuning: not yet, all scheduler parameters were default
Result: base 2.6.21 showed some pauses, and after the pause the sound got louder for a short time (500ms). With sd-0.46 the playback had many glitches and finally just stopped, with the display looping on a small number of frames and no sound. The skips were repeatable; the hang occurred in only two of five runs. I didn't let them go until the make finished (todo list) but killed mplayer after 10-15 sec. No glitches were observed with cfsv7; I thought I saw one, but repeating with granularity set to 50 and then with no make running convinced me that it's just a crappy piece of animation at that point. I ran glxgears; again sd-0.46 had frequent pauses and uneven reported fps. Stock 2.6.21 had a visible pause when the frame rate was output, otherwise minimal pauses. CFSv7 appeared smooth at about 250 fps. All tests gave acceptable typing echo; it seems that X is getting enough time at that load to echo without major issues. I will be doing tests with server load later this week; I have to add a disk for the database. Hope this initial report is useful; I may be able to update ctxbench later today and try that. -- Bill Davidsen [EMAIL PROTECTED] We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: [REPORT] 2.6.21 vs. 2.6.21-sd046 vs. 2.6.21-CFSv7
Bill Davidsen wrote: System: Intel 6600 Core2duo, 2GB RAM, X nice 0 for all tests, display using i945G framebuffer Test: playing a 'toon with mplayer while a kernel build -j20 was running. Tuning: not yet, all scheduler parameters were default Result: base 2.6.21 showed some pauses, and after the pause the sound got louder for a short time (500ms). With sd-0.46 the playback had many glitches and finally just stopped, with the display looping on a small number of frames and no sound. The skips were repeatable; the hang occurred in only two of five runs. I didn't let them go until the make finished (todo list) but killed mplayer after 10-15 sec. No glitches were observed with cfsv7; I thought I saw one, but repeating with granularity set to 50 and then with no make running convinced me that it's just a crappy piece of animation at that point. I ran glxgears; again sd-0.46 had frequent pauses and uneven reported fps. Stock 2.6.21 had a visible pause when the frame rate was output, otherwise minimal pauses. CFSv7 appeared smooth at about 250 fps. All tests gave acceptable typing echo; it seems that X is getting enough time at that load to echo without major issues. I will be doing tests with server load later this week; I have to add a disk for the database. Hope this initial report is useful; I may be able to update ctxbench later today and try that. Followup: I reran with sd-0.46, setting rr_interval to 40, and then 5 (default was 16). Neither appeared to give useful video playback. I did try setting the make to nice 10, and that made the playback perfectly smooth, with response to skip-forward and volume changes happening when the key was pressed instead of eventually. I also tried raising the priority of X by renicing it to -10; that made things better on the display, but I wonder if it will let X run ahead of the nice-0 raid threads. Is this my hardware, or is there really odd behavior here? The sd scheduler seems to be too fair to cope well with this realistic load, and expecting users to nice things is probably morally correct but unrealistic.
Re: [REPORT] 2.6.21 vs. 2.6.21-sd046 vs. 2.6.21-CFSv7
Bill Huey (hui) wrote: On Mon, Apr 30, 2007 at 03:58:45PM -0400, Bill Davidsen wrote: Followup: I reran with sd-0.46, setting rr_interval to 40, and then 5 (default was 16). Neither appeared to give useful video playback. I did try setting the make to nice 10, and that made the playback perfectly smooth, with response to skip-forward and volume changes happening when the key was pressed instead of eventually. I also tried raising the priority of X by renicing it to -10; that made things better on the display, but I wonder if it will let X run ahead of the nice-0 raid threads. Is this my hardware, or is there really odd behavior here? The sd scheduler seems to be too fair to cope well with this realistic load, and expecting users to nice things is probably morally correct but unrealistic. People have been reporting very good performance with regard to OpenGL applications under SD. What is your video driver? NVidia proprietary? My original post in this thread gave my config: built-in graphics using the 945G framebuffer. This is a server; I'm not a gamer. The only fancy graphics I have are on a system with no onboard video at all, where I picked up a moderately high-end Radeon card to drop in. And to give you an idea of what a gamer I am, that uses the vesafb driver ;-) OpenGL, X and direct framebuffer access (mplayer and friends) tend not to interact with each other, which can result in very different scheduling characteristics between them. -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979
Re: [REPORT] 2.6.21 vs. 2.6.21-sd046 vs. 2.6.21-CFSv7
Con Kolivas wrote: On Tuesday 01 May 2007 05:29, Bill Davidsen wrote: System: Intel 6600 Core2duo, 2GB RAM, X nice 0 for all tests, display using i945G framebuffer Bill thanks for testing. Test: playing a 'toon with mplayer while a kernel build -j20 was running. Umm I don't think make -j20 is a realistic load on 2 cores. Not only does it raise your load to 20 but your I/O bandwidth will even be struggling. If video playback were to be smooth at that size of load it would suggest some serious unfairness. I'm not just pushing the fairness barrow here; I mean it would need to be really, really unfair unless your combined X and video playback CPU use added up to less than 1/20th of your total CPU power (which is possible but I kinda doubt it). Do you really use make -j20 to build regularly? Yes, this is a compile and file server; I frequently build a raft of kernels when a security patch comes out. There doesn't seem to be an I/O issue; with 2GB RAM and RAID5 over a SATA array I have enough, and honestly the disk activity is minimal, even with a single drive. Tuning: not yet, all scheduler parameters were default Result: base 2.6.21 showed some pauses, and after the pause the sound got louder for a short time (500ms). With sd-0.46 the playback had many glitches and finally just stopped, with the display looping on a small number of frames and no sound. The skips were repeatable; the hang occurred in only two of five runs. I didn't let them go until the make finished (todo list) but killed mplayer after 10-15 sec. No glitches were observed with cfsv7; I thought I saw one, but repeating with granularity set to 50 and then with no make running convinced me that it's just a crappy piece of animation at that point. I did notice in your followup email that nice +10 of the 20 makes fixed the playback, which sounds pretty good. Yes, I can get around the load by doing that. I ran glxgears; again sd-0.46 had frequent pauses and uneven reported fps. Stock 2.6.21 had a visible pause when the frame rate was output, otherwise minimal pauses. CFSv7 appeared smooth at about 250 fps. I assume you mean glxgears while you're running make -j20 again here. Of course. ;-) -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979