Re: [00/17] Large Blocksize Support V3
On Thu, 26 Apr 2007, Andrew Morton wrote:

> > Or make sure that truncate
> > doesn't race on a partial *block* truncate?
>
> lock four pages

You would only lock a single higher-order block. Truncate works at that
level. If you have 4 separate pages then you need to take separate locks,
and you may not have contiguous memory, which makes the filesystem jump
through all sorts of hoops.

> I'm not saying it's especially simple, nor fast. But it has the advantage
> that we're not forced to use larger pages with _its_ attendant performance
> problems.

The patch is not about forcing the use of large pages but about the option
to use larger pages. It's a new flexibility.

> And it doesn't introduce a rather nasty hack of pretending (in some places)
> that pages are larger than they really are.

They really are larger. One page struct controls it all.

> And it has the very significant advantage that it doesn't introduce brand
> new concepts and some complexity into core MM.

The patchset would reduce complexity and make it easy to handle the page
cache. It gets rid of the hacks we have right now to support larger ones.
It's straightforward, no new locking, very much a cleanup patch.

> And make no mistake: the latter disadvantage is huge. Because if we do the
> PAGE_CACHE_SIZE hack (sorry, but it _is_), we have to do it *for ever*.
> Maintaining and enhancing core MM and VFS becomes harder and more costly
> and slower and more buggy *for ever*. The ramp for people to become
> competent on core MM becomes longer. Our developer pool becomes smaller, and
> proportionally less skilled.

No, it becomes easier. Look at the patchset. It cleans up a huge mess.

What is hacky about it? It consistently uses larger pages for the page
cache and it integrates nicely into the VM.

> And hardware gets better. If Intel & AMD come out with a 16k pagesize
> option in a couple of years we'll look pretty dumb. If the problems which
> you're presently having with that controller get sorted out in the next
> generation of the hardware, we'll also look pretty dumb.

We are currently looking dumb and unable to deal with the hardware. Yes, we
can pressure the hardware vendors to produce hardware conforming to our
specifications, but I always thought that was how another company operates.

> As always, there are tradeoffs. We can see the cons, and they are very
> significant. We don't yet know the pros. Perhaps they will be similarly
> significant. But I don't believe that the larger PAGE_CACHE_SIZE hack
> (sorry) is the only way in which they can be realised.

It is the most consistent solution, one that avoids the proliferation of
further hacks to address the large blocksize.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH -mm] workqueue: debug possible endless loop in cancel_rearming_delayed_work
On Thu, Apr 26, 2007 at 08:44:36PM +0400, Oleg Nesterov wrote:
> On 04/26, Jarek Poplawski wrote:
> >
> > On Wed, Apr 25, 2007 at 04:47:14PM +0400, Oleg Nesterov wrote:
> > ...
> > > > > > +		spin_lock_irq(&cwq->lock);
> > > > > > +		/* CPU_DEAD in progress may change cwq */
> > > > > > +		if (likely(cwq == get_wq_data(work))) {
> > > > > > +			list_del_init(&work->entry);
> > > > > > +			__set_bit(WORK_STRUCT_PENDING, work_data_bits(work));
> > > > > > +			retry = try_to_del_timer_sync(&dwork->timer) < 0;
> > > > > > +		}
> > > > > > +		spin_unlock_irq(&cwq->lock);
> > > > > > +	} while (unlikely(retry));
> > >
> > > > 1. If delayed_work_timer_fn of this work is fired and is waiting
> > > > on the above spin_lock then, after above spin_unlock, the work
> > > > will be queued.
> > >
> > > No, in that case try_to_del_timer_sync() returns -1.
> >
> > Yes. But I think it's safe only after moving work_clear_pending
> > in run_workqueue under a lock; probably otherwise there is a
> > possibility this flag could be cleared, after above unlock.
>
> It doesn't matter in this particular case because we are going to retry
> anyway. But yes, this patch moves work_clear_pending() under lock, because
> otherwise it could be cleared by run_workqueue() if this work is about
> to be executed, but was already deleted from list.

...and that seems to be the same as what I meant... I only wanted us to
agree (by now only for historical reasons) that the lock on _PENDING could
matter in run_workqueue.

Jarek P.
Re: 2.6.21 reiserfs -- circular locking?
On Thu, 26 Apr 2007 16:40:14 -0700 David Brownell <[EMAIL PROTECTED]> wrote:

> This might be a Heisenberg, but I figure it's worth posting
> in case anyone else sees similar oddness. Never seen it
> before or since. It's as if a gremlin got annoyed with me
> for switching a filesystem from reiser to ext3. :)
>
> - Dave
>
>
> ===
> [ INFO: possible circular locking dependency detected ]
> 2.6.21-git #6
> ---
> vi/4556 is trying to acquire lock:
>  (&REISERFS_SB(s)->xattr_dir_sem){..--}, at: [] reiserfs_chown_xattrs+0x5b/0x128
>
> but task is already holding lock:
>  (&inode->i_mutex){--..}, at: [] chown_common+0x93/0xb3
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (&inode->i_mutex){--..}:
>        [] __lock_acquire+0x9f7/0xbaa
>        [] get_xa_root+0x49/0x107
>        [] lock_acquire+0x7b/0x9f
>        [] get_xa_root+0x49/0x107
>        [] save_trace+0x40/0x9e
>        [] __mutex_lock_slowpath+0xd8/0x281
>        [] _spin_unlock_irq+0x24/0x4a
>        [] get_xa_root+0x49/0x107
>        [] open_xa_dir+0x1c/0xf8
>        [] __down_read+0x34/0x9d
>        [] reiserfs_delete_xattrs+0x64/0x185
>        [] _atomic_dec_and_lock+0x14/0x34
>        [] reiserfs_delete_inode+0x38/0xae
>        [] generic_delete_inode+0x64/0xf5
>        [] reiserfs_delete_inode+0x0/0xae
>        [] generic_delete_inode+0x7a/0xf5
>        [] do_unlinkat+0xd9/0x14f
>        [] trace_hardirqs_on+0x123/0x14d
>        [] trace_hardirqs_on_thunk+0x35/0x37
>        [] system_call+0x7e/0x83
>        [] 0x
>
> -> #0 (&REISERFS_SB(s)->xattr_dir_sem){..--}:
>        [] print_circular_bug_header+0xcc/0xd3
>        [] __lock_acquire+0x8f3/0xbaa
>        [] reiserfs_chown_xattrs+0x5b/0x128
>        [] lock_acquire+0x7b/0x9f
>        [] reiserfs_chown_xattrs+0x5b/0x128
>        [] down_read+0x32/0x3b
>        [] reiserfs_chown_xattrs+0x5b/0x128
>        [] __capable+0x9/0x1d
>        [] reiserfs_setattr+0x11e/0x1ec
>        [] current_fs_time+0x35/0x3a
>        [] notify_change+0x122/0x231
>        [] chown_common+0x9e/0xb3
>        [] fget+0x88/0xa7
>        [] sys_fchown+0x30/0x47
>        [] system_call+0x7e/0x83
>        [] 0x
>
> other info that might help us debug this:
>
> 1 lock held by vi/4556:
>  #0:  (&inode->i_mutex){--..}, at: [] chown_common+0x93/0xb3
>
> stack backtrace:
>
> Call Trace:
>  [] print_circular_bug_tail+0x69/0x72
>  [] print_circular_bug_header+0xcc/0xd3
>  [] __lock_acquire+0x8f3/0xbaa
>  [] reiserfs_chown_xattrs+0x5b/0x128
>  [] lock_acquire+0x7b/0x9f
>  [] reiserfs_chown_xattrs+0x5b/0x128
>  [] down_read+0x32/0x3b
>  [] reiserfs_chown_xattrs+0x5b/0x128
>  [] __capable+0x9/0x1d
>  [] reiserfs_setattr+0x11e/0x1ec
>  [] current_fs_time+0x35/0x3a
>  [] notify_change+0x122/0x231
>  [] chown_common+0x9e/0xb3
>  [] fget+0x88/0xa7
>  [] sys_fchown+0x30/0x47
>  [] system_call+0x7e/0x83
>

cc added. This was also reported against -rc7-mm1 (or 2)
[PATCH] change global zonelist order v4 [0/2]
Hi, this is version 4, including Lee Schermerhorn's good rework
and automatic configuration at boot time.

(This patch is reworked from V2, so skip the V3 changelog.)

ChangeLog V2 -> V4
- automatic configuration is added.
- automatic configuration is now the default.
- relaxed_zone_order is renamed to numa_zonelist_order;
  you can specify the value "default", "zone", or "numa".
- clean-up from Lee Schermerhorn.
- the patch is separated into "base" and "autoconfiguration algorithm".

Changelog from V1 -> V2
- sysctl name is changed to relaxed_zone_order.
- NORMAL->NORMAL->...->DMA->DMA->DMA order (new ordering) is now the default.
  NORMAL->DMA->NORMAL->DMA order (old ordering) is optional.
- added a boot option to set relaxed_zone_order. ia64 is supported now.
- added documentation.

Please don't hesitate to rework this if you have a good plan.
I'll be offline next week because my office will be closed.

Lee-san, please Ack or Sign-off if the patches seem O.K.

I think my autoconfiguration logic is reasonable to some extent, but it may
need some discussion. It can easily be rewritten by an additional patch.

Thanks.
-Kame
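For readers comparing the two orderings named in the changelog, here is a minimal userspace sketch in C. It is not taken from the patch: the types and function names (`sk_entry`, `build_zone_order`, `build_node_order`) are hypothetical, and it assumes every node has both a NORMAL and a DMA zone, which real machines often do not. It also assumes that the node list is already sorted by distance from the allocating node.

```c
#include <stddef.h>

/* Hypothetical types for illustration only; not the kernel's zonelist. */
enum sk_zone { SK_DMA = 0, SK_NORMAL = 1, SK_NR_ZONES = 2 };

struct sk_entry {
	int node;           /* NUMA node id */
	enum sk_zone zone;  /* zone type on that node */
};

/*
 * Zone-first ordering: NORMAL(n0) -> NORMAL(n1) -> ... -> DMA(n0) -> DMA(n1).
 * All NORMAL zones are tried before any DMA zone is touched.
 */
size_t build_zone_order(const int *nodes, size_t nr_nodes, struct sk_entry *out)
{
	size_t n = 0;

	for (int z = SK_NR_ZONES - 1; z >= 0; z--) {
		for (size_t i = 0; i < nr_nodes; i++) {
			out[n].node = nodes[i];
			out[n].zone = (enum sk_zone)z;
			n++;
		}
	}
	return n;
}

/*
 * Node-first ordering: NORMAL(n0) -> DMA(n0) -> NORMAL(n1) -> DMA(n1).
 * Everything on the nearest node is tried before moving to the next node.
 */
size_t build_node_order(const int *nodes, size_t nr_nodes, struct sk_entry *out)
{
	size_t n = 0;

	for (size_t i = 0; i < nr_nodes; i++) {
		for (int z = SK_NR_ZONES - 1; z >= 0; z--) {
			out[n].node = nodes[i];
			out[n].zone = (enum sk_zone)z;
			n++;
		}
	}
	return n;
}
```

The only difference between the two builders is which loop is the outer one. The zone-first list protects the scarce DMA zone at the cost of going off-node earlier; the node-first list keeps allocations local at the cost of dipping into DMA earlier.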
Re: menuconfig issue (checklist) in 2.6.20.7 & 2.6.21-rc7 ?
On Apr 26 2007 05:06, Mike Galbraith wrote:
>On Wed, 2007-04-25 at 22:30 +0200, Sam Ravnborg wrote:
>
>> > There are general funnies in the menuconfig world (my preference) here.
>> > For instance, I recently had reason to change/test different default IO
>> > schedulers, and found that no matter what I did, I couldn't select a
>> > default IO scheduler any more, though I used to be able to do so.
>
>> Tried it now with latest -git from Linus and here it works.
>> Notice that you need to make the scheduler a built-in <*>
>> before you can select it as default.
>> A scheduler selected as a module cannot be made default.
>
>Ok, I guess my ncurses is ill. (all built in) Thanks.

CONFIG_MODULES=n?


Jan
Re: Back to the future.
On 4/27/07, Pavel Machek <[EMAIL PROTECTED]> wrote:
> Now, it would be _very_ nice to be able to snapshot system and
> continue running, but I just don't see how to do it without extensive
> filesystem support.

So what kind of support do we need from the filesystem?

Pekka
RE: Pin an IRQ to a particular CPU
> Hello,
>
> In an effort to increase over all throughput of my Linux NFS file
> server, I thought about trying to force an IRQ, for the NIC, to be
> serviced by a particular CPU. Is this possible?
>
> TIA,
> Phy

/proc/irq/*/smp_affinity

I would recommend using automatic balancing and leaving it at that. This
code automatically distributes IRQs to CPUs such that each IRQ can only be
serviced by a single CPU and the IRQs are distributed over the CPU set.

DS
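For anyone who does want to pin by hand: the value written to /proc/irq/<N>/smp_affinity is a hexadecimal CPU bitmask, with bit 0 standing for CPU 0. The following is a minimal sketch, not a tested tool. The function names are made up for illustration, it only handles CPUs 0 to 31 (larger masks are, as far as I know, written as comma-separated 32-bit groups), and writing the file needs root.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Format the hex bitmask that selects only `cpu` (0-based).
 * Returns 0 on success, -1 if the CPU is out of range or buf is too small.
 */
int affinity_mask_for_cpu(unsigned int cpu, char *buf, size_t len)
{
	if (cpu >= 32)
		return -1;

	int n = snprintf(buf, len, "%lx", 1UL << cpu);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write the single-CPU mask to /proc/irq/<irq>/smp_affinity.
 * Returns 0 on success, -1 on any failure (bad CPU, no permission, ...).
 */
int pin_irq_to_cpu(unsigned int irq, unsigned int cpu)
{
	char path[64];
	char mask[16];

	if (affinity_mask_for_cpu(cpu, mask, sizeof(mask)) != 0)
		return -1;

	snprintf(path, sizeof(path), "/proc/irq/%u/smp_affinity", irq);

	FILE *f = fopen(path, "w");
	if (!f)
		return -1;

	int ok = fputs(mask, f) >= 0;

	/* The kernel may reject the mask at write-out time, so check fclose too. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Calling `pin_irq_to_cpu(irq, 2)` would write `4` to that IRQ's smp_affinity file. If a userspace balancer is running it may rewrite the mask later, so the two approaches should not be mixed for the same IRQ.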
Re: MAINTAINERS file out of date?
Eric W. Biederman wrote:
> "H. Peter Anvin" <[EMAIL PROTECTED]> writes:
>
>> A lot of that code (although, of course, not all) could be written in C,
>> though. I'm thinking of taking a stab at rewriting it that way.
>
> Is this using the .code16gcc? Or are you thinking of some other
> technique. Requiring another C compiler to build the kernel would
> be a pain to use.

.code16gcc was what I was using.

There is a GSoC project that I'm mentoring to get 16-bit support for
gcc, that will be possible to eventually migrate to (for code size)
if/when it gets implemented and gets pushed out far enough, but that's
for the future.

	-hpa
Re: MAINTAINERS file out of date?
"H. Peter Anvin" <[EMAIL PROTECTED]> writes:

> A lot of that code (although, of course, not all) could be written in C,
> though. I'm thinking of taking a stab at rewriting it that way.

Is this using the .code16gcc? Or are you thinking of some other
technique. Requiring another C compiler to build the kernel would
be a pain to use.

Eric
Re: Pin an IRQ to a particular CPU
On Apr 26 2007 18:15, Phy Prabab wrote:

> Hello,
>
> In an effort to increase over all throughput of my Linux NFS file
> server, I thought about trying to force an IRQ, for the NIC, to be
> serviced by a particular CPU. Is this possible?

Sorry, too early in the morning :)
It is definitely possible, since that is what irqbalance does.
(Not that I know the exact details right now...)
The other URL I posted might give some useful hints too, though.


Jan
Re: Pin an IRQ to a particular CPU
> Hello,
>
> In an effort to increase over all throughput of my Linux NFS file
> server, I thought about trying to force an IRQ, for the NIC, to be
> serviced by a particular CPU. Is this possible?

http://lkml.org/lkml/2007/4/5/155


Jan
Re: ZFS with Linux: An Open Plea
On Wed, Apr 18, 2007 at 01:25:19PM -0400, Lennart Sorensen wrote:
>
> Does it matter that google's recent report on disk failures indicated
> that SMART never predicted anything useful as far as they could tell?
> Certainly none of my drive failures ever had SMART make any kind of
> indication that anything was wrong.

I saw that talk, and that's not what I got out of it. They found that
SMART error reports _did_ correlate with drive failure. See page 8
of:

http://www.usenix.org/events/fast07/tech/full_papers/pinheiro/pinheiro.pdf

(If you're not a USENIX member, you may be able to find a free
download copy elsewhere.)

However, they found that the correlation was not strong enough to make
it economically feasible to replace disks reporting SMART failures,
since something like 70% of disks were still working a year after the
first failure report. Also, they found that some disks failed without
any SMART error reports.

Now, Google keeps multiple copies (3 in GoogleFS, last I heard) of
data, so for them, "economically feasible" means something different
than for my personal laptop hard drive. I have twice had my laptop
hard drive start spitting SMART errors and then die within a week. It
is economically quite sensible for me to replace my laptop drive once
it has an error, since I don't carry around 3 laptops everywhere I go.

-VAL
Re: [00/17] Large Blocksize Support V3
On Thu, Apr 26 2007, Mel Gorman wrote:
> On (26/04/07 20:39), Jens Axboe didst pronounce:
> > On Thu, Apr 26 2007, Christoph Lameter wrote:
> > > On Thu, 26 Apr 2007, Jens Axboe wrote:
> > >
> > > > On Thu, Apr 26 2007, Christoph Lameter wrote:
> > > > > On Thu, 26 Apr 2007, Jens Axboe wrote:
> > > > >
> > > > > > The above can be implemented fairly cleanly, and on a need-to-have
> > > > > > basis. It's not something that'll break drivers.
> > > > >
> > > > > But its also not going to fix the hacks that we have in the kernel
> > > > > to deal with > PAGE_SIZE i/o.
> > > >
> > > > No, but that's a _separate_ issue! Don't keep mixing up the two.
> > >
> > > Yes I understand that you want it to be a separate issue so we can get
> > > more rationales for the hacks that we do to avoid the large
> > > order allocations.
> >
> > Christoph, don't take your frustrations out on me. I've several times in
> > this thread said that I'd LIKE to have > PAGE_SIZE support in the page
> > cache. I WROTE the initial pktcdvd driver that is a primary example of
> > these hacks, I'm very well aware of the pain and bugs involved with
> > that.
> >
> > But don't push large pages as the only solution to larger ios, because
> > that is trivially not true.
> >
>
> Would it be fair to say that your approach and using large pages are not
> mutually exclusive solutions? It seems a lot of the debate here is
> assuming there is One And Only One Solution for larger ios.

Definitely, there's zero reason they cannot coexist.

--
Jens Axboe
Re: Fw: [PATCH -mm] workqueue: debug possible endless loop in cancel_rearming_delayed_work
On Thu, Apr 26, 2007 at 08:34:06PM +0400, Oleg Nesterov wrote:
> On 04/26, Jarek Poplawski wrote:
> >
> > > void cancel_rearming_delayed_work(struct delayed_work *dwork)
> > > {
> > > 	struct work_struct *work = &dwork->work;
> > > 	struct cpu_workqueue_struct *cwq = get_wq_data(work);
> > > 	int done;
> >
> > I don't understand, why you think cwq cannot be NULL here.
>
> sure it can, this is just a template.
>
> > >
> > > 	do {
> > > 		done = 1;
> > > 		spin_lock_irq(&cwq->lock);
> > >
> > > 		if (!list_empty(&work->entry))
> > > 			list_del_init(&work->entry);
> >
> > BTW, isn't needs_a_good_name needless after this and after del_timer
> > positive?
>
> no, we still need it. work->func() may be running on another CPU as well.
>
> > > 		else if (test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work)))
> > > 			done = del_timer(&dwork->timer)
> >
> > If this runs while a work function is fired in run_workqueue,
> > it sets _PENDING bit, but if the work skips rearming, we have probably
> > endless loop, again.
>
> No, if the work skips rearming (or didn't yet), we set WORK_STRUCT_PENDING
> successfully.

Sorry! Should be:
"If this runs while a work function is fired in run_workqueue,
it sets _PENDING bit, but if the work skips rearming, I have probably
endless loop, again."

> > It is something alike to the current
> > way, with some added measures: you try to shoot a work on the run,
> > while queued or timer_pending, plus the _PENDING flag set, so it seems,
> > there is some risk of longer than planned looping.
>
> Sorry, can't understand. done == 0 means that the queueing in progress,
> this work should be placed on cwq->worklist very soon, most probably
> right after we drop cwq->lock.

I think, theoretically, probably, maybe, some strange case is possible
where this function gets the spin_lock only when:
list_empty(&work->entry) == 1 && _PENDING == 1 && del_timer(&dwork->timer) == 0.

> > I have to look at this more, at home and, if something new, I'll write
> > tomorrow.
>
> So, the good news, is you should have enough sleep this time!
>
> Thanks for review!

OK. Here is the review:
It looks great!!! I cannot believe, it could be so "easy"!

Regards,
Jarek P.

PS: probably unusable, but for my own satisfaction:
Acked-by: Jarek Poplawski <[EMAIL PROTECTED]>
Re: [00/17] Large Blocksize Support V3
On Fri, 27 Apr 2007 14:20:46 +1000 David Chinner <[EMAIL PROTECTED]> wrote:

> >blocksizes via this scheme - instantiate and lock four pages and go for
> >it.
>
> So now how do you get block aligned writeback?

in writeback and pageout:

	if (page->index & mapping->block_size_mask)
		continue;

> Or make sure that truncate
> doesn't race on a partial *block* truncate?

lock four pages

> You basically have to
> jump through nasty, nasty hoops, to handle corner cases that are introduced
> because the generic code can no longer reliably lock out access to a
> filesystem block.
>
> Eventually you end up with something like fs/xfs/linux-2.6/xfs_buf.c and
> doing everything inside the filesystem because it's the only sane
> way to serialise access to these aggregated structures. This is
> the way XFS used to work in its data path, and we all know how long
> and loud people complained about that.
>
> A filesystem specific aggregation mechanism is not a palatable solution
> here because it drives filesystems away from being able to use generic
> code.

I would expect we could (should) implement this in generic code by modifying
the existing stuff.

I'm not saying it's especially simple, nor fast. But it has the advantage
that we're not forced to use larger pages with _its_ attendant performance
problems. And it will benefit all filesystems immediately.

And it doesn't introduce a rather nasty hack of pretending (in some places)
that pages are larger than they really are.

And it has the very significant advantage that it doesn't introduce brand
new concepts and some complexity into core MM.

And make no mistake: the latter disadvantage is huge. Because if we do the
PAGE_CACHE_SIZE hack (sorry, but it _is_), we have to do it *for ever*.
Maintaining and enhancing core MM and VFS becomes harder and more costly
and slower and more buggy *for ever*. The ramp for people to become
competent on core MM becomes longer. Our developer pool becomes smaller, and
proportionally less skilled.

And hardware gets better. If Intel & AMD come out with a 16k pagesize
option in a couple of years we'll look pretty dumb. If the problems which
you're presently having with that controller get sorted out in the next
generation of the hardware, we'll also look pretty dumb.

As always, there are tradeoffs. We can see the cons, and they are very
significant. We don't yet know the pros. Perhaps they will be similarly
significant. But I don't believe that the larger PAGE_CACHE_SIZE hack
(sorry) is the only way in which they can be realised.
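The two-line writeback test quoted in the mail above can be tried out in userspace. The sketch below is only an illustration of that mask test: `block_size_mask` is the hypothetical field from the mail, modelled here as `pages_per_block - 1`, and the function name and array-based interface are invented for the example. It assumes the number of pages per block is a power of two.

```c
#include <stddef.h>

/*
 * Given the page indices of dirty pages, keep only those that start a
 * filesystem block, i.e. whose index is a multiple of pages_per_block.
 * pages_per_block must be a power of two (assumption of this sketch).
 * Returns the number of indices stored in out[].
 */
size_t collect_block_starts(const unsigned long *dirty, size_t nr,
			    unsigned long pages_per_block,
			    unsigned long *out)
{
	unsigned long mask = pages_per_block - 1;  /* e.g. 3 for four pages */
	size_t n = 0;

	for (size_t i = 0; i < nr; i++) {
		if (dirty[i] & mask)
			continue;  /* not the first page of a block: skip */
		out[n++] = dirty[i];
	}
	return n;
}
```

With four pages per block and dirty pages 0, 1, 2, 3, 4, 6 and 8, only 0, 4 and 8 survive the test, so writeback would start at block boundaries. What the sketch does not show is the hard part debated in this thread: locking the other three pages of each block and handling a block whose first page is clean while a later one is dirty.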
Re: MAINTAINERS file out of date?
Eric W. Biederman wrote:
> WANG Cong <[EMAIL PROTECTED]> writes:
>
>> Thanks for your point.
>> I know little about virtualization, maybe can't help much. But I am
>> interested
>> in other things you mentioned. AFAIK, segments can't be avoided on i386, and
>> Linux uses them very little, how are they recalculated constantly?
>
> Look in arch/i386/boot/setup.S it runs in 16bit mode. We are talking about
> real mode segments not 16bit segments.
>
> 16bit real mode is a completely different ball game, and why we keep BIOS
> calls isolated to that one dinky file.
>

A lot of that code (although, of course, not all) could be written in C,
though. I'm thinking of taking a stab at rewriting it that way.

	-hpa
Re: [2.6 patch] include/linux/kprobes.h: always offer show_registers() prototype
On Fri, Apr 27, 2007 at 01:47:49AM +0200, Adrian Bunk wrote:
> Allow gcc to perform show_registers() type checking also with
> CONFIG_KPROBES=n.

Is kprobes.h the correct place to allow for this change? Perhaps, with
Christoph's patch http://marc.info/?l=linux-kernel&m=117432009501114&w=2
consolidating the die_notifier code, this could be moved to
include/linux/kdebug.h instead?

Ananth

>
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
>
> ---
>
> This patch has been sent on:
> - 27 Mar 2007
>
>  include/linux/kprobes.h |4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> --- linux-2.6.21-rc4-mm1/include/linux/kprobes.h.old	2007-03-26 15:28:27.0 +0200
> +++ linux-2.6.21-rc4-mm1/include/linux/kprobes.h	2007-03-26 15:32:23.0 +0200
> @@ -163,7 +163,6 @@
>  extern void arch_arm_kprobe(struct kprobe *p);
>  extern void arch_disarm_kprobe(struct kprobe *p);
>  extern int arch_init_kprobes(void);
> -extern void show_registers(struct pt_regs *regs);
>  extern kprobe_opcode_t *get_insn_slot(void);
>  extern void free_insn_slot(kprobe_opcode_t *slot, int dirty);
>  extern void kprobes_inc_nmissed_count(struct kprobe *p);
> @@ -241,4 +240,7 @@
>  {
>  }
>  #endif /* CONFIG_KPROBES */
> +
> +void show_registers(struct pt_regs *regs);
> +
>  #endif /* _LINUX_KPROBES_H */
Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck
On Thu, Apr 26, 2007 at 10:47:38AM +0200, Jan Kara wrote:
> Do I get it right that you just have in each cnode a pointer to the
> previous & next cnode? But then if two consecutive cnodes get corrupted,
> you have no way to connect the chain, do you? If each cnode contained
> some unique identifier of the file and a number identifying position of
> cnode, then there would be at least some way (through expensive) to
> link them together correctly...

You're right, it's easy to add a little more redundancy that would
make it possible to recover from two consecutive nodes being
corrupted. Keeping a parent inode id in each continuation inode is
definitely a smart thing to do.

Some minor side notes: Continuation inodes aren't really in any
defined order - if you look at Jeff's ping-pong chunk allocation
example, you'll see that the data in each continuation inode won't be
in linearly increasing order. Also, while the current implementation
is a simple doubly-linked list, this may not be the best solution
long-term. What's important is that each continuation inode have a
back pointer to the parent and that there is some structure for
quickly looking up the continuation inode for a given file offset.
Suggestions for data structures that work well in this situation are
welcome. :)

-VAL
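To make the back-pointer idea concrete, here is a small userspace sketch in C. Nothing in it is chunkfs code: `struct sk_cnode` and `find_continuations` are hypothetical names, and the "scan" is a plain array walk standing in for whatever a repair tool would do over the on-disk inodes. It shows only the recovery path discussed above, where the prev/next links cannot be trusted and the parent id is used instead.

```c
#include <stddef.h>

/* Hypothetical in-memory view of a continuation inode (illustration only). */
struct sk_cnode {
	unsigned long ino;         /* inode number of this continuation inode */
	unsigned long parent_ino;  /* back pointer: inode number of the file */
	int chunk;                 /* chunk this continuation inode lives in */
};

/*
 * Find every continuation inode that claims parent_ino as its parent,
 * without following any next/prev links. Stores the matching indices in
 * out[] (at most max_out of them) and returns how many were stored.
 */
size_t find_continuations(const struct sk_cnode *all, size_t nr,
			  unsigned long parent_ino,
			  size_t *out, size_t max_out)
{
	size_t n = 0;

	for (size_t i = 0; i < nr && n < max_out; i++) {
		if (all[i].parent_ino == parent_ino)
			out[n++] = i;
	}
	return n;
}
```

This is the expensive path mentioned in the quoted question: it touches every continuation inode. It is only meant for repair, and it says nothing about the separate problem of looking up the right continuation inode for a given file offset in normal operation.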
Re: [PATCH] i386: For debugging, make the initial page table setup less forgiving.
Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:

>> I suspect what we want to do is come up with a function to call
>> to test to see if a page should be read-only and map such pages
>> _PAGE_KERNEL_RO, or _PAGE_KERNEL_RO_EXEC if it's code.
>>
>
> Hm, I think that's a hard function to write in general. For the special
> case of pagetable_init it wouldn't be too hard, but it doesn't seem like
> a big improvement over the current state of affairs.

There is some difficulty there, and I need to look at the issue some
more, but it just occurred to me that this problem of tracking special
permissions on pages is not confined to Xen.

Currently, for pages, we need to have a consistent global view of whether
a page is uncached, write-combining or cached. We need a way to check
this so that we don't get inconsistencies in how we are caching pages
when we start controlling this on a per-page basis. Currently we
are not doing this, and it is blocking the merging of PAT write-combining
support in the kernel because of the bad things (silent data corruption
and general cpu undefined behavior) that can happen if we don't
handle things consistently.

I don't know if we can solve the two problems with the same mechanism
but it is worth looking into. Especially if what we do is just
add support for marking some pages as read-only. We could use that
to ensure there are no writable mappings of the kernel's text segment,
for example.

Eric
Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck
On Thu, Apr 26, 2007 at 12:05:04PM -0400, Jeff Dike wrote:
>
> No, I'm referring to a different file. The scenario is that you have
> a growing file in a nearly full disk with files being deleted (and
> thus space being freed) such that allocations for the growing file
> bounce back and forth between chunks.

This is an excellent question. I call this the ping-pong problem.
The solution is as Amit describes: You have a maximum of one
continuation inode per file per chunk, and you require sparse files.
Here's an example, spelled out:

Allocate file 1 in chunk A.
Grow file 1.
Chunk A fills up.
Allocate continuation inode for file 1 in chunk B.
Chunk A gets some free space.
Chunk B fills up.
Pick chunk A for allocating next block of file 1.
Try to look up a continuation inode for file 1 in chunk A.
Continuation inode for file 1 found in chunk A!
Attach newly allocated block to existing inode for file 1 in chunk A.

This is why the file format inside each chunk needs to support sparse
files.

I have a presentation that has a series of slides on problems and
potential resolutions that might help:

http://infohost.nmt.edu/~val/review/chunkfs_presentation.pdf

-VAL
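The step-by-step example above boils down to a "look up, else create" rule per chunk. The following C sketch models just that rule in userspace. It is not chunkfs code: `struct sk_file`, `sk_attach_block` and the fixed `SK_MAX_CHUNKS` are invented for the illustration, and a per-chunk flag stands in for the real continuation-inode lookup.

```c
#define SK_MAX_CHUNKS 8

/* Hypothetical bookkeeping for one file spread over several chunks. */
struct sk_file {
	unsigned long ino;
	int has_inode[SK_MAX_CHUNKS];        /* 1 if the file has an inode in that chunk */
	unsigned int blocks[SK_MAX_CHUNKS];  /* blocks the file owns in each chunk */
	unsigned int nr_inodes;              /* head inode plus continuation inodes */
};

/*
 * Attach one newly allocated block from `chunk` to the file.
 * Reuses the file's existing inode in that chunk if there is one, so a
 * file never has more than one inode per chunk.
 * Returns 1 if a new inode had to be created, 0 if an existing one was
 * reused, -1 if the chunk number is out of range.
 */
int sk_attach_block(struct sk_file *f, int chunk)
{
	int created = 0;

	if (chunk < 0 || chunk >= SK_MAX_CHUNKS)
		return -1;

	if (!f->has_inode[chunk]) {
		f->has_inode[chunk] = 1;
		f->nr_inodes++;
		created = 1;
	}
	f->blocks[chunk]++;
	return created;
}
```

Bouncing between chunk 0 and chunk 1 any number of times leaves `nr_inodes` at 2: the inode count is bounded by the number of chunks, not by the number of bounces. The price, as the mail says, is that each per-chunk inode holds non-contiguous pieces of the file, which is why the format inside a chunk has to support sparse files.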
Re: [2.6 patch] drivers/scsi/nsp32.c: remove kernel 2.4 code
At Thu, 26 Apr 2007 20:30:54 -0400, James Bottomley wrote:
> On Fri, 2007-04-27 at 02:13 +0200, Adrian Bunk wrote:
> > On Thu, Apr 26, 2007 at 07:59:57PM -0400, Robert P. J. Day wrote:
> > > On Fri, 27 Apr 2007, Adrian Bunk wrote:
> > >
> > > > This patch removes kernel 2.4 code.
> > > >
> > > > Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
> > > >
> > > > ---
> > > >
> > > > This patch has been sent on:
> > > > - 26 Mar 2007
> > > >
> > > >  drivers/scsi/nsp32.c |  109 +--
> > > >  1 file changed, 13 insertions(+), 96 deletions(-)
> > > >
> > > > --- linux-2.6.21-rc4-mm1/drivers/scsi/nsp32.c.old	2007-03-25 20:27:34.0 +0200
> > > > +++ linux-2.6.21-rc4-mm1/drivers/scsi/nsp32.c	2007-03-25 20:31:59.0 +0200
> > > > @@ -49,10 +49,6 @@
> > > >  #include
> > > >  #include
> > > >
> > > > -#if (LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0))
> > > > -# include
> > > > -#endif
> > > > -
> > >
> > > i'm curious about the rules for removing code like this. in the case
> > > of drivers, isn't it possible that some driver source could be
> > > relevant for both the 2.4 and 2.6 kernel source tree, and simply uses
> > > that kind of preprocessor check to make sure it's being compiled
> > > appropriately?
> >
> > That's what it was for.
> >
> > > or are you doing something more sophisticated than simply checking the
> > > kernel version being tested?
> >
> > No.
> >
> > The point is:
> >
> > It seems this driver was once maintained for both 2.4 and 2.6 in one
> > file.
> >
> > As long as this is done, such version checks are OK.
>
> Personally, I don't like to see 2.4 and 2.6 in a new driver, and will
> tend to try to force it to be 2.6 only. For an existing driver, I tend
> to be much more tolerant: removing the huge gobs of code to achieve 2.6
> only is usually a bit disruptive on both the driver and the maintainer
>
> > But if a driver is no longer actually maintained for both kernels these
> > checks become useless (and there quickly arose unconditional 2.6-only
> > code in such a driver) and can be removed.
>
> This driver is maintained by
>
> Yokota Hiroshi <[EMAIL PROTECTED]>
> GOTO Masanori <[EMAIL PROTECTED]>
>
> As it says in the header. It was last modified in May 2006, so it is
> maintained under the somewhat elastic standards of SCSI. I've cc'd them
> to see what they think.

Thanks for forwarding the mail.

Personally I don't mind removing the 2.4 compatibility code.
Historically nsp32 was introduced in 2.4.20 and 2.5.40, so we needed
to maintain both kernel series at that time. However, these days the 2.4
kernel is not the mainline code, and 2.6 is the standard one. We
don't need to keep paying attention to 2.4 compatibility issues.
Moreover, some other compatibility code was already removed from the 2.6
nsp32 driver by many clean-up patches (thanks!). This removal makes
sense to me.

Regards,
-- gotom
Re: Back to the future.
On Thu, 2007-04-26 at 09:56 -0700, Linus Torvalds wrote:
> >which will map in the snapshot, return the mapped address and the size
> >(and if you want to support snapshots > 4GB, be my guest, but I suspect
> >you're actually *better* off just admitting that if you cannot shrink
> >the snapshot to less than 32 bits, it's not worth doing)

On Fri, 27 Apr 2007, Nigel Cunningham wrote:
> That inherently limits the image to half of available ram (you need
> somewhere to store the snapshot), so you won't get the full image you
> express interest in below.

It doesn't. We can make the userspace mapped pages copy-on-write. As long as the userspace makes sure there's not much activity during snapshot/shutdown, we will be fine. What we probably do need to copy is kernel pages.

Pekka
[PATCH] fix hrtimers documentation
Fix hrtimers documentation. - The word `patch' is used to indicate hrtimers subsystem implementation. However this subsystem has already been in ustream kernel, so convert them into appropriate phrases. - Some minor cleanups. Signed-off-by: Satoru Takeuchi <[EMAIL PROTECTED]> Index: linux-2.6.21/Documentation/hrtimers/hrtimers.txt === --- linux-2.6.21.orig/Documentation/hrtimers/hrtimers.txt 2007-04-27 12:45:56.0 +0900 +++ linux-2.6.21/Documentation/hrtimers/hrtimers.txt2007-04-27 13:24:22.0 +0900 @@ -1,8 +1,9 @@ hrtimers - subsystem for high-resolution kernel timers - +-- -This patch introduces a new subsystem for high-resolution kernel timers. +This document describes hrtimers, a new subsystem for high-resolution +kernel timers. One might ask the question: we already have a timer subsystem (kernel/timers.c), why do we need two timer subsystems? After a lot of @@ -78,7 +79,7 @@ from 250 Hz to 100 HZ (or even smaller). hrtimer subsystem implementation details -the basic design considerations were: +The basic design considerations were: - simplicity @@ -87,7 +88,7 @@ the basic design considerations were: - simplification of existing, timing related kernel code -another basic requirement was the immediate enqueueing and ordering of +Another basic requirement was the immediate enqueueing and ordering of timers at activation time. After looking at several possible solutions such as radix trees and hashes, we chose the red black tree as the basic data structure. Rbtrees are available as a library in the kernel and are @@ -138,23 +139,22 @@ details.) hrtimers - rounding of timer values --- -the hrtimer code will round timer events to lower-resolution clocks +The hrtimer code will round timer events to lower-resolution clocks because it has to. Otherwise it will do no artificial rounding at all. -one question is, what resolution value should be returned to the user by +One question is, what resolution value should be returned to the user by the clock_getres() interface. 
This will return whatever real resolution a given clock has - be it low-res, high-res, or artificially-low-res. hrtimers - testing and verification --- +--- -We used the high-resolution clock subsystem ontop of hrtimers to verify +We used the high-resolution clock subsystem on top of hrtimers to verify the hrtimer implementation details in praxis, and we also ran the posix timer tests in order to ensure specification compliance. We also ran tests on low-resolution clocks. -The hrtimer patch converts the following kernel functionality to use -hrtimers: +hrtimers converts the following kernel functionality to use it: - nanosleep - itimers
Re: MAINTAINERS file out of date?
WANG Cong <[EMAIL PROTECTED]> writes:
>
> Thanks for your point.
> I know little about virtualization, maybe can't help much. But I am interested
> in other things you mentioned. AFAIK, segments can't be avoided on i386, and
> Linux uses them very little, how are they recalculated constantly?

Look in arch/i386/boot/setup.S; it runs in 16bit mode. We are talking about real mode segments, not 16bit segments.

16bit real mode is a completely different ball game, and that is why we keep BIOS calls isolated to that one dinky file.

Eric
Re: [00/17] Large Blocksize Support V3
On Thu, Apr 26, 2007 at 07:53:57PM -0700, Andrew Morton wrote:
> On Fri, 27 Apr 2007 12:27:31 +1000 David Chinner <[EMAIL PROTECTED]> wrote:
> > On Thu, Apr 26, 2007 at 07:04:38PM -0700, Andrew Morton wrote:
> > > On Tue, 24 Apr 2007 15:21:05 -0700 [EMAIL PROTECTED] wrote:
> > > Also, afaict your important requirements would be met by retaining
> > > PAGE_CACHE_SIZE=4k and simply ensuring that pagecache is populated by
> > > physically contiguous pages
> >
> > Sure, that addresses the larger I/O side of things, but it doesn't address
> > the large filesystem blocksize issues that can only be solved with some kind
> > of page aggregation abstraction.
>
> a) That wasn't a part of Christoph's original rationale list, so forgive
>    me for thinking it is not so important and got snuck in post-facto when
>    things got tough.

I've been pushing Christoph to do something like this for more than a year purely so we can support large block sizes in XFS. He's got other reasons for wanting to do this, but that doesn't mean that the large filesystem blocksize issue is any less important.

>    blocksizes via this scheme - instantiate and lock four pages and go for
>    it.

So now how do you get block aligned writeback? Or make sure that truncate doesn't race on a partial *block* truncate? You basically have to jump through nasty, nasty hoops, to handle corner cases that are introduced because the generic code can no longer reliably lock out access to a filesystem block.

Eventually you end up with something like fs/xfs/linux-2.6/xfs_buf.c and doing everything inside the filesystem because it's the only sane way to serialise access to these aggregated structures. This is the way XFS used to work in its data path, and we all know how long and loud people complained about that.

A filesystem specific aggregation mechanism is not a palatable solution here because it drives filesystems away from being able to use generic code.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
Re: [patch] unprivileged mounts update
Miklos Szeredi <[EMAIL PROTECTED]> writes:
>> On Apr 25 2007 11:21, Eric W. Biederman wrote:
>> >>
>> >> Why did we want to use fsuid, exactly?
>> >
>> >- Because ruid is completely the wrong thing we want mounts owned
>> >  by whomever's permissions we are using to perform the mount.
>>
>> Think nfs. I access some nfs file as an unprivileged user. knfsd, by
>> nature, would run as euid=0, uid=0, but it needs fsuid=jengelh for
>> most permission logic to work as expected.
>
> I don't think knfsd will ever want to call mount(2).
>
> But yeah, I've been convinced, that using fsuid is the right thing to
> do.

Actually knfsd does call mount: when it crosses a mount point on the nfs server it generates an equivalent mount point in linux. At least I think that is what it is doing. It is very similar to our mount propagation path.

However as a special case I don't think the permission checking is likely to bite us there. It is worth double checking once we have the other details ironed out.

Eric
Re: Linux-2.6.21 hangs during post boot initialization phase
Linus Torvalds wrote:
> On Fri, 27 Apr 2007, Peter Williams wrote:
> > The 2.6.21 kernel is hanging during the post boot phase where various daemons
> > are being started (not always the same daemon unfortunately).
> >
> > This problem was not present in 2.6.21-rc7 and there is no oops or other
> > unusual output in the system log at the time the hang occurs.
>
> Can you use "git bisect" to narrow it down a bit more? It's only 125
> commits, so bisecting even just three or four kernels will narrow it down
> to a handful

Yes. I'm just in the process of reading up on how to do the bisecting now. Should have an answer in a few hours, I guess.

Peter
--
Peter Williams [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce
Re: [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1
On Thu, 26 Apr 2007 20:58:32 +0200 Vincent ETIENNE <[EMAIL PROTECTED]> wrote:
> Apr 26 11:09:34 jupiter2 RTNL: assertion failed at net/ipv4/devinet.c (1055)
> Apr 26 11:09:34 jupiter2
> Apr 26 11:09:34 jupiter2 Call Trace:
> Apr 26 11:09:34 jupiter2 [] inetdev_event+0x48/0x283
> Apr 26 11:09:34 jupiter2 [] _spin_lock_bh+0x9/0x19
> Apr 26 11:09:34 jupiter2 [] rt_run_flush+0x7e/0xaf
> Apr 26 11:09:34 jupiter2 [] notifier_call_chain+0x29/0x56
> Apr 26 11:09:34 jupiter2 [] dev_set_mac_address+0x53/0x59
> Apr 26 11:09:34 jupiter2 [] bo

This was due to locking bustage in the net tree. It should be fixed in 2.6.21-rc7-mm2.
Re: Linux 2.6.21
On Thu, 2007-04-26 at 21:02 +0200, Willy Tarreau wrote:
> On Thu, Apr 26, 2007 at 09:40:26AM -0700, Linus Torvalds wrote:
>
> > So we should have somebody like Christoph running -mm, and when things
> > break, we'll just sic Christoph on whoever broke it, and teach people
> > proper fear and respect!
>
> And with Al Viro doing random code review and fill in the commits for
> regression fixes, even long established developers will check their code
> twice before submitting ;-)

Yeah! We can call them The Black And Blues Brothers :)
Re: [00/17] Large Blocksize Support V3
Jens Axboe <[EMAIL PROTECTED]> writes:
> On Thu, Apr 26 2007, Eric W. Biederman wrote:
>
> Yep, if you could just have > PAGE_CACHE_SIZE blocks in the filesystem
> easily, the problem would basically be solved for cd and dvd packet
> writing.

Ok. I'm not in a position to do this work. But I will keep it in mind and look at it.

>> Am I correct in assuming that the problem is primarily about getting
>> filesystems (and other upper layers) to submit BIOs that take into
>> consideration the larger block size of the underlying device, so
>> that read/modify write is not needed in the pktcdvd layer?
>
> Yes, that is exactly the problem. Once you have that, pktcdvd is pretty
> much reduced to setup and init code, the actual data handling can be
> done by sr or ide-cd directly. You could merge it into cdrom.c, it would
> not be very different from mt-rainier handling (which basically does RMW
> in firmware, so it works for any write, but performance is of course
> horrible if you don't do it right).

Thanks for the clarification.

So we do have a clear problem that we do not have generic support for large sector sizes residing in the page cache.

There is one place where this is a direct effect: fs/block_dev.c.

We have an indirect effect in the filesystems because there are a few bits of generic support missing and there is no linux convention on how to handle this case.

I expect if we can enhance fs/block_dev.c to handle this case the other parts will fall out naturally.

Eric
Re: Strange problem with Device Mapper
On Thu, 26 Apr 2007 19:12:27 +0200 "Paul Rolland" <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I've a machine that has been installed with Intel Soft Raid on top
> of 2 SATA disks.
> I'm trying to have this work as a RAID-1 array.
> Bios configuration has been done, using 128K chunk, and the kernel
> (2.6.20.7) sees perfectly /dev/mapper/isw__RAID1
>
> But, I'm facing two problems :
> 1 - If i try to create partitions on this device, it does fail (the
> values are not interpreted correctly)
>
> 2 - To avoid 1), I stop the RAID array (dmraid -an), then I do create
> exactly the same partition set on /dev/sda, and /dev/sdb, and
> then I reactivate RAID (dmraid -ay).
> This allows me to see all the /dev/mapper/isw_xxx_RAID1p1, p2, ...
> But, running fsck -t ext2 /dev/mapper/isw_xxx_RAID1p1
> ends in a lock when the partition is larger than 10Go (well, it is
> Ok on the 10Go one, and it locks on the 100Go).
> Not a real hardlock, I still can switch to a new VC, but it's not
> more possible to start a new command or to stop fsck.
>
> Should I do everything on the physical disks before activating RAID ?
> Is it a normal behavior ?
>
> BTW, I also noticed that such a device doesn't support BLKRRPART ioctl...
> So it is possible that what I'm doing is wrong...
>
> Any idea ?
>

cc added.
Re: [patch] CFS scheduler, -v6
On Thu, 2007-04-26 at 22:09 +0200, Kasper Sandberg wrote:
> On Thu, 2007-04-26 at 10:41 -0400, Gene Heskett wrote:
> >
> > Compared to mainline? I still think this is a 100% keeper for desktop users
> > like me.
>
> Here its alot worse, just playing an ogg with ogg123 even without
> anything reniced (X is 0), just pressing a link in konqueror can make
> audio skip (ogg123 fails to fill the alsa buffer, and thus it skips).

What arguments are you passing to ogg123? I'm creating some ogg files right now, and would like to reproduce these skips to see why they're happening if possible.

-Mike
"REPORT: sd-0.46 vs cfs-v6 vs mainline 2.6.21-rc7 Beryl + Video + Audio"
"REPORT: sd-0.46 vs cfs-v6 vs mainline 2.6.21-rc7 Beryl + Video + Audio" Hardware: Dell Inspiron 700m laptop 1.7GHz Pentium M (Dothan 2M cache) 2GB RAM 1000Hz Gentoo Linux dyn-tick 700m # cat /sys/devices/system/cpu/cpu0/cpufreq/ondemand/sampling_rate 1 (microseconds, 10ms) 855gm integrated video/chipset xf86-video-i810 (intel 1.7.4) DRI enabled xorg-server-1.2.0-r3 beryl-core 0.3.0-svn MPlayer dev-SVN-rUNKNOWN-4.1.2 - x11 Gnome totem 2.16.5 - x11-gstreamer reiser4 w/cryptcompress Screenshot: http://ordorica.org/misc/beryl.png muine playing mp3's off mounted windows share Tests run under 16 bit color which provides a constant 75 fps on one cube side (fps forced limited). Drops to ~45-50 fps during animation/rotate/scale (depending on complexity of rendering) Vsync off. 75Hz refresh 1280x800. totem running fullscreen playing 700MB divx "An Inconvenient Truth.avi" on one side of cube/desktop gmplayer running fullscreen on another cube side (same file). The given observations/numbers are when I move the cube with my mouse and view two faces at one time (see screenshot). One face is playing the totem video, the other containing my terminals. Some numbers I've seen other people throw around: I don't know their relevance. 
cfs-v6: 700m kernel # cat sched_granularity_ns 500 procs ---memory-- ---swap-- -io -system-- cpu r b swpd free buff cache si sobibo in cs us sy id wa 5 0 0 221480300 139461200 181 0 6068 5317 69 6 25 0 4 0 0 220880300 139526800 176 0 6147 5579 68 6 27 0 1 0 0 220340300 139576800 167 0 6052 5393 70 6 24 0 6 0 0 219920300 139620400 103 0 5830 5211 73 6 21 0 top - 18:31:17 up 7:45, 5 users, load average: 5.18, 4.73, 4.28 Tasks: 98 total, 4 running, 94 sleeping, 0 stopped, 0 zombie Cpu(s): 91.6%us, 6.4%sy, 0.0%ni, 0.3%id, 0.0%wa, 1.3%hi, 0.3%si, 0.0%st Mem: 2057700k total, 1845952k used, 211748k free, 300k buffers Swap: 987988k total,0k used, 987988k free, 1404040k cached PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 18046 hechacke 20 0 189m 83m 20m S 38.7 4.2 12:04.64 totem 18059 hechacke 20 0 51280 30m 18m R 25.8 1.5 9:47.36 gmplayer 12117 root 20 0 275m 54m 18m R 20.2 2.7 15:18.38 Xorg 22730 hechacke 20 0 119m 35m 18m R 5.3 1.7 0:12.68 mono 12350 hechacke 20 0 63820 6776 4328 S 3.6 0.3 2:20.36 beryl 16465 hechacke 20 0 43960 15m 10m S 2.3 0.8 0:07.14 gnome-terminal 12200 hechacke 20 0 5308 4016 1740 S 0.3 0.2 0:05.45 gconfd-2 12215 hechacke 20 0 38704 8956 7588 S 0.3 0.4 0:08.90 xfce4-clipman-p Observation: Music plays perfectly. Audio of video's play perfectly. New processes take forever to start. Firefox (already cached in ram) takes about 5 seconds to start; even right after closing it. Browsing the web is slow. Already open applications are responsive. Behavior of video: video's both moving forward. totem is updating about every half second. mplayer updates about every 3 seconds. 
- cfs-v6: 700m kernel # cat sched_granularity_ns 200 procs ---memory-- ---swap-- -io -system-- cpu r b swpd free buff cache si sobibo in cs us sy id wa 5 0 0 99604 44 151936400 0 0 3903 5575 91 5 5 0 3 0 0 99512 44 151936400 0 0 5990 6783 72 5 23 0 3 0 0 100412 44 151936400 0 0 6858 7261 67 5 28 0 1 0 0 100412 44 151936400 0 0 7426 7634 62 4 34 0 4 0 0 100288 44 151936400 0 0 7039 7442 60 6 34 0 top - 19:05:09 up 8:18, 5 users, load average: 3.62, 4.16, 4.28 Tasks: 98 total, 4 running, 94 sleeping, 0 stopped, 0 zombie Cpu(s): 69.8%us, 5.0%sy, 0.0%ni, 24.5%id, 0.0%wa, 0.7%hi, 0.0%si, 0.0%st Mem: 2057700k total, 2009396k used,48304k free, 300k buffers Swap: 987988k total,0k used, 987988k free, 1555428k cached PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 18059 hechacke 20 0 51332 30m 18m R 30.8 1.5 18:48.17 gmplayer 18046 hechacke 20 0 189m 83m 20m S 20.9 4.2 23:25.49 totem 12117 root 20 0 276m 57m 18m S 9.6 2.8 20:59.01 Xorg 22730 hechacke 20 0 129m 36m 18m R 8.6 1.8 1:28.59 mono 22930 hechacke 20 0 65480 8392 4320 S 4.0 0.4 0:53.38 beryl 12213 hechacke 20 0 34472 7680 6484 S 0.7 0.4 1:16.41 xfce4-battery-p Observation: Music plays perfectly. Audio of video's play perfectly. New processes take forever to start. Browsing the web is slow. Already open applications are responsive. Behavior of video: video's both moving forward. totem is updating about every 1/3 second. mplayer updates about every 1 seconds. Rotating the
Re: [PATCH -mm] consolidate asm/const.h to linux/const.h
From: Randy Dunlap <[EMAIL PROTECTED]>
Date: Thu, 26 Apr 2007 20:50:08 -0700

> From: Randy Dunlap <[EMAIL PROTECTED]>
>
> Make a global linux/const.h header file instead of having multiple,
> per-arch files, and convert current users of asm/const.h to use
> linux/const.h.
>
> Built on x86_64 and sparc64.
>
> Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>

Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
Re: [PATCH 1/1] IBM PPC EMAC driver:improved support for PHY, resending
On Thu, 2007-04-26 at 18:18 -0700, Jeff Haran wrote:
> From: Jeff Haran <[EMAIL PROTECTED]>
>
> Resending with Outlook patch mangling hopefully corrected (Maybe I
> should write a HOWTO, this was harder than fixing the driver).

Note, sorry about that, still mangled :-( Just send it as an attachment ... and get yourself a linux desktop :-)

Cheers,
Ben.
[PATCH -mm] consolidate asm/const.h to linux/const.h
From: Randy Dunlap <[EMAIL PROTECTED]> Make a global linux/const.h header file instead of having multiple, per-arch files, and convert current users of asm/const.h to use linux/const.h. Built on x86_64 and sparc64. Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]> --- include/asm-sparc64/Kbuild|1 - include/asm-sparc64/const.h | 19 --- include/asm-sparc64/lsu.h |2 +- include/asm-sparc64/mmu.h |2 +- include/asm-sparc64/page.h|2 +- include/asm-sparc64/pgtable.h |2 +- include/asm-sparc64/pstate.h |2 +- include/asm-sparc64/sfafsr.h |2 +- include/asm-x86_64/const.h| 20 include/asm-x86_64/page.h |2 +- include/asm-x86_64/pgtable.h |2 +- include/linux/Kbuild |1 + include/linux/const.h | 19 +++ 13 files changed, 28 insertions(+), 48 deletions(-) --- linux-2.6.21-rc7-mm2.orig/include/asm-sparc64/const.h +++ /dev/null @@ -1,19 +0,0 @@ -/* const.h: Macros for dealing with constants. */ - -#ifndef _SPARC64_CONST_H -#define _SPARC64_CONST_H - -/* Some constant macros are used in both assembler and - * C code. Therefore we cannot annotate them always with - * 'UL' and other type specificers unilaterally. We - * use the following macros to deal with this. - */ - -#ifdef __ASSEMBLY__ -#define _AC(X,Y) X -#else -#define _AC(X,Y) (X##Y) -#endif - - -#endif /* !(_SPARC64_CONST_H) */ --- linux-2.6.21-rc7-mm2.orig/include/asm-x86_64/const.h +++ /dev/null @@ -1,20 +0,0 @@ -/* const.h: Macros for dealing with constants. */ - -#ifndef _X86_64_CONST_H -#define _X86_64_CONST_H - -/* Some constant macros are used in both assembler and - * C code. Therefore we cannot annotate them always with - * 'UL' and other type specificers unilaterally. We - * use the following macros to deal with this. - */ - -#ifdef __ASSEMBLY__ -#define _AC(X,Y) X -#else -#define __AC(X,Y) (X##Y) -#define _AC(X,Y) __AC(X,Y) -#endif - - -#endif /* !(_X86_64_CONST_H) */ --- /dev/null +++ linux-2.6.21-rc7-mm2/include/linux/const.h @@ -0,0 +1,19 @@ +/* const.h: Macros for dealing with constants. 
*/ + +#ifndef _LINUX_CONST_H +#define _LINUX_CONST_H + +/* Some constant macros are used in both assembler and + * C code. Therefore we cannot annotate them always with + * 'UL' and other type specifiers unilaterally. We + * use the following macros to deal with this. + */ + +#ifdef __ASSEMBLY__ +#define _AC(X,Y) X +#else +#define __AC(X,Y) (X##Y) +#define _AC(X,Y) __AC(X,Y) +#endif + +#endif /* !(_LINUX_CONST_H) */ --- linux-2.6.21-rc7-mm2.orig/include/asm-x86_64/page.h +++ linux-2.6.21-rc7-mm2/include/asm-x86_64/page.h @@ -1,7 +1,7 @@ #ifndef _X86_64_PAGE_H #define _X86_64_PAGE_H -#include +#include /* PAGE_SHIFT determines the page size */ #define PAGE_SHIFT 12 --- linux-2.6.21-rc7-mm2.orig/include/asm-x86_64/pgtable.h +++ linux-2.6.21-rc7-mm2/include/asm-x86_64/pgtable.h @@ -1,7 +1,7 @@ #ifndef _X86_64_PGTABLE_H #define _X86_64_PGTABLE_H -#include +#include #ifndef __ASSEMBLY__ /* --- linux-2.6.21-rc7-mm2.orig/include/asm-sparc64/lsu.h +++ linux-2.6.21-rc7-mm2/include/asm-sparc64/lsu.h @@ -2,7 +2,7 @@ #ifndef _SPARC64_LSU_H #define _SPARC64_LSU_H -#include +#include /* LSU Control Register */ #define LSU_CONTROL_PM _AC(0x01fe,UL) /* Phys-watchpoint byte mask*/ --- linux-2.6.21-rc7-mm2.orig/include/asm-sparc64/mmu.h +++ linux-2.6.21-rc7-mm2/include/asm-sparc64/mmu.h @@ -1,8 +1,8 @@ #ifndef __MMU_H #define __MMU_H +#include #include -#include #include #define CTX_NR_BITS13 --- linux-2.6.21-rc7-mm2.orig/include/asm-sparc64/page.h +++ linux-2.6.21-rc7-mm2/include/asm-sparc64/page.h @@ -5,7 +5,7 @@ #ifdef __KERNEL__ -#include +#include #if defined(CONFIG_SPARC64_PAGE_SIZE_8KB) #define PAGE_SHIFT 13 --- linux-2.6.21-rc7-mm2.orig/include/asm-sparc64/pgtable.h +++ linux-2.6.21-rc7-mm2/include/asm-sparc64/pgtable.h @@ -15,13 +15,13 @@ #include #include +#include #include #include #include #include #include #include -#include /* The kernel image occupies 0x400 to 0x100 (4MB --> 32MB). * The page copy blockops can use 0x200 to 0x400. 
--- linux-2.6.21-rc7-mm2.orig/include/asm-sparc64/pstate.h +++ linux-2.6.21-rc7-mm2/include/asm-sparc64/pstate.h @@ -2,7 +2,7 @@ #ifndef _SPARC64_PSTATE_H #define _SPARC64_PSTATE_H -#include +#include /* The V9 PSTATE Register (with SpitFire extensions). * --- linux-2.6.21-rc7-mm2.orig/include/asm-sparc64/sfafsr.h +++ linux-2.6.21-rc7-mm2/include/asm-sparc64/sfafsr.h @@ -1,7 +1,7 @@ #ifndef _SPARC64_SFAFSR_H #define _SPARC64_SFAFSR_H -#include +#include /* Spitfire Asynchronous Fault Status register, ASI=0x4C VA<63:0>=0x0 */ --- linux-2.6.21-rc7-mm2.orig/include/asm-sparc64/Kbuild +++ linux-2.6.21-rc7-mm2/include/asm-sparc64/Kbuild @@ -8,7 +8,6 @@ header-y += apb.h header-y += asi.h header-y
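As a side note on why the consolidated header keeps the two-level x86_64 form of the macro rather than the one-level sparc64 form, here is a small userspace sketch. The macros are renamed DEMO_AC/DEMO__AC to stay clear of the real kernel names, and DEMO_PAGE_SHIFT, DEMO_RAW_MASK and the two functions are hypothetical, chosen only for illustration.

```c
/* Same shape as the _AC()/__AC() pair in the proposed linux/const.h,
 * renamed so the sketch cannot clash with any system header. */
#ifdef __ASSEMBLY__
#define DEMO_AC(X, Y)	X
#else
#define DEMO__AC(X, Y)	(X##Y)
#define DEMO_AC(X, Y)	DEMO__AC(X, Y)
#endif

/* Hypothetical constants, not taken from any kernel header. */
#define DEMO_PAGE_SHIFT	12
#define DEMO_RAW_MASK	0xff

/* In C the literal becomes 1UL, so the shift is done in unsigned long. */
unsigned long demo_page_size(void)
{
	return DEMO_AC(1, UL) << DEMO_PAGE_SHIFT;
}

/* The extra macro level lets DEMO_RAW_MASK expand to 0xff before the
 * suffix is pasted on; a one-level (X##Y) would paste DEMO_RAW_MASKUL. */
unsigned long demo_mask(void)
{
	return DEMO_AC(DEMO_RAW_MASK, UL);
}
```

When __ASSEMBLY__ is defined the suffix is simply dropped, so the same header line can be shared with assembler sources.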
Re: [00/17] Large Blocksize Support V3 (mmap conceptual discussion)
On Thu, 26 Apr 2007, Andrew Morton wrote:
> > Sure, that addresses the larger I/O side of things, but it doesn't address
> > the large filesystem blocksize issues that can only be solved with some kind
> > of page aggregation abstraction.
>
> a) That wasn't a part of Christoph's original rationale list, so forgive
>    me for thinking it is not so important and got snuck in post-facto when
>    things got tough.

It was definitely part of my thinking. I never thought anyone could do it differently so I did not emphasize it.

> b) I don't immediately see why a filesystam cannot implement larger
>    blocksizes via this scheme - instantiate and lock four pages and go for
>    it.
>
> > Compound pages and high order page cache
> > indexing solves this extremely neatly, regardless of whether the compound
> > page is contiguous or not.
>
> We cannot say anything about neatness until we've seen mmap.

Rough Draft was posted at

http://marc.info/?l=linux-kernel&m=117709695522443&w=2
http://marc.info/?l=linux-kernel&m=117709215016822&w=2
http://marc.info/?l=linux-kernel&m=117709238129124&w=2

Basically 4k mmap semantics are preserved. One can mmap any 4k section of a compound page. State information is kept in the head page.

So we have two page struct pointers to juggle:

1. The one pointing to the page for address calculations, COW etc.
2. The one pointing to the head page for state information.

For each 4k pointer from a process to a compound page we would have to take a refcount.
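The two-pointer bookkeeping described in this message can be pictured with a toy userspace model. This is only a rough sketch and not code from the patchset: struct demo_page, DEMO_ORDER and both functions are hypothetical, and it assumes an order-N compound page is naturally aligned inside a flat page array so that the head can be found by masking the index.

```c
#include <stddef.h>

#define DEMO_ORDER 2	/* 4 base pages per compound page (16k with 4k pages) */
#define DEMO_PAGES_PER_COMPOUND ((size_t)1 << DEMO_ORDER)

/* Toy page descriptor: state and references live only on the head page. */
struct demo_page {
	unsigned long flags;
	int refcount;
};

/* Pointer 2 from the message: the head page holding the state. */
struct demo_page *demo_head(struct demo_page *base, size_t idx)
{
	return &base[idx & ~(DEMO_PAGES_PER_COMPOUND - 1)];
}

/* Map one 4k section: take a reference on the head page, and return
 * pointer 1, the sub-page used for address calculations. */
struct demo_page *demo_map_subpage(struct demo_page *base, size_t idx)
{
	demo_head(base, idx)->refcount++;
	return &base[idx];
}
```

In this model, mapping base page 6 of an array of order-2 compound pages bumps the count on page 4 and leaves page 6 itself untouched.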
Re: Linux-2.6.21 hangs during post boot initialization phase
On Fri, 27 Apr 2007, Peter Williams wrote:
>
> The 2.6.21 kernel is hanging during the post boot phase where various daemons
> are being started (not always the same daemon unfortunately).
>
> This problem was not present in 2.6.21-rc7 and there is no oops or other
> unusual output in the system log at the time the hang occurs.

Can you use "git bisect" to narrow it down a bit more? It's only 125 commits, so bisecting even just three or four kernels will narrow it down to a handful.

Linus
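The arithmetic behind "three or four kernels" can be checked with a short sketch. It assumes each bisection step roughly halves the candidate range, rounding up, which is an approximation of what git bisect does on real history; demo_bisect_remaining() is a hypothetical helper, not part of git.

```c
/* Upper bound on the number of candidate commits left after `steps`
 * bisection rounds, assuming each round halves the range (rounded up). */
unsigned int demo_bisect_remaining(unsigned int commits, unsigned int steps)
{
	while (steps-- > 0 && commits > 1)
		commits = (commits + 1) / 2;
	return commits;
}
```

Under that assumption, 125 commits shrink to at most 16 after three test kernels and 8 after four, and a full bisection needs about seven builds.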
Re: [PATCH] rtc-cmos: make it load on PNPBIOS systems
On Thursday 26 April 2007, Marko Vrh wrote: > Replace CONFIG_PNPACPI with CONFIG_PNP, so it loads on > ACPI-less PNPBIOS systems. > > Signed-off-by: Marko Vrh <[EMAIL PROTECTED]> Acked-by: David Brownell <[EMAIL PROTECTED]> > > > > diff -urNp linux-2.6.21/drivers/rtc/rtc-cmos.c > linux-2.6.21.new/drivers/rtc/rtc-cmos.c > --- linux-2.6.21/drivers/rtc/rtc-cmos.c 2007-04-26 05:08:32.0 > +0200 > +++ linux-2.6.21.new/drivers/rtc/rtc-cmos.c 2007-04-26 23:58:36.0 > +0200 > @@ -379,12 +379,12 @@ static irqreturn_t cmos_interrupt(int ir > return IRQ_NONE; > } > > -#ifdef CONFIG_PNPACPI > -#define is_pnpacpi()1 > +#ifdef CONFIG_PNP > +#define is_pnp()1 > #define INITSECTION > > #else > -#define is_pnpacpi()0 > +#define is_pnp()0 > #define INITSECTION __init > #endif > > @@ -427,7 +427,7 @@ cmos_do_probe(struct device *dev, struct >* REVISIT for non-x86 systems we may need to handle io memory >* resources: ioremap them, and request_mem_region(). >*/ > - if (is_pnpacpi()) { > + if (is_pnp()) { > retval = request_resource(_resource, ports); > if (retval < 0) { > dev_dbg(dev, "i/o registers already in use\n"); > @@ -520,7 +520,7 @@ static void __exit cmos_do_remove(struct > > cmos_do_shutdown(); > > - if (is_pnpacpi()) > + if (is_pnp()) > release_resource(cmos->iomem); > rename_region(cmos->iomem, NULL); > > @@ -613,7 +613,7 @@ static int cmos_resume(struct device *de >* the device node will always be created as a PNPACPI device. >*/ > > -#ifdef CONFIG_PNPACPI > +#ifdef CONFIG_PNP > > #include > > @@ -684,11 +684,11 @@ static void __exit cmos_exit(void) > } > module_exit(cmos_exit); > > -#else/* no PNPACPI */ > +#else/* no PNP */ > > /**/ > > -/* Platform setup should have set up an RTC device, when PNPACPI is > +/* Platform setup should have set up an RTC device, when PNP is >* unavailable ... this could happen even on (older) PCs. 
>*/ > > @@ -734,7 +734,7 @@ static void __exit cmos_exit(void) > module_exit(cmos_exit); > > > -#endif /* !PNPACPI */ > +#endif /* !PNP */ > > MODULE_AUTHOR("David Brownell"); > MODULE_DESCRIPTION("Driver for PC-style 'CMOS' RTCs"); > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.22 5/5] iw_cxgb3: Update required firmware revision to 4.0.0.
> Update required firmware revision to 4.0.0. Hmm... should we fold this into the earlier patch, which actually needs this new FW? Or at least merge this patch first? Also, is it cool with everyone to require a new FW, even for users who might not be using (or even building) the RDMA driver? I'm not sure what a good solution would be really, so maybe the pain of forcing everyone to update FW is the least bad thing to do. - R. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] v3: utimensat implementation
One comment I received suggested cleaning up the overflow tests. There is no reason not to perform the full microsecond overflow test in those two places. The compiler reduces the four conditions to just two tests, so there is no performance problem.

Signed-off-by: Ulrich Drepper <[EMAIL PROTECTED]>

diff --git a/arch/x86_64/ia32/ia32entry.S b/arch/x86_64/ia32/ia32entry.S
index 796df69..12611c8 100644
--- a/arch/x86_64/ia32/ia32entry.S
+++ b/arch/x86_64/ia32/ia32entry.S
@@ -714,9 +714,10 @@ ia32_sys_call_table:
 	.quad compat_sys_get_robust_list
 	.quad sys_splice
 	.quad sys_sync_file_range
-	.quad sys_tee
+	.quad sys_tee			/* 315 */
 	.quad compat_sys_vmsplice
 	.quad compat_sys_move_pages
 	.quad sys_getcpu
 	.quad sys_epoll_pwait
+	.quad compat_sys_utimensat	/* 320 */
 ia32_syscall_end:
diff --git a/fs/compat.c b/fs/compat.c
index 040a8be..d8a8cb5 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -79,28 +79,55 @@ int compat_printk(const char *fmt, ...)
  */
 asmlinkage long compat_sys_utime(char __user *filename, struct compat_utimbuf __user *t)
 {
-	struct timeval tv[2];
+	struct timespec tv[2];
 
 	if (t) {
 		if (get_user(tv[0].tv_sec, &t->actime) ||
 		    get_user(tv[1].tv_sec, &t->modtime))
 			return -EFAULT;
-		tv[0].tv_usec = 0;
-		tv[1].tv_usec = 0;
+		tv[0].tv_nsec = 0;
+		tv[1].tv_nsec = 0;
 	}
 	return do_utimes(AT_FDCWD, filename, t ? tv : NULL);
 }
 
+asmlinkage long compat_sys_utimensat(unsigned int dfd, char __user *filename, struct compat_timespec __user *t)
+{
+	struct timespec tv[2];
+
+	if (t) {
+		if (get_compat_timespec(&tv[0], &t[0]) ||
+		    get_compat_timespec(&tv[1], &t[1]))
+			return -EFAULT;
+
+		if ((tv[0].tv_nsec == UTIME_OMIT || tv[0].tv_nsec == UTIME_NOW)
+		    && tv[0].tv_sec != 0)
+			return -EINVAL;
+		if ((tv[1].tv_nsec == UTIME_OMIT || tv[1].tv_nsec == UTIME_NOW)
+		    && tv[1].tv_sec != 0)
+			return -EINVAL;
+
+		if (tv[0].tv_nsec == UTIME_OMIT && tv[1].tv_nsec == UTIME_OMIT)
+			return 0;
+	}
+	return do_utimes(dfd, filename, t ? tv : NULL);
+}
+
 asmlinkage long compat_sys_futimesat(unsigned int dfd, char __user *filename, struct compat_timeval __user *t)
 {
-	struct timeval tv[2];
+	struct timespec tv[2];
 
 	if (t) {
 		if (get_user(tv[0].tv_sec, &t[0].tv_sec) ||
-		    get_user(tv[0].tv_usec, &t[0].tv_usec) ||
+		    get_user(tv[0].tv_nsec, &t[0].tv_usec) ||
 		    get_user(tv[1].tv_sec, &t[1].tv_sec) ||
-		    get_user(tv[1].tv_usec, &t[1].tv_usec))
+		    get_user(tv[1].tv_nsec, &t[1].tv_usec))
 			return -EFAULT;
+		if (tv[0].tv_nsec >= 1000000 || tv[0].tv_nsec < 0 ||
+		    tv[1].tv_nsec >= 1000000 || tv[1].tv_nsec < 0)
+			return -EINVAL;
+		tv[0].tv_nsec *= 1000;
+		tv[1].tv_nsec *= 1000;
 	}
 	return do_utimes(dfd, filename, t ? tv : NULL);
 }
diff --git a/fs/utimes.c b/fs/utimes.c
index 99cf2cb..d854af7 100644
--- a/fs/utimes.c
+++ b/fs/utimes.c
@@ -3,6 +3,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -76,7 +77,7 @@ out:
  * must be owner or have write permission.
  * Else, update from *times, must be owner or super user.
  */
-long do_utimes(int dfd, char __user *filename, struct timeval *times)
+long do_utimes(int dfd, char __user *filename, struct timespec *times)
 {
 	int error;
 	struct nameidata nd;
@@ -100,11 +101,21 @@ long do_utimes(int dfd, char __user *filename, struct timeval *times)
 		if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
 			goto dput_and_out;
 
-		newattrs.ia_atime.tv_sec = times[0].tv_sec;
-		newattrs.ia_atime.tv_nsec = times[0].tv_usec * 1000;
-		newattrs.ia_mtime.tv_sec = times[1].tv_sec;
-		newattrs.ia_mtime.tv_nsec = times[1].tv_usec * 1000;
-		newattrs.ia_valid |= ATTR_ATIME_SET | ATTR_MTIME_SET;
+		if (times[0].tv_nsec == UTIME_OMIT)
+			newattrs.ia_valid &= ~ATTR_ATIME;
+		else if (times[0].tv_nsec != UTIME_NOW) {
+			newattrs.ia_atime.tv_sec = times[0].tv_sec;
+			newattrs.ia_atime.tv_nsec = times[0].tv_nsec;
+			newattrs.ia_valid |= ATTR_ATIME_SET;
+		}
+
+		if (times[1].tv_nsec == UTIME_OMIT)
+			newattrs.ia_valid &= ~ATTR_MTIME;
+		else if (times[1].tv_nsec != UTIME_NOW) {
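The two checks this patch adds (the full microsecond range test before scaling to nanoseconds, and the rule that a special tv_nsec marker is only accepted with tv_sec == 0) can be tried out in user space. What follows is a minimal sketch, not code from the patch: `struct sk_timespec`, `sk_usec_to_timespec()`, `sk_check_special()` and the numeric values of the two `SK_UTIME_*` markers are all assumptions made for illustration.

```c
#include <errno.h>

/* Hypothetical stand-in for the kernel's struct timespec. */
struct sk_timespec {
	long tv_sec;
	long tv_nsec;
};

/* Assumed marker values, chosen only so that they lie outside the
 * valid nanosecond range [0, 1000000000). */
#define SK_UTIME_NOW  ((1L << 30) - 1L)
#define SK_UTIME_OMIT ((1L << 30) - 2L)

/* Convert seconds + microseconds to a timespec. The microsecond value
 * is range-checked first, then scaled to nanoseconds. */
int sk_usec_to_timespec(long sec, long usec, struct sk_timespec *out)
{
	if (usec < 0 || usec >= 1000000)
		return -EINVAL;
	out->tv_sec = sec;
	out->tv_nsec = usec * 1000;
	return 0;
}

/* A special tv_nsec marker is rejected unless tv_sec is zero. */
int sk_check_special(const struct sk_timespec *ts)
{
	if ((ts->tv_nsec == SK_UTIME_NOW || ts->tv_nsec == SK_UTIME_OMIT)
	    && ts->tv_sec != 0)
		return -EINVAL;
	return 0;
}
```

Doing the range check before the multiplication keeps the scaled value below one second's worth of nanoseconds, so it cannot collide with either marker.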
[PATCH] (i386) fix wrong comment for syscall stack layout
Fix wrong comment for syscall stack layout. The `ret_from_sys_call' label no longer exists; the `syscall_exit' label was introduced instead.

Signed-off-by: Satoru Takeuchi <[EMAIL PROTECTED]>

Index: linux-2.6.21/arch/i386/kernel/entry.S
===
--- linux-2.6.21.orig/arch/i386/kernel/entry.S 2007-04-27 11:45:46.0 +0900
+++ linux-2.6.21/arch/i386/kernel/entry.S 2007-04-27 11:46:00.0 +0900
@@ -15,7 +15,7 @@
  * I changed all the .align's to 4 (16 byte alignment), as that's faster
  * on a 486.
  *
- * Stack layout in 'ret_from_system_call':
+ * Stack layout in 'syscall_exit':
  * ptrace needs to have all regs on the stack.
  * if the order here is changed, it needs to be
  * updated in fork.c:copy_process, signal.c:do_signal,
Re: [00/17] Large Blocksize Support V3
On Fri, 27 Apr 2007 12:27:31 +1000 David Chinner <[EMAIL PROTECTED]> wrote:

> On Thu, Apr 26, 2007 at 07:04:38PM -0700, Andrew Morton wrote:
> > On Tue, 24 Apr 2007 15:21:05 -0700 [EMAIL PROTECTED] wrote:
> >
> > > This patchset modifies the Linux kernel so that larger block sizes than
> > > page size can be supported. Larger block sizes are handled by using
> > > compound pages of an arbitrary order for the page cache instead of
> > > single pages with order 0.
> >
> > Something I was looking for but couldn't find: suppose an application takes
> > a pagefault against the third 4k page of an order-2 pagecache "page". We
> > need to instantiate a pte against find_get_page(offset/4)+3. But these
> > patches don't touch mm/memory.c at all and filemap_nopage() appears to
> > return the zeroeth 4k page all the time in that case.
> >
> > So.. what am I missing, and how does that part work?
>
> "mmap not supported yet" ;)

erk. I suspect this will have its sticky paws all over core mm.

> > Also, afaict your important requirements would be met by retaining
> > PAGE_CACHE_SIZE=4k and simply ensuring that pagecache is populated by
> > physically contiguous pages
>
> Sure, that addresses the larger I/O side of things, but it doesn't address
> the large filesystem blocksize issues that can only be solved with some kind
> of page aggregation abstraction.

a) That wasn't a part of Christoph's original rationale list, so forgive me for thinking it is not so important and got snuck in post-facto when things got tough.

b) I don't immediately see why a filesystem cannot implement larger blocksizes via this scheme - instantiate and lock four pages and go for it.

> Compound pages and high order page cache
> indexing solves this extremely neatly, regardless of whether the compound
> page is contiguous or not.

We cannot say anything about neatness until we've seen mmap.
Re: 2.6.21-rc7-mm2 breaks 'lvm vgscan'.
On Thu, 26 Apr 2007 22:31:15 -0400 [EMAIL PROTECTED] wrote:

> On Wed, 25 Apr 2007 22:57:16 PDT, Andrew Morton said:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/
>
> This addition in -rc7-mm1 breaks my laptop (Dell Latitude D820, x86_64 kernel)
>
> gregkh-driver-sysfs-fix-i_ino-handling-in-sysfs.patch

Thanks for doing the bisection. It is boring, but helps so much.

> Any ideas?

You cc'ed the right guy ;)
Re: [REPORT] First "glitch1" results, 2.6.21-rc7-git6-CFSv5 + SD 0.46
On Thursday 26 April 2007 18:56, Con Kolivas wrote: > On Friday 27 April 2007 08:00, Bill Davidsen wrote: > > Ingo Molnar wrote: > > > * Ed Tomlinson <[EMAIL PROTECTED]> wrote: > > >>> SD 0.46 1-2 FPS > > >>> cfs v5 nice -19 219-233 FPS > > >>> cfs v5 nice 0 1000-1996 > > >> > > >>cfs v5 nice -10 60-65 FPS > > > > > > the problem is, the glxgears portion of this test is an _inverse_ > > > testcase. > > > > > > The reason? glxgears on true 3D hardware will _not_ use X, it will > > > directly use the 3D driver of the kernel. So by renicing X to -19 you > > > give the xterms more chance to show stuff - the performance of the > > > glxgears will 'degrade' - but that is what you asked for: glxgears is > > > 'just another CPU hog' that competes with X, it's not a "true" X client. > > > > > > if you are after glxgears performance in this test then you'll get the > > > best performance out of this by renicing X to +19 or even SCHED_BATCH. > > > > Several points on this... > > > > First, I don't think this is accelerated in the way you mean, the > > machine is a test server, with motherboard video using the 945G video > > driver. Given the limitations of the support in that setup, I don't > > think it qualified as "true 3D hardware," although I guess I could try > > using the vesafb version as a test. > > > > The 2nd thing I note is that on FC6 this scheduler seems to confuse > > 'top' to some degree, since the glxgears is shown as taking 51% of the > > CPU (one core), while the state breakdown shows about 73% in idle, > > waitio, and int. image attached. > > top by itself certainly cannot be trusted to give true representation of the > cpu usage I'm afraid. It's not as convoluted as, say, trying to track memory > usage of an application, but top's resolution being tied to HZ accounting > makes it not reliable in that regard. 
> > > > After I upgrade the kernel and cfs to the absolute latest I'll repeat > > this, as well as test with vesafb, and my planned run under heavy load. > > I have a problem with your test case Bill. Its behaviour would depend on how > gpu bound vs cpu bound vs accelerated vs non-accelerated your graphics card > is. I get completely different results to those of the other testers given > the different hardware configuration and I don't think my results are > valuable. My problem with this testcase is - What would you define > as "perfect" behaviour for your test case? It seems far too arbitrary. Con, One thing I did not mention in all this is that renicing the glxgears process to -10 gets SD to give about 1000FPS, indeed you get most of this performance at -5 too. All in all SD does a very good job here. Get well soon! Ed - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [mmc] alternative TI FM MMC/SD driver for 2.6.21-rc7
--- Pierre Ossman <[EMAIL PROTECTED]> wrote:

> Sergey Yanovich wrote:
> >
> > I have found it easier to rewrite the driver, than to fix.
>
> Before you get your hopes up, this development model is not one that will get
> your code merged upstream. You should really try to work with Alex, not side
> step him. Drivers are rarely complex enough to warrant, or even have room for, a
> rewrite. And judging from your code it looks more like reorganising the code
> that's already there.

It is a sad truth. Instead of raising real issues that may remain in the driver, I was presented with "non-proof" that the bus-adapter-device architecture I'm using is somehow bad and that the driver should be turned into a monolithic blob, using config variables to disable unneeded functionality. Considering that udev handles automatic loading of the drivers just fine (so it's not an end user issue at any rate), I don't see any justification for the change.
2.6.21-rc7-mm2 breaks 'lvm vgscan'.
On Wed, 25 Apr 2007 22:57:16 PDT, Andrew Morton said:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/

This addition in -rc7-mm1 breaks my laptop (Dell Latitude D820, x86_64 kernel)

gregkh-driver-sysfs-fix-i_ino-handling-in-sysfs.patch

The initrd on my system does an 'lvm vgscan' to get the root filesystem accessible. Under -rc5-mm2, this works fine. For -rc7-mm[12], it finds the disk:

ata_piix :00:1f.2: version 2.11
ata_piix :00:1f.2: MAP [ P0 P2 IDE IDE ]
ACPI: PCI Interrupt :00:1f.2[B] -> GSI 17 (level, low) -> IRQ 17
PCI: Setting latency timer of device :00:1f.2 to 64
scsi0 : ata_piix
scsi1 : ata_piix
ata1: SATA max UDMA/133 cmd 0x000101f0 ctl 0x000103f6 bmdma 0x0001bfa0 irq 14
ata2: PATA max UDMA/100 cmd 0x00010170 ctl 0x00010376 bmdma 0x0001bfa8 irq 15
ata1.00: ata_hpa_resize 1: sectors = 156301488, hpa_sectors = 156301488
ata1.00: ATA-7: ST980825AS, 8.04, max UDMA/133
ata1.00: 156301488 sectors, multi 8: LBA48 NCQ (depth 0/32)
ata1.00: ata_hpa_resize 1: sectors = 156301488, hpa_sectors = 156301488
ata1.00: configured for UDMA/133
ata2.00: ATAPI, max UDMA/33
ata2.00: configured for UDMA/33
scsi 0:0:0:0: Direct-Access ATA ST980825AS 8.04 PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1 sda2
sd 0:0:0:0: [sda] Attached SCSI disk

sda1 is an ext3 /boot, sda2 is an LVM space covering the rest of the disk, so we're doing well so far.

The 'lvm vgscan' fails and says 'No volume groups found', with no useful kernel messages issued. Then we get the infamous "Kernel panic - not syncing: Attempted to kill init!" when it can't find the root file system and we fall off the end of the initrd.

Any ideas?
Re: MAINTAINERS file out of date?
On Thu, Apr 26, 2007 at 06:12:43PM -0700, H. Peter Anvin wrote: >WANG Cong wrote: >>> >>> I have considered myself as a rather unofficial maintainer of this code, >>> and wouldn't mind make it official now when I actually have a job which >>> both cares about and actually can support my upstream Linux activities, >>> which was a major pain for a while. >>> >>> Overall, there is a lot of cleanup which really is needed in the i386 >>> boot process; I have done some work on it already, but more is needed. >> >> Sounds interesting. Can you point me what needs to be done exactly? Maybe I >> can help you. ;) > >There was a long thread on the linux-virtualization list >(@lists.linux-foundation.org) just a few days ago. The biggest single >issue right now is probably how we transition from the bootup page >tables to the "real" init_mm page tables, but the real-mode code also >needs a massive overhaul (especially so since this code can and should >be shared with x86-64); in particular I really want to get rid of the >insane segment handling, where segments are constantly recalculated for >no good reason. > >For the real-mode stuff, I have some patches already in the works for >this. Eric W. Biederman has also done a lot of work in this area. > > -hpa Thanks for your point. I know little about virtualization, maybe can't help much. But I am interested in other things you mentioned. AFAIK, segments can't be avoided on i386, and Linux uses them very little, how are they recalculated constantly? Regards! - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [00/17] Large Blocksize Support V3
On Thu, Apr 26, 2007 at 07:04:38PM -0700, Andrew Morton wrote: > On Tue, 24 Apr 2007 15:21:05 -0700 [EMAIL PROTECTED] wrote: > > > This patchset modifies the Linux kernel so that larger block sizes than > > page size can be supported. Larger block sizes are handled by using > > compound pages of an arbitrary order for the page cache instead of > > single pages with order 0. > > Something I was looking for but couldn't find: suppose an application takes > a pagefault against the third 4k page of an order-2 pagecache "page". We > need to instantiate a pte against find_get_page(offset/4)+3. But these > patches don't touch mm/memory.c at all and filemap_nopage() appears to > return the zeroeth 4k page all the time in that case. > > So.. what am I missing, and how does that part work? "mmap not supported yet" ;) > Also, afaict your important requirements would be met by retaining > PAGE_CACHE_SIZE=4k and simply ensuring that pagecache is populated by > physically contiguous pages Sure, that addresses the larger I/O side of things, but it doesn't address the large filesystem blocksize issues that can only be solved with some kind of page aggregation abstraction. Compound pages and high order page cache indexing solves this extremely neatly, regardless of whether the compound page is contiguous or not. Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] utimensat implementation
Neil Brown wrote:
> Does it also specify how to find out what granularity is used by the
> filesystem? I had a need for this just recently and couldn't see any
> way to extract it.

That's still on the table. We might end up with an fpathconf() solution.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
Re: [patch] unprivileged mounts update
Quoting Miklos Szeredi ([EMAIL PROTECTED]): > > Quoting Miklos Szeredi ([EMAIL PROTECTED]): > > > > So then as far as you're concerned, the patches which were in -mm will > > > > remain unchanged? > > > > > > Basically yes. I've merged the update patch, which was not yet added > > > to -mm, did some cosmetic code changes, and updated the patch headers. > > > > > > There's one open point, that I think we haven't really explored, and > > > that is the propagation semantics. I think you had the idea, that a > > > propagated mount should inherit ownership from the parent into which > > > it was propagated. > > > > Don't think that was me. I stayed out of those early discussions > > because I wasn't comfortable guessing at the proper semantics yet. > > Yes, sorry, it was Eric's suggestion. > > > But really, I, as admin, have to set up both propagation and user mounts > > for a particular subtree, so why would I *not* want user mounts to be > > propagated? > > > > So, in my own situation, I have done > > > > make / rshared > > mount --bind /share /share > > make /share unbindable > > for u in $users; do > > mount --rbind / /share/$u/root > > make /share/$u/root rslave > > make /share/$u/root rshared > > mount --bind -o user=$u /share/$u/root/home/$u > > /share/$u/root/home/$u > > done > > > > All users get chrooted into /share/$USER/root, some also get their own > > namespace. Clearly if a user in a new namespace does > > > > mount --bind -o user=me ~/somedir ~/otherdir > > > > then logs out, and logs back in, I want the ~/otherdir in the new > > namespace (and the one in the 'init' namespace) to also be owned by > > 'me'. > > > > > That sounds good if everyone agrees? > > > > I've shown where I think propagating the mount owner is useful. Can you > > detail a scenario where doing so would be bad? Then we can work toward > > semantics that make sense... 
> > But in your example, the "propagated mount inherits ownership from > parent mount" would also work, since in all namespaces the owner of > the parent would necessary be "me". true. > The "inherits parent" semantics would work better for example in the > "all nosuid" namespace, where the user is free to modify it's mount > namespace. > > If for example propagation is set up from the initial namespace to > this user's namespace and a new mount is added to the initial > namespace, it would be nice if the propagated new mount would also be > owned by the user (and be "nosuid" of course). ok, so in the example i gave, this would be the admin in the initial namespace mounting something under /home/$USER/, which gets propagated to slave /share/$USER/root/home/$USER, where we would want a different mount owner. > Does the above make sense? I'm not sure I've explained clearly > enough. I think I see. Sounds like inherit from parent does the right thing all around, at least in cases we've thought of so far. thanks, -serge - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ANNOUNCE] battery2 git repository
On Thu, Apr 26, 2007 at 05:09:28PM -0700, Greg KH wrote:
> On Fri, Apr 27, 2007 at 03:29:02AM +0400, Anton Vorontsov wrote:
> >
> > To not confuse with David own battery-2.6 repository, it's called
> > battery2-2.6, and can be found here:
>
> So, I'm still not quite understanding this, is David giving up on his
> code now that you have pulled his changes into your tree? Or is there
> some reason there are still two competing versions here?

[ Of course David will answer for himself better. Below is what I'm seeing. ]

David said he intends to port his OLPC and Apple PMU work to the battery2 class implementation (and I volunteered to assist, i.e. port them myself ;-). So, once these two drivers are ported, there will be no reason for battery-2.6 to exist anymore.

As for the competition... No competition took place at all; there were two parallel efforts, each for one of two *independent* projects (OLPC and handhelds.org). battery2 should be suitable for both.

David?

> thanks,
>
> greg k-h

Sincerely,

-- 
Anton Vorontsov
email: [EMAIL PROTECTED]
backup email: [EMAIL PROTECTED]
irc://irc.freenode.org/bd2
Re: [00/17] Large Blocksize Support V3
On Tue, 24 Apr 2007 15:21:05 -0700 [EMAIL PROTECTED] wrote:

> This patchset modifies the Linux kernel so that larger block sizes than
> page size can be supported. Larger block sizes are handled by using
> compound pages of an arbitrary order for the page cache instead of
> single pages with order 0.

Something I was looking for but couldn't find: suppose an application takes a pagefault against the third 4k page of an order-2 pagecache "page". We need to instantiate a pte against find_get_page(offset/4)+3. But these patches don't touch mm/memory.c at all and filemap_nopage() appears to return the zeroeth 4k page all the time in that case.

So.. what am I missing, and how does that part work?

Also, afaict your important requirements would be met by retaining PAGE_CACHE_SIZE=4k and simply ensuring that pagecache is populated by physically contiguous pages - so instead of allocating and adding one 4k page, we allocate an order-2 page and sprinkle all four page*'s into the radix tree in one hit. That should be fairly straightforward to do, and could be made indistinguishably fast from doing a single 16k page for some common pagecache operations (gang-insert, gang-lookup).

The BIO and block layers will do-the-right-thing with that pagecache and you end up with four times more data in the SG lists, worst-case.
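The index arithmetic behind the fault case raised above (look up the compound page at offset/4, then step to a base page inside it) can be written out as a small stand-alone helper. This is only a sketch of the arithmetic: `struct sk_subpage` and `sk_locate_subpage()` are hypothetical names, and nothing here is taken from the patchset or from mm/.

```c
/* Result of splitting a base-page index (in 4k units) into the index
 * of the enclosing higher-order page and the position inside it. */
struct sk_subpage {
	unsigned long head_index; /* index in units of (1 << order) base pages */
	unsigned long sub_offset; /* base page within the higher-order page */
};

/* For order 2, head_index is base_index / 4 and sub_offset is
 * base_index % 4, which is the offset/4 lookup plus the step into
 * the compound page described in the mail. */
struct sk_subpage sk_locate_subpage(unsigned long base_index, unsigned int order)
{
	struct sk_subpage r;

	r.head_index = base_index >> order;
	r.sub_offset = base_index & ((1UL << order) - 1UL);
	return r;
}
```

With order 0 the helper degenerates to the existing one-page-per-index behaviour, which is one way to see that the fault path only needs the extra step when the order is non-zero.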
Re: [PATCH] utimensat implementation
On Thursday April 26, [EMAIL PROTECTED] wrote:
> The next revision of POSIX will support fine-grained filesystem
> timestamps the way we already support. struct stat will report
> nanosecond values.

So far so good. Does it also specify how to find out what granularity is used by the filesystem? I had a need for this just recently and couldn't see any way to extract it.

[If the mtime of a file matches the current time, then you cannot cache the contents of the file. You have to wait until the mtime is in the past. Without knowing the granularity, you cannot tell if the mtime still matches current time or not]

Thanks,
NeilBrown
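The caching rule in the bracketed note can be made concrete. The sketch below assumes the filesystem stamps files with the current time rounded down to its granularity, and that all values are non-negative nanosecond counts; `sk_mtime_is_settled()` is a hypothetical helper, not an existing interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when a file with modification time mtime_ns may be
 * cached at time now_ns, given a timestamp granularity of gran_ns.
 * Assumes gran_ns > 0 and non-negative times. */
bool sk_mtime_is_settled(int64_t mtime_ns, int64_t now_ns, int64_t gran_ns)
{
	/* A write happening right now would be stamped with this value. */
	int64_t now_tick = now_ns - (now_ns % gran_ns);

	/* Only an mtime strictly older than the current tick is safe:
	 * otherwise a later write could leave the mtime unchanged. */
	return mtime_ns < now_tick;
}
```

The same (mtime, now) pair gives different answers for a one-second and a one-nanosecond granularity, which is why the check cannot be done without knowing the granularity.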
Re: [RFC] [PATCH] cpufreq: allow full selection of default governors
On Tue, Apr 24, 2007 at 08:03:27PM -0400, Dave Jones wrote:
> On Tue, Apr 24, 2007 at 03:05:36PM -0700, Nish Aravamudan wrote:
> > On 4/24/07, Dave Jones <[EMAIL PROTECTED]> wrote:
> > > On Tue, Apr 24, 2007 at 09:03:23PM +, William Heimbigner wrote:
> > > > The following patches should allow selection of conservative, powersave, and
> > > > ondemand in the kernel configuration.
> > >
> > > This has been rejected several times already.
> > > Ondemand and conservative isn't a viable governor for all cpufreq
> > > implementations (ie, ones with high switching latencies).
> >
> > This piques my curiosity -- some governors don't work with some
> > cpufreq implementations. Are those implementations in the kernel or in
> > userspace? If in the kernel, then perhaps there should be some
> > dependency expressed there in Kconfig between cpufreq implementation
> > and the available governors
>
> it can't be solved that easily. powernow-k8 for example is fine to
> use with ondemand on newer systems, where the latency is low.
> On older models however, it isn't.
>
> > > Also, see the
> > > comment in the Kconfig a few lines above where you are adding this.
> >
> > Are these governors unfixable? If
>
> tbh, I've forgotten the original issues that caused the comment
> to be placed there. Dominik ?

Not unfixable, but: cpufreq is currently[*] built around the assumption that at least one governor is correctly initialized or can be brought to work when a CPU is registered with the cpufreq core.

	Dominik

[*] That is, the last time I looked at it ;)
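Since the same driver can have low latency on one model and high latency on another, a build-time dependency cannot express the constraint; a run-time check can. The following is a purely illustrative sketch of such a check with a fallback, so that one governor always ends up usable. `sk_pick_governor()` and the 10 ms threshold are assumptions made for this example, not the cpufreq core's actual logic.

```c
#include <string.h>

/* Assumed threshold (in nanoseconds) above which a dynamic governor
 * is treated as unusable in this sketch. */
#define SK_LATENCY_LIMIT_NS (10UL * 1000UL * 1000UL)

/* Return the governor to start with: the preferred one if it can work
 * with the reported transition latency, otherwise "performance". */
const char *sk_pick_governor(const char *preferred,
			     unsigned long transition_latency_ns)
{
	int dynamic = strcmp(preferred, "ondemand") == 0 ||
		      strcmp(preferred, "conservative") == 0;

	if (dynamic && transition_latency_ns > SK_LATENCY_LIMIT_NS)
		return "performance";
	return preferred;
}
```

A static governor such as powersave passes through unchanged regardless of latency, because it never switches frequency on its own.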
Re: [PATCH] change global zonelist order on NUMA v2
On Thu, 26 Apr 2007 18:25:10 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Fri, 27 Apr 2007, KAMEZAWA Hiroyuki wrote:
>
> > > DMA memory.
> > >
> > It seems a bit complicated. If we do so, following can occur,
> >
> > Node1: cpu0,1,2,3
> > Node0: cpu4,5,6,7
>
> We were discussing a two node NUMA system. If you have more put it onto
> the last.
>
Hmm, from a technical point of view, renumbering may be an option. But I feel that it does not look natural and differs from what users expect...

-Kame
RE: [PATCH 1/1] IBM PPC EMAC driver:improved support for PHYconfiguration
On Thu, 2007-04-26 at 17:28 -0700, Jeff Haran wrote: > > -Original Message- > > From: Benjamin Herrenschmidt [mailto:[EMAIL PROTECTED] > > Sent: Thursday, April 26, 2007 5:19 PM > > To: Jeff Haran > > Cc: [EMAIL PROTECTED]; linux-kernel@vger.kernel.org > > Subject: Re: [PATCH 1/1] IBM PPC EMAC driver:improved support > > for PHYconfiguration > > > > On Thu, 2007-04-26 at 16:18 -0700, Jeff Haran wrote: > > > From: Jeff Haran <[EMAIL PROTECTED]> > > > > > > This patch fixes some problems I found while debugging the IBM EMAC > > ... > > > Your patch appears to have been line wrapped by your mailer though... > > Benjamin, > > Friggin Outlook and our M$ centered corporate IT environment. Even when > you tell it to send plain text it modifies it. This was my first attempt > at patch submission, so I don't know all the hoops yet. > > I can try sending the patch again if I can figure out how to disable the > line wrap. Or if you have some FTP server I can put to I can send you > the patch file directly there. > > Please let me know which you'd prefer. In the worst case, just send the patch as a compressed attachment. Cheers, Ben. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
PROBLEM
PCI transparent bridge This report is FYI, filed because of the kernel message. Using the kernel parameter seems to work fine. On boot there is a message that says this: Kernel:PCI: Transparent bridge - :00:1e.0 Kernel:PCI: Bus #04 (-#07) is hidden behind transparent bridge #03 (-#04) (try 'pci=assign-busses') When I pass 'pci=assign-busses' to the kernel then the message is only this: Kernel:PCI: Transparent bridge - :00:1e.0 Keywords: pci transparent bridge [EMAIL PROTECTED]:/usr/src/linux-source-2.6.20$ cat /proc/version Linux version 2.6.20-04-26-07 (2.6.20-1) ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Thu Apr 26 04:15:26 CDT 2007 Scripts/ver_linux output: Linux debian5 2.6.20-04-26-07 #1 SMP Thu Apr 26 04:15:26 CDT 2007 i686 GNU/Linux Gnu C 4.1.2 Gnu make 3.81 binutils 2.17 util-linux 2.12r mount 2.12r module-init-tools 3.3-pre2 e2fsprogs 1.40-WIP pcmciautils014 pcmcia-cs 3.2.8 Linux C Library2.3.6 Dynamic linker (ldd) 2.3.6 Procps 3.2.7 Net-tools 1.60 Console-tools 0.2.3 Sh-utils 5.97 udev 105 wireless-tools 28 Modules Loaded rfcomm l2cap bluetooth nfsd exportfs nfs lockd nfs_acl sunrpc ppdev lp thermal fan button processor ac battery ipv6 sbp2 scsi_mod loop eth1394 pcmcia firmware_class snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer yenta_socket snd rsrc_nonstatic i2c_i801 iTCO_wdt ide_cd soundcore snd_page_alloc cdrom pcspkr parport_pc parport uhci_hcd ehci_hcd i2c_core iTCO_vendor_support r8169 sdhci mmc_core ohci1394 ieee1394 pcmcia_core intel_agp agpgart serio_raw rtc usbcore psmouse ext3 jbd mbcache ide_disk generic piix ide_core evdev [EMAIL PROTECTED]:/usr/src/linux-source-2.6.20$ cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz stepping: 6 cpu MHz : 1828.883 cache size : 2048 KB physical id : 0 siblings: 2 core id : 0 cpu cores : 2 fdiv_bug: no hlt_bug : no f00f_bug: no coma_bug: no fpu : yes 
fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm constant_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm bogomips: 3661.67 clflush size: 64 processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz stepping: 6 cpu MHz : 1828.883 cache size : 2048 KB physical id : 0 siblings: 2 core id : 1 cpu cores : 2 fdiv_bug: no hlt_bug : no f00f_bug: no coma_bug: no fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm constant_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm bogomips: 3657.78 clflush size: 64 [EMAIL PROTECTED]:/usr/src/linux-source-2.6.20$ cat /proc/modules rfcomm 35288 0 - Live 0xf8d0d000 l2cap 22016 5 rfcomm, Live 0xf8c73000 bluetooth 48484 4 rfcomm,l2cap, Live 0xf8c8e000 nfsd 202672 17 - Live 0xf8d86000 exportfs 5568 1 nfsd, Live 0xf8c36000 nfs 219308 4 - Live 0xf8d4f000 lockd 58056 3 nfsd,nfs, Live 0xf8c7e000 nfs_acl 3520 2 nfsd,nfs, Live 0xf8c22000 sunrpc 148284 13 nfsd,nfs,lockd,nfs_acl, Live 0xf8ce7000 ppdev 8644 0 - Live 0xf8c67000 lp 11816 0 - Live 0xf8c41000 thermal 13832 0 - Live 0xf8c3c000 fan 4676 0 - Live 0xf8c2a000 button 7760 0 - Live 0xf8ae7000 processor 30008 1 thermal, Live 0xf8c2d000 ac 5124 0 - Live 0xf8af battery 9860 0 - Live 0xf8aec000 ipv6 233184 14 - Live 0xf8cad000 sbp2 21380 0 - Live 0xf8c1b000 scsi_mod 132812 1 sbp2, Live 0xf8c45000 loop 15816 0 - Live 0xf8adc000 eth1394 18500 0 - Live 0xf8ae1000 pcmcia 35436 0 - Live 0xf8ac2000 firmware_class 9536 1 pcmcia, Live 0xf89fc000 snd_hda_intel 19608 0 - Live 0xf8a8e000 snd_hda_codec 151680 1 snd_hda_intel, Live 0xf8af3000 snd_pcm_oss 38944 0 - Live 0xf8ab7000 snd_mixer_oss 15424 1 snd_pcm_oss, Live 0xf89f7000 snd_pcm 71172 3 
snd_hda_intel,snd_hda_codec,snd_pcm_oss, Live 0xf8aa4000 snd_timer 20612 1 snd_pcm, Live 0xf89f yenta_socket 24844 1 - Live 0xf89e8000 snd 48036 6 snd_hda_intel,snd_hda_codec,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_timer, Live 0xf8a81000 rsrc_nonstatic 12096 1 yenta_socket, Live 0xf896c000 i2c_i801 7500 0 - Live 0xf897 iTCO_wdt 10596 0 - Live 0xf893c000 ide_cd 36576 0 - Live 0xf89de000 soundcore 7520 1 snd, Live
[PATCH -rt] Stop interrupt storm for fasteoi.
Ingo, I've spent several days banging my head on this bug, and I finally found it. I originally thought we had a bug with the latency tracer, since it seemed to only occur when I turned on latency tracing. But I guess it just changed the timings to cause the bug to happen. Now that I found where the bug is, I don't know how it ever worked, even without the tracing. When taking an Ethernet interrupt, it was handled by handle_fasteoi_irq. handle_fasteoi_irq, when the irq is handled by a thread, sets the irq INPROGRESS and masks the irq. Before leaving handle_fasteoi_irq, a call to desc->chip->eoi is called. For the apic, this will call move_native_irq. In the -rt kernel, move_native_irq masks the irq, moves it, and then blindly unmasks it. So when interrupts are turned on next, we take an interrupt storm. The other handlers besides handle_fasteoi_irq mask the irq regardless of whether the irq is INPROGRESS. But handle_fasteoi_irq will not mask it, if the irq is already INPROGRESS, and just returns. So we keep taking the same interrupt. So, I change move_native_irq to not mask and unmask if the irq is currently INPROGRESS. But... I'm not sure if this is ok or not, since I don't know all the uses to this. My original patch was just to do a +@@ -68,6 +68,9 @@ void move_native_irq(int irq) if (unlikely(desc->status & IRQ_DISABLED)) return; if (unlikely(desc->status & IRQ_INPROGRESS)) + return; + desc->chip->mask(irq); move_masked_irq(irq); desc->chip->unmask(irq); But I thought that this might be too big of a hammer to this nail. So I changed it to the patch below. 
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]> Index: linux-2.6.21-rt1-i386/kernel/irq/migration.c === --- linux-2.6.21-rt1-i386.orig/kernel/irq/migration.c +++ linux-2.6.21-rt1-i386/kernel/irq/migration.c @@ -61,6 +61,7 @@ void move_masked_irq(int irq) void move_native_irq(int irq) { struct irq_desc *desc = irq_desc + irq; + int mask = 1; if (likely(!(desc->status & IRQ_MOVE_PENDING))) return; @@ -68,8 +69,17 @@ void move_native_irq(int irq) if (unlikely(desc->status & IRQ_DISABLED)) return; - desc->chip->mask(irq); + /* +* If the irq is already in progress, it should be masked. +* If we unmask it, we might cause an interrupt storm on RT. +*/ + if (unlikely(desc->status & IRQ_INPROGRESS)) + mask = 0; + + if (mask) + desc->chip->mask(irq); move_masked_irq(irq); - desc->chip->unmask(irq); + if (mask) + desc->chip->unmask(irq); }
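For anyone who wants to poke at the logic outside the kernel, here is a minimal user-space model of what move_native_irq does with the patch applied. It is only a sketch: the struct, the MODEL_IRQ_* flag values and the model_* function names are made up for illustration and are not the kernel's definitions, and the assert encodes the assumption stated in the patch comment (an in-progress irq has already been masked by the flow handler).

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder flag values for this model; NOT the kernel's definitions. */
enum {
    MODEL_IRQ_INPROGRESS   = 1 << 0,
    MODEL_IRQ_DISABLED     = 1 << 1,
    MODEL_IRQ_MOVE_PENDING = 1 << 2,
};

/* Hypothetical stand-in for struct irq_desc plus the chip's mask state. */
struct model_irq {
    unsigned int status;  /* combination of MODEL_IRQ_* flags */
    bool masked;          /* mask state at the (pretend) interrupt chip */
    int moves;            /* number of migrations carried out */
};

/* Stand-in for move_masked_irq(): do the move and clear the pending flag. */
static void model_move_masked_irq(struct model_irq *irq)
{
    irq->status &= ~MODEL_IRQ_MOVE_PENDING;
    irq->moves++;
}

/* Mirrors the control flow of the patched move_native_irq(). */
static void model_move_native_irq(struct model_irq *irq)
{
    bool mask = true;

    if (!(irq->status & MODEL_IRQ_MOVE_PENDING))
        return;
    if (irq->status & MODEL_IRQ_DISABLED)
        return;

    /*
     * Assumption from the patch comment: an in-progress irq is already
     * masked, so we must neither mask it again nor unmask it afterwards.
     */
    if (irq->status & MODEL_IRQ_INPROGRESS) {
        assert(irq->masked);
        mask = false;
    }

    if (mask)
        irq->masked = true;
    model_move_masked_irq(irq);
    if (mask)
        irq->masked = false;
}
```

With an irq that is both MOVE_PENDING and INPROGRESS, the model migrates it and leaves it masked, which is the behaviour that avoids the storm; an idle irq is still masked only for the duration of the move.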
RE: [PATCH] drivers/net: move the nvidia forcedeth driver from 100M group to 1000M group
You are right, both ways would cause confusion, but sooner or later we need to move it, because our NICs from now on are all Gigabit and 100M NICs will gradually disappear. H. Peter's suggestion to have a single list for 100M and 1000M is probably a better choice. -Original Message- From: Lennart Sorensen [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 25, 2007 10:06 PM To: Peer Chen Cc: linux-kernel@vger.kernel.org; [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: Re: [PATCH] drivers/net: move the nvidia forcedeth driver from 100M group to 1000M group On Wed, Apr 25, 2007 at 01:30:04PM +0800, Peer Chen wrote: > nForce ethernet is a Gigabit NIC not 100M, move it to 1000M group to > avoid the confusion. The forcedeth on my nforce2 board is 100Mbit. I think the driver handles both 100Mbit and GBit type devices. Makes for an interesting categorization problem. Moving it would cause confusion. Leaving it where it is could cause confusion. Moving it may confuse existing users more though so I would recommend leaving it alone unless you can somehow make it appear in both. -- Len Sorensen
Re: [PATCH] change global zonelist order on NUMA v2
On Fri, 27 Apr 2007, KAMEZAWA Hiroyuki wrote: > > DMA memory. > > > It seems a bit complicated. If we do so, following can occur, > > Node1: cpu0,1,2,3 > Node0: cpu4,5,6,7 We were discussing a two node NUMA system. If you have more put it onto the last.
[PATCH 2/2] iop13xx: msi support (rev6)
From: Daniel Wolstenholme <[EMAIL PROTECTED]> Enable devices to signal interrupts via PCI memory cycles. rev6: * fix enable/disable typo, Michael Ellerman rev5: * fix up ack, enable, and disable for iop13xx_msi_chip rev4: * move smp compile fix to separate patch * use dynamic_irq_init in create_irq() * hookup mask/unmask routines in iop13xx_msi_chip rev3: * change msi.c to use linux/smp.h instead of asm/smp.h * call dynamic_irq_cleanup at destroy_irq time rev2: * destroy_irq did not take the full 128 bits of msi_irq_in_use into account * added missing '&' for calls to test_and_set_bit and clear_bit [EMAIL PROTECTED]: review comments/suggestions] [EMAIL PROTECTED]: cleanups/forward port to 2.6-git] Cc: Russell King <[EMAIL PROTECTED]> Signed-off-by: Daniel Wolstenholme <[EMAIL PROTECTED]> Signed-off-by: Dan Williams <[EMAIL PROTECTED]> Acked-by: Eric W. Biederman <[EMAIL PROTECTED]> --- arch/arm/mach-iop13xx/Makefile |1 arch/arm/mach-iop13xx/irq.c|5 + arch/arm/mach-iop13xx/msi.c| 194 arch/arm/mach-iop13xx/pci.c| 16 +++ include/asm-arm/arch-iop13xx/iop13xx.h | 29 + include/asm-arm/arch-iop13xx/irqs.h|8 + include/asm-arm/arch-iop13xx/msi.h | 11 ++ 7 files changed, 262 insertions(+), 2 deletions(-) diff --git a/arch/arm/mach-iop13xx/Makefile b/arch/arm/mach-iop13xx/Makefile index 4185e05..02bd511 100644 --- a/arch/arm/mach-iop13xx/Makefile +++ b/arch/arm/mach-iop13xx/Makefile @@ -9,3 +9,4 @@ obj-$(CONFIG_ARCH_IOP13XX) += pci.o obj-$(CONFIG_ARCH_IOP13XX) += io.o obj-$(CONFIG_MACH_IQ81340SC) += iq81340sc.o obj-$(CONFIG_MACH_IQ81340MC) += iq81340mc.o +obj-$(CONFIG_PCI_MSI) += msi.o diff --git a/arch/arm/mach-iop13xx/irq.c b/arch/arm/mach-iop13xx/irq.c index b2eb0b9..5791add 100644 --- a/arch/arm/mach-iop13xx/irq.c +++ b/arch/arm/mach-iop13xx/irq.c @@ -26,6 +26,7 @@ #include #include #include +#include /* INTCTL0 CP6 R0 Page 4 */ @@ -258,7 +259,7 @@ void __init iop13xx_init_irq(void) write_intbase(INTBASE); write_intsize(INTSIZE_4); - for(i = 0; i < NR_IOP13XX_IRQS; i++) { 
+ for(i = 0; i <= IRQ_IOP13XX_HPI; i++) { if (i < 32) set_irq_chip(i, _irqchip1); else if (i < 64) @@ -271,4 +272,6 @@ void __init iop13xx_init_irq(void) set_irq_handler(i, handle_level_irq); set_irq_flags(i, IRQF_VALID | IRQF_PROBE); } + + iop13xx_msi_init(); } diff --git a/arch/arm/mach-iop13xx/msi.c b/arch/arm/mach-iop13xx/msi.c new file mode 100644 index 000..1b64eaa --- /dev/null +++ b/arch/arm/mach-iop13xx/msi.c @@ -0,0 +1,194 @@ +/* + * arch/arm/mach-iop13xx/msi.c + * + * PCI MSI support for the iop13xx processor + * + * Copyright (c) 2006, Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. 
+ * + */ +#include +#include +#include +#include + + +#define IOP13XX_NUM_MSI_IRQS 128 +static DECLARE_BITMAP(msi_irq_in_use, IOP13XX_NUM_MSI_IRQS); + +/* IMIPR0 CP6 R8 Page 1 + */ +static inline u32 read_imipr_0(void) +{ + u32 val; + asm volatile("mrc p6, 0, %0, c8, c1, 0":"=r" (val)); + return val; +} +static inline void write_imipr_0(u32 val) +{ + asm volatile("mcr p6, 0, %0, c8, c1, 0"::"r" (val)); +} + +/* IMIPR1 CP6 R9 Page 1 + */ +static inline u32 read_imipr_1(void) +{ + u32 val; + asm volatile("mrc p6, 0, %0, c9, c1, 0":"=r" (val)); + return val; +} +static inline void write_imipr_1(u32 val) +{ + asm volatile("mcr p6, 0, %0, c9, c1, 0"::"r" (val)); +} + +/* IMIPR2 CP6 R10 Page 1 + */ +static inline u32 read_imipr_2(void) +{ + u32 val; + asm volatile("mrc p6, 0, %0, c10, c1, 0":"=r" (val)); + return val; +} +static inline void write_imipr_2(u32 val) +{ + asm volatile("mcr p6, 0, %0, c10, c1, 0"::"r" (val)); +} + +/* IMIPR3 CP6 R11 Page 1 + */ +static inline u32 read_imipr_3(void) +{ + u32 val; + asm volatile("mrc p6, 0, %0, c11, c1, 0":"=r" (val)); + return val; +} +static inline void write_imipr_3(u32 val) +{ + asm volatile("mcr p6, 0, %0, c11, c1, 0"::"r" (val)); +} + +static u32 (*read_imipr[])(void) = { + read_imipr_0, + read_imipr_1, + read_imipr_2, + read_imipr_3, +}; + +static void (*write_imipr[])(u32) = { +
[PATCH 1/2] msi: fix ARM compile
In file included from drivers/pci/msi.c:22: include/asm/smp.h:17:26: asm/arch/smp.h: No such file or directory include/asm/smp.h:20:3: #error " included in non-SMP build" include/asm/smp.h:23:1: warning: "raw_smp_processor_id" redefined In file included from include/linux/sched.h:65, from include/linux/mm.h:4, from drivers/pci/msi.c:10: include/linux/smp.h:85:1: warning: this is the location of the previous definition Tested on powerpc, i386, and x86_64. Signed-off-by: Dan Williams <[EMAIL PROTECTED]> Acked-by: Eric W. Biederman <[EMAIL PROTECTED]> --- drivers/pci/msi.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 435c195..c642e3b 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -16,10 +16,10 @@ #include #include #include +#include #include #include -#include #include "pci.h" #include "msi.h"
[PATCH 1/1] IBM PPC EMAC driver:improved support for PHY, resending
From: Jeff Haran <[EMAIL PROTECTED]> Resending with Outlook patch mangling hopefully corrected (Maybe I should write a HOWTO, this was harder than fixing the driver). This patch fixes some problems I found while debugging the IBM EMAC driver for PPC32 systems. The first problem was in the function that configures the PHY for autonegotiation, genmii_setup_aneg(). The original code does a read/modify/write of the autonegotiation advertisement register (reg 4), followed by a read/modify/write of the control register (reg 0). While the original code follows the proper procedure as I read the IEEE specs, what I found is that on at least one PHY model (National DP83843) the read of the control register comes back with the soft reset bit set (bit 15). Because of the read/modify/write operation, this causes the write to write a 1 back to the reset bit, which initiates a software reset of the PHY. This software reset causes the PHY to return to its power-up state, which advertises all modes of operation, thus negating the write to the autoneg advertisement register. The modification is to spin reading the control register until the soft reset bit is clear before doing the modify/write. I guess this bit is in reality more of a "device busy" bit on at least some PHYs. The second problem was in the function that configures the PHY for forced operation, genmii_setup_forced(). The original code initiates a software reset operation via a write of a 1 to bit 15 of the control register (reg 0), but then proceeds to do a second write to that same register without waiting until that reset bit is cleared by the PHY itself (which according to the IEEE specs indicates that the PHY reset is complete). This is a violation of how one is supposed to use the software reset feature of these PHYs, and I believe it was the cause of mysterious, difficult to reproduce link failures that we've observed on some of our systems that use this driver.
The fix is to modify the function so that it spins waiting for the reset bit to clear after doing the soft reset and before doing the subsequent write. Since this modification, we haven't seen the mysterious link failures, though they were so rare it's difficult to say at this point whether this was the cause. I also added some error handling and reporting for the abnormal case where the reset bit never clears from the soft reset operation. Applied to kernel version 2.6.21. Signed-off-by: Jeff Haran <[EMAIL PROTECTED]> --- --- linux-2.6.21/drivers/net/ibm_emac/ibm_emac_phy.c.orig 2007-04-25 20:08:32.0 -0700 +++ linux-2.6.21/drivers/net/ibm_emac/ibm_emac_phy.c2007-04-26 14:42:09.562996000 -0700 @@ -22,8 +22,12 @@ #include +#include "ibm_emac_core.h" #include "ibm_emac_phy.h" +#define NL "\n" +#define PHY_DBG(f,x...) printk("emac" f, ##x) + static inline int phy_read(struct mii_phy *phy, int reg) { return phy->mdio_read(phy->dev, phy->address, reg); @@ -34,11 +38,34 @@ static inline void phy_write(struct mii_ phy->mdio_write(phy->dev, phy->address, reg, val); } -int mii_reset_phy(struct mii_phy *phy) +/* + * polls MII_BMCR until BMCR_RESET bit clears or operation times out. + * + * returns: + * >= 0 => success, value in BMCR returned to caller + * -EBUSY => failure, RESET bit never cleared + * otherwise => failure, lower level PHY read failed + */ + +static int mii_spin_reset_complete(struct mii_phy *phy) { int val; int limit = 1; + while (limit--) { + val = phy_read(phy, MII_BMCR); + if ((val >= 0) && ((val & BMCR_RESET) == 0)) + return val; /* success */ + udelay(10); + } + + return (val < 0) ?
val : -EBUSY; +} + +int mii_reset_phy(struct mii_phy *phy) +{ + int val; + val = phy_read(phy, MII_BMCR); val &= ~BMCR_ISOLATE; val |= BMCR_RESET; @@ -46,16 +73,17 @@ int mii_reset_phy(struct mii_phy *phy) udelay(300); - while (limit--) { - val = phy_read(phy, MII_BMCR); - if (val >= 0 && (val & BMCR_RESET) == 0) - break; - udelay(10); + val = mii_spin_reset_complete(phy); + + if (val < 0) { + PHY_DBG("%d: reset_complete failed in reset %d" NL, + ((struct ocp_enet_private *) (phy->dev->priv))->def->index, val); + } else { + if (val & BMCR_ISOLATE) + phy_write(phy, MII_BMCR, val & ~BMCR_ISOLATE); } - if ((val & BMCR_ISOLATE) && limit > 0) - phy_write(phy, MII_BMCR, val & ~BMCR_ISOLATE); - return limit <= 0; + return val < 0; } static int genmii_setup_aneg(struct mii_phy *phy, u32 advertise) @@ -102,9 +130,18 @@ static int genmii_setup_aneg(struct mii_ } /* Start/Restart aneg */ - ctl = phy_read(phy, MII_BMCR); - ctl |= (BMCR_ANENABLE | BMCR_ANRESTART); -
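The "spin until the reset bit clears" idea from this patch can be tried in user space with a fake PHY. What follows is a rough sketch, not the driver code: the sketch_* names, the callback type and the fake PHY are hypothetical, the retry limit is passed in by the caller rather than taken from the patch, and the delay between reads is left as a comment.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_BMCR_RESET 0x8000  /* bit 15 of MII control register 0 */

/*
 * Hypothetical read callback: returns the control register value,
 * or a negative errno if the lower-level read failed.
 */
typedef int (*sketch_bmcr_read_fn)(void *ctx);

/*
 * Poll the control register until the reset bit clears or 'limit'
 * reads have been made.
 *   >= 0    : success, the register value
 *   -EBUSY  : the reset bit never cleared
 *   other<0 : the last read itself failed
 */
static int sketch_spin_reset_complete(sketch_bmcr_read_fn read_bmcr,
                                      void *ctx, int limit)
{
    int val = -EBUSY;

    assert(read_bmcr != NULL);
    while (limit-- > 0) {
        val = read_bmcr(ctx);
        if (val >= 0 && (val & SKETCH_BMCR_RESET) == 0)
            return val;
        /* the driver would delay here, e.g. udelay(10) */
    }
    return val < 0 ? val : -EBUSY;
}

/* Fake PHY: shows the reset bit for 'busy_reads' reads, then clears it. */
struct sketch_fake_phy {
    int busy_reads;  /* remaining reads that still report reset */
    int bmcr;        /* value reported once reset has completed */
};

static int sketch_fake_read(void *ctx)
{
    struct sketch_fake_phy *phy = ctx;

    if (phy->busy_reads > 0) {
        phy->busy_reads--;
        return phy->bmcr | SKETCH_BMCR_RESET;
    }
    return phy->bmcr;
}
```

A caller doing a read/modify/write of the control register would use the value returned here as the "read" part, so that a 1 is never written back into the reset bit by accident.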
Pin an IRQ to a particular CPU
Hello, In an effort to increase overall throughput of my Linux NFS file server, I thought about trying to force an IRQ, for the NIC, to be serviced by a particular CPU. Is this possible? TIA, Phy
Re: linux-2.6.21-rc7-mm2 fails to compile
On Thu, 26 Apr 2007 17:54:20 -0700 Andrew Morton wrote: > On Fri, 27 Apr 2007 00:00:15 + (GMT) William Heimbigner <[EMAIL > PROTECTED]> wrote: > > > Output leading up to the error: > > > >CC drivers/macintosh/macio-adb.o > >LD drivers/macintosh/built-in.o > >CC [M] drivers/macintosh/apm_emu.o > >CC [M] drivers/macintosh/therm_windtunnel.o > > drivers/macintosh/therm_windtunnel.c: In function 'therm_of_remove': > > drivers/macintosh/therm_windtunnel.c:462: error: void value not ignored as > > it > > ought to be > > drivers/macintosh/therm_windtunnel.c:463: warning: control reaches end of > > non-void function > > make[2]: *** [drivers/macintosh/therm_windtunnel.o] Error 1 > > make[1]: *** [drivers/macintosh] Error 2 > > make: *** [drivers] Error 2 > > > > This is on an iMac G3 powerpc. > > i2c_del_driver() was changed to return void in the i2c tree. > > --- > a/drivers/macintosh/therm_windtunnel.c~fix-jdelvare-i2c-i2c-del-driver-returns-void-on-powerpc > +++ a/drivers/macintosh/therm_windtunnel.c > @@ -459,7 +459,8 @@ therm_of_probe( struct of_device *dev, c > static int > therm_of_remove( struct of_device *dev ) > { > - return i2c_del_driver( _driver ); > + i2c_del_driver( _driver ); > + return 0; > } > > static struct of_device_id therm_of_match[] = {{ This will fail on the "ugly parentheses" test... :) --- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code ***
Re: MAINTAINERS file out of date?
WANG Cong wrote: >> >> I have considered myself as a rather unofficial maintainer of this code, >> and wouldn't mind make it official now when I actually have a job which >> both cares about and actually can support my upstream Linux activities, >> which was a major pain for a while. >> >> Overall, there is a lot of cleanup which really is needed in the i386 >> boot process; I have done some work on it already, but more is needed. > > Sounds interesting. Can you point me what needs to be done exactly? Maybe I > can help you. ;) There was a long thread on the linux-virtualization list (@lists.linux-foundation.org) just a few days ago. The biggest single issue right now is probably how we transition from the bootup page tables to the "real" init_mm page tables, but the real-mode code also needs a massive overhaul (especially so since this code can and should be shared with x86-64); in particular I really want to get rid of the insane segment handling, where segments are constantly recalculated for no good reason. For the real-mode stuff, I have some patches already in the works for this. Eric W. Biederman has also done a lot of work in this area. -hpa
Re: [PATCH] utimensat implementation
H. Peter Anvin wrote: > Primarily to let a backup program restore the full state of the filesystem. Is this wanted? Or needed? I would think there are good reasons why this hasn't been done so far. Intrusion detection is one reason I can think of. -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
Re: MAINTAINERS file out of date?
On Thu, Apr 26, 2007 at 04:13:01PM -0700, H. Peter Anvin wrote: >Michael McConnell wrote: >> Hi folks, >> >> I noticed the MAINTAINERS file still lists Riley Williams as the >> maintainer of the i386 boot code, presumably as no-one else has taken it >> up in his absence (though, I'm sure it's probably been touched since >> late 2004). >> >> If I knew the first thing about the i386 boot process I would volunteer, >> but sadly I don't >> >> This article below details the reason for his absence, but please make >> sure you're sitting down before clicking. (It is work-safe.) >> http://www.chorleytoday.co.uk/ViewArticle2.aspx?SectionID=81=905248 >> > >Dear. > >I have considered myself as a rather unofficial maintainer of this code, >and wouldn't mind make it official now when I actually have a job which >both cares about and actually can support my upstream Linux activities, >which was a major pain for a while. > >Overall, there is a lot of cleanup which really is needed in the i386 >boot process; I have done some work on it already, but more is needed. > Sounds interesting. Can you point me what needs to be done exactly? Maybe I can help you. ;) Have fun!
Re: [PATCH 1/3] ia64: convert to use clocksource code
* Peter Keilty ([EMAIL PROTECTED]) wrote: > diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c > index 6077300..35ad71f 100644 > --- a/drivers/acpi/processor_idle.c > +++ b/drivers/acpi/processor_idle.c > @@ -480,10 +480,12 @@ #endif > /* Get end time (ticks) */ > t2 = inl(acpi_gbl_FADT.xpm_timer_block.address); > > +#ifndef CONFIG_IA64 > #ifdef CONFIG_GENERIC_TIME > /* TSC halts in C2, so notify users */ > mark_tsc_unstable(); > #endif > +#endif Is this a better description of the dependency? #if defined (CONFIG_GENERIC_TIME) && defined (CONFIG_X86_TSC) > /* Re-enable interrupts */ > local_irq_enable(); > current_thread_info()->status |= TS_POLLING; > @@ -522,10 +524,12 @@ #endif > acpi_set_register(ACPI_BITREG_ARB_DISABLE, 0); > } > > +#ifndef CONFIG_IA64 > #ifdef CONFIG_GENERIC_TIME > /* TSC halts in C3, so notify users */ > mark_tsc_unstable(); > #endif > +#endif ditto > /* Re-enable interrupts */ > local_irq_enable(); > current_thread_info()->status |= TS_POLLING; > diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c > index 0be700f..5ea7d3e 100644 > --- a/drivers/char/hpet.c > +++ b/drivers/char/hpet.c > @@ -29,6 +29,7 @@ #include > #include > #include > #include > +#include > > #include > #include > @@ -51,8 +52,34 @@ #defineHPET_DRIFT (500) > > #define HPET_RANGE_SIZE 1024/* from HPET spec */ > > +#if BITS_PER_LONG == 64 > +#define write_counter(V, MC)writeq(V, MC) > +#define read_counter(MC)readq(MC) > +#else > +#define write_counter(V, MC)writel(V, MC) > +#define read_counter(MC)readl(MC) > +#endif > + > static u32 hpet_nhpet, hpet_max_freq = HPET_USER_FREQ; > > +static void __iomem *hpet_mc_ptr; CodingStyle nit: we don't need all this _ptr... 
> +static cycle_t read_hpet(void) > +{ > + return (cycle_t)read_counter((void __iomem *)hpet_mc_ptr); > +} > + > +static struct clocksource clocksource_hpet = { > +.name = "hpet", > +.rating = 300, > +.read = read_hpet, > +.mask = 0xLL, > +.mult = 0, /*to be caluclated*/ > +.shift = 10, > +.is_continuous = 1, > +}; > +static struct clocksource *hpet_clocksource_p; and _p naming. > + > /* A lock for concurrent access by app and isr hpet activity. */ > static DEFINE_SPINLOCK(hpet_lock); > /* A lock for concurrent intermodule access to hpet and isr hpet activity. */ > @@ -79,7 +106,7 @@ struct hpets { > struct hpets *hp_next; > struct hpet __iomem *hp_hpet; > unsigned long hp_hpet_phys; > - struct time_interpolator *hp_interpolator; > + struct clocksource *hp_clocksource; > unsigned long long hp_tick_freq; > unsigned long hp_delta; > unsigned int hp_ntimer; > @@ -94,13 +121,6 @@ #define HPET_IE 0x0002 /* interrupt en > #define HPET_PERIODIC 0x0004 > #define HPET_SHARED_IRQ 0x0008 > > -#if BITS_PER_LONG == 64 > -#define write_counter(V, MC)writeq(V, MC) > -#define read_counter(MC)readq(MC) > -#else > -#define write_counter(V, MC)writel(V, MC) > -#define read_counter(MC)readl(MC) > -#endif > > #ifndef readq > static inline unsigned long long readq(void __iomem *addr) > @@ -737,27 +757,6 @@ static ctl_table dev_root[] = { > > static struct ctl_table_header *sysctl_header; > > -static void hpet_register_interpolator(struct hpets *hpetp) > -{ > -#ifdef CONFIG_TIME_INTERPOLATION > - struct time_interpolator *ti; > - > - ti = kzalloc(sizeof(*ti), GFP_KERNEL); > - if (!ti) > - return; > - > - ti->source = TIME_SOURCE_MMIO64; > - ti->shift = 10; > - ti->addr = >hp_hpet->hpet_mc; > - ti->frequency = hpetp->hp_tick_freq; > - ti->drift = HPET_DRIFT; > - ti->mask = -1; > - > - hpetp->hp_interpolator = ti; > - register_time_interpolator(ti); > -#endif > -} > - > /* > * Adjustment for when arming the timer with > * initial conditions. 
That is, main counter > @@ -909,7 +908,14 @@ int hpet_alloc(struct hpet_data *hdp) > } > > hpetp->hp_delta = hpet_calibrate(hpetp); > - hpet_register_interpolator(hpetp); > + > + if (!hpet_clocksource_p) { > + clocksource_hpet.fsys_mmio_ptr = hpet_mc_ptr = > >hp_hpet->hpet_mc; > + clocksource_hpet.mult = clocksource_hz2mult(hpetp->hp_tick_freq, > + clocksource_hpet.shift); > + clocksource_register(_hpet); > + hpet_clocksource_p = hpetp->hp_clocksource = _hpet; > + } This looks like a change in behaviour for non-ia64. Now i386 and x86_64 will
Re: [patch] CFS scheduler, -v6
On Thursday 26 April 2007, Con Kolivas wrote: >On Friday 27 April 2007 10:39, Gene Heskett wrote: >> Not necessarily Con. Do you have a fresh one for 2.6.21? > >Since people get nervous about any rejects here is an (otherwise identical) >patch for 2.6.21 > >http://ck.kolivas.org/patches/staircase-deadline/2.6.21-sd-0.46.patch Thanks, I'll try to give it some cpu time in the next day. -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) Our informal mission is to improve the love life of operators worldwide. -- Peter Behrendt, president of Exabyte
[PATCH] x86_64: dynamic MCE poll interval
From: Tim Hockin <[EMAIL PROTECTED]> Background: We've found that MCEs (specifically DRAM SBEs) tend to come in bunches, especially when we are trying really hard to stress the system out. The current MCE poller uses a static interval which does not care whether it has or has not found MCEs recently. Description: This patch makes the MCE poller adjust the polling interval dynamically. If we find an MCE, poll 2x faster (down to 10 ms). When we stop finding MCEs, poll 2x slower (up to check_interval seconds). The check_interval tunable becomes the max polling interval. Result: If you start to take a lot of correctable errors (not exceptions), you log them faster and more accurately (less chance of overflowing the MCA registers). If you don't take a lot of errors, you will see no change. Alternatives: I considered simply reducing the polling interval to 10 ms immediately and keeping it there as long as we continue to find errors. This felt a bit heavy handed, but does perform significantly better for the default check_interval of 5 minutes (we're using a few seconds when testing for DRAM errors). Patch: This patch is against 2.6.21-rc7. 
Signed-Off-By: Tim Hockin <[EMAIL PROTECTED]>o --- diff -pruN linux-2.6.20/arch/x86_64/kernel/mce.c linux-2.6.20+th/arch/x86_64/kernel/mce.c --- linux-2.6.20/arch/x86_64/kernel/mce.c 2007-04-24 23:36:04.0 -0700 +++ linux-2.6.20+th/arch/x86_64/kernel/mce.c2007-04-26 10:40:29.0 -0700 @@ -327,6 +327,7 @@ void mce_log_therm_throt_event(unsigned */ static int check_interval = 5 * 60; /* 5 minutes */ +static int next_interval; /* in jiffies */ static void mcheck_timer(struct work_struct *work); static DECLARE_DELAYED_WORK(mcheck_work, mcheck_timer); @@ -339,7 +340,6 @@ static void mcheck_check_cpu(void *info) static void mcheck_timer(struct work_struct *work) { on_each_cpu(mcheck_check_cpu, NULL, 1, 1); - schedule_delayed_work(_work, check_interval * HZ); /* * It's ok to read stale data here for notify_user and @@ -349,17 +349,24 @@ static void mcheck_timer(struct work_str * writes. */ if (notify_user && console_logged) { + /* if we logged an MCE, reduce the polling interval */ + next_interval = max(next_interval/2, HZ/100); notify_user = 0; clear_bit(0, _logged); printk(KERN_INFO "Machine check events logged\n"); + } else { + next_interval = min(next_interval*2, check_interval*HZ); } + + schedule_delayed_work(_work, next_interval); } static __init int periodic_mcheck_init(void) { - if (check_interval) - schedule_delayed_work(_work, check_interval*HZ); + next_interval = check_interval * HZ; + if (next_interval) + schedule_delayed_work(_work, next_interval); return 0; } __initcall(periodic_mcheck_init); @@ -597,12 +604,13 @@ static int mce_resume(struct sys_device /* Reinit MCEs after user configuration changes */ static void mce_restart(void) { - if (check_interval) + if (next_interval) cancel_delayed_work(_work); /* Timer race is harmless here */ on_each_cpu(mce_init, NULL, 1, 1); - if (check_interval) - schedule_delayed_work(_work, check_interval*HZ); + next_interval = check_interval * HZ; + if (next_interval) + schedule_delayed_work(_work, next_interval); } static 
struct sysdev_class mce_sysclass = {
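The interval adjustment in this patch boils down to "halve on a hit, double on a miss, clamp at both ends". Below is a small stand-alone sketch of just that arithmetic, for experimenting with different bounds. The function name and parameters are invented for illustration; in the patch the lower bound is HZ/100 and the upper bound is check_interval*HZ, both in jiffies. The overflow guard on the doubling is an addition of this sketch, not something the patch does.

```c
#include <assert.h>

/*
 * Compute the next polling interval.
 *   cur         : current interval
 *   found_event : nonzero if the last poll logged an event
 *   min_iv      : shortest allowed interval (HZ/100 in the patch)
 *   max_iv      : longest allowed interval (check_interval*HZ in the patch)
 * All values share one unit (jiffies in the patch).
 */
static int sketch_next_interval(int cur, int found_event,
                                int min_iv, int max_iv)
{
    assert(min_iv > 0 && min_iv <= max_iv);

    if (found_event) {
        /* poll twice as fast, but not faster than min_iv */
        int halved = cur / 2;
        return halved > min_iv ? halved : min_iv;
    }

    /* poll twice as slowly, but not slower than max_iv */
    if (cur > max_iv / 2)
        return max_iv;  /* doubling would exceed (or overflow past) the cap */
    return cur * 2;
}
```

Assuming HZ=1000 and the default check_interval of 5 minutes, the bounds are 10 and 300000 jiffies, so the first hit drops the interval from 300000 to 150000 and each further hit halves it again until it bottoms out at 10.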
[PATCH] v2: utimensat implementation
Oops, little bug, I hooked-up the wrong syscall for the IA32 compat code. Fixed in this revision. Signed-off-by: Ulrich Drepper <[EMAIL PROTECTED]> diff --git a/arch/x86_64/ia32/ia32entry.S b/arch/x86_64/ia32/ia32entry.S index 796df69..12611c8 100644 --- a/arch/x86_64/ia32/ia32entry.S +++ b/arch/x86_64/ia32/ia32entry.S @@ -714,9 +714,10 @@ ia32_sys_call_table: .quad compat_sys_get_robust_list .quad sys_splice .quad sys_sync_file_range - .quad sys_tee + .quad sys_tee /* 315 */ .quad compat_sys_vmsplice .quad compat_sys_move_pages .quad sys_getcpu .quad sys_epoll_pwait + .quad compat_sys_utimensat /* 320 */ ia32_syscall_end: diff --git a/fs/compat.c b/fs/compat.c index 040a8be..1644cd1 100644 --- a/fs/compat.c +++ b/fs/compat.c @@ -79,28 +79,55 @@ int compat_printk(const char *fmt, ...) */ asmlinkage long compat_sys_utime(char __user *filename, struct compat_utimbuf __user *t) { - struct timeval tv[2]; + struct timespec tv[2]; if (t) { if (get_user(tv[0].tv_sec, >actime) || get_user(tv[1].tv_sec, >modtime)) return -EFAULT; - tv[0].tv_usec = 0; - tv[1].tv_usec = 0; + tv[0].tv_nsec = 0; + tv[1].tv_nsec = 0; } return do_utimes(AT_FDCWD, filename, t ? tv : NULL); } +asmlinkage long compat_sys_utimensat(unsigned int dfd, char __user *filename, struct compat_timespec __user *t) +{ + struct timespec tv[2]; + + if (t) { + if (get_compat_timespec([0], [0]) || + get_compat_timespec([1], [1])) + return -EFAULT; + + if ((tv[0].tv_nsec == UTIME_OMIT || tv[0].tv_nsec == UTIME_NOW) + && tv[0].tv_sec != 0) + return -EINVAL; + if ((tv[1].tv_nsec == UTIME_OMIT || tv[1].tv_nsec == UTIME_NOW) + && tv[1].tv_sec != 0) + return -EINVAL; + + if (tv[0].tv_nsec == UTIME_OMIT && tv[1].tv_nsec == UTIME_OMIT) + return 0; + } + return do_utimes(dfd, filename, t ? 
tv : NULL); +} + asmlinkage long compat_sys_futimesat(unsigned int dfd, char __user *filename, struct compat_timeval __user *t) { - struct timeval tv[2]; + struct timespec tv[2]; if (t) { if (get_user(tv[0].tv_sec, [0].tv_sec) || - get_user(tv[0].tv_usec, [0].tv_usec) || + get_user(tv[0].tv_nsec, [0].tv_usec) || get_user(tv[1].tv_sec, [1].tv_sec) || - get_user(tv[1].tv_usec, [1].tv_usec)) + get_user(tv[1].tv_nsec, [1].tv_usec)) return -EFAULT; + if (tv[0].tv_nsec > LONG_MAX / 1000 + || tv[1].tv_nsec > LONG_MAX / 1000) + return -EINVAL; + tv[0].tv_nsec *= 1000; + tv[1].tv_nsec *= 1000; } return do_utimes(dfd, filename, t ? tv : NULL); } diff --git a/fs/utimes.c b/fs/utimes.c index 99cf2cb..dc6612e 100644 --- a/fs/utimes.c +++ b/fs/utimes.c @@ -3,6 +3,7 @@ #include #include #include +#include #include #include #include @@ -76,7 +77,7 @@ out: * must be owner or have write permission. * Else, update from *times, must be owner or super user. */ -long do_utimes(int dfd, char __user *filename, struct timeval *times) +long do_utimes(int dfd, char __user *filename, struct timespec *times) { int error; struct nameidata nd; @@ -100,11 +101,21 @@ long do_utimes(int dfd, char __user *filename, struct timeval *times) if (IS_APPEND(inode) || IS_IMMUTABLE(inode)) goto dput_and_out; - newattrs.ia_atime.tv_sec = times[0].tv_sec; - newattrs.ia_atime.tv_nsec = times[0].tv_usec * 1000; - newattrs.ia_mtime.tv_sec = times[1].tv_sec; - newattrs.ia_mtime.tv_nsec = times[1].tv_usec * 1000; - newattrs.ia_valid |= ATTR_ATIME_SET | ATTR_MTIME_SET; + if (times[0].tv_nsec == UTIME_OMIT) + newattrs.ia_valid &= ~ATTR_ATIME; + else if (times[0].tv_nsec != UTIME_NOW) { + newattrs.ia_atime.tv_sec = times[0].tv_sec; + newattrs.ia_atime.tv_nsec = times[0].tv_nsec; + newattrs.ia_valid |= ATTR_ATIME_SET; + } + + if (times[1].tv_nsec == UTIME_OMIT) + newattrs.ia_valid &= ~ATTR_MTIME; + else if (times[1].tv_nsec != UTIME_NOW) { + newattrs.ia_mtime.tv_sec = times[1].tv_sec; + newattrs.ia_mtime.tv_nsec = 
times[1].tv_nsec; + newattrs.ia_valid |= ATTR_MTIME_SET; + } }
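To make the argument checks in compat_sys_utimensat easier to follow, here is a stand-alone sketch of the same three rules: a special tv_nsec value must come with tv_sec == 0, two OMITs mean there is nothing to do, and anything else goes on to do_utimes. Everything named sketch_* or SKETCH_* is made up for this example; in particular the two sentinel values are placeholders chosen for the sketch, since the header hunk that defines UTIME_NOW and UTIME_OMIT is not part of the text above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder sentinels for this sketch only. */
#define SKETCH_UTIME_NOW  ((1L << 30) - 1L)
#define SKETCH_UTIME_OMIT ((1L << 30) - 2L)

/* Minimal stand-in for struct timespec. */
struct sketch_timespec {
    long tv_sec;
    long tv_nsec;
};

static int sketch_is_special(long nsec)
{
    return nsec == SKETCH_UTIME_NOW || nsec == SKETCH_UTIME_OMIT;
}

/*
 * Validate an {atime, mtime} pair the way the compat wrapper does.
 *   -EINVAL : a special tv_nsec was combined with a nonzero tv_sec
 *   0       : both entries are OMIT, nothing needs to be changed
 *   1       : the pair is acceptable and should be applied
 */
static int sketch_check_times(const struct sketch_timespec tv[2])
{
    assert(tv != NULL);

    for (int i = 0; i < 2; i++) {
        if (sketch_is_special(tv[i].tv_nsec) && tv[i].tv_sec != 0)
            return -EINVAL;
    }

    if (tv[0].tv_nsec == SKETCH_UTIME_OMIT &&
        tv[1].tv_nsec == SKETCH_UTIME_OMIT)
        return 0;

    return 1;
}
```

Encoding the special cases in tv_nsec works because a valid nanosecond count is always below one billion, so any sentinel at or above that can never collide with a real timestamp; the sec == 0 rule just keeps the ignored field from carrying garbage.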
Re: [patch] CFS scheduler, -v6
On Friday 27 April 2007 10:39, Gene Heskett wrote: > Not necessarily Con. Do you have a fresh one for 2.6.21? Since people get nervous about any rejects here is an (otherwise identical) patch for 2.6.21 http://ck.kolivas.org/patches/staircase-deadline/2.6.21-sd-0.46.patch -- -ck
Re: [PATCH] utimensat implementation
Ulrich Drepper wrote: > H. Peter Anvin wrote: >> I'm a bit leery of abusing the timespec value like this, though. A >> flags field seem like it would be cleaner. > > It's ugly. Then you have the parameter, which might have nice valid > values, and they get ignored. I thought about it when this was > discussed in the working group and thought it's a toss up. It's pretty ugly either way :-/ >> Something else... if we're dickering with these interfaces, shouldn't we >> allow setting atime as well? > > Why? To allow somebody to hide her/his tracks? Primarily to let a backup program restore the full state of the filesystem. -hpa - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-2.6.21 and __udivid3
Attached here is another compilation failure.

make
  CHK include/linux/version.h
  CHK include/linux/utsrelease.h
  CHK include/linux/compile.h
  CC [M] drivers/w1/w1.o
drivers/w1/w1.c: In function 'w1_slave_read_id':
drivers/w1/w1.c:118: error: cannot take address of bit-field 'family'
drivers/w1/w1.c:118: error: cannot take address of bit-field 'family'
drivers/w1/w1.c:118: error: cannot take address of bit-field 'family'
drivers/w1/w1.c:118: error: cannot take address of bit-field 'family'
drivers/w1/w1.c:118: error: cannot take address of bit-field 'family'
drivers/w1/w1.c:118: error: cannot take address of bit-field 'family'
make[2]: *** [drivers/w1/w1.o] Error 1
make[1]: *** [drivers/w1] Error 2
make: *** [drivers] Error 2

I don't; I think I have to go back to gcc-4.2.0, which was fine with linux-2.6.21-rc7. It will be the same with 2.6.21.
RE: W1 printk format warning
> From: Andrew Morton [mailto:[EMAIL PROTECTED] > On Thu, 26 Apr 2007 22:17:44 +0400 Evgeniy Polyakov <[EMAIL PROTECTED]> > wrote: > > > On Wed, Apr 25, 2007 at 04:45:09PM -0700, Andrew Morton ([EMAIL PROTECTED] > foundation.org) wrote: > > > On Wed, 25 Apr 2007 16:21:04 -0700 Randy Dunlap <[EMAIL PROTECTED]> > wrote: > > > > > > > in 2.6.21-rc7-mm1. Are you aware of this? > > > > > > > > drivers/w1/w1.c:460: warning: too few arguments for format > > > > > > > > dev_dbg(>dev, "%s: registering %s as %p.\n", __func__, > > > > >dev.bus_id[0]); > > > > > > > > > > Yeah, that's because Dan's dev_dbg-check-dev_dbg-arguments.patch added > > > printk arg-checking to dev_dbg() and a bunch of bugs got exposed. I fixed > > > a few of them. > > > > Could you post a full list for w1, if there is only one, here is a fix. > > > > There's only the one in w1, I think. I fixed a few more but I didn't > actually go looking for more such errors. Feel free to bug me if there is more fallout that is not otherwise picked up... -- Dan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] utimensat implementation
H. Peter Anvin wrote: > I'm a bit leery of abusing the timespec value like this, though. A > flags field seem like it would be cleaner. It's ugly. Then you have the parameter, which might have nice valid values, and they get ignored. I thought about it when this was discussed in the working group and thought it's a toss up. > Something else... if we're dickering with these interfaces, shouldn't we > allow setting atime as well? Why? To allow somebody to hide her/his tracks? -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖ signature.asc Description: OpenPGP digital signature
Re: [PATCH] utimensat implementation
Andrew Morton wrote: > Does the spec say what the OS should do if (ts_nsec => 1e9)? Yes, return EINVAL. We already do this. It's just that now we have to recognize two special values. > OK, so there's no collision on ts_nsec if unnormalised timespecs are > disallowed. Indeed, that's the basis of using the special values. I chose the values of the constants so that they are a) out of the way of valid values and b) don't have to be adjusted for 32-bit compat code. > But there's a potential collision on ts_sec? Do we know what date that > corresponds to? No, there is no collision. The tv_sec value is relevant. The UTIME_OMIT and UTIME_NOW value refers to the atime/mtime respectively, not just the tv_nsec field of either. It makes no sense to just set tv_sec, the tv_nsec value would be basically random. In my patch I'm testing that tv_sec is zero in case any of the special values is used in the corresponding tv_nsec field. That's more than the standard currently requires but I think it's better and I try to get the standard proposal changed. If this doesn't happen I'll make appropriate changes at userlevel for the "strictly POSIX" mode. > Do you have a testcase app which can be used by arch maintainers? Attached here. -- ➧ Ulrich Drepper ➧ Red Hat, Inc. 
➧ 444 Castro St ➧ Mountain View, CA ❖

/* NOTE: the header names were stripped by the archive.  The includes
   below (and _GNU_SOURCE, needed for struct stat64) are an assumption
   covering what the code uses.  */
#define _GNU_SOURCE
#include <errno.h>
#include <error.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/time.h>

#define UTIME_NOW ((1l << 30) - 1l)
#define UTIME_OMIT ((1l << 30) - 2l)

int
main(void)
{
  int status = 0;
  int fd = open("ttt", O_RDWR|O_CREAT|O_EXCL, 0666);
  if (fd == -1)
    error (1, errno, "failed to create test file \"ttt\"");

  /* Remember the original timestamps.  */
  struct stat64 st1;
  if (fstat64 (fd, &st1) != 0)
    error (1, errno, "fstat failed");

  /* Test 1: set both timestamps to zero.  */
  struct timespec t[2];
  t[0].tv_sec = 0;
  t[0].tv_nsec = 0;
  t[1].tv_sec = 0;
  t[1].tv_nsec = 0;
  if (syscall(280, AT_FDCWD, "ttt", t) != 0)
    error (1, errno, "utimensat failed");

  struct stat64 st2;
  if (fstat64 (fd, &st2) != 0)
    error (1, errno, "fstat failed");

  if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0)
    {
      puts ("atim not reset to zero");
      status = 1;
    }
  if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
    {
      puts ("mtim not reset to zero");
      status = 1;
    }
  if (status != 0)
    goto out;

  /* Test 2: restore atime, leave mtime alone (UTIME_OMIT).  */
  t[0] = st1.st_atim;
  t[1].tv_sec = 0;
  t[1].tv_nsec = UTIME_OMIT;
  if (syscall(280, AT_FDCWD, "ttt", t) != 0)
    error (1, errno, "utimensat failed");

  if (fstat64 (fd, &st2) != 0)
    error (1, errno, "fstat failed");

  if (st2.st_atim.tv_sec != st1.st_atim.tv_sec
      || st2.st_atim.tv_nsec != st1.st_atim.tv_nsec)
    {
      puts ("atim not set");
      status = 1;
    }
  if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
    {
      puts ("mtim changed from zero");
      status = 1;
    }
  if (status != 0)
    goto out;

  /* Test 3: leave atime alone (UTIME_OMIT), restore mtime.  */
  t[0].tv_sec = 0;
  t[0].tv_nsec = UTIME_OMIT;
  t[1] = st1.st_mtim;
  if (syscall(280, AT_FDCWD, "ttt", t) != 0)
    error (1, errno, "utimensat failed");

  if (fstat64 (fd, &st2) != 0)
    error (1, errno, "fstat failed");

  if (st2.st_atim.tv_sec != st1.st_atim.tv_sec
      || st2.st_atim.tv_nsec != st1.st_atim.tv_nsec)
    {
      puts ("atim changed from original time");
      status = 1;
    }
  if (st2.st_mtim.tv_sec != st1.st_mtim.tv_sec
      || st2.st_mtim.tv_nsec != st1.st_mtim.tv_nsec)
    {
      puts ("mtim not set");
      status = 1;
    }
  if (status != 0)
    goto out;

  /* Test 4: set both timestamps to the current time (UTIME_NOW).  */
  sleep (2);

  t[0].tv_sec = 0;
  t[0].tv_nsec = UTIME_NOW;
  t[1].tv_sec = 0;
  t[1].tv_nsec = UTIME_NOW;
  if (syscall(280, AT_FDCWD, "ttt", t) != 0)
    error (1, errno, "utimensat failed");

  if (fstat64 (fd, &st2) != 0)
    error (1, errno, "fstat failed");

  struct timeval tv;
  gettimeofday(&tv, NULL);

  if (st2.st_atim.tv_sec <= st1.st_atim.tv_sec
      || st2.st_atim.tv_sec > tv.tv_sec)
    {
      puts ("atim not set to NOW");
      status = 1;
    }
  if (st2.st_mtim.tv_sec <= st1.st_mtim.tv_sec
      || st2.st_mtim.tv_sec > tv.tv_sec)
    {
      puts ("mtim not set to NOW");
      status = 1;
    }

  if (status == 0)
    puts ("all OK");

 out:
  close (fd);
  unlink ("ttt");

  return status;
}

signature.asc Description: OpenPGP digital signature
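The rule Ulrich describes (tv_nsec of 1e9 or more is EINVAL, except for the two special values, which in his patch additionally require tv_sec to be zero) can be sketched as a small classifier. This is a user-space illustration only: classify_timespec, the enum and the SKETCH_ prefixed names are hypothetical and do not appear in the patch; the constant values are the ones quoted in the thread.

```c
#include <errno.h>
#include <time.h>

/* Values quoted in the thread; renamed to avoid clashing with libc. */
#define SKETCH_UTIME_NOW  ((1l << 30) - 1l)
#define SKETCH_UTIME_OMIT ((1l << 30) - 2l)

/* Hypothetical result type for this sketch. */
enum sketch_action { ACT_SET, ACT_NOW, ACT_OMIT };

/*
 * Decide what to do with one element of the times[] array.
 * Returns 0 and fills *out on success, -EINVAL on a bad value.
 */
static int classify_timespec(const struct timespec *ts,
			     enum sketch_action *out)
{
	if (ts->tv_nsec == SKETCH_UTIME_NOW ||
	    ts->tv_nsec == SKETCH_UTIME_OMIT) {
		/* Stricter than the draft standard, as described above:
		 * tv_sec must be zero when a special value is used. */
		if (ts->tv_sec != 0)
			return -EINVAL;
		*out = (ts->tv_nsec == SKETCH_UTIME_NOW) ? ACT_NOW : ACT_OMIT;
		return 0;
	}

	/* Unnormalised timespecs are rejected. */
	if (ts->tv_nsec < 0 || ts->tv_nsec >= 1000000000L)
		return -EINVAL;

	*out = ACT_SET;
	return 0;
}
```

Both special values are larger than 999999999 and smaller than 2^31, which is why they cannot collide with a valid tv_nsec and need no adjustment for 32-bit compat code.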
Re: linux-2.6.21-rc7-mm2 fails to compile
On Fri, 27 Apr 2007 00:00:15 + (GMT) William Heimbigner <[EMAIL PROTECTED]> wrote:

> Output leading up to the error:
>
>   CC drivers/macintosh/macio-adb.o
>   LD drivers/macintosh/built-in.o
>   CC [M] drivers/macintosh/apm_emu.o
>   CC [M] drivers/macintosh/therm_windtunnel.o
> drivers/macintosh/therm_windtunnel.c: In function 'therm_of_remove':
> drivers/macintosh/therm_windtunnel.c:462: error: void value not ignored as it ought to be
> drivers/macintosh/therm_windtunnel.c:463: warning: control reaches end of non-void function
> make[2]: *** [drivers/macintosh/therm_windtunnel.o] Error 1
> make[1]: *** [drivers/macintosh] Error 2
> make: *** [drivers] Error 2
>
> This is on an iMac G3 powerpc.

i2c_del_driver() was changed to return void in the i2c tree.

--- a/drivers/macintosh/therm_windtunnel.c~fix-jdelvare-i2c-i2c-del-driver-returns-void-on-powerpc
+++ a/drivers/macintosh/therm_windtunnel.c
@@ -459,7 +459,8 @@ therm_of_probe( struct of_device *dev, c
 static int
 therm_of_remove( struct of_device *dev )
 {
-	return i2c_del_driver( _driver );
+	i2c_del_driver( _driver );
+	return 0;
 }
 
 static struct of_device_id therm_of_match[] = {{
_
Re: [PATCH] change global zonelist order on NUMA v2
On Thu, 26 Apr 2007 17:57:40 -0400 Lee Schermerhorn <[EMAIL PROTECTED]> wrote:

> On Thu, 2007-04-26 at 18:34 +0900, KAMEZAWA Hiroyuki wrote:
> > Changelog from V1 -> V2
> > - sysctl name is changed to be relaxed_zone_order
> > - NORMAL->NORMAL->->DMA->DMA->DMA order (new ordering) is now default.
> >   NORMAL->DMA->NORMAL->DMA order (old ordering) is optional.
> > - added boot option to set relaxed_zone_order. ia64 is supported now.
> > - Added documentation
> >
> > patch is against 2.6.21-rc7-mm2. tested on ia64 NUMA box. works well.
>
> [PATCH] factor/rework change zonelist order patch
>
> Against 2.6.21-rc7 atop KAMEZAWA Hiroyuki's "change global zonelist
> order on NUMA v2" patch.
>
Hi, this looks easier to read than mine. Thanks.

> 3) kept early_param() definition for boot parameter in mm/page_alloc.c,
>    along with the handler function. One less file to modify.
>
I put early_param() in the arch-dependent part just because no generic code except for pci seems to call it. If it is allowed, I welcome this change.

> 4) modified the two Documentation additions to match these changes.
>
> I've tested various combinations [non-exhaustive], with an ad hoc
> instrumentation patch, and it appears to work as expected [as I expect,
> anyway] on ia64 NUMA.
>
> Question: do we need to rebuild the zonelist caches when we reorder
>    the zones? The z_to_n[] array appears to be dependent on
>    the zonelist order...
>
Maybe not.

> Also: I see the "Movable" zones show up in 21-rc7-mm2. This patch
>    will cause Movable zone to overflow to remote movable zones
>    before using local Normal memory in non-default, zone order.
>    Is this what we want?
>
From my point of view, it's what I want. What we have to do is establish a way to create ZONE_MOVABLE with a suitable size on each node.

I'll merge your change into my set and add "automatic detection" support.

Thank you.
-Kame
Re: [patch] CFS scheduler, -v6
On Thursday 26 April 2007, Con Kolivas wrote: >On Friday 27 April 2007 00:41, Gene Heskett wrote: >> On Thursday 26 April 2007, Redeeman wrote: >> >On Wed, 2007-04-25 at 23:47 +0200, Ingo Molnar wrote: >> > >> > >> >> As usual, any sort of feedback, bugreport, fix and suggestion is more >> >> than welcome, >> > >> >well, from my experiences with cfs on workstation/desktop, on amd64 >> >2ghz, cfs doesent measure up to SD at all. >> > >> >audio skips easily, with or without reniced X, and games are not as >> >smooth. also it seems to be much lower throughput. vanilla seems much >> >better by comparisin. > >That's a pretty bad regression for CFS to not even play audio without >stuttering. > >> Well, I don't generally feel as if the few stutters I just heard in a cbs >> news story I just played were anything but dsl glitches here. xmms, >> playing high quality (Q7) oggs from my own drive are dead smoothly done. >> nbc did want to let me play anything because I have adblock enabled, and >> neither did abc but their anims played nicely, fox news smoothly played >> everything I clicked on. >> >> The only problem I've had so far is with a proprietary daemon from belkin, >> which took about 10 kills and restarts after I'd booted to 2.6.21-CFS-v6 >> before it quit wanting a steady 40% of the cpu when it was running. Now >> it only does that when the bulldog gui is running. That's not normal >> either, but getting fixes out of belkin is like breeding elephants, lots >> of yelling and screaming and it takes 22 months to get results. My next >> ups will NOT be a belkin. >> >> X is not reniced here, but I did run kmail up to -5. >> >> Compared to mainline? I still think this is a 100% keeper for desktop >> users like me. > >Hi Gene > >Have you given up on SD testing then since it's stable and CFS is still > shiny and new (and written by Ingo) therefore it must be better? Not necessarily Con. Do you have a fresh one for 2.6.21? 
-- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) Been Transferred Lately? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc7-mm2 -- x86_64 VDSO compile error II
> That's vdso.lds.S, not vdso.lds. The VDSO_TEXT_OFFSET is still 0x400. > This is what I should have posted the last time > > --- build/arch/x86_64/vdso/voffset.h.orig 2007-04-26 08:43:31.523739878 > -0700 > +++ build/arch/x86_64/vdso/voffset.h 2007-04-26 08:43:38.839579356 -0700 > @@ -1 +1 @@ > -#define VDSO_TEXT_OFFSET 0x500 > +#define VDSO_TEXT_OFFSET 0x400 It's definitely 0x400 here. Is your patch reversed and you want 0x500? -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6 patch] drivers/scsi/nsp32.c: remove kernel 2.4 code
On Fri, 2007-04-27 at 02:13 +0200, Adrian Bunk wrote: > On Thu, Apr 26, 2007 at 07:59:57PM -0400, Robert P. J. Day wrote: > > On Fri, 27 Apr 2007, Adrian Bunk wrote: > > > > > This patch removes kernel 2.4 code. > > > > > > Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]> > > > > > > --- > > > > > > This patch has been sent on: > > > - 26 Mar 2007 > > > > > > drivers/scsi/nsp32.c | 109 +-- > > > 1 file changed, 13 insertions(+), 96 deletions(-) > > > > > > --- linux-2.6.21-rc4-mm1/drivers/scsi/nsp32.c.old 2007-03-25 > > > 20:27:34.0 +0200 > > > +++ linux-2.6.21-rc4-mm1/drivers/scsi/nsp32.c 2007-03-25 > > > 20:31:59.0 +0200 > > > @@ -49,10 +49,6 @@ > > > #include > > > #include > > > > > > -#if (LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0)) > > > -# include > > > -#endif > > > - > > > > i'm curious about the rules for removing code like this. in the case > > of drivers, isn't it possible that some driver source could be > > relevant for both the 2.4 and 2.6 kernel source tree, and simply uses > > that kind of preprocessor check to make sure it's being compiled > > appropriately? > > That's what it was for. > > > or are you doing something more sophisticated than simply checking the > > kernel version being tested? > > No. > > The point is: > > It seems this driver was once maintained for both 2.4 and 2.6 in one > file. > > As long as this is done, such version checks are OK. Personally, I don't like to see 2.4 and 2.6 in a new driver, and will tend to try to force it to be 2.6 only. For an existing driver, I tend to be much more tolerant: removing the huge gobs of code to achieve 2.6 only is usually a bit disruptive on both the driver and the maintainer > But if a driver is no longer actually maintained for both kernels these > checks become useless (and there quickly arised unconditional 2.6-only > code in such a driver) and can be removed. This driver is maintained by Yokota Hiroshi <[EMAIL PROTECTED]> GOTO Masanori <[EMAIL PROTECTED]> As it says in the header. 
It was last modified in May 2006, so it is maintained under the somewhat elastic standards of SCSI. I've cc'd them to see what they think. James - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [00/17] Large Blocksize Support V3
William Lee Irwin III <[EMAIL PROTECTED]> writes: >> In memory as on disk, contiguity matters a lot for performance. On Thu, Apr 26, 2007 at 12:21:24PM -0600, Eric W. Biederman wrote: > Not nearly so much though. In memory you don't have seeks to avoid. > On disks avoiding seeks is everything. I readily concede that seeks are most costly. Yet memory contiguity remains rather influential. Witness the fact that I'm now being called upon a second time to adjust the order in which mm/page_alloc.c returns pages for the sake of implicitly establishing IO contiguity (or otherwise determining why things are coming out backward now). -- wli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-2.6.21 and __udivid3
Hi,

Sorry for the late reply, I was in bed.

> I assume this is a plain 2.6.21 from ftp.kernel.org?

Yes.

> Can you reproduce this with gcc 4.1? If yes, please send your .config .

I don't; I think I have to go back to gcc-4.2.0, which was fine with linux-2.6.21-rc7. It will be the same with 2.6.21.

TIA
Re: [PATCH] change global zonelist order on NUMA v2
On Thu, 26 Apr 2007 08:48:19 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Thu, 26 Apr 2007, KAMEZAWA Hiroyuki wrote:
>
> > (1)Use new zonelist ordering always and move init_task's tied cpu to a
> > cpu on the best node.
> > Child processes will start in good nodes even if Node 0 has small memory.
>
> How about renumbering the nodes? Node 0 is the one with no DMA memory and
> node 1 may be the one with the DMA? That would take care of things even
> without core modifications. We can start on node 0 (which hardware 1) and
> consume the required memory for boot there not impacting the node with the
> DMA memory.
>
It seems a bit complicated. If we do so, the following can occur:

Node1: cpu0,1,2,3
Node0: cpu4,5,6,7

The resulting system layout would probably be hard to picture.

-Kame
RE: [PATCH 1/1] IBM PPC EMAC driver:improved support for PHYconfiguration
> -Original Message- > From: Benjamin Herrenschmidt [mailto:[EMAIL PROTECTED] > Sent: Thursday, April 26, 2007 5:19 PM > To: Jeff Haran > Cc: [EMAIL PROTECTED]; linux-kernel@vger.kernel.org > Subject: Re: [PATCH 1/1] IBM PPC EMAC driver:improved support > for PHYconfiguration > > On Thu, 2007-04-26 at 16:18 -0700, Jeff Haran wrote: > > From: Jeff Haran <[EMAIL PROTECTED]> > > > > This patch fixes some problems I found while debugging the IBM EMAC ... > Your patch appears to have been line wrapped by your mailer though... Benjamin, Friggin Outlook and our M$ centered corporate IT environment. Even when you tell it to send plain text it modifies it. This was my first attempt at patch submission, so I don't know all the hoops yet. I can try sending the patch again if I can figure out how to disable the line wrap. Or if you have some FTP server I can put to I can send you the patch file directly there. Please let me know which you'd prefer. Thanks, Jeff > > Cheers, > Ben. > > > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.6.21
On Fri, 27 Apr 2007, Thomas Gleixner wrote: > > Maybe we need to coordinate changes better. 2.6.21 got three big updates > which affected suspend/resume - one of them is my fault. But fiddling > out which one of those - we had nested problems as well - makes it quite > hard to grok them in time, especially if they happen only on one > reporters system. Yes. _If_ we had known how painful the timer changes would end up being, we'd probably have done them separately from everything else. That is the kind of thing that looks obvious in hindsight: merge stuff that is questionable and scary alone, and don't do anything else that release cycle. But while the timer code is obviously pretty core, I think everybody expected it to be a lot easier to merge (and it had existed as patches in various forms for some time). So we simply didn't know beforehand that it was going to cause the kinds of regressions it did cause (and in fact, some of the regressions were initially blamed on other things entirely - some of them looked like IO regressions). Water under the bridge. It's also easy to say in hindsight that something should have been merged separately and been given a release cycle all its own. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Back to the future.
On Thu, Apr 26, 2007 at 03:49:51PM -0700, David Lang wrote: > swap partitions are limited to 2G (or at least they were a couple of months > ago when I last checked). I also don't want to run the risk of having a box > try to _use_ 16G worth of swap. I'd rather have the box hit OOM first. They aren't limited anymore, I have a number of machines with 20G swap for experiments. OG. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [00/17] Large Blocksize Support V3
On (26/04/07 20:39), Jens Axboe didst pronounce: >> But don't push large pages as the only solution to larger ios, because >> that is trivially not true. On Thu, Apr 26, 2007 at 09:22:02PM +0100, Mel Gorman wrote: > Would it be fair to say that your approach and using large pages are not > mutually exclusive solutions? It seems a lot of the debate here is > assuming there is One And Only One Solution for larger ios. I'd like to see how all the strategies mentioned thus far do in conjunction. -- wli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] IBM PPC EMAC driver:improved support for PHY configuration
On Thu, 2007-04-26 at 16:18 -0700, Jeff Haran wrote: > From: Jeff Haran <[EMAIL PROTECTED]> > > This patch fixes some problems I found while debugging the IBM EMAC > driver for PPC32 systems. > The first problem was in the function that configures the PHY for > autonegotiation, genmii_setup_aneg(). The original code does a > read/modify/write of the autonegotiation advertizement register (reg 4), > followed by a read/modify/write of the control register (reg 0). While > the original code follows the proper procedure as per reading the IEEE > specs, what I found is that on at least one PHY model (National DP83843) > the read of the control register comes back with the soft reset bit set > (bit 15). Good catch ! I've seen that behaviour in the past too. Note that sungem has this problem too. .../... > The second problem was in the function that configures the PHY for > forced operation, genmii_setup_forced(). The original code initiates a > software reset operation via a write of a 1 to bit 15 of the control > register (reg 0), but then proceeds to do a second write to that same > register without waiting until that reset bit is cleared by the PHY > itself (which according to the IEEE specs indicates that the PHY reset > is complete). This is a violation of how one is supposed to use this > software reset feature of these PHYs and I believe was the cause of > mysterious, difficult to reproduce link failures that we've observed on > some of our systems that use this driver. The fix is to modify the > function so that it spins waiting for the reset bit to clear after doing > the soft reset and before doing the subsequent write. Since this > modification, we haven't seen the mysterious link failures, though they > were so rare its difficult to say at this point whether this was the > cause. This is also a bug inherited from sungem (thus my fault). 
> I also added some error handling and reporting for the abnormal case > where the reset bit never clears from the soft reset operation. > Applied to kernel version 2.6.21. Your patch appears to have been line wrapped by your mailer though... Cheers, Ben. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
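The second fix described above (after writing the soft reset bit, wait until the PHY clears it before touching the control register again, and report an error if it never clears) can be sketched as follows. This is not the EMAC driver code: the phy_ops structure, the function names and the fake PHY are hypothetical and exist only to make the example self-contained. MII_BMCR (register 0) and BMCR_RESET (bit 15, 0x8000) match the standard MII register layout.

```c
#include <errno.h>

#define MII_BMCR   0x00		/* basic mode control register (reg 0) */
#define BMCR_RESET 0x8000	/* bit 15: software reset, self-clearing */

/* Hypothetical accessor table so the sketch needs no real hardware. */
struct phy_ops {
	int  (*read)(void *ctx, int reg);
	void (*write)(void *ctx, int reg, int val);
	void *ctx;
};

/*
 * Start a soft reset and poll until the PHY clears the reset bit.
 * Returns 0 on success, -ETIMEDOUT if the bit never clears.
 */
static int phy_soft_reset(struct phy_ops *phy, int max_polls)
{
	int val = phy->read(phy->ctx, MII_BMCR);

	if (val < 0)
		return val;
	phy->write(phy->ctx, MII_BMCR, val | BMCR_RESET);

	while (max_polls-- > 0) {
		val = phy->read(phy->ctx, MII_BMCR);
		if (val >= 0 && !(val & BMCR_RESET))
			return 0;	/* reset complete, safe to write again */
		/* a real driver would delay between polls here */
	}
	return -ETIMEDOUT;
}

/* Fake PHY: keeps the reset bit set for a number of reads. */
struct fake_phy {
	int bmcr;
	int reads_until_ready;
};

static int fake_read(void *ctx, int reg)
{
	struct fake_phy *p = ctx;

	(void)reg;
	if ((p->bmcr & BMCR_RESET) && p->reads_until_ready-- <= 0)
		p->bmcr &= ~BMCR_RESET;
	return p->bmcr;
}

static void fake_write(void *ctx, int reg, int val)
{
	struct fake_phy *p = ctx;

	(void)reg;
	p->bmcr = val;
}
```

The bounded poll count is what provides the error reporting for the abnormal case; an unbounded spin would hang if the PHY never came out of reset.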
Re: [00/17] Large Blocksize Support V3
On Thu, Apr 26, 2007 at 11:50:33PM +1000, David Chinner wrote:
> On Thu, Apr 26, 2007 at 04:10:32AM -0600, Eric W. Biederman wrote:
> > > And then there's the problem that most hardware is limited to 128
> > > s/g entries and that means 128 non-contiguous pages in memory is the
> > > maximum I/O size we can issue to these devices. We have RAID arrays
> > > that go twice as fast if we can send them 1MB I/Os instead of 512k
> > > I/Os and that means we need contiguous pages to be handled to the
> > > devices
> >
> > Ok. Now why are high end hardware manufacturers building crippled
> > hardware? Or is there only an 8bit field in SCSI for describing
> > scatter gather entries? Although I would think this would be
> > more of a controller rather than a drive issue.
>
> scsi.h:
>
> /*
>  * The maximum sg list length SCSI can cope with
>  * (currently must be a power of 2 between 32 and 256)
>  */
> #define SCSI_MAX_PHYS_SEGMENTS MAX_PHYS_SEGMENTS
>
> And from blkdev.h:
>
> #define MAX_PHYS_SEGMENTS 128
> #define MAX_HW_SEGMENTS 128
>
> So currently on SCSI we are limited to 128 s/g entries, and the
> maximum is 256. So I'd say we've got good grounds for needing
> contiguous pages to go beyond 1MB I/O size on x86_64.

Right, and there are also RAID devices that really want a 2 MiB I/O size. Even if we could use 512 s/g entries (which would take two pages), the other big problem is that many I/O chips/cards are limited in the amount of space they have for s/g lists. So, you'd face the possibility that you could do a 2MiB I/O request with 512 s/g entries, but then you couldn't start a second request on that host until the first one finished.

jeremy
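The arithmetic behind the numbers in this exchange can be written out as a small sketch. It assumes the worst case where every scatter/gather entry maps exactly one non-contiguous 4 KiB page, and it assumes a 16-byte s/g descriptor; both constants and both function names are illustrative assumptions, not values taken from the kernel headers quoted above.

```c
/* Assumptions for illustration only. */
#define SKETCH_PAGE_SIZE      4096UL	/* 4 KiB pages, as on x86_64 */
#define SKETCH_SG_ENTRY_BYTES 16UL	/* assumed descriptor size */

/* Largest I/O when each s/g entry covers one non-contiguous page. */
static unsigned long max_io_bytes(unsigned long sg_entries)
{
	return sg_entries * SKETCH_PAGE_SIZE;
}

/* Pages needed to hold the s/g list itself, rounded up. */
static unsigned long sg_list_pages(unsigned long sg_entries)
{
	unsigned long bytes = sg_entries * SKETCH_SG_ENTRY_BYTES;

	return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

Under these assumptions 128 entries give 512 KiB, 256 entries give 1 MiB, and 512 entries give 2 MiB while the list itself occupies two pages, which is consistent with the figures mentioned in the thread.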
Re: [-mm patch] unexport highlevel_host_reset
Adrian Bunk wrote: > highlevel_host_reset no longer has any modular users. Thanks, I missed this when I removed the last usage outside the 1394 core. Committed to linux1394-2.6.git. -- Stefan Richter -=-=-=== -=-- ==-== http://arcgraph.de/sr/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ANNOUNCE] battery2 git repository
On Fri, Apr 27, 2007 at 03:29:02AM +0400, Anton Vorontsov wrote: > > To not confuse with David own battery-2.6 repository, it's called > battery2-2.6, and can be found here: So, I'm still not quite understanding this, is David giving up on his code now that you have pulled his changes into your tree? Or is there some reason there are still two competing versions here? thanks, greg k-h - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: mm snapshot broken-out-2007-04-25-02-49.tar.gz uploaded
On Thu, Apr 26, 2007 at 11:21:35PM +0200, Michal Piotrowski wrote: > Hi Greg, > > Michal Piotrowski napisa?(a): > > Hi all, > > > > [EMAIL PROTECTED] napisa?(a): > >> The mm snapshot broken-out-2007-04-25-02-49.tar.gz has been uploaded to > >> > >> > >> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/broken-out-2007-04-25-02-49.tar.gz > >> > >> It contains the following patches against 2.6.21-rc7: > >> > > > > My network doesn't work > > > > $ ping 192.168.0.1 > > connect: Network is unreachable > > > > NIC: > > 02:05.0 Ethernet controller: 3Com Corporation 3c940 10/100/1000Base-T > > [Marvell] (rev 12) > > > > quilt patches drivers/net/skge.c > > patches/git-netdev-all.patch > > patches/git-net.patch > > > > Bisection... > > > > I tracked this problem down to your patches (kernel 2.6.21-rc7-mm2) > > gregkh-driver-sysfs-fix-i_ino-handling-in-sysfs.patch > gregkh-driver-sysfs-fix-error-handling-in-binattr-write.patch > gregkh-driver-sysfs-move-release_sysfs_dirent-to-dirc.patch > gregkh-driver-sysfs-flatten-cleanup-paths-in-sysfs_add_link-and-create_dir.patch > gregkh-driver-sysfs-consolidate-sysfs_dirent-creation-functions.patch > gregkh-driver-sysfs-add-sysfs_dirent-s_parent.patch > gregkh-driver-sysfs-add-sysfs_dirent-s_name.patch > gregkh-driver-sysfs-make-sysfs_dirent-s_element-a-union.patch > gregkh-driver-sysfs-implement-kobj_sysfs_assoc_lock.patch > gregkh-driver-sysfs-reimplement-symlink-using-sysfs_dirent-tree.patch > gregkh-driver-sysfs-implement-bin_buffer.patch > gregkh-driver-sysfs-implement-sysfs_dirent-active-reference-and-immediate-disconnect.patch > gregkh-driver-sysfs-kill-attribute-file-orphaning.patch > gregkh-driver-sysfs-kill-unnecessary-attribute-owner.patch > gregkh-driver-sysfs-make-lockdep-ignore-s_active.patch > gregkh-driver-sysfs-make-sysfs_put-ignore-null-sd.patch > gregkh-driver-sysfs-rename-object_depth-to-sysfs_path_depth-and-make-it-global.patch > gregkh-driver-sysfs-reimplement-sysfs_drop_dentry.patch > 
gregkh-driver-sysfs-kill-sysfs_dirent-s_dentry.patch > gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch > > ABI breakage? I've dropped the gregkh-driver-sysfs-* patches listed above from my tree as Tejun is reworking the series to handle all of the different changes that have been needed as they have been tested, so you shouldn't have to worry about them. But I would be interested in finding out why the above is breaking for you. thanks, greg k-h - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6 patch] drivers/scsi/nsp32.c: remove kernel 2.4 code
On Thu, Apr 26, 2007 at 07:59:57PM -0400, Robert P. J. Day wrote:
> On Fri, 27 Apr 2007, Adrian Bunk wrote:
>
> > This patch removes kernel 2.4 code.
> >
> > Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
> >
> > ---
> >
> > This patch has been sent on:
> > - 26 Mar 2007
> >
> > drivers/scsi/nsp32.c | 109 +--
> > 1 file changed, 13 insertions(+), 96 deletions(-)
> >
> > --- linux-2.6.21-rc4-mm1/drivers/scsi/nsp32.c.old 2007-03-25 20:27:34.0 +0200
> > +++ linux-2.6.21-rc4-mm1/drivers/scsi/nsp32.c 2007-03-25 20:31:59.0 +0200
> > @@ -49,10 +49,6 @@
> > #include
> > #include
> >
> > -#if (LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0))
> > -# include
> > -#endif
> > -
>
> i'm curious about the rules for removing code like this. in the case
> of drivers, isn't it possible that some driver source could be
> relevant for both the 2.4 and 2.6 kernel source tree, and simply uses
> that kind of preprocessor check to make sure it's being compiled
> appropriately?

That's what it was for.

> or are you doing something more sophisticated than simply checking the
> kernel version being tested?

No.

The point is: It seems this driver was once maintained for both 2.4 and 2.6 in one file. As long as this is done, such version checks are OK.

But if a driver is no longer actually maintained for both kernels, these checks become useless (and unconditional 2.6-only code quickly arose in such a driver) and can be removed.

> rday

cu
Adrian

--
"Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
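To make the version check discussed above concrete, here is a minimal user-space sketch in C of how such a guard evaluates. The KERNEL_VERSION encoding follows the macro found in 2.6-era <linux/version.h>; the LINUX_VERSION_CODE value chosen here and the helper builds_24_compat_code() are assumptions made up for illustration and are not part of nsp32.c.

```c
/* Encoding used by 2.6-era <linux/version.h>: major, minor and patch level
 * are packed into one integer so that versions compare numerically. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Assumption for this sketch: pretend we build against 2.6.21.
 * In a real kernel build this value comes from <linux/version.h>. */
#ifndef LINUX_VERSION_CODE
#define LINUX_VERSION_CODE KERNEL_VERSION(2, 6, 21)
#endif

/* Hypothetical helper: returns 1 if the 2.4-only branch of a dual-tree
 * driver would be compiled in, 0 if the preprocessor drops it. */
int builds_24_compat_code(void)
{
#if (LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 0))
    return 1;   /* 2.4 tree: compatibility code is kept */
#else
    return 0;   /* 2.6 tree: this branch is dead code */
#endif
}
```

When the file is only ever built in a 2.6 tree, the guarded branch can never be selected, which is why removing it does not change the compiled driver.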
Re: [PATCH] utimensat implementation
Andrew Morton wrote:
> On Thu, 26 Apr 2007 18:49:05 -0400 Ulrich Drepper <[EMAIL PROTECTED]> wrote:
>>
>> If the tv_nsec value of either of the elements of the utimes parameter to
>> utimensat() is UTIME_OMIT no update of that respective value is performed.
>
> ITYM "If the value of either of the elements..."
>
> +#define UTIME_NOW ((1l << 30) - 1l)
> +#define UTIME_OMIT ((1l << 30) - 2l)
>
> OK, so there's no collision on ts_nsec if unnormalised timespecs are
> disallowed.
>
> But there's a potential collision on ts_sec? Do we know what date that
> corresponds to?

"If the tv_nsec value" implies that these magic numbers have no effect on tv_sec.

I'm a bit leery of abusing the timespec value like this, though. A flags field seems like it would be cleaner.

Something else... if we're dickering with these interfaces, shouldn't we allow setting atime as well?

-hpa
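As a quick check of the collision question, the sketch below (C, user space) evaluates the two constants from the quoted patch. It tests whether they fall inside the normalised tv_nsec range of 0 to 999999999, and computes which calendar year they would denote if misread as tv_sec. The helper names is_normalised_nsec() and year_if_read_as_seconds() and the DEMO_NSEC_PER_SEC constant are hypothetical, introduced only for this illustration.

```c
#include <time.h>

/* Values as proposed in the quoted patch; guarded in case a system
 * header already provides them. */
#ifndef UTIME_NOW
#define UTIME_NOW  ((1l << 30) - 1l)
#endif
#ifndef UTIME_OMIT
#define UTIME_OMIT ((1l << 30) - 2l)
#endif

/* Hypothetical constant for this sketch: nanoseconds per second. */
#define DEMO_NSEC_PER_SEC 1000000000l

/* A normalised timespec has 0 <= tv_nsec < one second. */
int is_normalised_nsec(long nsec)
{
    return nsec >= 0 && nsec < DEMO_NSEC_PER_SEC;
}

/* Calendar year (UTC) the value would denote if it were read as
 * seconds since the epoch; returns -1 if the conversion fails. */
int year_if_read_as_seconds(long value)
{
    time_t t = (time_t)value;
    struct tm *tm = gmtime(&t);

    return tm ? tm->tm_year + 1900 : -1;
}
```

Both constants are larger than any normalised tv_nsec, so they cannot be confused with a real nanosecond count. Read as seconds since the epoch they would land in January 2004, which only matters if the magic values were ever compared against tv_sec rather than tv_nsec.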
Re: Back to the future.
On Fri, Apr 27, 2007 at 06:50:56AM +1000, Nigel Cunningham wrote:
> I'm perfectly willing to think through some alternate approach if you
> suggest something or prod my thinking in a new direction, but I'm afraid
> I just can't see right now how we can achieve what you're after.

Ok, what about this approach I've been mulling about for a while: Suspend-to-disk is pretty much an exercise in state saving. There are multiple ways to do state saving, but they tend to end up in two categories: implicit and explicit.

In implicit state saving, you try to save the state of the system/application/whatever "under its feet", more or less, and then fix up what is not saved/saveable correctly. A well-known example is the undumping process Emacs goes (went?) through, where it tries to dump the state of the memory as a new executable, with a lot of pleasure with various executable formats and subtleties due to side effects in libc code you don't control.

In explicit state saving each object saves what is needed from its state to an independently defined format (instead of "whatever the memory organization happens to be at that point"). When reloading the state you have to parse it, and it usually requires rebuilding/relocating all references/pointers/etc. XEmacs currently has a "portable dumper" that pretty much does just that. We don't have any redumping problems anymore, they're over.

Which one is the best depends heavily on the application.

The amount of code in the implicit case depends on the amount of fixups to do. In the kernel case it happens to be a lot: pretty much everything that touches hardware has to save the device state to memory and reload it on resume. And bugs in hardware handling can be quite annoying to debug. And if some driver does not do saving/resume correctly, you have no way outside of playing with modules to ensure the safety of the suspend cycle.

The amount of code in the explicit case is an interesting variable in the case of the kernel.
You have to save what is needed, but how do you define what is needed? It is, pretty much, what running processes can observe from userspace. Now, what can a process observe:
- its application text and anonymous memory pages
- its file handles
- its mapped files
- its mapped whatever else
- its sys5 IPC stuff
- futex stuff and friends, namespaces, etc
- its intrinsic characteristics it can reach through syscalls (i.e. the user-visible parts of current, like pid, uid...)
- its currently running system call, if any

So that's what we'd have to explicitly save. Anonymous memory, sys5 IPC, futex and current structures, that's easy stuff in practice. The fun parts are pretty much:
- references to files
- references to active networking links
- references to devices and associated visible state
- currently running system call, aka the kernel stack for the process

The last one is the one I'm the most afraid of. I hope that the signal stuff and/or the asynchronous syscall stuff that was discussed recently would allow us to "unwind" blocking system calls back to the syscall level and then store the parameters for resume-time restart. The non-blocking calls you can just let finish.

The first one is really interesting. If you value your filesystems, you'd rather have them clean after the suspend. And also you pretty much know that filesystems can move around when you're not looking, be it USB hotplug stuff (discovery order is random-ish isn't it?), module loading order issues or multithreaded device discovery. So you're way more happy *not* caching anything from the filesystem you can avoid.

But what is a file reference, really? With the dcache handy, it's pretty much a path, since inodes don't always exist reliably. And if you have the lists of paths used by the processes on a particular filesystem, you can easily get an idea of where, if anywhere, the filesystem is even if you don't have reliable serials.
More interestingly, you cannot, in any case, instantly corrupt your filesystem by having a mismatch between the in-memory cache and the reality. The processes which referenced files you can't find anywhere will end up with EBADF or a segfault depending on whether it was an fd or an mmap, à la revoke(). They'll probably die horribly. I'd rather have processes die than filesystems die, since in any case if the file isn't here anymore in practice the process could only destroy things.

An interesting thing there: nothing in that touches either the filesystem or the block devices. Everything is done at the VFS level. The devices don't need to care. And the "this filesystem goes there" can be done in userspace in an initramfs if people want to experiment with kinky strategies. After all, why not allow a sysadmin to regroup two filesystems into one through a suspend, the processes mostly don't need to care (well, tar may, but heh).

Deleted files would have to be sillyrenamed or something. Implementation details ;-)

Active networking links, you can
Re: [2.6 patch] drivers/scsi/nsp32.c: remove kernel 2.4 code
On Fri, 27 Apr 2007, Adrian Bunk wrote:

> This patch removes kernel 2.4 code.
>
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
>
> ---
>
> This patch has been sent on:
> - 26 Mar 2007
>
> drivers/scsi/nsp32.c | 109 +--
> 1 file changed, 13 insertions(+), 96 deletions(-)
>
> --- linux-2.6.21-rc4-mm1/drivers/scsi/nsp32.c.old 2007-03-25 20:27:34.0 +0200
> +++ linux-2.6.21-rc4-mm1/drivers/scsi/nsp32.c 2007-03-25 20:31:59.0 +0200
> @@ -49,10 +49,6 @@
> #include
> #include
>
> -#if (LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0))
> -# include
> -#endif
> -

i'm curious about the rules for removing code like this. in the case of drivers, isn't it possible that some driver source could be relevant for both the 2.4 and 2.6 kernel source tree, and simply uses that kind of preprocessor check to make sure it's being compiled appropriately?

or are you doing something more sophisticated than simply checking the kernel version being tested?

rday

--
Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://fsdev.net/wiki/index.php?title=Main_Page