Re: Kconfig style question
Kumar Gala <[EMAIL PROTECTED]> wrote:
> For source lines I've seen both:
>
> source "arch/powerpc/platforms/52xx/Kconfig"
>
> and
>
> source arch/powerpc/platforms/85xx/Kconfig
>
> Is there a preferred style? Quotes or not?

$ find . -name Kconfig -exec grep ^source '{}' \;|grep \"|wc -l
732
$ find . -name Kconfig -exec grep ^source '{}' \;|grep -v \"|wc -l
44

--
Funny quotes: 18. Mind like a steel trap - rusty and illegal in 37 states.
Eat this, spammer: [EMAIL PROTECTED] [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] revoke: misc fixes
Pekka J Enberg wrote:
> Hi Nick,
>
> On Fri, 16 Mar 2007, Nick Piggin wrote:
>> Could you try something like walk the i_mmap lists to find mms with vmas that
>> haven't need revoking, then each time you find one, take a ref on the mm, drop
>> i_mmap_lock, take mmap_sem, and walk all its vmas looking for any that
>> reference the inode?
>
> Yes, that would work. What I am cooking up now is dropping ->i_mmap_lock,
> restarting the scan after each revoke_vma() and skipping vmas that are
> VM_REVOKED.

Of course you can't take a reference to a vma, so to pin a vma you need the
mmap_sem, and to do that you need to drop i_mmap_lock, which means your vma
might go away ;)

So I think you really do need to get back to the mm, and then search its vmas.

Also, a down_write_trylock attempt inside i_mmap_lock should be a valid
optimisation.

--
SUSE Labs, Novell Inc.
Re: [PATCH 1/3] revoke: misc fixes
On Fri, 16 Mar 2007, Nick Piggin wrote:
> So I think you really do need to get back to the mm, and then search its
> vmas.

You're right.
Re: [Fastboot] [PATCH 1/1] Allow i386 crash kernels to handle x86_64 dumps
On 3/16/07, Horms <[EMAIL PROTECTED]> wrote:
> On Fri, Mar 16, 2007 at 08:52:30AM +0530, Vivek Goyal wrote:
> > So it will now be left to the user. If he tries to kexec to a 64bit kernel
> > on a machine not supporting 32bit extensions, then kexec will not give
> > any advance warning.
>
> I feel comfortable with that. Well for now anyway.
> But I think that Magnus has other ideas.

I don't mind switching back and forth between 32-bit and 64-bit for plain
kexec, especially if we can validate that the kernel we load will use an
instruction set that is supported.

But for kdump, switching between 32-bit and 64-bit kernels is just another
new dimension in the already too complex kdump matrix IMO.

I think more focus should be put on fixing up bugs in kexec-tools than
adding new features.

/ magnus
Re: [PATCH 1/3] revoke: misc fixes
Hi Nick,

On Fri, 16 Mar 2007, Nick Piggin wrote:
> Could you try something like walk the i_mmap lists to find mms with vmas that
> haven't need revoking, then each time you find one, take a ref on the mm, drop
> i_mmap_lock, take mmap_sem, and walk all its vmas looking for any that
> reference the inode?

Yes, that would work. What I am cooking up now is dropping ->i_mmap_lock,
restarting the scan after each revoke_vma() and skipping vmas that are
VM_REVOKED.

                Pekka
Re: [patch 6/13] signalfd/timerfd/asyncfd v5 - timerfd core ...
On Thu, 2007-03-15 at 16:02 -0700, Davide Libenzi wrote:
> > > +     /*
> > > +      * When we call this, the initialization must be complete, since
> > > +      * aino_getfd() will install the fd.
> > > +      */
> > > +     error = aino_getfd(&ufd, &inode, &file, "[timerfd]",
> > > +                        &timerfd_fops, ctx);
> > > +     if (error)
> > > +             goto err_ctxfree;
> >
> > Again: Please turn this around. No need to start the timer before we
> > know, that everything works.
>
> The timerfd_setup() is not locked, so we need to make sure everything is
> setup, before advertising the fd (and aino_getfd does that).

Right. Did not think about the bad boys peeking at file descriptors :)

        tglx
Re: [Bug 8040] Hang before INIT when CONFIG_HIGHMEM4G=y [Fix CONFIG_COMPAT_VDSO] <- Bad?
Correcting my previous message, and confirming Leroy's one:

linux-2.6.20.tar.gz  : bad
linux-2.6.20.1.tar.gz: bad
linux-2.6.20.2.tar.gz: good
linux-2.6.20.3.tar.gz: good

With:
COMPAT_VDSO=y
CONFIG_HIGHMEM64G=y

So the problem has been solved with 2.6.20.2.

Nilshar.

2007/3/15, Leroy van Logchem <[EMAIL PROTECTED]>:
> > Can you please double check this by trying with/without again -- sometimes
> > bisects go bad.
>
> As requested I started to redo the test but now without git using
> kernel.org tars. The results now are, still using the same .config:
>
> linux-2.6.20.tar.gz  : bad
> linux-2.6.20.1.tar.gz: bad (boot log equal)
> linux-2.6.20.2.tar.gz: good
> linux-2.6.20.3.tar.gz: good (triple checked)
>
> Really strange. Nilshar, please try these kernels too with:
> COMPAT_VDSO=y
> CONFIG_HIGHMEM64G=y
Re: [patch 6/13] signal/timer/event fds v6 - timerfd core ...
On Thu, 2007-03-15 at 17:22 -0700, Davide Libenzi wrote: > +static void timerfd_setup(struct timerfd_ctx *ctx, int clockid, int flags, > + const struct itimerspec *ktmr) > +{ > + enum hrtimer_mode htmode; > + > + htmode = (flags & TFD_TIMER_ABSTIME) ? HRTIMER_MODE_ABS: > HRTIMER_MODE_REL; > + > + ctx->ticks = 0; > + ctx->texp = timespec_to_ktime(ktmr->it_value); I know, I'm racking your nerves. texp is only used for setup. No need to carry it in the ctx data structure. :) > + ctx->tintv = timespec_to_ktime(ktmr->it_interval); > + hrtimer_init(&ctx->tmr, clockid, htmode); > + ctx->tmr.expires = ctx->texp; > + ctx->tmr.function = timerfd_tmrproc; > + if (ctx->texp.tv64 != 0) > + hrtimer_start(&ctx->tmr, ctx->texp, htmode); > +} Thanks, tglx - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] revoke: misc fixes
Pekka J Enberg wrote:
> Pekka J Enberg wrote:
>>>         /*
>>> -        * Not holding ->mmap_sem here.
>>> +        * Not holding ->mmap_sem here but we must watch out for page
>>> +        * faults and after the shared mappings have been taken down
>>> +        * and sys_mmap() trying to remap the revoked range.
>>>          */
>>>         vma->vm_flags |= VM_REVOKED;
>>>         smp_mb();
>>> @@ -455,7 +457,7 @@ int err = 0;
>
> On Fri, 16 Mar 2007, Nick Piggin wrote:
>> You're still modifying vm_flags without down_write mmap_sem, so this will
>> corrupt vm_flags.
>
> Uhm, you're right, two concurrent writes and we can lose some bits so a
> barrier doesn't work. Too bad as we're under mapping->i_mmap_lock here and
> thus cannot take ->mmap_sem...

Could you try something like walk the i_mmap lists to find mms with vmas that
haven't need revoking, then each time you find one, take a ref on the mm, drop
i_mmap_lock, take mmap_sem, and walk all its vmas looking for any that
reference the inode?

Bit of a roundabout way to go, but it might work.

--
SUSE Labs, Novell Inc.
Re: [PATCH 1/1] Allow i386 crash kernels to handle x86_64 dumps
On Fri, Mar 16, 2007 at 07:17:43AM +, Ian Campbell wrote: > On Fri, 2007-03-16 at 08:48 +0900, Horms wrote: > > > > > > > > Signed-off-by: Ian Campbell <[EMAIL PROTECTED]> > > > > > > > > diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c > > > > index d960507..523e109 100644 > > > > --- a/fs/proc/vmcore.c > > > > +++ b/fs/proc/vmcore.c > > > > @@ -514,7 +514,7 @@ static int __init parse_crash_elf64_headers(void) > > > > /* Do some basic Verification. */ > > > > if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 || > > > > (ehdr.e_type != ET_CORE) || > > > > - !elf_check_arch(&ehdr) || > > > > + !vmcore_elf_check_arch(&ehdr) || > > > > ehdr.e_ident[EI_CLASS] != ELFCLASS64 || > > > > ehdr.e_ident[EI_VERSION] != EV_CURRENT || > > > > ehdr.e_version != EV_CURRENT || > > > > diff --git a/include/asm-i386/kexec.h b/include/asm-i386/kexec.h > > > > index 4dfc9f5..c76737e 100644 > > > > --- a/include/asm-i386/kexec.h > > > > +++ b/include/asm-i386/kexec.h > > > > @@ -47,6 +47,9 @@ > > > > /* The native architecture */ > > > > #define KEXEC_ARCH KEXEC_ARCH_386 > > > > > > > > +/* We can also handle crash dumps from 64 bit kernel. */ > > > > +#define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64) > > > > + > > > > > > Ideal place for this probably should have been arch dependent crash_dump.h > > > file. But we don't have one and no point introducing one just for this > > > macro. > > > > > > This change looks good to me. > > > > Won't the above change break non i386 archtectures as > > vmcore_elf_check_arch_cross isn't defined for them? > > No, because of this hunk: Thanks, silly me :( -- Horms H: http://www.vergenet.net/~horms/ W: http://www.valinux.co.jp/en/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Fastboot] [PATCH 1/1] Allow i386 crash kernels to handle x86_64 dumps
On Fri, Mar 16, 2007 at 08:52:30AM +0530, Vivek Goyal wrote: > On Fri, Mar 16, 2007 at 11:40:07AM +0900, Magnus Damm wrote: > > On 3/16/07, Horms <[EMAIL PROTECTED]> wrote: > > >On Thu, Mar 15, 2007 at 06:56:16PM +0530, Vivek Goyal wrote: > > >> On Thu, Mar 15, 2007 at 12:22:57PM +, Ian Campbell wrote: > > >> > On Thu, 2007-03-15 at 11:17 +0530, Vivek Goyal wrote: > > >> > > > > But I think changing this macro might run into issues. It is > > >> > > > > being used at few places in kernel, for example while loading > > >> > > > > module. This will essentially mean that we allow loading 64bit > > >> > > > > x86_64 modules on 32bit i386 systems? > > >> > > > >> > Yes, not sure how I missed that fact... > > >> > > > >> > > Kexec will also not allow loading an x86_64 kernel on a 32bit > > >machine. > > >> > > > >> > For crash kernel only or for regular kexec too? > > >> > > > >> > > >> I think for both. One of the possible reasons I think is that one never > > >> knows is underlying machine has got 64bit extensions or not. So even if > > >> we load the kernel it will never boot. Secondly, we might not be able to > > >> handle 64bit address in 32bit kernel/user space? > > > > > >Perhaps I am miss-understanding what you are saying, but I do > > >recally kexecing from 32->64 and 64->32 bit kernels on x86_64 hardware. > > >I can run these checks again if it helps. > > > > I stand corrected. I can kexec an bzImage 32->64bit. That's a different > thing that it ran into some initrd issues later but fundamentally kexec > could load 64bit kernel bzImage and do the successful transition. > > So it will now be left to the user. If he tries to kexec to a 64bit kernel > on a machine not supporting 32bit extensions, then kexec will not give > any advance warning. I feel comfortable with that. Well for now anyway. But I think that Magnus has other ideas. 
--
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/
Re: [PATCH 1/1] Allow i386 crash kernels to handle x86_64 dumps
On Fri, 2007-03-16 at 08:12 +0530, Vivek Goyal wrote:
> I did not investigate deeper but I got a basic question. How will kexec
> know that underlying 32bit machine supports 64bit extensions or not?

It looks like /proc/cpuinfo flags contains "lm" (which is long mode,
right?) even if the machine is running 32 bit mode.

Ian.
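[Illustration, not part of the thread] The /proc/cpuinfo check mentioned
above could be done from user space roughly as follows. This is only a
sketch: the helper names flags_line_has() and cpuinfo_has_flag() are made up
for this example, and it assumes the x86 cpuinfo layout in which a line
starting with "flags" is followed by a colon and a space-separated list. It
is not taken from kexec-tools.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if `flag` appears as a whole word in `line`, the text that
 * follows the colon of a cpuinfo "flags" line. (Hypothetical helper.)
 */
bool flags_line_has(const char *line, const char *flag)
{
	size_t len = strlen(flag);
	const char *p = line;

	if (len == 0)
		return false;

	while ((p = strstr(p, flag)) != NULL) {
		/* A match only counts if it is delimited on both sides. */
		bool starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
		char end = p[len];
		bool ends = end == '\0' || end == ' ' || end == '\t' || end == '\n';

		if (starts && ends)
			return true;
		p += len;
	}
	return false;
}

/*
 * Scan cpuinfo-formatted text from `fp` and test the first "flags" line.
 * Assumption: the flags line fits in the buffer; a longer line would be
 * truncated by fgets() and a flag past the cut would be missed.
 */
bool cpuinfo_has_flag(FILE *fp, const char *flag)
{
	char buf[4096];

	while (fgets(buf, sizeof(buf), fp)) {
		if (strncmp(buf, "flags", 5) == 0) {
			const char *colon = strchr(buf, ':');

			if (colon)
				return flags_line_has(colon + 1, flag);
		}
	}
	return false;
}
```

A caller would open /proc/cpuinfo with fopen() and pass the stream together
with "lm". The whole-word match matters because other flag names can contain
"lm" as a substring.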
Re: [PATCH 1/3] revoke: misc fixes
Pekka J Enberg wrote:
> >         /*
> > -        * Not holding ->mmap_sem here.
> > +        * Not holding ->mmap_sem here but we must watch out for page
> > +        * faults and after the shared mappings have been taken down
> > +        * and sys_mmap() trying to remap the revoked range.
> >          */
> >         vma->vm_flags |= VM_REVOKED;
> >         smp_mb();
> > @@ -455,7 +457,7 @@ int err = 0;

On Fri, 16 Mar 2007, Nick Piggin wrote:
> You're still modifying vm_flags without down_write mmap_sem, so this will
> corrupt vm_flags.

Uhm, you're right, two concurrent writes and we can lose some bits so a
barrier doesn't work. Too bad as we're under mapping->i_mmap_lock here and
thus cannot take ->mmap_sem...
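[Illustration, not part of the thread] The lost-bits problem acknowledged
above can be shown with a deterministic user-space sketch. It spells out one
bad interleaving of two unserialized "flags |= BIT" updates by hand. The
names FLAG_REVOKED and FLAG_OTHER and their values are hypothetical
stand-ins, not the kernel's VM_* definitions. A memory barrier only orders
accesses; it does not turn the load/modify/store sequence into one atomic
step, which is why the writers need to be serialized (in the thread, by
holding mmap_sem for writing).

```c
/* Hypothetical flag bits, standing in for VM_REVOKED and some other bit. */
#define FLAG_REVOKED 0x1UL
#define FLAG_OTHER   0x2UL

/*
 * One possible interleaving of two writers that each do "flags |= BIT"
 * without mutual exclusion: both load the old value before either stores.
 */
unsigned long racy_update(unsigned long flags)
{
	unsigned long a = flags;	/* writer A loads */
	unsigned long b = flags;	/* writer B loads the same old value */

	a |= FLAG_REVOKED;		/* A modifies its private copy */
	b |= FLAG_OTHER;		/* B modifies its private copy */

	flags = a;			/* A stores */
	flags = b;			/* B stores and overwrites A's bit */
	return flags;
}

/* The same two updates when the writers are serialized by a lock. */
unsigned long serialized_update(unsigned long flags)
{
	flags |= FLAG_REVOKED;		/* A: load, modify, store */
	flags |= FLAG_OTHER;		/* B: sees A's store */
	return flags;
}
```

Starting from 0, the racy interleaving ends with only FLAG_OTHER set, while
the serialized version keeps both bits.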
Re: [Fastboot] [PATCH 1/1] Allow i386 crash kernels to handle x86_64 dumps
On Fri, 2007-03-16 at 11:40 +0900, Magnus Damm wrote:
> Right. And maybe it's a good idea to make sure that this feature is
> actually supported by kexec-tools before adding code to the kernel?

I sent patches to the fastboot list at the same time I sent these ones
to support differences in the underlying hypervisor architecture in the
tools. They haven't appeared in the archives yet so I fear they have
gone astray. I'll resend when I get to the office in a bit.

The tools already have support for introducing a SHIM when kexecing
between different architectures (at least in the 64->32 direction if I
understand kexec-tools-testing/purgatory/arch/i386/compat_x86_64.S and
k-t-t.../kexec/arch/i386/compat_x86_64.S correctly). This is really just
an extension of that.

> My gut feeling about this is that you are begging for trouble. The
> kexec/kdump solution is fragile just by itself, and trying to go
> between architectures is just going to be painful.

It works fine under Xen and I think going from 64Xen+32Kernel->32Kernel
makes more sense than going from 64Xen+32Kernel->64Kernel. As I said
originally I'm not so convinced it makes sense in the native case but I
see no reason to outlaw it (people get to keep both pieces etc...)

Ian.
Re: [BUG] Linux 2.6.20.2 - unable to handle kernel paging request - still accessing freed memory
On 3/16/07, Greg KH <[EMAIL PROTECTED]> wrote:
> Is there any way you can use 'git bisect' to try to track down the root
> cause of this?

Chris,

If 2.6.19 works for you, could you please do a git bisect for this
bug? See the following URL for details:

http://www.kernel.org/pub/software/scm/git/docs/howto/isolate-bugs-with-bisect.txt
Re: Make sure we populate the initroot filesystem late enough
On Tue, Mar 13, 2007 at 08:03:49AM +0100, Benjamin Herrenschmidt wrote:
>> Hmm. The crash came back after I booted into Mac OS X and back. It was
>> however a different crash, I believe it was coming from the USB modules
>> (as it would keep going when it happened, and get another crash, which
>> tended to scroll away too fast for me to capture) but I believe it was
>> still getting down into the slab code and actually dying there.

> Have you tried, instead, to apply
> 38f3323037de22bb0089d08be27be01196e7148b ? (That is revert
> 39d61db0edb34d60b83c5e0d62d0e906578cc707).

That's working fine at the moment, and has even survived a trip to
Mac OS X and back.

Thank you.

--
---
Paul "TBBle" Hampson, B.Sc, LPI, MCSE
On-hiatus Asian Studies student, ANU
The Boss, Bubblesworth Pty Ltd (ABN: 51 095 284 361)
[EMAIL PROTECTED]

Of course Pacman didn't influence us as kids. If it did,
we'd be running around in darkened rooms, popping pills and
listening to repetitive music.
 -- Kristian Wilson, Nintendo, Inc, 1989

License: http://creativecommons.org/licenses/by/2.1/au/
---
Re: [PATCH 1/3] revoke: misc fixes
Pekka J Enberg wrote: From: Pekka Enberg <[EMAIL PROTECTED]> This is a rollup patch of the following fixes to address some of Andrew's review comments: - Fix return value type of system calls to long - Add comment for vma->vm_flag barrier - No need for GFP_NOFS for inode allocation, use GFP_KERNEL instead - Remove unnecessary line break before EXPORT_SYMBOL Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]> --- fs/revoke.c |9 + include/linux/syscalls.h |4 ++-- 2 files changed, 7 insertions(+), 6 deletions(-) Index: uml-2.6/fs/revoke.c === --- uml-2.6.orig/fs/revoke.c2007-03-16 08:58:31.0 +0200 +++ uml-2.6/fs/revoke.c 2007-03-16 09:00:37.0 +0200 @@ -167,7 +167,9 @@ static int revoke_vma(struct vm_area_str end_addr = vma->vm_end; /* -* Not holding ->mmap_sem here. +* Not holding ->mmap_sem here but we must watch out for page +* faults and after the shared mappings have been taken down +* and sys_mmap() trying to remap the revoked range. */ vma->vm_flags |= VM_REVOKED; smp_mb(); @@ -455,7 +457,7 @@ int err = 0; You're still modifying vm_flags without down_write mmap_sem, so this will corrupt vm_flags. -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/3] revoke: misc fixes
From: Pekka Enberg <[EMAIL PROTECTED]> This is a rollup patch of the following fixes to address some of Andrew's review comments: - Fix return value type of system calls to long - Add comment for vma->vm_flag barrier - No need for GFP_NOFS for inode allocation, use GFP_KERNEL instead - Remove unnecessary line break before EXPORT_SYMBOL Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]> --- fs/revoke.c |9 + include/linux/syscalls.h |4 ++-- 2 files changed, 7 insertions(+), 6 deletions(-) Index: uml-2.6/fs/revoke.c === --- uml-2.6.orig/fs/revoke.c2007-03-16 08:58:31.0 +0200 +++ uml-2.6/fs/revoke.c 2007-03-16 09:00:37.0 +0200 @@ -167,7 +167,9 @@ static int revoke_vma(struct vm_area_str end_addr = vma->vm_end; /* -* Not holding ->mmap_sem here. +* Not holding ->mmap_sem here but we must watch out for page +* faults and after the shared mappings have been taken down +* and sys_mmap() trying to remap the revoked range. */ vma->vm_flags |= VM_REVOKED; smp_mb(); @@ -455,7 +457,7 @@ int err = 0; return err; } -asmlinkage int sys_revokeat(int dfd, const char __user * filename) +asmlinkage long sys_revokeat(int dfd, const char __user * filename) { struct nameidata nd; int err; @@ -499,7 +501,6 @@ int generic_file_revoke(struct file *fil out: return err; } - EXPORT_SYMBOL(generic_file_revoke); /* @@ -510,7 +511,7 @@ static struct inode *revokefs_alloc_inod { struct revokefs_inode_info *info; - info = kmem_cache_alloc(revokefs_inode_cache, GFP_NOFS); + info = kmem_cache_alloc(revokefs_inode_cache, GFP_KERNEL); if (!info) return NULL; Index: uml-2.6/include/linux/syscalls.h === --- uml-2.6.orig/include/linux/syscalls.h 2007-03-16 08:58:30.0 +0200 +++ uml-2.6/include/linux/syscalls.h2007-03-16 08:59:59.0 +0200 @@ -605,7 +605,7 @@ asmlinkage long sys_getcpu(unsigned __us int kernel_execve(const char *filename, char *const argv[], char *const envp[]); -asmlinkage int sys_revokeat(int dfd, const char __user *filename); -asmlinkage int sys_frevoke(unsigned int fd); +asmlinkage long 
sys_revokeat(int dfd, const char __user *filename); +asmlinkage long sys_frevoke(unsigned int fd); #endif - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/3] revoke: fix shared mapping revoke
From: Pekka Enberg <[EMAIL PROTECTED]> On 3/16/07, Andrew Morton <[EMAIL PROTECTED]> wrote: > This all looks very strange. If the calling process expires its timeslice, > the entire system call fails? This changes revoke_mapping() to restart after cond_resched() to fix an obvious goof made by me. Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]> --- fs/revoke.c | 49 ++--- 1 file changed, 30 insertions(+), 19 deletions(-) Index: uml-2.6/fs/revoke.c === --- uml-2.6.orig/fs/revoke.c2007-03-16 09:02:16.0 +0200 +++ uml-2.6/fs/revoke.c 2007-03-16 09:11:46.0 +0200 @@ -194,34 +194,49 @@ return 0; return -EINTR; } -static int revoke_mapping(struct address_space *mapping, struct file *to_exclude) +static void revoke_mapping_tree(struct address_space *mapping, + struct file *to_exclude, + struct zap_details *details) { struct vm_area_struct *vma; struct prio_tree_iter iter; - struct zap_details details; - int err = 0; - - details.i_mmap_lock = &mapping->i_mmap_lock; - spin_lock(&mapping->i_mmap_lock); + restart: vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, 0, ULONG_MAX) { if ((vma->vm_flags & VM_SHARED) && vma->vm_file != to_exclude) { - err = revoke_vma(vma, &details); - if (err) - goto out; + if (revoke_vma(vma, details)) + goto restart; } } +} +static void revoke_mapping_list(struct address_space *mapping, + struct file *to_exclude, + struct zap_details *details) +{ + struct vm_area_struct *vma; + + restart: list_for_each_entry(vma, &mapping->i_mmap_nonlinear, shared.vm_set.list) { if ((vma->vm_flags & VM_SHARED) && vma->vm_file != to_exclude) { - err = revoke_vma(vma, &details); - if (err) - goto out; + if (revoke_vma(vma, details)) + goto restart; } } - out: +} + +static void revoke_mapping(struct address_space *mapping, struct file *to_exclude) +{ + struct zap_details details; + + details.i_mmap_lock = &mapping->i_mmap_lock; + + spin_lock(&mapping->i_mmap_lock); + if (unlikely(!prio_tree_empty(&mapping->i_mmap))) + revoke_mapping_tree(mapping, to_exclude, 
&details); + if (unlikely(!list_empty(&mapping->i_mmap_nonlinear))) + revoke_mapping_list(mapping, to_exclude, &details); spin_unlock(&mapping->i_mmap_lock); - return err; } static void restore_file(struct revokefs_inode_info *info) @@ -441,11 +456,7 @@int err = 0; /* * Take down shared memory mappings. */ - err = revoke_mapping(inode->i_mapping, to_exclude); - if (err) { - restore_files(table); - goto out_free_table; - } + revoke_mapping(inode->i_mapping, to_exclude); /* * Now, revoke the files for good. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
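[Illustration, not part of the thread] The "restart the scan" approach in
this patch, combined with the "skip vmas that are VM_REVOKED" idea mentioned
elsewhere in the thread, can be sketched in user space as below. Everything
here is hypothetical: struct vma_sketch, revoke_one() and revoke_all() are
invented names, an array stands in for the prio tree and the nonlinear list,
and the needs_resched field simulates the callee dropping the lock. The point
is only that a restart is safe to loop on because each pass marks at least
one more entry as revoked.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vma on the mapping's lists. */
struct vma_sketch {
	bool shared;		/* plays the role of VM_SHARED */
	bool revoked;		/* plays the role of VM_REVOKED */
	bool needs_resched;	/* simulates the lock being dropped */
};

/*
 * Revoke one entry. A nonzero return means "the lock was dropped while
 * doing this, so the caller's iteration state may be stale".
 */
static int revoke_one(struct vma_sketch *vma)
{
	vma->revoked = true;
	return vma->needs_resched ? 1 : 0;
}

/*
 * Revoke every shared entry, restarting from the head whenever the
 * iteration state may have gone stale. Returns the number of restarts.
 */
int revoke_all(struct vma_sketch *vmas, size_t n)
{
	int restarts = 0;
	size_t i;

restart:
	for (i = 0; i < n; i++) {
		struct vma_sketch *vma = &vmas[i];

		/* Skipping revoked entries is what guarantees progress. */
		if (!vma->shared || vma->revoked)
			continue;

		if (revoke_one(vma)) {
			restarts++;
			goto restart;
		}
	}
	return restarts;
}
```

With three entries where the first is shared and forces a restart, the
second is private and the third is shared, the scan restarts once and leaves
the first and third entries revoked.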
[PATCH 3/3] revoke: move magic
From: Pekka Enberg <[EMAIL PROTECTED]> Move REVOKEFS_MAGIC to where it belongs. Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]> --- fs/revoke.c |1 + include/linux/magic.h|1 + include/linux/revoked_fs_i.h |2 -- 3 files changed, 2 insertions(+), 2 deletions(-) Index: uml-2.6/include/linux/magic.h === --- uml-2.6.orig/include/linux/magic.h 2007-03-16 09:01:07.0 +0200 +++ uml-2.6/include/linux/magic.h 2007-03-16 09:01:50.0 +0200 @@ -34,6 +34,7 @@ #define REISERFS_SUPER_MAGIC 0x52654973 #define REISERFS_SUPER_MAGIC_STRING"ReIsErFs" #define REISER2FS_SUPER_MAGIC_STRING "ReIsEr2Fs" #define REISER2FS_JR_SUPER_MAGIC_STRING"ReIsEr3Fs" +#define REVOKEFS_MAGIC 0x5245564B /* REVK */ #define SMB_SUPER_MAGIC0x517B #define USBDEVICE_SUPER_MAGIC 0x9fa2 Index: uml-2.6/include/linux/revoked_fs_i.h === --- uml-2.6.orig/include/linux/revoked_fs_i.h 2007-03-16 09:01:12.0 +0200 +++ uml-2.6/include/linux/revoked_fs_i.h2007-03-16 09:01:21.0 +0200 @@ -1,8 +1,6 @@ #ifndef _LINUX_REVOKED_FS_I_H #define _LINUX_REVOKED_FS_I_H -#define REVOKEFS_MAGIC 0x5245564B /* REVK */ - struct revokefs_inode_info { struct task_struct *owner; struct file *file; Index: uml-2.6/fs/revoke.c === --- uml-2.6.orig/fs/revoke.c2007-03-16 09:01:59.0 +0200 +++ uml-2.6/fs/revoke.c 2007-03-16 09:02:08.0 +0200 @@ -9,6 +9,7 @@ * Copyright (C) 2006-2007 Pekka Enberg #include #include #include +#include #include #include #include - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] Allow i386 crash kernels to handle x86_64 dumps
On Fri, 2007-03-16 at 08:48 +0900, Horms wrote: > > > > > > Signed-off-by: Ian Campbell <[EMAIL PROTECTED]> > > > > > > diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c > > > index d960507..523e109 100644 > > > --- a/fs/proc/vmcore.c > > > +++ b/fs/proc/vmcore.c > > > @@ -514,7 +514,7 @@ static int __init parse_crash_elf64_headers(void) > > > /* Do some basic Verification. */ > > > if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 || > > > (ehdr.e_type != ET_CORE) || > > > - !elf_check_arch(&ehdr) || > > > + !vmcore_elf_check_arch(&ehdr) || > > > ehdr.e_ident[EI_CLASS] != ELFCLASS64 || > > > ehdr.e_ident[EI_VERSION] != EV_CURRENT || > > > ehdr.e_version != EV_CURRENT || > > > diff --git a/include/asm-i386/kexec.h b/include/asm-i386/kexec.h > > > index 4dfc9f5..c76737e 100644 > > > --- a/include/asm-i386/kexec.h > > > +++ b/include/asm-i386/kexec.h > > > @@ -47,6 +47,9 @@ > > > /* The native architecture */ > > > #define KEXEC_ARCH KEXEC_ARCH_386 > > > > > > +/* We can also handle crash dumps from 64 bit kernel. */ > > > +#define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64) > > > + > > > > Ideal place for this probably should have been arch dependent crash_dump.h > > file. But we don't have one and no point introducing one just for this > > macro. > > > > This change looks good to me. > > Won't the above change break non i386 archtectures as > vmcore_elf_check_arch_cross isn't defined for them? No, because of this hunk: diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h index 3250365..db60dac 100644 --- a/include/linux/crash_dump.h +++ b/include/linux/crash_dump.h @@ -14,5 +14,13 @@ extern ssize_t copy_oldmem_page(unsigned long, char *, size_t, extern const struct file_operations proc_vmcore_operations; extern struct proc_dir_entry *proc_vmcore; +/* Architecture code defines this if there are other possible ELF + * machine types, e.g. on bi-arch capable hardware. 
*/ +#ifndef vmcore_elf_check_arch_cross(x) +#define vmcore_elf_check_arch_cross(x) 0 +#endif [snip] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
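[Illustration, not part of the thread] The fallback-macro pattern in the
quoted hunk can be tried out in user space with the sketch below. Several
parts are assumptions: struct elf_hdr_sketch and vmcore_accepts() are
invented for the example, elf_check_arch() is reduced to a bare e_machine
comparison, and the definition of vmcore_elf_check_arch() is a guess at the
part of the patch that was snipped. EM_386 (3) and EM_X86_64 (62) are the
standard ELF machine numbers. The guard is written here without a parameter
list, since #ifndef tests a bare macro name.

```c
#include <stdbool.h>

#define EM_386		3	/* standard ELF e_machine values */
#define EM_X86_64	62

/* Minimal stand-in for the ELF header; only e_machine matters here. */
struct elf_hdr_sketch {
	unsigned short e_machine;
};

/* Stand-in for the native check on an i386 kernel. */
#define elf_check_arch(x) ((x)->e_machine == EM_386)

/* "Arch header": i386 opts in to accepting x86_64 dumps. */
#define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)

/* "Generic header": architectures that define nothing get a no-op. */
#ifndef vmcore_elf_check_arch_cross
#define vmcore_elf_check_arch_cross(x) 0
#endif

/* Assumed combination of the two checks (snipped from the thread). */
#define vmcore_elf_check_arch(x) \
	(elf_check_arch(x) || vmcore_elf_check_arch_cross(x))

/* Would a vmcore with this e_machine pass the architecture check? */
bool vmcore_accepts(unsigned short machine)
{
	struct elf_hdr_sketch ehdr = { .e_machine = machine };

	return vmcore_elf_check_arch(&ehdr);
}
```

Removing the "arch header" definition makes the generic fallback take
effect, and the check then accepts only the native machine type. That is why
other architectures keep building without defining the macro.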
Re: AMD64 kernel oops
On Friday, 16 March 2007 03:06, you wrote:
> Check out http://bugzilla.kernel.org/show_bug.cgi?id=8067 which is a
> duplicate of http://bugzilla.kernel.org/show_bug.cgi?id=7727 which is
> fixed. There is a patch available on the bugzilla if you want to try it
> out.

Thank you, I'll test this patch as soon as possible.

regards,
Jörg
Re: [PATCH 2/5] revoke: core code
Hi Andrew,

On Sun, 11 Mar 2007 13:30:49 +0200 (EET) Pekka J Enberg <[EMAIL PROTECTED]> wrote:

On 3/16/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> all system calls must return long.

Fixed.

On 3/16/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> so the modification of vm_flags is racy?
>
> > + smp_mb();
>
> Please always document barriers. There's presumably some vm_flags reader
> we're concerned about here, but how is the code reader to know what the
> code writer was thinking?

We need to watch out for page faults after the shared mappings have been
taken down and mmap(2) trying to remap. I'll add a comment here.

On 3/16/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> This all looks very strange. If the calling process expires its timeslice,
> the entire system call fails?
>
> What's happening here?

Me being stupid. I followed what unmap_mapping_range_vma is doing but
failed to see what its callers are doing. I'll fix it up.

On 3/16/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> do_fsync() is seriously suboptimal - it will run an ext3 commit.
> do_sync_file_range(...,
> SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER)
> will run maybe five times quicker.
>
> But otoh, do_sync_file_range() will fail to write back the pages for a
> data=journal ext3 file, I expect (oops).

But it's good enough for generic_file_revoke, no? Ext3 should probably
implement its own revoke hook, so you can drop the ext2 and ext3 hooks if
you're worried; I did them mostly for testing.

On 3/16/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> Why is this code using invalidate_inode_pages2()? That function keeps on
> breaking, has ill-defined semantics and will probably change in the
> future. Exactly what semantics are you looking for here, and why?

What the comment says: "make pending reads fail." When revoking an inode,
we need to make sure there is no pending I/O that will complete after
revocation and thus leak information.

On 3/16/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> The blank line before the EXPORT_SYMBOL() is a waste of space.

I'll fix that up.

On 3/16/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > +static struct inode *revokefs_alloc_inode(struct super_block *sb)
> > +{
> > +     struct revokefs_inode_info *info;
> > +
> > +     info = kmem_cache_alloc(revokefs_inode_cache, GFP_NOFS);
> > +     if (!info)
> > +             return NULL;
> > +
> > +     return &info->vfs_inode;
> > +}
>
> Why GFP_NOFS? GFP_KERNEL should be sufficient.

I'll fix that up.

On 3/16/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > ===
> > --- /dev/null 1970-01-01 00:00:00.0 +
> > +++ uml-2.6/include/linux/revoked_fs_i.h 2007-03-11 13:09:20.0 +0200
> > @@ -0,0 +1,20 @@
> > +#ifndef _LINUX_REVOKED_FS_I_H
> > +#define _LINUX_REVOKED_FS_I_H
> > +
> > +#define REVOKEFS_MAGIC 0x5245564B /* REVK */
>
> This is supposed to go into magic.h.

Will do. Thank you Andrew.
Re: [PATCH] ACPI: ibm-acpi: allow module to load when acpi notifiers can't be set (v2)
Applied.

On Thursday 15 March 2007 15:15, Henrique de Moraes Holschuh wrote:
> This patch allows for ibm-acpi to coexist (with diminished functionality) with
> other drivers like ACPI_BAY. ibm-acpi will simply disable the functions it is
> not able to register ACPI notifiers for.
>
> Signed-off-by: Henrique de Moraes Holschuh <[EMAIL PROTECTED]>
> Cc: Chris Wedgwood <[EMAIL PROTECTED]>
> Cc: Kristen Carlson Accardi <[EMAIL PROTECTED]>
> ---
>
> There was a minor problem in the first version of the patch, which I didn't
> notice when backporting from acpi-test. This is a fixed version. Sorry
> about this.
>
> Len, you can pull this patch from:
> git://repo.or.cz/linux-2.6/linux-acpi-2.6/ibm-acpi-2.6.git
> branch for-upstream/acpi-release
>
> Please send it to Linus for merge in 2.6.21.
>
> It will clash with the patches in acpi-test that are waiting for 2.6.22.
> I will rediff those, and send you a pull request when this patch
> gets accepted in mainline.

ok

thanks Henrique,
-Len

> drivers/acpi/ibm_acpi.c |   19 ++++++++++++++++---
> 1 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/acpi/ibm_acpi.c b/drivers/acpi/ibm_acpi.c
> index 3690136..dc10966 100644
> --- a/drivers/acpi/ibm_acpi.c
> +++ b/drivers/acpi/ibm_acpi.c
> @@ -2507,7 +2507,7 @@ static int __init setup_notify(struct ibm_struct *ibm)
>  	ret = acpi_bus_get_device(*ibm->handle, &ibm->device);
>  	if (ret < 0) {
>  		printk(IBM_ERR "%s device not present\n", ibm->name);
> -		return 0;
> +		return -ENODEV;
>  	}
>
>  	acpi_driver_data(ibm->device) = ibm;
> @@ -2516,8 +2516,13 @@ static int __init setup_notify(struct ibm_struct *ibm)
>  	status = acpi_install_notify_handler(*ibm->handle, ibm->type,
>  					     dispatch_notify, ibm);
>  	if (ACPI_FAILURE(status)) {
> -		printk(IBM_ERR "acpi_install_notify_handler(%s) failed: %d\n",
> -		       ibm->name, status);
> +		if (status == AE_ALREADY_EXISTS) {
> +			printk(IBM_NOTICE "another device driver is already handling %s events\n",
> +				ibm->name);
> +		} else {
> +			printk(IBM_ERR "acpi_install_notify_handler(%s) failed: %d\n",
> +				ibm->name, status);
> +		}
>  		return -ENODEV;
>  	}
>  	ibm->notify_installed = 1;
> @@ -2553,6 +2558,8 @@ static int __init register_driver(struct ibm_struct *ibm)
>  	return ret;
>  }
>
> +static void ibm_exit(struct ibm_struct *ibm);
> +
>  static int __init ibm_init(struct ibm_struct *ibm)
>  {
>  	int ret;
> @@ -2594,6 +2601,12 @@ static int __init ibm_init(struct ibm_struct *ibm)
>
>  	if (ibm->notify) {
>  		ret = setup_notify(ibm);
> +		if (ret == -ENODEV) {
> +			printk(IBM_NOTICE "disabling subdriver %s\n",
> +				ibm->name);
> +			ibm_exit(ibm);
> +			return 0;
> +		}
>  		if (ret < 0)
>  			return ret;
>  	}
> --
> 1.5.0.3
>
Re: 2.6.21-rc3-mm1
Hello Mel,

> > > > > Today after +- 24h of uptime I found some more page allocation
> > > > > failures ('eth1: Can't allocate skb for Rx'). You'll find more here:
> > > > >
> > > > > http://tuxland.pl/misc/2.6.21-rc3-mm1-page-allocation-failure.txt
> > > > >
> > > > > System wasn't doing anything unusual, as usual ;-) X, some p2p
> > > > > software, firefox+flash playing music.
> > > >
> > > > Do other kernels do this, or is 2.6.21-rc3-mm1 worse?
> > > >
> > > > It is of course a non-fatal problem and will inevitably happen sometimes,
> > > > but we would like the VM to be able to minimise the occurrence of this
> > > > problem.
> > >
> > > Mariusz, I would be interested in finding out if this problem still occurs
> > > when you set min_free_kbytes to 16384 via /proc/sys/vm/min_free_kbytes. I
> > > understand that the problem is not easily reproduced and requiring
> > > configuration changes is far from ideal but it'd allow me to find out if
> > > options 2 or 3 below make sense in advance.
> >
> > After a few hours I can confirm that this happens with
> >
> > $ cat /proc/sys/vm/min_free_kbytes
> > 16384
> >
> > as well. See the syslog output below. Feel free to mail me to do some more
> > tests.
>
> Ok, great. Well, not great because it's broken, but I know what's going
> on. I was able to reproduce the problem based on your report on my desktop
> and put together a fix for it. Full regression tests are still running but
> it should be in good enough state for you to test.
>
> Without this patch, I got allocation failures within 15 minutes by stressing
> the machine. With the patch below, it's been up an hour and 15 minutes and
> I'm seeing no problems so far. Will keep the machine running a few days to
> see what happens.
>
[...]
> Mariusz, please try the following patch. It should not be necessary to
> adjust your min_free_kbytes again but if you see a failure, please try
> with min_free_kbytes set to 16384.

Thanks a lot. Works for me. min_free_kbytes was left at the default, 2791. I
left the laptop with X + aMule + azureus + firefox&flash (playing music) +
kernel compilation, so the box was pushed a bit. Uptime is close to 9 hours
with no page allocation failures. I'll leave it running some more. If
anything pops up you'll know it :-)

Thanks,

Mariusz Kozlowski
Re: [patch 1/4] signalfd v1 - signalfd core ...
On Thu, 15 Mar 2007, Ulrich Drepper wrote:

> On 3/7/07, Davide Libenzi wrote:
> > Let's do this. How about you throw this way one of the case that would
> > possibly break, and I test it?
>
> Since you make such claims I assume your signalfd() implementation
> considers a signal delivered once it is reported to an epoll() caller.
> Right?

The wakeup phase in epoll does not mean the signal is delivered. That
happens when the caller does a read(2) on the signalfd. The read(2) on the
signalfd ends up calling dequeue_signal(), the same function used by the
kernel to spill out signals to deliver (they peek from the same queue).

> This is not what you really want, at least not in all cases. A signal
> might be something you want to react on right away. Unless
> pthread_kill() is used it is delivered to the _process_ and not a
> specific thread. But this means if epoll() reports two events to one
> thread calling epoll() (one of them being a signal) and this thread is
> then stuck processing the other request, the signal is not handled
> even though there might be a second or third thread available to
> receive the signal. Those threads have the same right to receive the
> signal and the current implementation always looks for the
> best/fastest way to deliver the signal.

The behaviour depends on the sigmask you pass to signalfd(). You can select
signals that you want to handle in a standard way, and the ones that you
want to handle with signalfd.
Typically programs using signalfd() do not want the asynchronous behaviour
of signals at all (with all the limits you have in the handler), and the
event dispatch loop never blocks by definition (otherwise you have more
serious problems than a signal not delivered). They are also very likely to
be single threaded.

> This means to me that reporting the signal in epoll() does _not_ mark
> the signal as handled. Somehow (probably using the signalfd()
> descriptor) the thread must explicitly request the signal to be
> delivered. But if you do this the epoll() handling is fantastically
> racy if the signal is not blocked.

As I said, when a signal hits send_signal (or the queued versions), a wakeup
is done on the wait queue head that poll (or select/epoll) is sleeping on.
This ends up delivering a POLLIN, but the signal is not fetched (by means of
dequeue_signal) until a read(2) on the signalfd is done.
Since both standard delivery and signalfd's read(2) fish from the same
queue, you have to block the signals that you want to be guaranteed to fetch
with a read(2) (signalfd supports O_NONBLOCK also). If you do not block the
signal, you get a wakeup, but you may not find a signal to dequeue at the
next read(2), because a standard delivery might have stolen the signal by
preceding you in dequeue_signal.

- Davide
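The blocking requirement described above can be sketched in user space. This is only an illustration under stated assumptions: it uses the signalfd(2) interface as exposed by present-day glibc in <sys/signalfd.h>, which may not match the v1 patch under discussion, and the helper names open_blocked_signalfd() and read_one_signal() are made up for the example.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/*
 * Block `signo` for standard delivery and return a non-blocking
 * signalfd that reports it, or -1 on error. Hypothetical helper.
 */
static int open_blocked_signalfd(int signo)
{
	sigset_t mask;

	sigemptyset(&mask);
	sigaddset(&mask, signo);

	/* With the signal blocked, only read(2) on the fd can dequeue it. */
	if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
		return -1;

	return signalfd(-1, &mask, SFD_NONBLOCK | SFD_CLOEXEC);
}

/*
 * Try to dequeue one signal from `sfd`.
 * Returns the signal number, 0 if nothing is pending, -1 on error.
 */
static int read_one_signal(int sfd)
{
	struct signalfd_siginfo si;
	ssize_t n = read(sfd, &si, sizeof(si));

	if (n == (ssize_t)sizeof(si))
		return (int)si.ssi_signo;
	if (n < 0 && errno == EAGAIN)
		return 0;	/* woken up, but no signal left to fetch */
	return -1;
}
```

Because the signal stays blocked here, standard delivery cannot take it from the queue first, so a readiness wakeup is followed by a successful read(2). In a multi-threaded program, pthread_sigmask() would be the appropriate call instead of sigprocmask().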
Re: [stable] Three critical patches still aren't merged in 2.6.21
On Thu, Mar 15, 2007 at 12:34:07PM -0400, Chuck Ebbert wrote:
> I've been holding off sending these in for -stable until they're
> merged, but now I wonder when that will happen.

Feel free to send them to stable@ when they go to Linus as it sounds
like they are almost there.

Thanks,

greg k-h
Re: [RFC, PATCH] Fixup COMPAT_VDSO to work with CONFIG_PARAVIRT
Jeremy Fitzhardinge wrote:
>>>> +	} else if (strcmp(secstrings+sechdrs[i].sh_name, ".dynamic") == 0) {
>>>> +		Elf32_Dyn *dyn = (void *)hdr + sechdrs[i].sh_offset;
>>>> +		int tag;
>>>> +		while ((tag = (++dyn)->d_tag) != DT_NULL)
>>>
>>> Um, no.
>>
>> Walk based on size instead?
>
> No, I was just complaining about the embedded assignment, before dinner,
> so I was overly terse.

My last embedded assignment was a robot microcontroller, and I dropped out
of that class. So I _need_ embedded assignments.

Zach
Re: core2 duo, interrupts: is this normal?
On Thursday 15 March 2007 21:24, Norberto Bensa wrote:
> Hello,
>
> is this output, normal? I meant, why counters on CPU1 is zero? Isn't this
> balanced?

yes, it is normal.
If you had an interrupt-limited workload then irqbalance would pick things
up and spread them out.

-Len

> $ cat /proc/interrupts
>            CPU0       CPU1
>   0:    4180170          0   IO-APIC-edge      timer
>   1:       8060          0   IO-APIC-edge      i8042
>   7:          0          0   IO-APIC-edge      parport0
>   9:          0          0   IO-APIC-fasteoi   acpi
>  12:          5          0   IO-APIC-edge      i8042
>  16:     322297          0   IO-APIC-fasteoi   uhci_hcd:usb3, libata, nvidia, EMU10K1
>  17:     896399          0   IO-APIC-fasteoi   bttv0, eth0, libata
>  18:      72867          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb7
>  19:      27770          0   IO-APIC-fasteoi   ehci_hcd:usb2, uhci_hcd:usb5
>  20:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
>  21:          0          0   IO-APIC-fasteoi   uhci_hcd:usb6
>  22:          3          0   IO-APIC-fasteoi   ohci1394
>  23:        155          0   IO-APIC-fasteoi   HDA Intel
> 219:     103056          0   PCI-MSI-edge      libata
> NMI:          0          0
> LOC:    4077613    4077622
> ERR:          0
> MIS:          0
>
>
> Many thanks in advance,
> Norberto
Re: [patch 1/4] signalfd v1 - signalfd core ...
On 3/7/07, Davide Libenzi wrote:
> Let's do this. How about you throw this way one of the case that would
> possibly break, and I test it?

Since you make such claims I assume your signalfd() implementation
considers a signal delivered once it is reported to an epoll() caller.
Right?

This is not what you really want, at least not in all cases. A signal
might be something you want to react on right away. Unless
pthread_kill() is used it is delivered to the _process_ and not a
specific thread. But this means if epoll() reports two events to one
thread calling epoll() (one of them being a signal) and this thread is
then stuck processing the other request, the signal is not handled
even though there might be a second or third thread available to
receive the signal. Those threads have the same right to receive the
signal and the current implementation always looks for the
best/fastest way to deliver the signal.

This means to me that reporting the signal in epoll() does _not_ mark
the signal as handled. Somehow (probably using the signalfd()
descriptor) the thread must explicitly request the signal to be
delivered. But if you do this the epoll() handling is fantastically
racy if the signal is not blocked.
Re: [PATCH] PPC: Delete unused header file.
Robert P. J. Day writes:

> Delete apparently unused header file arch/ppc/syslib/cpc710.h.

I suggest you send this to [EMAIL PROTECTED] and Matt Porter
<[EMAIL PROTECTED]> for review.

Paul.
Re: [RFC, PATCH] Fixup COMPAT_VDSO to work with CONFIG_PARAVIRT
Zachary Amsden wrote:
> Well testing that is not so fun. I installed SUSE Pro 9.0, and
> strings on ld.so contains the magic at_sysinfo assert! But it doesn't
> install TLS libraries, so I'll have to install them by hand.
>
> In works - in theory. Look, a puppy!
>
> Scratchbox is rumored to produce the fabled assertion even on modern
> distros by installing its own toolchain which includes the dreaded glibc.

I think Andi and Andrew have boxes which are afflicted.

> I'm playing safe. Binary identical relocation to 0xe000 was my goal.

Yeah, fair enough. But as Eric likes to keep pointing out, an executable
ELF file need not have any sections at all, so the only safe course for
anything "real" is via the section headers. So I guess the right thing to
do is relocate the dynamic stuff via PT_DYNAMIC, and relocate the symtab if
it's present.

>>> +	} else if (strcmp(secstrings+sechdrs[i].sh_name, ".dynamic") == 0) {
>>> +		Elf32_Dyn *dyn = (void *)hdr + sechdrs[i].sh_offset;
>>> +		int tag;
>>> +		while ((tag = (++dyn)->d_tag) != DT_NULL)
>>>
>>
>> Um, no.
>>
>
> Walk based on size instead?

No, I was just complaining about the embedded assignment, before dinner, so
I was overly terse.

J
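For illustration, the walk over the dynamic entries can be written without an assignment inside the loop condition. This is only a sketch, not the patch's code: relocate_dynamic() is a hypothetical helper, it is written for user space against <elf.h>, and the three tags it patches are an illustrative subset of the address-valued entries rather than a complete list.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Add `delta` to a few address-valued entries of a DT_NULL-terminated
 * dynamic array. Returns the number of entries that were patched.
 * The tag selection is an assumption made for this example.
 */
static size_t relocate_dynamic(Elf32_Dyn *dyn, uint32_t delta)
{
	size_t patched = 0;

	/* Plain for loop: the tag is read in the body, not assigned in the test. */
	for (; dyn->d_tag != DT_NULL; dyn++) {
		switch (dyn->d_tag) {
		case DT_HASH:
		case DT_STRTAB:
		case DT_SYMTAB:
			dyn->d_un.d_ptr += delta;
			patched++;
			break;
		default:
			/* Sizes and flags are values, not addresses: leave them. */
			break;
		}
	}
	return patched;
}
```

The same loop shape applies whether the array is located through a section header or through a PT_DYNAMIC program header; only the way the starting pointer is found differs.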
Re: [PATCH] Improve error recovery in serial mouse driver
On Thursday 15 March 2007 15:16, Peter Osterlund wrote:
> If bytes get lost in the communication with a serial mouse using the
> MS protocol, the kernel driver could do a better job getting back in
> sync. The first byte in a packet has bit 6 set, and no other bytes
> have that bit set. Therefore, if a byte is received with bit 6 cleared
> when the driver thinks it is at byte 0 in the packet, the driver thinks
> wrong and the byte should just be ignored.
>
> This fix prevents spurious left/right button events when the serial
> communication is disturbed by a CPU-hungry real-time process.
>
> Signed-off-by: Peter Osterlund <[EMAIL PROTECTED]>

Applied, thank you Peter.

--
Dmitry
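The resynchronisation rule in the patch description can be sketched as a small byte-at-a-time parser. This is a user-space illustration, not the driver's code: struct ms_parser and ms_feed() are hypothetical names, the three-byte packet length is an assumption about the basic MS protocol, and restarting the packet when a sync byte shows up mid-packet is an extra assumption that goes beyond what the description states.

```c
#include <stdint.h>

#define MS_SYNC_BIT	0x40	/* bit 6: set only on the first byte of a packet */
#define MS_PACKET_LEN	3	/* assumed length of a basic MS-protocol packet */

struct ms_parser {
	uint8_t buf[MS_PACKET_LEN];
	int idx;		/* position of the next byte within the packet */
};

/*
 * Feed one received byte to the parser.
 * Returns 1 when p->buf holds a complete packet, 0 otherwise.
 */
static int ms_feed(struct ms_parser *p, uint8_t byte)
{
	if (byte & MS_SYNC_BIT)
		p->idx = 0;	/* a sync byte always starts a new packet */
	else if (p->idx == 0)
		return 0;	/* expected a first byte, got data: ignore it */

	p->buf[p->idx++] = byte;
	if (p->idx < MS_PACKET_LEN)
		return 0;

	p->idx = 0;		/* packet complete, wait for the next sync byte */
	return 1;
}
```

With this rule, a stray data byte received while the parser is at position 0 is dropped instead of being misread as the start of a packet, which is the source of the spurious button events mentioned above.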
Re: thread stacks and strict vm overcommit accounting
On Thu, 15 Mar 2007 11:06:21 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote:

> > On Tue, 13 Mar 2007 18:33:20 +0200 Dan Aloni <[EMAIL PROTECTED]> wrote:
> > Hello,
> >
> > This question is relevent to 2.6.20.
> >
> > I noticed that if the RSS for the stack size is say, 8MB, running
> > a single-threaded process doesn't incur an increase of 8MB to
> > Committed_AS (/proc/meminfo).
> >
> > However, on multi-threaded apps linked with pthread (on Debian
> > Etch with 2.6.20 vanilla x86_64), every thread will incur the
> > the specified maximum stack size RSS (assuming that you use
> > the default attr). In other words, it appears that vm accounting
> > works differently in that case.
> >
> > Is this the intended behaviour?
>
> That sounds like a bug to me.

AFAIK, the "main" thread's stack is marked as VM_GROWS?? and its size can
change dynamically. The stacks of "other" threads are allocated by mmap (or
maybe malloc) and never grow. This is the difference between multi-threaded
and single-threaded processes.

So you should be careful about the stack size when you use multi-threaded
apps and vm_overcommit_ratio at the same time.

Because MAP_NORESERVE is accounted if sysctl_overcommit_memory ==
OVERCOMMIT_NEVER, a program like java will sometimes fail to create a new
thread.

I have no good idea how to fix this difference, sorry.

-Kame
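If thread stacks are charged at their full size as described above, one way for an application to be careful about the stack size is to request a smaller stack explicitly instead of relying on the default attributes. The following is a minimal sketch using standard pthread calls; spawn_with_stack() and worker() are hypothetical names, and whether this actually lowers Committed_AS depends on how the C library maps thread stacks, which is not verified here.

```c
#include <pthread.h>
#include <stddef.h>

/* Trivial thread body for the example: hand the argument straight back. */
static void *worker(void *arg)
{
	return arg;
}

/*
 * Start a joinable thread whose stack is `stack_bytes` large.
 * Returns 0 on success or an errno-style code from the pthread calls.
 */
static int spawn_with_stack(pthread_t *tid, size_t stack_bytes, void *arg)
{
	pthread_attr_t attr;
	int err = pthread_attr_init(&attr);

	if (err)
		return err;

	/* Fails with EINVAL if stack_bytes is below PTHREAD_STACK_MIN. */
	err = pthread_attr_setstacksize(&attr, stack_bytes);
	if (!err)
		err = pthread_create(tid, &attr, worker, arg);

	pthread_attr_destroy(&attr);
	return err;
}
```

For example, spawn_with_stack(&tid, 256 * 1024, NULL) asks for a 256 KB stack rather than whatever the default attribute (often derived from the stack rlimit) would give.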
Re: [PATCH 00/18] Make common x86 arch area for i386 and x86_64 - Take 2
On Thu, 15 Mar 2007, Steven Rostedt wrote:

> On Thu, 2007-03-15 at 17:06 +0100, Andi Kleen wrote:
>
> > Well I just see a lot of pain from these patches but I doubt
> > they will avoid any bugs. If people don't compile test both
> > archs they will always likely break on another. There are lots
> > of subtle dependencies that are not expressed in the pathname
> > even after this intrusive operation (e.g. in the includes).
> >
> > That's just how it is.
>
> Or that's just how you see it.

In the future it is likely that x86_64 will significantly deviate from
i386. i386 is going to be gradually abandoned because it does not support
the ever larger memory sizes, and it will mainly be used for embedded
devices. x86_64 is going to acquire more functionality that will not be
available for i386.

We plan, for example, to add virtual memmap support for x86_64. Virtual
memmap support may require a large chunk of virtual memory space that is
not available on i386. It's not good to have to deal with i386 issues when
doing x86_64 arch development.
Re: [PATCH 00/18] Make common x86 arch area for i386 and x86_64 - Take 2
On Wed, 2007-03-14 at 01:08 -0400, Steven Rostedt wrote:
> [Hopefully fixed email client to make it to the list this time]
> [This series has changed by using git-diff -M]

> Seems appropriate, but I really don't care what it's called. One thing about
> this name, is that typing arch/x86 doesn't tab complete x86_64 anymore.
> But if you can think of something better, I'd be happy to apply it.
>

Sorry for being so late, but as for what it could be called: what about
common_x86 or common/x86 or something?

>
> -- Steve
>
> PS. Sorry for the spam. I need to figure out how to tame quilt mail!
>
Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
[EMAIL PROTECTED] wrote:

> On the other hand, Andreas suggested only marking it once every 32 calls,
> but that required a helper variable. Statistically, jiffies%32 should end
> up about the same as a helper variable %32.
>
> This of course, if just calling mark_page_accessed() is actually expensive
> enough that we don't want to do it unconditionally.

Not caching a needed page and having to wait for a disk seek to complete
will be *way* more expensive than any call to mark_page_accessed(). A
modern CPU can do somewhere on the order of 50 million instructions in the
time it takes to bring one page in from disk.

However, this does not mean we should unconditionally call
mark_page_accessed(), since that could cause us to push wanted data out of
the cache because of one program that does its streaming accesses in a
strange way...

This is a situation where getting it right almost certainly matters.

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
Re: [PATCH 2/2] Replace pid_t in autofs with struct pid reference
On Mon, 12 Mar 2007, [EMAIL PROTECTED] wrote:

>
> From: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
> Subject: [PATCH 2/2] Replace pid_t in autofs with struct pid reference.
>
> Make autofs container-friendly by caching struct pid reference rather
> than pid_t and using pid_nr() to retreive a task's pid_t.
>
> ChangeLog:
> 	- Fix Eric Biederman's comments - Use find_get_pid() to hold a
> 	  reference to oz_pgrp and release while unmounting; separate out
> 	  changes to autofs and autofs4.

What changes to autofs4?
Do you intend this change to be made for autofs4 also?
Perhaps you expected me to do them, in which case you probably should ask
me to do the patch.

> 	- Fix Cedric's comments: retain old prototype of parse_options()
> 	  and move necessary change to its caller.
>
> Signed-off-by: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
> Cc: Cedric Le Goater <[EMAIL PROTECTED]>
> Cc: Dave Hansen <[EMAIL PROTECTED]>
> Cc: Serge Hallyn <[EMAIL PROTECTED]>
> Cc: Eric Biederman <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED]
> Acked-by: Eric W. Biederman <[EMAIL PROTECTED]>
>
> ---
>  fs/autofs/autofs_i.h |    4 ++--
>  fs/autofs/inode.c    |   20 ++++++++++++++++----
>  fs/autofs/root.c     |    6 ++++--
>  3 files changed, 22 insertions(+), 8 deletions(-)
>
> Index: lx26-21-rc3-mm2/fs/autofs/autofs_i.h
> ===
> --- lx26-21-rc3-mm2.orig/fs/autofs/autofs_i.h	2007-03-12 17:12:05.0 -0700
> +++ lx26-21-rc3-mm2/fs/autofs/autofs_i.h	2007-03-12 17:18:55.0 -0700
> @@ -101,7 +101,7 @@ struct autofs_symlink {
>  struct autofs_sb_info {
>  	u32 magic;
>  	struct file *pipe;
> -	pid_t oz_pgrp;
> +	struct pid *oz_pgrp;
>  	int catatonic;
>  	struct super_block *sb;
>  	unsigned long exp_timeout;
> @@ -122,7 +122,7 @@ static inline struct autofs_sb_info *aut
>     filesystem without "magic".) */
>
>  static inline int autofs_oz_mode(struct autofs_sb_info *sbi) {
> -	return sbi->catatonic || process_group(current) == sbi->oz_pgrp;
> +	return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp;
>  }
>
>  /* Hash operations */
> Index: lx26-21-rc3-mm2/fs/autofs/inode.c
> ===
> --- lx26-21-rc3-mm2.orig/fs/autofs/inode.c	2007-03-12 17:18:48.0 -0700
> +++ lx26-21-rc3-mm2/fs/autofs/inode.c	2007-03-12 17:18:55.0 -0700
> @@ -37,6 +37,8 @@ void autofs_kill_sb(struct super_block *
>  	if (!sbi->catatonic)
>  		autofs_catatonic_mode(sbi); /* Free wait queues, close pipe */
>
> +	put_pid(sbi->oz_pgrp);
> +
>  	autofs_hash_nuke(sbi);
>  	for (n = 0 ; n < AUTOFS_MAX_SYMLINKS ; n++) {
>  		if (test_bit(n, sbi->symlink_bitmap))
> @@ -139,6 +141,7 @@ int autofs_fill_super(struct super_block
>  	int pipefd;
>  	struct autofs_sb_info *sbi;
>  	int minproto, maxproto;
> +	pid_t pgid;
>
>  	sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
>  	if (!sbi)
> @@ -150,7 +153,6 @@ int autofs_fill_super(struct super_block
>  	sbi->pipe = NULL;
>  	sbi->catatonic = 1;
>  	sbi->exp_timeout = 0;
> -	sbi->oz_pgrp = process_group(current);
>  	autofs_initialize_hash(&sbi->dirhash);
>  	sbi->queues = NULL;
>  	memset(sbi->symlink_bitmap, 0, sizeof(long)*AUTOFS_SYMLINK_BITMAP_LEN);
> @@ -171,7 +173,7 @@ int autofs_fill_super(struct super_block
>
>  	/* Can this call block? - WTF cares? s is locked. */
>  	if (parse_options(data, &pipefd, &root_inode->i_uid,
> -				&root_inode->i_gid, &sbi->oz_pgrp, &minproto,
> +				&root_inode->i_gid, &pgid, &minproto,
>  				&maxproto)) {
>  		printk("autofs: called with bogus options\n");
>  		goto fail_dput;
> @@ -184,13 +186,21 @@ int autofs_fill_super(struct super_block
>  		goto fail_dput;
>  	}
>
> -	DPRINTK(("autofs: pipe fd = %d, pgrp = %u\n", pipefd, sbi->oz_pgrp));
> +	DPRINTK(("autofs: pipe fd = %d, pgrp = %u\n", pipefd, pgid));
> +	sbi->oz_pgrp = find_get_pid(pgid);
> +
> +	if (!sbi->oz_pgrp) {
> +		printk("autofs: could not find process group %d\n", pgid);
> +		goto fail_dput;
> +	}
> +
>  	pipe = fget(pipefd);
>
>  	if (!pipe) {
>  		printk("autofs: could not open pipe file descriptor\n");
> -		goto fail_dput;
> +		goto fail_put_pid;
>  	}
> +
>  	if (!pipe->f_op || !pipe->f_op->write)
>  		goto fail_fput;
>  	sbi->pipe = pipe;
> @@ -205,6 +215,8 @@ int autofs_fill_super(struct super_block
>  fail_fput:
>  	printk("autofs: pipe file descriptor does not contain proper ops\n");
>  	fput(pipe);
> +fail_put_pid:
> +	put_pid(sbi->oz_pgrp);
>  fail_dput:
>  	dput(root);
>  	goto fail_free;
> Index: lx26-21-rc3-mm2/fs/autofs/root.c
> =
Re: [PATCH] blackfin: balance parenthesis in macros
On Thu, 2007-03-15 at 18:12 -0400, Mariusz Kozlowski wrote:
> Hello,
>
> This patch (against 2.6.21-rc3-mm1) balances parenthesis in blackfin
> header files.
>
> Signed-off-by: Mariusz Kozlowski <[EMAIL PROTECTED]>
>
>  include/asm-blackfin/mach-bf535/bf535.h |    4 ++--
>  include/asm-blackfin/scatterlist.h      |    2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff -u linux-2.6.21-rc3-mm1-a/include/asm-blackfin/mach-bf535/bf535.h linux-2.6.21-rc3-mm1-b/include/asm-blackfin/mach-bf535/bf535.h
> --- linux-2.6.21-rc3-mm1-a/include/asm-blackfin/mach-bf535/bf535.h	2007-03-15 22:25:34.0 +0100
> +++ linux-2.6.21-rc3-mm1-b/include/asm-blackfin/mach-bf535/bf535.h	2007-03-15 22:33:09.0 +0100
> @@ -224,7 +224,7 @@
>  #define UART0_LSR_TEMT 0x40	/* TSR and UARTx_thr both empty */
>
>  #define UART0_MSR_ADDR 0xffc0180c	/* UART 0 Modem status register 16 bit */
> -#define UART0_MSR HALFWORD_REF(UART0_MSR_ADDR
> +#define UART0_MSR HALFWORD_REF(UART0_MSR_ADDR)
>  #define UART0_SCR_ADDR 0xffc0180e	/* UART 0 Scratch register 16 bit */
>  #define UART0_SCR HALFWORD_REF(UART0_SCR_ADDR)
>  #define UART0_IRCR_ADDR 0xffc01810	/* UART 0 IrDA Control register 16 bit */
> @@ -331,7 +331,7 @@
>  #define UART1_LSR_TEMT 0x40	/* TSR and UARTx_thr both empty */
>
>  #define UART1_MSR_ADDR 0xffc01c0c	/* UART 1 Modem status register 16 bit */
> -#define UART1_MSR HALFWORD_REF(UART1_MSR_ADDR
> +#define UART1_MSR HALFWORD_REF(UART1_MSR_ADDR)
>  #define UART1_SCR_ADDR 0xffc01c0e	/* UART 1 Scratch register 16 bit */
>  #define UART1_SCR HALFWORD_REF(UART1_SCR_ADDR)
>
> diff -upr linux-2.6.21-rc3-mm1-a/include/asm-blackfin/scatterlist.h linux-2.6.21-rc3-mm1-b/include/asm-blackfin/scatterlist.h
> --- linux-2.6.21-rc3-mm1-a/include/asm-blackfin/scatterlist.h	2007-03-15 22:25:34.0 +0100
> +++ linux-2.6.21-rc3-mm1-b/include/asm-blackfin/scatterlist.h	2007-03-15 22:30:18.0 +0100
> @@ -17,7 +17,7 @@ struct scatterlist {
>   * returns, or alternatively stop on the first sg_dma_len(sg) which
>   * is 0.
>   */
> -#define sg_address(sg) (page_address((sg)->page) + (sg)->offset
> +#define sg_address(sg) (page_address((sg)->page) + (sg)->offset)
>  #define sg_dma_address(sg) ((sg)->dma_address)
>  #define sg_dma_len(sg) ((sg)->length)
>
>
>
> Regards,
>
> Mariusz Kozlowski

Thank you Mariusz,

Mike applied your patch to our SVN repo. I will send out a blackfin-arch
update patch including your contribution later. Your review is very helpful
for our development.

Best regards,
-Bryan Wu
Re: [RFC, PATCH] Fixup COMPAT_VDSO to work with CONFIG_PARAVIRT
Jeremy Fitzhardinge wrote:
> Zachary Amsden wrote:
>> Invoke black magic to relocate the VDSO even when COMPAT_VDSO is enabled
>> by fixing up the ELF object.
>
> So does it actually work? Can you boot the broken distros with this in
> place?

Well testing that is not so fun. I installed SUSE Pro 9.0, and strings on
ld.so contains the magic at_sysinfo assert! But it doesn't install TLS
libraries, so I'll have to install them by hand.

It works - in theory. Look, a puppy!

Scratchbox is rumored to produce the fabled assertion even on modern
distros by installing its own toolchain which includes the dreaded glibc.

> Using sections is wrong; you should be going through the phdrs, and
> looking for PT_DYNAMIC for relocation.

Will do.

> Does anyone expect the symbolic info to be correct? It might be better to
> just stomp it so nobody gets any ideas. On the other hand, we don't want
> to break compatibility with anything...

I'm playing safe. Binary identical relocation to 0xe000 was my goal.

>> +	} else if (strcmp(secstrings+sechdrs[i].sh_name, ".dynamic") == 0) {
>> +		Elf32_Dyn *dyn = (void *)hdr + sechdrs[i].sh_offset;
>> +		int tag;
>> +		while ((tag = (++dyn)->d_tag) != DT_NULL)
>
> Um, no.

Walk based on size instead?

>> +	} else if (strcmp(secstrings+sechdrs[i].sh_name, ".useless") == 0) {
>> +		/* This is demonic; see vsyscall.lds.S; it puts the
>> +		 * .got in a section named .useless */
>> +		uint32_t *got = (void *)hdr + sechdrs[i].sh_offset;
>> +		*got += VDSO_HIGH_BASE;
>> +	}
>
> This won't get relocated with one of the other relocations? It's in the
> text phdr.

Hmm, I can try that. Thanks for the suggestions / fixes.

Zach
Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
On Thu, 15 Mar 2007 14:35:17 EDT, Rik van Riel said:
> [EMAIL PROTECTED] wrote:
> > On Wed, 14 Mar 2007 22:33:17 BST, Andreas Mohr said:
> >
> >> it'd seem we need some kind of state management here to figure out good
> >> intervals of when to call mark_page_accessed() *again* for this page. E.g.
> >> despite non-changing access patterns you could still call
> >> mark_page_accessed() every 32 calls or so to avoid expiry, but this would
> >> need extra helper variables.
> >
> > What if you did something like
> >
> > 	if (jiffies%32) {...
> >
> > (Possibly scaling it so the low-order bits change). No need to lock it, as
> > "right most of the time" is close enough.
>
> Bad idea. That way you would only count page accesses if the
> phase of the moon^Wjiffie is just right.

On the other hand, Andreas suggested only marking it once every 32 calls,
but that required a helper variable. Statistically, jiffies%32 should end
up about the same as a helper variable %32.

This of course, if just calling mark_page_accessed() is actually expensive
enough that we don't want to do it unconditionally.
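The two sampling schemes being compared can be sketched side by side in user space. Everything here is illustrative: sample_by_counter() and sample_by_tick() are hypothetical helpers, the tick argument merely stands in for jiffies, and the "hit when the remainder is zero" polarity is an assumption, since the quoted snippet does not spell out which branch does the marking.

```c
#include <stdint.h>

#define SAMPLE_INTERVAL 32

/*
 * Helper-variable scheme: true on every 32nd call, regardless of
 * when the calls happen. Needs one counter per tracked object.
 */
static int sample_by_counter(uint32_t *calls)
{
	return (++*calls % SAMPLE_INTERVAL) == 0;
}

/*
 * Stateless scheme: true only while the current tick count is a
 * multiple of 32. Needs no extra storage.
 */
static int sample_by_tick(uint64_t tick)
{
	return (tick % SAMPLE_INTERVAL) == 0;
}
```

When calls are spread evenly over time the two agree on average, which is the statistical argument above. When many calls land within a single tick, the stateless version either fires for all of them or for none, which is the objection about the phase of jiffies being "just right".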
Re: [patch 10/34] Xen-pv_ops: Simplify smp_call_function*() by using common implementation
On Tue, 13 Mar 2007 16:30:27 -0700 Jeremy Fitzhardinge wrote:

> smp_call_function and smp_call_function_single are almost complete
> duplicates of the same logic. This patch combines them by
> implementing them in terms of the more general
> smp_call_function_mask().

The kernel-doc is still not quite correct. Patch below applies on top of
this patch from Jeremy.

---
From: Randy Dunlap <[EMAIL PROTECTED]>

Clean up arch/i386/kernel/smp.c after the Xen pv_ops patches for
smp_call_function variants.

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---
 arch/i386/kernel/smp.c |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

--- linux-2621-rc3.orig/arch/i386/kernel/smp.c
+++ linux-2621-rc3/arch/i386/kernel/smp.c
@@ -517,14 +517,14 @@ static struct call_data_struct *call_dat
 
 
 /**
- * smp_call_function_mask(): Run a function on a set of other CPUs.
+ * smp_call_function_mask - Run a function on a set of other CPUs.
  * @mask: The set of cpus to run on. Must not include the current cpu.
  * @func: The function to run. This must be fast and non-blocking.
  * @info: An arbitrary pointer to pass to the function.
  * @wait: If true, wait (atomically) until function has completed on other CPUs.
  *
  * Returns 0 on success, else a negative status code. Does not return until
- * remote CPUs are nearly ready to execute <<func>> or are or have finished.
+ * remote CPUs are nearly ready to execute func() or are or have finished.
  *
  * You must not call this function with disabled interrupts or from a
  * hardware interrupt handler or from a bottom half handler.
@@ -583,14 +583,14 @@ int smp_call_function_mask(cpumask_t mas
 }
 
 /**
- * smp_call_function(): Run a function on all other CPUs.
+ * smp_call_function - Run a function on all other CPUs.
  * @func: The function to run. This must be fast and non-blocking.
  * @info: An arbitrary pointer to pass to the function.
  * @nonatomic: currently unused.
  * @wait: If true, wait (atomically) until function has completed on other CPUs.
  *
  * Returns 0 on success, else a negative status code. Does not return until
- * remote CPUs are nearly ready to execute <<func>> or are or have executed.
+ * remote CPUs are nearly ready to execute func() or are or have executed.
  *
  * You must not call this function with disabled interrupts or from a
  * hardware interrupt handler or from a bottom half handler.
@@ -602,8 +602,9 @@ int smp_call_function(void (*func) (void
 }
 EXPORT_SYMBOL(smp_call_function);
 
-/*
+/**
  * smp_call_function_single - Run a function on another CPU
+ * @cpu: The target (destination) CPU number.
  * @func: The function to run. This must be fast and non-blocking.
  * @info: An arbitrary pointer to pass to the function.
  * @nonatomic: Currently unused.
@@ -611,7 +612,7 @@ EXPORT_SYMBOL(smp_call_function);
  *
  * Retrurns 0 on success, else a negative status code.
  *
- * Does not return until the remote CPU is nearly ready to execute
+ * Does not return until the remote CPU is nearly ready to execute func()
  * or is or has executed.
  */
 int smp_call_function_single(int cpu, void (*func) (void *info), void *info,
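For reference, the kernel-doc conventions this patch restores can be shown on a made-up function. This is only an illustrative sketch: toy_sum() is hypothetical and is not part of smp.c. The points it shows are the ones the patch touches: the comment opens with "/**", the first line is "name - description" with no "()" or ":" after the name, and every parameter gets its own "@name:" line.

```c
#include <assert.h>
#include <stddef.h>

/**
 * toy_sum - Add up an array of integers.
 * @vals: The values to add. May be NULL only when @count is 0.
 * @count: Number of entries in @vals.
 *
 * Returns the sum of the first @count entries, or 0 if @count is 0.
 */
long toy_sum(const int *vals, size_t count)
{
	long sum = 0;
	size_t i;

	/* Precondition stated in the @vals description. */
	assert(vals != NULL || count == 0);

	for (i = 0; i < count; i++)
		sum += vals[i];
	return sum;
}
```

A comment that opens with a plain "/*", as the old smp_call_function_single() one did, is treated as an ordinary comment and is skipped by the kernel-doc tooling.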
Re: [Fastboot] [PATCH 1/1] Allow i386 crash kernels to handle x86_64 dumps
On Fri, Mar 16, 2007 at 11:40:07AM +0900, Magnus Damm wrote:
> On 3/16/07, Horms <[EMAIL PROTECTED]> wrote:
> >On Thu, Mar 15, 2007 at 06:56:16PM +0530, Vivek Goyal wrote:
> >> On Thu, Mar 15, 2007 at 12:22:57PM +, Ian Campbell wrote:
> >> > On Thu, 2007-03-15 at 11:17 +0530, Vivek Goyal wrote:
> >> > > > > But I think changing this macro might run into issues. It is
> >> > > > > being used at few places in kernel, for example while loading
> >> > > > > module. This will essentially mean that we allow loading 64bit
> >> > > > > x86_64 modules on 32bit i386 systems?
> >> >
> >> > Yes, not sure how I missed that fact...
> >> >
> >> > > Kexec will also not allow loading an x86_64 kernel on a 32bit machine.
> >> >
> >> > For crash kernel only or for regular kexec too?
> >> >
> >>
> >> I think for both. One of the possible reasons I think is that one never
> >> knows is underlying machine has got 64bit extensions or not. So even if
> >> we load the kernel it will never boot. Secondly, we might not be able to
> >> handle 64bit address in 32bit kernel/user space?
> >
> >Perhaps I am miss-understanding what you are saying, but I do
> >recally kexecing from 32->64 and 64->32 bit kernels on x86_64 hardware.
> >I can run these checks again if it helps.
>

I stand corrected. I can kexec a bzImage 32->64bit. It did run into some
initrd issues later, but that is a separate matter; fundamentally, kexec
could load a 64bit kernel bzImage and make the transition successfully.

So this will now be left to the user. If he tries to kexec to a 64bit
kernel on a machine that does not support the 64bit extensions, kexec will
not give any advance warning.

Thanks
Vivek
Re: [PATCH 0/3] Lumpy Reclaim V5
On Mon, 12 Mar 2007 18:22:45 + Andy Whitcroft <[EMAIL PROTECTED]> wrote:

> Following this email are three patches which represent the
> current state of the lumpy reclaim patches; collectively lumpy V5.

So where do we stand with this now? Does it make anything get better?

I (continue to) think that if this is to be truly useful, we need some way
of using it from kswapd to keep a certain minimum number of order-1,
order-2, etc. pages in the freelists.
Re: [RFC, PATCH] Fixup COMPAT_VDSO to work with CONFIG_PARAVIRT
Zachary Amsden wrote: > Invoke black magic to relocate the VDSO even when COMPAT_VDSO is enabled > by fixing up the ELF object. > So does it actually work? Can you boot the broken distros with this in place? > Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]> > > Index: linux-2.6.21/arch/i386/kernel/entry.S > === > --- linux-2.6.21.orig/arch/i386/kernel/entry.S2007-03-06 > 18:51:33.0 -0800 > +++ linux-2.6.21/arch/i386/kernel/entry.S 2007-03-15 18:14:11.0 > -0800 > @@ -305,16 +305,12 @@ sysenter_past_esp: > pushl $(__USER_CS) > CFI_ADJUST_CFA_OFFSET 4 > /*CFI_REL_OFFSET cs, 0*/ > -#ifndef CONFIG_COMPAT_VDSO > /* >* Push current_thread_info()->sysenter_return to the stack. >* A tiny bit of offset fixup is necessary - 4*4 means the 4 words >* pushed above; +8 corresponds to copy_thread's esp0 setting. >*/ > pushl (TI_sysenter_return-THREAD_SIZE+8+4*4)(%esp) > -#else > - pushl $SYSENTER_RETURN > -#endif > CFI_ADJUST_CFA_OFFSET 4 > CFI_REL_OFFSET eip, 0 > > Index: linux-2.6.21/arch/i386/kernel/sysenter.c > === > --- linux-2.6.21.orig/arch/i386/kernel/sysenter.c 2007-03-06 > 18:51:34.0 -0800 > +++ linux-2.6.21/arch/i386/kernel/sysenter.c 2007-03-15 18:27:43.0 > -0800 > @@ -72,6 +72,99 @@ extern const char vsyscall_int80_start, > extern const char vsyscall_sysenter_start, vsyscall_sysenter_end; > static struct page *syscall_pages[1]; > > +#ifdef CONFIG_COMPAT_VDSO > +static void fixup_vsyscall_elf(char *page) > +{ > + Elf32_Ehdr *hdr; > + Elf32_Shdr *sechdrs; > + Elf32_Phdr *phdr; > + char *secstrings; > + int i, j, n; > + > + hdr = (Elf32_Ehdr *)page; > + > + printk("Remapping vsyscall page to %08x\n", (unsigned > int)VDSO_HIGH_BASE); > + > + /* Sanity checks against insmoding binaries or wrong arch, > + weird elf version */ > + if (memcmp(hdr->e_ident, ELFMAG, 4) != 0 || > + !elf_check_arch(hdr) || > + hdr->e_type != ET_DYN) > + panic("Bogus ELF in vsyscall DSO\n"); > + > + hdr->e_entry += VDSO_HIGH_BASE; > + sechdrs = (void *)hdr + hdr->e_shoff; > + secstrings = (void 
*)hdr + sechdrs[hdr->e_shstrndx].sh_offset; > + > + for (i = 1; i < hdr->e_shnum; i++) { > Using sections is wrong; you should be going through the phdrs, and looking for PT_DYNAMIC for relocation. > + if (!(sechdrs[i].sh_flags & SHF_ALLOC)) > + continue; > + > + sechdrs[i].sh_addr += VDSO_HIGH_BASE; > + if (strcmp(secstrings+sechdrs[i].sh_name, ".dynsym") == 0) { > + Elf32_Sym *sym = (void *)hdr + sechdrs[i].sh_offset; > + n = sechdrs[i].sh_size / sizeof(*sym); > + for (j = 1; j < n; j++) { > + int ndx = sym[j].st_shndx; > + if (ndx == SHN_UNDEF || ndx == SHN_ABS) > + continue; > + sym[j].st_value += VDSO_HIGH_BASE; > + } > Does anyone expect the symbolic info to be correct? It might be better to just stomp it so nobody gets any ideas. On the other hand, we don't want to break compatibility with anything... > + } else if (strcmp(secstrings+sechdrs[i].sh_name, ".dynamic") == > 0) { > + Elf32_Dyn *dyn = (void *)hdr + sechdrs[i].sh_offset; > + int tag; > + while ((tag = (++dyn)->d_tag) != DT_NULL) > Um, no. > + } else if (strcmp(secstrings+sechdrs[i].sh_name, ".useless") == > 0) { > + /* This is demonic; see vsyscall.lds.S; it puts the > + * .got in a section named .useless */ > + uint32_t *got = (void *)hdr + sechdrs[i].sh_offset; > + *got += VDSO_HIGH_BASE; > + } > This won't get relocated with one of the other relocations? It's in the text phdr. > + } > + phdr = (void *)hdr + hdr->e_phoff; > + for (i = 0; i < hdr->e_phnum; i++) { > + phdr[i].p_vaddr += VDSO_HIGH_BASE; > + phdr[i].p_paddr += VDSO_HIGH_BASE; > + } > + > +#if 0 > +/* > + * To verify the binary image in memory is identical, linked in the VDSO page > + * from a COMPAT_VDSO compile without this patch; then diff the two. For a > + * non-relocated fixmap, the VDSO image is identical. 
> + */ > +{ > + extern const char vsyscall_orig_start, vsyscall_orig_end; > + int *l1 = (int *)page, *l2 = (int *)&vsyscall_orig_start; > + int foo = vsyscall_orig_end - vsyscall_orig_start / 4; > + for (i = 0; i < foo; i++) { > + if (l1[i] != l2[i]) { > + printk("vsyscall - delta [%03x] orig %08x
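A rough userspace sketch of the review comment above ("you should be going through the phdrs, and looking for PT_DYNAMIC for relocation") follows. It is not the code from the patch or from any kernel tree. The name relocate_dynamic(), its "base" parameter, and the (deliberately incomplete) list of tags treated as addresses are assumptions made for illustration. It also assumes the image sits in memory exactly as it is laid out in the file, so p_offset can be used to find the dynamic array.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed, non-exhaustive set of dynamic tags whose value is an address. */
static int dyn_tag_is_address(Elf32_Sword tag)
{
	switch (tag) {
	case DT_PLTGOT:
	case DT_HASH:
	case DT_STRTAB:
	case DT_SYMTAB:
	case DT_RELA:
	case DT_REL:
	case DT_JMPREL:
	case DT_VERSYM:
	case DT_VERDEF:
	case DT_VERNEED:
		return 1;
	default:
		return 0;
	}
}

/*
 * Shift an in-memory 32-bit ET_DYN image by 'base': every program header
 * address, the entry point, and each address-valued entry reached through
 * PT_DYNAMIC. No section headers or section names are consulted.
 *
 * Returns the number of dynamic entries adjusted, or -1 if the image does
 * not look like an ET_DYN ELF object.
 */
int relocate_dynamic(void *image, Elf32_Addr base)
{
	Elf32_Ehdr *hdr = image;
	Elf32_Phdr *phdr;
	int i, fixed = 0;

	assert(image != NULL);

	if (memcmp(hdr->e_ident, ELFMAG, SELFMAG) != 0 || hdr->e_type != ET_DYN)
		return -1;

	phdr = (Elf32_Phdr *)((char *)image + hdr->e_phoff);
	for (i = 0; i < hdr->e_phnum; i++) {
		if (phdr[i].p_type == PT_DYNAMIC) {
			Elf32_Dyn *dyn =
				(Elf32_Dyn *)((char *)image + phdr[i].p_offset);

			/* Start at the first entry; stop at the DT_NULL terminator. */
			for (; dyn->d_tag != DT_NULL; dyn++) {
				if (dyn_tag_is_address(dyn->d_tag)) {
					dyn->d_un.d_ptr += base;
					fixed++;
				}
			}
		}
		phdr[i].p_vaddr += base;
		phdr[i].p_paddr += base;
	}
	hdr->e_entry += base;
	return fixed;
}
```

Size-valued entries such as DT_STRSZ are left alone, and the loop inspects the first dynamic entry as well, unlike the "(++dyn)->d_tag" form in the quoted patch. Symbol values and the GOT would still need their own handling; the sketch does not attempt that.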
Re: [PATCH 10/22 take 3] UBI: EBA unit
On Thu, 15 Mar 2007 18:29:51 -0500 Josh Boyer wrote:

> On Thu, Mar 15, 2007 at 02:24:10PM -0700, Randy Dunlap wrote:
> > On Thu, 15 Mar 2007 11:07:03 -0800 Andrew Morton wrote:
> >
> > >
> > > There's way too much code here to expect it to get decently reviewed,
> > > alas.
> >
> > Yes.
> >
> > /me repeats wish that Not Everything Should Be Sent to lkml. :(
>
> Just curious, but where would you suggest this be sent to for review then?

Valid question. I should have chosen some other, more appropriate patch
for that comment. I don't see a better list for UBI patches, so lkml is
OK IMO.

Here is a summary of my thinking on Linux-related mailing lists.

1. Bug reports can go to lkml or focused mailing lists.

2. Development (like patches) should go to focused mailing lists
   if there is such a list and they have enough usage.
   Development areas that qualify for this IMO are:
   - ACPI
   - ATA
   - file systems
   - frame buffer
   - ieee1394
   - MM/VM
   - multimedia
   - networking
   - PCI
   - power management, suspend/resume
   - SCSI
   - sound
   - USB
   - virtualization

(not that I expect anything close to consensus on this)

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
Re: [BUGFIX][PATCH] fixing placement of register stack under ulimit -s
Please allow me to explain more.

Why having the register stack and memory stack upside down is bad is a bit
complicated, so here is a test and its result to explain the bug.

This is a sample program and its result on 2.6.21-rc3.
Note: the base address of the memory stack can change randomly.

== sample code ==
[EMAIL PROTECTED] ~]$ cat sample.c
#include <stdio.h>

void do_print(int num)
{
	if (num == 0)
		return;
	printf("%d\n",num);
	do_print(num - 1);
}

int main(int argc, char *argv[])
{
	do_print(1);
	return 0;
}

== before ulimit ==
[EMAIL PROTECTED] ~]$ uname -a
Linux drpq 2.6.21-rc3 #3 SMP Fri Mar 16 11:57:47 JST 2007 ia64 ia64 ia64 GNU/Linux
[EMAIL PROTECTED] ~]$ ulimit -s
8192
[EMAIL PROTECTED] ~]$ ulimit -s -H
unlimited
[EMAIL PROTECTED] ~]$ ./sample
1
1
[EMAIL PROTECTED] ~]$

== after ulimit -s 8192 ==
[EMAIL PROTECTED] ~]$ ulimit -s
8192
[EMAIL PROTECTED] ~]$ ulimit -s -H
8192
[EMAIL PROTECTED] ~]$ ./sample
1
9612
9611
9610
9609
9608
Segmentation fault
[EMAIL PROTECTED] ~]$ ./sample    (when I'm lucky)
1
1
[EMAIL PROTECTED] ~]$
==

The run dies at 9608, which is far too early to have used up the whole
stack. The reason is that the combination "ulimit -s + memory stack
randomization + register-stack-expansion" is buggy: if unlucky, the program
can use only one page for the register stack.

My patch fixes this case.

-Kame
[RFC, PATCH] Fixup COMPAT_VDSO to work with CONFIG_PARAVIRT
Paravirt-ops guests which move the fixmap also end up moving the syscall
VDSO. This fails if it is prelinked at a fixed address, which is why
COMPAT_VDSO is broken under CONFIG_VMI (and also under CONFIG_XEN).

Several options are available to try to address this. Jan had cooked up a
patch for Xen that used build magic to find the parts of the VDSO that need
relocation. I don't like the idea of having auto-generated relocations, as
someday something could change between two linked objects (timestamp, elf
notes perhaps) that is not a relocation. So I prefer human supervision over
the relocation and explicitly fixing everything by hand.

I'm not necessarily advocating one solution over the other; my way is more
code to maintain if the VDSO linkage changes. I'm looking for feedback
about which way is best.

Also, it appears that COMPAT_VDSO could disappear entirely. Since this
approach should work with older broken ld.so (2.3.2 is the version, I
believe), we should be able to switch over completely to using the gate vma
style of linking the vdso. One can even get the address randomization
benefits by simply running fixup on the vdso if you are prepared to take
the cost of allocating an extra page per process. Or you could randomize
just once at boot, which makes the randomization per-machine, still
sufficient to slow network based worm attacks which might rely on a fixed
VDSO address.

Clearly this patch needs more testing and feedback, which I'm sure it will
get...

Zach

P.S. - Eric, I've copied you as you appear to be an ELF expert, or at least
have a greater grasp of Elven Magic than me, and I'm hoping I got all the
dynamic tags which need relocation right.

Invoke black magic to relocate the VDSO even when COMPAT_VDSO is enabled
by fixing up the ELF object.
Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]> Index: linux-2.6.21/arch/i386/kernel/entry.S === --- linux-2.6.21.orig/arch/i386/kernel/entry.S 2007-03-06 18:51:33.0 -0800 +++ linux-2.6.21/arch/i386/kernel/entry.S 2007-03-15 18:14:11.0 -0800 @@ -305,16 +305,12 @@ sysenter_past_esp: pushl $(__USER_CS) CFI_ADJUST_CFA_OFFSET 4 /*CFI_REL_OFFSET cs, 0*/ -#ifndef CONFIG_COMPAT_VDSO /* * Push current_thread_info()->sysenter_return to the stack. * A tiny bit of offset fixup is necessary - 4*4 means the 4 words * pushed above; +8 corresponds to copy_thread's esp0 setting. */ pushl (TI_sysenter_return-THREAD_SIZE+8+4*4)(%esp) -#else - pushl $SYSENTER_RETURN -#endif CFI_ADJUST_CFA_OFFSET 4 CFI_REL_OFFSET eip, 0 Index: linux-2.6.21/arch/i386/kernel/sysenter.c === --- linux-2.6.21.orig/arch/i386/kernel/sysenter.c 2007-03-06 18:51:34.0 -0800 +++ linux-2.6.21/arch/i386/kernel/sysenter.c2007-03-15 18:27:43.0 -0800 @@ -72,6 +72,99 @@ extern const char vsyscall_int80_start, extern const char vsyscall_sysenter_start, vsyscall_sysenter_end; static struct page *syscall_pages[1]; +#ifdef CONFIG_COMPAT_VDSO +static void fixup_vsyscall_elf(char *page) +{ + Elf32_Ehdr *hdr; + Elf32_Shdr *sechdrs; + Elf32_Phdr *phdr; + char *secstrings; + int i, j, n; + + hdr = (Elf32_Ehdr *)page; + + printk("Remapping vsyscall page to %08x\n", (unsigned int)VDSO_HIGH_BASE); + + /* Sanity checks against insmoding binaries or wrong arch, + weird elf version */ + if (memcmp(hdr->e_ident, ELFMAG, 4) != 0 || + !elf_check_arch(hdr) || + hdr->e_type != ET_DYN) + panic("Bogus ELF in vsyscall DSO\n"); + + hdr->e_entry += VDSO_HIGH_BASE; + sechdrs = (void *)hdr + hdr->e_shoff; + secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset; + + for (i = 1; i < hdr->e_shnum; i++) { + if (!(sechdrs[i].sh_flags & SHF_ALLOC)) + continue; + + sechdrs[i].sh_addr += VDSO_HIGH_BASE; + if (strcmp(secstrings+sechdrs[i].sh_name, ".dynsym") == 0) { + Elf32_Sym *sym = (void *)hdr + sechdrs[i].sh_offset; + n = 
sechdrs[i].sh_size / sizeof(*sym); + for (j = 1; j < n; j++) { + int ndx = sym[j].st_shndx; + if (ndx == SHN_UNDEF || ndx == SHN_ABS) + continue; + sym[j].st_value += VDSO_HIGH_BASE; + } + } else if (strcmp(secstrings+sechdrs[i].sh_name, ".dynamic") == 0) { + Elf32_Dyn *dyn = (void *)hdr + sechdrs[i].sh_offset; + int tag; + while ((tag = (++dyn)->d_tag) != DT_NULL) + switch(tag)
Re: [PATCH 1/1] Allow i386 crash kernels to handle x86_64 dumps
On Fri, Mar 16, 2007 at 08:48:08AM +0900, Horms wrote: > On Thu, Mar 15, 2007 at 06:56:16PM +0530, Vivek Goyal wrote: > > On Thu, Mar 15, 2007 at 12:22:57PM +, Ian Campbell wrote: > > > On Thu, 2007-03-15 at 11:17 +0530, Vivek Goyal wrote: > > > > > > But I think changing this macro might run into issues. It is > > > > > > being used at few places in kernel, for example while loading > > > > > > module. This will essentially mean that we allow loading 64bit > > > > > > x86_64 modules on 32bit i386 systems? > > > > > > Yes, not sure how I missed that fact... > > > > > > > Kexec will also not allow loading an x86_64 kernel on a 32bit machine. > > > > > > For crash kernel only or for regular kexec too? > > > > > > > I think for both. One of the possible reasons I think is that one never > > knows is underlying machine has got 64bit extensions or not. So even if > > we load the kernel it will never boot. Secondly, we might not be able to > > handle 64bit address in 32bit kernel/user space? > > Perhaps I am miss-understanding what you are saying, but I do > recally kexecing from 32->64 and 64->32 bit kernels on x86_64 hardware. > I can run these checks again if it helps. > Yesterday I tested it. I could kexec from 64->32bit but not vice versa. kexec-tools itself gave error message. "Cannot determine the file type of ../x86_64-vmlinux/vmlinux" I did not investigate deeper but I got a basic question. How will kexec know that underlying 32bit machine supports 64bit extensions or not? Do we allow loading 64bit kernel even underlying machine might not support it? Probably you can also give it a try. > > > > So how about something like vmcore_elf_allowed_cross_arch()? Vmcore > > > > code can continue to check elf_check_arch() and if that fails it can > > > > invoke vmcore_elf_allowed_cross_arch() to find out what cross arch are > > > > allowed for vmcore. > > > > > > Something like this? > > > > > > Ian. 
> > > > > > --- > > > > > > Allow i386 crash kernels to handle x86_64 dumps. > > > > > > The specific case I am encountering is kdump under Xen with a 64 bit > > > hypervisor and 32 bit kernel/userspace. The dump created is a 64 bit > > > due to the hypervisor but the dump kernel is 32 bit in for maximum > > > compatibility. > > > > > > It's possibly less likely to be useful in a purely native scenario but > > > I see no reason to disallow it. > > > > > > Signed-off-by: Ian Campbell <[EMAIL PROTECTED]> > > > > > > diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c > > > index d960507..523e109 100644 > > > --- a/fs/proc/vmcore.c > > > +++ b/fs/proc/vmcore.c > > > @@ -514,7 +514,7 @@ static int __init parse_crash_elf64_headers(void) > > > /* Do some basic Verification. */ > > > if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 || > > > (ehdr.e_type != ET_CORE) || > > > - !elf_check_arch(&ehdr) || > > > + !vmcore_elf_check_arch(&ehdr) || > > > ehdr.e_ident[EI_CLASS] != ELFCLASS64 || > > > ehdr.e_ident[EI_VERSION] != EV_CURRENT || > > > ehdr.e_version != EV_CURRENT || > > > diff --git a/include/asm-i386/kexec.h b/include/asm-i386/kexec.h > > > index 4dfc9f5..c76737e 100644 > > > --- a/include/asm-i386/kexec.h > > > +++ b/include/asm-i386/kexec.h > > > @@ -47,6 +47,9 @@ > > > /* The native architecture */ > > > #define KEXEC_ARCH KEXEC_ARCH_386 > > > > > > +/* We can also handle crash dumps from 64 bit kernel. */ > > > +#define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64) > > > + > > > > Ideal place for this probably should have been arch dependent crash_dump.h > > file. But we don't have one and no point introducing one just for this > > macro. > > > > This change looks good to me. > > Won't the above change break non i386 archtectures as > vmcore_elf_check_arch_cross isn't defined for them? > In original patch he has put an arch independent definition in include/linux/crash_dump.h which will make sure it is not broken on other architectures. 
Thanks Vivek
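The arrangement described in this message (an arch-specific cross-arch macro plus an arch-independent fallback so other architectures keep building) can be sketched in standalone form. This is an approximation for illustration, not a copy of the patch: struct elfhdr_stub, the stubbed elf_check_arch(), vmcore_accepts() and vmcore_self_check() are stand-ins invented here, and the exact wording of the fallback in include/linux/crash_dump.h is an assumption.

```c
#include <assert.h>
#include <elf.h>	/* EM_386, EM_X86_64, EM_PPC64 */

/* Stand-in for the kernel's ELF header type; only e_machine matters here. */
struct elfhdr_stub {
	unsigned short e_machine;
};

/* Stand-in for the i386 elf_check_arch(): accepts the native type only. */
#define elf_check_arch(x)	((x)->e_machine == EM_386)

/* Arch header (the define quoted from the asm-i386/kexec.h hunk). */
#define vmcore_elf_check_arch_cross(x)	((x)->e_machine == EM_X86_64)

/* Generic header: fallback for architectures that define nothing. */
#ifndef vmcore_elf_check_arch_cross
#define vmcore_elf_check_arch_cross(x)	0
#endif

/* Native check first, then whatever cross arch the architecture allows. */
#define vmcore_elf_check_arch(x) \
	(elf_check_arch(x) || vmcore_elf_check_arch_cross(x))

/* Hypothetical wrapper so the macro can be exercised with a machine id. */
int vmcore_accepts(unsigned short machine)
{
	struct elfhdr_stub hdr = { .e_machine = machine };

	return vmcore_elf_check_arch(&hdr);
}

/* The three interesting cases for an i386 dump kernel. */
void vmcore_self_check(void)
{
	assert(vmcore_accepts(EM_386));		/* native dump */
	assert(vmcore_accepts(EM_X86_64));	/* allowed cross arch */
	assert(!vmcore_accepts(EM_PPC64));	/* anything else is rejected */
}
```

With the #ifndef fallback in the generic header, an architecture that never defines vmcore_elf_check_arch_cross() ends up with plain elf_check_arch() behaviour, which is the point made in reply to the "won't this break non-i386" question.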
Re: [Fastboot] [PATCH 1/1] Allow i386 crash kernels to handle x86_64 dumps
On 3/16/07, Horms <[EMAIL PROTECTED]> wrote:
> On Thu, Mar 15, 2007 at 06:56:16PM +0530, Vivek Goyal wrote:
> > On Thu, Mar 15, 2007 at 12:22:57PM +, Ian Campbell wrote:
> > > On Thu, 2007-03-15 at 11:17 +0530, Vivek Goyal wrote:
> > > > > > But I think changing this macro might run into issues. It is
> > > > > > being used at few places in kernel, for example while loading
> > > > > > module. This will essentially mean that we allow loading 64bit
> > > > > > x86_64 modules on 32bit i386 systems?
> > >
> > > Yes, not sure how I missed that fact...
> > >
> > > > Kexec will also not allow loading an x86_64 kernel on a 32bit machine.
> > >
> > > For crash kernel only or for regular kexec too?
> > >
> >
> > I think for both. One of the possible reasons I think is that one never
> > knows is underlying machine has got 64bit extensions or not. So even if
> > we load the kernel it will never boot. Secondly, we might not be able to
> > handle 64bit address in 32bit kernel/user space?
>
> Perhaps I am miss-understanding what you are saying, but I do
> recally kexecing from 32->64 and 64->32 bit kernels on x86_64 hardware.
> I can run these checks again if it helps.

I recall kexecing a bzImage for x86_64 on i386, but I'm not 100% sure. I
think it worked because the bzImage loader code was regular 32 bit x86
code, but that may be wrong as well.

> Won't the above change break non i386 archtectures as
> vmcore_elf_check_arch_cross isn't defined for them?

Right. And maybe it's a good idea to make sure that this feature is
actually supported by kexec-tools before adding code to the kernel?

My gut feeling about this is that you are begging for trouble. The
kexec/kdump solution is fragile just by itself, and trying to go between
architectures is just going to be painful.

/ magnus
Re: CONFIG_REORDER Kconfig help strange sentence.
On Tue, 13 Mar 2007 17:37:35 +1100 Rusty Russell wrote: > On Tue, 2007-03-13 at 00:56 +0100, Andi Kleen wrote: > > On Tue, Mar 13, 2007 at 10:18:03AM +1100, Rusty Russell wrote: > > > OK, this confused me: > > > > > > Function reordering (REORDER) [N/y/?] (NEW) ? > > > > > > This option enables the toolchain to reorder functions for a more > > > optimal TLB usage. If you have pretty much any version of > > > binutils, > > > this can increase your kernel build time by roughly one minute. > > > > > > "If you have pretty much any version of binutils"? Huh? > > > > > > You mean "This will slow your kernel build by about a minute"? > > > > Yes. Lots of sections seem to trigger some quadratic behaviour in ld. > > > > It might be fixed in some unreleased CVS version though (not 100% sure) > > > > -Andi > > OK, well here is a patch for the moment. > > == > Clarify CONFIG_REORDER explanation > > if (1 && X) => if (X). > > Signed-off-by: Rusty Russell <[EMAIL PROTECTED]> > > diff -r de5618b5e562 arch/x86_64/Kconfig > --- a/arch/x86_64/Kconfig Tue Mar 13 11:41:55 2007 +1100 > +++ b/arch/x86_64/Kconfig Tue Mar 13 17:27:05 2007 +1100 > @@ -632,8 +632,8 @@ config REORDER > default n > help > This option enables the toolchain to reorder functions for a more > - optimal TLB usage. If you have pretty much any version of binutils, > - this can increase your kernel build time by roughly one minute. > + optimal TLB usage. This will slow your kernel build by > + roughly one minute. Please consistently use for help text. Yes, it was already mucked up. > config K8_NB > def_bool y --- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] Allow i386 crash kernels to handle x86_64 dumps
On Thu, Mar 15, 2007 at 01:42:39PM +, Ian Campbell wrote:
> On Thu, 2007-03-15 at 18:56 +0530, Vivek Goyal wrote:
> >
> > Ideal place for this probably should have been arch dependent
> > crash_dump.h file. But we don't have one and no point introducing one
> > just for this macro.
>
> Agreed.
>
> > This change looks good to me.
>
> Is there a kdump tree which you'll apply to or shall I resend CCing
> apkm? (I'll add an Acked-by if that's ok).
>

There is no separate kdump tree. Generally Andrew picks up these changes.
I guess just resend it, copying Andrew. Yes, you can add my Acked-by.

Thanks
Vivek
Re: Loading both the pata_atiixp and the ahci driver causes problems
Chuck Ebbert wrote:
> If you try to load both the pata_atiixp and the ahci driver (for the
> same ATI SB600 adapter), very strange things happen. The AHCI driver
> churns for three minutes or so, spewing messages like this, then
> nothing works:
>
> <6>ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> <4>ata3.00: qc timeout (cmd 0xec)
> <4>ata3.00: failed to IDENTIFY (I/O error, err_mask=0x104)
>
> Shouldn't it be able to tell the device has already been claimed by
> some other driver?

One would assume it'd fail to grab the PCI IO ranges twice?

I haven't looked at the code but I have seen this bug mentioned elsewhere
so I might well end up having to do that yet :-)

Jon.
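The expectation voiced in this message, that a second driver should fail when it tries to grab I/O ranges somebody already holds, can be illustrated with a toy region table. This says nothing about what pata_atiixp, ahci or the PCI core actually do with the SB600; claim_region(), the table size and the owner strings are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_REGIONS 8

struct region {
	unsigned long start, end;	/* [start, end), end is exclusive */
	const char *owner;
};

static struct region regions[MAX_REGIONS];
static size_t nr_regions;

/*
 * Claim [start, start + len) for 'owner'.
 * Returns 0 on success, -EBUSY if any part of the range is already
 * claimed, or -ENOMEM if the toy table is full.
 */
int claim_region(unsigned long start, unsigned long len, const char *owner)
{
	unsigned long end = start + len;
	size_t i;

	assert(len > 0);

	for (i = 0; i < nr_regions; i++)
		if (start < regions[i].end && regions[i].start < end)
			return -EBUSY;	/* overlaps an existing claim */

	if (nr_regions == MAX_REGIONS)
		return -ENOMEM;

	regions[nr_regions].start = start;
	regions[nr_regions].end = end;
	regions[nr_regions].owner = owner;
	nr_regions++;
	return 0;
}
```

A check of this kind only helps when both drivers ask for the same ranges; two drivers that claim different resources of one device would both succeed, so the sketch does not by itself explain the behaviour reported here.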
Re: AMD64 kernel oops
Joerg Platte naasa.net> writes:

> Pid: 14, comm: events/0 Not tainted 2.6.18-4-amd64 #1
> RIP: 0010:[] [] keyring_destroy+0x32/0x96

[Snip]

> Can this oops be caused by a known and already
> fixed problem in a newer kernel versions? In this case I would submit a bug
> to the Debian BTS. Otherwise what can I do to further reproduce and debug
> this oops?
>

Check out http://bugzilla.kernel.org/show_bug.cgi?id=8067 which is a
duplicate of http://bugzilla.kernel.org/show_bug.cgi?id=7727 which is fixed.
There is a patch available on the bugzilla if you want to try it out.

HTH
Parag
[PATCH 2/2] scc_pata: move from ide/ppc to ide/pci
This patch moves scc_pata from ide/ppc to ide/pci in order to build it in normal module. Signed-off-by: Kou Ishizaki <[EMAIL PROTECTED]> Signed-off-by: Akira Iguchi <[EMAIL PROTECTED]> --- diff -Nrpu -X linux-2.6.21-rc3/Documentation/dontdiff linux-2.6.21-rc3/drivers/ide/pci/scc_pata.c linux-2.6.21-rc3.mod/drivers/ide/pci/scc_pata.c --- linux-2.6.21-rc3/drivers/ide/pci/scc_pata.c 1970-01-01 09:00:00.0 +0900 +++ linux-2.6.21-rc3.mod/drivers/ide/pci/scc_pata.c 2007-03-16 18:47:36.0 +0900 @@ -0,0 +1,858 @@ +/* + * Support for IDE interfaces on Celleb platform + * + * (C) Copyright 2006 TOSHIBA CORPORATION + * + * This code is based on drivers/ide/pci/siimage.c: + * Copyright (C) 2001-2002 Andre Hedrick <[EMAIL PROTECTED]> + * Copyright (C) 2003 Red Hat <[EMAIL PROTECTED]> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
+ */ + +#include +#include +#include +#include +#include +#include +#include + +#define PCI_DEVICE_ID_TOSHIBA_SCC_ATA0x01b4 + +#define SCC_PATA_NAME "scc IDE" + +#define TDVHSEL_MASTER 0x0001 +#define TDVHSEL_SLAVE 0x0004 + +#define MODE_JCUSFEN0x0080 + +#define CCKCTRL_ATARESET0x0004 +#define CCKCTRL_BUFCNT 0x0002 +#define CCKCTRL_CRST0x0001 +#define CCKCTRL_OCLKEN 0x0100 +#define CCKCTRL_ATACLKOEN 0x0002 +#define CCKCTRL_LCLKEN 0x0001 + +#define QCHCD_IOS_SS 0x0001 + +#define QCHSD_STPDIAG 0x0002 + +#define INTMASK_MSK 0xD112 +#define INTSTS_SERROR 0x8000 +#define INTSTS_PRERR 0x4000 +#define INTSTS_RERR0x1000 +#define INTSTS_ICERR 0x0100 +#define INTSTS_BMSINT 0x0010 +#define INTSTS_BMHE0x0008 +#define INTSTS_IOIRQS 0x0004 +#define INTSTS_INTRQ0x0002 +#define INTSTS_ACTEINT 0x0001 + +#define ECMODE_VALUE 0x01 + +static struct scc_ports { + unsigned long ctl, dma; + unsigned char hwif_id; /* for removing hwif from system */ +} scc_ports[MAX_HWIFS]; + +/* PIO transfer mode table */ +/* JCHST */ +static unsigned long JCHSTtbl[2][7] = { + {0x0E, 0x05, 0x02, 0x03, 0x02, 0x00, 0x00}, /* 100MHz */ + {0x13, 0x07, 0x04, 0x04, 0x03, 0x00, 0x00}/* 133MHz */ +}; + +/* JCHHT */ +static unsigned long JCHHTtbl[2][7] = { + {0x0E, 0x02, 0x02, 0x02, 0x02, 0x00, 0x00}, /* 100MHz */ + {0x13, 0x03, 0x03, 0x03, 0x03, 0x00, 0x00}/* 133MHz */ +}; + +/* JCHCT */ +static unsigned long JCHCTtbl[2][7] = { + {0x1D, 0x1D, 0x1C, 0x0B, 0x06, 0x00, 0x00}, /* 100MHz */ + {0x27, 0x26, 0x26, 0x0E, 0x09, 0x00, 0x00}/* 133MHz */ +}; + + +/* DMA transfer mode table */ +/* JCHDCTM/JCHDCTS */ +static unsigned long JCHDCTxtbl[2][7] = { + {0x0A, 0x06, 0x04, 0x03, 0x01, 0x00, 0x00}, /* 100MHz */ + {0x0E, 0x09, 0x06, 0x04, 0x02, 0x01, 0x00}/* 133MHz */ +}; + +/* JCSTWTM/JCSTWTS */ +static unsigned long JCSTWTxtbl[2][7] = { + {0x06, 0x04, 0x03, 0x02, 0x02, 0x02, 0x00}, /* 100MHz */ + {0x09, 0x06, 0x04, 0x02, 0x02, 0x02, 0x02}/* 133MHz */ +}; + +/* JCTSS */ +static unsigned long JCTSStbl[2][7] = { + {0x05, 
0x05, 0x05, 0x05, 0x05, 0x05, 0x00}, /* 100MHz */ + {0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05}/* 133MHz */ +}; + +/* JCENVT */ +static unsigned long JCENVTtbl[2][7] = { + {0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x00}, /* 100MHz */ + {0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02}/* 133MHz */ +}; + +/* JCACTSELS/JCACTSELM */ +static unsigned long JCACTSELtbl[2][7] = { + {0x00, 0x00, 0x00, 0x00, 0x01, 0x01, 0x00}, /* 100MHz */ + {0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01}/* 133MHz */ +}; + + +static u8 scc_ide_inb(unsigned long port) +{ + u32 data = in_be32((void*)port); + return (u8)data; +} + +static u16 scc_ide_inw(unsigned long port) +{ + u32 data = in_be32((void*)port); + return (u16)data; +} + +static void scc_ide_insw(unsigned long port, void *addr, u32 count) +{ + u16 *ptr = (u16 *)addr; + while (count--) { + *ptr++ = le16_to_cpu(in_be32((void*)port)); + } +} + +static void scc_ide_insl(unsigned long
[PATCH 1/2] scc_pata: dependency fix
This patch fixes:

* the dependency of scc_pata on BLK_DEV_IDEDMA_PCI
* incorrect link to ide-core
* move scc_pata from ide/ppc to ide/pci

Signed-off-by: Kou Ishizaki <[EMAIL PROTECTED]>
Signed-off-by: Akira Iguchi <[EMAIL PROTECTED]>
---
diff -Nrpu -X linux-2.6.21-rc3/Documentation/dontdiff linux-2.6.21-rc3/drivers/ide/Kconfig linux-2.6.21-rc3.mod/drivers/ide/Kconfig
--- linux-2.6.21-rc3/drivers/ide/Kconfig	2007-03-07 13:41:20.0 +0900
+++ linux-2.6.21-rc3.mod/drivers/ide/Kconfig	2007-03-16 18:49:04.0 +0900
@@ -769,6 +769,14 @@ config BLK_DEV_TC86C001
 	help
 	This driver adds support for Toshiba TC86C001 GOKU-S chip.
 
+config BLK_DEV_CELLEB
+	tristate "Toshiba's Cell Reference Set IDE support"
+	depends on PPC_CELLEB
+	help
+	  This driver provides support for the built-in IDE controller on
+	  Toshiba Cell Reference Board.
+	  If unsure, say Y.
+
 endif
 
 config BLK_DEV_IDE_PMAC
@@ -800,14 +808,6 @@ config BLK_DEV_IDEDMA_PMAC
 	  to transfer data to and from memory. Saying Y is safe and improves
 	  performance.
 
-config BLK_DEV_IDE_CELLEB
-	bool "Toshiba's Cell Reference Set IDE support"
-	depends on PPC_CELLEB
-	help
-	  This driver provides support for the built-in IDE controller on
-	  Toshiba Cell Reference Board.
-	  If unsure, say Y.
-
 config BLK_DEV_IDE_SWARM
 	tristate "IDE for Sibyte evaluation boards"
 	depends on SIBYTE_SB1xxx_SOC
diff -Nrpu -X linux-2.6.21-rc3/Documentation/dontdiff linux-2.6.21-rc3/drivers/ide/Makefile linux-2.6.21-rc3.mod/drivers/ide/Makefile
--- linux-2.6.21-rc3/drivers/ide/Makefile	2007-03-07 13:41:20.0 +0900
+++ linux-2.6.21-rc3.mod/drivers/ide/Makefile	2007-03-16 18:48:02.0 +0900
@@ -37,7 +37,6 @@ ide-core-$(CONFIG_BLK_DEV_Q40IDE) += leg
 # built-in only drivers from ppc/
 ide-core-$(CONFIG_BLK_DEV_MPC8xx_IDE)	+= ppc/mpc8xx.o
 ide-core-$(CONFIG_BLK_DEV_IDE_PMAC)+= ppc/pmac.o
-ide-core-$(CONFIG_BLK_DEV_IDE_CELLEB)	+= ppc/scc_pata.o
 
 # built-in only drivers from h8300/
 ide-core-$(CONFIG_H8300)	+= h8300/ide-h8300.o
diff -Nrpu -X linux-2.6.21-rc3/Documentation/dontdiff linux-2.6.21-rc3/drivers/ide/pci/Makefile linux-2.6.21-rc3.mod/drivers/ide/pci/Makefile
--- linux-2.6.21-rc3/drivers/ide/pci/Makefile	2007-03-07 13:41:20.0 +0900
+++ linux-2.6.21-rc3.mod/drivers/ide/pci/Makefile	2007-03-16 18:49:05.0 +0900
@@ -3,6 +3,7 @@ obj-$(CONFIG_BLK_DEV_AEC62XX) += aec62x
 obj-$(CONFIG_BLK_DEV_ALI15X3)	+= alim15x3.o
 obj-$(CONFIG_BLK_DEV_AMD74XX)	+= amd74xx.o
 obj-$(CONFIG_BLK_DEV_ATIIXP)	+= atiixp.o
+obj-$(CONFIG_BLK_DEV_CELLEB)	+= scc_pata.o
 obj-$(CONFIG_BLK_DEV_CMD64X)	+= cmd64x.o
 obj-$(CONFIG_BLK_DEV_CS5520)	+= cs5520.o
 obj-$(CONFIG_BLK_DEV_CS5530)	+= cs5530.o
Re: [linux-usb-devel] USB Keyboard
On Thu, 15 Mar 2007, linux-os (Dick Johnson) wrote:

> It's not the same hardware and all the machines that I tried that
> have keyboards end up WORKING with the USB keyboard as well! But
> Dmitry Torokhov was right! I just burned a CD with all three modules,
> and the keyboard works! I didn't bother to check the DEBUG messages.

Congratulations. Sometimes these problems have easy solutions. :-)

> It's interesting that the "wrong" module loaded fine with no warnings
> that it might not be the correct one!

There's no warning because the driver doesn't know anything is wrong.
Even though it may not find any devices to manage when it first gets
loaded, there's nothing to prevent you adding, for example, a PC-card with
a USB controller on it at some later time.

That's true in general for most Linux drivers. (The ones that aren't
platform-specific, anyway.) They don't look for devices to manage at load
time; instead the driver core calls their probe() routine later on.
Consequently drivers can't tell at load time whether there will be any
useful work for them to do.

Alan Stern
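The load-time versus probe-time distinction described above can be illustrated with a toy userspace model. This is only a sketch, not kernel code: every name in it (toy_driver, toy_register_driver, toy_add_device) is hypothetical and merely imitates how a driver core might call probe() when a device shows up after the driver was loaded.

```c
#include <stddef.h>
#include <string.h>

/* Toy userspace model, not kernel code: all names are hypothetical. */
#define MAX_ITEMS 8

struct toy_driver {
    const char *match;      /* device name this driver handles */
    int bound;              /* number of devices bound via probe() */
};

static struct toy_driver *drivers[MAX_ITEMS];
static const char *devices[MAX_ITEMS];
static size_t ndrivers, ndevices;

/* Called by the "core", never by the driver itself at load time. */
static void toy_probe(struct toy_driver *drv, const char *dev)
{
    if (strcmp(drv->match, dev) == 0)
        drv->bound++;
}

/* "Loading" a driver succeeds even if no matching device exists yet. */
static int toy_register_driver(struct toy_driver *drv)
{
    if (ndrivers == MAX_ITEMS)
        return -1;
    drivers[ndrivers++] = drv;
    for (size_t i = 0; i < ndevices; i++)
        toy_probe(drv, devices[i]);
    return 0;
}

/* Plugging in a device later triggers probe() on every registered driver. */
static int toy_add_device(const char *dev)
{
    if (ndevices == MAX_ITEMS)
        return -1;
    devices[ndevices++] = dev;
    for (size_t i = 0; i < ndrivers; i++)
        toy_probe(drivers[i], dev);
    return 0;
}
```

In this model, registering a driver with no matching device returns success with nothing bound, which is why a "wrong" module can load silently; the binding only happens when a matching device is added.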
Re: [PATCH v5] Fix rmmod/read/write races in /proc entries
On Sun, 11 Mar 2007 20:04:56 +0300 Alexey Dobriyan <[EMAIL PROTECTED]> wrote: > Differences from version 4: > Updated in-code comments. Largely rewritten changelog. > Lockdep please. --akpm > ->read_proc, ->write_proc aren't special, Extend protection to > most methods for regular /proc files. Mentioned by viro. > Differences from version 3: > Use completion instead of unlock/schedule/lock > Move refcount waiting business after removing PDE from lists, > so that *cough* possible concurrent remove_proc_entry() will > work. My, what a lot of code you have here. I note that nobody can be assed even reviewing it. Now why is that? > Fix following races: > === > 1. Write via ->write_proc sleeps in copy_from_user(). Module disappears >meanwhile. Or, more generically, system call done on /proc file, method >supplied by module is called, module dissapeares meanwhile. > >pde = create_proc_entry() >if (!pde) > return -ENOMEM; >pde->write_proc = ... > open > write > copy_from_user >pde = create_proc_entry(); >if (!pde) { > remove_proc_entry(); > return -ENOMEM; > /* module unloaded */ >} We usually fix that race by pinning the module: make whoever registered the proc entries also register their THIS_MODULE, do a try_module_get() on it before we start to play with data structures which the module owns. Can we do that here? And is the above race fix related to the below one in any fashion? > == > 2. bogo-revoke aka proc_kill_inodes() > > remove_proc_entry vfs_read > proc_kill_inodes[check ->f_op validness] > [check ->f_op->read validness] > [verify_area, security permissions checks] > ->f_op = NULL; > if (file->f_op->read) > /* ->f_op dereference, boom */ So you fixed this via sort-of-refcounting on pde->pde_users. hmm. > NOTE, NOTE, NOTE: file_operations are proxied for regular files only. Let's > see how this scheme behaves, then extend if needed for directories. > Directories creators in /proc only set ->owner for them, so proxying for > directories may be unneeded. 
> > NOTE, NOTE, NOTE: methods being proxied are ->llseek, ->read, ->write, > ->poll, ->unlocked_ioctl, ->ioctl, ->compat_ioctl, ->open, ->release. > If your in-tree module uses something else, yell on me. Full audit pending. > > Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]> > --- > > fs/proc/generic.c | 32 + > fs/proc/inode.c | 279 > +++- > include/linux/proc_fs.h | 13 ++ > 3 files changed, 321 insertions(+), 3 deletions(-) > > --- a/fs/proc/generic.c > +++ b/fs/proc/generic.c > @@ -20,6 +20,7 @@ #include > #include > #include > #include > +#include > #include > > #include "internal.h" > @@ -613,6 +614,9 @@ static struct proc_dir_entry *proc_creat > ent->namelen = len; > ent->mode = mode; > ent->nlink = nlink; > + ent->pde_users = 0; > + spin_lock_init(&ent->pde_unload_lock); > + ent->pde_unload_completion = NULL; > out: > return ent; > } > @@ -734,9 +738,35 @@ void remove_proc_entry(const char *name, > de = *p; > *p = de->next; > de->next = NULL; > + > + spin_lock(&de->pde_unload_lock); > + /* > + * Stop accepting new callers into module. If you're > + * dynamically allocating ->proc_fops, save a pointer somewhere. > + */ > + de->proc_fops = NULL; > + /* Wait until all existing callers into module are done. */ > + if (de->pde_users > 0) { > + DECLARE_COMPLETION_ONSTACK(c); > + > + if (!de->pde_unload_completion) > + de->pde_unload_completion = &c; > + > + spin_unlock(&de->pde_unload_lock); > + spin_unlock(&proc_subdir_lock); > + > + wait_for_completion(de->pde_unload_completion); > + > + spin_lock(&proc_subdir_lock); > + goto continue_removing; > + } > + spin_unlock(&de->pde_unload_lock); > + > +continue_removing: > if (S_ISDIR(de->mode)) > parent->nlink--; > - proc_kill_inodes(de); > + if (!S_ISREG(de->mode)) > + proc_kill_inodes(de); > de->nlink = 0; > WARN_ON(de->subdir); > if (!atomic_read(&de->count)) > --- a/fs/proc/inode.c > +++ b/fs/proc/inode.c > @@ -142,6 +142,277 @@ static const struct super_operations pro > .remount_fs
Re: PCI DAC DMA APIs
From: Christoph Hellwig <[EMAIL PROTECTED]>
Date: Thu, 15 Mar 2007 19:18:34 +

> On Thu, Mar 15, 2007 at 12:38:13PM +, Jan Beulich wrote:
> > While the kernel headers provide for this, there don't appear to be any
> > in-tree users (which seems contrary to general Linux policies). Would there
> > be objections to remove all of these?
>
> They should go away. Having them in for more than five years without
> any users is almost a guarantee for bitrot.

Yes, probably we should get rid of them.

The idea wasn't sparc optimizations, it was for things like those Dolphin
clustering cards that essentially want to get at all of physical memory
from the PCI card. The IOMMU is a limited resource, so at the expense of
lack of prefetching and write caching we provide a way to do unlimited DMA
mapping with 64-bit DAC addresses.

None of these drivers ever got integrated, so it's a total loss.

Someone will complain when we pull it out, but fsck them, they had years
to do something about this. :)
Re: [PATCH 2/5] revoke: core code
On Sun, 11 Mar 2007 13:30:49 +0200 (EET) Pekka J Enberg <[EMAIL PROTECTED]> wrote: > From: Pekka Enberg <[EMAIL PROTECTED]> > > The revokeat(2) and frevoke(2) system calls invalidate open file > descriptors and shared mappings of an inode. After an successful > revocation, operations on file descriptors fail with the EBADF or > ENXIO error code for regular and device files, > respectively. Attempting to read from or write to a revoked mapping > causes SIGBUS. > > The actual operation is done in two passes: > > 1. Revoke all file descriptors that point to the given inode. We do > this under tasklist_lock so that after this pass, we don't need > to worry about racing with close(2) or dup(2). > > 2. Take down shared memory mappings of the inode and close all file > pointers. > > The file descriptors and memory mapping ranges are preserved until the > owning task does close(2) and munmap(2), respectively. > > ... > > +asmlinkage int sys_revokeat(int dfd, const char __user *filename); > +asmlinkage int sys_frevoke(unsigned int fd); n all system calls must return long. > +static int revoke_vma(struct vm_area_struct *vma, struct zap_details > *details) > +{ > + unsigned long restart_addr, start_addr, end_addr; > + int need_break; > + > + start_addr = vma->vm_start; > + end_addr = vma->vm_end; > + > + /* > + * Not holding ->mmap_sem here. > + */ > + vma->vm_flags |= VM_REVOKED; so the modification of vm_flags is racy? > + smp_mb(); Please always document barriers. There's presumably some vm_flags reader we're concerned about here, but how is the code reader to know what the code writer was thinking? 
> + again: > + restart_addr = zap_page_range(vma, start_addr, end_addr - start_addr, > + details); > + > + need_break = need_resched() || need_lockbreak(details->i_mmap_lock); > + if (need_break) > + goto out_need_break; > + > + if (restart_addr < end_addr) { > + start_addr = restart_addr; > + goto again; > + } > + return 0; > + > + out_need_break: > + spin_unlock(details->i_mmap_lock); > + cond_resched(); > + spin_lock(details->i_mmap_lock); > + return -EINTR; > +} > + > +static int revoke_mapping(struct address_space *mapping, struct file > *to_exclude) > +{ > + struct vm_area_struct *vma; > + struct prio_tree_iter iter; > + struct zap_details details; > + int err = 0; > + > + details.i_mmap_lock = &mapping->i_mmap_lock; > + > + spin_lock(&mapping->i_mmap_lock); > + vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, 0, ULONG_MAX) { > + if ((vma->vm_flags & VM_SHARED) && vma->vm_file != to_exclude) { > + err = revoke_vma(vma, &details); > + if (err) > + goto out; > + } > + } > + > + list_for_each_entry(vma, &mapping->i_mmap_nonlinear, > shared.vm_set.list) { > + if ((vma->vm_flags & VM_SHARED) && vma->vm_file != to_exclude) { > + err = revoke_vma(vma, &details); > + if (err) > + goto out; > + } > + } > + out: > + spin_unlock(&mapping->i_mmap_lock); > + return err; > +} This all looks very strange. If the calling process expires its timeslice, the entire system call fails? What's happening here? > + > +int generic_file_revoke(struct file *file) > +{ > + int err; > + > + /* > + * Flush pending writes. > + */ > + err = do_fsync(file, 1); > + if (err) > + goto out; > + > + /* > + * Make pending reads fail. > + */ > + err = invalidate_inode_pages2(file->f_mapping); > + > + out: > + return err; > +} > + > +EXPORT_SYMBOL(generic_file_revoke); do_fsync() is seriously suboptimal - it will run an ext3 commit. do_sync_file_range(..., SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER) will run maybe five times quicker. 
But otoh, do_sync_file_range() will fail to write back the pages for a data=journal ext3 file, I expect (oops). Why is this code using invalidate_inode_pages2()? That function keeps on breaking, has ill-defined semantics and will probably change in the future. Exactly what semantics are you looking for here, and why? The blank line before the EXPORT_SYMBOL() is a waste of space. > +/* > + * Filesystem for revoked files. > + */ > + > +static struct inode *revokefs_alloc_inode(struct super_block *sb) > +{ > + struct revokefs_inode_info *info; > + > + info = kmem_cache_alloc(revokefs_inode_cache, GFP_NOFS); > + if (!info) > + return NULL; > + > + return &info->vfs_inode; > +} Why GFP_NOFS? > === > --- /dev/null 1970-01-01 00:00:00.0 + > +++ uml-2.6/include/linux/revoked_fs_i.h 2007-
Re: Summary of resource management discussion
On Thu, Mar 15, 2007 at 12:12:50PM -0700, Paul Menage wrote:
> There are some things that benefit from having an abstract
> container-like object available to store state, e.g. "is this
> container deleted?", "should userspace get a callback when this
> container is empty?".

IMO we can still get these bits of information using nsproxy itself (I
admit I haven't looked at the callback requirement yet).

But IMO a bigger use of the 'struct container' object in your patches is to
store hierarchical information and avoid /repeating/ that information in
each resource object (struct cpuset, struct cpu_limit, struct rss_limit
etc) a 'struct container' is attached to (as pointed out here:
http://lkml.org/lkml/2007/3/7/356). However I don't know how many
controllers will ever support such hierarchical res mgmt and that's why I
said option 3 [above URL] may not be a bad compromise.

Also if you find a good answer for my earlier question "what more
task-grouping behavior do you want to implement using an additional pointer
that you can't implement by reusing ->task_proxy", it would drive home the
need for additional pointers/structures.

> >> >a. Paul Menage's patches:
> >> >
> >> >(tsk->containers->container[cpu_ctlr.subsys_id] - X)->cpu_limit
> >>
> >> So what's the '-X' that you're referring to
> >
> >Oh ..that's to seek pointer to beginning of the cpulimit structure (subsys
> >pointer in 'struct container' points to a structure embedded in a larger
> >structure. -X gets you to point to the larger structure).
>
> OK, so shouldn't that be listed as an overhead for your rcfs version
> too?

X shouldn't be needed in rcfs patches, because "->ctlr_data" in nsproxy
can directly point to the larger structure (there is no 'struct
container_subsys_state' equivalent in rcfs patches).

Container patches:

	(tsk->containers->container[cpu_ctlr.subsys_id] - X)->cpu_limit

rcfs:

	tsk->nsproxy->ctlr_data[cpu_ctlr.subsys_id]->cpu_limit

> >Yes me too. But maybe to keep it simple in initial versions, we should
> >avoid that optimisation and at the same time get statistics on duplicates?.
>
> That's an implementation detail - we have more important points to
> agree on right now ...

yes :)

Eric, did you have any opinion on this thread?

--
Regards,
vatsa
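For readers unfamiliar with the "- X" adjustment discussed in this message, here is a small userspace sketch of the two lookup styles. The structure names (toy_subsys_state, toy_cpu_limit) and the macro are made up for illustration and are not the structures from either patch set; the macro simply mirrors the familiar container_of idiom, where X is the offset of the embedded member.

```c
#include <stddef.h>

/* Hypothetical layout mirroring the discussion; not the actual patch structs. */
struct toy_subsys_state {
    int refcount;
};

struct toy_cpu_limit {
    int cpu_limit;
    struct toy_subsys_state css;   /* embedded at a non-zero offset "X" */
};

/* Recover the enclosing structure from a pointer to its embedded member. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Container-style lookup: the stored pointer targets the embedded member. */
static int limit_via_embedded(struct toy_subsys_state *css)
{
    return toy_container_of(css, struct toy_cpu_limit, css)->cpu_limit;
}

/* rcfs-style lookup: the stored pointer targets the larger structure itself. */
static int limit_via_direct(void *ctlr_data)
{
    return ((struct toy_cpu_limit *)ctlr_data)->cpu_limit;
}
```

The subtraction is a compile-time constant, so the difference between the two styles is one pointer adjustment per access rather than an extra memory load.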
Re: [PATCH 10/13] BLK_DEV_IDE_CELLEB dependency fix
Hi,

> Bart wrote:
>> Al wrote:
>> So AFAICS the minimal fix for that sucker is dependency on BLK_DEV_IDE=y;
>> however, I really wonder if
>> * it needs to be linked into ide-core (as opposed to being a normal
>> module of its own)
>
>AFAICS there are no legacy device ordering issues with scc_pata so it doesn't
>need to be linked into ide-core but I'll leave the definitive answer to Akira
>
>> * alternatively, its init should be called explicitly.

I don't know why scc_pata is linked into ide-core.
Having reviewed your comments and the code, I will make the following fixes:

* remove the link to ide-core and make it a normal module
* move it from ide/ppc to ide/pci

I will send these patches later.

Best regards,
Akira Iguchi
core2 duo, interrupts: is this normal?
Hello,

is this output normal? I mean, why are the counters on CPU1 zero?
Shouldn't this be balanced?

$ cat /proc/interrupts
           CPU0       CPU1
  0:    4180170          0   IO-APIC-edge      timer
  1:       8060          0   IO-APIC-edge      i8042
  7:          0          0   IO-APIC-edge      parport0
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          5          0   IO-APIC-edge      i8042
 16:     322297          0   IO-APIC-fasteoi   uhci_hcd:usb3, libata, nvidia, EMU10K1
 17:     896399          0   IO-APIC-fasteoi   bttv0, eth0, libata
 18:      72867          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb7
 19:      27770          0   IO-APIC-fasteoi   ehci_hcd:usb2, uhci_hcd:usb5
 20:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
 21:          0          0   IO-APIC-fasteoi   uhci_hcd:usb6
 22:          3          0   IO-APIC-fasteoi   ohci1394
 23:        155          0   IO-APIC-fasteoi   HDA Intel
219:     103056          0   PCI-MSI-edge      libata
NMI:          0          0
LOC:    4077613    4077622
ERR:          0
MIS:          0

Many thanks in advance,
Norberto
Re: XFS internal error xfs_da_do_buf(2) at line 2087 of file fs/xfs/xfs_da_btree.c. Caller 0xc01b00bd
On Wed, Mar 14, 2007 at 12:34:29PM +0100, Marco Berizzi wrote:
> Hello everybody.
> Since 2.6.19.2 + commit 7fbbb01dca7704d52ace6f45a805c98a5b0362f9

What commit is that? gitweb search tells me it's an nmi watchdog change.
Doesn't seem likely to change XFS behaviour - can you post a url to the
commit?

> I'm experimenting these errors.
> 2.6.19.1 has been worked good for more
> than 30 days.

With the above commit?

> I have reverted back to 2.6.19.1 to see if
> this problem happens again.

Without the above commit?

> find_or_create_page+0x37/0x8e
> _xfs_buf_lookup_pages+0x132/0x2ea
> _xfs_buf_initialize+0xc8/0xf6
> xfs_buf_get_flags+0xf8/0x11d
> xfs_buf_read_flags+0x1c/0x7f
> xfs_trans_read_buf+0x16a/0x34f
> xfs_itobp+0x7c/0x242
> xfs_iread+0x68/0x1d3
> xfs_iget_core+0xe7/0x687
> xfs_iget+0xd8/0x150
> xfs_dir_lookup_int+0x98/0x10e
> xfs_lookup+0x5a/0x90
> xfs_vn_lookup+0x52/0x93

Curious - never seen this before - possibly a corrupted inode number in
the directory has led to this.

> ba 4e 8b cd
> Mar 12 14:35:21 Pleiadi kernel: Filesystem "sda8": XFS internal error
> xfs_da_do_buf(2) at line 2087 of file fs/xfs/xfs_da_btree.c. Caller
> 0xc01b00bd
> Mar 12 14:35:21 Pleiadi kernel: [] xfs_da_do_buf+0x70c/0x7b1
> Mar 12 14:35:21 Pleiadi kernel: [] xfs_da_read_buf+0x30/0x35
> Mar 12 14:35:21 Pleiadi kernel: [] xfs_da_read_buf+0x30/0x35

Hmm - these could simply be follow-on errors from the first problem - the
buffer would now probably be bad or corrupted, and the directory buffer
read code here is saying the buffer is bad.

All the errors appear to have the same data in the buffer (which is
lacking the correct magic numbers) so I'd say they are related to the
above error.

Can you run xfs_repair on that filesystem and see if it reports (and
fixes) any problems?

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
[PATCH] Return EPERM not ECHILD on security_task_wait failure
wait* syscalls return -ECHILD even when an individual PID of a live child
was requested explicitly, when security_task_wait denies the operation.
This means that something like a broken SELinux policy can produce an
unexpected failure that looks just like a bug with wait or ptrace or
something.

This patch makes do_wait return -EPERM instead of -ECHILD if some
children were ruled out solely because security_task_wait failed.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 kernel/exit.c |   12 +++-
 1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index f132349..a41052f 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1067,7 +1067,7 @@ static int eligible_child(pid_t pid, int
 		return 2;
 
 	if (security_task_wait(p))
-		return 0;
+		return -1;
 
 	return 1;
 }
@@ -1449,6 +1449,7 @@ static long do_wait(pid_t pid, int optio
 	DECLARE_WAITQUEUE(wait, current);
 	struct task_struct *tsk;
 	int flag, retval;
+	int allowed, denied;
 
 	add_wait_queue(&current->signal->wait_chldexit,&wait);
 repeat:
@@ -1457,6 +1458,7 @@ repeat:
 	 * match our criteria, even if we are not able to reap it yet.
 	 */
 	flag = 0;
+	allowed = denied = 0;
 	current->state = TASK_INTERRUPTIBLE;
 	read_lock(&tasklist_lock);
 	tsk = current;
@@ -1472,6 +1474,12 @@ repeat:
 			if (!ret)
 				continue;
 
+			if (unlikely(ret < 0)) {
+				denied = 1;
+				continue;
+			}
+			allowed = 1;
+
 			switch (p->state) {
 			case TASK_TRACED:
 				/*
@@ -1570,6 +1578,8 @@ check_continued:
 		goto repeat;
 	}
 	retval = -ECHILD;
+	if (unlikely(denied) && !allowed)
+		retval = -EPERM;
 end:
 	current->state = TASK_RUNNING;
 	remove_wait_queue(&current->signal->wait_chldexit,&wait);
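The error selection this patch introduces can be modelled in isolation. The following is a userspace sketch under stated assumptions, not the kernel code: toy_wait_error is a hypothetical helper, and it only covers the path where do_wait has found nothing to reap and must pick an error code. Each array element stands for the per-child result of eligible_child() as changed by the patch (0 = not eligible, positive = eligible, negative = denied by the security hook).

```c
#include <errno.h>
#include <stddef.h>

/*
 * Toy model of the error selection in the patch. Each element of results[]
 * stands for one child: 0 = not eligible, 1 = eligible, -1 = ruled out by
 * the security hook (the new eligible_child() return value).
 */
static int toy_wait_error(const int *results, size_t n)
{
    int allowed = 0, denied = 0;

    for (size_t i = 0; i < n; i++) {
        if (results[i] == 0)
            continue;
        if (results[i] < 0) {
            denied = 1;
            continue;
        }
        allowed = 1;
    }
    /* -EPERM only when denial is the sole reason no child qualified. */
    if (denied && !allowed)
        return -EPERM;
    return -ECHILD;
}
```

Note that a mix of denied and allowed children still yields -ECHILD in this model, matching the "solely because security_task_wait failed" wording in the changelog.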
[PATCH 2.6.20] pwc : Cisco VT Camera support
Hi,

I already sent this e-mail to Luc and to the pwc mailing list, and got no
answer. I'm trying again in the hope that this patch will go into the
kernel...

I have a Cisco VT Camera, and it was just collecting dust. I decided to
try connecting it to my Linux box at home.

Just a digression about the product. The Cisco VT Camera is a webcam
Cisco sold to work with their IP phone hardware and software. It's mostly
useless on Windows, as it interfaces only to Cisco software. You can find
some for cheap on eBay...

Physically, it's just a Logitech Pro 4000. The only difference with the
Pro 4000 is the Cisco logo and that it's grey like the Pro 3000. I believe
Cisco is now selling the Cisco VT Camera II, which looks to be something
else...

So, assuming that it was a Pro 4000 inside, I created the little patch
attached. I'm new to webcams under Linux, but I managed to get an image
from it using xawtv, and the image looked all right, so I consider that a
success. The image seemed a bit small and I could not get the microphone
driver loaded, but I assume it's my lack of experience. Note that I did
not try any other type_id, but this one works great.

Have fun...

Jean

---
diff -u -p linux/drivers/media/video/pwc/pwc-if.c~ linux/drivers/media/video/pwc/pwc-if.c
--- linux/drivers/media/video/pwc/pwc-if.c~	2007-02-23 22:08:40.0 -0800
+++ linux/drivers/media/video/pwc/pwc-if.c	2007-03-04 22:42:43.0 -0800
@@ -1547,6 +1547,10 @@ static int usb_pwc_probe(struct usb_inte
 			features |= FEATURE_MOTOR_PANTILT;
 			break;
 		case 0x08b6:
+			PWC_INFO("Logitech/Cisco VT Camera webcam detected.\n");
+			name = "Cisco VT Camera";
+			type_id = 740; /* CCD sensor */
+			break;
 		case 0x08b7:
 		case 0x08b8:
 			PWC_INFO("Logitech QuickCam detected (reserved ID).\n");
Re: [RFC][PATCH 2/7] RSS controller core
Alan Cox <[EMAIL PROTECTED]> writes:

>> stuff is happening by comparing page->count and page->_mapcount, but it
>> certainly wouldn't be conclusive. But, does this kind of nonsense even
>> happen in practice?
>
> "Is it useful for me as a bad guy to make it happen ?"

To create a DOS attack.

- Allocate some memory you know your victim will want in the future,
  (shared libraries and the like).
- Wait until your victim is using the memory you allocated.
- Terminate your memory resource group.
- Victim is pushed over memory limits by your exiting.
- Victim can no longer allocate memory
- Victim dies

It's not quite that easy unless your victim calls mlockall(MCL_FUTURE),
but the potential is clearly there.

Am I missing something? Or is this fundamental to any first touch scenario?

I just know I have problems with first touch because it is darn hard to
reason about.

Eric
Re: sky2 PHY setup
Hello Stephen,

> yesterday I pulled from Linus tree because I saw the sky2 updated and I
> tried to break it but it seems that my problems are gone. I let you know
> if anything pops up in the future.

bad news. Today I tried the sky2 driver which is in Linus' kernel tree
(HEAD) on a machine with very high network load and it stopped working
without any kernel messages after doing a flawless job under high load for
5 hours. My watchdog rebooted the machine after 500 seconds. ;-(

Thomas
Re: [PATCH take3 00/20] Make common x86 arch area for i386 and x86_64 - Take 3
On Thu, 2007-03-15 at 01:13 -0400, Steven Rostedt wrote:
> Once again here's an attempt to put the shared files of x86_64 and i386
> into a separate directory.

OK, that's fine, but the next step is to have "make ARCH=x86" compile,
with a config option as to whether to build 32 or 64 bit. This will
involve a fair amount of Makefile hair, but if you can get Andi to buy
into that then the rest is a simple matter of code churn. For most kernel
hackers, this would be the flag day.

Moving the rest of the files across to xxx_32.c, xxx_64.h etc is going to
involve a great deal of untangling and code cleanup.

It's also going to completely screw a whole heap of my cleanup patches.
Oh well.

(Still hoping for an executive summary from the PPC folks).

Cheers!
Rusty.
Re: [PATCH] [REPOST] x86_64, i386: Add command line length to boot protocol
Bernhard Walle wrote:

Because the command line is increased to 2048 characters after 2.6.21,
it's not possible for boot loaders and userspace tools to determine the
length of the command line the kernel can understand. The benefit of
knowing the length is that users can be warned if the command line is too
long, which prevents surprises if things don't work after bootup.

This patch updates the boot protocol to contain a field called
"cmdline_size" that contains the length of the command line (excluding
the terminating zero).

The patch also adds missing fields (of protocol version 2.05) to the
x86_64 setup code.

Signed-off-by: Bernhard Walle <[EMAIL PROTECTED]>
Cc: Alon Bar-Lev <[EMAIL PROTECTED]>

Acked-by: H. Peter Anvin <[EMAIL PROTECTED]>
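As an illustration of how a boot loader or userspace tool might use the new field, here is a minimal sketch. The helper name toy_cmdline_fits is hypothetical, and the code assumes the caller has already read "cmdline_size" from the kernel's setup header; how that value is obtained is outside the scope of this sketch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical boot-loader-side check. cmdline_size is assumed to be the
 * value read from the kernel's setup header: the maximum command line
 * length, excluding the terminating zero.
 */
static int toy_cmdline_fits(const char *cmdline, size_t cmdline_size)
{
    return strlen(cmdline) <= cmdline_size;
}
```

Because the field excludes the terminating zero, a command line whose strlen() equals cmdline_size still fits, and a tool would warn only when it is strictly longer.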
[patch 13/13] signal/timer/event fds v6 - KAIO eventfd support example ...
This is an example about how to add eventfd support to the current KAIO code, in order to enable KAIO to post readiness events to a pollable fd (hence compatible with POSIX select/poll). The KAIO code simply signals the eventfd fd when events are ready, and this triggers a POLLIN in the fd. This patch uses a reserved for future use member of the struct iocb to pass an eventfd file descriptor, that KAIO will use to post events every time a request completes. At that point, an aio_getevents() will return the completed result to a struct io_event. I made a quick test program to verify the patch, and it runs fine here: http://www.xmailserver.org/eventfd-aio-test.c The test program uses poll(2), but it'd, of course, work with select and epoll too. This can allow to schedule both block I/O and other poll-able devices requests, and wait for results using select/poll/epoll. Signed-off-by: Davide Libenzi - Davide Index: linux-2.6.21-rc3.quilt/fs/aio.c === --- linux-2.6.21-rc3.quilt.orig/fs/aio.c2007-03-15 15:52:45.0 -0700 +++ linux-2.6.21-rc3.quilt/fs/aio.c 2007-03-15 17:15:20.0 -0700 @@ -30,6 +30,7 @@ #include #include #include +#include #include #include @@ -421,6 +422,7 @@ req->private = NULL; req->ki_iovec = NULL; INIT_LIST_HEAD(&req->ki_run_list); + req->ki_eventfd = ERR_PTR(-EINVAL); /* Check if the completion queue has enough free space to * accept an event from this io. @@ -462,6 +464,8 @@ { assert_spin_locked(&ctx->ctx_lock); + if (!IS_ERR(req->ki_eventfd)) + fput(req->ki_eventfd); if (req->ki_dtor) req->ki_dtor(req); if (req->ki_iovec != &req->ki_inline_vec) @@ -946,6 +950,14 @@ return 1; } + /* +* Check if the user asked us to deliver the result through an +* eventfd. The eventfd_signal() function is safe to be called +* from IRQ context. +*/ + if (unlikely(!IS_ERR(iocb->ki_eventfd))) + eventfd_signal(iocb->ki_eventfd, 1); + info = &ctx->ring_info; /* add a completion event to the ring buffer. 
@@ -1555,6 +1567,19 @@ fput(file); return -EAGAIN; } + if (iocb->aio_resfd != 0) { + /* +* If the aio_resfd field of the iocb is not zero, get an +* instance of the file* now. The file descriptor must be +* an eventfd() fd, and will be signaled for each completed +* event using the eventfd_signal() function. +*/ + req->ki_eventfd = eventfd_fget((int) iocb->aio_resfd); + if (IS_ERR(req->ki_eventfd)) { + ret = PTR_ERR(req->ki_eventfd); + goto out_put_req; + } + } req->ki_filp = file; ret = put_user(req->ki_key, &user_iocb->aio_key); Index: linux-2.6.21-rc3.quilt/include/linux/aio.h === --- linux-2.6.21-rc3.quilt.orig/include/linux/aio.h 2007-03-15 15:52:45.0 -0700 +++ linux-2.6.21-rc3.quilt/include/linux/aio.h 2007-03-15 16:13:45.0 -0700 @@ -119,6 +119,12 @@ struct list_headki_list;/* the aio core uses this * for cancellation */ + + /* +* If the aio_resfd field of the userspace iocb is not zero, +* this is the underlying file* to deliver event to. +*/ + struct file *ki_eventfd; }; #define is_sync_kiocb(iocb)((iocb)->ki_key == KIOCB_SYNC_KEY) Index: linux-2.6.21-rc3.quilt/include/linux/aio_abi.h === --- linux-2.6.21-rc3.quilt.orig/include/linux/aio_abi.h 2007-03-15 15:52:45.0 -0700 +++ linux-2.6.21-rc3.quilt/include/linux/aio_abi.h 2007-03-15 16:13:45.0 -0700 @@ -84,7 +84,11 @@ /* extra parameters */ __u64 aio_reserved2; /* TODO: use this for a (struct sigevent *) */ - __u64 aio_reserved3; + __u32 aio_reserved3; + /* +* If different from 0, this is an eventfd to deliver AIO results to +*/ + __u32 aio_resfd; }; /* 64 bytes */ #undef IFBIG - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
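To show the userspace side of the "pollable fd" idea, here is a small sketch that signals an eventfd and waits for it with poll(2). It assumes the eventfd(2) interface as exposed by current glibc (eventfd(initval, flags) with 8-byte counter reads and writes), which may differ from the interface in this v6 patch series, and it signals the fd from userspace instead of from KAIO completions. The helper name toy_eventfd_roundtrip is hypothetical.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Userspace sketch using eventfd(2) as exposed by current glibc; the
 * interface in the v6 patch series may differ. Signals the fd twice,
 * waits for POLLIN, and returns the accumulated counter, or -1 on error.
 */
static long toy_eventfd_roundtrip(uint64_t a, uint64_t b)
{
    struct pollfd pfd;
    uint64_t value = 0;
    long result = -1;
    int fd = eventfd(0, 0);

    if (fd < 0)
        return -1;
    if (write(fd, &a, sizeof(a)) != (ssize_t)sizeof(a) ||
        write(fd, &b, sizeof(b)) != (ssize_t)sizeof(b))
        goto out;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll(&pfd, 1, 1000) != 1 || !(pfd.revents & POLLIN))
        goto out;

    /* A read returns the summed counter and resets it to zero. */
    if (read(fd, &value, sizeof(value)) == (ssize_t)sizeof(value))
        result = (long)value;
out:
    close(fd);
    return result;
}
```

With the patch applied, the writes in this sketch would instead come from the kernel calling eventfd_signal() once per completed request, so the value read back tells the caller how many completions to collect.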
[patch 9/13] signal/timer/event fds v6 - timerfd compat code ...
This patch implements the necessary compat code for the timerfd system call.

Signed-off-by: Davide Libenzi

- Davide

Index: linux-2.6.21-rc3.quilt/fs/compat.c
===
--- linux-2.6.21-rc3.quilt.orig/fs/compat.c	2007-03-15 15:53:11.0 -0700
+++ linux-2.6.21-rc3.quilt/fs/compat.c	2007-03-15 16:11:52.0 -0700
@@ -2257,3 +2257,23 @@
 
 	return sys_signalfd(ufd, ksigmask, sizeof(sigset_t));
 }
+
+asmlinkage long compat_sys_timerfd(int ufd, int clockid, int flags,
+				   const struct compat_itimerspec __user *utmr)
+{
+	long res;
+	struct itimerspec t;
+	struct itimerspec __user *ut;
+
+	res = -EFAULT;
+	if (get_compat_itimerspec(&t, utmr))
+		goto err_exit;
+	ut = compat_alloc_user_space(sizeof(*ut));
+	if (copy_to_user(ut, &t, sizeof(t)) )
+		goto err_exit;
+
+	res = sys_timerfd(ufd, clockid, flags, ut);
+err_exit:
+	return res;
+}
+
Index: linux-2.6.21-rc3.quilt/include/linux/compat.h
===
--- linux-2.6.21-rc3.quilt.orig/include/linux/compat.h	2007-03-15 15:53:11.0 -0700
+++ linux-2.6.21-rc3.quilt/include/linux/compat.h	2007-03-15 16:11:52.0 -0700
@@ -225,6 +225,11 @@
 	return lhs->tv_nsec - rhs->tv_nsec;
 }
 
+extern int get_compat_itimerspec(struct itimerspec *dst,
+				 const struct compat_itimerspec __user *src);
+extern int put_compat_itimerspec(struct compat_itimerspec __user *dst,
+				 const struct itimerspec *src);
+
 asmlinkage long compat_sys_adjtimex(struct compat_timex __user *utp);
 
 extern int compat_printk(const char *fmt, ...);
Index: linux-2.6.21-rc3.quilt/kernel/compat.c
===
--- linux-2.6.21-rc3.quilt.orig/kernel/compat.c	2007-03-15 15:53:11.0 -0700
+++ linux-2.6.21-rc3.quilt/kernel/compat.c	2007-03-15 16:11:52.0 -0700
@@ -475,8 +475,8 @@
 	return min_length;
 }
 
-static int get_compat_itimerspec(struct itimerspec *dst,
-				 struct compat_itimerspec __user *src)
+int get_compat_itimerspec(struct itimerspec *dst,
+			  const struct compat_itimerspec __user *src)
 {
 	if (get_compat_timespec(&dst->it_interval, &src->it_interval) ||
 	    get_compat_timespec(&dst->it_value, &src->it_value))
@@ -484,8 +484,8 @@
 	return 0;
 }
 
-static int put_compat_itimerspec(struct compat_itimerspec __user *dst,
-				 struct itimerspec *src)
+int put_compat_itimerspec(struct compat_itimerspec __user *dst,
+			  const struct itimerspec *src)
 {
 	if (put_compat_timespec(&src->it_interval, &dst->it_interval) ||
 	    put_compat_timespec(&src->it_value, &dst->it_value))
[patch 8/13] signal/timer/event fds v6 - timerfd wire up x86_64 arch ...
This patch wires the timerfd system call to the x86_64 architecture.

Signed-off-by: Davide Libenzi

- Davide

Index: linux-2.6.21-rc3.quilt/arch/x86_64/ia32/ia32entry.S
===
--- linux-2.6.21-rc3.quilt.orig/arch/x86_64/ia32/ia32entry.S	2007-03-15 15:53:13.0 -0700
+++ linux-2.6.21-rc3.quilt/arch/x86_64/ia32/ia32entry.S	2007-03-15 16:11:50.0 -0700
@@ -720,4 +720,5 @@
 	.quad sys_getcpu
 	.quad sys_epoll_pwait
 	.quad sys_signalfd		/* 320 */
+	.quad sys_timerfd
 ia32_syscall_end:
Index: linux-2.6.21-rc3.quilt/include/asm-x86_64/unistd.h
===
--- linux-2.6.21-rc3.quilt.orig/include/asm-x86_64/unistd.h	2007-03-15 15:53:13.0 -0700
+++ linux-2.6.21-rc3.quilt/include/asm-x86_64/unistd.h	2007-03-15 16:11:50.0 -0700
@@ -621,8 +621,10 @@
 __SYSCALL(__NR_move_pages, sys_move_pages)
 #define __NR_signalfd		280
 __SYSCALL(__NR_signalfd, sys_signalfd)
+#define __NR_timerfd		281
+__SYSCALL(__NR_timerfd, sys_timerfd)
 
-#define __NR_syscall_max __NR_signalfd
+#define __NR_syscall_max __NR_timerfd
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR
[patch 12/13] signal/timer/event fds v6 - eventfd wire up x86_64 arch ...
This patch wires the eventfd system call to the x86_64 architecture.

Signed-off-by: Davide Libenzi

- Davide

Index: linux-2.6.21-rc3.quilt/arch/x86_64/ia32/ia32entry.S
===
--- linux-2.6.21-rc3.quilt.orig/arch/x86_64/ia32/ia32entry.S	2007-03-15 16:11:50.0 -0700
+++ linux-2.6.21-rc3.quilt/arch/x86_64/ia32/ia32entry.S	2007-03-15 16:13:43.0 -0700
@@ -721,4 +721,5 @@
 	.quad sys_epoll_pwait
 	.quad sys_signalfd		/* 320 */
 	.quad sys_timerfd
+	.quad sys_eventfd
 ia32_syscall_end:
Index: linux-2.6.21-rc3.quilt/include/asm-x86_64/unistd.h
===
--- linux-2.6.21-rc3.quilt.orig/include/asm-x86_64/unistd.h	2007-03-15 16:11:50.0 -0700
+++ linux-2.6.21-rc3.quilt/include/asm-x86_64/unistd.h	2007-03-15 16:13:43.0 -0700
@@ -623,8 +623,10 @@
 __SYSCALL(__NR_signalfd, sys_signalfd)
 #define __NR_timerfd		281
 __SYSCALL(__NR_timerfd, sys_timerfd)
+#define __NR_eventfd		282
+__SYSCALL(__NR_eventfd, sys_eventfd)
 
-#define __NR_syscall_max __NR_timerfd
+#define __NR_syscall_max __NR_eventfd
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR
[patch 11/13] signal/timer/event fds v6 - eventfd wire up i386 arch ...
This patch wires the eventfd system call to the i386 architecture.

Signed-off-by: Davide Libenzi

- Davide

Index: linux-2.6.21-rc3.quilt/arch/i386/kernel/syscall_table.S
===
--- linux-2.6.21-rc3.quilt.orig/arch/i386/kernel/syscall_table.S	2007-03-15 16:11:47.0 -0700
+++ linux-2.6.21-rc3.quilt/arch/i386/kernel/syscall_table.S	2007-03-15 16:13:40.0 -0700
@@ -321,3 +321,4 @@
 	.long sys_epoll_pwait
 	.long sys_signalfd		/* 320 */
 	.long sys_timerfd
+	.long sys_eventfd
Index: linux-2.6.21-rc3.quilt/include/asm-i386/unistd.h
===
--- linux-2.6.21-rc3.quilt.orig/include/asm-i386/unistd.h	2007-03-15 16:11:47.0 -0700
+++ linux-2.6.21-rc3.quilt/include/asm-i386/unistd.h	2007-03-15 16:13:40.0 -0700
@@ -327,10 +327,11 @@
 #define __NR_epoll_pwait	319
 #define __NR_signalfd		320
 #define __NR_timerfd		321
+#define __NR_eventfd		322
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 322
+#define NR_syscalls 323
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
Re: thread stacks and strict vm overcommit accounting
> > > With a typical size as a fuzz factor preaccounted in later kernels.
> >
> > Where's that done?
>
> I don't know what Alan is referring to there.

fs/exec.c - we add 20 pages to the stack vma size initially.

> We've no more committed to providing each instance with 8MB of stack,
> than we've committed to providing each instance with RLIMIT_AS of
> address space. The rlimits are limits, not commitments, surely?

Yes, it's just that the C programming language is utterly and
mindbogglingly broken when it comes to resource exhaustion for the stack.

Alan
[patch 7/13] signal/timer/event fds v6 - timerfd wire up i386 arch ...
This patch wires the timerfd system call to the i386 architecture.

Signed-off-by: Davide Libenzi

- Davide

Index: linux-2.6.21-rc3.quilt/arch/i386/kernel/syscall_table.S
===
--- linux-2.6.21-rc3.quilt.orig/arch/i386/kernel/syscall_table.S	2007-03-15 15:53:15.0 -0700
+++ linux-2.6.21-rc3.quilt/arch/i386/kernel/syscall_table.S	2007-03-15 16:11:47.0 -0700
@@ -320,3 +320,4 @@
 	.long sys_getcpu
 	.long sys_epoll_pwait
 	.long sys_signalfd		/* 320 */
+	.long sys_timerfd
Index: linux-2.6.21-rc3.quilt/include/asm-i386/unistd.h
===
--- linux-2.6.21-rc3.quilt.orig/include/asm-i386/unistd.h	2007-03-15 15:53:15.0 -0700
+++ linux-2.6.21-rc3.quilt/include/asm-i386/unistd.h	2007-03-15 16:11:47.0 -0700
@@ -326,10 +326,11 @@
 #define __NR_getcpu		318
 #define __NR_epoll_pwait	319
 #define __NR_signalfd		320
+#define __NR_timerfd		321
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 321
+#define NR_syscalls 322
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
[patch 3/13] signal/timer/event fds v6 - signalfd wire up i386 arch ...
This patch wires the signalfd system call to the i386 architecture.

Signed-off-by: Davide Libenzi

- Davide

Index: linux-2.6.21-rc3.quilt/arch/i386/kernel/syscall_table.S
===
--- linux-2.6.21-rc3.quilt.orig/arch/i386/kernel/syscall_table.S	2007-02-04 10:44:54.0 -0800
+++ linux-2.6.21-rc3.quilt/arch/i386/kernel/syscall_table.S	2007-03-15 15:34:12.0 -0700
@@ -319,3 +319,4 @@
 	.long sys_move_pages
 	.long sys_getcpu
 	.long sys_epoll_pwait
+	.long sys_signalfd		/* 320 */
Index: linux-2.6.21-rc3.quilt/include/asm-i386/unistd.h
===
--- linux-2.6.21-rc3.quilt.orig/include/asm-i386/unistd.h	2007-02-04 10:44:54.0 -0800
+++ linux-2.6.21-rc3.quilt/include/asm-i386/unistd.h	2007-03-15 15:34:12.0 -0700
@@ -325,10 +325,11 @@
 #define __NR_move_pages		317
 #define __NR_getcpu		318
 #define __NR_epoll_pwait	319
+#define __NR_signalfd		320
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 320
+#define NR_syscalls 321
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
[patch 10/13] signal/timer/event fds v6 - eventfd core ...
This is a very simple and lightweight file descriptor that can be used as an event wait/dispatch mechanism by userspace (both wait and dispatch) and by the kernel (dispatch only). It can be used instead of pipe(2) in all cases where a pipe would simply be used to signal events. Eventfds have much lower kernel overhead than pipes, and they do not consume two fds. When used in the kernel, an eventfd can offer an fd-bridge to enable, for example, facilities like KAIO or syslets/threadlets to signal to an fd the completion of certain operations. More generally, an eventfd can be used by the kernel to signal readiness, in a POSIX poll/select way, of interfaces that would otherwise be incompatible with it.

The API is:

int eventfd(unsigned int count);

The eventfd API accepts an initial "count" parameter, and returns an eventfd fd. It supports poll(2) (POLLIN, POLLOUT, POLLERR), read(2) and write(2).

The POLLIN flag is raised when the internal counter is greater than zero. The POLLOUT flag is raised when at least a value of "1" can be written to the internal counter. The POLLERR flag is raised when an overflow in the counter value is detected.

The write(2) operation can never overflow the counter, since it blocks (unless O_NONBLOCK is set, in which case -EAGAIN is returned). But the eventfd_signal() function can overflow it, since it is not supposed to sleep during its operation.

The read(2) function reads the __u64 counter value, and resets the internal value to zero. If the value read is equal to (__u64) -1, an overflow happened on the internal counter (due to 2^64 eventfd_signal() posts that have never been retired - unlikely, but possible).

The write(2) call writes a __u64 count value, and adds it to the current counter. The eventfd fd also supports O_NONBLOCK.
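To make the read/write counter semantics above concrete, here is a minimal userspace sketch. It assumes the eventfd(2) wrapper that glibc exposes today, which takes an extra "flags" argument that the one-argument call proposed in this patch does not have; the helper names efd_post(), efd_drain() and efd_demo() are made up for illustration.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Post n events: write(2) adds n to the eventfd counter. */
static int efd_post(int efd, uint64_t n)
{
	return write(efd, &n, sizeof(n)) == (ssize_t)sizeof(n) ? 0 : -1;
}

/* Fetch pending events: read(2) returns the counter and resets it to 0. */
static int efd_drain(int efd, uint64_t *out)
{
	return read(efd, out, sizeof(*out)) == (ssize_t)sizeof(*out) ? 0 : -1;
}

/*
 * Post 3 and then 4 events, then drain them with a single read.
 * Assumption: eventfd(initval, flags) as in current glibc, not the
 * eventfd(count) prototype from this patch.
 */
static int efd_demo(uint64_t *total)
{
	int ret = -1;
	int efd = eventfd(0, 0);

	if (efd < 0)
		return -1;
	if (efd_post(efd, 3) == 0 && efd_post(efd, 4) == 0 &&
	    efd_drain(efd, total) == 0)
		ret = 0;
	close(efd);
	return ret;
}
```

The two writes are summed into one counter, so a single read observes both posts at once; that is the property that lets one fd stand in for a signalling pipe.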
On the kernel side, we have: struct file *eventfd_fget(int fd); int eventfd_signal(struct file *file, unsigned int n); The eventfd_fget() should be called to get a struct file* from an eventfd fd (this is an fget() + check of f_op being an eventfd fops pointer). The kernel can then call eventfd_signal() every time it wants to post an event to userspace. The eventfd_signal() function can be called from any context. An eventfd() simple test and bench is available here: http://www.xmailserver.org/eventfd-bench.c This is the eventfd-based version of pipetest-4 (pipe(2) based): http://www.xmailserver.org/pipetest-4.c Not that performance matters much in the eventfd case, but eventfd-bench shows almost as double as performance than pipetest-4. Signed-off-by: Davide Libenzi - Davide Index: linux-2.6.21-rc3.quilt/fs/Makefile === --- linux-2.6.21-rc3.quilt.orig/fs/Makefile 2007-03-15 15:53:07.0 -0700 +++ linux-2.6.21-rc3.quilt/fs/Makefile 2007-03-15 16:11:54.0 -0700 @@ -11,7 +11,7 @@ attr.o bad_inode.o file.o filesystems.o namespace.o aio.o \ seq_file.o xattr.o libfs.o fs-writeback.o \ pnode.o drop_caches.o splice.o sync.o utimes.o \ - stack.o anon_inodes.o signalfd.o timerfd.o + stack.o anon_inodes.o signalfd.o timerfd.o eventfd.o ifeq ($(CONFIG_BLOCK),y) obj-y += buffer.o bio.o block_dev.o direct-io.o mpage.o ioprio.o Index: linux-2.6.21-rc3.quilt/include/linux/syscalls.h === --- linux-2.6.21-rc3.quilt.orig/include/linux/syscalls.h2007-03-15 15:53:07.0 -0700 +++ linux-2.6.21-rc3.quilt/include/linux/syscalls.h 2007-03-15 16:11:54.0 -0700 @@ -605,6 +605,7 @@ asmlinkage long sys_signalfd(int ufd, sigset_t __user *user_mask, size_t sizemask); asmlinkage long sys_timerfd(int ufd, int clockid, int flags, const struct itimerspec __user *utmr); +asmlinkage long sys_eventfd(unsigned int count); int kernel_execve(const char *filename, char *const argv[], char *const envp[]); Index: linux-2.6.21-rc3.quilt/fs/eventfd.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ 
linux-2.6.21-rc3.quilt/fs/eventfd.c 2007-03-15 16:11:54.0 -0700 @@ -0,0 +1,271 @@ +/* + * fs/eventfd.c + * + * Copyright (C) 2007 Davide Libenzi + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + + + +struct eventfd_ctx { + spinlock_t lock; + wait_queue_head_t wqh; + __u64 count; +}; + + +static void eventfd_cleanup(struct eventfd_ctx *ctx); +static int eventfd_close(struct inode *inode, struct file *file); +static unsigned int eventfd_poll(struct file *file, poll_table *wait); +static ssize_t eventfd_read(struct file *file, char __user *buf, size_t count, + loff_t *ppos); +static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t count, +
[patch 5/13] signal/timer/event fds v6 - signalfd compat code ...
This patch implements the necessary compat code for the signalfd system call.

Signed-off-by: Davide Libenzi

- Davide

Index: linux-2.6.21-rc3.quilt/fs/compat.c
===
--- linux-2.6.21-rc3.quilt.orig/fs/compat.c	2007-02-04 10:44:54.0 -0800
+++ linux-2.6.21-rc3.quilt/fs/compat.c	2007-03-15 15:35:58.0 -0700
@@ -46,6 +46,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -2235,3 +2236,24 @@
 	return sys_ni_syscall();
 }
 #endif
+
+asmlinkage long compat_sys_signalfd(int ufd,
+				    const compat_sigset_t __user *sigmask,
+				    compat_size_t sigsetsize)
+{
+	compat_sigset_t ss32;
+	sigset_t tmp;
+	sigset_t __user *ksigmask;
+
+	if (sigsetsize != sizeof(compat_sigset_t))
+		return -EINVAL;
+	if (copy_from_user(&ss32, sigmask, sizeof(ss32)))
+		return -EFAULT;
+	sigset_from_compat(&tmp, &ss32);
+	ksigmask = compat_alloc_user_space(sizeof(sigset_t));
+	if (copy_to_user(ksigmask, &tmp, sizeof(sigset_t)))
+		return -EFAULT;
+
+	return sys_signalfd(ufd, ksigmask, sizeof(sigset_t));
+}
+
[patch 1/13] signal/timer/event fds v6 - anonymous inode source ...
This patch add an anonymous inode source, to be used for files that need and inode only in order to create a file*. We do not care of having an inode for each file, and we do not even care of having different names in the associated dentries (dentry names will be same for classes of file*). This allow code reuse, and will be used by epoll, signalfd and timerfd (and whatever else there'll be). Signed-off-by: Davide Libenzi - Davide Index: linux-2.6.21-rc3.quilt/fs/anon_inodes.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6.21-rc3.quilt/fs/anon_inodes.c 2007-03-15 15:32:33.0 -0700 @@ -0,0 +1,203 @@ +/* + * fs/anon_inodes.c + * + * Copyright (C) 2007 Davide Libenzi + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + + + +static int ainofs_delete_dentry(struct dentry *dentry); +static struct inode *aino_getinode(void); +static struct inode *aino_mkinode(void); +static int ainofs_get_sb(struct file_system_type *fs_type, int flags, +const char *dev_name, void *data, struct vfsmount *mnt); + + + +static struct vfsmount *aino_mnt __read_mostly; +static struct inode *aino_inode; +static struct file_operations aino_fops = { }; +static struct file_system_type aino_fs_type = { + .name = "ainofs", + .get_sb = ainofs_get_sb, + .kill_sb= kill_anon_super, +}; +static struct dentry_operations ainofs_dentry_operations = { + .d_delete = ainofs_delete_dentry, +}; + + + +int aino_getfd(int *pfd, struct inode **pinode, struct file **pfile, + char const *name, const struct file_operations *fops, void *priv) +{ + struct qstr this; + struct dentry *dentry; + struct inode *inode; + struct file *file; + int error, fd; + + error = -ENFILE; + file = get_empty_filp(); + if (!file) + goto eexit_1; + + inode = aino_getinode(); + if (IS_ERR(inode)) { + error = PTR_ERR(inode); + goto eexit_2; + } + + error = get_unused_fd(); + if (error < 0) + goto eexit_3; + fd = error; + + /* +* Link the inode to a directory entry 
by creating a unique name +* using the inode sequence number. +*/ + error = -ENOMEM; + this.name = name; + this.len = strlen(name); + this.hash = 0; + dentry = d_alloc(aino_mnt->mnt_sb->s_root, &this); + if (!dentry) + goto eexit_4; + dentry->d_op = &ainofs_dentry_operations; + /* Do not publish this dentry inside the global dentry hash table */ + dentry->d_flags &= ~DCACHE_UNHASHED; + d_instantiate(dentry, inode); + + file->f_path.mnt = mntget(aino_mnt); + file->f_path.dentry = dentry; + file->f_mapping = inode->i_mapping; + + file->f_pos = 0; + file->f_flags = O_RDWR; + file->f_op = fops; + file->f_mode = FMODE_READ | FMODE_WRITE; + file->f_version = 0; + file->private_data = priv; + + fd_install(fd, file); + + *pfd = fd; + *pinode = inode; + *pfile = file; + return 0; + +eexit_4: + put_unused_fd(fd); +eexit_3: + iput(inode); +eexit_2: + put_filp(file); +eexit_1: + return error; +} + + +static int ainofs_delete_dentry(struct dentry *dentry) +{ + /* +* We faked vfs to believe the dentry was hashed when we created it. +* Now we restore the flag so that dput() will work correctly. +*/ + dentry->d_flags |= DCACHE_UNHASHED; + return 1; +} + + +static struct inode *aino_getinode(void) +{ + return igrab(aino_inode); +} + + +/* + * A single inode exist for all aino files. On the contrary of pipes, + * aino inodes has no per-instance data associated, so we can avoid + * the allocation of multiple of them. + */ +static struct inode *aino_mkinode(void) +{ + int error = -ENOMEM; + struct inode *inode = new_inode(aino_mnt->mnt_sb); + + if (!inode) + goto eexit_1; + + inode->i_fop = &aino_fops; + + /* +* Mark the inode dirty from the very beginning, +* that way it will never be moved to the dirty +* list because mark_inode_dirty() will think +* that it already _is_ on the dirty list. 
+*/ + inode->i_state = I_DIRTY; + inode->i_mode = S_IRUSR | S_IWUSR; + inode->i_uid = current->fsuid; + inode->i_gid = current->fsgid; + inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME; + return inode; + +eexit_1: + return ERR_PTR(error); +} + + +static int ainofs_get_sb(struct file_system_type *fs_type, int flags, +const char *dev_name, void *data, struct vfsmount *mnt) +{ + return get_sb_pseudo(fs_type, "a
[patch 2/13] signal/timer/event fds v6 - signalfd core ...
This patch series implements the new signalfd() system call. I took part of the original Linus code (and you know how badly it can be broken :), and I added even more breakage ;)

Signals are fetched from the same signal queue used by the process, so signalfd will compete with standard kernel delivery in dequeue_signal(). If you want to reliably fetch signals on the signalfd file, you need to block them with sigprocmask(SIG_BLOCK). This seems to be working fine on my Dual Opteron machine. I made a quick test program for it:

http://www.xmailserver.org/signafd-test.c

The signalfd() system call implements signal delivery into a file descriptor receiver. The signalfd file descriptor is created with the following API:

int signalfd(int ufd, const sigset_t *mask, size_t masksize);

The "ufd" parameter allows changing the sigmask of an existing signalfd without going through a close/create cycle (Linus' idea). Use "ufd" == -1 if you want a brand new signalfd file. The "mask" parameter specifies the set of signals that we are interested in. The "masksize" parameter is the size of "mask".

The signalfd fd supports the poll(2) and read(2) system calls. The poll(2) call will return POLLIN when signals are available to be dequeued. As a direct consequence of supporting the Linux poll subsystem, the signalfd fd can be used together with epoll(2) too.

The read(2) system call will return a "struct signalfd_siginfo" structure in the userspace supplied buffer. The return value is the number of bytes copied into the supplied buffer, or -1 in case of error. The read(2) call can also return 0, in case the sighand structure to which the signalfd was attached has been orphaned. The O_NONBLOCK flag is also supported, and read(2) will return -EAGAIN in case no signal is available.
The format of the struct signalfd_siginfo is, and the valid fields depends of the (->code & __SI_MASK) value, in the same way a struct siginfo would: struct signalfd_siginfo { __u32 signo;/* si_signo */ __s32 err; /* si_errno */ __s32 code; /* si_code */ __u32 pid; /* si_pid */ __u32 uid; /* si_uid */ __s32 fd; /* si_fd */ __u32 tid; /* si_fd */ __u32 band; /* si_band */ __u32 overrun; /* si_overrun */ __u32 trapno; /* si_trapno */ __s32 status; /* si_status */ __s32 svint;/* si_int */ __u64 svptr;/* si_ptr */ __u64 utime;/* si_utime */ __u64 stime;/* si_stime */ __u64 addr; /* si_addr */ }; Signed-off-by: Davide Libenzi - Davide Index: linux-2.6.21-rc3.quilt/fs/signalfd.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6.21-rc3.quilt/fs/signalfd.c2007-03-15 15:33:52.0 -0700 @@ -0,0 +1,381 @@ +/* + * fs/signalfd.c + * + * Copyright (C) 2003 Linus Torvalds + * + * Mon Mar 5, 2007: Davide Libenzi + * Changed ->read() to return a siginfo strcture instead of signal number. + * Fixed locking in ->poll(). + * Added sighand-detach notification. + * Added fd re-use in sys_signalfd() syscall. + * Now using anonymous inode source. + * Thanks to Oleg Nesterov for useful code review and suggestions. 
+ */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + + + +struct signalfd_ctx { + struct list_head lnk; + wait_queue_head_t wqh; + sigset_t sigmask; + struct task_struct *tsk; +}; + + + +static struct sighand_struct *signalfd_get_sighand(struct signalfd_ctx *ctx, + unsigned long *flags); +static void signalfd_put_sighand(struct signalfd_ctx *ctx, +struct sighand_struct *sighand, +unsigned long *flags); +static void signalfd_cleanup(struct signalfd_ctx *ctx); +static int signalfd_close(struct inode *inode, struct file *file); +static unsigned int signalfd_poll(struct file *file, poll_table *wait); +static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo, +siginfo_t const *kinfo); +static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count, +loff_t *ppos); + + + +static const struct file_operations signalfd_fops = { + .release= signalfd_close, + .poll = signalfd_poll, + .read = signalfd_read, +}; +static struct kmem_cache *signalfd_ctx_cachep; + + + +static struct sighand_struct *signalfd_get_sighand(struct signalfd_ctx *ctx, + unsigned long *flags) +{ + struct sighand_struct *sighand; + + rcu_read_lock(); + sighand = lock_task_sighand(ctx->tsk, flags); + rcu_read_unlock(); + + if (sighand && list_empty(&ct
[patch 6/13] signal/timer/event fds v6 - timerfd core ...
This patch introduces a new system call for timer events delivered through file descriptors. This allows timer events to be used with standard POSIX poll(2), select(2) and read(2). As a consequence of supporting the Linux f_op->poll subsystem, they can be used with epoll(2) too.

The system call is defined as:

int timerfd(int ufd, int clockid, int flags, const struct itimerspec *utmr);

The "ufd" parameter allows for re-use (re-programming) of an existing timerfd without going through the close/open cycle (same as signalfd). If "ufd" is -1, a new file descriptor will be created, otherwise the existing "ufd" will be re-programmed.

The "clockid" parameter is either CLOCK_MONOTONIC or CLOCK_REALTIME. The time specified in the "utmr->it_value" parameter is the expiry time for the timer. If the TFD_TIMER_ABSTIME flag is set in "flags", this is an absolute time, otherwise it's a relative time.

If the time specified in "utmr->it_interval" is not zero (zero meaning tv_sec == 0 and tv_nsec == 0), this is the period at which subsequent ticks should be generated. The "utmr->it_interval" should be set to zero if only one tick is requested. Setting "utmr->it_value" to zero will disable the timer, or will create a timerfd without the timer enabled.

The function returns the new (or same, in case "ufd" is a valid timerfd descriptor) file descriptor, or -1 in case of error.

As stated before, the timerfd file descriptor supports poll(2), select(2) and epoll(2). When a timer event happens on the timerfd, a POLLIN mask will be returned. The read(2) call can be used, and it will return a u32 variable holding the number of "ticks" that happened on the interface since the last call to read(2). The read(2) call supports the O_NONBLOCK flag too, and EAGAIN will be returned if no ticks happened.
A quick test program, shows timerfd working correctly on my amd64 box: http://www.xmailserver.org/timerfd-test.c Signed-off-by: Davide Libenzi - Davide Index: linux-2.6.21-rc3.quilt/fs/timerfd.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6.21-rc3.quilt/fs/timerfd.c 2007-03-15 16:08:05.0 -0700 @@ -0,0 +1,257 @@ +/* + * fs/timerfd.c + * + * Copyright (C) 2007 Davide Libenzi + * + * + * Thanks to Thomas Gleixner for code reviews and useful comments. + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + + + +struct timerfd_ctx { + struct hrtimer tmr; + ktime_t texp, tintv; + spinlock_t lock; + wait_queue_head_t wqh; + unsigned long ticks; +}; + + +static enum hrtimer_restart timerfd_tmrproc(struct hrtimer *htmr); +static void timerfd_setup(struct timerfd_ctx *ctx, int clockid, int flags, + const struct itimerspec *ktmr); +static int timerfd_close(struct inode *inode, struct file *file); +static unsigned int timerfd_poll(struct file *file, poll_table *wait); +static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count, + loff_t *ppos); + + + +static const struct file_operations timerfd_fops = { + .release= timerfd_close, + .poll = timerfd_poll, + .read = timerfd_read, +}; +static struct kmem_cache *timerfd_ctx_cachep; + + + +static enum hrtimer_restart timerfd_tmrproc(struct hrtimer *htmr) +{ + struct timerfd_ctx *ctx = container_of(htmr, struct timerfd_ctx, tmr); + enum hrtimer_restart rval = HRTIMER_NORESTART; + unsigned long flags; + + spin_lock_irqsave(&ctx->lock, flags); + ctx->ticks++; + wake_up_locked(&ctx->wqh); + if (ctx->tintv.tv64 != 0) { + hrtimer_forward(htmr, hrtimer_cb_get_time(htmr), ctx->tintv); + rval = HRTIMER_RESTART; + } + spin_unlock_irqrestore(&ctx->lock, flags); + + return rval; +} + + +static void timerfd_setup(struct timerfd_ctx *ctx, int clockid, int flags, + const struct itimerspec 
*ktmr) +{ + enum hrtimer_mode htmode; + + htmode = (flags & TFD_TIMER_ABSTIME) ? HRTIMER_MODE_ABS: HRTIMER_MODE_REL; + + ctx->ticks = 0; + ctx->texp = timespec_to_ktime(ktmr->it_value); + ctx->tintv = timespec_to_ktime(ktmr->it_interval); + hrtimer_init(&ctx->tmr, clockid, htmode); + ctx->tmr.expires = ctx->texp; + ctx->tmr.function = timerfd_tmrproc; + if (ctx->texp.tv64 != 0) + hrtimer_start(&ctx->tmr, ctx->texp, htmode); +} + + +asmlinkage long sys_timerfd(int ufd, int clockid, int flags, + const struct itimerspec __user *utmr) +{ + int error; + struct timerfd_ctx *ctx; + struct file *file; + struct inode *inode; + struct itimerspec ktmr; + + if (copy_from_user(&ktmr, utmr, sizeof(ktmr))) + return -EFAU
[patch 4/13] signal/timer/event fds v6 - signalfd wire up x86_64 arch ...
This patch wires the signalfd system call to the x86_64 architecture.

Signed-off-by: Davide Libenzi

- Davide

Index: linux-2.6.21-rc3.quilt/include/asm-x86_64/unistd.h
===
--- linux-2.6.21-rc3.quilt.orig/include/asm-x86_64/unistd.h	2007-02-04 10:44:54.0 -0800
+++ linux-2.6.21-rc3.quilt/include/asm-x86_64/unistd.h	2007-03-15 15:34:29.0 -0700
@@ -619,8 +619,10 @@
 __SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages		279
 __SYSCALL(__NR_move_pages, sys_move_pages)
+#define __NR_signalfd		280
+__SYSCALL(__NR_signalfd, sys_signalfd)
 
-#define __NR_syscall_max __NR_move_pages
+#define __NR_syscall_max __NR_signalfd
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR
Index: linux-2.6.21-rc3.quilt/arch/x86_64/ia32/ia32entry.S
===
--- linux-2.6.21-rc3.quilt.orig/arch/x86_64/ia32/ia32entry.S	2007-03-15 15:19:20.0 -0700
+++ linux-2.6.21-rc3.quilt/arch/x86_64/ia32/ia32entry.S	2007-03-15 15:35:35.0 -0700
@@ -714,9 +714,10 @@
 	.quad compat_sys_get_robust_list
 	.quad sys_splice
 	.quad sys_sync_file_range
-	.quad sys_tee
+	.quad sys_tee			/* 315 */
 	.quad compat_sys_vmsplice
 	.quad compat_sys_move_pages
 	.quad sys_getcpu
 	.quad sys_epoll_pwait
+	.quad sys_signalfd		/* 320 */
 ia32_syscall_end:
Re: [PATCH 1/1] Allow i386 crash kernels to handle x86_64 dumps
On Thu, Mar 15, 2007 at 01:42:39PM +, Ian Campbell wrote:
> On Thu, 2007-03-15 at 18:56 +0530, Vivek Goyal wrote:
> >
> > Ideal place for this probably should have been arch dependent
> > crash_dump.h file. But we don't have one and no point introducing one
> > just for this macro.
>
> Agreed.
>
> > This change looks good to me.
>
> Is there a kdump tree which you'll apply to or shall I resend CCing
> apkm? (I'll add an Acked-by if that's ok).

There isn't a kexec tree at this time (though I am happy to entertain
creating one). For now most patches go in either through Andrew or the
relevant architecture maintainers.

--
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/
Re: [PATCH 1/1] Allow i386 crash kernels to handle x86_64 dumps
On Thu, Mar 15, 2007 at 06:56:16PM +0530, Vivek Goyal wrote: > On Thu, Mar 15, 2007 at 12:22:57PM +, Ian Campbell wrote: > > On Thu, 2007-03-15 at 11:17 +0530, Vivek Goyal wrote: > > > > > But I think changing this macro might run into issues. It is > > > > > being used at few places in kernel, for example while loading > > > > > module. This will essentially mean that we allow loading 64bit > > > > > x86_64 modules on 32bit i386 systems? > > > > Yes, not sure how I missed that fact... > > > > > Kexec will also not allow loading an x86_64 kernel on a 32bit machine. > > > > For crash kernel only or for regular kexec too? > > > > I think for both. One of the possible reasons I think is that one never > knows is underlying machine has got 64bit extensions or not. So even if > we load the kernel it will never boot. Secondly, we might not be able to > handle 64bit address in 32bit kernel/user space?

Perhaps I am misunderstanding what you are saying, but I do recall kexecing from 32->64 and 64->32 bit kernels on x86_64 hardware. I can run these checks again if it helps.

> > > So how about something like vmcore_elf_allowed_cross_arch()? Vmcore > > > code can continue to check elf_check_arch() and if that fails it can > > > invoke vmcore_elf_allowed_cross_arch() to find out what cross arch are > > > allowed for vmcore. > > > > Something like this? > > > > Ian. > > > > --- > > > > Allow i386 crash kernels to handle x86_64 dumps. > > > > The specific case I am encountering is kdump under Xen with a 64 bit > > hypervisor and 32 bit kernel/userspace. The dump created is a 64 bit > > due to the hypervisor but the dump kernel is 32 bit in for maximum > > compatibility. > > > > It's possibly less likely to be useful in a purely native scenario but > > I see no reason to disallow it. 
> > > > Signed-off-by: Ian Campbell <[EMAIL PROTECTED]> > > > > diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c > > index d960507..523e109 100644 > > --- a/fs/proc/vmcore.c > > +++ b/fs/proc/vmcore.c > > @@ -514,7 +514,7 @@ static int __init parse_crash_elf64_headers(void) > > /* Do some basic Verification. */ > > if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 || > > (ehdr.e_type != ET_CORE) || > > - !elf_check_arch(&ehdr) || > > + !vmcore_elf_check_arch(&ehdr) || > > ehdr.e_ident[EI_CLASS] != ELFCLASS64 || > > ehdr.e_ident[EI_VERSION] != EV_CURRENT || > > ehdr.e_version != EV_CURRENT || > > diff --git a/include/asm-i386/kexec.h b/include/asm-i386/kexec.h > > index 4dfc9f5..c76737e 100644 > > --- a/include/asm-i386/kexec.h > > +++ b/include/asm-i386/kexec.h > > @@ -47,6 +47,9 @@ > > /* The native architecture */ > > #define KEXEC_ARCH KEXEC_ARCH_386 > > > > +/* We can also handle crash dumps from 64 bit kernel. */ > > +#define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64) > > + > > Ideal place for this probably should have been arch dependent crash_dump.h > file. But we don't have one and no point introducing one just for this > macro. > > This change looks good to me.

Won't the above change break non-i386 architectures, as vmcore_elf_check_arch_cross isn't defined for them?

--
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/
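One way the concern about other architectures could be handled is a generic fallback definition, sketched below as a standalone program fragment. This is an assumption about a possible shape for the fix, not something taken from the quoted patch: struct elf_hdr_stub and the elf_check_arch() stand-in are hypothetical, and only the names vmcore_elf_check_arch() and vmcore_elf_check_arch_cross() come from the thread.

```c
#include <stdint.h>

/* ELF machine numbers, as defined by the ELF specification. */
#define EM_386		3
#define EM_X86_64	62

/* Hypothetical stand-in for an ELF header: only the field used here. */
struct elf_hdr_stub {
	uint16_t e_machine;
};

/* Stand-in for the native check on a 32-bit x86 kernel. */
#define elf_check_arch(x)	((x)->e_machine == EM_386)

/*
 * An architecture that can read foreign dumps (i386 in the quoted patch)
 * would define vmcore_elf_check_arch_cross() before this point. This
 * sketch leaves it undefined on purpose, to exercise the fallback that
 * every other architecture would get.
 */
#ifndef vmcore_elf_check_arch_cross
#define vmcore_elf_check_arch_cross(x)	0
#endif

/* Accept the native architecture, or whatever the arch opted in to. */
#define vmcore_elf_check_arch(x) \
	(elf_check_arch(x) || vmcore_elf_check_arch_cross(x))
```

With the fallback in place, an architecture that never mentions the cross macro still builds and simply keeps its existing native-only behaviour.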
Re: [PATCH 2/3] swsusp: Do not use page flags
On Thursday, 15 March 2007 23:23, Andrew Morton wrote: > On Thu, 15 Mar 2007 23:19:02 +0100 (CET) > Jiri Kosina <[EMAIL PROTECTED]> wrote: > > > On Thu, 15 Mar 2007, Andrew Morton wrote: > > > > > > > And why _does_ suspend use GFP_ATOMIC all over the place? > > > > Generally, because it cannot sleep. > > > Why not? > > > > I guess it's simply because of kswapd being already frozen, so there is no > > chance that once GFP_KERNEL allocation goes to sleep, it is going to get > > any free pages eventually ... ? > > No, things should run fine with a dead kswapd. > > There are reasons why we can't call into filesystems from there, but > GFP_NOIO will ensure that and it is heaps better than GFP_ATOMIC. In fact the role of swsusp_shrink_memory() is to ensure that our subsequent atomic allocations won't fail. Still, the particular allocations in create_basic_memory_bitmaps() are made before we call swsusp_shrink_memory(), so it's better to use GFP_NOIO in there. I'll prepare a patch for that on top of the current series.
Re: thread stacks and strict vm overcommit accounting
On Thu, Mar 15, 2007 at 03:36:13PM -0700, Andrew Morton wrote: > > > > > > Is this the intended behaviour? > > > > > > > > That sounds like a bug to me. > > > > > > I'm suspecting it's an oddity rather than a bug. > > > > It is intended behaviour. > > Each instance of > > main() > { > sleep(100); > } > > appears to increase Committed_AS by around 200kb. But we've committed to > providing it with 8MB for stack. > > How come this is correct? Perhaps it makes a lot of sense if you regard stack growth in the same sense that you regard heap growth by means of brk(). Just because the stack is limited by default and RLIMIT_DATA is unlimited doesn't mean that we need to account for the maximum stack size. Perhaps for embedded systems where you want to have overcommit_memory=2 overcommit_ratio=100 and no swap (for design constraints), just to make sure that allocations fail *always before* OOM gets triggered (and therefore OOM never gets triggered, thankfully), it would have been useful to look at Committed_AS to realize how close the system is to its maximum memory utilization potential. Learning about this 'oddity' in Committed_AS, I'd guess it would be better for me not to rely on it for measurements and perhaps tweak smaller values of RSS_STACK for processes on that embedded system. -- Dan Aloni XIV LTD, http://www.xivstorage.com da-x (at) monatomic.org, dan (at) xiv.co.il
Re: [PATCH] [REPOST] x86_64, i386: Add command line length to boot protocol
Alon Bar-Lev wrote: Hello, I really don't understand why you insist that the boot protocol =2.02 had 255 limit! Please remove this from the description. You want to add size, that's OK, but please don't mess with previous definitions. Boot protocol 2.02 introduced the null terminated string truncated by kernel, which can be at any size. Well, except for a very brief window, the limit *was* 255. If the boot loader wants to verify nontruncation, this is a valid concern. -hpa
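To make the nontruncation concern concrete, here is a minimal sketch of the check a boot loader might perform. It assumes a hypothetical `cmdline_size` value that the boot loader has read from the kernel header, assumes that value counts characters without the terminating null, and treats 0 as "field not present", in which case it falls back to the 255 limit mentioned above. The function name and the fallback convention are illustrative and are not taken from the patch.

```c
#include <assert.h>
#include <string.h>

/* Limit that applied before an explicit size was advertised (per this thread). */
#define LEGACY_CMDLINE_LIMIT 255

/*
 * Returns 1 if the kernel is expected to take `cmdline` without truncating
 * it, 0 otherwise. `cmdline_size` is the (hypothetical) length limit read
 * from the kernel header, not counting the terminating null; 0 means the
 * kernel did not advertise one.
 */
static int cmdline_fits(const char *cmdline, unsigned int cmdline_size)
{
	size_t limit;

	assert(cmdline != NULL);
	limit = cmdline_size ? cmdline_size : LEGACY_CMDLINE_LIMIT;
	return strlen(cmdline) <= limit;
}
```

A boot loader could call `cmdline_fits()` before handing over control and warn the user, or refuse to boot, when it returns 0.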
Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
On Thu, Mar 15, 2007 at 06:15:45PM -0500, Dave Kleikamp wrote: > On Thu, 2007-03-15 at 23:59 +0100, Andrea Arcangeli wrote: > > On Thu, Mar 15, 2007 at 05:44:01PM +, Hugh Dickins wrote: > > > who removed the !offset condition, he should be consulted on its > > > reintroduction. > > > > the !offset check looks a pretty broken heuristic indeed, it would > > break random I/O. > > I wouldn't call it broken. At worst, I'd say it's imperfect. But > that's the nature of a heuristic. It most likely works in a huge > majority of cases. well, IMHO in the huge majority of cases the prev_page check isn't necessary in the first place (and IMHO it hurts a lot more than it can help, as demonstrated by specweb, since we'll bite on the good guys to help the bad guys). The only case where I can imagine the prev_page to make sense is to handle contiguous I/O made with a small buffer, so clearly an inefficient code in the first place. But if this guy is reading with
Re: [PATCH] i386: Simplify smp_call_function*() by using common implementation
Andrew Morton wrote: > Hopeless, sorry. It's probably time to start thinking about raising x86 > patches against the x86 tree (at least). > How's this? J Subject: Simplify smp_call_function*() by using common implementation smp_call_function and smp_call_function_single are almost complete duplicates of the same logic. This patch combines them by implementing them in terms of the more general smp_call_function_mask(). [ Jan, Andi: This only changes arch/i386; can x86_64 be changed in the same way? ] [ Rebased onto Jan's x86_64-mm-consolidate-smp_send_stop patch ] Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]> Cc: Jan Beulich <[EMAIL PROTECTED]> Cc: Stephane Eranian <[EMAIL PROTECTED]> Cc: Andrew Morton <[EMAIL PROTECTED]> Cc: Andi Kleen <[EMAIL PROTECTED]> Cc: "Randy.Dunlap" <[EMAIL PROTECTED]> Cc: Ingo Molnar <[EMAIL PROTECTED]> --- arch/i386/kernel/smp.c | 177 +++- 1 file changed, 86 insertions(+), 91 deletions(-) === --- a/arch/i386/kernel/smp.c +++ b/arch/i386/kernel/smp.c @@ -515,14 +515,26 @@ void unlock_ipi_call_lock(void) static struct call_data_struct *call_data; -static void __smp_call_function(void (*func) (void *info), void *info, - int nonatomic, int wait) + +static int __smp_call_function_mask(cpumask_t mask, + void (*func)(void *), void *info, + int wait) { struct call_data_struct data; - int cpus = num_online_cpus() - 1; + cpumask_t allbutself; + int cpus; + + /* Can deadlock when called with interrupts disabled */ + WARN_ON(irqs_disabled()); + + allbutself = cpu_online_map; + cpu_clear(smp_processor_id(), allbutself); + + cpus_and(mask, mask, allbutself); + cpus = cpus_weight(mask); if (!cpus) - return; + return 0; data.func = func; data.info = info; @@ -533,9 +545,12 @@ static void __smp_call_function(void (*f call_data = &data; mb(); - - /* Send a message to all other CPUs and wait for them to respond */ - send_IPI_allbutself(CALL_FUNCTION_VECTOR); + + /* Send a message to other CPUs */ + if (cpus_equal(mask, allbutself)) + 
send_IPI_allbutself(CALL_FUNCTION_VECTOR); + else + send_IPI_mask(mask, CALL_FUNCTION_VECTOR); /* Wait for response */ while (atomic_read(&data.started) != cpus) @@ -544,6 +559,34 @@ static void __smp_call_function(void (*f if (wait) while (atomic_read(&data.finished) != cpus) cpu_relax(); + + return 0; +} + +/** + * smp_call_function_mask(): Run a function on a set of other CPUs. + * @mask: The set of cpus to run on. Must not include the current cpu. + * @func: The function to run. This must be fast and non-blocking. + * @info: An arbitrary pointer to pass to the function. + * @wait: If true, wait (atomically) until function has completed on other CPUs. + * + * Returns 0 on success, else a negative status code. Does not return until + * remote CPUs are nearly ready to execute <> or are or have finished. + * + * You must not call this function with disabled interrupts or from a + * hardware interrupt handler or from a bottom half handler. + */ +int smp_call_function_mask(cpumask_t mask, +void (*func)(void *), void *info, +int wait) +{ + int ret; + + spin_lock(&call_lock); + ret = __smp_call_function_mask(mask, func, info, wait); + spin_unlock(&call_lock); + + return ret; } /** @@ -559,20 +602,43 @@ static void __smp_call_function(void (*f * You must not call this function with disabled interrupts or from a * hardware interrupt handler or from a bottom half handler. */ -int smp_call_function (void (*func) (void *info), void *info, int nonatomic, - int wait) -{ - /* Can deadlock when called with interrupts disabled */ - WARN_ON(irqs_disabled()); - - /* Holding any lock stops cpus from going down. 
*/ - spin_lock(&call_lock); - __smp_call_function(func, info, nonatomic, wait); - spin_unlock(&call_lock); - - return 0; +int smp_call_function(void (*func) (void *info), void *info, int nonatomic, + int wait) +{ + return smp_call_function_mask(cpu_online_map, func, info, wait); } EXPORT_SYMBOL(smp_call_function); + +/* + * smp_call_function_single - Run a function on another CPU + * @func: The function to run. This must be fast and non-blocking. + * @info: An arbitrary pointer to pass to the function. + * @nonatomic: Currently unused. + * @wait: If true, wait until function has completed on other CPUs. + * + * Retrurns 0 on success, else a negative status code. + * + * Does not retur
Re: [PATCH 10/22 take 3] UBI: EBA unit
On Thu, Mar 15, 2007 at 02:24:10PM -0700, Randy Dunlap wrote: > On Thu, 15 Mar 2007 11:07:03 -0800 Andrew Morton wrote: > > > > > There's way too much code here to expect it to get decently reviewed, alas. > > Yes. > > /me repeats wish that Not Everything Should Be Sent to lkml. :( Just curious, but where would you suggest this be sent to for review then? josh
Re: [PATCH] fix cyclades.h for x86_64 (and probably others)
On Thu, Mar 15, 2007 at 11:07:08AM -0800, Andrew Morton wrote: > Looks OK, thanks. > > It would be nice as a followup patch to simply remove ucchar, uclong and > all that gunk altogether from that driver and just use u8, u16 etc. > > But if you decide to do that, please fix your email client first - it is > replacing tabs with spaces. Something like this? Applies & compiles Ok on 2.6.20. I don't have access to the hardware right now, but am pretty sure that the result is the same. BTW, it was a copy-paste which made the spaces ;) Regards, Klaus --- include/linux/cyclades.h.orig 2007-03-15 23:46:00.0 +0100 +++ include/linux/cyclades.h2007-03-15 23:14:26.0 +0100 @@ -67,6 +67,8 @@ #ifndef _LINUX_CYCLADES_H #define _LINUX_CYCLADES_H +#include + struct cyclades_monitor { unsigned long int_count; unsigned long char_count; @@ -149,15 +151,6 @@ * architectures and compilers. */ -#if defined(__alpha__) -typedef unsigned long ucdouble; /* 64 bits, unsigned */ -typedef unsigned int uclong; /* 32 bits, unsigned */ -#else -typedef unsigned long uclong; /* 32 bits, unsigned */ -#endif -typedef unsigned short ucshort;/* 16 bits, unsigned */ -typedef unsigned char ucchar; /* 8 bits, unsigned */ - /* * Memory Window Sizes */ @@ -174,24 +167,24 @@ */ struct CUSTOM_REG { - uclong fpga_id;/* FPGA Identification Register */ - uclong fpga_version; /* FPGA Version Number Register */ - uclong cpu_start; /* CPU start Register (write) */ - uclong cpu_stop; /* CPU stop Register (write) */ - uclong misc_reg; /* Miscelaneous Register */ - uclong idt_mode; /* IDT mode Register */ - uclong uart_irq_status;/* UART IRQ status Register */ - uclong clear_timer0_irq; /* Clear timer interrupt Register */ - uclong clear_timer1_irq; /* Clear timer interrupt Register */ - uclong clear_timer2_irq; /* Clear timer interrupt Register */ - uclong test_register; /* Test Register */ - uclong test_count; /* Test Count Register */ - uclong timer_select; /* Timer select register */ - uclong pr_uart_irq_status; /* 
Prioritized UART IRQ stat Reg */ - uclong ram_wait_state; /* RAM wait-state Register */ - uclong uart_wait_state;/* UART wait-state Register */ - uclong timer_wait_state; /* timer wait-state Register */ - uclong ack_wait_state; /* ACK wait State Register */ + __u32 fpga_id;/* FPGA Identification Register */ + __u32 fpga_version; /* FPGA Version Number Register */ + __u32 cpu_start; /* CPU start Register (write) */ + __u32 cpu_stop; /* CPU stop Register (write) */ + __u32 misc_reg; /* Miscelaneous Register */ + __u32 idt_mode; /* IDT mode Register */ + __u32 uart_irq_status;/* UART IRQ status Register */ + __u32 clear_timer0_irq; /* Clear timer interrupt Register */ + __u32 clear_timer1_irq; /* Clear timer interrupt Register */ + __u32 clear_timer2_irq; /* Clear timer interrupt Register */ + __u32 test_register; /* Test Register */ + __u32 test_count; /* Test Count Register */ + __u32 timer_select; /* Timer select register */ + __u32 pr_uart_irq_status; /* Prioritized UART IRQ stat Reg */ + __u32 ram_wait_state; /* RAM wait-state Register */ + __u32 uart_wait_state;/* UART wait-state Register */ + __u32 timer_wait_state; /* timer wait-state Register */ + __u32 ack_wait_state; /* ACK wait State Register */ }; /* @@ -201,34 +194,34 @@ */ struct RUNTIME_9060 { - uclong loc_addr_range; /* 00h - Local Address Range */ - uclong loc_addr_base; /* 04h - Local Address Base */ - uclong loc_arbitr; /* 08h - Local Arbitration */ - uclong endian_descr; /* 0Ch - Big/Little Endian Descriptor */ - uclong loc_rom_range; /* 10h - Local ROM Range */ - uclong loc_rom_base; /* 14h - Local ROM Base */ - uclong loc_bus_descr; /* 18h - Local Bus descriptor */ - uclong loc_range_mst; /* 1Ch - Local Range for Master to PCI */ - uclong loc_base_mst; /* 20h - Local Base for Master PCI */ - uclong loc_range_io; /* 24h - Local Range for Master IO */ - uclong pci_base_mst; /* 28h - PCI Base for Master PCI */ - uclong pci_conf_io;/* 2Ch - PCI configuration for Master IO */ - uclong filler1;/* 30h 
*/ - uclong filler2;/* 34h */ - uclong filler3;/* 38h */ - uclong filler4;
Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
On Thu, 2007-03-15 at 23:59 +0100, Andrea Arcangeli wrote: > On Thu, Mar 15, 2007 at 05:44:01PM +, Hugh Dickins wrote: > > who removed the !offset condition, he should be consulted on its > > reintroduction. > > the !offset check looks a pretty broken heuristic indeed, it would > break random I/O. I wouldn't call it broken. At worst, I'd say it's imperfect. But that's the nature of a heuristic. It most likely works in a huge majority of cases. > The real fix is to add a ra.prev_offset along with > ra.prev_page, and if who implements it wants to be stylish he can as > well use a ra.last_contiguous_read structure that has a page and > offset fields (and then of course remove ra.prev_page). I suggested something along these lines, but I wonder if it's overkill. The !offset check is simple and appears to be a decent improvement over the current code. -- David Kleikamp IBM Linux Technology Center
Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
On Thu, Mar 15, 2007 at 03:06:01PM -0700, Andrew Morton wrote: > On Thu, 15 Mar 2007 22:49:23 +0100 > Andrea Arcangeli <[EMAIL PROTECTED]> wrote: > > > On Thu, Mar 15, 2007 at 11:07:35AM -0800, Andrew Morton wrote: > > > > On Thu, 15 Mar 2007 01:22:45 -0400 (EDT) Ashif Harji <[EMAIL > > > > PROTECTED]> wrote: > > > > I still think the simple fix of removing the > > > > condition is the best approach, but I'm certainly open to alternatives. > > > > > > Yes, the problem of falsely activating pages when the file is read in > > > small > > > hunks is worse than the problem which your patch fixes. > > > > Really? I would have expected all performance sensitive apps to read > > in >=PAGE_SIZE chunks. And if they don't because they split their > > dataset in blocks (like some database), it may not be so wrong to > > activate those pages that have two "hot" blocks more aggressively than > > those pages with a single hot block. > > But the problem which is being fixed here is really obscure: an application > repeatedly reading the first page and only the first page of a file, always > via the same fd. > > I'd expect that the sub-page-size read scenario happens heaps more often > than that, especially when dealing with larger PAGE_SIZEs. Whatever that app is doing, clearly we have to keep those 4k in cache! Like obviously the specweb demonstrated that as long as you are _repeating_ the same read, it's correct to activate the page even if it was reading from the same page as before. What is wrong is to activate the page more aggressively if it's _different_ parts of the page that are being read in a contiguous way. I thought that the whole point of the ra.prev_page was to detect _contiguous_ (not random) I/O made with a small buffer, anything else doesn't make much sense to me. 
In short I think taking a ra.prev_offset into account as suggested by Dave Kleikamp is the best, it may actually benefit the obscure app too ;)
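The distinction drawn here (a repeated read should activate the page, a contiguous small-buffer read walking through the same page should not) can be sketched in userspace as follows. This is only an illustration of the heuristic as described in this thread, not code from any patch: `struct ra_sketch`, its `valid` field and `should_mark_accessed()` are hypothetical names, and each call is assumed to describe one chunk of a read that lies within a single page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the readahead state discussed in this thread. */
struct ra_sketch {
	unsigned long prev_page;   /* index of the page touched by the last read */
	unsigned long prev_offset; /* offset in that page where the last read ended */
	int valid;                 /* 0 until the first read has been recorded */
};

/*
 * Returns 1 if this read should count as a fresh reference to the page
 * (the point where the caller would invoke mark_page_accessed()), and 0
 * if it merely continues the previous read inside the same page.
 */
static int should_mark_accessed(struct ra_sketch *ra, unsigned long page,
				unsigned long offset, unsigned long len)
{
	int contiguous;

	assert(ra != NULL);
	contiguous = ra->valid &&
		     page == ra->prev_page &&
		     offset == ra->prev_offset;

	/* Remember where this read ended for the next comparison. */
	ra->prev_page = page;
	ra->prev_offset = offset + len;
	ra->valid = 1;

	return !contiguous;
}
```

Under this sketch, an application that rereads the first page from offset 0 is counted every time, because its offset never matches the end of the previous read, while a reader stepping through one page in 512-byte chunks is counted once for that page.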
Re: [QUICKLIST 0/4] Arch independent quicklists V2
On Tue, Mar 13, 2007 at 06:12:44PM -0700, William Lee Irwin III wrote: > There are furthermore distinctions to make between fork() and execve(). > fork() stomps over the entire process address space copying pagetables > en masse. After execve() a process incrementally faults in PTE's one at > a time. It should be clear that if case analyses are of interest at > all, fork() will want cache-hot pages (cache-preloaded pages?) where > such are largely wasted on incremental faults after execve(). The copy > operations in fork() should probably also be examined in the context of > shared pagetables at some point. To make this perfectly clear, we can deal with the varying usage cases with hot/cold flags to the pagetable allocator functions. Where bulk copies such as fork() are happening, it makes perfect sense to precharge the cache by eager zeroing. Where sparse single pte affairs such as incrementally faulting things in after execve() are involved, cache cold preconstructed pagetable pages are ideal. Address hints could furthermore be used to precharge single cachelines (e.g. via prefetch) in the sparse usage case. -- wli
Re: thread stacks and strict vm overcommit accounting
On Thu, 15 Mar 2007, Andrew Morton wrote: > On Thu, 15 Mar 2007 23:33:43 + > Alan Cox <[EMAIL PROTECTED]> wrote: > > > > Stack RSS should certainly be included in Committed_AS, > > > but RLIMIT_STACK merely limits how big the stack vma may grow to: > > > at any moment the stack vma is probably very much smaller, > > > and only its current size is accounted in Committed_AS. > > > > With a typical size as a fuzz factor preaccounted in later kernels. > > Where's that done? I don't know what Alan is referring to there. > > > > > > Is this the intended behaviour? > > > > > > > > That sounds like a bug to me. > > > > > > I'm suspecting it's an oddity rather than a bug. > > > > It is intended behaviour. Intended in the way the different stacks are implemented, but odd enough for us to wonder at the difference. > > Each instance of > > main() > { > sleep(100); > } > > appears to increase Committed_AS by around 200kb. But we've committed to > providing it with 8MB for stack. > > How come this is correct? We've no more committed to providing each instance with 8MB of stack, than we've committed to providing each instance with RLIMIT_AS of address space. The rlimits are limits, not commitments, surely? Hugh
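The gap between the limit and what is actually accounted can be observed from userspace. The following is a small sketch, assuming a Linux system with /proc/meminfo; the helper names `parse_committed_as_kb()` and `report()` are made up for this example. It prints the soft RLIMIT_STACK of the calling process next to the system-wide Committed_AS figure, so the two can be compared while starting and stopping instances of the sleep(100) program quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/*
 * Extract the "Committed_AS:" value (in kB) from /proc/meminfo-style text.
 * Returns -1 if the field is not present or cannot be parsed.
 */
static long parse_committed_as_kb(const char *meminfo)
{
	const char *key = "Committed_AS:";
	const char *p;
	long kb;

	assert(meminfo != NULL);
	p = strstr(meminfo, key);
	if (!p || sscanf(p + strlen(key), "%ld", &kb) != 1)
		return -1;
	return kb;
}

/* Print the stack limit (a ceiling) next to the commit figure (an account). */
static void report(void)
{
	char buf[8192];
	struct rlimit rl;
	size_t n;
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	buf[n] = '\0';
	fclose(f);

	if (getrlimit(RLIMIT_STACK, &rl) == 0)
		printf("RLIMIT_STACK (soft): %llu bytes\n",
		       (unsigned long long)rl.rlim_cur);
	printf("Committed_AS: %ld kB\n", parse_committed_as_kb(buf));
}
```

Calling `report()` before and after launching a process shows Committed_AS moving by roughly the size of what the process has mapped, not by the stack limit.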
Re: [patch 6/13] signalfd/timerfd/asyncfd v5 - timerfd core ...
On Thu, 15 Mar 2007, Thomas Gleixner wrote: > Davide, > > On Wed, 2007-03-14 at 15:19 -0700, Davide Libenzi wrote: > > > +static int timerfd_tmrproc(struct hrtimer *htmr) > > +{ > > + struct timerfd_ctx *ctx = container_of(htmr, struct timerfd_ctx, tmr); > > + int rval = HRTIMER_NORESTART; > > + unsigned long flags; > > + > > + spin_lock_irqsave(&ctx->lock, flags); > > + ctx->ticks++; > > + wake_up_locked(&ctx->wqh); > > + if (ctx->tintv.tv64 != 0) { > > + hrtimer_forward(htmr, htmr->base->softirq_time, ctx->tintv); > > Sorry, I missed that in the first reviews. Please use > hrtimer_cb_get_time(htmr) instead of htmr->base->softirq_time, so this > is high res timer safe. Heh, I was actually looking for a function instead of peeking over the timer structure, but 2.6.20 did not have one. Rebased over 2.6.21-rc3 now, so I can use it. > > + rval = HRTIMER_RESTART; > > + } > > + spin_unlock_irqrestore(&ctx->lock, flags); > > + > > + return rval; > > +} > > + > > + > > +static int timerfd_setup(struct timerfd_ctx *ctx, int clockid, int flags, > > +const struct itimerspec *ktmr) > > +{ > > Make this void, returns 0 anyway Ack > > + enum hrtimer_mode htmode; > > + > > + htmode = (flags & TFD_TIMER_ABSTIME) ? HRTIMER_ABS: HRTIMER_REL; > > + > > + ctx->ticks = 0; > > + ctx->clockid = clockid; > > + ctx->flags = flags; > > + ctx->texp = timespec_to_ktime(ktmr->it_value); > > clockid is stored in the timer on setup, so no need to store it again. > expiry time and flags are not used after setup. > > Please remove those fields. Ack > > + if (ufd == -1) { > > + ctx = kmem_cache_alloc(timerfd_ctx_cachep, GFP_KERNEL); > > + if (!ctx) > > + return -ENOMEM; > > + > > + init_waitqueue_head(&ctx->wqh); > > + spin_lock_init(&ctx->lock); > > + ctx->clockid = -1; > > + > > + error = timerfd_setup(ctx, clockid, flags, &ktmr); > > + if (error) > > + goto err_ctxfree; > > Timer setup can not fail Ack, the new version can't. 
> > + /* > > +* When we call this, the initialization must be complete, since > > +* aino_getfd() will install the fd. > > +*/ > > + error = aino_getfd(&ufd, &inode, &file, "[timerfd]", > > + &timerfd_fops, ctx); > > + if (error) > > + goto err_ctxfree; > > Again: Please turn this around. No need to start the timer before we > know, that everything works. The timerfd_setup() is not locked, so we need to make sure everything is setup, before advertising the fd (and aino_getfd does that). > > + kmem_cache_free(timerfd_ctx_cachep, ctx); > > +} > > + > > + > > +static int timerfd_close(struct inode *inode, struct file *file) > > +{ > > + timerfd_cleanup(file->private_data); > > + return 0; > > +} > > + > > Please move the timerfd_cleanup code into close(). I usually prefer to have a cleanup function that works on the file's data, but I moved the code in the release function now. Thx for the review! I'll repost a new version based on 2.6.21-rc3 ... - Davide
Re: [PATCH 2/3] swsusp: Do not use page flags
Hi! > > > > On Mon, 12 Mar 2007 22:19:20 +0100 "Rafael J. Wysocki" <[EMAIL > > > > PROTECTED]> wrote: > > > > +int create_basic_memory_bitmaps(void) > > > > +{ > > > > + struct memory_bitmap *bm1, *bm2; > > > > + int error = 0; > > > > + > > > > + BUG_ON(forbidden_pages_map || free_pages_map); > > > > + > > > > + bm1 = kzalloc(sizeof(struct memory_bitmap), GFP_ATOMIC); > > > > + if (!bm1) > > > > + return -ENOMEM; > > > > + > > > > + error = memory_bm_create(bm1, GFP_ATOMIC | __GFP_COLD, PG_ANY); > > > > + if (error) > > > > + goto Free_first_object; > > > > + > > > > + bm2 = kzalloc(sizeof(struct memory_bitmap), GFP_ATOMIC); > > > > + if (!bm2) > > > > + goto Free_first_bitmap; > > > > + > > > > + error = memory_bm_create(bm2, GFP_ATOMIC | __GFP_COLD, PG_ANY); > > > > + if (error) > > > > > > What is the risk that we'll go OOM here? GFP_ATOMIC is rather unreliable. > > > > Well, this can be called after processes (including kswapd) have been frozen. > > We can't go to sleep at this point. > > So it _is_ unreliable? We are careful to leave some memory aside for suspend... We actually free memory at the beginning of suspend, and there's some simple "add few percent for our overhead" there. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [PATCH 2/3] FUTEX : introduce private hashtables
On Fri, Mar 16, 2007 at 07:25:53AM +1100, Nick Piggin wrote: > I would just avoid the complexity and setup/teardown costs, and just > use a vmalloc'ed global hash for NUMA. This patch is not the way to go, but neither are vmalloc()'d global hashtables. When you just happen to hash to the wrong node, you're in for quasi-unreproducible poor performance. The size is never right, at which point RCU resizing is required with all its overhead and memory freeing delays and failure to resize (even if only to contract) under pressure. Better would be to use a different data structure admitting locality of reference and adaptively sizing itself, furthermore localized to the appropriate sharing domain. For file-backed futexes, this would be the struct address_space. For anonymous-backed futexes, this would be the COW sharing group, which an anon_vma could almost be used to represent. Using an object to properly represent the COW sharing group (i.e. Hugh's struct anon) would do the trick, and one might as well move the rmap code over to it while we're at it since the anon_vma scanning tricks are all pointless overhead once the COW sharing group is accurately tracked (the scanning around for nearby vmas with ->anon_vma set is not great anyway, though the overhead is hidden in the noise of large teardown and setup operations; inheriting on fork() is much simpler and faster). In such a manner localization is accomplished while no interface extensions are required. -- wli