Re: [kvm-devel] [PATCH 3/3] KVM paravirt-ops implementation
On Wed, 2007-08-29 at 04:31 +1000, Rusty Russell wrote: > On Mon, 2007-08-27 at 10:16 -0500, Anthony Liguori wrote: > > @@ -569,6 +570,7 @@ asmlinkage void __init start_kernel(void) > > } > > sort_main_extable(); > > trap_init(); > > + kvm_guest_init(); > > rcu_init(); > > init_IRQ(); > > pidhash_init(); > > Hi Anthony, > > This placement seems arbitrary. Why not earlier from setup_arch, or as > a normal initcall? The placement is important if we wish to have a paravirt_ops hook for the interrupt controller. This is the latest possible spot we can do it. A comment is probably appropriate here. Regards, Anthony Liguori > Rusty. > > > > - > This SF.net email is sponsored by: Splunk Inc. > Still grepping through log files to find problems? Stop. > Now Search log events and configuration files using AJAX and a browser. > Download your FREE copy of Splunk now >> http://get.splunk.com/ > ___ > kvm-devel mailing list > [EMAIL PROTECTED] > https://lists.sourceforge.net/lists/listinfo/kvm-devel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: NFS woes again
On 8/28/07, Trond Myklebust <[EMAIL PROTECTED]> wrote:
> On Mon, 2007-08-27 at 20:35 -0500, Florin Iucha wrote:
> > On Mon, Aug 27, 2007 at 06:19:29PM -0700, Bret Towe wrote:
> > > On 8/27/07, Trond Myklebust <[EMAIL PROTECTED]> wrote:
> > > > > > this sounds alot like the post i did yesterday titled 'nfs4 hang
> > > > > > regression'
> > > > > > i tracked it down to commit 3d39c691ff486142dd9aaeac12f553f4476b7a6
> > > > >
> > > > > Yes, it certainly does -- all the symptoms match!
> > > >
> > > > Could you and Bret please check if the attached patch fixes the hang?
> > >
> > > no good for me still hangs after ~30minutes
> >
> > I just booted into the new kernel
> > (3d39c691ff486142dd9aaeac12f553f4476b7a6 + Trond's patch) and it hangs
> > in 10-15 minutes.
> >
> > Process traces available at
> > http://iucha.net/nfs/23-rc2-nfs-fix-1/kernel.log.gz
> >
> > Regards,
> > florin
>
> Doh! I see the problem: cancel_delayed_work_sync() shouldn't ever be
> called recursively.
>
> The following patch should be correct. Please just discard the previous
> one...
>
> Trond
>

uptime of 3 hours and keyboard is still working fine
I'll hopefully get to test this on the mini tomorrow for at least 3 hours also

>
> -- Forwarded message --
> From: Trond Myklebust <[EMAIL PROTECTED]>
> To:
> Date: Mon, 27 Aug 2007 09:14:56 -0400
> Subject: No Subject
> Doh! We can't use cancel_delayed_work_sync because we may have been called
> from an unmount that was being performed by nfs_automount_task.
>
> Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>
> ---
>
>  fs/nfs/namespace.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
> index aea76d0..acfc56f 100644
> --- a/fs/nfs/namespace.c
> +++ b/fs/nfs/namespace.c
> @@ -176,7 +176,7 @@ static void nfs_expire_automounts(struct work_struct *work)
>  void nfs_release_automount_timer(void)
>  {
>  	if (list_empty(&nfs_automount_list))
> -		cancel_delayed_work_sync(&nfs_automount_task);
> +		cancel_delayed_work(&nfs_automount_task);
>  }
>
>  /*
>
Re: [kvm-devel] [PATCH 2/3] Refactor hypercall infrastructure
On Wed, 2007-08-29 at 04:12 +1000, Rusty Russell wrote:
> On Mon, 2007-08-27 at 10:16 -0500, Anthony Liguori wrote:
> > This patch refactors the current hypercall infrastructure to better support live
> > migration and SMP. It eliminates the hypercall page by trapping the UD
> > exception that would occur if you used the wrong hypercall instruction for the
> > underlying architecture and replacing it with the right one lazily.
>
> It also reduces the number of hypercall args, which you don't mention
> here.

Oh yes, sorry.

> > +	er = emulate_instruction(>vcpu, kvm_run, 0, 0);
> > +
> > +	/* we should only succeed here in the case of hypercalls which
> > +	   cannot generate an MMIO event. MMIO means that the emulator
> > +	   is mistakenly allowing an instruction that should generate
> > +	   a UD fault so it's a bug. */
> > +	BUG_ON(er == EMULATE_DO_MMIO);
>
> This seems... unwise. Firstly we know our emulator is incomplete.
> Secondly an SMP guest can exploit this to crash the host.

This code is gone in v2.

> (Code is in two places).
>
> > +#define KVM_HYPERCALL ".byte 0x0f,0x01,0xc1"

Good point.

> A nice big comment would be nice here, I think. Note that this is big
> enough for both "int $0x1f" and "sysenter", so I'm happy.

I need to add a comment somewhere mentioning that if you patch with something less than 3 bytes, then you should pad with nop but the hypervisor must treat the whole instruction (including the padding) as atomic (that is, regardless of hypercall size, eip += 3) or you run the risk of breakage during migration.

Regards,

Anthony Liguori

> Cheers,
> Rusty.
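The padding rule described above can be sketched in userspace C. This is only an illustration under stated assumptions: `HYPERCALL_LEN`, `patch_hypercall()` and `advance_ip()` are hypothetical names and are not taken from the KVM patch. The only facts borrowed from the thread are the 3-byte slot, the NOP padding, and the requirement that the instruction pointer advances by 3 regardless of the length of the patched instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HYPERCALL_LEN 3     /* size of the patchable slot, per the thread */
#define X86_NOP       0x90  /* single-byte x86 NOP */

/*
 * Hypothetical helper: copy an instruction of at most HYPERCALL_LEN bytes
 * into the slot and fill whatever is left with NOPs.
 */
static void patch_hypercall(uint8_t slot[HYPERCALL_LEN],
                            const uint8_t *insn, size_t len)
{
    assert(len <= HYPERCALL_LEN);
    memset(slot, X86_NOP, HYPERCALL_LEN);  /* pad first ... */
    memcpy(slot, insn, len);               /* ... then overwrite the prefix */
}

/*
 * Hypothetical helper: the hypervisor treats instruction plus padding as
 * one unit, so it always skips the full slot.
 */
static uint64_t advance_ip(uint64_t ip)
{
    return ip + HYPERCALL_LEN;
}
```

Patching the two-byte `int $0x1f` (bytes 0xcd, 0x1f) this way leaves a trailing NOP in the slot, while the three bytes from the `KVM_HYPERCALL` define fill it exactly; in both cases the guest resumes at the same address.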
Re: [PATCH] Send quota messages via netlink
Andrew Morton <[EMAIL PROTECTED]> writes:

> On Tue, 28 Aug 2007 16:13:18 +0200 Jan Kara <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>>
>> I'm sending rediffed patch implementing sending of quota messages via netlink
>> interface (some rationale in patch description). I've already posted it to
>> LKML some time ago and there were no objections, so I guess it's fine to put
>> it to -mm. Andrew, would you be so kind? Thanks.
>> Userspace deamon reading the messages from the kernel and sending them to
>> dbus and/or user console is also written (it's part of quota-tools). The
>> only remaining problem is there are a few changes needed to libnl needed for
>> the userspace daemon. They were basically acked by the maintainer but it
>> seems he has not merged the patches yet. So this will take a bit more time.
>>
>
> So it's a new kernel->userspace interface.
>
> But we have no description of the interface :(
>
>> +/* Send warning to userspace about user which exceeded quota */
>> +static void send_warning(const struct dquot *dquot, const char warntype)
>> +{
>> +	static unsigned long seq;
>> +	struct sk_buff *skb;
>> +	void *msg_head;
>> +	int ret;
>> +
>> +	skb = genlmsg_new(QUOTA_NL_MSG_SIZE, GFP_NOFS);
>> +	if (!skb) {
>> +		printk(KERN_ERR
>> +		  "VFS: Not enough memory to send quota warning.\n");
>> +		return;
>> +	}
>> +	msg_head = genlmsg_put(skb, 0, seq++, &quota_genl_family, 0, QUOTA_NL_C_WARNING);
>> +	if (!msg_head) {
>> +		printk(KERN_ERR
>> +		  "VFS: Cannot store netlink header in quota warning.\n");
>> +		goto err_out;
>> +	}
>> +	ret = nla_put_u32(skb, QUOTA_NL_A_QTYPE, dquot->dq_type);
>> +	if (ret)
>> +		goto attr_err_out;
>> +	ret = nla_put_u64(skb, QUOTA_NL_A_EXCESS_ID, dquot->dq_id);
>> +	if (ret)
>> +		goto attr_err_out;
>> +	ret = nla_put_u32(skb, QUOTA_NL_A_WARNING, warntype);
>> +	if (ret)
>> +		goto attr_err_out;
>> +	ret = nla_put_u32(skb, QUOTA_NL_A_DEV_MAJOR,
>> +		MAJOR(dquot->dq_sb->s_dev));
>> +	if (ret)
>> +		goto attr_err_out;
>> +	ret = nla_put_u32(skb, QUOTA_NL_A_DEV_MINOR,
>> +		MINOR(dquot->dq_sb->s_dev));
>> +	if (ret)
>> +		goto attr_err_out;
>> +	ret = nla_put_u64(skb, QUOTA_NL_A_CAUSED_ID, current->user->uid);
>> +	if (ret)
>> +		goto attr_err_out;
>> +	genlmsg_end(skb, msg_head);
>> +
>> +	ret = genlmsg_multicast(skb, 0, quota_genl_family.id, GFP_NOFS);
>> +	if (ret < 0 && ret != -ESRCH)
>> +		printk(KERN_ERR
>> +			"VFS: Failed to send notification message: %d\n", ret);
>> +	return;
>> +attr_err_out:
>> +	printk(KERN_ERR "VFS: Failed to compose quota message: %d\n", ret);
>> +err_out:
>> +	kfree_skb(skb);
>> +}
>> +#endif
>
> This is it. Normally netlink payloads are represented as a struct. How
> come this one is built-by-hand?

No netlink fields (unless I'm confused) are represented as a struct, not the entire netlink payload.

> It doesn't appear to be versioned. Should it be?

Well. If it is using netlink properly each field should have a tag. So it should not need to be versioned, because each field is strictly controlled.

> Does it have (or need) reserved-set-to-zero space for expansion? Again,
> hard to tell..

Not if netlink is used properly. Just another nested tag.

> I guess it's OK to send a major and minor out of the kernel like this.
> What's it for? To represent a filesytem? I wonder if there's a more
> modern and useful way of describing the fs. Path to mountpoint or
> something?

Or perhaps the string the fs was mounted with.

> I suspect the namespace virtualisation guys would be interested in a new
> interface which is sending current->user->uid up to userspace. uids are
> per-namespace now. What are the implications? (cc's added)

That we definitely would be. Although the user namespaces is rather strongly incomplete at the moment.

> Is it worth adding a comment explaining why GFP_NOFS is used here?
Re: [PATCH 2.6.23 0/2] cxgb3 - Fix dev->priv usage
Looks OK to me but I would just roll up the second patch into the first patch and let Jeff merge it as one commit. There's no point in creating an intermediate tree that doesn't build -- it just breaks git bisect for no useful purpose.

Also as a side note, when trying to test this I got the message

    could not load TP SRAM: unable to load t3a_protocol_sram-1.0.44.bin

and you guys seem to only have t3b protocol sram images on your web site. Could you send me the t3a file (or swap out my T3A boards for T3B boards ;)?

Thanks,
  Roland
Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.
Andrew Morton <[EMAIL PROTECTED]> writes:

> On Tue, 28 Aug 2007 16:40:15 -0600 [EMAIL PROTECTED] (Eric W. Biederman) wrote:
>
>> +static int deprecated_sysctl_warning(struct __sysctl_args *args)
>> +{
>> +	static int msg_count;
>> +	int name[CTL_MAXNAME];
>> +	int i;
>> +
>> +	/* Read in the sysctl name for better debug message logging */
>> +	for (i = 0; i < args->nlen; i++)
>> +		if (get_user(name[i], args->name + i))
>> +			return -EFAULT;
>> +
>> +	/* Ignore accesses to kernel.version */
>> +	if ((args->nlen == 2) && (name[0] == CTL_KERN) && (name[1] == KERN_VERSION))
>> +		return 0;
>
> Do we want to do all the above if msg_count>=5?

Well. It won't really change the order of the algorithm because we have to read the data in anyway. So an earlier short circuit exit would speed things up by a little bit, but it really shouldn't matter either way.

Eric
Re: 2.6.23-rc4: maxcpus still broken
On Wed, Aug 29, 2007 at 06:03:34AM +0100, Hugh Dickins wrote:
> On Wed, 29 Aug 2007, Alexey Dobriyan wrote:
> > On Wed, Aug 29, 2007 at 01:35:57AM +0200, Michal Piotrowski wrote:
> > > On 28/08/07, Alexey Dobriyan <[EMAIL PROTECTED]> wrote:
> > > > Every time I try to boot with maxcpus=1 it dies show_stat():
> > >
> > > Is this a regression?
> >
> > yep
>
> A regression since when, I wonder?

Anything before "ACPI: boot correctly with "nosmp" or "maxcpus=0"" is fine.
Re: 2.6.23-rc4: maxcpus still broken
On Wed, 29 Aug 2007, Alexey Dobriyan wrote:
> On Wed, Aug 29, 2007 at 01:35:57AM +0200, Michal Piotrowski wrote:
> > On 28/08/07, Alexey Dobriyan <[EMAIL PROTECTED]> wrote:
> > > Every time I try to boot with maxcpus=1 it dies show_stat():
> >
> > Is this a regression?
>
> yep

A regression since when, I wonder? Please do NOT waste any time bisecting, but I'd be interested to know which release or -rc you previously found it worked on.

When I gave the code a quick look, it appeared to be something which has long been wrong; but I didn't investigate whether per-cpu allocation has changed recently. My _suspicion_, no more than that, is that it might be a regression to you because you're now forced to have CONFIG_HOTPLUG_CPU=y where you didn't need it before.

Anyway, it doesn't matter too much what it's a regression since: it's a bug that needs fixing whatever, and should be simple. My x86_64 was running other tests yesterday which I didn't want to interrupt, but I'll take a look later on today.

>
> > Hugh fixed some issues on x86-64 commit
> > 813409771731d80e6fa94199adf99f2269a4afc0
>
> This is 2.6.23-rc4, which has this fix, yes.
>
> And I have second box with exactly same behaviour: x86_64 E6400, it also has
> ACPI=n
> Turning on ACPI doesn't make it any better, though.

Hugh
Re: CFS review
* Al Boldi <[EMAIL PROTECTED]> wrote:

> I have narrowed it down a bit to add_wait_runtime.

the scheduler is a red herring here. Could you "strace -ttt -TTT" one of the glxgears instances (and send us the cfs-debug-info.sh output, with CONFIG_SCHED_DEBUG=y and CONFIG_SCHEDSTATS=y as requested before) so that we can have a closer look?

i reproduced something similar and there the stall is caused by 1+ second select() delays on the X client<->server socket. The scheduler stats agree with that:

  se.sleep_max : 2194711437
  se.block_max : 0
  se.exec_max  : 977446
  se.wait_max  : 1912321

the scheduler itself had a worst-case scheduling delay of 1.9 milliseconds for that glxgears instance (which is perfectly good - in fact - excellent interactivity) - but the task had a maximum sleep time of 2.19 seconds. So the 'glitch' was not caused by the scheduler.

	Ingo
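The arithmetic behind those two figures can be checked with a few lines of C. This sketch assumes the schedstat counters are in nanoseconds, which is consistent with the 1.9 ms and 2.19 s readings given above; `ns_to_ms()` and `ns_to_s()` are hypothetical helper names, not kernel functions.

```c
/* Convert a nanosecond counter to milliseconds. */
static double ns_to_ms(unsigned long long ns)
{
    return (double)ns / 1e6;
}

/* Convert a nanosecond counter to seconds. */
static double ns_to_s(unsigned long long ns)
{
    return (double)ns / 1e9;
}
```

With these, se.wait_max (1912321) comes out just above 1.9 ms and se.sleep_max (2194711437) just above 2.19 s.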
Re: [PATCH] Send quota messages via netlink
From: Andrew Morton <[EMAIL PROTECTED]>
Date: Tue, 28 Aug 2007 21:13:35 -0700

> This is it. Normally netlink payloads are represented as a struct. How
> come this one is built-by-hand?

He is using attributes, which is perfect and arbitrarily extensible with zero backwards compatibility concerns.

If he wants to provide a new attribute, he just adds it without any issues. When new attributes are added, older apps simply ignore the attributes they don't understand.
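The "older apps simply ignore the attributes they don't understand" property can be sketched with a simplified type-length-value parser. Everything below is an assumption made for illustration: the 2-byte type / 2-byte length header is not the real netlink attribute wire format (no alignment or padding is modelled), the `ATTR_*` numbers are not the `QUOTA_NL_A_*` values from the patch, and `put_attr()` / `parse_attrs()` are hypothetical helpers, not libnl or kernel APIs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute numbers known to this (old) reader. */
enum { ATTR_QTYPE = 1, ATTR_WARNING = 2 };

struct quota_msg {
    uint32_t qtype;
    uint32_t warning;
};

/* Append one attribute (type, length, payload); returns the new offset. */
static size_t put_attr(uint8_t *buf, size_t off, uint16_t type,
                       const void *payload, uint16_t plen)
{
    memcpy(buf + off, &type, 2);
    memcpy(buf + off + 2, &plen, 2);
    memcpy(buf + off + 4, payload, plen);
    return off + 4 + plen;
}

/* Walk the attributes; returns 0 on success, -1 on a malformed buffer. */
static int parse_attrs(const uint8_t *buf, size_t len, struct quota_msg *out)
{
    size_t off = 0;

    while (off + 4 <= len) {
        uint16_t type, plen;

        memcpy(&type, buf + off, 2);
        memcpy(&plen, buf + off + 2, 2);
        off += 4;
        if (plen > len - off)
            return -1;              /* payload runs past the buffer */

        if (type == ATTR_QTYPE && plen == 4)
            memcpy(&out->qtype, buf + off, 4);
        else if (type == ATTR_WARNING && plen == 4)
            memcpy(&out->warning, buf + off, 4);
        /* Any other type is unknown to this reader: skip its payload. */

        off += plen;
    }
    return off == len ? 0 : -1;
}
```

Because every attribute carries its own length, a newer sender can insert an attribute this reader has never heard of and the two known fields are still decoded correctly.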
Re: [PATCH] Send quota messages via netlink
On Tue, 28 Aug 2007 16:13:18 +0200 Jan Kara <[EMAIL PROTECTED]> wrote:

> +static void send_warning(const struct dquot *dquot, const char warntype)
> +{
> +	static unsigned long seq;
> +	struct sk_buff *skb;
> +	void *msg_head;
> +	int ret;
> +
> +	skb = genlmsg_new(QUOTA_NL_MSG_SIZE, GFP_NOFS);
> +	if (!skb) {
> +		printk(KERN_ERR
> +		  "VFS: Not enough memory to send quota warning.\n");
> +		return;
> +	}
> +	msg_head = genlmsg_put(skb, 0, seq++, &quota_genl_family, 0, QUOTA_NL_C_WARNING);

The access to seq is racy, isn't it?

If so, that can be solved with a lock, or with atomic_add_return().
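A userspace analogue of the atomic variant, using C11 `<stdatomic.h>` rather than the kernel's atomic_t API, might look like the sketch below. `next_seq()` is a hypothetical name. Note one difference to keep in mind when translating back: `atomic_fetch_add()` returns the value before the increment (like `seq++`), whereas the kernel's `atomic_add_return()` returns the value after it.

```c
#include <stdatomic.h>

/* Userspace stand-in for the `static unsigned long seq` in send_warning(). */
static atomic_ulong seq;

/*
 * Hand out sequence numbers without a lock. The read-modify-write is a
 * single atomic operation, so two concurrent callers can never receive
 * the same value.
 */
static unsigned long next_seq(void)
{
    return atomic_fetch_add(&seq, 1);
}
```

The lock-based alternative would serialise the same read-modify-write; the atomic form simply avoids introducing a new lock for one counter.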
Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.
On Wed, 29 Aug 2007 00:04:59 +0100 Christoph Hellwig <[EMAIL PROTECTED]> wrote:

> On Tue, Aug 28, 2007 at 04:40:15PM -0600, Eric W. Biederman wrote:
> > +When:	September 2010
> > +Option: CONFIG_SYSCTL_SYSCALL
> > +Why:	The same information is available in a more convenient from
> > +	/proc/sys, and none of the sysctl variables appear to be
> > +	important performance wise.
> > +
> > +	Binary sysctls are a long standing source of subtle kernel
> > +	bugs and security issues.
> > +
> > +	When I looked several months ago all I could find after
> > +	searching several distributions were 5 user space programs and
> > +	glibc (which falls back to /proc/sys) using this syscall.
>
> Umm, no way we're ever going to remove a syscall like this. Please
> stop this deprecration crap. Just make sure no ones adds more binary
> sysctls.

I think it's worth a try. It might take two, three or five years, who knows?

If it turns out to be impractical then we can just change our minds later, no big loss. It's just too early to say right now.
Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.
On Tue, 28 Aug 2007 16:40:15 -0600 [EMAIL PROTECTED] (Eric W. Biederman) wrote:

> +static int deprecated_sysctl_warning(struct __sysctl_args *args)
> +{
> +	static int msg_count;
> +	int name[CTL_MAXNAME];
> +	int i;
> +
> +	/* Read in the sysctl name for better debug message logging */
> +	for (i = 0; i < args->nlen; i++)
> +		if (get_user(name[i], args->name + i))
> +			return -EFAULT;
> +
> +	/* Ignore accesses to kernel.version */
> +	if ((args->nlen == 2) && (name[0] == CTL_KERN) && (name[1] == KERN_VERSION))
> +		return 0;

Do we want to do all the above if msg_count>=5?

> +	if (msg_count < 5) {
> +		msg_count++;
> +		printk(KERN_INFO
> +			"warning: process `%s' used the deprecated sysctl "
> +			"system call with ", current->comm);
> +		for (i = 0; i < args->nlen; i++)
> +			printk("%d.", name[i]);
> +		printk("\n");
> +	}
> +	return 0;
> +}
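The short circuit being asked about can be sketched in userspace C as follows. This is not the patch: `warn_deprecated()` and `MAX_WARNINGS` are hypothetical names, the name array is assumed to be in ordinary memory already (so there is no get_user() step), and the kernel.version exemption is left out to keep the example small.

```c
#include <stdio.h>

#define MAX_WARNINGS 5  /* same budget as the `msg_count < 5` test */

static int msg_count;

/* Returns 1 if a warning was printed, 0 if it was suppressed. */
static int warn_deprecated(const char *comm, const int *name, int nlen)
{
    int i;

    /* Early exit: once the budget is spent, do no further work. */
    if (msg_count >= MAX_WARNINGS)
        return 0;
    msg_count++;

    fprintf(stderr, "warning: process `%s' used the deprecated sysctl "
            "system call with ", comm);
    for (i = 0; i < nlen; i++)
        fprintf(stderr, "%d.", name[i]);
    fprintf(stderr, "\n");
    return 1;
}
```

In the real function the saving is the per-element user copy of the name on every call after the fifth warning; whether that matters for a deprecated path is a judgment call.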
Re: Linux 2.6.23-rc4: BAD regression
Daniel,

Does this patch help you, or do we need to revert the whole thing?

Sorry for the trouble,
Alex.

Daniel Ritz wrote:
> tried that one on my old toshiba tecra 8000 laptop, almost killing it.
> the fan doesn't work any more...type 'make' and see the box dying.
> luckily my CPU doesn't commit suicide...bisected it to that one:
>
> cd8c93a4e04dce8f00d1ef3a476aac8bd65ae40b is first bad commit
> commit cd8c93a4e04dce8f00d1ef3a476aac8bd65ae40b
> Author: Alexey Starikovskiy <[EMAIL PROTECTED]>
> Date: Fri Aug 3 17:52:48 2007 -0400
>
>     ACPI: EC: If ECDT is not found, look up EC in DSDT.
>
>     Some ASUS laptops access EC space from device _INI methods, but do not
>     provide ECDT for early EC setup. In order to make them function properly,
>     there is a need to find EC is DSDT before any _INI is called.
>
>     Similar functionality was turned on by acpi_fake_ecdt=1 command line
>     before. Now it is on all the time.
>
>     http://bugzilla.kernel.org/show_bug.cgi?id=8598
>
>     Signed-off-by: Alexey Starikovskiy <[EMAIL PROTECTED]>
>     Signed-off-by: Len Brown <[EMAIL PROTECTED]>
>

Drop early init of EC from DSDT patch

From: Alexey Starikovskiy <[EMAIL PROTECTED]>

---

 drivers/acpi/ec.c | 21 +++--
 1 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index 43749c8..e28f5b2 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -876,20 +876,13 @@ int __init acpi_ec_ecdt_probe(void)
 	 */
 	status = acpi_get_table(ACPI_SIG_ECDT, 1,
 				(struct acpi_table_header **)&ecdt_ptr);
-	if (ACPI_SUCCESS(status)) {
-		printk(KERN_INFO PREFIX "EC description table is found, configuring boot EC\n\n");
-		boot_ec->command_addr = ecdt_ptr->control.address;
-		boot_ec->data_addr = ecdt_ptr->data.address;
-		boot_ec->gpe = ecdt_ptr->gpe;
-		boot_ec->handle = ACPI_ROOT_OBJECT;
-	} else {
-		printk(KERN_DEBUG PREFIX "Look up EC in DSDT\n");
-		status = acpi_get_devices(ec_device_ids[0].id, ec_parse_device,
-						boot_ec, NULL);
-		if (ACPI_FAILURE(status))
-			goto error;
-	}
-
+	if (ACPI_FAILURE(status))
+		goto error;
+	printk(KERN_INFO PREFIX "EC description table is found, configuring boot EC\n");
+	boot_ec->command_addr = ecdt_ptr->control.address;
+	boot_ec->data_addr = ecdt_ptr->data.address;
+	boot_ec->gpe = ecdt_ptr->gpe;
+	boot_ec->handle = ACPI_ROOT_OBJECT;
 	ret = ec_install_handlers(boot_ec);
 	if (!ret) {
 		first_ec = boot_ec;
Re: CFS review
On Wed, 2007-08-29 at 06:18 +0200, Ingo Molnar wrote:
> * Al Boldi <[EMAIL PROTECTED]> wrote:
>
> > No need for framebuffer. All you need is X using the X.org
> > vesa-driver. Then start gears like this:
> >
> > # gears & gears & gears &
> >
> > Then lay them out side by side to see the periodic stallings for
> > ~10sec.
>
> i just tried something similar (by adding Option "NoDRI" to xorg.conf)
> and i'm wondering how it can be smooth on vesa-driver at all. I tested
> it on a Core2Duo box and software rendering manages to do about 3 frames
> per second. (although glxgears itself thinks it does ~600 fps) If i
> start 3 glxgears then they do ~1 frame per second each. This is on
> Fedora 7 with xorg-x11-server-Xorg-1.3.0.0-9.fc7 and
> xorg-x11-drv-i810-2.0.0-4.fc7.

At least you can run the darn test... the third instance of glxgears here means say bye bye to GUI instantly.

	-Mike
Re: 2.6.23-rc4: maxcpus still broken
On Wed, Aug 29, 2007 at 01:35:57AM +0200, Michal Piotrowski wrote:
> On 28/08/07, Alexey Dobriyan <[EMAIL PROTECTED]> wrote:
> > Every time I try to boot with maxcpus=1 it dies show_stat():
>
> Is this a regression?

yep

> Hugh fixed some issues on x86-64 commit
> 813409771731d80e6fa94199adf99f2269a4afc0

This is 2.6.23-rc4, which has this fix, yes.

And I have second box with exactly same behaviour: x86_64 E6400, it also has ACPI=n
Turning on ACPI doesn't make it any better, though.
Re: CFS review
On Wed, 2007-08-29 at 06:18 +0200, Ingo Molnar wrote:

> > Then lay them out side by side to see the periodic stallings for
> > ~10sec.

The X scheduling code isn't really designed to handle software GL well; the requests can be very expensive to execute, and yet are specified as atomic operations (sigh).

> i just tried something similar (by adding Option "NoDRI" to xorg.conf)
> and i'm wondering how it can be smooth on vesa-driver at all. I tested
> it on a Core2Duo box and software rendering manages to do about 3 frames
> per second. (although glxgears itself thinks it does ~600 fps) If i
> start 3 glxgears then they do ~1 frame per second each. This is on
> Fedora 7 with xorg-x11-server-Xorg-1.3.0.0-9.fc7 and
> xorg-x11-drv-i810-2.0.0-4.fc7.

Are you attempting to measure the visible updates by eye? Or are you using some other metric?

In any case, attempting to measure anything using glxgears is a bad idea; it's not representative of *any* real applications. And then using software GL on top of that...

What was the question again?

--
[EMAIL PROTECTED]
Re: CFS review
Ingo Molnar wrote:
> * Linus Torvalds <[EMAIL PROTECTED]> wrote:
> > On Tue, 28 Aug 2007, Al Boldi wrote:
> > > I like your analysis, but how do you explain that these stalls
> > > vanish when __update_curr is disabled?
> >
> > It's entirely possible that what happens is that the X scheduling is
> > just a slightly unstable system - which effectively would turn a small
> > scheduling difference into a *huge* visible difference.
>
> i think it's because disabling __update_curr() in essence removes the
> ability of scheduler to preempt tasks - that hack in essence results in
> a non-scheduler. Hence the gears + X pair of tasks becomes a synchronous
> pair of tasks in essence - and thus gears cannot "overload" X.

I have narrowed it down a bit to add_wait_runtime.

Patch 2.6.22.5-v20.4 like this:

346- * the two values are equal)
347- * [Note: delta_mine - delta_exec is negative]:
348- */
349:// add_wait_runtime(cfs_rq, curr, delta_mine - delta_exec);
350-}
351-
352-static void update_curr(struct cfs_rq *cfs_rq)

When disabling add_wait_runtime the stalls are gone. With this change the scheduler is still usable, but it does not constitute a fix.

Now, even with this hack, uneven nice-levels between X and gears causes a return of the stalls, so make sure both X and gears run on the same nice-level when testing.

Again, the whole point of this workload is to expose scheduler glitches regardless of whether X is broken or not, and my hunch is that this problem looks suspiciously like an ia-boosting bug. What's important to note is that by adjusting the scheduler we can effect a correction in behaviour, and as such should yield this problem as fixable.

It's probably a good idea to look further into add_wait_runtime.

Thanks!

--
Al
Re: CFS review
* Al Boldi <[EMAIL PROTECTED]> wrote:

> No need for framebuffer. All you need is X using the X.org
> vesa-driver. Then start gears like this:
>
> # gears & gears & gears &
>
> Then lay them out side by side to see the periodic stallings for
> ~10sec.

i just tried something similar (by adding Option "NoDRI" to xorg.conf) and i'm wondering how it can be smooth on vesa-driver at all. I tested it on a Core2Duo box and software rendering manages to do about 3 frames per second. (although glxgears itself thinks it does ~600 fps) If i start 3 glxgears then they do ~1 frame per second each. This is on Fedora 7 with xorg-x11-server-Xorg-1.3.0.0-9.fc7 and xorg-x11-drv-i810-2.0.0-4.fc7.

	Ingo
Re: [PATCH] Send quota messages via netlink
On Tue, 28 Aug 2007 16:13:18 +0200 Jan Kara <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I'm sending rediffed patch implementing sending of quota messages via netlink
> interface (some rationale in patch description). I've already posted it to
> LKML some time ago and there were no objections, so I guess it's fine to put
> it to -mm. Andrew, would you be so kind? Thanks.
> Userspace deamon reading the messages from the kernel and sending them to
> dbus and/or user console is also written (it's part of quota-tools). The
> only remaining problem is there are a few changes needed to libnl needed for
> the userspace daemon. They were basically acked by the maintainer but it
> seems he has not merged the patches yet. So this will take a bit more time.
>

So it's a new kernel->userspace interface.

But we have no description of the interface :(

> +/* Send warning to userspace about user which exceeded quota */
> +static void send_warning(const struct dquot *dquot, const char warntype)
> +{
> +	static unsigned long seq;
> +	struct sk_buff *skb;
> +	void *msg_head;
> +	int ret;
> +
> +	skb = genlmsg_new(QUOTA_NL_MSG_SIZE, GFP_NOFS);
> +	if (!skb) {
> +		printk(KERN_ERR
> +		  "VFS: Not enough memory to send quota warning.\n");
> +		return;
> +	}
> +	msg_head = genlmsg_put(skb, 0, seq++, &quota_genl_family, 0, QUOTA_NL_C_WARNING);
> +	if (!msg_head) {
> +		printk(KERN_ERR
> +		  "VFS: Cannot store netlink header in quota warning.\n");
> +		goto err_out;
> +	}
> +	ret = nla_put_u32(skb, QUOTA_NL_A_QTYPE, dquot->dq_type);
> +	if (ret)
> +		goto attr_err_out;
> +	ret = nla_put_u64(skb, QUOTA_NL_A_EXCESS_ID, dquot->dq_id);
> +	if (ret)
> +		goto attr_err_out;
> +	ret = nla_put_u32(skb, QUOTA_NL_A_WARNING, warntype);
> +	if (ret)
> +		goto attr_err_out;
> +	ret = nla_put_u32(skb, QUOTA_NL_A_DEV_MAJOR,
> +		MAJOR(dquot->dq_sb->s_dev));
> +	if (ret)
> +		goto attr_err_out;
> +	ret = nla_put_u32(skb, QUOTA_NL_A_DEV_MINOR,
> +		MINOR(dquot->dq_sb->s_dev));
> +	if (ret)
> +		goto attr_err_out;
> +	ret = nla_put_u64(skb, QUOTA_NL_A_CAUSED_ID, current->user->uid);
> +	if (ret)
> +		goto attr_err_out;
> +	genlmsg_end(skb, msg_head);
> +
> +	ret = genlmsg_multicast(skb, 0, quota_genl_family.id, GFP_NOFS);
> +	if (ret < 0 && ret != -ESRCH)
> +		printk(KERN_ERR
> +			"VFS: Failed to send notification message: %d\n", ret);
> +	return;
> +attr_err_out:
> +	printk(KERN_ERR "VFS: Failed to compose quota message: %d\n", ret);
> +err_out:
> +	kfree_skb(skb);
> +}
> +#endif

This is it. Normally netlink payloads are represented as a struct. How come this one is built-by-hand?

It doesn't appear to be versioned. Should it be?

Does it have (or need) reserved-set-to-zero space for expansion? Again, hard to tell..

I guess it's OK to send a major and minor out of the kernel like this. What's it for? To represent a filesystem? I wonder if there's a more modern and useful way of describing the fs. Path to mountpoint or something?

I suspect the namespace virtualisation guys would be interested in a new interface which is sending current->user->uid up to userspace. uids are per-namespace now. What are the implications? (cc's added)

Is it worth adding a comment explaining why GFP_NOFS is used here?
Re: Linux 2.6.23-rc4, maxcpus=1 regression
* Ingo Molnar <[EMAIL PROTECTED]> wrote:

>
> * Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
> > > reverting that commit makes the system boot again. I've attached the
> > > .config.
> >
> > Did you try -rc4? Commit 813409771731d80e6fa94199adf99f2269a4afc0 in
> > particular ("fix maxcpus=N parsing") was supposed to fix that commit.
>
> ah ... indeed my tree is a few commits ahead of rc4. Checking.

indeed that was it, it boots fine now :-/ Sorry about the noise.

[ I guess i should make it a policy to not mail bugreports before 6am ;-) ]

	Ingo
[PATCH 2/2] iw_cxgb3 - dev->priv fix follow up
From: Divy Le Ray <[EMAIL PROTECTED]>

The RDMA driver sitting on top of cxgb3 now uses the exported
function dev2t3cdev() to retrieve the t3cdev associated with a
net_device.

Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]>
---

 drivers/infiniband/hw/cxgb3/cxio_hal.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.c b/drivers/infiniband/hw/cxgb3/cxio_hal.c
index 1518b41..beb2a38 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_hal.c
+++ b/drivers/infiniband/hw/cxgb3/cxio_hal.c
@@ -916,7 +916,7 @@ int cxio_rdev_open(struct cxio_rdev *rdev_p)
 	PDBG("%s opening rnic dev %s\n", __FUNCTION__, rdev_p->dev_name);
 	memset(&rdev_p->ctrl_qp, 0, sizeof(rdev_p->ctrl_qp));
 	if (!rdev_p->t3cdev_p)
-		rdev_p->t3cdev_p = T3CDEV(netdev_p);
+		rdev_p->t3cdev_p = dev2t3cdev(netdev_p);
 	rdev_p->t3cdev_p->ulp = (void *) rdev_p;
 	err = rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_GET_PARAMS,
 					 &(rdev_p->rnic_info));
Re: [2.6 patch] i386 visws: "extern inline" -> "static inline"
On 239, 08 27, 2007 at 11:28:19PM +0200, Adrian Bunk wrote:
> "extern inline" will have different semantics with gcc 4.3.
>
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

Looks good.

Acked-by: Andrey Panin <[EMAIL PROTECTED]>

> ---
>
> This patch has been sent on:
> - 14 Aug 2007
>
>  include/asm-i386/mach-visws/cobalt.h  |    8
>  include/asm-i386/mach-visws/lithium.h |    8
>  2 files changed, 8 insertions(+), 8 deletions(-)
>
> e12d2e797af72524f53a0ef3a7dd3cf91f58c542
> diff --git a/include/asm-i386/mach-visws/cobalt.h b/include/asm-i386/mach-visws/cobalt.h
> index 33c3622..9952588 100644
> --- a/include/asm-i386/mach-visws/cobalt.h
> +++ b/include/asm-i386/mach-visws/cobalt.h
> @@ -94,22 +94,22 @@
>  #define	CO_IRQ_8259	CO_IRQ(CO_APIC_8259)
>
>  #ifdef CONFIG_X86_VISWS_APIC
> -extern __inline void co_cpu_write(unsigned long reg, unsigned long v)
> +static inline void co_cpu_write(unsigned long reg, unsigned long v)
>  {
>  	*((volatile unsigned long *)(CO_CPU_VADDR+reg))=v;
>  }
>
> -extern __inline unsigned long co_cpu_read(unsigned long reg)
> +static inline unsigned long co_cpu_read(unsigned long reg)
>  {
>  	return *((volatile unsigned long *)(CO_CPU_VADDR+reg));
>  }
>
> -extern __inline void co_apic_write(unsigned long reg, unsigned long v)
> +static inline void co_apic_write(unsigned long reg, unsigned long v)
>  {
>  	*((volatile unsigned long *)(CO_APIC_VADDR+reg))=v;
>  }
>
> -extern __inline unsigned long co_apic_read(unsigned long reg)
> +static inline unsigned long co_apic_read(unsigned long reg)
>  {
>  	return *((volatile unsigned long *)(CO_APIC_VADDR+reg));
>  }
> diff --git a/include/asm-i386/mach-visws/lithium.h b/include/asm-i386/mach-visws/lithium.h
> index d443e68..dfcd4f0 100644
> --- a/include/asm-i386/mach-visws/lithium.h
> +++ b/include/asm-i386/mach-visws/lithium.h
> @@ -29,22 +29,22 @@
>  #define	LI_INTD		0x0080
>
>  /* More special purpose macros... */
> -extern __inline void li_pcia_write16(unsigned long reg, unsigned short v)
> +static inline void li_pcia_write16(unsigned long reg, unsigned short v)
>  {
>  	*((volatile unsigned short *)(LI_PCIA_VADDR+reg))=v;
>  }
>
> -extern __inline unsigned short li_pcia_read16(unsigned long reg)
> +static inline unsigned short li_pcia_read16(unsigned long reg)
>  {
>  	return *((volatile unsigned short *)(LI_PCIA_VADDR+reg));
>  }
>
> -extern __inline void li_pcib_write16(unsigned long reg, unsigned short v)
> +static inline void li_pcib_write16(unsigned long reg, unsigned short v)
>  {
>  	*((volatile unsigned short *)(LI_PCIB_VADDR+reg))=v;
>  }
>
> -extern __inline unsigned short li_pcib_read16(unsigned long reg)
> +static inline unsigned short li_pcib_read16(unsigned long reg)
>  {
>  	return *((volatile unsigned short *)(LI_PCIB_VADDR+reg));
>  }
>

-- 
Andrey Panin		| Linux and UNIX system administrator
[EMAIL PROTECTED]	| PGP key: wwwkeys.pgp.net
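[Editor's sketch] The accessors in this patch all share one shape: a volatile load or store at a base address plus a byte offset. The reason for the switch, as far as I understand it, is that GNU89 `extern inline` means "inline only, never emit an out-of-line copy", while C99 gives `extern inline` a different meaning; `static inline` behaves the same under both. Below is a minimal userspace sketch of the accessor pattern, assuming a plain array in place of the mapped register window. `fake_window`, `demo_write16` and `demo_read16` are made-up names, not taken from the visws headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a mapped register window (not a real VISWS address). */
static uint16_t fake_window[8];

/* Byte offset -> volatile 16-bit store, mirroring the shape of li_pcia_write16(). */
static inline void demo_write16(unsigned long reg, uint16_t v)
{
	/* The offset must be 16-bit aligned and inside the fake window. */
	assert(reg % sizeof(uint16_t) == 0 && reg < sizeof(fake_window));
	*((volatile uint16_t *)((unsigned char *)fake_window + reg)) = v;
}

/* Byte offset -> volatile 16-bit load, mirroring the shape of li_pcia_read16(). */
static inline uint16_t demo_read16(unsigned long reg)
{
	assert(reg % sizeof(uint16_t) == 0 && reg < sizeof(fake_window));
	return *((volatile uint16_t *)((unsigned char *)fake_window + reg));
}
```

Because the functions are `static`, every translation unit that includes such a header gets its own private copy, so no out-of-line definition has to exist anywhere else.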
[PATCH 2.6.23 1/2] cxgb3 - Fix dev->priv usage
From: Divy Le Ray <[EMAIL PROTECTED]>

cxgb3 used netdev_priv() and dev->priv for different purposes.
In 2.6.23, netdev_priv() == dev->priv, so cxgb3 needs a fix.
This patch is a partial backport of Dave Miller's changes
in the net-2.6.24 git branch.

Without this fix, cxgb3 crashes on 2.6.23.

Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]>
---

 drivers/net/cxgb3/adapter.h       |    2 +
 drivers/net/cxgb3/cxgb3_main.c    |  126 +
 drivers/net/cxgb3/cxgb3_offload.c |   16 -
 drivers/net/cxgb3/cxgb3_offload.h |    2 +
 drivers/net/cxgb3/sge.c           |   23 ---
 drivers/net/cxgb3/t3cdev.h        |    3 -
 6 files changed, 104 insertions(+), 68 deletions(-)

diff --git a/drivers/net/cxgb3/adapter.h b/drivers/net/cxgb3/adapter.h
index ab72563..20e887d 100644
--- a/drivers/net/cxgb3/adapter.h
+++ b/drivers/net/cxgb3/adapter.h
@@ -50,7 +50,9 @@ typedef irqreturn_t(*intr_handler_t) (int, void *);
 
 struct vlan_group;
 
+struct adapter;
 struct port_info {
+	struct adapter *adapter;
 	struct vlan_group *vlan_grp;
 	const struct port_type_info *port_type;
 	u8 port_id;
diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
index dc5d269..f3bf128 100644
--- a/drivers/net/cxgb3/cxgb3_main.c
+++ b/drivers/net/cxgb3/cxgb3_main.c
@@ -358,11 +358,14 @@ static int init_dummy_netdevs(struct adapter *adap)
 
 		for (j = 0; j < pi->nqsets - 1; j++) {
 			if (!adap->dummy_netdev[dummy_idx]) {
-				nd = alloc_netdev(0, "", ether_setup);
+				struct port_info *p;
+
+				nd = alloc_netdev(sizeof(*p), "", ether_setup);
 				if (!nd)
 					goto free_all;
 
-				nd->priv = adap;
+				p = netdev_priv(nd);
+				p->adapter = adap;
 				nd->weight = 64;
 				set_bit(__LINK_STATE_START, &nd->state);
 				adap->dummy_netdev[dummy_idx] = nd;
@@ -482,7 +485,8 @@ static ssize_t attr_store(struct device *d, struct device_attribute *attr,
 #define CXGB3_SHOW(name, val_expr) \
 static ssize_t format_##name(struct net_device *dev, char *buf) \
 { \
-	struct adapter *adap = dev->priv; \
+	struct port_info *pi = netdev_priv(dev); \
+	struct adapter *adap = pi->adapter; \
 	return sprintf(buf, "%u\n", val_expr); \
 } \
 static ssize_t show_##name(struct device *d, struct device_attribute *attr, \
@@ -493,7 +497,8 @@ static ssize_t show_##name(struct device *d, struct device_attribute *attr, \
 
 static ssize_t set_nfilters(struct net_device *dev, unsigned int val)
 {
-	struct adapter *adap = dev->priv;
+	struct port_info *pi = netdev_priv(dev);
+	struct adapter *adap = pi->adapter;
 	int min_tids = is_offload(adap) ? MC5_MIN_TIDS : 0;
 
 	if (adap->flags & FULL_INIT_DONE)
@@ -515,7 +520,8 @@ static ssize_t store_nfilters(struct device *d, struct device_attribute *attr,
 
 static ssize_t set_nservers(struct net_device *dev, unsigned int val)
 {
-	struct adapter *adap = dev->priv;
+	struct port_info *pi = netdev_priv(dev);
+	struct adapter *adap = pi->adapter;
 
 	if (adap->flags & FULL_INIT_DONE)
 		return -EBUSY;
@@ -556,9 +562,10 @@ static struct attribute_group cxgb3_attr_group = {.attrs = cxgb3_attrs };
 static ssize_t tm_attr_show(struct device *d, struct device_attribute *attr,
 			    char *buf, int sched)
 {
-	ssize_t len;
+	struct port_info *pi = netdev_priv(to_net_dev(d));
+	struct adapter *adap = pi->adapter;
 	unsigned int v, addr, bpt, cpt;
-	struct adapter *adap = to_net_dev(d)->priv;
+	ssize_t len;
 
 	addr = A_TP_TX_MOD_Q1_Q0_RATE_LIMIT - sched / 2;
 	rtnl_lock();
@@ -581,10 +588,11 @@ static ssize_t tm_attr_show(struct device *d, struct device_attribute *attr,
 static ssize_t tm_attr_store(struct device *d, struct device_attribute *attr,
 			     const char *buf, size_t len, int sched)
 {
+	struct port_info *pi = netdev_priv(to_net_dev(d));
+	struct adapter *adap = pi->adapter;
+	unsigned int val;
 	char *endp;
 	ssize_t ret;
-	unsigned int val;
-	struct adapter *adap = to_net_dev(d)->priv;
 
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
@@ -858,8 +866,9 @@ static void schedule_chk_task(struct adapter *adap)
 
 static int offload_open(struct net_device *dev)
 {
-	struct adapter *adapter = dev->priv;
-	struct t3cdev *tdev = T3CDEV(dev);
+	struct port_info *pi = netdev_priv(dev);
+	struct adapter *adapter = pi->adapter;
+	struct t3cdev *tdev = dev2t3cdev(dev);
 	int adap_up = adapter->open_device_map & PORT_MASK;
 	int err = 0;
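[Editor's sketch] The patch stops storing the adapter pointer directly in dev->priv and instead keeps a back-pointer inside the per-device private area returned by netdev_priv(). The following userspace model shows that layout under simplifying assumptions: the private area is allocated right behind the device struct, and every `demo_*` name is hypothetical rather than kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_adapter {
	int id;
};

/* Per-port private data; the adapter back-pointer mirrors the field the patch adds. */
struct demo_port_info {
	struct demo_adapter *adapter;
	unsigned char port_id;
};

/* Simplified device: the private area follows the struct in the same allocation. */
struct demo_netdev {
	char name[16];
	size_t priv_size;
	max_align_t priv[];	/* flexible array keeps the private area aligned */
};

/* Allocate a device with sizeof_priv bytes of zeroed private data behind it. */
static struct demo_netdev *demo_alloc_netdev(size_t sizeof_priv)
{
	struct demo_netdev *dev = calloc(1, sizeof(*dev) + sizeof_priv);

	if (dev)
		dev->priv_size = sizeof_priv;
	return dev;
}

/* Single accessor for the private area, in the spirit of netdev_priv(). */
static void *demo_netdev_priv(struct demo_netdev *dev)
{
	return dev->priv;
}

/* Reach the adapter through the port info, as the converted call sites do. */
static struct demo_adapter *demo_dev2adapter(struct demo_netdev *dev)
{
	struct demo_port_info *pi = demo_netdev_priv(dev);

	assert(dev->priv_size >= sizeof(*pi));
	return pi->adapter;
}
```

With only one private pointer per device, two different meanings for it cannot coexist, which is why every former `dev->priv` user in the diff goes through `port_info` instead.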
[PATCH 2.6.23 0/2] cxgb3 - Fix dev->priv usage
Jeff/Roland,

I'm resubmitting the cxgb3 dev->priv fix for inclusion in 2.6.23.
I also submit a follow-up patch for the iw_cxgb3 driver that fixes
the previous infiniband breakage.

Cheers,
Divy
Re: Linux 2.6.23-rc4, maxcpus=1 regression
* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> > reverting that commit makes the system boot again. I've attached the
> > .config.
>
> Did you try -rc4? Commit 813409771731d80e6fa94199adf99f2269a4afc0 in
> particular ("fix maxcpus=N parsing") was supposed to fix that commit.

ah ... indeed my tree is a few commits ahead of rc4. Checking.

	Ingo
Re: Linux 2.6.23-rc4, maxcpus=1 regression
* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> maxcpus=1 fails to boot on my T60 laptop, it hangs in early bootup
> (right after setting up the local APICs). I bisected it down to this
> recent commit:
>
> | commit 61ec7567db103d537329b0db9a887db570431ff4
> | Author: Len Brown <[EMAIL PROTECTED]>
> | Date:   Thu Aug 16 03:34:22 2007 -0400
> |
> |     ACPI: boot correctly with "nosmp" or "maxcpus=0"
>
> reverting that commit makes the system boot again. I've attached the
> .config.

i suspect it's due to this:

 -early_param("maxcpus=", maxcpus);
 +__setup("maxcpus=", maxcpus);

i'm quite sure maxcpus still needs to be an early-param.

	Ingo
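[Editor's sketch] The suspicion above rests on ordering: as I understand it, early_param handlers run in an early pass over the command line, while __setup handlers run in a later pass, so code executing between the two passes only sees values set by early handlers. The userspace model below illustrates just that ordering. It is not the kernel's parser, it makes no claim about the actual cause of this hang, and all `demo_*` names and the default of 8 CPUs are assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static unsigned int demo_max_cpus = 8;	/* hypothetical default CPU limit */

/* Handler for "maxcpus=N": store the parsed limit. */
static int demo_parse_maxcpus(const char *val)
{
	demo_max_cpus = (unsigned int)strtoul(val, NULL, 0);
	return 0;
}

struct demo_param {
	const char *prefix;
	int (*handler)(const char *val);
	int early;	/* 1: handled in the early pass, 0: in the late pass */
};

/* Run only the handlers of one phase over a space-separated command line. */
static void demo_parse_args(const char *cmdline, const struct demo_param *tbl,
			    size_t n, int early)
{
	char buf[256];
	size_t clen = strlen(cmdline);
	char *tok;

	assert(clen < sizeof(buf));
	memcpy(buf, cmdline, clen + 1);
	for (tok = strtok(buf, " "); tok; tok = strtok(NULL, " ")) {
		for (size_t i = 0; i < n; i++) {
			size_t len = strlen(tbl[i].prefix);

			if (tbl[i].early == early &&
			    !strncmp(tok, tbl[i].prefix, len))
				tbl[i].handler(tok + len);
		}
	}
}

/* Returns the CPU limit seen by code that runs between the two passes. */
static unsigned int demo_boot(const char *cmdline, int maxcpus_is_early)
{
	const struct demo_param tbl[] = {
		{ "maxcpus=", demo_parse_maxcpus, maxcpus_is_early },
	};
	unsigned int seen_by_early_code;

	demo_max_cpus = 8;
	demo_parse_args(cmdline, tbl, 1, 1);	/* early pass */
	seen_by_early_code = demo_max_cpus;	/* e.g. early platform setup */
	demo_parse_args(cmdline, tbl, 1, 0);	/* late pass */
	return seen_by_early_code;
}
```

In this model, registering the option for the late pass means early code still sees the default limit, even though the final value ends up correct once boot has progressed.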
Re: Linux 2.6.23-rc4, maxcpus=1 regression
On Wed, 29 Aug 2007, Ingo Molnar wrote:
>
> maxcpus=1 fails to boot on my T60 laptop, it hangs in early bootup
> (right after setting up the local APICs). I bisected it down to this
> recent commit:
>
> | commit 61ec7567db103d537329b0db9a887db570431ff4
> | Author: Len Brown <[EMAIL PROTECTED]>
> | Date:   Thu Aug 16 03:34:22 2007 -0400
> |
> |     ACPI: boot correctly with "nosmp" or "maxcpus=0"
>
> reverting that commit makes the system boot again. I've attached the
> .config.

Did you try -rc4? Commit 813409771731d80e6fa94199adf99f2269a4afc0 in
particular ("fix maxcpus=N parsing") was supposed to fix that commit.

		Linus
Re: Linux 2.6.23-rc4, maxcpus=1 regression
* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> maxcpus=1 fails to boot on my T60 laptop, it hangs in early bootup
> (right after setting up the local APICs). I bisected it down to this
> recent commit:

maxcpus=0 fails to boot as well.

	Ingo
Re: CFS review
* Bill Davidsen <[EMAIL PROTECTED]> wrote:

> > There is another way to show the problem visually under X
> > (vesa-driver), by starting 3 gears simultaneously, which after
> > laying them out side-by-side need some settling time before
> > smoothing out. Without __update_curr it's absolutely smooth from
> > the start.
>
> I posted a LOT of stuff using the glitch1 script, and finally found a
> set of tuning values which make the test script run smooth. See back
> posts, I don't have them here.

but you have real 3D hw and DRI enabled, correct? In that case X uses up
almost no CPU time and glxgears does most of the processing. That is
quite different from the above software-rendering case, where X spends
most of the CPU time.

	Ingo
drm: VIA invalid device IDs removal
Remove 2 invalid device ids from in-kernel drm tree.
0x1106, 0x7204 is unknown and thus is not an IGP/GPU.
0x1106, 0x3304 is K8M800 hostbridge, not an IGP/GPU.
None of them are in drm git tree.

--- a/drivers/char/drm/drm_pciids.h	2007-08-28 14:08:27.0 +0200
+++ b/drivers/char/drm/drm_pciids.h	2007-08-28 14:17:12.0 +0200
@@ -236,10 +236,8 @@
 	{0x1106, 0x3022, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0}, \
 	{0x1106, 0x3118, PCI_ANY_ID, PCI_ANY_ID, 0, 0, VIA_PRO_GROUP_A}, \
 	{0x1106, 0x3122, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0}, \
-	{0x1106, 0x7204, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0}, \
 	{0x1106, 0x7205, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0}, \
 	{0x1106, 0x3108, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0}, \
-	{0x1106, 0x3304, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0}, \
 	{0x1106, 0x3344, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0}, \
 	{0x1106, 0x3343, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0}, \
 	{0x1106, 0x3230, PCI_ANY_ID, PCI_ANY_ID, 0, 0, VIA_DX9_0}, \
Re: CFS review
Ingo Molnar wrote:
* Al Boldi <[EMAIL PROTECTED]> wrote:

ok. I think i might finally have found the bug causing this. Could you
try the fix below, does your webserver thread-startup test work any
better?

It seems to help somewhat, but the problem is still visible. Even v20.3
on 2.6.22.5 didn't help.

It does look related to ia-boosting, so I turned off __update_curr like
Roman mentioned, which had an enormous smoothing effect, but then nice
levels completely break down and lockup the system.

you can turn sleeper-fairness off via:

   echo 28 > /proc/sys/kernel/sched_features

another thing to try would be:

   echo 12 > /proc/sys/kernel/sched_features

14, and drop the granularity to 50.

(that's the new-task penalty turned off.)

Another thing to try would be to edit this:

	if (sysctl_sched_features & SCHED_FEAT_START_DEBIT)
		p->se.wait_runtime = -(sched_granularity(cfs_rq) / 2);

to:

	if (sysctl_sched_features & SCHED_FEAT_START_DEBIT)
		p->se.wait_runtime = -(sched_granularity(cfs_rq));

and could you also check 20.4 on 2.6.22.5 perhaps, or very latest -git?
(Peter has experienced smaller spikes with that.)

	Ingo

-- 
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
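[Editor's sketch] The values written to sched_features above are bitmasks. The decoder below assumes the flag values as I recall them from the 2.6.23-era CFS code; they are not stated in this thread, so treat every constant as an assumption and check it against your own tree. The `DEMO_` names are illustrative only.

```c
#include <assert.h>

/* Assumed flag values, recalled from 2.6.23-era CFS; verify against your tree. */
enum {
	DEMO_FEAT_FAIR_SLEEPERS    = 1,
	DEMO_FEAT_SLEEPER_AVG      = 2,
	DEMO_FEAT_SLEEPER_LOAD_AVG = 4,
	DEMO_FEAT_PRECISE_CPU_LOAD = 8,
	DEMO_FEAT_START_DEBIT      = 16,
	DEMO_FEAT_SKIP_INITIAL     = 32,
};

/* Returns 1 if the single-bit flag is set in the features mask, else 0. */
static int demo_feature_on(unsigned int features, unsigned int flag)
{
	assert(flag != 0 && (flag & (flag - 1)) == 0);	/* exactly one bit */
	return (features & flag) != 0;
}
```

Under these assumed values, 28 = 4 + 8 + 16 leaves the fair-sleepers bit clear, and 12 = 4 + 8 additionally clears the start-debit bit, which matches the "sleeper-fairness off" and "new-task penalty turned off" descriptions in the message.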
Re: [patch] sched: fix broken smt/mc optimizations with CFS
* Siddha, Suresh B <[EMAIL PROTECTED]> wrote:

> On Mon, Aug 27, 2007 at 12:31:03PM -0700, Siddha, Suresh B wrote:
> > Essentially I observed that nice 0 tasks still end up on two cores of same
> > package, without getting spread out to two different packages. This behavior
> > is same without this fix and this fix doesn't help in any way.
>
> Ingo, Appended patch seems to fix the issue and as far as I can test,
> seems ok to me.

thanks! I've queued your fix up for .23 merge. I've done a quick test
and it indeed seems to work well.

> This is a quick fix for .23. Peter Williams and myself plan to look at
> code cleanups in this area (HT/MC optimizations) post .23
>
> BTW, with this fix, do you want to retain the current FUZZ value?

what value would you suggest? I was thinking about using
busiest_rq->curr->load.weight instead, to always keep rotating tasks.

	Ingo
Re: CFS review
Al Boldi wrote:
Ingo Molnar wrote:
* Al Boldi <[EMAIL PROTECTED]> wrote:

The problem is that consecutive runs don't give consistent results and
sometimes stall. You may want to try that.

well, there's a natural saturation point after a few hundred tasks
(depending on your CPU's speed), at which point there's no idle time
left. From that point on things get slower progressively (and the
ability of the shell to start new ping tasks is impacted as well), but
that's expected on an overloaded system, isn't it?

Of course, things should get slower with higher load, but it should be
consistent without stalls.

To see this problem, make sure you boot into /bin/sh with the normal VGA
console (ie. not fb-console). Then try each loop a few times to show
different behaviour; loops like:

# for ((i=0; i<; i++)); do ping 10.1 -A > /dev/null & done

# for ((i=0; i<; i++)); do nice -99 ping 10.1 -A > /dev/null & done

# { for ((i=0; i<; i++)); do
        ping 10.1 -A > /dev/null &
    done } > /dev/null 2>&1

Especially the last one sometimes causes a complete console lock-up,
while the other two sometimes stall then surge periodically.

ok. I think i might finally have found the bug causing this. Could you
try the fix below, does your webserver thread-startup test work any
better?

It seems to help somewhat, but the problem is still visible. Even v20.3
on 2.6.22.5 didn't help.

It does look related to ia-boosting, so I turned off __update_curr like
Roman mentioned, which had an enormous smoothing effect, but then nice
levels completely break down and lockup the system.

There is another way to show the problem visually under X (vesa-driver),
by starting 3 gears simultaneously, which after laying them out
side-by-side need some settling time before smoothing out. Without
__update_curr it's absolutely smooth from the start.

I posted a LOT of stuff using the glitch1 script, and finally found a
set of tuning values which make the test script run smooth. See back
posts, I don't have them here.

-- 
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
Re: [Tech-board-discuss] Re: [Ksummit-2007-discuss] Re: Linux Foundation Technical Advisory Board Elections
Daniel Phillips wrote:
On Friday 24 August 2007 03:45, Theodore Tso wrote:

As I said; what's wrong with just using SPI membership? It's not
like it is remotely hard for kernel hackers to gain membership in
SPI. And somebody else takes care of the bureaucracy for you.

Given the huge overlap between SPI membership and Debian membership,
and then taking a look at the craziness that takes place on various
Debian mailing lists, such as but not limited to debian-legal, I'm
quite convinced that this would be a baad idea.

Hi Ted,

Ever watched a legislative assembly at work? A bad idea perhaps, but
the best that has been discovered so far.

Given that there is already some charter that says KS attendees vote...
isn't it best to retain that? Directives from above aside, you need
specifications on how to change voting procedure before changing it, no?
If those don't exist, then something vaguely similar in my country would
require a referendum I think.

Hasn't the KS committee / TAB board vote rigging conspiracy theory been
raised yet? Given they're not running a country, it would be great fun
to see the board getting corrupted and go off the rails ;) I'd vote for
them because if Ted has anything to do with it, I *know* we'll be having
KS in Hawaii ;)

--
Re: NFS woes again
On Tue, Aug 28, 2007 at 09:28:43AM -0400, Trond Myklebust wrote:
> Doh! I see the problem: cancel_delayed_work_sync() shouldn't ever be
> called recursively.
>
> The following patch should be correct. Please just discard the previous
> one...

So far so good. This patch got one hour uptime... I'll stay with this
kernel for a few days, to keep an eye on it.

Thanks,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163
Re: [Tech-board-discuss] Re: [Ksummit-2007-discuss] Re: Linux Foundation Technical Advisory Board Elections
On Tue, Aug 28, 2007 at 03:59:09PM -0700, Daniel Phillips wrote:
> Ever watched a legislative assembly at work? A bad idea perhaps, but
> the best that has been discovered so far.

Sure, but a Debian mailing list where fanatics who have no job, no
life, but huge amounts of free time to post literally hundreds of
messages a day indulging in Debian's "last post wins" style of
argumentation have far more power to influence the decision making
process than those who have to work at a real job has very little in
common with a legislative assembly.

That's why any kind of election for the TAB should happen, IMHO, in
"real space", at some conference where there is a gross filter of
people being able to afford travel expenses or be paid by some company
for their expenses (thus showing that someone felt that they were
doing enough good work that they should be given the resources to pay
for travel expenses and the conference registration fees).

If that's an elitist attitude; I plead guilty --- Linux and OSS is
*not* a democracy. Linus doesn't obey the whims of majority voting to
decide which patches to accept or reject. The Linux kernel community
is very much a meritocracy, which is why I don't believe that some
kind of pure democracy such as using the SPI voting membership is the
right thing for electing the TAB.

Just remember, in the United States, a democracy where around 50% of
Americans believe that Saddam Hussein was personally responsible for
9/11 elected George W. Bush to the US presidency. It's statistics like
that which make you want to impose some kind of competency test on who
is allowed to vote.

The kernel summit is one such place where we can hold such a vote, and
if people thought that a BOF at some conference like Linux.conf.au or
OLS would be a better place, those might be other alternatives.

I'll note that most of this discussion is mostly moot, though, given
that at this point we have 5 candidates for 5 slots, for positions
which are really more about service than about any kind of power or
benefits.

						- Ted
Re: oops at sr_block_release [Re: 2.6.23-rc3-mm1]
On Tue, 28 Aug 2007 13:32:57 +0200 Jiri Slaby <[EMAIL PROTECTED]> wrote: > Andrew Morton napsal(a): > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc3/2.6.23-rc3-mm1/ > > I got this during gxine initialization of ocko.tv live stream without any cd > in > cdroms: > > BUG: unable to handle kernel NULL pointer dereference at virtual address > 005c > printing eip: f88fbe7a *pde = > Oops: [#1] SMP > Modules linked in: ath5k arc4 ecb blkcipher cryptomgr crypto_algapi > rc80211_simple mac80211 cfg80211 nls_cp437 vfat fat usb_storage tun ipv6 > floppy > parport_pc parport ohci1394 ieee1394 usbhid sr_mod ehci_hcd cdrom ff_memless > > Pid: 2809, comm: hald-addon-stor Not tainted (2.6.23-rc3-mm1 #315) > EIP: 0060:[] EFLAGS: 00010246 CPU: 1 > EIP is at sr_block_release+0xb/0x2c [sr_mod] > EAX: EBX: ECX: f88fbe6f EDX: > ESI: c21c36c0 EDI: c289a780 EBP: c3729f18 ESP: c3729f10 > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 > Process hald-addon-stor (pid: 2809, ti=c3729000 task=c1c2be40 > task.ti=c3729000) > Stack: c21c36c0 c3729f38 c018d7ad c21c36cc c1f9ff80 c21c3730 c21c36c0 >c2a6ada0 dcbb3f80 c3729f40 c018d7dc c3729f4c c018e103 0010 c3729f74 >c016bc5f c217fa80 c1f9ff80 c2a6ada0 dcbb3f80 c1cc6900 > Call Trace: > [] show_trace_log_lvl+0x1a/0x30 > [] show_stack_log_lvl+0xa5/0xca > [] show_registers+0xd0/0x1c1 > [] die+0x10a/0x24d > [] do_page_fault+0x496/0x608 > [] error_code+0x72/0x78 > [] __blkdev_put+0x125/0x14a > [] blkdev_put+0xa/0xc > [] blkdev_close+0x29/0x2c > [] __fput+0xa6/0x161 > [] fput+0x22/0x3b > [] filp_close+0x41/0x67 > [] sys_close+0x60/0x9f > [] syscall_call+0x7/0xb > === > Code: 0c 81 c3 4c 01 00 00 89 5c 24 08 89 44 24 04 c7 04 24 88 cd 8f f8 e8 99 > 84 > 82 c7 e9 04 fe ff ff 55 89 e5 56 53 8b 80 04 01 00 00 <8b> 40 5c 8b 70 3c 8d > 46 > 18 e8 cf f6 fe ff 89 c3 85 c0 75 07 89 > EIP: [] sr_block_release+0xb/0x2c [sr_mod] SS:ESP 0068:c3729f10 > Possibly due to remove-bdput-from-do_open-in-fs-block_devc.patch. 
That patch is "wrong" and I think the problem which it attempts to address
actually lies in the cdrom code. viro was taking a look at it but appears
to have recoiled in horror.

I'll drop remove-bdput-from-do_open-in-fs-block_devc.patch so let's just
watch out for any reoccurrence, thanks.
Re: [Tech-board-discuss] Re: [Ksummit-2007-discuss] Re: Linux Foundation Technical Advisory Board Elections
On Tue, Aug 28, 2007 at 07:18:36PM -0700, Daniel Walker wrote:
> Just out of curiosity , have you had anyone nominate a really really
> large group ? Like say, anyone that has ever sent an email to lkml ?

Nope; I suspect someone who did that would just be ignored by the
program committee. We might publicly mock someone who did that, just
to discourage that kind of behavior, but it wouldn't be a
particularly effective denial of service attack, precisely because the
program committee has discretion about how to handle that sort of
thing.

There have been people nominating 5-10 people in previous years, and
in general the set of people that were nominated overlapped with
suggestions made by others --- and that's the process working as it's
supposed to. But that's not a "really, really large group".

						- Ted
Re: [PATCH 1/2] NBD: set uninitialized devices to size 0
Andrew Morton wrote:
On Fri, 24 Aug 2007 13:06:39 -0400 Paul Clements <[EMAIL PROTECTED]> wrote:

This fixes errors with utilities (such as LVM's vgscan) that try to scan
all devices. Previously this would generate read errors when
uninitialized nbd devices were scanned:

I somewhat randomly marked both these as 2.6.24 material. If you think
that was incorrect, please shout out.

I have the feeling that I mentioned nbd issues several releases ago, but
never got to getting more info on reproducing them. I try not to submit
bugs I can't reproduce, often they're my fault :-(

-- 
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
Re: [linux-pm] [RFC][PATCH 0/2 -mm] kexec based hibernation
Huang, Ying wrote:
On Mon, 2007-08-27 at 09:28 +0800, Hu, Fenghua wrote:

One quick question is, can it improve hibernation/wakeup time?

In general, for kexec based hibernation, what increases
hibernation/wakeup time:

- One extra Linux boot is needed to hibernate and wakeup.

What decreases hibernation/wakeup time:

- Most hibernation/wakeup work is done in full functional user space
  program, so it is possible to do some optimization, such as parallel
  compression.

- It does not have to reclaim pagecache before suspend?
- It does not have to restore working set afterwards?

(You could do this to reduce image size, of course, but it can be
optional which is nice).

So, I think the kexec based hibernation may be slower than original
implementation in general.

In this prototype implementation, the hibernation/wakeup time is much
longer than original hibernation/wakeup implementation. But it has much
to be optimized and I think it can approach the speed of the original
implementation after optimization.

Also, don't just look at the time to do a simple suspend/resume cycle,
but the full cost of going from working state to working state (eg. grep
a kernel tree or two!).

Although the kexec details are out of my league, I really like
everything about the concept :) Nice work.

-- 
SUSE Labs, Novell Inc.
Re: [Tech-board-discuss] Re: [Ksummit-2007-discuss] Re: Linux Foundation Technical Advisory Board Elections
On Tue, 2007-08-28 at 22:18 -0400, Theodore Tso wrote:
> On Mon, Aug 27, 2007 at 02:12:56PM +0200, Jes Sorensen wrote:
> > Yes, as well as 12 committee members, of which 5 didn't even comply with
> > their own git commit requirement last time I checked.
>
> Note that the git commit metric is not a "requirement", but a way of
> seeding the list of people to be considered. The current selection
> process is that we *start* with that list, and then accept nominations
> from anyone for anyone (including self-nominations) that should be
> considered that weren't automatically included by the git selection
> criteria.

Just out of curiosity , have you had anyone nominate a really really
large group ? Like say, anyone that has ever sent an email to lkml ?

Daniel
Re: [Tech-board-discuss] Re: [Ksummit-2007-discuss] Re: Linux Foundation Technical Advisory Board Elections
On Mon, Aug 27, 2007 at 02:12:56PM +0200, Jes Sorensen wrote:
> Yes, as well as 12 committee members, of which 5 didn't even comply with
> their own git commit requirement last time I checked.

Note that the git commit metric is not a "requirement", but a way of
seeding the list of people to be considered. The current selection
process is that we *start* with that list, and then accept nominations
from anyone for anyone (including self-nominations) that should be
considered that weren't automatically included by the git selection
criteria.

						- Ted
Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.
"H. Peter Anvin" <[EMAIL PROTECTED]> writes: > Eric W. Biederman wrote: >> Christoph Hellwig <[EMAIL PROTECTED]> writes: >> >>> Umm, no way we're ever going to remove a syscall like this. >> >> If someone besides me cares about more then rhetoric I will be happy >> to reconsider and several years is plenty of time to find that out. >> >> I aborted the removal last time precisely because we had not done an >> adequate job of warning our users. A printk when we run a program >> that uses the binary interface and an long enough interval the warning >> makes it to the Enterprise kernels before we remove the interface >> should be sufficient. >> > > glibc uses it, and it uses it in contexts where access to the filesystem isn't > functional (e.g. in chroot.) Yes. But (a) It doesn't affect correctness what answer it gets back. (b) It should be using uname. Or are you thinking about something besides the pthreads usage? Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Understanding I/O behaviour - next try
On Tue, Aug 28, 2007 at 08:53:07AM -0700, Martin Knoblauch wrote:
[...]
> The basic setup is a dual x86_64 box with 8 GB of memory. The DL380
> has a HW RAID5, made from 4x72GB disks and about 100 MB write cache.
> The performance of the block device with O_DIRECT is about 90 MB/sec.
>
> The problematic behaviour comes when we are moving large files through
> the system. The file usage in this case is mostly "use once" or
> streaming. As soon as the amount of file data is larger than 7.5 GB, we
> see occasional unresponsiveness of the system (e.g. no more ssh
> connections into the box) of more than 1 or 2 minutes (!) duration
> (kernels up to 2.6.19). Load goes up, mainly due to pdflush threads and
> some other poor guys being in "D" state.
[...]
> Just by chance I found out that doing all I/O in sync-mode does
> prevent the load from going up. Of course, I/O throughput is not
> stellar (but not much worse than the non-O_DIRECT case). But the
> responsiveness seems OK. Maybe a solution, as this can be controlled via
> mount (would be great for O_DIRECT :-).
>
> In general 2.6.22 seems to be better than 2.6.19, but this is highly
> subjective :-( I am using the following settings in /proc. They seem to
> provide the smoothest responsiveness:
>
> vm.dirty_background_ratio = 1
> vm.dirty_ratio = 1
> vm.swappiness = 1
> vm.vfs_cache_pressure = 1

You are apparently running into the sluggish kupdate-style writeback problem with large files: a huge amount of dirty pages accumulates and is flushed to the disk all at once when the dirty background ratio is reached. The current -mm tree has some fixes for it, and there are some more in my tree. Martin, I'll send you the patch if you'd like to try it out.

> Another thing I saw during my tests is that when writing to NFS, the
> "dirty" or "nr_dirty" numbers are always 0. Is this a conceptual thing,
> or a bug?

What are the nr_unstable numbers?
Fengguang
Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.
Eric W. Biederman wrote:
> Christoph Hellwig <[EMAIL PROTECTED]> writes:
>
>> Umm, no way we're ever going to remove a syscall like this.
>
> If someone besides me cares about more than rhetoric I will be happy
> to reconsider and several years is plenty of time to find that out.
>
> I aborted the removal last time precisely because we had not done an
> adequate job of warning our users. A printk when we run a program
> that uses the binary interface and a long enough interval that the warning
> makes it to the Enterprise kernels before we remove the interface
> should be sufficient.

glibc uses it, and it uses it in contexts where access to the filesystem isn't functional (e.g. in chroot.)

	-hpa
[patch] i386, apic: fix 4 bit apicid assumption of mach-default
Andi/Andrew,

Can you pick this up for your trees and, if there are no issues, please push it to mainline before .23 gets released? We have seen a boot failure with fewer cpu sockets populated on an MP platform. A similar problem can happen on a fully populated system if the number of cpus is <= 8 and any of the apic ids is >= 16.

thanks,
suresh
---

Fix get_apic_id() in mach-default, so that it uses 8 bits in the xAPIC case and 4 bits in the legacy APIC case.

This fixes the i386 kernel assumption that the apic id is less than 16 for xAPIC platforms with 8 cpus or less and makes the kernel boot on such platforms.

Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]>
---

diff --git a/include/asm-i386/mach-default/mach_apicdef.h b/include/asm-i386/mach-default/mach_apicdef.h
index 7bcb350..ae98413 100644
--- a/include/asm-i386/mach-default/mach_apicdef.h
+++ b/include/asm-i386/mach-default/mach_apicdef.h
@@ -1,11 +1,17 @@
 #ifndef __ASM_MACH_APICDEF_H
 #define __ASM_MACH_APICDEF_H
 
+#include
+
 #define		APIC_ID_MASK		(0xF<<24)
 
 static inline unsigned get_apic_id(unsigned long x)
 {
-	return (((x)>>24)&0xF);
+	unsigned int ver = GET_APIC_VERSION(apic_read(APIC_LVR));
+	if (APIC_XAPIC(ver))
+		return (((x)>>24)&0xFF);
+	else
+		return (((x)>>24)&0xF);
 }
 
 #define		GET_APIC_ID(x)	get_apic_id(x)
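For readers following along outside a kernel tree, the masking change in this patch can be sketched as a small standalone function. This is only an illustration under assumptions: `is_xapic()` and `apic_id_from_reg()` are hypothetical names, and treating versions of 0x14 and above as xAPIC is an assumption made here about what APIC_XAPIC() tests, not something the message states.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's APIC_XAPIC() check.
 * Assumption: version values of 0x14 and above denote an xAPIC.
 */
static int is_xapic(unsigned int version)
{
	return version >= 0x14;
}

/*
 * Extract the APIC ID from the raw ID register value.
 * The ID field starts at bit 24: a legacy APIC uses 4 bits of it,
 * an xAPIC uses 8 bits.
 */
static unsigned int apic_id_from_reg(uint32_t reg, unsigned int version)
{
	if (is_xapic(version))
		return (reg >> 24) & 0xFF;
	return (reg >> 24) & 0xF;
}
```

With a register value of 0x21000000, the 8-bit path yields 0x21 while the 4-bit path truncates it to 0x1, which is the kind of truncation the patch description attributes the boot failure to.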
Re: NFSv4 client OOPS on 2.6.22-rc3 - I meant 2.6.23-rc3
On Wed, 2007-08-29 at 01:41 +0200, Michal Piotrowski wrote:
> Hi Harry,
>
> On 28/08/07, Harry Edmon <[EMAIL PROTECTED]> wrote:
> > Typo in my last message - I meant 2.6.23-rc3, not 2.6.22-rc3. Here it
> > is again with correction
> >
> > I had a kernel oops on my x86_64 dual quad-core Xeon system running
> > 2.6.23-rc3. The system is an NFSv4 client to another 2.6.23-rc3
> > system. The OOPS text is attached and the config file.
> >
>
> Is this a regression? Does 2.6.22 work fine?

Yes and yes. It is due to a typo when I was working on correcting the NFSv4 open() state tracking in 2.6.23-rc1. A patch is available and I'm planning on merging it soon.

Trond
miss list_del(), bug ?
Shouldn't this code also do a list_del(e)?

in drivers/infiniband/core/iwcm.c:

static void dealloc_work_entries(struct iwcm_id_private *cm_id_priv)
{
	struct list_head *e, *tmp;

	list_for_each_safe(e, tmp, &cm_id_priv->work_free_list)
		kfree(list_entry(e, struct iwcm_work, free_list));
}
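The pattern being asked about can be sketched in userspace with a minimal circular doubly-linked list. Everything below (`struct node`, `struct work`, `push_work()`, `free_all()`) is a hypothetical stand-in for the kernel's list.h and the iwcm structures, not the driver's code. The sketch unlinks each entry before freeing it, so the head ends up as a valid empty list. Whether the driver actually needs that depends on whether the list head is touched again afterwards, which the message above does not settle.

```c
#include <stddef.h>
#include <stdlib.h>

/* Minimal circular doubly-linked list, loosely modelled on list.h. */
struct node {
	struct node *next, *prev;
};

/* Hypothetical payload; "free_list" mirrors the member name in the mail. */
struct work {
	int id;
	struct node free_list;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct node *head)
{
	head->next = head->prev = head;
}

static void list_add(struct node *n, struct node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Allocate one entry and link it in; returns 0 on success, -1 on failure. */
static int push_work(struct node *head, int id)
{
	struct work *w = malloc(sizeof(*w));

	if (!w)
		return -1;
	w->id = id;
	list_add(&w->free_list, head);
	return 0;
}

/*
 * Free every entry. The successor is saved in tmp before the current
 * entry is released, which is what makes the walk "safe". Unlinking
 * first leaves head as an empty list rather than pointing at freed
 * memory. Returns the number of entries freed.
 */
static int free_all(struct node *head)
{
	struct node *e, *tmp;
	int freed = 0;

	for (e = head->next, tmp = e->next; e != head; e = tmp, tmp = e->next) {
		list_del(e);
		free(container_of(e, struct work, free_list));
		freed++;
	}
	return freed;
}
```

Without the list_del() call the loop would still free every entry, because tmp is read before the free; the difference is only in what the head points at once the loop is done.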
RE: [Lguest] [kvm-devel] [RFC] 9p: add KVM/QEMU pci transport
>> >> Nice driver. I'm hoping we can do a virtio driver using a similar >> concept. >> >> > +#define PCI_VENDOR_ID_9P 0x5002 >> > +#define PCI_DEVICE_ID_9P 0x000D >> >> Where do these numbers come from? Can we be sure they don't conflict >with >> actual hardware? > >I stole the VENDOR_ID from kvm's hypercall driver. There are no any >guarantees that it doesn't conflict with actual hardware. As it was >discussed before, there is still no ID assigned for the virtual >devices. Currently 5002 does not registered to Qumranet nor KVM. We will do something about it pretty soon. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] : mm : / Patch / code : Suggestion :snip kswapd _page_from_freelist() : No more no page failures. (WHY????)
Mitchell Erblich wrote:

Nick Piggin wrote:

Nick Piggin, et al,

First diffs would generate a lot of noise, since I rip and insert a lot of code based on whether I think the code is REALLY needed for MY TEST environment. These suggestions are basically minimal merge suggestions between my development envir and the public Linux tree.

That's OK. So long as the patch is against a well known tree, it is just less ambiguous even if it doesn't actually compile :)

Now the why for this SUGGESTION/PATCH...

When we're in the (min,low) watermark range, we'll wake up kswapd _before_ allocating anything, so what is better about the change to wake up kswapd after allocating? Can you perhaps come up with an example situation also to make this more clear?

Answer

Will GFP_ATOMIC alloc be failing at that point? If yes, then why not allow kswapd attempt to prevent this condition from occurring? The existing code reads that the first call to get_page_from_freelist() has returned no page. Now you are going to start up something that is at best going to take millisecs to start helping out. Won't it first grab some pages to do its work? So we are going to be lower in free memory right when it starts up. Right?

GFP_ATOMIC will not be failing at this point (also, kswapd could probably have reclaimed several hundred or thousand pages in 1ms, but that's beside the point -- we do have correct buffering here).

The watermarks go roughly like this:

high  -- kswapd stops reclaiming
low   -- kswapd is started by any allocation, nothing else happens
min   -- non-GFP_ATOMIC can't go below this point; enter direct reclaim
min/X -- GFP_ATOMIC allocations fail below this point
0     -- PF_MEMALLOC fails.

--
SUSE Labs, Novell Inc.
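The watermark ladder above can be restated as a small decision function. This is a rough sketch of the description in the mail, not the page allocator's code: the names `struct watermarks`, `atomic_divisor` (the "X" in min/X), and `classify()` are invented here, and how the exact boundary values are treated is an assumption.

```c
/* Possible outcomes for one allocation attempt in this sketch. */
enum alloc_action {
	ALLOC_OK,             /* plenty of memory, nothing special happens */
	ALLOC_WAKE_KSWAPD,    /* allocation proceeds, kswapd gets woken */
	ALLOC_DIRECT_RECLAIM, /* caller has to reclaim memory itself */
	ALLOC_FAIL            /* allocation fails */
};

/* Hypothetical container for the thresholds named in the mail. */
struct watermarks {
	unsigned long min;
	unsigned long low;
	unsigned long high;           /* where kswapd stops; unused below */
	unsigned long atomic_divisor; /* the "X" in min/X, assumed nonzero */
};

/*
 * Classify an allocation attempt given the current free page count.
 * "atomic" is nonzero for a GFP_ATOMIC-style request.
 */
static enum alloc_action classify(unsigned long free_pages,
				  const struct watermarks *wm, int atomic)
{
	if (free_pages > wm->low)
		return ALLOC_OK;
	if (free_pages > wm->min)
		return ALLOC_WAKE_KSWAPD;
	if (!atomic)
		return ALLOC_DIRECT_RECLAIM;
	if (free_pages > wm->min / wm->atomic_divisor)
		return ALLOC_WAKE_KSWAPD;
	return ALLOC_FAIL;
}
```

With min = 100, low = 200 and X = 4, an atomic request at 80 free pages still succeeds while a normal request at the same level enters direct reclaim, which matches the buffering argument made in the reply.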
RE: [kvm-devel] [RFC] 9p: add KVM/QEMU pci transport
>> > This adds a shared memory transport for a synthetic 9p device for >> > paravirtualized file system support under KVM/QEMU. >> >> Nice driver. I'm hoping we can do a virtio driver using a similar >> concept. >> > >Yes. I'm looking at the patches from Dor now, it should be pretty >straight forward. The PCI is interesting in its own right for other >(non-virtual) projects we've been playing with > > -eric Great, we can add lots of pci bus shared functionality into the kvm_pci_bus.c --Dor - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Crash report 2.6.22.5
On 8/28/07, Michal Piotrowski <[EMAIL PROTECTED]> wrote:
> Hi Pete,
>
> On 28/08/07, Pete Monroe <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > Sorry there's not more to go on here.
> >
> > A 32-bit firewall running the kernel LVS virtual server to fan out to
> > a dozen webservers ran fine for a year using 2.6.17.13, but won't
> > last more than four hours or so with 2.6.22.5. Another server,
> > different hardware and vendor but same purpose, also crashed with
> > 2.6.22.5 after a few hours. It had previously run 2.6.20.11. Nothing
> > on the screen, nothing in the logs.
> >
> > I'm attaching zipped dmesg (both kernel versions),
>
> Could you capture the bug with serial/netconsole etc.?

The servers are remote, production servers and it's a PITA when they crash. But I'll see what I can do. Thanks for the pointer.

-- Pete

>
> "Collecting kernel messages"
> http://www.stardust.webpages.pl/files/handbook/handbook-en-0.3-rc1.pdf
> for more info.
Re: [PATCH -rt 1/8] introduce PICK_FUNCTION
On Wed, 2007-08-29 at 09:44 +1000, Nick Piggin wrote:
> Daniel Walker wrote:
> > PICK_FUNCTION() is similar to the other PICK_OP style macros, and was
> > created to replace them all. I used variable argument macros to handle
> > PICK_FUNC_2ARG/PICK_FUNC_1ARG. Otherwise the macros are similar to the
> > original macros used for semaphores. The entire system is used to do a
> > compile time switch between two different locking APIs. For example,
> > real spinlocks (raw_spinlock_t) and mutexes (or sleeping spinlocks).
> >
> > This new macro replaces all the duplication from lock type to lock type.
> > The result of this patch, and the next two, is a fairly nice simplification,
> > and consolidation. Although the seqlock changes are larger than the
> > originals
> > I think over all the patchset is worth while.
> >
> > Incorporated peterz's suggestion to not require TYPE_EQUAL() to only
> > use pointers.
>
> How come this is cc'ed to lkml? Is it something that is relevant to
> the mainline kernel... or?

The real time changes are usually developed on lkml; that's how it's been in the past. I personally like CC'ing lkml since real time can sometimes touch lots of different subsystems, so it's good to have a diverse set of people reviewing.

Daniel
Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.
Christoph Hellwig <[EMAIL PROTECTED]> writes:

> Umm, no way we're ever going to remove a syscall like this.

If someone besides me cares about more than rhetoric I will be happy to reconsider, and several years is plenty of time to find that out.

I aborted the removal last time precisely because we had not done an adequate job of warning our users. A printk when we run a program that uses the binary interface, and a long enough interval that the warning makes it to the Enterprise kernels before we remove the interface, should be sufficient.

> stop this deprecation crap. Just make sure no one adds more binary
> sysctls.

The sysctl_check_table function should keep out most of the problem cases and especially it should ensure we don't add any new binary sysctls by accident.

However, given our atrocious record at catching these kinds of problems via code review and testing, and the fact that no one uses these things anyway, I don't see an argument for keeping dead code in the kernel.

Over the long term the goal is to not break user space binaries. I see a better chance of achieving that goal if we remove interfaces that no known user space applications use, in a way a well written application can handle, than if we let the user space interface code succumb to bit rot and start returning the wrong values to user space.

That is where we are at with sys_sysctl. Almost all of the binary paths have no known users and the implementations are succumbing to bit rot. The binary interface and the proc interface go through two completely separate paths, so there is little to ensure those paths don't diverge over time. It is also true that the non-generic helper functions are diverging over time.

Currently these things are not an issue because no one actually uses the binary interfaces. The empirical evidence seems overwhelming on this point.

So just freezing us at our current set of non-broken binary sysctls does not seem sufficient to ensure we don't break user space binaries, although it does seem to be a good start.

Eric
Re: Crash report 2.6.22.5
Hi Pete,

On 28/08/07, Pete Monroe <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Sorry there's not more to go on here.
>
> A 32-bit firewall running the kernel LVS virtual server to fan out to
> a dozen webservers ran fine for a year using 2.6.17.13, but won't
> last more than four hours or so with 2.6.22.5. Another server,
> different hardware and vendor but same purpose, also crashed with
> 2.6.22.5 after a few hours. It had previously run 2.6.20.11. Nothing
> on the screen, nothing in the logs.
>
> I'm attaching zipped dmesg (both kernel versions),

Could you capture the bug with serial/netconsole etc.?

"Collecting kernel messages"
http://www.stardust.webpages.pl/files/handbook/handbook-en-0.3-rc1.pdf
for more info.

> .config and lspci
> -v output for one of the machines, a Dell Intel dual-Xeon box. The
> other machine is a dual Athlon box. Both use SCSI drives (the
> attached Dell uses MPT Fusion, the other one Adaptec.) Intel ethernet
> on both.
>
> I did enable the Slub allocator in 2.6.22.5, figuring that if it is
> going to be the default in 2.6.23 that it's probably solid in .22.5.
>
> PLMK if any more info would be useful.
>
> Thanks,
> Pete
>
>

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/
Re: [patch 01/28] Fall back on interrupt disable in cmpxchg8b on 80386 and 80486
Mathieu Desnoyers wrote: * Nick Piggin ([EMAIL PROTECTED]) wrote: Mathieu Desnoyers wrote: Q: What's the reason to have cmpxchg64_local on 32 bit architectures? Without that need all this would just be a few simple defines. A: cmpxchg64_local on 32 bits architectures takes unsigned long long parameters, but cmpxchg_local only takes longs. Since we have cmpxchg8b to execute a 8 byte cmpxchg atomically on pentium and +, it makes sense to provide a flavor of cmpxchg and cmpxchg_local using this instruction. Also, for 32 bits architectures lacking the 64 bits atomic cmpxchg, it makes sense _not_ to define cmpxchg64 while cmpxchg could still be available. Moreover, the fallback for cmpxchg8b on i386 for 386 and 486 is a different case than cmpxchg (which is only required for 386). Using different code makes this easier. However, cmpxchg64_local will be emulated by disabling interrupts on all architectures where it is not supported atomically. Therefore, we *could* turn cmpxchg64_local into a cmpxchg_local, but it would make the 386/486 fallbacks ugly, make its design different from cmpxchg/cmpxchg64 (which really depends on atomic operations and cannot be emulated) and require the __cmpxchg_local to be expressed as a macro rather than an inline function so the parameters would not be fixed to unsigned long long in every case. So I think cmpxchg64_local makes sense there, but I am open to suggestions. Every new thing like this (especially 64 bit operation on 32 bit architectures) adds a tiny bit more burden for maintainers. Are there any callers? If not, don't add it. It's simple to add if we do get a good reason. I am actually using it in LTTng in my timestamping code. I use it to work around CPUs with asynchronous TSCs. I need to update 64 bits values atomically on this 32 bits architecture. I plan to submit this timestamping code soon. OK fair enough. 
So long as there is a user (and you are sure said user is going to get upstream -- sometimes it is easier to put this patchset in with the one that is going to call it, but OTOH that can turn people off reviewing).

--
SUSE Labs, Novell Inc.
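The interrupt-disable fallback discussed in this thread can be sketched in a few lines. This is a userspace illustration only: `fake_irq_save()` and `fake_irq_restore()` are hypothetical placeholders for the real interrupt disabling and restoring, which cannot be done from user code, and `cmpxchg64_local_emulated()` is a name invented here rather than the function from the patch.

```c
#include <stdint.h>

/* Placeholder bookkeeping standing in for the CPU's interrupt state. */
static int fake_irq_depth;

/* Hypothetical stand-in for disabling interrupts on the local CPU. */
static void fake_irq_save(void)
{
	fake_irq_depth++;
}

/* Hypothetical stand-in for restoring the saved interrupt state. */
static void fake_irq_restore(void)
{
	fake_irq_depth--;
}

/*
 * Emulated 64-bit compare-and-exchange that is only atomic with
 * respect to the local CPU: with interrupts off, nothing on this CPU
 * can run between the read and the conditional write. Returns the
 * value found at *ptr; the store happened only if it equals old.
 */
static uint64_t cmpxchg64_local_emulated(volatile uint64_t *ptr,
					 uint64_t old, uint64_t new_val)
{
	uint64_t prev;

	fake_irq_save();
	prev = *ptr;
	if (prev == old)
		*ptr = new_val;
	fake_irq_restore();
	return prev;
}
```

This also suggests why the thread treats the _local variant as emulable while plain cmpxchg64 is not: the trick gives no protection against another CPU touching the same word.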
Re: [07/36] Use page_cache_xxx in mm/filemap_xip.c
Christoph Hellwig wrote:
> On Tue, Aug 28, 2007 at 09:49:38PM +0200, Jörn Engel wrote:
>> On Tue, 28 August 2007 12:05:58 -0700, [EMAIL PROTECTED] wrote:
>>> - index = *ppos >> PAGE_CACHE_SHIFT;
>>> - offset = *ppos & ~PAGE_CACHE_MASK;
>>> + index = page_cache_index(mapping, *ppos);
>>> + offset = page_cache_offset(mapping, *ppos);
>>
>> Part of me feels inclined to merge this patch now because it makes the
>> code more readable, even if page_cache_index() is implemented as
>> #define page_cache_index(mapping, pos) ((pos) >> PAGE_CACHE_SHIFT)
>>
>> I know there is little use in yet another global search'n'replace
>> wankfest and Andrew might wash my mouth just for mentioning it. Still,
>> hard to dislike this part of your patch.
>
> Yes, I suggested that before. Andrew seems to somehow hate this
> patchset, but even if we don't get it in, the lowercase macros are much,
> much better than the current PAGE_CACHE_* confusion.

I don't mind the change either. The open coded macros are very recognisable, but it isn't hard to have a typo and get one slightly wrong.

If it goes upstream now it wouldn't have the mapping argument though, would it? Or the need to replace PAGE_CACHE_SIZE I guess.

--
SUSE Labs, Novell Inc.
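As a rough illustration of what the two helpers compute, here is a standalone sketch with a fixed page size and without the mapping argument. The `SKETCH_` constants, the 4096-byte page size, and the `sketch_` function names are assumptions chosen for the example, not the definitions from the patchset.

```c
/* Assumed page geometry for this example: 4096-byte pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)

/* Index of the page that contains byte position pos. */
static inline unsigned long long sketch_page_cache_index(unsigned long long pos)
{
	return pos >> SKETCH_PAGE_SHIFT;
}

/* Byte offset of pos within its page. */
static inline unsigned long sketch_page_cache_offset(unsigned long long pos)
{
	return (unsigned long)(pos & (SKETCH_PAGE_SIZE - 1));
}
```

For a position of 3 * 4096 + 17 this gives index 3 and offset 17, the same split the open coded shift and mask perform in the removed lines of the quoted diff.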
Re: [PATCH -rt 1/8] introduce PICK_FUNCTION
Daniel Walker wrote:
> PICK_FUNCTION() is similar to the other PICK_OP style macros, and was
> created to replace them all. I used variable argument macros to handle
> PICK_FUNC_2ARG/PICK_FUNC_1ARG. Otherwise the macros are similar to the
> original macros used for semaphores. The entire system is used to do a
> compile time switch between two different locking APIs. For example,
> real spinlocks (raw_spinlock_t) and mutexes (or sleeping spinlocks).
>
> This new macro replaces all the duplication from lock type to lock type.
> The result of this patch, and the next two, is a fairly nice simplification,
> and consolidation. Although the seqlock changes are larger than the originals
> I think over all the patchset is worth while.
>
> Incorporated peterz's suggestion to not require TYPE_EQUAL() to only
> use pointers.

How come this is cc'ed to lkml? Is it something that is relevant to the mainline kernel... or?

--
SUSE Labs, Novell Inc.
Re: Linux 2.6.23-rc4: BAD regression
Len? Should we just revert it? That commit has been very painful. First it lost all registration of the query methods, and now this.

Daniel - can we please have a before/after dmesg on your machine, preferably with ACPI debugging enabled? And for ACPI stuff, it usually does help to fill in a bugzilla entry, since the ACPI people actually do track things there...

		Linus

On Wed, 29 Aug 2007, Daniel Ritz wrote:
>
> tried that one on my old toshiba tecra 8000 laptop, almost killing it.
> the fan doesn't work any more...type 'make' and see the box dying.
> luckily my CPU doesn't commit suicide...bisected it to that one:
>
> cd8c93a4e04dce8f00d1ef3a476aac8bd65ae40b is first bad commit
> commit cd8c93a4e04dce8f00d1ef3a476aac8bd65ae40b
> Author: Alexey Starikovskiy <[EMAIL PROTECTED]>
> Date: Fri Aug 3 17:52:48 2007 -0400
>
> ACPI: EC: If ECDT is not found, look up EC in DSDT.
>
> Some ASUS laptops access EC space from device _INI methods, but do not
> provide ECDT for early EC setup. In order to make them function properly,
> there is a need to find EC is DSDT before any _INI is called.
>
> Similar functionality was turned on by acpi_fake_ecdt=1 command line
> before. Now it is on all the time.
>
> http://bugzilla.kernel.org/show_bug.cgi?id=8598
>
> Signed-off-by: Alexey Starikovskiy <[EMAIL PROTECTED]>
> Signed-off-by: Len Brown <[EMAIL PROTECTED]>
>
Re: NFSv4 client OOPS on 2.6.22-rc3 - I meant 2.6.23-rc3
Hi Harry,

On 28/08/07, Harry Edmon <[EMAIL PROTECTED]> wrote:
> Typo in my last message - I meant 2.6.23-rc3, not 2.6.22-rc3. Here it
> is again with correction
>
> I had a kernel oops on my x86_64 dual quad-core Xeon system running
> 2.6.23-rc3. The system is an NFSv4 client to another 2.6.23-rc3
> system. The OOPS text is attached and the config file.
>

Is this a regression? Does 2.6.22 work fine?

--
LOG
http://www.stardust.webpages.pl/log/
Re: 2.6.23-rc4: maxcpus still broken
Hi Alexey,

On 28/08/07, Alexey Dobriyan <[EMAIL PROTECTED]> wrote:
> Every time I try to boot with maxcpus=1 it dies show_stat():

Is this a regression?

Hugh fixed some issues on x86-64
commit 813409771731d80e6fa94199adf99f2269a4afc0

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/
Re: Linux 2.6.23-rc4: BAD regression
tried that one on my old toshiba tecra 8000 laptop, almost killing it.
the fan doesn't work any more...type 'make' and see the box dying.
luckily my CPU doesn't commit suicide...bisected it to that one:

cd8c93a4e04dce8f00d1ef3a476aac8bd65ae40b is first bad commit
commit cd8c93a4e04dce8f00d1ef3a476aac8bd65ae40b
Author: Alexey Starikovskiy <[EMAIL PROTECTED]>
Date: Fri Aug 3 17:52:48 2007 -0400

    ACPI: EC: If ECDT is not found, look up EC in DSDT.

    Some ASUS laptops access EC space from device _INI methods, but do not
    provide ECDT for early EC setup. In order to make them function properly,
    there is a need to find EC is DSDT before any _INI is called.

    Similar functionality was turned on by acpi_fake_ecdt=1 command line
    before. Now it is on all the time.

    http://bugzilla.kernel.org/show_bug.cgi?id=8598

    Signed-off-by: Alexey Starikovskiy <[EMAIL PROTECTED]>
    Signed-off-by: Len Brown <[EMAIL PROTECTED]>
[PATCH] v3 of IBM power meter driver
Dave Hansen complained about the magic numbers, repetitive code, and various other minor problems with the driver code, so here's a v3 with the magic numbers migrated to the top of the file and #define'd, helper macros taking place of the bit shifting/masking activities, and the compression of the value/min/max sysfs code into parameterized functions. -- ibm_pex: Driver to export IBM PowerExecutive power meter sensors. Signed-off-by: Darrick J. Wong <[EMAIL PROTECTED]> --- drivers/hwmon/Kconfig | 12 + drivers/hwmon/Makefile |1 drivers/hwmon/ibmpex.c | 564 3 files changed, 577 insertions(+), 0 deletions(-) diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig index 555f470..41ffa2e 100644 --- a/drivers/hwmon/Kconfig +++ b/drivers/hwmon/Kconfig @@ -275,6 +275,18 @@ config SENSORS_CORETEMP sensor inside your CPU. Supported all are all known variants of Intel Core family. +config SENSORS_IBMPEX + tristate "IBM PowerExecutive temperature/power sensors" + depends on IPMI_SI + help + If you say yes here you get support for the temperature and + power sensors in various IBM System X servers that support + PowerExecutive. So far this includes the x3550, x3650, x3655, + x3755, and certain HS20 blades. + + This driver can also be built as a module. If so, the module + will be called ibmpex. 
+ config SENSORS_IT87 tristate "ITE IT87xx and compatibles" select HWMON_VID diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile index a133981..31da6fe 100644 --- a/drivers/hwmon/Makefile +++ b/drivers/hwmon/Makefile @@ -35,6 +35,7 @@ obj-$(CONFIG_SENSORS_FSCPOS) += fscpos.o obj-$(CONFIG_SENSORS_GL518SM) += gl518sm.o obj-$(CONFIG_SENSORS_GL520SM) += gl520sm.o obj-$(CONFIG_SENSORS_HDAPS)+= hdaps.o +obj-$(CONFIG_SENSORS_IBMPEX) += ibmpex.o obj-$(CONFIG_SENSORS_IT87) += it87.o obj-$(CONFIG_SENSORS_K8TEMP) += k8temp.o obj-$(CONFIG_SENSORS_LM63) += lm63.o diff --git a/drivers/hwmon/ibmpex.c b/drivers/hwmon/ibmpex.c new file mode 100644 index 000..632f897 --- /dev/null +++ b/drivers/hwmon/ibmpex.c @@ -0,0 +1,564 @@ +/* + * A hwmon driver for the IBM PowerExecutive temperature/power sensors + * Copyright (C) 2007 IBM + * + * Author: Darrick J. Wong <[EMAIL PROTECTED]> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include
+#include
+#include
+#include
+#include
+
+#define REFRESH_INTERVAL	(5 * HZ)
+#define DRVNAME			"ibmpex"
+
+#define PEX_GET_VERSION		1
+#define PEX_GET_SENSOR_COUNT	2
+#define PEX_GET_SENSOR_NAME	3
+#define PEX_GET_SENSOR_DATA	6
+
+#define PEX_NET_FUNCTION	0x3A
+#define PEX_COMMAND		0x3C
+
+static inline u16 extract_value(const char *data, int offset)
+{
+	u16 val = *(u16*)&data[offset];
+	return be16_to_cpu(val);
+}
+
+#define PEX_INTERFACE(idx)	((idx) >> 16)
+#define PEX_SENSOR(idx)		(((idx) >> 8) & 0xFF)
+#define PEX_FUNC(idx)		((idx) & 0xFF)
+#define PEX_INDEX(iface, num, fn) (((iface) << 16) | ((num) << 8) | (fn))
+
+#define PEX_SENSOR_TYPE_LEN	3
+static char power_sensor_sig[] = {0x70, 0x77, 0x72};
+static char temp_sensor_sig[] = {0x74, 0x65, 0x6D};
+
+#define PEX_MULT_LEN		2
+static char watt_sensor_sig[] = {0x41, 0x43};
+
+#define PEX_NUM_SENSOR_FUNCS	3
+static char *sensor_name_templates[] = {
+	"%s%d_input",
+	"%s%d_min_input",
+	"%s%d_max_input"
+};
+
+static void ibmpex_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data);
+static void ibmpex_register_bmc(int iface, struct device *dev);
+static void ibmpex_bmc_gone(int iface);
+
+struct ibmpex_sensor_data {
+	int			in_use;
+	s16			values[PEX_NUM_SENSOR_FUNCS];
+	int			multiplier;
+
+	struct sensor_device_attribute	attr[PEX_NUM_SENSOR_FUNCS];
+};
+
+struct ibmpex_bmc_data {
+	struct list_head	list;
+	struct class_device	*class_dev;
+	struct device		*bmc_device;
+	struct mutex		lock;
+	char			valid;
+	unsigned long		last_updated;
Re: 2.6.23-rc3 USB segfaults + urb status -32
Hi Lasse,

On 25/08/07, Lasse Kärkkäinen <[EMAIL PROTECTED]> wrote:
> My system is unusably unstable using this kernel.

Does 2.6.22 work fine?

> On last boot it
> started flooding urb status -32 to kernel log at a rate of several
> megabytes per second. Now it printed segfaults before the system had
> finished booting and then some other errors... The full log is here:
>
> I couldn't find information on these bugs. If you need more debug info,
> please contact me. I can also reproduce the errors without the Nvidia
> kernel module,

Yes, please reproduce this error without nvidia binary crap and CC to [EMAIL PROTECTED]

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/
Re: [3/4] 2.6.23-rc3: known regressions v3
Hi Stephen,

On 24/08/07, Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> O
> > Subject : New wake ups from sky2
> > References : http://lkml.org/lkml/2007/7/20/386
> > Last known good : ?
> > Submitter : Thomas Meyer <[EMAIL PROTECTED]>
> > Caused-By : Stephen Hemminger <[EMAIL PROTECTED]>
> > commit eb35cf60e462491249166182e3e755d3d5d91a28
> > Handled-By : Stephen Hemminger <[EMAIL PROTECTED]>
> > Status : unknown
> >
> >
>
> Fix posted to netdev (sky2 1.17 series), but Jeff hasn't
> applied it.
>

commit 32c2c30085324aef9699934295281cca0161ef7e I guess

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/
Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.
On Tue, Aug 28, 2007 at 04:40:15PM -0600, Eric W. Biederman wrote: > +When: September 2010 > +Option: CONFIG_SYSCTL_SYSCALL > +Why: The same information is available in a more convenient from > + /proc/sys, and none of the sysctl variables appear to be > + important performance wise. > + > + Binary sysctls are a long standing source of subtle kernel > + bugs and security issues. > + > + When I looked several months ago all I could find after > + searching several distributions were 5 user space programs and > + glibc (which falls back to /proc/sys) using this syscall. Umm, no way we're ever going to remove a syscall like this. Please stop this deprecation crap. Just make sure no one adds more binary sysctls.
Re: [Tech-board-discuss] Re: [Ksummit-2007-discuss] Re: Linux Foundation Technical Advisory Board Elections
On Friday 24 August 2007 03:45, Theodore Tso wrote: > > As I said; what's wrong with just using SPI membership? It's not > > like it is remotely hard for kernel hackers to gain membership in > > SPI. And somebody else takes care of the bureaucracy for you. > > Given the huge overlap between SPI membership and Debian membership, > and then taking a look at the craziness that takes place on various > Debian mailing lists, such as but not limited to debian-legal, I'm > quite convinced that this would be a baad idea. Hi Ted, Ever watched a legislative assembly at work? A bad idea perhaps, but the best that has been discovered so far. Regards, Daniel
Re: [PATCH] Fix find_next_best_node (Re: [BUG] 2.6.23-rc3-mm1 Kernel panic - not syncing: DMA: Memory would be corrupted)
On Fri, 2007-08-24 at 15:53 +0900, Yasunori Goto wrote: > I found find_next_best_node() was wrong. > I confirmed boot up by the following patch. > Mel-san, Kamalesh-san, could you try this? FYI: This patch also allows the alloc-instantiate-race testcase in libhugetlbfs to pass again :) -- Adam Litke - (agl at us.ibm.com) IBM Linux Technology Center
Re: [PATCH] netlink: use container_of instead
From: Denis Cheng <[EMAIL PROTECTED]> Date: Wed, 29 Aug 2007 03:12:04 +0800 > this could make future redesign of struct netlink_sock easier. > > Signed-off-by: Denis Cheng <[EMAIL PROTECTED]> Seems reasonable, patch applied, thanks.
[PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.
After adding checking to register_sysctl_table and finding a whole new set of bugs, missed by countless code reviews and testers, I have finally lost patience with the binary sysctl interface. The binary sysctl interface has been sort of deprecated for years, and finding a user space program that uses the syscall is more difficult than finding a needle in a haystack. Problems continue to crop up with the in-kernel implementation. So since supporting something that no one uses is silly, deprecate sys_sysctl with a sufficient grace period and notice that the handful of user space applications that care can be fixed or replaced. The /proc/sys sysctl interface that people use will continue to be supported indefinitely. This patch moves the tested warning about sysctls out of the sys_sysctl path into a separate function called from both implementations of sys_sysctl, and it adds a proper entry into Documentation/feature-removal-schedule, allowing us to revisit this in a couple of years' time and actually kill sys_sysctl. Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]> --- Documentation/feature-removal-schedule.txt | 35 kernel/sysctl.c | 62 +-- 2 files changed, 74 insertions(+), 23 deletions(-) diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index a43d287..4d3097e 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -290,3 +290,38 @@ Why: All mthca hardware also supports MSI-X, which provides Who: Roland Dreier <[EMAIL PROTECTED]> --- + +What: sys_sysctl +When: September 2010 +Option: CONFIG_SYSCTL_SYSCALL +Why: The same information is available in a more convenient from + /proc/sys, and none of the sysctl variables appear to be + important performance wise. + + Binary sysctls are a long standing source of subtle kernel + bugs and security issues. 
+ + When I looked several months ago all I could find after + searching several distributions were 5 user space programs and + glibc (which falls back to /proc/sys) using this syscall. + + The man page for sysctl(2) documents it as unusable for user + space programs. + + sysctl(2) is not generally ABI compatible to a 32bit user + space application on a 64bit and a 32bit kernel. + + For the last several months the policy has been no new binary + sysctls and no one has put forward an argument to use them. + + Binary sysctls issues seem to keep happening appearing so + properly deprecating them (with a warning to user space) and a + 2 year grace warning period will mean eventually we can kill + them and end the pain. + + In the mean time individual binary sysctls can be dealt with + in a piecewise fashion. + +Who: Eric Biederman <[EMAIL PROTECTED]> + +--- diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 6d01497..792e6fe 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1275,6 +1275,33 @@ struct ctl_table_header *sysctl_head_next(struct ctl_table_header *prev) return NULL; } +static int deprecated_sysctl_warning(struct __sysctl_args *args) +{ + static int msg_count; + int name[CTL_MAXNAME]; + int i; + + /* Read in the sysctl name for better debug message logging */ + for (i = 0; i < args->nlen; i++) + if (get_user(name[i], args->name + i)) + return -EFAULT; + + /* Ignore accesses to kernel.version */ + if ((args->nlen == 2) && (name[0] == CTL_KERN) && (name[1] == KERN_VERSION)) + return 0; + + if (msg_count < 5) { + msg_count++; + printk(KERN_INFO + "warning: process `%s' used the deprecated sysctl " + "system call with ", current->comm); + for (i = 0; i < args->nlen; i++) + printk("%d.", name[i]); + printk("\n"); + } + return 0; +} + #ifdef CONFIG_SYSCTL_SYSCALL int do_sysctl(int __user *name, int nlen, void __user *oldval, size_t __user *oldlenp, void __user *newval, size_t newlen) @@ -1310,10 +1337,15 @@ asmlinkage long sys_sysctl(struct __sysctl_args 
__user *args) if (copy_from_user(&tmp, args, sizeof(tmp))) return -EFAULT; + error = deprecated_sysctl_warning(&tmp); + if (error) + goto out; + lock_kernel(); error = do_sysctl(tmp.name, tmp.nlen, tmp.oldval, tmp.oldlenp, tmp.newval, tmp.newlen); unlock_kernel(); +out: return error; } #endif /* CONFIG_SYSCTL_SYSCALL */ @@ -2503,35 +2535,19 @@ int sysctl_ms_jiffies(struct ctl_table *table, int __user *name, int nlen, asmlinkage long sys_sysctl(struct __sysctl_args __user *args) { -
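For anyone wondering what the /proc/sys replacement looks like from user space, here is a minimal sketch, not part of the patch. It assumes the usual convention that a dotted sysctl name maps onto a path under /proc/sys; the helper names sysctl_name_to_path() and read_sysctl_text() are made up for illustration, and names that themselves contain dots (some network interface names) are not handled.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn "kernel.ostype" into "/proc/sys/kernel/ostype".
 * Returns 0 on success, -1 if the result does not fit in the buffer.
 */
static int sysctl_name_to_path(const char *name, char *path, size_t len)
{
	int n = snprintf(path, len, "/proc/sys/%s", name);
	char *p;

	if (n < 0 || (size_t)n >= len)
		return -1;

	/* Only the part after the fixed prefix is rewritten. */
	for (p = path + strlen("/proc/sys/"); *p; p++)
		if (*p == '.')
			*p = '/';
	return 0;
}

/*
 * Hypothetical helper: read a sysctl value as text through /proc/sys
 * instead of calling sysctl(2). Returns the number of bytes read, or -1.
 */
static long read_sysctl_text(const char *name, char *buf, size_t len)
{
	char path[256];
	FILE *f;
	size_t got;

	if (len == 0 || sysctl_name_to_path(name, path, sizeof(path)) < 0)
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	got = fread(buf, 1, len - 1, f);
	fclose(f);
	buf[got] = '\0';
	return (long)got;
}
```

A caller would pass something like read_sysctl_text("kernel.ostype", buf, sizeof(buf)) and get the text form back, newline included, without touching the binary interface.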
Re: [patch] sched: fix broken smt/mc optimizations with CFS
On Mon, Aug 27, 2007 at 12:31:03PM -0700, Siddha, Suresh B wrote: > Essentially I observed that nice 0 tasks still endup on two cores of same > package, with out getting spread out to two different packages. This behavior > is same with out this fix and this fix doesn't help in any way. Ingo, the appended patch seems to fix the issue and, as far as I can test, seems OK to me. This is a quick fix for .23. Peter Williams and I plan to look at code cleanups in this area (HT/MC optimizations) post .23. BTW, with this fix, do you want to retain the current FUZZ value? thanks, suresh -- Try to fix MC/HT scheduler optimization breakage again, without breaking the FUZZ logic. First, replace the check if (*imbalance + SCHED_LOAD_SCALE_FUZZ < busiest_load_per_task) with if (*imbalance < busiest_load_per_task), as the current check is always false for nice 0 tasks (SCHED_LOAD_SCALE_FUZZ is the same as busiest_load_per_task for nice 0 tasks). With the above change, imbalance was getting reset to 0 in the corner case condition, making the FUZZ logic fail. Fix it by not corrupting the imbalance, and change the imbalance only when the HT/MC optimization is found to be needed. 
Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]> --- diff --git a/kernel/sched.c b/kernel/sched.c index 9fe473a..03e5e8d 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -2511,7 +2511,7 @@ group_next: * a think about bumping its value to force at least one task to be * moved */ - if (*imbalance + SCHED_LOAD_SCALE_FUZZ < busiest_load_per_task) { + if (*imbalance < busiest_load_per_task) { unsigned long tmp, pwr_now, pwr_move; unsigned int imbn; @@ -2563,10 +2563,8 @@ small_imbalance: pwr_move /= SCHED_LOAD_SCALE; /* Move if we gain throughput */ - if (pwr_move <= pwr_now) - goto out_balanced; - - *imbalance = busiest_load_per_task; + if (pwr_move > pwr_now) + *imbalance = busiest_load_per_task; } return busiest;
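The "always false" claim above is easy to check with plain arithmetic. The sketch below is user-space C, not scheduler code; it assumes a nice 0 load of 1024 and a fuzz value equal to it, as the changelog describes, and the names NICE_0_LOAD, LOAD_FUZZ, small_imbalance_old() and small_imbalance_new() are illustrative only.

```c
#include <stdbool.h>

/* Assumed values for illustration: the fuzz equals the load of one nice 0 task. */
#define NICE_0_LOAD	1024UL
#define LOAD_FUZZ	NICE_0_LOAD

/* Shape of the check before the patch: imbalance + fuzz < load per task. */
static bool small_imbalance_old(unsigned long imbalance,
				unsigned long busiest_load_per_task)
{
	return imbalance + LOAD_FUZZ < busiest_load_per_task;
}

/* Shape of the check after the patch: imbalance < load per task. */
static bool small_imbalance_new(unsigned long imbalance,
				unsigned long busiest_load_per_task)
{
	return imbalance < busiest_load_per_task;
}
```

With busiest_load_per_task equal to NICE_0_LOAD, the old form reduces to imbalance + 1024 < 1024, which no unsigned imbalance satisfies (short of overflow), so the small-imbalance path was never taken for nice 0 tasks. The new form is true for any imbalance below one task's load.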
Re: cpufreq affects traffic control rates
On Tue, 28 Aug 2007 09:51:55 +0200 DervishD <[EMAIL PROTECTED]> wrote: > Hi all :) > > I noticed lately that my traffic control rates were being very slow, > about 40% less than expected, and finally spotted the problem: cpufreq. > > Looks like HTB puts buckets according to the requested rate but > assuming that the CPU is running at its default clock or something like > that. > > Any way of fixing this without disabling cpufreq? > > I'm using kernel 2.6.20.14, Athlon64 1000/1800MHz, HZ=1000 and a > combination of HTB/SFQ in my traffic control. > > Thanks a lot in advance :) > > Raúl Núñez de Arenas Coronado > Is the problem the configuration of the network scheduler clock? In 2.6.20 and earlier, you could use the CPU cycle counter (later kernels only use time of day). So try switching to jiffies or gettimeofday. -- Stephen Hemminger <[EMAIL PROTECTED]>
Re: [Tech-board-discuss] Re: [Ksummit-2007-discuss] Re: Linux Foundation Technical Advisory Board Elections
On Mon, 27 Aug 2007, Jes Sorensen wrote: > Right now it looks like we have a list of sane candidates up, which I > certainly would be willing to vote for. However, it would be a shame > that the credibility of the election is lost because of sticking to an > undemocratic voting procedure. A procedure which it in fact was stated > when the board was created last year, would be replaced this year. Democracy is an ideal that is not attainable. A representative democracy is usually the best you can get. So you need people that have some competence to contribute to the endeavor. And AFAICT we approximate that reasonably. Many of the people that were not subject to the git commit quota are experienced hands that are valuable because of their experience with Linux and the Summit.
Re: [PATCH -mm 3/3] PM: Improve handling of ACPI system state indicator (rev. 3)
On Tuesday, 28 August 2007 21:57, Moore, Robert wrote: > Since these changes appear to affect the ACPICA core in a fairly big > way, I would like to see a short, concise description of each change and > why it is necessary. All right. I'll describe the changes made by the current version of the patches, but please note that if it's safe to run the AML interpreter with IRQs disabled, it's better to do some of them in a different way. 1. Remove the execution of _GTS from acpi_enter_sleep_state_prep() acpi_enter_sleep_state_prep() is called before disabling the nonboot CPUs and _GTS should be executed after that, according to the spec. 2. Introduce acpi_enter_sleep_state_prep_late() that will execute _GTS Necessary because of 1. 3. Split acpi_leave_sleep_state() into two functions: acpi_leave_sleep_state_prep() and acpi_leave_sleep_state(). acpi_leave_sleep_state_prep() contains the code that should be executed before enabling the nonboot CPUs, most importantly the execution of _BFS, and acpi_leave_sleep_state() contains the remaining code (the enabling of GPEs, the execution of _WAK and the enabling of power buttons) 4. Change the code ordering in acpi_leave_sleep_state_prep() (introduced in 3.) so that _SST is executed after _BFS According to the spec, _BFS should be the first ACPI method executed after leaving a sleep state 5. Introduce acpi_set_sleep_state_indicator() that will execute _SST for given ACPI sleep state Needed so that we can set the state indicator independently of the other lower-level operations. 6. Remove the execution of _SST from acpi_leave_sleep_state() No longer needed, because we can use acpi_set_sleep_state_indicator() to set the state indicator appropriately from higher level routines. The other changes affect only drivers/acpi/sleep/main.c and the files in kernel/power . 
Greetings, Rafael
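To make the ordering in points 1 to 6 easier to follow, here is a small user-space sketch that only records the sequence of steps as described above. It is an assumption-laden illustration: nothing here calls ACPICA, and record() and sketch_suspend_resume_cycle() are made-up names, with the step strings standing in for the real work.

```c
#include <stddef.h>

#define MAX_STEPS 16

/* Log of the steps, in the order they were "executed". */
static const char *steps[MAX_STEPS];
static size_t nr_steps;

static void record(const char *what)
{
	if (nr_steps < MAX_STEPS)
		steps[nr_steps++] = what;
}

/*
 * Hypothetical walk through one suspend/resume cycle, following the
 * ordering given in the description: _GTS runs after the nonboot CPUs
 * are disabled, _BFS is the first method run after wakeup, _SST follows
 * _BFS, and the rest of the wakeup work happens once the nonboot CPUs
 * are back.
 */
static void sketch_suspend_resume_cycle(void)
{
	nr_steps = 0;
	record("acpi_enter_sleep_state_prep");
	record("disable nonboot CPUs");
	record("acpi_enter_sleep_state_prep_late: _GTS");
	record("enter sleep state");
	record("acpi_leave_sleep_state_prep: _BFS");
	record("acpi_leave_sleep_state_prep: _SST");
	record("enable nonboot CPUs");
	record("acpi_leave_sleep_state: GPEs, _WAK, power buttons");
}
```

The point of the split functions is visible in the log: each *_prep/_late pair brackets the CPU hotplug step, which a single enter or leave function could not do.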
[PATCH -rt 2/8] spinlocks/rwlocks: use PICK_FUNCTION()
Replace old PICK_OP style macros with the new PICK_FUNCTION macro. Signed-off-by: Daniel Walker <[EMAIL PROTECTED]> --- include/linux/sched.h| 13 - include/linux/spinlock.h | 345 ++- kernel/rtmutex.c |2 lib/dec_and_lock.c |2 4 files changed, 111 insertions(+), 251 deletions(-) Index: linux-2.6.22/include/linux/sched.h === --- linux-2.6.22.orig/include/linux/sched.h +++ linux-2.6.22/include/linux/sched.h @@ -2022,17 +2022,8 @@ extern int __cond_resched_raw_spinlock(r extern int __cond_resched_spinlock(spinlock_t *spinlock); #define cond_resched_lock(lock) \ -({ \ - int __ret; \ - \ - if (TYPE_EQUAL((lock), raw_spinlock_t)) \ - __ret = __cond_resched_raw_spinlock((raw_spinlock_t *)lock);\ - else if (TYPE_EQUAL(lock, spinlock_t)) \ - __ret = __cond_resched_spinlock((spinlock_t *)lock); \ - else __ret = __bad_spinlock_type(); \ - \ - __ret; \ -}) + PICK_SPIN_OP_RET(__cond_resched_raw_spinlock, __cond_resched_spinlock,\ +lock) extern int cond_resched_softirq(void); extern int cond_resched_softirq_context(void); Index: linux-2.6.22/include/linux/spinlock.h === --- linux-2.6.22.orig/include/linux/spinlock.h +++ linux-2.6.22/include/linux/spinlock.h @@ -91,6 +91,7 @@ #include #include #include +#include #include @@ -162,7 +163,7 @@ extern void __lockfunc rt_spin_unlock_wa extern int __lockfunc rt_spin_trylock_irqsave(spinlock_t *lock, unsigned long *flags); extern int __lockfunc rt_spin_trylock(spinlock_t *lock); -extern int _atomic_dec_and_spin_lock(atomic_t *atomic, spinlock_t *lock); +extern int _atomic_dec_and_spin_lock(spinlock_t *lock, atomic_t *atomic); /* * lockdep-less calls, for derived types like rwlock: @@ -243,54 +244,6 @@ do { \ # define _spin_trylock_irqsave(l,f) TSNBCONRT(l) #endif -#undef TYPE_EQUAL -#define TYPE_EQUAL(lock, type) \ - __builtin_types_compatible_p(typeof(lock), type *) - -#define PICK_OP(op, lock) \ -do { \ - if (TYPE_EQUAL((lock), raw_spinlock_t)) \ - __spin##op((raw_spinlock_t *)(lock)); \ - else if (TYPE_EQUAL(lock, spinlock_t)) \ - 
_spin##op((spinlock_t *)(lock));\ - else __bad_spinlock_type(); \ -} while (0) - -#define PICK_OP_RET(op, lock...) \ -({ \ - unsigned long __ret;\ - \ - if (TYPE_EQUAL((lock), raw_spinlock_t)) \ - __ret = __spin##op((raw_spinlock_t *)(lock)); \ - else if (TYPE_EQUAL(lock, spinlock_t)) \ - __ret = _spin##op((spinlock_t *)(lock));\ - else __ret = __bad_spinlock_type(); \ - \ - __ret; \ -}) - -#define PICK_OP2(op, lock, flags) \ -do { \ - if (TYPE_EQUAL((lock), raw_spinlock_t)) \ - __spin##op((raw_spinlock_t *)(lock), flags);\ - else if (TYPE_EQUAL(lock, spinlock_t)) \ - _spin##op((spinlock_t *)(lock), flags); \ - else __bad_spinlock_type(); \ -} while (0) - -#define PICK_OP2_RET(op, lock, flags) \ -({ \ - unsigned long __ret;\ - \ - if (TYPE_EQUAL((lock), raw_spinlock_t)) \ - __ret = __spin##op((raw_spinlock_t *)(lock), flags);\ - else if (TYPE_EQUAL(lock, spinlock_t)) \ - __ret =
[PATCH -rt 1/8] introduce PICK_FUNCTION
PICK_FUNCTION() is similar to the other PICK_OP style macros, and was created to replace them all. I used variable argument macros to handle PICK_FUNC_2ARG/PICK_FUNC_1ARG. Otherwise the macros are similar to the original macros used for semaphores. The entire system is used to do a compile time switch between two different locking APIs, for example between real spinlocks (raw_spinlock_t) and mutexes (or sleeping spinlocks). This new macro replaces all the duplication from lock type to lock type. The result of this patch, and the next two, is a fairly nice simplification and consolidation. Although the seqlock changes are larger than the originals, I think overall the patchset is worthwhile. Incorporated peterz's suggestion to not require TYPE_EQUAL() to only use pointers. Signed-off-by: Daniel Walker <[EMAIL PROTECTED]> --- include/linux/pickop.h | 36 + include/linux/rt_lock.h | 129 +++- 2 files changed, 77 insertions(+), 88 deletions(-) Index: linux-2.6.22/include/linux/pickop.h === --- /dev/null +++ linux-2.6.22/include/linux/pickop.h @@ -0,0 +1,36 @@ +#ifndef _LINUX_PICKOP_H +#define _LINUX_PICKOP_H + +#undef TYPE_EQUAL +#define TYPE_EQUAL(var, type) \ + __builtin_types_compatible_p(typeof(var), type *) + +#undef PICK_TYPE_EQUAL +#define PICK_TYPE_EQUAL(var, type) \ + __builtin_types_compatible_p(typeof(var), type) + +extern int __bad_func_type(void); + +#define PICK_FUNCTION(type1, type2, func1, func2, arg0, ...) \ +do { \ + if (PICK_TYPE_EQUAL((arg0), type1)) \ + func1((type1)(arg0), ##__VA_ARGS__);\ + else if (PICK_TYPE_EQUAL((arg0), type2))\ + func2((type2)(arg0), ##__VA_ARGS__);\ + else __bad_func_type(); \ +} while (0) + +#define PICK_FUNCTION_RET(type1, type2, func1, func2, arg0, ...) 
\ +({ \ + unsigned long __ret;\ + \ + if (PICK_TYPE_EQUAL((arg0), type1)) \ + __ret = func1((type1)(arg0), ##__VA_ARGS__);\ + else if (PICK_TYPE_EQUAL((arg0), type2))\ + __ret = func2((type2)(arg0), ##__VA_ARGS__);\ + else __ret = __bad_func_type(); \ + \ + __ret; \ +}) + +#endif /* _LINUX_PICKOP_H */ Index: linux-2.6.22/include/linux/rt_lock.h === --- linux-2.6.22.orig/include/linux/rt_lock.h +++ linux-2.6.22/include/linux/rt_lock.h @@ -156,76 +156,40 @@ extern void fastcall rt_up(struct semaph extern int __bad_func_type(void); -#undef TYPE_EQUAL -#define TYPE_EQUAL(var, type) \ - __builtin_types_compatible_p(typeof(var), type *) - -#define PICK_FUNC_1ARG(type1, type2, func1, func2, arg) \ -do { \ - if (TYPE_EQUAL((arg), type1)) \ - func1((type1 *)(arg)); \ - else if (TYPE_EQUAL((arg), type2)) \ - func2((type2 *)(arg)); \ - else __bad_func_type(); \ -} while (0) +#include -#define PICK_FUNC_1ARG_RET(type1, type2, func1, func2, arg)\ -({ \ - unsigned long __ret;\ - \ - if (TYPE_EQUAL((arg), type1)) \ - __ret = func1((type1 *)(arg)); \ - else if (TYPE_EQUAL((arg), type2)) \ - __ret = func2((type2 *)(arg)); \ - else __ret = __bad_func_type(); \ - \ - __ret; \ -}) - -#define PICK_FUNC_2ARG(type1, type2, func1, func2, arg0, arg1) \ -do { \ - if (TYPE_EQUAL((arg0), type1)) \ - func1((type1 *)(arg0), arg1);
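For readers who want to try the compile-time dispatch outside the kernel tree, the following is a stand-alone sketch modelled on PICK_FUNCTION_RET() from the patch. It relies on the same GCC extensions (statement expressions, __builtin_types_compatible_p, ##__VA_ARGS__). The two lock structs, their acquire functions, lock_acquire() and bad_func_type() are stand-ins invented for this example; the patch itself only declares __bad_func_type() so that a wrong type turns into a link error, whereas the sketch defines its stand-in so it builds at any optimisation level.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the two lock flavours; not the kernel types. */
struct raw_lock { int held; };
struct sleeping_lock { int held; };

static int raw_lock_acquire(struct raw_lock *l)
{
	l->held = 1;
	return 1;	/* marks the "raw" implementation */
}

static int sleeping_lock_acquire(struct sleeping_lock *l)
{
	l->held = 1;
	return 2;	/* marks the "sleeping" implementation */
}

/* Stand-in for the undefined __bad_func_type() used in the patch. */
static int bad_func_type(void)
{
	abort();
}

#define PICK_TYPE_EQUAL(var, type) \
	__builtin_types_compatible_p(__typeof__(var), type)

/* Same shape as the patch: the type test is a compile-time constant. */
#define PICK_FUNCTION_RET(type1, type2, func1, func2, arg0, ...)	\
({									\
	unsigned long ret_;						\
									\
	if (PICK_TYPE_EQUAL((arg0), type1))				\
		ret_ = func1((type1)(arg0), ##__VA_ARGS__);		\
	else if (PICK_TYPE_EQUAL((arg0), type2))			\
		ret_ = func2((type2)(arg0), ##__VA_ARGS__);		\
	else								\
		ret_ = bad_func_type();					\
									\
	ret_;								\
})

/* One user-visible name that resolves to either implementation. */
#define lock_acquire(l)							\
	PICK_FUNCTION_RET(struct raw_lock *, struct sleeping_lock *,	\
			  raw_lock_acquire, sleeping_lock_acquire, l)
```

Calling lock_acquire(&r) on a struct raw_lock picks raw_lock_acquire(), and the same call on a struct sleeping_lock picks sleeping_lock_acquire(); the branch not taken is dead code. Passing full pointer types as type1/type2 is what lets PICK_TYPE_EQUAL() drop the implicit "type *" that TYPE_EQUAL() had.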
[PATCH -rt 7/8] latency hist: add resetting for all timing options
I dropped parts of the prior reset method, and added a file called "reset" into the /proc/latency_hist/ timing directories. It allows any of the timing options to get their histograms reset. I also fixed a couple of oddities in the code. Instead of creating a file for all NR_CPUS, I just used num_possible_cpus(). I also dropped a string which only held "CPU" and just inserted it where it was used. Signed-off-by: Daniel Walker <[EMAIL PROTECTED]> --- include/linux/latency_hist.h |1 kernel/latency_hist.c| 119 --- kernel/latency_trace.c | 13 3 files changed, 80 insertions(+), 53 deletions(-) Index: linux-2.6.22/include/linux/latency_hist.h === --- linux-2.6.22.orig/include/linux/latency_hist.h +++ linux-2.6.22/include/linux/latency_hist.h @@ -23,7 +23,6 @@ enum { #ifdef CONFIG_LATENCY_HIST extern void latency_hist(int latency_type, int cpu, unsigned long latency); -extern void latency_hist_reset(void); # define latency_hist_flag 1 #else # define latency_hist(a,b,c) do { (void)(cpu); } while (0) Index: linux-2.6.22/kernel/latency_hist.c === --- linux-2.6.22.orig/kernel/latency_hist.c +++ linux-2.6.22/kernel/latency_hist.c @@ -16,6 +16,7 @@ #include #include #include +#include typedef struct hist_data_struct { atomic_t hist_mode; /* 0 log, 1 don't log */ @@ -31,8 +32,6 @@ typedef struct hist_data_struct { static struct proc_dir_entry * latency_hist_root = NULL; static char * latency_hist_proc_dir_root = "latency_hist"; -static char * percpu_proc_name = "CPU"; - #ifdef CONFIG_INTERRUPT_OFF_HIST static DEFINE_PER_CPU(hist_data_t, interrupt_off_hist); static char * interrupt_off_hist_proc_dir = "interrupt_off_latency"; @@ -56,7 +55,7 @@ static inline u64 u64_div(u64 x, u64 y) return x; } -void latency_hist(int latency_type, int cpu, unsigned long latency) +void notrace latency_hist(int latency_type, int cpu, unsigned long latency) { hist_data_t * my_hist; @@ -205,6 +204,69 @@ static struct file_operations latency_hi .release = seq_release, }; +static void 
hist_reset(hist_data_t *hist) +{ + atomic_dec(&hist->hist_mode); + + memset(hist->hist_array, 0, sizeof(hist->hist_array)); + hist->beyond_hist_bound_samples = 0UL; + hist->min_lat = 0xUL; + hist->max_lat = 0UL; + hist->total_samples = 0UL; + hist->accumulate_lat = 0UL; + hist->avg_lat = 0UL; + + atomic_inc(&hist->hist_mode); +} + +ssize_t latency_hist_reset(struct file *file, const char __user *a, size_t size, loff_t *off) +{ + int cpu; + hist_data_t *hist; + struct proc_dir_entry *entry_ptr = PDE(file->f_dentry->d_inode); + int latency_type = (int)entry_ptr->data; + + switch (latency_type) { + +#ifdef CONFIG_WAKEUP_LATENCY_HIST + case WAKEUP_LATENCY: + for_each_online_cpu(cpu) { + hist = &per_cpu(wakeup_latency_hist, cpu); + hist_reset(hist); + } + break; +#endif + +#ifdef CONFIG_PREEMPT_OFF_HIST + case PREEMPT_LATENCY: + for_each_online_cpu(cpu) { + hist = &per_cpu(preempt_off_hist, cpu); + hist_reset(hist); + } + break; +#endif + +#ifdef CONFIG_INTERRUPT_OFF_HIST + case INTERRUPT_LATENCY: + for_each_online_cpu(cpu) { + hist = &per_cpu(interrupt_off_hist, cpu); + hist_reset(hist); + } + break; +#endif + } + + return size; +} + +static struct file_operations latency_hist_reset_seq_fops = { + .write = latency_hist_reset, +}; + +static struct proc_dir_entry *interrupt_off_reset; +static struct proc_dir_entry *preempt_off_reset; +static struct proc_dir_entry *wakeup_latency_reset; + static __init int latency_hist_init(void) { struct proc_dir_entry *tmp_parent_proc_dir; @@ -214,11 +276,10 @@ static __init int latency_hist_init(void latency_hist_root = proc_mkdir(latency_hist_proc_dir_root, NULL); - #ifdef CONFIG_INTERRUPT_OFF_HIST tmp_parent_proc_dir = proc_mkdir(interrupt_off_hist_proc_dir, latency_hist_root); - for (i = 0; i < NR_CPUS; i++) { - len = sprintf(procname, "%s%d", percpu_proc_name, i); + for (i = 0; i < num_possible_cpus(); i++) { + len = sprintf(procname, "CPU%d", i); procname[len] = '\0'; entry[INTERRUPT_LATENCY][i] = create_proc_entry(procname, 0, 
tmp_parent_proc_dir); @@ 
-228,12 +289,15 @@ static __init int latency_hist_init(void
[PATCH -rt 3/8] seqlocks: use PICK_FUNCTION
Replace the old PICK_OP style macros with PICK_FUNCTION. Seqlocks also had some alien code, which I replaced as well, as can be seen from the line count below. Signed-off-by: Daniel Walker <[EMAIL PROTECTED]> --- include/linux/pickop.h |4 include/linux/seqlock.h | 235 +++- 2 files changed, 135 insertions(+), 104 deletions(-) Index: linux-2.6.22/include/linux/pickop.h === --- linux-2.6.22.orig/include/linux/pickop.h +++ linux-2.6.22/include/linux/pickop.h @@ -1,10 +1,6 @@ #ifndef _LINUX_PICKOP_H #define _LINUX_PICKOP_H -#undef TYPE_EQUAL -#define TYPE_EQUAL(var, type) \ - __builtin_types_compatible_p(typeof(var), type *) - #undef PICK_TYPE_EQUAL #define PICK_TYPE_EQUAL(var, type) \ __builtin_types_compatible_p(typeof(var), type) Index: linux-2.6.22/include/linux/seqlock.h === --- linux-2.6.22.orig/include/linux/seqlock.h +++ linux-2.6.22/include/linux/seqlock.h @@ -90,6 +90,12 @@ static inline void __write_seqlock(seqlo smp_wmb(); } +static __always_inline unsigned long __write_seqlock_irqsave(seqlock_t *sl) +{ + __write_seqlock(sl); + return 0; +} + static inline void __write_sequnlock(seqlock_t *sl) { smp_wmb(); @@ -97,6 +103,8 @@ static inline void __write_sequnlock(seq spin_unlock(&sl->lock); } +#define __write_sequnlock_irqrestore(sl, flags)__write_sequnlock(sl) + static inline int __write_tryseqlock(seqlock_t *sl) { int ret = spin_trylock(&sl->lock); @@ -149,6 +157,28 @@ static __always_inline void __write_seql smp_wmb(); } +static __always_inline unsigned long +__write_seqlock_irqsave_raw(raw_seqlock_t *sl) +{ + unsigned long flags; + + local_irq_save(flags); + __write_seqlock_raw(sl); + return flags; +} + +static __always_inline void __write_seqlock_irq_raw(raw_seqlock_t *sl) +{ + local_irq_disable(); + __write_seqlock_raw(sl); +} + +static __always_inline void __write_seqlock_bh_raw(raw_seqlock_t *sl) +{ + local_bh_disable(); + __write_seqlock_raw(sl); +} + static __always_inline void __write_sequnlock_raw(raw_seqlock_t *sl) { smp_wmb(); @@ -156,6 +186,27 @@ 
static __always_inline void __write_sequ spin_unlock(&sl->lock); } +static __always_inline void +__write_sequnlock_irqrestore_raw(raw_seqlock_t *sl, unsigned long flags) +{ + __write_sequnlock_raw(sl); + local_irq_restore(flags); + preempt_check_resched(); +} + +static __always_inline void __write_sequnlock_irq_raw(raw_seqlock_t *sl) +{ + __write_sequnlock_raw(sl); + local_irq_enable(); + preempt_check_resched(); +} + +static __always_inline void __write_sequnlock_bh_raw(raw_seqlock_t *sl) +{ + __write_sequnlock_raw(sl); + local_bh_enable(); +} + static __always_inline int __write_tryseqlock_raw(raw_seqlock_t *sl) { int ret = spin_trylock(&sl->lock); @@ -182,60 +233,93 @@ static __always_inline int __read_seqret extern int __bad_seqlock_type(void); -#define PICK_SEQOP(op, lock) \ +/* + * PICK_SEQ_OP() is a small redirector to allow less typing of the lock + * types raw_seqlock_t, seqlock_t, at the front of the PICK_FUNCTION + * macro. + */ +#define PICK_SEQ_OP(...) \ + PICK_FUNCTION(raw_seqlock_t *, seqlock_t *, ##__VA_ARGS__) +#define PICK_SEQ_OP_RET(...) 
\ + PICK_FUNCTION_RET(raw_seqlock_t *, seqlock_t *, ##__VA_ARGS__) + +#define write_seqlock(sl) PICK_SEQ_OP(__write_seqlock_raw, __write_seqlock, sl) + +#define write_sequnlock(sl)\ + PICK_SEQ_OP(__write_sequnlock_raw, __write_sequnlock, sl) + +#define write_tryseqlock(sl) \ + PICK_SEQ_OP_RET(__write_tryseqlock_raw, __write_tryseqlock, sl) + +#define read_seqbegin(sl) \ + PICK_SEQ_OP_RET(__read_seqbegin_raw, __read_seqbegin, sl) + +#define read_seqretry(sl, iv) \ + PICK_SEQ_OP_RET(__read_seqretry_raw, __read_seqretry, sl, iv) + +#define write_seqlock_irqsave(lock, flags) \ do { \ - if (TYPE_EQUAL((lock), raw_seqlock_t)) \ - op##_raw((raw_seqlock_t *)(lock)); \ - else if (TYPE_EQUAL((lock), seqlock_t)) \ - op((seqlock_t *)(lock));\ - else __bad_seqlock_type(); \ + flags = PICK_SEQ_OP_RET(__write_seqlock_irqsave_raw,\ + __write_seqlock_irqsave, lock); \ } while (0) -#define PICK_SEQOP_RET(op, lock) \ -({ \ - unsigned long __ret;\ - \ - if
[PATCH -rt 8/8] stop critical timing in idle.
Without this the idle routine still gets traced. This is done already for ACPI idle, but it should also be done for other idle routines. Signed-off-by: Daniel Walker <[EMAIL PROTECTED]> --- arch/i386/kernel/process.c |9 + arch/x86_64/kernel/process.c | 10 ++ 2 files changed, 19 insertions(+) Index: linux-2.6.22/arch/i386/kernel/process.c === --- linux-2.6.22.orig/arch/i386/kernel/process.c +++ linux-2.6.22/arch/i386/kernel/process.c @@ -197,8 +197,17 @@ void cpu_idle(void) if (cpu_is_offline(cpu)) play_dead(); + /* +* We have irqs disabled here, so stop latency tracing +* at this point and restart it after we return: +*/ + stop_critical_timing(); + __get_cpu_var(irq_stat).idle_timestamp = jiffies; idle(); + + touch_critical_timing(); + } local_irq_disable(); trace_preempt_exit_idle(); Index: linux-2.6.22/arch/x86_64/kernel/process.c === --- linux-2.6.22.orig/arch/x86_64/kernel/process.c +++ linux-2.6.22/arch/x86_64/kernel/process.c @@ -223,8 +223,18 @@ void cpu_idle (void) * Otherwise, idle callbacks can misfire. */ local_irq_disable(); + + /* +* We have irqs disabled here, so stop latency tracing +* at this point and restart it after we return: +*/ + stop_critical_timing(); + enter_idle(); idle(); + + touch_critical_timing(); + /* In many cases the interrupt that ended idle has already called exit_idle. But some idle loops can be woken up without interrupt. */ --
[PATCH -rt 6/8] preempt_max_latency in all modes
This enables the /proc/preempt_max_latency facility for timing modes, even if event tracing is disabled. Wakeup latency was the only one that had this feature in the past. Signed-off-by: Daniel Walker <[EMAIL PROTECTED]> --- kernel/sysctl.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: linux-2.6.22/kernel/sysctl.c === --- linux-2.6.22.orig/kernel/sysctl.c +++ linux-2.6.22/kernel/sysctl.c @@ -392,7 +392,7 @@ static ctl_table kern_table[] = { .proc_handler = &proc_dointvec, }, #endif -#if defined(CONFIG_WAKEUP_TIMING) || defined(CONFIG_EVENT_TRACE) +#if defined(CONFIG_CRITICAL_TIMING) { .ctl_name = CTL_UNNUMBERED, .procname = "preempt_max_latency", --
[PATCH -rt 5/8] latency tracing: use now() consistently
Just switch get_monotonic_cycles() to now(). Signed-off-by: Daniel Walker <[EMAIL PROTECTED]> --- kernel/latency_trace.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) Index: linux-2.6.22/kernel/latency_trace.c === --- linux-2.6.22.orig/kernel/latency_trace.c +++ linux-2.6.22/kernel/latency_trace.c @@ -1751,7 +1751,7 @@ check_critical_timing(int cpu, struct cp * as long as possible: */ T0 = tr->preempt_timestamp; - T1 = get_monotonic_cycles(); + T1 = now(); delta = T1-T0; local_save_flags(flags); @@ -1765,7 +1765,7 @@ check_critical_timing(int cpu, struct cp * might change it (it can only get larger so the latency * is fair to be reported): */ - T2 = get_monotonic_cycles(); + T2 = now(); delta = T2-T0; @@ -1815,7 +1815,7 @@ check_critical_timing(int cpu, struct cp printk(" => ended at timestamp %lu: ", t1); print_symbol("<%s>\n", tr->critical_end); dump_stack(); - t1 = cycles_to_usecs(get_monotonic_cycles()); + t1 = cycles_to_usecs(now()); printk(" => dump-end timestamp %lu\n\n", t1); #endif @@ -1825,7 +1825,7 @@ check_critical_timing(int cpu, struct cp out: tr->critical_sequence = max_sequence; - tr->preempt_timestamp = get_monotonic_cycles(); + tr->preempt_timestamp = now(); tr->early_warning = 0; reset_trace_idx(cpu, tr); _trace_cmdline(cpu, tr); @@ -1874,7 +1874,7 @@ __start_critical_timing(unsigned long ei atomic_inc(&tr->disabled); tr->critical_sequence = max_sequence; - tr->preempt_timestamp = get_monotonic_cycles(); + tr->preempt_timestamp = now(); tr->critical_start = eip; reset_trace_idx(cpu, tr); tr->latency_type = latency_type; @@ -2196,7 +2196,7 @@ check_wakeup_timing(struct cpu_trace *tr goto out; T0 = tr->preempt_timestamp; - T1 = get_monotonic_cycles(); + T1 = now(); /* * Any wraparound or time warp and we are out: */ @@ -2314,7 +2314,7 @@ void __trace_start_sched_wakeup(struct t // if (!atomic_read(&tr->disabled)) { atomic_inc(&tr->disabled); tr->critical_sequence = max_sequence; - tr->preempt_timestamp = get_monotonic_cycles(); + 
tr->preempt_timestamp = now(); tr->latency_type = WAKEUP_LATENCY; tr->critical_start = CALLER_ADDR0; _trace_cmdline(raw_smp_processor_id(), tr); @@ -2426,7 +2426,7 @@ long user_trace_start(void) atomic_inc(>disabled); tr->critical_sequence = max_sequence; - tr->preempt_timestamp = get_monotonic_cycles(); + tr->preempt_timestamp = now(); tr->critical_start = CALLER_ADDR0; _trace_cmdline(cpu, tr); atomic_dec(>disabled); @@ -2486,7 +2486,7 @@ long user_trace_stop(void) unsigned long long tmp0; T0 = tr->preempt_timestamp; - T1 = get_monotonic_cycles(); + T1 = now(); tmp0 = preempt_max_latency; if (T1 < T0) T0 = T1; -- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH -rt 4/8] fork: desched_thread comment rework.
Lines are too long..

Signed-off-by: Daniel Walker <[EMAIL PROTECTED]>

---
 kernel/fork.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6.22/kernel/fork.c
===================================================================
--- linux-2.6.22.orig/kernel/fork.c
+++ linux-2.6.22/kernel/fork.c
@@ -1787,8 +1787,10 @@ static int desched_thread(void * __bind_
 			continue;
 		schedule();
 
-		/* This must be called from time to time on ia64, and is a no-op on other archs.
-		 * Used to be in cpu_idle(), but with the new -rt semantics it can't stay there.
+		/*
+		 * This must be called from time to time on ia64, and is a
+		 * no-op on other archs. Used to be in cpu_idle(), but with
+		 * the new -rt semantics it can't stay there.
 		 */
 		check_pgt_cache();
 
--
Re: [PATCH -mm 2/3] PM: More fine grained ACPI handling during suspend and hibernation (rev. 3)
On Tuesday, 28 August 2007 21:48, Len Brown wrote: > On Monday 27 August 2007 17:51, Rafael J. Wysocki wrote: > > According to the ACPI specification (eg. ACPI 2.0c, sec. 7.3.1, 7.3.3, > > ACPI 3.0b, sec. 7.3.1, 7.3.3) the _GTS and _BFS global control methods > > should > > be executed, respectively, right before entering a sleep state (S1-S4) and > > right > > after leaving it, but we don't follow this reqirement. Namely, in our > > implementation the nonboot CPUs are disabled after executing _GTS and > > enabled > > before executing _BFS, which doesn't seem to be correct. > > I've never encountered a BIOS that actually implements _GTS or _BFS, > so I expect that changing how they are invoked may be somewhat academic. It is for now, but once we have a system that implements them, we'd most probably need to change the current code, so I think it's better to consider that in advance. > > [In fact, the ACPI > > specification requires that no physical I/O and interrupt servicing be > > performed > > after the sleep state has been left and before _BFS is executed as well as > > after > > executing _GTS and before the sleep state is entered, but we can't follow > > this > > requirement literally, > > > since our AML interpreter needs to run with interrupts > > enabled and we need to carry out some operations with interrupts disabled > > before > > entering the sleep state and after leaving it.] > > This is sort of a myth. > > The real requirement is that the ACPI interpreter must be able to call > kmalloc(). > It does this today via acpi_os_allocate(), which does this: > > kmalloc(size, irqs_disabled()? GFP_ATOMIC : GFP_KERNEL); > > No, we don't actually run the interpreter during device interrupts, > but we need to be able to run it with interrupts off for boot, > suspend, and resume. At present, during suspend and resume we always call the AML interpreter with interrupts enabled. 
Frankly, I'd like _BFS and _GTS to be executed with interrupts disabled, just as the specification tells us to do. If you think that's safe, I'll change the patch to work this way. > So how did boot work before this hack was added? > kmalloc() does a might_sleep(), but deep down in > cond_sleep, there is a handy little check for > if (system_state == SYSTEM_RUNNING) > to disable the run-time oops. > > I suggested that since it works during boot, and resume is in many > ways similar to boot, we should just re-use system_state for early resume. > But at the time, akpm told me not to use system_state, and so we have the > hack above. > > I don't recall his reasoning -- it might be something that should > be re-visited. I don't like disabling the may_sleep() check all the time, > I'd rather just disable it during the critical boot/suspend/resume states. > > > Moreover, acpi_enable() called > > after restoring the system memory state from a hibernation image should > > really > > be executed before enabling the nonboot CPUs, since functional ACPI may be > > needed for that. All of this means that we need to handle ACPI in a more > > fine > > grained manner during suspend and hibernation. > > I don't follow the requirement to boot an ACPI-enabled resume image > from a non-ACPI-enabled boot kernel. Certainly this isn't a scenario > described by the ACPI spec, which transitions between G1(S4) and G0(S0) > without > going through an ACPI-disabled state. That actually depends on which version of the ACPI specification you consider. In ACPI 3.0 (and later) there's section 15 "Waking and Sleeping" that describes, among other things, the supposed system start sequence (in 15.3.3). It clearly states that we're supposed to check if ACPI is enabled (and enable if not), only _after_ the hibernation image has been loaded. After that, in turn, we should execute _BFS and subsequently _WAK, so my interpretation is that we should not execute any ACPI methods before that point. 
In any case, if the user passes acpi=off to the boot kernel, ACPI may not be enabled until the image kernel gets control. Thus, the image kernel should always check if ACPI is enabled (and enable it, if need be) before doing anything ACPI-related, and that should happen before the nonboot CPUs are enabled. Preferably with interrupts off, as that should be done before we attempt to execute _BFS.

Greetings,
Rafael
Re: [PATCH 2.6.23 RESEND] cxgb3 - Fix dev->priv usage
Roland Dreier wrote:
> > I take that back. Rejected -- it breaks infiniband build.
>
> To be more precise:
>
> drivers/infiniband/hw/cxgb3/cxio_hal.c: In function 'cxio_rdev_open':
> drivers/infiniband/hw/cxgb3/cxio_hal.c:919: error: implicit declaration of function 'T3CDEV'
>
> it seems the problem is that T3CDEV() has been deleted and been
> replaced with the dev2t3cdev() inline function. However a simple
> replacement s/T3CDEV/dev2t3cdev/ in drivers/infiniband/hw/cxgb3
> doesn't work because the function has moved from t3cdev.h to
> adapter.h; and moving the function back to t3cdev.h doesn't work
> because it depends on more structure definitions now. And at that
> point I gave up...

Sorry about the compilation issue and the delay in replying. I'll post a follow-up for the iw_cxgb3 driver later this evening. I plan to move the inlined dev2t3cdev() from adapter.h to an exported dev2t3cdev() in cxgb3_offload.c.

Divy
Re: [kvm-devel] [RFC] 9p: add KVM/QEMU pci transport
On 8/28/07, Arnd Bergmann <[EMAIL PROTECTED]> wrote:
> On Tuesday 28 August 2007, Eric Van Hensbergen wrote:
>
> > This adds a shared memory transport for a synthetic 9p device for
> > paravirtualized file system support under KVM/QEMU.
>
> Nice driver. I'm hoping we can do a virtio driver using a similar
> concept.
>

Yes. I'm looking at the patches from Dor now; it should be pretty straightforward. The PCI transport is interesting in its own right for other (non-virtual) projects we've been playing with.

-eric
Re: [patch 3/4] Linux Kernel Markers - Documentation
* Christoph Hellwig ([EMAIL PROTECTED]) wrote: > On Mon, Aug 20, 2007 at 04:27:07PM -0400, Mathieu Desnoyers wrote: > > Here is some documentation explaining what is/how to use the Linux > > Kernel Markers. > > While porting my code from an older markers version I noticed the > marker callbacks have grown a void *private argument. Add it to > the documentation aswell. > > > Signed-off-by: Christoph Hellwig <[EMAIL PROTECTED]> > Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]> Thanks! > Index: linux-2.6/Documentation/marker.txt > === > --- linux-2.6.orig/Documentation/marker.txt 2007-08-28 22:50:37.0 > +0200 > +++ linux-2.6/Documentation/marker.txt2007-08-28 22:51:07.0 > +0200 > @@ -115,7 +115,7 @@ struct probe_data { > }; > > void probe_subsystem_event(const struct __mark_marker *mdata, > - const char *format, ...) > + void *private, const char *format, ...) > { > va_list ap; > /* Declare args */ -- Mathieu Desnoyers Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.23 RESEND] cxgb3 - Fix dev->priv usage
> I take that back. Rejected -- it breaks infiniband build.

To be more precise:

drivers/infiniband/hw/cxgb3/cxio_hal.c: In function 'cxio_rdev_open':
drivers/infiniband/hw/cxgb3/cxio_hal.c:919: error: implicit declaration of function 'T3CDEV'

It seems the problem is that T3CDEV() has been deleted and been replaced with the dev2t3cdev() inline function. However a simple replacement s/T3CDEV/dev2t3cdev/ in drivers/infiniband/hw/cxgb3 doesn't work because the function has moved from t3cdev.h to adapter.h; and moving the function back to t3cdev.h doesn't work because it depends on more structure definitions now. And at that point I gave up...

 - R.
Re: [1/1] Block device throttling [Re: Distributed storage.]
On Tuesday 28 August 2007 10:54, Evgeniy Polyakov wrote:
> On Tue, Aug 28, 2007 at 10:27:59AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote:
> > > We do not care about one cpu being able to increase its counter
> > > higher than the limit, such inaccuracy (maximum bios in flight
> > > thus can be more than limit, difference is equal to the number of
> > > CPUs - 1) is a price for removing atomic operation. I thought I
> > > pointed it in the original description, but might forget, that if
> > > it will be an issue, that atomic operations can be introduced
> > > there. Any uber-precise measurements in the case when we are
> > > close to the edge will not give us any benefit at all, since were
> > > are already in the grey area.
> >
> > This is not just inaccurate, it is suicide. Keep leaking throttle
> > counts and eventually all of them will be gone. No more IO
> > on that block device!
>
> First, because number of increased and decreased operations are the
> same, so it will dance around limit in both directions.

No. Please go and read the description of the race again. A count gets irretrievably lost because the write operation of the first decrement is overwritten by the second. Data gets lost. Atomic operations exist to prevent that sort of thing. You either need to use them or have a deep understanding of SMP read and write ordering in order to preserve data integrity by some equivalent algorithm.

> Let's solve problems in order of their appearence. If bio structure
> will be allowed to grow, then the whole patches can be done better.

How about something like the patch below? This throttles any block driver by implementing a throttle metric method so that each block driver can keep track of its own resource consumption in units of its choosing. As an (important) example, it implements a simple metric for device mapper devices. Other block devices will work as before, because they do not define any metric.
Short, sweet and untested, which is why I have not posted it until now. This patch originally kept its accounting info in backing_dev_info, however that structure seems to be in some and it is just a part of struct queue anyway, so I lifted the throttle accounting up into struct queue. We should be able to report on the efficacy of this patch in terms of deadlock prevention pretty soon. --- 2.6.22.clean/block/ll_rw_blk.c 2007-07-08 16:32:17.0 -0700 +++ 2.6.22/block/ll_rw_blk.c2007-08-24 12:07:16.0 -0700 @@ -3237,6 +3237,15 @@ end_io: */ void generic_make_request(struct bio *bio) { + struct request_queue *q = bdev_get_queue(bio->bi_bdev); + + if (q && q->metric) { + int need = bio->bi_reserved = q->metric(bio); + bio->queue = q; + wait_event_interruptible(q->throttle_wait, atomic_read(>available) >= need); + atomic_sub(>available, need); + } + if (current->bio_tail) { /* make_request is active */ *(current->bio_tail) = bio; --- 2.6.22.clean/drivers/md/dm.c2007-07-08 16:32:17.0 -0700 +++ 2.6.22/drivers/md/dm.c 2007-08-24 12:14:23.0 -0700 @@ -880,6 +880,11 @@ static int dm_any_congested(void *conges return r; } +static unsigned dm_metric(struct bio *bio) +{ + return bio->bi_vcnt; +} + /*- * An IDR is used to keep track of allocated minor numbers. 
*---*/ @@ -997,6 +1002,10 @@ static struct mapped_device *alloc_dev(i goto bad1_free_minor; md->queue->queuedata = md; + md->queue->metric = dm_metric; + atomic_set(>queue->available, md->queue->capacity = 1000); + init_waitqueue_head(>queue->throttle_wait); + md->queue->backing_dev_info.congested_fn = dm_any_congested; md->queue->backing_dev_info.congested_data = md; blk_queue_make_request(md->queue, dm_request); --- 2.6.22.clean/fs/bio.c 2007-07-08 16:32:17.0 -0700 +++ 2.6.22/fs/bio.c 2007-08-24 12:10:41.0 -0700 @@ -1025,7 +1025,12 @@ void bio_endio(struct bio *bio, unsigned bytes_done = bio->bi_size; } - bio->bi_size -= bytes_done; + if (!(bio->bi_size -= bytes_done) && bio->bi_reserved) { + struct request_queue *q = bio->queue; + atomic_add(>available, bio->bi_reserved); + bio->bi_reserved = 0; /* just in case */ + wake_up(>throttle_wait); + } bio->bi_sector += (bytes_done >> 9); if (bio->bi_end_io) --- 2.6.22.clean/include/linux/bio.h2007-07-08 16:32:17.0 -0700 +++ 2.6.22/include/linux/bio.h 2007-08-24 11:53:51.0 -0700 @@ -109,6 +109,9 @@ struct bio { bio_end_io_t*bi_end_io; atomic_tbi_cnt; /* pin count */ + struct request_queue
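For readers following the lost-count argument earlier in this message, here is a minimal userspace sketch in C11. It is not kernel code, and the function names are made up for illustration. It replays by hand the interleaving in which two CPUs both load the counter before either stores, and contrasts that with an atomic read-modify-write.

```c
#include <stdatomic.h>

/*
 * A plain "counter--" is a load followed by a store. Here the two
 * halves are written out so the bad interleaving is deterministic.
 */
int racy_two_decrements(int counter)
{
	int cpu0 = counter;	/* CPU 0 loads */
	int cpu1 = counter;	/* CPU 1 loads the same value */

	counter = cpu0 - 1;	/* CPU 0 stores */
	counter = cpu1 - 1;	/* CPU 1 overwrites CPU 0's store */
	return counter;		/* only one decrement survives */
}

/* Each atomic_fetch_sub() is an indivisible load-modify-store. */
int atomic_two_decrements(int start)
{
	atomic_int counter;

	atomic_init(&counter, start);
	atomic_fetch_sub(&counter, 1);
	atomic_fetch_sub(&counter, 1);
	return atomic_load(&counter);
}
```

Starting from 10, the racy version ends at 9 and the atomic one at 8. With a throttle count, every such lost update is a unit of capacity that never comes back.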
Re: [PATCH 1/2] sysctl: Properly register the irda binary sysctl numbers.
[EMAIL PROTECTED] writes:

> On Sat, 25 Aug 2007 11:59:53 MDT, Eric W. Biederman said:
>
>> It looks like you don't have CONFIG_SYSCTL_SYSCALL defined, and it
>> appears utsname_syscall and ipcdata_syscall both become NULL pointers
>> if they aren't needed. So the complaint is a false positive.
>
> Yep. Nothing I actually use needs SYSCTL_SYSCALL, so I turned it off to
> see what breaks...

Other than glibc (which uses it to see if we are on an SMP system, and has a fallback to /proc/sys), I only found 5 other application binaries when I was looking hard.

Eric
Re: [patch 3/4] Linux Kernel Markers - Documentation
On Mon, Aug 20, 2007 at 04:27:07PM -0400, Mathieu Desnoyers wrote:
> Here is some documentation explaining what is/how to use the Linux
> Kernel Markers.

While porting my code from an older markers version I noticed the marker callbacks have grown a void *private argument. Add it to the documentation as well.

Signed-off-by: Christoph Hellwig <[EMAIL PROTECTED]>

Index: linux-2.6/Documentation/marker.txt
===================================================================
--- linux-2.6.orig/Documentation/marker.txt	2007-08-28 22:50:37.0 +0200
+++ linux-2.6/Documentation/marker.txt	2007-08-28 22:51:07.0 +0200
@@ -115,7 +115,7 @@ struct probe_data {
 };
 
 void probe_subsystem_event(const struct __mark_marker *mdata,
-	const char *format, ...)
+	void *private, const char *format, ...)
 {
 	va_list ap;
 	/* Declare args */
Re: [PATCH 00/23] drm: introduce drm_zalloc
On Tue, 28 Aug 2007, Christoph Hellwig wrote:

> On Mon, Aug 27, 2007 at 10:57:50PM +0200, [EMAIL PROTECTED] wrote:
> > Hello,
> >
> > As there are many places in drm code where drm_alloc + memset is
> > used this patch series introduces drm_zalloc and also makes use of
> > drm_calloc where needed. Most of these patches save some bytes so
> > the benefit is a few kB saved (gcc 4.1.2) with patch applied. Also
> > some small (style, etc.) things are fixed.
> >
> > This patch series does the conversion drm tree-wide. All patches
> > were compile tested.
>
> Please just convert it to plain kzalloc/kcalloc and kill these utterly
> useless wrappers instead.

The wrappers aren't useless: the drm alloc/free functions pass in a memory space for debugging purposes so we can track memory abuse when developing. But drm_zalloc should just alias to drm_calloc really..

Dave.
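A rough userspace sketch of the "zalloc is just an alias for calloc" idea, with a debug tag passed along so allocations can be accounted per area. All names here (dbg_calloc, dbg_zalloc, the AREA_* tags) are hypothetical stand-ins and are not the DRM functions or their signatures.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical debug areas used to tag allocations. */
enum { AREA_DRIVER, AREA_BUFS, AREA_MAX };

/* Bytes handed out per area, for tracking memory abuse while developing. */
size_t area_bytes[AREA_MAX];

/* Zeroed array allocation, accounted against a debug area. */
void *dbg_calloc(size_t nmemb, size_t size, int area)
{
	void *p = calloc(nmemb, size);	/* calloc zeroes the memory */

	if (p)
		area_bytes[area] += nmemb * size;
	return p;
}

/* Zeroed single-object allocation: simply the nmemb == 1 case. */
void *dbg_zalloc(size_t size, int area)
{
	return dbg_calloc(1, size, area);
}
```

Written this way, the zeroing and the accounting live in one place, and the single-object helper adds no logic of its own.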
Re: [RFC] : mm : / Patch / code : Suggestion :snip kswapd _page_from_freelist() : No more no page failures. (WHY????)
Nick Piggin wrote: > > [EMAIL PROTECTED] wrote: > > [EMAIL PROTECTED] > > Sent: Friday, August 24, 2007 3:11 PM > > Subject: Re: [RFC] : mm : / Patch / code : Suggestion :snip kswapd & > > get_page_from_freelist() : No more no page failures. > > > > Mailer added a HTML subpart and chopped the earlier email :^( > > Hi Mitchell, > > Is it possible to send suggestions in the form of a unified diff, even > if you haven't even compiled it (just add a note to let people know). > > Secondly, we already have a (supposedly working) system of asynch > reclaim, with buffering and hysteresis. I don't exactly understand > what problem you think it has that would be solved by rechecking > watermarks after allocating a page. > > When we're in the (min,low) watermark range, we'll wake up kswapd > _before_ allocating anything, so what is better about the change to > wake up kswapd after allocating? Can you perhaps come up with an > example situation also to make this more clear? > > Overhead of wakeup_kswapd isn't too much of a problem: if we _should_ > be waking it up when we currently aren't, then we should be calling > it. However the extra checking in the allocator fastpath is something > we want to avoid if possible, because this can be a really hot path. > > Thanks, > Nick > > -- > SUSE Labs, Novell Inc. > - Nick Piggin, et al, First diffs would generate alot of noise, since I rip and insert alot of code based on whether I think the code is REALLY needed for MY TEST environment. These suggestions are basicly minimal merge suggestions between my development envir and the public Linux tree. Now the why for this SUGGESTION/PATCH... > When we're in the (min,low) watermark range, we'll wake up kswapd > _before_ allocating anything, so what is better about the change to > wake up kswapd after allocating? Can you perhaps come up with an > example situation also to make this more clear? Answer Will GFP_ATOMIC alloc be failing at that point? 
If yes, then why not allow kswapd to attempt to prevent this condition from occurring?

The existing code reads that the first call to get_page_from_freelist() has returned no page. Now you are going to start up something that is at best going to take millisecs to start helping out. Won't it first grab some pages to do its work? So we are going to be lower in free memory right when it starts up. Right?

So, before the change, with high memory consumption/pressure, various GFP_xxx allocations would fail or take an excessive amount of time due to the simple fact of low memory and/or Slub/slab consumption and/or first failure of get_page_from_freelist() when in a low free memory condition.

Once the above condition occurs the perception is that the current mainline Linux code then on demand increases its effort to find some memory. However, while this is happening the system is in a low memory bind and various performance parameters are being affected and some allocations are sleeping or being delayed or outright failing.

What I could see is that the CURRENT approaches (allowing a new class of GFP_xxx allocations to succeed while in low memory, the try-again philosophy, waking up kswapd, etc.) are all AFTER the fact, while something is WAITING for the memory. This wait is in effect a SYNCHRONOUS wait for memory.

Assuming that kswapd is really what is mostly needed, execute it BEFORE (JUST IN TIME) to PREVENT low memory, since I/O needs pages, GFP_ATOMIC allocs fail and other GFP allocs sleep.

The SUGGESTION is to take the fraction of a microsec longer in the fast path to see if it needs to be started up and to ATTEMPT to prevent the SLOW-PATH and low/min memory from occurring.

The 2x low memory is to allow some scalability and to allow it ENOUGH time to do what it needs to do, since I expect a minimum number of millisecs before it can move us away from low free memory. As the amount of memory increases in a system this probably could be decreased somewhat to maybe 1.25x.
IF the above is good then the issue is how to optimize the heck out of the check.

Mitchell Erblich
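To make the proposed fast-path check concrete, here is a small sketch of a threshold test expressed as an integer ratio, so that both the 2x and the 1.25x figures mentioned in this message fit. This is an illustration only: the function name and parameters are assumptions, it is not code from the page allocator, and overflow of the multiplications is ignored.

```c
#include <stdbool.h>

/*
 * Returns true when free pages have dropped below low_wmark * num / den.
 * num/den = 2/1 is the "2x low" threshold; 5/4 expresses 1.25x.
 * Cross-multiplying avoids a division in the hot path.
 */
bool should_wake_reclaim(unsigned long free_pages, unsigned long low_wmark,
			 unsigned long num, unsigned long den)
{
	return free_pages * den < low_wmark * num;
}
```

With a low watermark of 100 pages, 150 free pages would trigger an early wakeup at 2x but not at 1.25x. Whether even a comparison this cheap is acceptable in the allocator fast path is the open question raised in the quoted reply.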
Re: CFS review
On Mon, 27 Aug 2007 22:05:37 PDT, Linus Torvalds said:
>
>
> On Tue, 28 Aug 2007, Al Boldi wrote:
> >
> > No need for framebuffer. All you need is X using the X.org vesa-driver.
> > Then start gears like this:
> >
> >   # gears & gears & gears &
> >
> > Then lay them out side by side to see the periodic stallings for ~10sec.
>
> I don't think this is a good test.
>
> Why?
>
> If you're not using direct rendering, what you have is the X server doing
> all the rendering, which in turn means that what you are testing is quite
> possibly not so much about the *kernel* scheduling, but about *X-server*
> scheduling!

I wonder - can people who are doing this as a test please specify whether they're using an older X that has the libX11 or the newer libxcb code? That may have a similar impact as well.

(libxcb is pretty new - it landed in Fedora Rawhide just about a month ago, after Fedora 7 shipped. Not sure what other distros have it now...)
Re: [PATCH] Immediate Values - Powerpc Optimization Fix
On Tue, Aug 28, 2007 at 04:40:06PM -0400, Mathieu Desnoyers wrote:
> Immediate Values Powerpc Optimization Fix
>
> Fix a bad call to flush_icache_range(). The second parameter is the end address
> of the range, not the length.
>
> Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>
> CC: Christoph Hellwig <[EMAIL PROTECTED]>

I've just verified that this works for me, thanks.
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: Roland Dreier <[EMAIL PROTECTED]>
Date: Tue, 28 Aug 2007 12:38:07 -0700

> It seems that the NIC would also have to look into a TCP stream (and
> handle out of order segments etc) to find message boundaries for this
> to be equivalent to what an RDMA NIC does.

It would work for data that accumulates in-order, give or take a small window, just like LRO does.
Re: [kvm-devel] [RFC] 9p: add KVM/QEMU pci transport
On 8/28/07, Arnd Bergmann <[EMAIL PROTECTED]> wrote: > On Tuesday 28 August 2007, Eric Van Hensbergen wrote: > > > This adds a shared memory transport for a synthetic 9p device for > > paravirtualized file system support under KVM/QEMU. > > Nice driver. I'm hoping we can do a virtio driver using a similar > concept. > > > +#define PCI_VENDOR_ID_9P 0x5002 > > +#define PCI_DEVICE_ID_9P 0x000D > > Where do these numbers come from? Can we be sure they don't conflict with > actual hardware? I stole the VENDOR_ID from kvm's hypercall driver. There are no any guarantees that it doesn't conflict with actual hardware. As it was discussed before, there is still no ID assigned for the virtual devices. > > +struct p9pci_trans { > > + struct pci_dev *pdev; > > + void __iomem*ioaddr; > > + void __iomem*tx; > > + void __iomem*rx; > > + int irq; > > + int pos; > > + int len; > > + wait_queue_head_t wait; > > +}; > > I would expect the data structure to contain an embedded struct p9_trans, > which is how most drivers work nowadays. > > > +static struct p9pci_trans *p9pci_trans; /* single channel for now */ > > As a result, it should be easier to get rid of this global. My feeling is > that it really should not be here. > > > +static irqreturn_t p9pci_interrupt(int irq, void *dev) > > +{ > > + p9pci_trans = dev; > > This can simply use a local variable. > > > + p9pci_trans->len = le32_to_cpu(readl(p9pci_trans->rx)); > > readl implies le32_to_cpu. Doing it twice on a PCI device is broken > on big-endian hardware. > > > + P9_DPRINTK(P9_DEBUG_TRANS, "%p len %d\n", p9pci_trans->pdev, > > + p9pci_trans->len); > > + iowrite32(0, p9pci_trans->ioaddr + 4); > > Also, you should not mix iowriteXX/ioreadXX and writeX/readX calls in one > driver. Since you use pci_iomap, iowriteXX/ioreadXX are the correct functions. 
> > > + wake_up_interruptible(_trans->wait); > > + return IRQ_HANDLED; > > +} > > + > > +static int p9pci_read(struct p9_trans *trans, void *v, int len) > > +{ > > + struct p9pci_trans *ts; > > + > > + if (!trans || trans->status == Disconnected || !trans->priv) > > + return -EREMOTEIO; > > + > > + ts = trans->priv; > > + > > + P9_DPRINTK(P9_DEBUG_TRANS, "trans %p rx %p tx %p buf %p len %d\n", > > + trans, ts->rx, ts->tx, v, > > len); > > + if (len > ts->len) > > + len = ts->len; > > + > > + if (len) { > > + memcpy_fromio(v, ts->rx, len); > > + ts->len = 0; > > + /* let the host knows the message is consumed */ > > + writel(0, ts->rx); > > + iowrite32(0, p9pci_trans->ioaddr + 4); > > + P9_DPRINTK(P9_DEBUG_TRANS, "zero rxlen %d txlen %d\n", > > + readl(ts->rx), readl(ts->tx)); > > + } > > + > > + return len; > > +} > > I would expect memcpy_fromio and memcpy_toio to be relatively inefficient > compared to virtual DMA, depending on the hypervisor. Do you have plans > to change that, or did you have specific reasons to do the memcpy here? No specific reasons. We wanted to start with simple and easy transport and make things work before we start optimizing it. There are many areas where the transport can be improved, using virtual DMA sounds like a good suggestion. > > > + P9_DPRINTK(P9_DEBUG_TRANS, "trans %p rx %p tx %p buf %p len %d\n", > > + trans, ts->rx, ts->tx, v, > > len); > > + P9_DPRINTK(P9_DEBUG_TRANS, "rxlen %d\n", readl(ts->rx)); > > + if (readb(ts->tx) != 0) > > + return 0; > > + > > + P9_DPRINTK(P9_DEBUG_TRANS, "tx addr %p io addr %p\n", ts->tx, > > + ts->ioaddr); > > All these P9_DPRINTK statements somewhat limit readability. I would suggest > you kill them as soon as the driver is considered stable. > > > +static int __devinit p9pci_probe(struct pci_dev *pdev, > > + const struct pci_device_id *ent) > > +{ > > + int err; > > + u8 pci_rev; > > + > > + if (p9pci_trans) > > + return -1; > > probe should return -EBUSY or similar, not -1. 
> > > + pci_read_config_byte(pdev, PCI_REVISION_ID, _rev); > > + > > + if (pdev->vendor == PCI_VENDOR_ID_9P && > > + pdev->device == PCI_DEVICE_ID_9P) > > + printk(KERN_INFO "pci dev %s (id %04x:%04x rev %02x) is a > > 9P\n", > > +pci_name(pdev), pdev->vendor, pdev->device, pci_rev); > > You wouldn't be here for a different vendor/device code, so the check is > bogus. > > > + P9_DPRINTK(P9_DEBUG_TRANS, "%p\n", pdev); > > + p9pci_trans = kzalloc(sizeof(*p9pci_trans), GFP_KERNEL); > > + p9pci_trans->irq = -1; > > Use
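As an aside on the review comment that readl already implies le32_to_cpu: the following userspace model, a sketch with made-up function names, shows why converting twice goes wrong on a big-endian host. It only assumes what the review states, namely that the register read returns the value in CPU order, and that on a big-endian CPU le32_to_cpu amounts to a byte swap.

```c
#include <stdint.h>

/* Unconditional byte swap: what le32_to_cpu() amounts to on a big-endian CPU. */
uint32_t model_swab32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/*
 * Model of reading a little-endian 32-bit device register: the value is
 * assembled from its bytes, so the result is in CPU order on any host.
 */
uint32_t model_readl(const uint8_t reg[4])
{
	return (uint32_t)reg[0] | ((uint32_t)reg[1] << 8) |
	       ((uint32_t)reg[2] << 16) | ((uint32_t)reg[3] << 24);
}

/* What a big-endian host would end up with for le32_to_cpu(readl(reg)). */
uint32_t double_converted_on_be(const uint8_t reg[4])
{
	return model_swab32(model_readl(reg));
}
```

For a register holding a length of 0x10, the single conversion yields 0x10, while the doubled one yields 0x10000000 on the modelled big-endian host. On little-endian hosts le32_to_cpu is a no-op, which is why the bug stays hidden there.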
Re: [PATCH] Add documentation to some preprocessor directives in init/*.c.
On Tue, Aug 28, 2007 at 01:56:24AM -0400, Robert P. J. Day wrote:
> Add some documentation to potentially confusing preprocessor
> directives in some source files in the init/ directory to show their
> proper association and nesting.

> --- a/init/calibrate.c
> +++ b/init/calibrate.c
> @@ -101,7 +101,7 @@ static unsigned long __devinit calibrate_delay_direct(void)
>  		"estimate for loops_per_jiffy.\nProbably due to long platform interrupts. Consider using \"lpj=\" boot option.\n");
>  	return 0;
>  }
> -#else
> +#else /* !ARCH_HAS_READ_CURRENT_TIMER */
>  static unsigned long __devinit calibrate_delay_direct(void) {return 0;}
>  #endif

I'm sorry, but this is not useful. You're adding comments, and the compiler ignores comments. So one can't rely on such comments being accurate and will go to the start of the section to recheck anyway.
[PATCH] Immediate Values - Powerpc Optimization Fix
Immediate Values Powerpc Optimization Fix

Fix a bad call to flush_icache_range(). The second parameter is the end address of the range, not the length.

Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>
CC: Christoph Hellwig <[EMAIL PROTECTED]>
---
 arch/powerpc/kernel/immediate.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6-lttng/arch/powerpc/kernel/immediate.c
===================================================================
--- linux-2.6-lttng.orig/arch/powerpc/kernel/immediate.c	2007-08-28 16:36:10.0 -0400
+++ linux-2.6-lttng/arch/powerpc/kernel/immediate.c	2007-08-28 16:36:40.0 -0400
@@ -67,7 +67,7 @@ int arch_immediate_update(const struct _
 	memcpy((void*)immediate->immediate, (void*)immediate->var,
 			immediate->size);
 	flush_icache_range((unsigned long)immediate->immediate,
-		immediate->size);
+		(unsigned long)immediate->immediate + immediate->size);
 	return 0;
 }
 
@@ -99,5 +99,5 @@ void __init arch_immediate_update_early(
 	memcpy((void*)immediate->immediate, (void*)immediate->var,
 			immediate->size);
 	flush_icache_range((unsigned long)immediate->immediate,
-		immediate->size);
+		(unsigned long)immediate->immediate + immediate->size);
 }

--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
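To illustrate why passing a length where an end address is expected matters, here is a small userspace stand-in. It is a sketch only: lines_flushed() and the 32-byte line size are assumptions for illustration, not the powerpc implementation. It simply counts how many cache lines a flush over [start, end) would touch.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 32u	/* hypothetical cache line size */

/*
 * Stand-in for a range flush taking (start, end): returns the number of
 * cache lines covered by [start, end). An end at or below start is an
 * empty range, so nothing would be flushed.
 */
size_t lines_flushed(uintptr_t start, uintptr_t end)
{
	uintptr_t first, last;

	if (end <= start)
		return 0;
	first = start & ~(uintptr_t)(LINE_SIZE - 1);
	last = (end - 1) & ~(uintptr_t)(LINE_SIZE - 1);
	return (size_t)((last - first) / LINE_SIZE) + 1;
}
```

With start = 0x1000 and an 8-byte immediate, the buggy call shape lines_flushed(0x1000, 8) covers nothing, while the fixed shape lines_flushed(0x1000, 0x1000 + 8) covers the one line holding the patched instruction.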
Re: nmi_watchdog=2 regression in 2.6.21
Here's a simpler patch that fixes the boot hang .. We have to call off the IPI looping regardless of the check_nmi_watchdog outcome..

Signed-off-by: Daniel Walker <[EMAIL PROTECTED]>

Index: linux-2.6.22/arch/i386/kernel/nmi.c
===================================================================
--- linux-2.6.22.orig/arch/i386/kernel/nmi.c	2007-08-15 00:51:12.0 +
+++ linux-2.6.22/arch/i386/kernel/nmi.c	2007-08-28 20:27:56.0 +
@@ -122,12 +122,12 @@ static int __init check_nmi_watchdog(voi
 			atomic_dec(&nmi_active);
 		}
 	}
+	endflag = 1;
 	if (!atomic_read(&nmi_active)) {
 		kfree(prev_nmi_count);
 		atomic_set(&nmi_active, -1);
 		return -1;
 	}
-	endflag = 1;
 	printk("OK.\n");
 
 	/* now that we know it works we can reduce NMI frequency to
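The shape of the fix can be modelled in a few lines of plain C. This is a simplified sketch under stated assumptions: the struct and function names are invented, and the real function does much more. The point it shows is that the "stop spinning" flag has to be set before the early return, otherwise the failure path leaves the other CPUs looping.

```c
#include <stdbool.h>

/* Simplified model of the state touched by the check. */
struct watchdog_model {
	int nmi_active;	/* CPUs whose NMI counter was still ticking */
	bool endflag;	/* tells the busy-looping CPUs to stop */
};

/* Returns 0 when the watchdog works, -1 when no CPU saw NMIs. */
int check_watchdog_model(struct watchdog_model *w)
{
	w->endflag = true;	/* release the spinners on every path */

	if (w->nmi_active == 0) {
		w->nmi_active = -1;
		return -1;	/* early return no longer skips the flag */
	}
	return 0;
}
```

In the pre-patch ordering the assignment sat after the if block, so a failed check returned -1 with the flag still clear.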