date:20070828

Re: [kvm-devel] [PATCH 3/3] KVM paravirt-ops implementation

2007-08-28 Thread Anthony Liguori

On Wed, 2007-08-29 at 04:31 +1000, Rusty Russell wrote:
> On Mon, 2007-08-27 at 10:16 -0500, Anthony Liguori wrote:
> > @@ -569,6 +570,7 @@ asmlinkage void __init start_kernel(void)
> >  	}
> >  	sort_main_extable();
> >  	trap_init();
> > +	kvm_guest_init();
> >  	rcu_init();
> >  	init_IRQ();
> >  	pidhash_init();
> 
> Hi Anthony,
> 
>   This placement seems arbitrary.  Why not earlier from setup_arch, or as
> a normal initcall?

The placement is important if we wish to have a paravirt_ops hook for
the interrupt controller.  This is the latest possible spot we can do
it.  A comment is probably appropriate here.
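Anthony's argument is about ordering: a hook for the interrupt controller only helps if it is installed before init_IRQ() consumes it. A minimal sketch, assuming a simple function-pointer table; toy_irq_ops, toy_kvm_guest_init() and toy_init_IRQ() are hypothetical stand-ins, not the real paravirt_ops structure or boot code:

```c
/* Toy model of the ordering constraint; all names are hypothetical and
 * do not reflect the real paravirt_ops layout. */
enum irq_backend { IRQ_NATIVE, IRQ_KVM };

struct toy_irq_ops {
	enum irq_backend (*init_irq)(void);	/* interrupt-controller setup hook */
};

static enum irq_backend native_init_irq(void) { return IRQ_NATIVE; }
static enum irq_backend kvm_init_irq(void) { return IRQ_KVM; }

static struct toy_irq_ops toy_irq_ops = { .init_irq = native_init_irq };

/* Stand-in for kvm_guest_init(): installs the paravirt hook. */
static void toy_kvm_guest_init(void)
{
	toy_irq_ops.init_irq = kvm_init_irq;
}

/* Stand-in for init_IRQ(): consumes the hook exactly once. */
static enum irq_backend toy_init_IRQ(void)
{
	return toy_irq_ops.init_irq();
}

/* Runs the boot sequence with guest init placed either before init_IRQ()
 * (as in the quoted hunk) or after it, and reports which backend was used. */
static enum irq_backend toy_boot(int guest_init_before_irq)
{
	enum irq_backend used;

	toy_irq_ops.init_irq = native_init_irq;	/* fresh boot */
	if (guest_init_before_irq)
		toy_kvm_guest_init();
	used = toy_init_IRQ();
	if (!guest_init_before_irq)
		toy_kvm_guest_init();	/* too late: the hook was already consumed */
	return used;
}
```

In the model, calling the guest init any later than init_IRQ() leaves the native hook in effect.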

Regards,

Anthony Liguori

> Rusty.
> 
> 
> 
> -
> This SF.net email is sponsored by: Splunk Inc.
> Still grepping through log files to find problems?  Stop.
> Now Search log events and configuration files using AJAX and a browser.
> Download your FREE copy of Splunk now >>  http://get.splunk.com/
> ___
> kvm-devel mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/kvm-devel

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: NFS woes again

2007-08-28 Thread Bret Towe

On 8/28/07, Trond Myklebust <[EMAIL PROTECTED]> wrote:
> On Mon, 2007-08-27 at 20:35 -0500, Florin Iucha wrote:
> > On Mon, Aug 27, 2007 at 06:19:29PM -0700, Bret Towe wrote:
> > > On 8/27/07, Trond Myklebust <[EMAIL PROTECTED]> wrote:
> > > > > > this sounds a lot like the post i did yesterday titled 'nfs4 hang regression'
> > > > > > i tracked it down to commit 3d39c691ff486142dd9aaeac12f553f4476b7a6
> > > > >
> > > > > Yes, it certainly does -- all the symptoms match!
> > > >
> > > > Could you and Bret please check if the attached patch fixes the hang?
> > >
> > > no good for me, still hangs after ~30 minutes
> >
> > I just booted into the new kernel
> > (3d39c691ff486142dd9aaeac12f553f4476b7a6 + Trond's patch) and it hangs
> > in 10-15 minutes.
> >
> > Process traces available at 
> > http://iucha.net/nfs/23-rc2-nfs-fix-1/kernel.log.gz
> >
> > Regards,
> > florin
>
> Doh! I see the problem: cancel_delayed_work_sync() shouldn't ever be
> called recursively.
>
> The following patch should be correct. Please just discard the previous
> one...
>
> Trond
>

uptime of 3 hours and keyboard is still working fine
I'll hopefully get to test this on the mini tomorrow for at least 3 hours also

>
> -- Forwarded message --
> From: Trond Myklebust <[EMAIL PROTECTED]>
> To:
> Date: Mon, 27 Aug 2007 09:14:56 -0400
> Subject: No Subject
> Doh! We can't use cancel_delayed_work_sync because we may have been called
> from an unmount that was being performed by nfs_automount_task.
>
> Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>
> ---
>
>  fs/nfs/namespace.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
> index aea76d0..acfc56f 100644
> --- a/fs/nfs/namespace.c
> +++ b/fs/nfs/namespace.c
> @@ -176,7 +176,7 @@ static void nfs_expire_automounts(struct work_struct *work)
>  void nfs_release_automount_timer(void)
>  {
>  	if (list_empty(&nfs_automount_list))
> -		cancel_delayed_work_sync(&nfs_automount_task);
> +		cancel_delayed_work(&nfs_automount_task);
>  }
>
>  /*
>
>
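Trond's diagnosis is a self-wait: the _sync variant must also wait for a handler that is already running, and here the caller can be that very handler (the unmount was started by nfs_automount_task). The following user-space model is a sketch under that assumption; struct toy_work and the toy_* functions are hypothetical and only mimic the documented behaviour of cancel_delayed_work() (drop a pending run, never wait) and cancel_delayed_work_sync() (also wait for a running handler), returning -1 where the real call would hang:

```c
#include <stdbool.h>

/* Toy stand-in for a delayed work item (hypothetical; not the kernel's
 * struct delayed_work). */
struct toy_work {
	bool pending;	/* timer armed, handler not started yet */
	bool running;	/* handler is executing right now */
};

/* Models cancel_delayed_work(): drop a pending run, never wait. */
static bool toy_cancel(struct toy_work *w)
{
	bool was_pending = w->pending;

	w->pending = false;
	return was_pending;
}

/* Models cancel_delayed_work_sync(): also wait until a running handler
 * finishes.  If the caller IS that handler, the wait can never end; the
 * toy reports this with -1 instead of hanging. */
static int toy_cancel_sync(struct toy_work *w)
{
	toy_cancel(w);
	if (w->running)
		return -1;	/* would wait for ourselves: deadlock */
	return 0;
}

/* Hypothetical release path, analogous to nfs_release_automount_timer(). */
static int toy_release_timer(struct toy_work *w, bool use_sync)
{
	if (use_sync)
		return toy_cancel_sync(w);
	toy_cancel(w);
	return 0;
}

/* The work handler itself ends up in the release path, as the automount
 * expiry task does when it triggers an unmount. */
static int toy_expire_handler(struct toy_work *w, bool use_sync)
{
	int ret;

	w->pending = false;
	w->running = true;
	ret = toy_release_timer(w, use_sync);
	w->running = false;
	return ret;
}
```

With use_sync set, the handler would wait on itself; the non-sync cancel simply clears the pending run and returns.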

Re: [kvm-devel] [PATCH 2/3] Refactor hypercall infrastructure

2007-08-28 Thread Anthony Liguori

On Wed, 2007-08-29 at 04:12 +1000, Rusty Russell wrote:
> On Mon, 2007-08-27 at 10:16 -0500, Anthony Liguori wrote:
> > This patch refactors the current hypercall infrastructure to better support live
> > migration and SMP.  It eliminates the hypercall page by trapping the UD
> > exception that would occur if you used the wrong hypercall instruction for the
> > underlying architecture and replacing it with the right one lazily.
> 
> It also reduces the number of hypercall args, which you don't mention
> here.

Oh yes, sorry.

> > +   er = emulate_instruction(>vcpu, kvm_run, 0, 0);
> > +
> > +	/* we should only succeed here in the case of hypercalls which
> > +	   cannot generate an MMIO event.  MMIO means that the emulator
> > +	   is mistakenly allowing an instruction that should generate
> > +	   a UD fault so it's a bug. */
> > +	BUG_ON(er == EMULATE_DO_MMIO);
> 
> This seems... unwise.  Firstly we know our emulator is incomplete.
> Secondly an SMP guest can exploit this to crash the host.

This code is gone in v2.

> (Code is in two places).
> 
> > +#define KVM_HYPERCALL ".byte 0x0f,0x01,0xc1"

Good point.

> A nice big comment would be nice here, I think.  Note that this is big
> enough for both "int $0x1f" and "sysenter", so I'm happy.

I need to add a comment somewhere mentioning that if you patch with
something less than 3 bytes, then you should pad with nop but the
hypervisor must treat the whole instruction (including the padding) as
atomic (that is, regardless of hypercall size, eip += 3) or you run the
risk of breakage during migration.
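A sketch of the padding rule Anthony describes, assuming the guest rewrites the 3-byte site in place and the host always skips all three bytes. patch_hypercall_site() and advance_past_hypercall() are hypothetical helpers, not KVM functions; the byte values (0x0f,0x01,0xc1 for vmcall, cd 1f for "int $0x1f", 0x90 for nop) are the standard x86 encodings:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HYPERCALL_LEN 3		/* KVM_HYPERCALL: .byte 0x0f,0x01,0xc1 (vmcall) */
#define X86_NOP 0x90		/* one-byte nop */

/* Rewrite a hypercall site with a replacement instruction of at most
 * HYPERCALL_LEN bytes, padding the rest with nops.  Returns 0 on success,
 * -1 if the replacement does not fit. */
static int patch_hypercall_site(uint8_t *site, const uint8_t *insn, size_t len)
{
	if (len > HYPERCALL_LEN)
		return -1;
	memcpy(site, insn, len);
	memset(site + len, X86_NOP, HYPERCALL_LEN - len);
	return 0;
}

/* Hypervisor side: treat the padded site as one atomic instruction, so
 * the guest IP always moves past all HYPERCALL_LEN bytes, whichever
 * encoding actually trapped. */
static uint64_t advance_past_hypercall(uint64_t rip)
{
	return rip + HYPERCALL_LEN;
}
```

Keeping the skip length fixed means the host does not need to know which replacement the guest chose.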

Regards,

Anthony Liguori

> Cheers,
> Rusty.
> 
> 

Re: [PATCH] Send quota messages via netlink

2007-08-28 Thread Eric W. Biederman

Andrew Morton <[EMAIL PROTECTED]> writes:

> On Tue, 28 Aug 2007 16:13:18 +0200 Jan Kara <[EMAIL PROTECTED]> wrote:
>
>>   Hello,
>> 
>> I'm sending rediffed patch implementing sending of quota messages via netlink
>> interface (some rationale in patch description). I've already posted it to
>> LKML some time ago and there were no objections, so I guess it's fine to put
>> it to -mm. Andrew, would you be so kind? Thanks.
>>   Userspace daemon reading the messages from the kernel and sending them to
>> dbus and/or user console is also written (it's part of quota-tools). The
>> only remaining problem is that a few changes to libnl are needed for
>> the userspace daemon. They were basically acked by the maintainer but it
>> seems he has not merged the patches yet. So this will take a bit more time.
>> 
>
> So it's a new kernel->userspace interface.
>
> But we have no description of the interface :(
>
>> +/* Send warning to userspace about user which exceeded quota */
>> +static void send_warning(const struct dquot *dquot, const char warntype)
>> +{
>> +	static unsigned long seq;
>> +	struct sk_buff *skb;
>> +	void *msg_head;
>> +	int ret;
>> +
>> +	skb = genlmsg_new(QUOTA_NL_MSG_SIZE, GFP_NOFS);
>> +	if (!skb) {
>> +		printk(KERN_ERR
>> +		  "VFS: Not enough memory to send quota warning.\n");
>> +		return;
>> +	}
>> +	msg_head = genlmsg_put(skb, 0, seq++, &quota_genl_family, 0, QUOTA_NL_C_WARNING);
>> +	if (!msg_head) {
>> +		printk(KERN_ERR
>> +		  "VFS: Cannot store netlink header in quota warning.\n");
>> +		goto err_out;
>> +	}
>> +	ret = nla_put_u32(skb, QUOTA_NL_A_QTYPE, dquot->dq_type);
>> +	if (ret)
>> +		goto attr_err_out;
>> +	ret = nla_put_u64(skb, QUOTA_NL_A_EXCESS_ID, dquot->dq_id);
>> +	if (ret)
>> +		goto attr_err_out;
>> +	ret = nla_put_u32(skb, QUOTA_NL_A_WARNING, warntype);
>> +	if (ret)
>> +		goto attr_err_out;
>> +	ret = nla_put_u32(skb, QUOTA_NL_A_DEV_MAJOR,
>> +		MAJOR(dquot->dq_sb->s_dev));
>> +	if (ret)
>> +		goto attr_err_out;
>> +	ret = nla_put_u32(skb, QUOTA_NL_A_DEV_MINOR,
>> +		MINOR(dquot->dq_sb->s_dev));
>> +	if (ret)
>> +		goto attr_err_out;
>> +	ret = nla_put_u64(skb, QUOTA_NL_A_CAUSED_ID, current->user->uid);
>> +	if (ret)
>> +		goto attr_err_out;
>> +	genlmsg_end(skb, msg_head);
>> +
>> +	ret = genlmsg_multicast(skb, 0, quota_genl_family.id, GFP_NOFS);
>> +	if (ret < 0 && ret != -ESRCH)
>> +		printk(KERN_ERR
>> +			"VFS: Failed to send notification message: %d\n", ret);
>> +	return;
>> +attr_err_out:
>> +	printk(KERN_ERR "VFS: Failed to compose quota message: %d\n", ret);
>> +err_out:
>> +	kfree_skb(skb);
>> +}
>> +#endif
>
> This is it.  Normally netlink payloads are represented as a struct.  How
> come this one is built-by-hand?

No, netlink fields (unless I'm confused) are represented as structs,
not the entire netlink payload.

> It doesn't appear to be versioned.  Should it be?

Well.  If it is using netlink properly each field should have a tag.
So it should not need to be versioned, because each field is strictly
controlled.
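Eric's point is that each netlink attribute carries its own length and type tag, so a reader can skip whatever it does not recognise. A minimal user-space sketch of that type-length-value layout, assuming host byte order and 4-byte padding as netlink uses; the toy_* names and the tag numbers in the usage are hypothetical, not the quota attribute IDs:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same two-field header as the kernel's struct nlattr (nla_len, nla_type),
 * redefined here so the sketch builds without kernel headers. */
struct toy_nlattr {
	uint16_t nla_len;	/* header + payload, before padding */
	uint16_t nla_type;	/* the per-field tag */
};

/* Attributes are padded to 4 bytes, like NLA_ALIGN(). */
#define TOY_ALIGN(len) (((size_t)(len) + 3) & ~(size_t)3)
#define TOY_HDRLEN TOY_ALIGN(sizeof(struct toy_nlattr))

/* Append a tagged u32 at offset off; returns the offset after it. */
static size_t toy_put_u32(uint8_t *buf, size_t off, uint16_t type, uint32_t val)
{
	struct toy_nlattr hdr = { (uint16_t)(TOY_HDRLEN + sizeof(val)), type };

	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + TOY_HDRLEN, &val, sizeof(val));
	return off + TOY_ALIGN(hdr.nla_len);
}

/* Look up the attribute tagged `want`.  Every other attribute, including
 * tags this reader has never heard of, is skipped by its length, so a
 * newer sender does not break an older reader.  Returns 0 if found. */
static int toy_get_u32(const uint8_t *buf, size_t len, uint16_t want, uint32_t *out)
{
	size_t off = 0;

	while (off + TOY_HDRLEN <= len) {
		struct toy_nlattr hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.nla_len < TOY_HDRLEN || off + hdr.nla_len > len)
			return -1;	/* malformed */
		if (hdr.nla_type == want && hdr.nla_len >= TOY_HDRLEN + sizeof(*out)) {
			memcpy(out, buf + off + TOY_HDRLEN, sizeof(*out));
			return 0;
		}
		off += TOY_ALIGN(hdr.nla_len);
	}
	return -1;
}
```

Adding a field then means adding a new tag; no version number or reserved space is involved.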

> Does it have (or need) reserved-set-to-zero space for expansion?  Again,
> hard to tell..

Not if netlink is used properly.  Just another nested tag.

> I guess it's OK to send a major and minor out of the kernel like this. 
> What's it for?  To represent a filesytem?  I wonder if there's a more
> modern and useful way of describing the fs.  Path to mountpoint or
> something?

Or perhaps the string the fs was mounted with.
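For reference, the two attributes hold MAJOR() and MINOR() of the superblock's s_dev. The kernel's internal dev_t packing differs from the encoding userspace sees in st_dev from stat(), which is presumably one reason to send the two numbers separately; a listener would then have to match them against major()/minor() of st_dev to locate the filesystem. A sketch of the internal split, with toy_* names as illustrative copies of the include/linux/kdev_t.h macros:

```c
#include <stdint.h>

/* Kernel-internal dev_t split, as in include/linux/kdev_t.h:
 * the low 20 bits are the minor number, the rest the major. */
#define TOY_MINORBITS 20
#define TOY_MINORMASK ((1U << TOY_MINORBITS) - 1)

static uint32_t toy_mkdev(uint32_t major, uint32_t minor)
{
	return (major << TOY_MINORBITS) | minor;
}

static uint32_t toy_major(uint32_t dev) { return dev >> TOY_MINORBITS; }
static uint32_t toy_minor(uint32_t dev) { return dev & TOY_MINORMASK; }
```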

> I suspect the namespace virtualisation guys would be interested in a new
> interface which is sending current->user->uid up to userspace.  uids are
> per-namespace now.  What are the implications?  (cc's added)

We definitely would be, although user namespace support is still
rather incomplete at the moment.

> Is it worth adding a comment explaining why GFP_NOFS is used here?

Re: [PATCH 2.6.23 0/2] cxgb3 - Fix dev->priv usage

2007-08-28 Thread Roland Dreier

Looks OK to me but I would just roll up the second patch into the
first patch and let Jeff merge it as one commit.  There's no point in
creating an intermediate tree that doesn't build -- it just breaks git
bisect for no useful purpose.

Also as a side note, when trying to test this I got the message

could not load TP SRAM: unable to load t3a_protocol_sram-1.0.44.bin

and you guys seem to only have t3b protocol sram images on your web
site.  Could you send me the t3a file (or swap out my T3A boards for
T3B boards ;)?

Thanks,
  Roland

Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

2007-08-28 Thread Eric W. Biederman

Andrew Morton <[EMAIL PROTECTED]> writes:

> On Tue, 28 Aug 2007 16:40:15 -0600 [EMAIL PROTECTED] (Eric W. Biederman)
> wrote:
>
>> +static int deprecated_sysctl_warning(struct __sysctl_args *args)
>> +{
>> +	static int msg_count;
>> +	int name[CTL_MAXNAME];
>> +	int i;
>> +
>> +	/* Read in the sysctl name for better debug message logging */
>> +	for (i = 0; i < args->nlen; i++)
>> +		if (get_user(name[i], args->name + i))
>> +			return -EFAULT;
>> +
>> +	/* Ignore accesses to kernel.version */
>> +	if ((args->nlen == 2) && (name[0] == CTL_KERN) && (name[1] == KERN_VERSION))
>> +		return 0;
>
> Do we want to do all the above if msg_count>=5?

Well.  It won't really change the order of the algorithm, because we have
to read the data in anyway.  So an earlier short-circuit exit
would speed things up a little, but it really shouldn't
matter either way.

Eric

Re: 2.6.23-rc4: maxcpus still broken

2007-08-28 Thread Alexey Dobriyan

On Wed, Aug 29, 2007 at 06:03:34AM +0100, Hugh Dickins wrote:
> On Wed, 29 Aug 2007, Alexey Dobriyan wrote:
> > On Wed, Aug 29, 2007 at 01:35:57AM +0200, Michal Piotrowski wrote:
> > > On 28/08/07, Alexey Dobriyan <[EMAIL PROTECTED]> wrote:
> > > > Every time I try to boot with maxcpus=1 it dies in show_stat():
> > > 
> > > Is this a regression?
> > 
> > yep
> 
> A regression since when, I wonder?

Anything before "ACPI: boot correctly with "nosmp" or "maxcpus=0"" is
fine.

Re: 2.6.23-rc4: maxcpus still broken

2007-08-28 Thread Hugh Dickins

On Wed, 29 Aug 2007, Alexey Dobriyan wrote:
> On Wed, Aug 29, 2007 at 01:35:57AM +0200, Michal Piotrowski wrote:
> > On 28/08/07, Alexey Dobriyan <[EMAIL PROTECTED]> wrote:
> > > Every time I try to boot with maxcpus=1 it dies in show_stat():
> > 
> > Is this a regression?
> 
> yep

A regression since when, I wonder?  Please do NOT waste any time
bisecting, but I'd be interested to know which release or -rc you
previously found it worked on.

When I gave the code a quick look, it appeared to be something
which has long been wrong; but I didn't investigate whether per-cpu
allocation has changed recently.  My _suspicion_, no more than that,
is that it might be a regression to you because you're now forced
to have CONFIG_HOTPLUG_CPU=y where you didn't need it before.

Anyway, it doesn't matter too much what it's a regression since:
it's a bug that needs fixing whatever, and should be simple.
My x86_64 was running other tests yesterday which I didn't want
to interrupt, but I'll take a look later on today.

> 
> > Hugh fixed some issues on x86-64 commit 813409771731d80e6fa94199adf99f2269a4afc0
> 
> This is 2.6.23-rc4, which has this fix, yes.
> 
> And I have a second box with exactly the same behaviour: x86_64 E6400, it also has ACPI=n.
> Turning on ACPI doesn't make it any better, though.

Hugh

Re: CFS review

2007-08-28 Thread Ingo Molnar


* Al Boldi <[EMAIL PROTECTED]> wrote:

> I have narrowed it down a bit to add_wait_runtime.

the scheduler is a red herring here. Could you "strace -ttt -TTT" one of 
the glxgears instances (and send us the cfs-debug-info.sh output, with 
CONFIG_SCHED_DEBUG=y and CONFIG_SCHEDSTATS=y as requested before) so 
that we can have a closer look?

i reproduced something similar and there the stall is caused by 1+ 
second select() delays on the X client<->server socket. The scheduler 
stats agree with that:

 se.sleep_max :  2194711437
 se.block_max :   0
 se.exec_max  :  977446
 se.wait_max  : 1912321

the scheduler itself had a worst-case scheduling delay of 1.9 
milliseconds for that glxgears instance (which is perfectly good - in 
fact - excellent interactivity) - but the task had a maximum sleep time 
of 2.19 seconds. So the 'glitch' was not caused by the scheduler.
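Ingo's reading rests on the units: the schedstats maxima are in nanoseconds, so wait_max (time runnable but not running) is under 2 ms while sleep_max (time blocked, here in select()) is over 2 s. A small sketch of that conversion and comparison; struct toy_se_max and the 100 ms threshold are hypothetical, chosen only for illustration:

```c
#include <stdint.h>

/* Maxima from the quoted schedstats output; values are taken to be
 * nanoseconds, consistent with the 1.9 ms and 2.19 s figures in the text. */
struct toy_se_max {
	uint64_t sleep_max;	/* longest time the task slept (e.g. in select()) */
	uint64_t wait_max;	/* longest time runnable but waiting for a CPU */
};

static double toy_ns_to_ms(uint64_t ns)
{
	return (double)ns / 1e6;
}

/* Hypothetical triage rule: treat the scheduler as a suspect only when the
 * worst run-queue wait is itself user-visible (100 ms is arbitrary). */
static int toy_scheduler_suspect(const struct toy_se_max *s)
{
	return toy_ns_to_ms(s->wait_max) > 100.0;
}
```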

Ingo

Re: [PATCH] Send quota messages via netlink

2007-08-28 Thread David Miller

From: Andrew Morton <[EMAIL PROTECTED]>
Date: Tue, 28 Aug 2007 21:13:35 -0700

> This is it.  Normally netlink payloads are represented as a struct.  How
> come this one is built-by-hand?

He is using attributes, which is perfect and arbitrarily
extensible with zero backwards compatibility concerns.

If he wants to provide a new attribute, he just adds it
without any issues.

When new attributes are added, older apps simply ignore the attributes
they don't understand.

Re: [PATCH] Send quota messages via netlink

2007-08-28 Thread Andrew Morton

On Tue, 28 Aug 2007 16:13:18 +0200 Jan Kara <[EMAIL PROTECTED]> wrote:

> +static void send_warning(const struct dquot *dquot, const char warntype)
> +{
> +	static unsigned long seq;
> +	struct sk_buff *skb;
> +	void *msg_head;
> +	int ret;
> +
> +	skb = genlmsg_new(QUOTA_NL_MSG_SIZE, GFP_NOFS);
> +	if (!skb) {
> +		printk(KERN_ERR
> +		  "VFS: Not enough memory to send quota warning.\n");
> +		return;
> +	}
> +	msg_head = genlmsg_put(skb, 0, seq++, &quota_genl_family, 0, QUOTA_NL_C_WARNING);

The access to seq is racy, isn't it?

If so, that can be solved with a lock, or with atomic_add_return().
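The race Andrew points at is a lost update on seq++, which is a separate load, increment and store. A sketch of the atomic variant in portable C11 rather than kernel code, assuming user-space atomics are an acceptable stand-in for atomic_add_return(); toy_seq and toy_next_seq() are hypothetical names:

```c
#include <stdatomic.h>

/* The quoted patch uses a plain `static unsigned long seq` and `seq++`,
 * a load-increment-store that two CPUs can interleave, handing out the
 * same number twice.  An atomic read-modify-write closes that window. */
static atomic_ulong toy_seq;	/* zero-initialised, like the original static */

/* C11 analogue of the atomic_add_return() suggestion.  Note the differing
 * conventions: atomic_fetch_add() returns the value BEFORE the add,
 * whereas the kernel's atomic_add_return() returns the value AFTER it. */
static unsigned long toy_next_seq(void)
{
	return atomic_fetch_add(&toy_seq, 1);
}
```

In the kernel itself the fix would use the atomic_t API or a lock around seq; the sketch only shows the shape of it.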

Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

2007-08-28 Thread Andrew Morton

On Wed, 29 Aug 2007 00:04:59 +0100 Christoph Hellwig <[EMAIL PROTECTED]> wrote:

> On Tue, Aug 28, 2007 at 04:40:15PM -0600, Eric W. Biederman wrote:
> > +When:  September 2010
> > +Option: CONFIG_SYSCTL_SYSCALL
> > +Why:   The same information is available in a more convenient form from
> > +   /proc/sys, and none of the sysctl variables appear to be
> > +   important performance-wise.
> > +
> > +   Binary sysctls are a long-standing source of subtle kernel
> > +   bugs and security issues.
> > +
> > +   When I looked several months ago all I could find after
> > +   searching several distributions were 5 user space programs and
> > +   glibc (which falls back to /proc/sys) using this syscall.
> 
> Umm, no way we're ever going to remove a syscall like this.  Please
> stop this deprecation crap.  Just make sure no one adds more binary
> sysctls.

I think it's worth a try.  It might take two, three or five years, who
knows?  If it turns out to be impractical then we can just change our
minds later, no big loss.  It's just too early to say right now.



Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

2007-08-28 Thread Andrew Morton

On Tue, 28 Aug 2007 16:40:15 -0600 [EMAIL PROTECTED] (Eric W. Biederman) wrote:

> +static int deprecated_sysctl_warning(struct __sysctl_args *args)
> +{
> +	static int msg_count;
> +	int name[CTL_MAXNAME];
> +	int i;
> +
> +	/* Read in the sysctl name for better debug message logging */
> +	for (i = 0; i < args->nlen; i++)
> +		if (get_user(name[i], args->name + i))
> +			return -EFAULT;
> +
> +	/* Ignore accesses to kernel.version */
> +	if ((args->nlen == 2) && (name[0] == CTL_KERN) && (name[1] == KERN_VERSION))
> +		return 0;

Do we want to do all the above if msg_count>=5?

> +	if (msg_count < 5) {
> +		msg_count++;
> +		printk(KERN_INFO
> +			"warning: process `%s' used the deprecated sysctl "
> +			"system call with ", current->comm);
> +		for (i = 0; i < args->nlen; i++)
> +			printk("%d.", name[i]);
> +		printk("\n");
> +	}
> +	return 0;
> +}
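Andrew's question amounts to moving the budget check to the top. A user-space sketch of that reordering, with hypothetical toy_* names and the name passed as an in-memory array in place of the get_user() loop. In the kernel version the early return would presumably also skip the -EFAULT check here, leaving it to the syscall's own read of the name, which matches Eric's remark that the saving is small:

```c
#include <stdio.h>

/* Values as in <linux/sysctl.h> (CTL_KERN = 1, KERN_VERSION = 4). */
#define CTL_KERN		1
#define KERN_VERSION		4
#define TOY_MAX_WARNINGS	5

static int toy_msg_count;

/* Reordered variant of the quoted function: once the warning budget is
 * spent, return before looking at the name at all.
 * Returns 1 if a warning was printed, 0 otherwise. */
static int toy_deprecated_sysctl_warning(const int *name, int nlen)
{
	int i;

	if (toy_msg_count >= TOY_MAX_WARNINGS)
		return 0;	/* the short-circuit Andrew asks about */

	/* Ignore accesses to kernel.version */
	if (nlen == 2 && name[0] == CTL_KERN && name[1] == KERN_VERSION)
		return 0;

	toy_msg_count++;
	printf("warning: deprecated sysctl system call with ");
	for (i = 0; i < nlen; i++)
		printf("%d.", name[i]);
	printf("\n");
	return 1;
}
```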

Re: Linux 2.6.23-rc4: BAD regression

2007-08-28 Thread Alexey Starikovskiy

Daniel,
Does this patch help you, or do we need to revert the whole thing?

Sorry for the trouble,
Alex.
Daniel Ritz wrote:
> tried that one on my old toshiba tecra 8000 laptop, almost killing it.
> the fan doesn't work any more...type 'make' and see the box dying.
> luckily my CPU doesn't commit suicide...bisected it to that one:
>
> cd8c93a4e04dce8f00d1ef3a476aac8bd65ae40b is first bad commit
> commit cd8c93a4e04dce8f00d1ef3a476aac8bd65ae40b
> Author: Alexey Starikovskiy <[EMAIL PROTECTED]>
> Date:   Fri Aug 3 17:52:48 2007 -0400
>
> ACPI: EC: If ECDT is not found, look up EC in DSDT.
>
> Some ASUS laptops access EC space from device _INI methods, but do not
> provide ECDT for early EC setup. In order to make them function properly,
> there is a need to find EC in DSDT before any _INI is called.
>
> Similar functionality was turned on by acpi_fake_ecdt=1 command line
> before. Now it is on all the time.
>
> http://bugzilla.kernel.org/show_bug.cgi?id=8598
>
> Signed-off-by: Alexey Starikovskiy <[EMAIL PROTECTED]>
> Signed-off-by: Len Brown <[EMAIL PROTECTED]>
>   

Drop early init of EC from DSDT patch

From: Alexey Starikovskiy <[EMAIL PROTECTED]>


---

 drivers/acpi/ec.c |   21 +++--
 1 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index 43749c8..e28f5b2 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -876,20 +876,13 @@ int __init acpi_ec_ecdt_probe(void)
 	 */
 	status = acpi_get_table(ACPI_SIG_ECDT, 1,
 				(struct acpi_table_header **)&ecdt_ptr);
-	if (ACPI_SUCCESS(status)) {
-		printk(KERN_INFO PREFIX "EC description table is found, configuring boot EC\n\n");
-		boot_ec->command_addr = ecdt_ptr->control.address;
-		boot_ec->data_addr = ecdt_ptr->data.address;
-		boot_ec->gpe = ecdt_ptr->gpe;
-		boot_ec->handle = ACPI_ROOT_OBJECT;
-	} else {
-		printk(KERN_DEBUG PREFIX "Look up EC in DSDT\n");
-		status = acpi_get_devices(ec_device_ids[0].id, ec_parse_device,
-		boot_ec, NULL);
-		if (ACPI_FAILURE(status))
-			goto error;
-	}
-
+	if (ACPI_FAILURE(status))
+		goto error;
+	printk(KERN_INFO PREFIX "EC description table is found, configuring boot EC\n");
+	boot_ec->command_addr = ecdt_ptr->control.address;
+	boot_ec->data_addr = ecdt_ptr->data.address;
+	boot_ec->gpe = ecdt_ptr->gpe;
+	boot_ec->handle = ACPI_ROOT_OBJECT;
 	ret = ec_install_handlers(boot_ec);
 	if (!ret) {
 		first_ec = boot_ec;

Re: CFS review

2007-08-28 Thread Mike Galbraith

On Wed, 2007-08-29 at 06:18 +0200, Ingo Molnar wrote:
> * Al Boldi <[EMAIL PROTECTED]> wrote:
> 
> > No need for framebuffer.  All you need is X using the X.org 
> > vesa-driver.  Then start gears like this:
> > 
> >   # gears & gears & gears &
> > 
> > Then lay them out side by side to see the periodic stallings for 
> > ~10sec.
> 
> i just tried something similar (by adding Option "NoDRI" to xorg.conf) 
> and i'm wondering how it can be smooth on vesa-driver at all. I tested 
> it on a Core2Duo box and software rendering manages to do about 3 frames 
> per second. (although glxgears itself thinks it does ~600 fps) If i 
> start 3 glxgears then they do ~1 frame per second each. This is on 
> Fedora 7 with xorg-x11-server-Xorg-1.3.0.0-9.fc7 and 
> xorg-x11-drv-i810-2.0.0-4.fc7.

At least you can run the darn test... the third instance of glxgears
here means say bye bye to GUI instantly.

-Mike


Re: 2.6.23-rc4: maxcpus still broken

2007-08-28 Thread Alexey Dobriyan

On Wed, Aug 29, 2007 at 01:35:57AM +0200, Michal Piotrowski wrote:
> On 28/08/07, Alexey Dobriyan <[EMAIL PROTECTED]> wrote:
> > Every time I try to boot with maxcpus=1 it dies in show_stat():
> 
> Is this a regression?

yep

> Hugh fixed some issues on x86-64 commit 813409771731d80e6fa94199adf99f2269a4afc0

This is 2.6.23-rc4, which has this fix, yes.

And I have a second box with exactly the same behaviour: x86_64 E6400, it also has ACPI=n.
Turning on ACPI doesn't make it any better, though.

Re: CFS review

2007-08-28 Thread Keith Packard

On Wed, 2007-08-29 at 06:18 +0200, Ingo Molnar wrote:

> > Then lay them out side by side to see the periodic stallings for 
> > ~10sec.

The X scheduling code isn't really designed to handle software GL well;
the requests can be very expensive to execute, and yet are specified as
atomic operations (sigh).

> i just tried something similar (by adding Option "NoDRI" to xorg.conf) 
> and i'm wondering how it can be smooth on vesa-driver at all. I tested 
> it on a Core2Duo box and software rendering manages to do about 3 frames 
> per second. (although glxgears itself thinks it does ~600 fps) If i 
> start 3 glxgears then they do ~1 frame per second each. This is on 
> Fedora 7 with xorg-x11-server-Xorg-1.3.0.0-9.fc7 and 
> xorg-x11-drv-i810-2.0.0-4.fc7.

Are you attempting to measure the visible updates by eye? Or are you
using some other metric?

In any case, attempting to measure anything using glxgears is a bad
idea; it's not representative of *any* real applications. And then using
software GL on top of that...

What was the question again?

-- 
[EMAIL PROTECTED]


Re: CFS review

2007-08-28 Thread Al Boldi

Ingo Molnar wrote:
> * Linus Torvalds <[EMAIL PROTECTED]> wrote:
> > On Tue, 28 Aug 2007, Al Boldi wrote:
> > > I like your analysis, but how do you explain that these stalls
> > > vanish when __update_curr is disabled?
> >
> > It's entirely possible that what happens is that the X scheduling is
> > just a slightly unstable system - which effectively would turn a small
> > scheduling difference into a *huge* visible difference.
>
> i think it's because disabling __update_curr() in essence removes the
> ability of scheduler to preempt tasks - that hack in essence results in
> a non-scheduler. Hence the gears + X pair of tasks becomes a synchronous
> pair of tasks in essence - and thus gears cannot "overload" X.

I have narrowed it down a bit to add_wait_runtime.

Patch 2.6.22.5-v20.4 like this:

346- * the two values are equal)
347- * [Note: delta_mine - delta_exec is negative]:
348- */
349://  add_wait_runtime(cfs_rq, curr, delta_mine - delta_exec);
350-}
351-
352-static void update_curr(struct cfs_rq *cfs_rq)

When disabling add_wait_runtime the stalls are gone.  With this change the 
scheduler is still usable, but it does not constitute a fix.

Now, even with this hack, uneven nice-levels between X and gears causes a 
return of the stalls, so make sure both X and gears run on the same 
nice-level when testing.

Again, the whole point of this workload is to expose scheduler glitches 
regardless of whether X is broken or not, and my hunch is that this problem 
looks suspiciously like an ia-boosting bug.  What's important to note is 
that by adjusting the scheduler we can effect a correction in behaviour, so 
this problem should be fixable.

It's probably a good idea to look further into add_wait_runtime.

Thanks!

--
Al


Re: CFS review

2007-08-28 Thread Ingo Molnar

* Al Boldi <[EMAIL PROTECTED]> wrote:

> No need for framebuffer.  All you need is X using the X.org 
> vesa-driver.  Then start gears like this:
> 
>   # gears & gears & gears &
> 
> Then lay them out side by side to see the periodic stallings for 
> ~10sec.

i just tried something similar (by adding Option "NoDRI" to xorg.conf) 
and i'm wondering how it can be smooth on vesa-driver at all. I tested 
it on a Core2Duo box and software rendering manages to do about 3 frames 
per second. (although glxgears itself thinks it does ~600 fps) If i 
start 3 glxgears then they do ~1 frame per second each. This is on 
Fedora 7 with xorg-x11-server-Xorg-1.3.0.0-9.fc7 and 
xorg-x11-drv-i810-2.0.0-4.fc7.

Ingo

Re: [PATCH] Send quota messages via netlink

2007-08-28 Thread Andrew Morton

On Tue, 28 Aug 2007 16:13:18 +0200 Jan Kara <[EMAIL PROTECTED]> wrote:

>   Hello,
> 
>   I'm sending rediffed patch implementing sending of quota messages via netlink
> interface (some rationale in patch description). I've already posted it to
> LKML some time ago and there were no objections, so I guess it's fine to put
> it to -mm. Andrew, would you be so kind? Thanks.
>   Userspace daemon reading the messages from the kernel and sending them to
> dbus and/or user console is also written (it's part of quota-tools). The
> only remaining problem is that a few changes to libnl are needed for
> the userspace daemon. They were basically acked by the maintainer but it
> seems he has not merged the patches yet. So this will take a bit more time.
> 

So it's a new kernel->userspace interface.

But we have no description of the interface :(

> +/* Send warning to userspace about user which exceeded quota */
> +static void send_warning(const struct dquot *dquot, const char warntype)
> +{
> +	static unsigned long seq;
> +	struct sk_buff *skb;
> +	void *msg_head;
> +	int ret;
> +
> +	skb = genlmsg_new(QUOTA_NL_MSG_SIZE, GFP_NOFS);
> +	if (!skb) {
> +		printk(KERN_ERR
> +		  "VFS: Not enough memory to send quota warning.\n");
> +		return;
> +	}
> +	msg_head = genlmsg_put(skb, 0, seq++, &quota_genl_family, 0, QUOTA_NL_C_WARNING);
> +	if (!msg_head) {
> +		printk(KERN_ERR
> +		  "VFS: Cannot store netlink header in quota warning.\n");
> +		goto err_out;
> +	}
> +	ret = nla_put_u32(skb, QUOTA_NL_A_QTYPE, dquot->dq_type);
> +	if (ret)
> +		goto attr_err_out;
> +	ret = nla_put_u64(skb, QUOTA_NL_A_EXCESS_ID, dquot->dq_id);
> +	if (ret)
> +		goto attr_err_out;
> +	ret = nla_put_u32(skb, QUOTA_NL_A_WARNING, warntype);
> +	if (ret)
> +		goto attr_err_out;
> +	ret = nla_put_u32(skb, QUOTA_NL_A_DEV_MAJOR,
> +		MAJOR(dquot->dq_sb->s_dev));
> +	if (ret)
> +		goto attr_err_out;
> +	ret = nla_put_u32(skb, QUOTA_NL_A_DEV_MINOR,
> +		MINOR(dquot->dq_sb->s_dev));
> +	if (ret)
> +		goto attr_err_out;
> +	ret = nla_put_u64(skb, QUOTA_NL_A_CAUSED_ID, current->user->uid);
> +	if (ret)
> +		goto attr_err_out;
> +	genlmsg_end(skb, msg_head);
> +
> +	ret = genlmsg_multicast(skb, 0, quota_genl_family.id, GFP_NOFS);
> +	if (ret < 0 && ret != -ESRCH)
> +		printk(KERN_ERR
> +			"VFS: Failed to send notification message: %d\n", ret);
> +	return;
> +attr_err_out:
> +	printk(KERN_ERR "VFS: Failed to compose quota message: %d\n", ret);
> +err_out:
> +	kfree_skb(skb);
> +}
> +#endif

This is it.  Normally netlink payloads are represented as a struct.  How
come this one is built-by-hand?

It doesn't appear to be versioned.  Should it be?

Does it have (or need) reserved-set-to-zero space for expansion?  Again,
hard to tell..

I guess it's OK to send a major and minor out of the kernel like this. 
What's it for?  To represent a filesytem?  I wonder if there's a more
modern and useful way of describing the fs.  Path to mountpoint or
something?

I suspect the namespace virtualisation guys would be interested in a new
interface which is sending current->user->uid up to userspace.  uids are
per-namespace now.  What are the implications?  (cc's added)

Is it worth adding a comment explaining why GFP_NOFS is used here?



Re: Linux 2.6.23-rc4, maxcpus=1 regression

2007-08-28 Thread Ingo Molnar


* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> 
> * Linus Torvalds <[EMAIL PROTECTED]> wrote:
> 
> > > reverting that commit makes the system boot again. I've attached the 
> > > .config.
> > 
> > Did you try -rc4? Commit 813409771731d80e6fa94199adf99f2269a4afc0 in 
> > particular ("fix maxcpus=N parsing") was supposed to fix that commit.
> 
> ah ... indeed my tree is a few commits ahead of rc4. Checking.

indeed that was it, it boots fine now :-/ Sorry about the noise. [ I 
guess i should make it a policy to not mail bugreports before 6am ;-) ]

Ingo

[PATCH 2/2] iw_cxgb3 - dev->priv fix follow up

2007-08-28 Thread Divy Le Ray

From: Divy Le Ray <[EMAIL PROTECTED]>

The RDMA driver sitting on top of cxgb3 
now uses the exported function dev2t3cdev() 
to retrieve the t3cdev associated with
a net_device.
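As a rough illustration of why an exported accessor is preferable to the T3CDEV() macro: the RDMA driver no longer needs to know where the t3cdev lives, only how to ask for it. The sketch below assumes, for illustration only, that the t3cdev is embedded in the adapter and reached through the port_info back-pointer that the companion cxgb3 patch adds. The structures are heavily simplified and every name ending in _sketch is hypothetical.

```c
/* Heavily simplified, hypothetical stand-ins for the driver structures. */
struct t3cdev_sketch {
	const char *name;
};

struct adapter_sketch {
	struct t3cdev_sketch tdev;      /* assumption: offload device embedded in the adapter */
};

struct port_info_sketch {
	struct adapter_sketch *adapter; /* back-pointer filled in at probe time */
};

struct net_device_sketch {
	void *priv_area;                /* stands in for netdev_priv(dev) */
};

/* One exported function hides the lookup path from other modules,
 * so the layout can change without touching the RDMA driver. */
struct t3cdev_sketch *dev2t3cdev_sketch(struct net_device_sketch *dev)
{
	struct port_info_sketch *pi = dev->priv_area;

	return &pi->adapter->tdev;
}
```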

Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]>
---

 drivers/infiniband/hw/cxgb3/cxio_hal.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.c b/drivers/infiniband/hw/cxgb3/cxio_hal.c
index 1518b41..beb2a38 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_hal.c
+++ b/drivers/infiniband/hw/cxgb3/cxio_hal.c
@@ -916,7 +916,7 @@ int cxio_rdev_open(struct cxio_rdev *rdev_p)
PDBG("%s opening rnic dev %s\n", __FUNCTION__, rdev_p->dev_name);
memset(&rdev_p->ctrl_qp, 0, sizeof(rdev_p->ctrl_qp));
if (!rdev_p->t3cdev_p)
-   rdev_p->t3cdev_p = T3CDEV(netdev_p);
+   rdev_p->t3cdev_p = dev2t3cdev(netdev_p);
rdev_p->t3cdev_p->ulp = (void *) rdev_p;
err = rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_GET_PARAMS,
 &(rdev_p->rnic_info));

Re: [2.6 patch] i386 visws: "extern inline" -> "static inline"

2007-08-28 Thread Andrey Panin

On 239, 08 27, 2007 at 11:28:19PM +0200, Adrian Bunk wrote:
> "extern inline" will have different semantics with gcc 4.3.
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

Looks good.

Acked-by: Andrey Panin <[EMAIL PROTECTED]>


> ---
> 
> This patch has been sent on:
> - 14 Aug 2007
> 
>  include/asm-i386/mach-visws/cobalt.h  |8 
>  include/asm-i386/mach-visws/lithium.h |8 
>  2 files changed, 8 insertions(+), 8 deletions(-)
> 
> e12d2e797af72524f53a0ef3a7dd3cf91f58c542 
> diff --git a/include/asm-i386/mach-visws/cobalt.h b/include/asm-i386/mach-visws/cobalt.h
> index 33c3622..9952588 100644
> --- a/include/asm-i386/mach-visws/cobalt.h
> +++ b/include/asm-i386/mach-visws/cobalt.h
> @@ -94,22 +94,22 @@
>  #define  CO_IRQ_8259 CO_IRQ(CO_APIC_8259)
>  
>  #ifdef CONFIG_X86_VISWS_APIC
> -extern __inline void co_cpu_write(unsigned long reg, unsigned long v)
> +static inline void co_cpu_write(unsigned long reg, unsigned long v)
>  {
>   *((volatile unsigned long *)(CO_CPU_VADDR+reg))=v;
>  }
>  
> -extern __inline unsigned long co_cpu_read(unsigned long reg)
> +static inline unsigned long co_cpu_read(unsigned long reg)
>  {
>   return *((volatile unsigned long *)(CO_CPU_VADDR+reg));
>  }
>   
> -extern __inline void co_apic_write(unsigned long reg, unsigned long v)
> +static inline void co_apic_write(unsigned long reg, unsigned long v)
>  {
>   *((volatile unsigned long *)(CO_APIC_VADDR+reg))=v;
>  }
>   
> -extern __inline unsigned long co_apic_read(unsigned long reg)
> +static inline unsigned long co_apic_read(unsigned long reg)
>  {
>   return *((volatile unsigned long *)(CO_APIC_VADDR+reg));
>  }
> diff --git a/include/asm-i386/mach-visws/lithium.h b/include/asm-i386/mach-visws/lithium.h
> index d443e68..dfcd4f0 100644
> --- a/include/asm-i386/mach-visws/lithium.h
> +++ b/include/asm-i386/mach-visws/lithium.h
> @@ -29,22 +29,22 @@
>  #define  LI_INTD 0x0080
>  
>  /* More special purpose macros... */
> -extern __inline void li_pcia_write16(unsigned long reg, unsigned short v)
> +static inline void li_pcia_write16(unsigned long reg, unsigned short v)
>  {
>   *((volatile unsigned short *)(LI_PCIA_VADDR+reg))=v;
>  }
>  
> -extern __inline unsigned short li_pcia_read16(unsigned long reg)
> +static inline unsigned short li_pcia_read16(unsigned long reg)
>  {
>   return *((volatile unsigned short *)(LI_PCIA_VADDR+reg));
>  }
>  
> -extern __inline void li_pcib_write16(unsigned long reg, unsigned short v)
> +static inline void li_pcib_write16(unsigned long reg, unsigned short v)
>  {
>   *((volatile unsigned short *)(LI_PCIB_VADDR+reg))=v;
>  }
>  
> -extern __inline unsigned short li_pcib_read16(unsigned long reg)
> +static inline unsigned short li_pcib_read16(unsigned long reg)
>  {
>   return *((volatile unsigned short *)(LI_PCIB_VADDR+reg));
>  }
> 
> 
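Some background on the change: under the old GNU C (gnu89) rules, `extern inline` means the body is used only for inlining and no out-of-line copy is ever emitted. Under C99 rules, which gcc 4.3 applies in its C99 modes, `extern inline` instead makes every file that includes the header emit an external definition, which can lead to duplicate-symbol link errors. `static inline` behaves the same under both rules. The sketch below keeps the accessor shape from cobalt.h but, as an assumption so that it can run in user space, points the "register window" at a plain array instead of CO_CPU_VADDR. The names fake_regs, fake_write() and fake_read() are hypothetical.

```c
/* Hypothetical stand-in for the fixed CO_CPU_VADDR mapping, so the
 * sketch can run in user space. */
static volatile unsigned long fake_regs[16];
#define FAKE_VADDR ((unsigned long)fake_regs)

/* static inline: each file that includes the header gets its own private
 * copy (or none at all if every call is inlined), so no external symbol
 * is created and the gnu89/C99 difference does not matter. */
static inline void fake_write(unsigned long reg, unsigned long v)
{
	*((volatile unsigned long *)(FAKE_VADDR + reg)) = v;
}

static inline unsigned long fake_read(unsigned long reg)
{
	return *((volatile unsigned long *)(FAKE_VADDR + reg));
}
```

The `volatile` casts are kept from the original accessors so that every read and write really reaches the (in hardware, memory-mapped) register.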

-- 
Andrey Panin| Linux and UNIX system administrator
[EMAIL PROTECTED]   | PGP key: wwwkeys.pgp.net



[PATCH 2.6.23 1/2] cxgb3 - Fix dev->priv usage

2007-08-28 Thread Divy Le Ray

From: Divy Le Ray <[EMAIL PROTECTED]>

cxgb3 used netdev_priv() and dev->priv for different purposes.
In 2.6.23, netdev_priv() == dev->priv, so cxgb3 needs a fix.
This patch is a partial backport of Dave Miller's changes in the 
net-2.6.24 git branch. 

Without this fix, cxgb3 crashes on 2.6.23.
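To make the problem concrete: alloc_netdev() allocates the net_device and the driver's private area in one block, netdev_priv() returns a pointer just past the (aligned) net_device, and, per the description above, dev->priv in 2.6.23 points at that same area. Code that stores the adapter in dev->priv while also keeping a port_info in netdev_priv() is therefore using one location for two things. The user-space model below is a sketch, not the kernel code: the 32-byte alignment and all *_sketch names are assumptions chosen for illustration. The last helper shows the pattern the patch switches to, reaching the adapter through port_info->adapter.

```c
#include <stdlib.h>

#define SKETCH_ALIGN 32                 /* assumed alignment of the private area */

struct adapter_sketch {
	int id;
};

struct net_device_sketch {
	char name[16];
	void *priv;                     /* models dev->priv */
};

struct port_info_sketch {
	struct adapter_sketch *adapter; /* back-pointer, as added by the patch */
	int port_id;
};

static size_t priv_offset(void)
{
	return (sizeof(struct net_device_sketch) + SKETCH_ALIGN - 1) &
	       ~(size_t)(SKETCH_ALIGN - 1);
}

/* Models netdev_priv(): the private area follows the aligned device struct. */
static void *netdev_priv_sketch(struct net_device_sketch *dev)
{
	return (char *)dev + priv_offset();
}

/* Models alloc_netdev(sizeof_priv, ...): one allocation for both parts,
 * with dev->priv pointed at the private area. */
static struct net_device_sketch *alloc_netdev_sketch(size_t sizeof_priv)
{
	struct net_device_sketch *dev = calloc(1, priv_offset() + sizeof_priv);

	if (dev && sizeof_priv)
		dev->priv = netdev_priv_sketch(dev);
	return dev;
}

/* The fixed lookup: the adapter is reached via port_info, never via dev->priv. */
static struct adapter_sketch *dev2adap_sketch(struct net_device_sketch *dev)
{
	struct port_info_sketch *pi = netdev_priv_sketch(dev);

	return pi->adapter;
}
```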

Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]>
---

 drivers/net/cxgb3/adapter.h   |2 +
 drivers/net/cxgb3/cxgb3_main.c|  126 +
 drivers/net/cxgb3/cxgb3_offload.c |   16 -
 drivers/net/cxgb3/cxgb3_offload.h |2 +
 drivers/net/cxgb3/sge.c   |   23 ---
 drivers/net/cxgb3/t3cdev.h|3 -
 6 files changed, 104 insertions(+), 68 deletions(-)

diff --git a/drivers/net/cxgb3/adapter.h b/drivers/net/cxgb3/adapter.h
index ab72563..20e887d 100644
--- a/drivers/net/cxgb3/adapter.h
+++ b/drivers/net/cxgb3/adapter.h
@@ -50,7 +50,9 @@ typedef irqreturn_t(*intr_handler_t) (int, void *);
 
 struct vlan_group;
 
+struct adapter;
 struct port_info {
+   struct adapter *adapter;
struct vlan_group *vlan_grp;
const struct port_type_info *port_type;
u8 port_id;
diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
index dc5d269..f3bf128 100644
--- a/drivers/net/cxgb3/cxgb3_main.c
+++ b/drivers/net/cxgb3/cxgb3_main.c
@@ -358,11 +358,14 @@ static int init_dummy_netdevs(struct adapter *adap)
 
for (j = 0; j < pi->nqsets - 1; j++) {
if (!adap->dummy_netdev[dummy_idx]) {
-   nd = alloc_netdev(0, "", ether_setup);
+   struct port_info *p;
+
+   nd = alloc_netdev(sizeof(*p), "", ether_setup);
if (!nd)
goto free_all;
 
-   nd->priv = adap;
+   p = netdev_priv(nd);
+   p->adapter = adap;
nd->weight = 64;
set_bit(__LINK_STATE_START, &nd->state);
adap->dummy_netdev[dummy_idx] = nd;
@@ -482,7 +485,8 @@ static ssize_t attr_store(struct device *d, struct device_attribute *attr,
 #define CXGB3_SHOW(name, val_expr) \
 static ssize_t format_##name(struct net_device *dev, char *buf) \
 { \
-   struct adapter *adap = dev->priv; \
+   struct port_info *pi = netdev_priv(dev); \
+   struct adapter *adap = pi->adapter; \
return sprintf(buf, "%u\n", val_expr); \
 } \
 static ssize_t show_##name(struct device *d, struct device_attribute *attr, \
@@ -493,7 +497,8 @@ static ssize_t show_##name(struct device *d, struct device_attribute *attr, \
 
 static ssize_t set_nfilters(struct net_device *dev, unsigned int val)
 {
-   struct adapter *adap = dev->priv;
+   struct port_info *pi = netdev_priv(dev);
+   struct adapter *adap = pi->adapter;
int min_tids = is_offload(adap) ? MC5_MIN_TIDS : 0;
 
if (adap->flags & FULL_INIT_DONE)
@@ -515,7 +520,8 @@ static ssize_t store_nfilters(struct device *d, struct device_attribute *attr,
 
 static ssize_t set_nservers(struct net_device *dev, unsigned int val)
 {
-   struct adapter *adap = dev->priv;
+   struct port_info *pi = netdev_priv(dev);
+   struct adapter *adap = pi->adapter;
 
if (adap->flags & FULL_INIT_DONE)
return -EBUSY;
@@ -556,9 +562,10 @@ static struct attribute_group cxgb3_attr_group = {.attrs = cxgb3_attrs };
 static ssize_t tm_attr_show(struct device *d, struct device_attribute *attr,
char *buf, int sched)
 {
-   ssize_t len;
+   struct port_info *pi = netdev_priv(to_net_dev(d));
+   struct adapter *adap = pi->adapter;
unsigned int v, addr, bpt, cpt;
-   struct adapter *adap = to_net_dev(d)->priv;
+   ssize_t len;
 
addr = A_TP_TX_MOD_Q1_Q0_RATE_LIMIT - sched / 2;
rtnl_lock();
@@ -581,10 +588,11 @@ static ssize_t tm_attr_show(struct device *d, struct device_attribute *attr,
 static ssize_t tm_attr_store(struct device *d, struct device_attribute *attr,
 const char *buf, size_t len, int sched)
 {
+   struct port_info *pi = netdev_priv(to_net_dev(d));
+   struct adapter *adap = pi->adapter;
+   unsigned int val;
char *endp;
ssize_t ret;
-   unsigned int val;
-   struct adapter *adap = to_net_dev(d)->priv;
 
if (!capable(CAP_NET_ADMIN))
return -EPERM;
@@ -858,8 +866,9 @@ static void schedule_chk_task(struct adapter *adap)
 
 static int offload_open(struct net_device *dev)
 {
-   struct adapter *adapter = dev->priv;
-   struct t3cdev *tdev = T3CDEV(dev);
+   struct port_info *pi = netdev_priv(dev);
+   struct adapter *adapter = pi->adapter;
+   struct t3cdev *tdev = dev2t3cdev(dev);
int adap_up = adapter->open_device_map & PORT_MASK;
int err = 0;

[PATCH 2.6.23 0/2] cxgb3 - Fix dev->priv usage

2007-08-28 Thread Divy Le Ray


Jeff/Roland,

I'm resubmitting the cxgb3 dev->priv fix for inclusion in 2.6.23.
I also submit a follow-up patch for the iw_cxgb3 driver that fixes
the previous infiniband breakage.

Cheers,
Divy

Re: Linux 2.6.23-rc4, maxcpus=1 regression

2007-08-28 Thread Ingo Molnar


* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> > reverting that commit makes the system boot again. I've attached the 
> > .config.
> 
> Did you try -rc4? Commit 813409771731d80e6fa94199adf99f2269a4afc0 in 
> particular ("fix maxcpus=N parsing") was supposed to fix that commit.

ah ... indeed my tree is a few commits ahead of rc4. Checking.

Ingo

Re: Linux 2.6.23-rc4, maxcpus=1 regression

2007-08-28 Thread Ingo Molnar


* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> maxcpus=1 fails to boot on my T60 laptop, it hangs in early bootup 
> (right after setting up the local APICs). I bisected it down to this 
> recent commit:
> 
> | commit 61ec7567db103d537329b0db9a887db570431ff4
> | Author: Len Brown <[EMAIL PROTECTED]>
> | Date:   Thu Aug 16 03:34:22 2007 -0400
> |
> | ACPI: boot correctly with "nosmp" or "maxcpus=0"
> 
> reverting that commit makes the system boot again. I've attached the 
> .config.

i suspect it's due to this:

 -early_param("maxcpus=", maxcpus);
 +__setup("maxcpus=", maxcpus);

i'm quite sure maxcpus still needs to be an early-param.
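A simplified model of why the registration type matters: early_param() handlers run in an early pass over the command line (on i386, from within setup_arch()), whereas __setup() handlers run in a later pass. Any code that reads max_cpus between the two passes sees the default value if the option is only registered with __setup(). The sketch below is a hypothetical user-space model: the default of 8 CPUs, early_setup_consumer() and simulate_boot() are illustrative assumptions, not kernel code, and the model does not claim to identify which consumer hangs the T60.

```c
#include <stdlib.h>

#define DEFAULT_MAX_CPUS 8u             /* hypothetical default */

static unsigned int max_cpus = DEFAULT_MAX_CPUS;
static unsigned int seen_by_early_code;

/* Handler for "maxcpus=<n>". */
static int maxcpus_handler(const char *val)
{
	max_cpus = (unsigned int)strtoul(val, NULL, 0);
	return 1;
}

/* Hypothetical stand-in for early APIC/ACPI setup that consults max_cpus. */
static void early_setup_consumer(void)
{
	seen_by_early_code = max_cpus;
}

/* Boot with "maxcpus=<val>"; 'early' chooses which pass handles the option.
 * Returns the value that the early consumer observed. */
static unsigned int simulate_boot(const char *val, int early)
{
	max_cpus = DEFAULT_MAX_CPUS;
	if (early)
		maxcpus_handler(val);   /* pass 1: early_param() handlers */
	early_setup_consumer();
	if (!early)
		maxcpus_handler(val);   /* pass 2: __setup() handlers */
	return seen_by_early_code;
}
```

In the late case the option is still parsed, so max_cpus ends up correct; it is simply too late for anything that already ran.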

Ingo

Re: Linux 2.6.23-rc4, maxcpus=1 regression

2007-08-28 Thread Linus Torvalds



On Wed, 29 Aug 2007, Ingo Molnar wrote:
> 
> maxcpus=1 fails to boot on my T60 laptop, it hangs in early bootup 
> (right after setting up the local APICs). I bisected it down to this 
> recent commit:
> 
> | commit 61ec7567db103d537329b0db9a887db570431ff4
> | Author: Len Brown <[EMAIL PROTECTED]>
> | Date:   Thu Aug 16 03:34:22 2007 -0400
> |
> | ACPI: boot correctly with "nosmp" or "maxcpus=0"
> 
> reverting that commit makes the system boot again. I've attached the 
> .config.

Did you try -rc4? Commit 813409771731d80e6fa94199adf99f2269a4afc0 in 
particular ("fix maxcpus=N parsing") was supposed to fix that commit.

Linus

Re: Linux 2.6.23-rc4, maxcpus=1 regression

2007-08-28 Thread Ingo Molnar


* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> maxcpus=1 fails to boot on my T60 laptop, it hangs in early bootup 
> (right after setting up the local APICs). I bisected it down to this 
> recent commit:

maxcpus=0 fails to boot as well.

Ingo

Re: CFS review

2007-08-28 Thread Ingo Molnar


* Bill Davidsen <[EMAIL PROTECTED]> wrote:

> > There is another way to show the problem visually under X 
> > (vesa-driver), by starting 3 gears simultaneously, which after 
> > laying them out side-by-side need some settling time before 
> > smoothing out.  Without __update_curr it's absolutely smooth from 
> > the start.
> 
> I posted a LOT of stuff using the glitch1 script, and finally found a 
> set of tuning values which make the test script run smooth. See back 
> posts, I don't have them here.

but you have real 3D hw and DRI enabled, correct? In that case X uses up 
almost no CPU time and glxgears does most of the processing. That is
quite different from the above software-rendering case, where X spends 
most of the CPU time.

Ingo

drm: VIA invalid device IDs removal

2007-08-28 Thread Xavier Bachelot


Remove 2 invalid device IDs from the in-kernel drm tree.

0x1106, 0x7204 is unknown and thus is not an IGP/GPU.
0x1106, 0x3304 is the K8M800 host bridge, not an IGP/GPU.
Neither of them is in the drm git tree.

--- a/drivers/char/drm/drm_pciids.h	2007-08-28 14:08:27.0 +0200
+++ b/drivers/char/drm/drm_pciids.h	2007-08-28 14:17:12.0 +0200
@@ -236,10 +236,8 @@
 	{0x1106, 0x3022, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0}, \
 	{0x1106, 0x3118, PCI_ANY_ID, PCI_ANY_ID, 0, 0, VIA_PRO_GROUP_A}, \
 	{0x1106, 0x3122, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0}, \
-	{0x1106, 0x7204, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0}, \
 	{0x1106, 0x7205, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0}, \
 	{0x1106, 0x3108, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0}, \
-	{0x1106, 0x3304, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0}, \
 	{0x1106, 0x3344, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0}, \
 	{0x1106, 0x3343, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0}, \
 	{0x1106, 0x3230, PCI_ANY_ID, PCI_ANY_ID, 0, 0, VIA_DX9_0}, \
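For readers unfamiliar with these tables: the entries in drm_pciids.h form a table that is matched against the vendor/device IDs of PCI devices, ending with an all-zero terminator, so removing an entry simply means that ID no longer matches. The sketch below copies a few of the VIA entries left by the patch into a simplified table and uses a hypothetical match_id() helper. It ignores subvendor/subdevice and class matching, which real PCI ID matching also checks.

```c
#include <stddef.h>

#define PCI_ANY_ID_SKETCH (~0u)         /* mirrors PCI_ANY_ID: match anything */

struct pci_id_sketch {
	unsigned int vendor, device, subvendor, subdevice;
	unsigned int class, class_mask;
	unsigned long driver_data;
};

/* A few VIA entries that remain after the patch, plus a zero terminator. */
static const struct pci_id_sketch via_ids[] = {
	{0x1106, 0x3022, PCI_ANY_ID_SKETCH, PCI_ANY_ID_SKETCH, 0, 0, 0},
	{0x1106, 0x3122, PCI_ANY_ID_SKETCH, PCI_ANY_ID_SKETCH, 0, 0, 0},
	{0x1106, 0x7205, PCI_ANY_ID_SKETCH, PCI_ANY_ID_SKETCH, 0, 0, 0},
	{0x1106, 0x3108, PCI_ANY_ID_SKETCH, PCI_ANY_ID_SKETCH, 0, 0, 0},
	{0x1106, 0x3344, PCI_ANY_ID_SKETCH, PCI_ANY_ID_SKETCH, 0, 0, 0},
	{0, 0, 0, 0, 0, 0, 0}
};

/* Return the first entry matching vendor/device, or NULL if none does. */
static const struct pci_id_sketch *match_id(const struct pci_id_sketch *tbl,
					    unsigned int vendor, unsigned int device)
{
	for (; tbl->vendor; tbl++)
		if ((tbl->vendor == PCI_ANY_ID_SKETCH || tbl->vendor == vendor) &&
		    (tbl->device == PCI_ANY_ID_SKETCH || tbl->device == device))
			return tbl;
	return NULL;
}
```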

Re: CFS review

2007-08-28 Thread Bill Davidsen


Ingo Molnar wrote:

* Al Boldi <[EMAIL PROTECTED]> wrote:

ok. I think i might finally have found the bug causing this. Could 
you try the fix below, does your webserver thread-startup test work 
any better?
It seems to help somewhat, but the problem is still visible.  Even 
v20.3 on 2.6.22.5 didn't help.


It does look related to ia-boosting, so I turned off __update_curr 
like Roman mentioned, which had an enormous smoothing effect, but then 
nice levels completely break down and lock up the system.


you can turn sleeper-fairness off via:

   echo 28 > /proc/sys/kernel/sched_features

another thing to try would be:

   echo 12 > /proc/sys/kernel/sched_features


14, and drop the granularity to 50.


(that's the new-task penalty turned off.)

Another thing to try would be to edit this:

if (sysctl_sched_features & SCHED_FEAT_START_DEBIT)
p->se.wait_runtime = -(sched_granularity(cfs_rq) / 2);

to:

if (sysctl_sched_features & SCHED_FEAT_START_DEBIT)
p->se.wait_runtime = -sched_granularity(cfs_rq);
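The two variants differ only in the size of the initial debit a new task starts with: half a scheduling granularity versus a full one. Below is a trivial sketch of that arithmetic. The `gran` parameter stands in for sched_granularity(cfs_rq); the helper name start_debit() and the value used in the check are assumptions for illustration.

```c
/* Initial wait_runtime for a newly forked task when START_DEBIT is set.
 * gran models sched_granularity(cfs_rq); full_debit selects the variant. */
static long long start_debit(long long gran, int full_debit)
{
	return full_debit ? -gran : -(gran / 2);
}
```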

and could you also check 20.4 on 2.6.22.5 perhaps, or very latest -git? 
(Peter has experienced smaller spikes with that.)


Ingo



--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

Re: [patch] sched: fix broken smt/mc optimizations with CFS

2007-08-28 Thread Ingo Molnar


* Siddha, Suresh B <[EMAIL PROTECTED]> wrote:

> On Mon, Aug 27, 2007 at 12:31:03PM -0700, Siddha, Suresh B wrote:
> > Essentially I observed that nice 0 tasks still end up on two cores of the same
> > package, without getting spread out to two different packages. This behavior
> > is the same without this fix, and this fix doesn't help in any way.
> 
> Ingo, Appended patch seems to fix the issue and as far as I can test, 
> seems ok to me.

thanks! I've queued your fix up for .23 merge. I've done a quick test 
and it indeed seems to work well.

> This is a quick fix for .23. Peter Williams and myself plan to look at 
> code cleanups in this area (HT/MC optimizations) post .23
> 
> BTW, with this fix, do you want to retain the current FUZZ value?

what value would you suggest? I was thinking about using 
busiest_rq->curr->load.weight instead, to always keep rotating tasks.

Ingo

Re: CFS review

2007-08-28 Thread Bill Davidsen


Al Boldi wrote:

Ingo Molnar wrote:

* Al Boldi <[EMAIL PROTECTED]> wrote:

The problem is that consecutive runs don't give consistent results
and sometimes stalls.  You may want to try that.

well, there's a natural saturation point after a few hundred tasks
(depending on your CPU's speed), at which point there's no idle time
left. From that point on things get slower progressively (and the
ability of the shell to start new ping tasks is impacted as well),
but that's expected on an overloaded system, isn't it?

Of course, things should get slower with higher load, but it should be
consistent without stalls.

To see this problem, make sure you boot into /bin/sh with the normal
VGA console (ie. not fb-console).  Then try each loop a few times to
show different behaviour; loops like:

# for ((i=0; i<; i++)); do ping 10.1 -A > /dev/null & done

# for ((i=0; i<; i++)); do nice -99 ping 10.1 -A > /dev/null & done

# { for ((i=0; i<; i++)); do
ping 10.1 -A > /dev/null &
done } > /dev/null 2>&1

Especially the last one sometimes causes a complete console lock-up,
while the other two sometimes stall then surge periodically.

ok. I think i might finally have found the bug causing this. Could you
try the fix below, does your webserver thread-startup test work any
better?


It seems to help somewhat, but the problem is still visible.  Even v20.3 on 
2.6.22.5 didn't help.


It does look related to ia-boosting, so I turned off __update_curr like Roman 
mentioned, which had an enormous smoothing effect, but then nice levels 
completely break down and lock up the system.


There is another way to show the problem visually under X (vesa-driver), by 
starting 3 gears simultaneously, which after laying them out side-by-side 
need some settling time before smoothing out.  Without __update_curr it's 
absolutely smooth from the start.


I posted a LOT of stuff using the glitch1 script, and finally found a 
set of tuning values which make the test script run smooth. See back 
posts, I don't have them here.


--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

Re: [Tech-board-discuss] Re: [Ksummit-2007-discuss] Re: Linux Foundation Technical Advisory Board Elections

2007-08-28 Thread Nick Piggin


Daniel Phillips wrote:

On Friday 24 August 2007 03:45, Theodore Tso wrote:


As I said; what's wrong with just using SPI membership?  It's not
like it is remotely hard for kernel hackers to gain membership in
SPI.  And somebody else takes care of the bureaucracy for you.


Given the huge overlap between SPI membership and Debian membership,
and then taking a look at the craziness that takes place on various
Debian mailing lists, such as but not limited to debian-legal, I'm
quite convinced that this would be a baad idea.



Hi Ted,

Ever watched a legislative assembly at work?  A bad idea perhaps, but 
the best that has been discovered so far.


Given that there is already some charter that says KS attendees vote...
isn't it best to retain that? Directives from above aside, you need
specifications on how to change voting procedure before changing it, no?
If those don't exist, then something vaguely similar in my country would
require a referendum I think.

Hasn't the KS committee / TAB board vote rigging conspiracy theory been
raised yet? Given they're not running a country, it would be great fun
to see the board getting corrupted and go off the rails ;) I'd vote for
them because if Ted has anything to do with it, I *know* we'll be having
KS in Hawaii ;)

--

Re: NFS woes again

2007-08-28 Thread Florin Iucha

On Tue, Aug 28, 2007 at 09:28:43AM -0400, Trond Myklebust wrote:
> Doh! I see the problem: cancel_delayed_work_sync() shouldn't ever be
> called recursively.
> 
> The following patch should be correct. Please just discard the previous
> one...

So far so good.  This patch got one hour uptime...  I'll stay with
this kernel for a few days, to keep an eye on it.

Thanks,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
  http://geekz.co.uk/schneierfacts/fact/163



Re: [Tech-board-discuss] Re: [Ksummit-2007-discuss] Re: Linux Foundation Technical Advisory Board Elections

2007-08-28 Thread Theodore Tso

On Tue, Aug 28, 2007 at 03:59:09PM -0700, Daniel Phillips wrote:
> Ever watched a legislative assembly at work?  A bad idea perhaps, but 
> the best that has been discovered so far.

Sure, but a Debian mailing list where fanatics who have no job, no
life, but huge amounts of free time to post literally hundreds of
messages a day indulging in Debian's "last post wins" style of
argumentation have far more power to influence the decision making
process than those who have to work at a real job has very little in
common with a legislative assembly.

That's why any kind of election for the TAB should happen, IMHO, in
"real space", at some conference where there is a gross filter of
people being able to afford travel expenses or be paid by some company
for their expenses (thus showing that someone felt that they were
doing enough good work that they should be given the resources to pay
for travel expenses and the conference registration fees).

If that's an elitist attitude, I plead guilty --- Linux and OSS is
*not* a democracy.  Linus doesn't obey the whims of majority voting to
decide which patches to accept or reject.  The Linux kernel community
is very much a meritocracy, which is why I don't believe that some
kind of pure democracy such as using the SPI voting membership is the
right thing for electing the TAB.  Just remember, in the United
States, a democracy where around 50% of Americans believe that Saddam
Hussein was personally responsible for 9/11 elected George W. Bush to
the US presidency.  It's statistics like that which make you want to
impose some kind of competency test on who is allowed to vote.

The kernel summit is one such place where we can hold such a vote, and
if people thought that a BOF at some conference like Linux.conf.au or
OLS would be a better place, those might be other alternatives.  I'll
note that most of this discussion is mostly moot, though, given that
at this point we have 5 candidates for 5 slots, for positions which are
really more about service than about any kind of power or benefits.

 - Ted

Re: oops at sr_block_release [Re: 2.6.23-rc3-mm1]

2007-08-28 Thread Andrew Morton

On Tue, 28 Aug 2007 13:32:57 +0200 Jiri Slaby <[EMAIL PROTECTED]> wrote:

> Andrew Morton napsal(a):
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc3/2.6.23-rc3-mm1/
> 
> I got this during gxine initialization of ocko.tv live stream without any cd in cdroms:
> 
> BUG: unable to handle kernel NULL pointer dereference at virtual address 005c
> printing eip: f88fbe7a *pde = 
> Oops:  [#1] SMP
> Modules linked in: ath5k arc4 ecb blkcipher cryptomgr crypto_algapi
> rc80211_simple mac80211 cfg80211 nls_cp437 vfat fat usb_storage tun ipv6 floppy
> parport_pc parport ohci1394 ieee1394 usbhid sr_mod ehci_hcd cdrom ff_memless
> 
> Pid: 2809, comm: hald-addon-stor Not tainted (2.6.23-rc3-mm1 #315)
> EIP: 0060:[] EFLAGS: 00010246 CPU: 1
> EIP is at sr_block_release+0xb/0x2c [sr_mod]
> EAX:  EBX:  ECX: f88fbe6f EDX: 
> ESI: c21c36c0 EDI: c289a780 EBP: c3729f18 ESP: c3729f10
>  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process hald-addon-stor (pid: 2809, ti=c3729000 task=c1c2be40 task.ti=c3729000)
> Stack:  c21c36c0 c3729f38 c018d7ad c21c36cc c1f9ff80 c21c3730 c21c36c0
>c2a6ada0 dcbb3f80 c3729f40 c018d7dc c3729f4c c018e103 0010 c3729f74
>c016bc5f   c217fa80 c1f9ff80 c2a6ada0 dcbb3f80 c1cc6900
> Call Trace:
>  [] show_trace_log_lvl+0x1a/0x30
>  [] show_stack_log_lvl+0xa5/0xca
>  [] show_registers+0xd0/0x1c1
>  [] die+0x10a/0x24d
>  [] do_page_fault+0x496/0x608
>  [] error_code+0x72/0x78
>  [] __blkdev_put+0x125/0x14a
>  [] blkdev_put+0xa/0xc
>  [] blkdev_close+0x29/0x2c
>  [] __fput+0xa6/0x161
>  [] fput+0x22/0x3b
>  [] filp_close+0x41/0x67
>  [] sys_close+0x60/0x9f
>  [] syscall_call+0x7/0xb
>  ===
> Code: 0c 81 c3 4c 01 00 00 89 5c 24 08 89 44 24 04 c7 04 24 88 cd 8f f8 e8 99 84 82 c7 e9 04 fe ff ff 55 89 e5 56 53 8b 80 04 01 00 00 <8b> 40 5c 8b 70 3c 8d 46 18 e8 cf f6 fe ff 89 c3 85 c0 75 07 89
> EIP: [] sr_block_release+0xb/0x2c [sr_mod] SS:ESP 0068:c3729f10
> 

Possibly due to remove-bdput-from-do_open-in-fs-block_devc.patch.

That patch is "wrong" and I think the problem which it attempts to address
actually lies in the cdrom code.  viro was taking a look at it but appears
to have recoiled in horror.  I'll drop
remove-bdput-from-do_open-in-fs-block_devc.patch so let's just watch out
for any reoccurrence, thanks.


Re: [Tech-board-discuss] Re: [Ksummit-2007-discuss] Re: Linux Foundation Technical Advisory Board Elections

2007-08-28 Thread Theodore Tso

On Tue, Aug 28, 2007 at 07:18:36PM -0700, Daniel Walker wrote:
> Just out of curiosity, have you had anyone nominate a really really
> large group? Like say, anyone that has ever sent an email to lkml?

Nope; I suspect someone who did that would just be ignored by the
program committee.  We might publicly mock someone who did that,
just to discourage that kind of behavior, but it wouldn't be a
particularly effective denial of service attack, precisely because the
program committee has discretion about how to handle that sort of
thing.  

There have been people nominating 5-10 people in previous years, and
in general the set of people that were nominated overlapped with
suggestions made by others --- and that's the process working as it's
supposed to.  But that's not a "really, really large group".

- Ted

Re: [PATCH 1/2] NBD: set uninitialized devices to size 0

2007-08-28 Thread Bill Davidsen


Andrew Morton wrote:

On Fri, 24 Aug 2007 13:06:39 -0400
Paul Clements <[EMAIL PROTECTED]> wrote:

This fixes errors with utilities (such as LVM's vgscan) that try to scan 
all devices. Previously this would generate read errors when 
uninitialized nbd devices were scanned:
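With this patch an unconfigured nbd device reports a size of 0, so a scanner can, for example, check the size before issuing any reads. Below is a minimal sketch of such a check using the BLKGETSIZE64 ioctl from <linux/fs.h>. The helper worth_scanning() is hypothetical and is not how LVM's vgscan is actually implemented.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKGETSIZE64 */

/* Returns 1 if the block device at 'path' reports a non-zero size,
 * 0 if it reports size 0 (e.g. an unconfigured nbd device after this
 * patch), or -1 if it cannot be opened or queried as a block device. */
static int worth_scanning(const char *path)
{
	uint64_t bytes = 0;
	int fd = open(path, O_RDONLY | O_NONBLOCK);

	if (fd < 0)
		return -1;
	if (ioctl(fd, BLKGETSIZE64, &bytes) < 0) {
		close(fd);
		return -1;
	}
	close(fd);
	return bytes > 0;
}
```

Querying the size does not read any sectors, so a zero-sized device can be skipped without triggering the read errors described above.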


I somewhat randomly marked both these as 2.6.24 material.  If you think
that was incorrect, please shout out.


I have the feeling that I mentioned nbd issues several releases ago, but 
never got to getting more info on reproducing them. I try not to submit 
bugs I can't reproduce; often they're my fault :-(



--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

Re: [linux-pm] [RFC][PATCH 0/2 -mm] kexec based hibernation

2007-08-28 Thread Nick Piggin


Huang, Ying wrote:

On Mon, 2007-08-27 at 09:28 +0800, Hu, Fenghua wrote:


One quick question is, can it improve hibernation/wakeup time?



In general, for kexec based hibernation, what increases
hibernation/wakeup time:

- One extra Linux boot is needed to hibernate and wakeup.


What decreases hibernation/wakeup time:

- Most hibernation/wakeup work is done in a fully functional user space
program, so it is possible to do some optimization, such as parallel
compression.


- It does not have to reclaim pagecache before suspend?

- It does not have to restore working set afterwards?

(You could do this to reduce image size, of course, but it can
be optional which is nice).



So, I think the kexec based hibernation may be slower than original
implementation in general. In this prototype implementation, the
hibernation/wakeup time is much longer than original hibernation/wakeup
implementation. But it has much to be optimized and I think it can
approach the speed of the original implementation after optimization.


Also, don't just look at the time to do a simple suspend/resume cycle,
but the full cost of going from working state to working state (eg.
grep a kernel tree or two!).

Although the kexec details are out of my league, I really like
everything about the concept :) Nice work.

--
SUSE Labs, Novell Inc.

Re: [Tech-board-discuss] Re: [Ksummit-2007-discuss] Re: Linux Foundation Technical Advisory Board Elections

2007-08-28 Thread Daniel Walker

On Tue, 2007-08-28 at 22:18 -0400, Theodore Tso wrote:
> On Mon, Aug 27, 2007 at 02:12:56PM +0200, Jes Sorensen wrote:
> > Yes, as well as 12 committee members, of which 5 didn't even comply with
> > their own git commit requirement last time I checked. 
> 
> Note that the git commit metric is not a "requirement", but a way of
> seeding the list of people to be considered.  The current selection
> process is that we *start* with that list, and then accept nominations
> from anyone for anyone (including self-nominations) that should be
> considered that weren't automatically included by the git selection
> criteria.

Just out of curiosity, have you had anyone nominate a really, really
large group? Like, say, anyone that has ever sent an email to lkml?

Daniel


Re: [Tech-board-discuss] Re: [Ksummit-2007-discuss] Re: Linux Foundation Technical Advisory Board Elections

2007-08-28 Thread Theodore Tso

On Mon, Aug 27, 2007 at 02:12:56PM +0200, Jes Sorensen wrote:
> Yes, as well as 12 committee members, of which 5 didn't even comply with
> their own git commit requirement last time I checked. 

Note that the git commit metric is not a "requirement", but a way of
seeding the list of people to be considered.  The current selection
process is that we *start* with that list, and then accept nominations
from anyone for anyone (including self-nominations) that should be
considered that weren't automatically included by the git selection
criteria.

- Ted

Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

2007-08-28 Thread Eric W. Biederman

"H. Peter Anvin" <[EMAIL PROTECTED]> writes:

> Eric W. Biederman wrote:
>> Christoph Hellwig <[EMAIL PROTECTED]> writes:
>>
>>> Umm, no way we're ever going to remove a syscall like this.
>>
>> If someone besides me cares about more than rhetoric I will be happy
>> to reconsider and several years is plenty of time to find that out.
>>
>> I aborted the removal last time precisely because we had not done an
>> adequate job of warning our users. A printk when we run a program
>> that uses the binary interface, and a long enough interval for the
>> warning to make it into the Enterprise kernels before we remove the
>> interface, should be sufficient.
>>
>
> glibc uses it, and it uses it in contexts where access to the filesystem isn't
> functional (e.g. in chroot.)

Yes.  But (a) the answer it gets back doesn't affect correctness, and
  (b) it should be using uname.

Or are you thinking about something besides the pthreads usage?

Eric
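
Eric's point (b) can be illustrated with a small user-space sketch. It assumes the caller wants the same kernel version text that the binary KERN_VERSION sysctl and /proc/sys/kernel/version report; uname(2) returns that text in the `version` member of `struct utsname` without touching the filesystem, so it also works inside a chroot. The helper name and the "SMP" check are illustrative assumptions, not glibc's code.

```c
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical helper: reports whether the running kernel's version
 * string (e.g. "#1 SMP Tue Aug 28 ...") mentions SMP.
 * Returns 1 if it does, 0 if it does not, and -1 if uname(2) fails.
 * No file is opened, so this works even when /proc is unavailable. */
static int kernel_version_mentions_smp(void)
{
	struct utsname u;

	if (uname(&u) != 0)
		return -1;
	return strstr(u.version, "SMP") != NULL;
}
```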

Re: Understanding I/O behaviour - next try

2007-08-28 Thread Fengguang Wu

On Tue, Aug 28, 2007 at 08:53:07AM -0700, Martin Knoblauch wrote:
[...]
>  The basic setup is a dual x86_64 box with 8 GB of memory. The DL380
> has a HW RAID5, made from 4x72GB disks and about 100 MB write cache.
> The performance of the block device with O_DIRECT is about 90 MB/sec.
> 
>  The problematic behaviour comes when we are moving large files through
> the system. The file usage in this case is mostly "use once" or
> streaming. As soon as the amount of file data is larger than 7.5 GB, we
> see occasional unresponsiveness of the system (e.g. no more ssh
> connections into the box) of more than 1 or 2 minutes (!) duration
> (kernels up to 2.6.19). Load goes up, mainly due to pdflush threads and
> some other poor guys being in "D" state.
[...]
>  Just by chance I found out that doing all I/O in sync mode does
> prevent the load from going up. Of course, I/O throughput is not
> stellar (but not much worse than the non-O_DIRECT case). But the
> responsiveness seems OK. Maybe a solution, as this can be controlled via
> mount (would be great for O_DIRECT :-).
> 
>  In general 2.6.22 seems to be better than 2.6.19, but this is highly
> subjective :-( I am using the following settings in /proc. They seem to
> provide the smoothest responsiveness:
> 
> vm.dirty_background_ratio = 1
> vm.dirty_ratio = 1
> vm.swappiness = 1
> vm.vfs_cache_pressure = 1

You are apparently running into the sluggish kupdate-style writeback
problem with large files: a huge number of dirty pages accumulate and
are then flushed to disk all at once when the dirty background ratio
is reached. The current -mm tree has some fixes for it, and there are
more in my tree. Martin, I'll send you the patch if you'd like to try
it out.

>  Another thing I saw during my tests is that when writing to NFS, the
> "dirty" or "nr_dirty" numbers are always 0. Is this a conceptual thing,
> or a bug?

What are the nr_unstable numbers?

Fengguang
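
To put rough numbers on Fengguang's explanation, here is a back-of-the-envelope sketch. It assumes, for simplicity, that a threshold is just the vm.dirty_*_ratio percentage of a caller-supplied amount of dirtyable memory; the real kernel derives dirtyable memory from free and reclaimable pages rather than total RAM, so the results are approximations only.

```c
#include <stdint.h>

/* Approximate number of dirty bytes that can accumulate before the
 * threshold for a given vm.dirty_*_ratio percentage is crossed.
 * dirtyable_bytes is an assumption supplied by the caller. */
static uint64_t dirty_threshold_bytes(uint64_t dirtyable_bytes,
				      unsigned int ratio_percent)
{
	return dirtyable_bytes * ratio_percent / 100;
}

/* Seconds needed to write 'bytes' at a sustained 'bytes_per_sec'. */
static double flush_seconds(uint64_t bytes, uint64_t bytes_per_sec)
{
	return (double)bytes / (double)bytes_per_sec;
}
```

Counting all 8 GB as dirtyable, `dirty_threshold_bytes(8ULL << 30, 1)` is about 86 MB, under a second of writing at the roughly 90 MB/s O_DIRECT rate Martin measured, while a ratio of 10 allows about 860 MB, close to ten seconds of writing at that rate.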


Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

2007-08-28 Thread H. Peter Anvin


Eric W. Biederman wrote:
> Christoph Hellwig <[EMAIL PROTECTED]> writes:
>
>> Umm, no way we're ever going to remove a syscall like this.
>
> If someone besides me cares about more than rhetoric I will be happy
> to reconsider and several years is plenty of time to find that out.
>
> I aborted the removal last time precisely because we had not done an
> adequate job of warning our users. A printk when we run a program
> that uses the binary interface, and a long enough interval for the
> warning to make it into the Enterprise kernels before we remove the
> interface, should be sufficient.

glibc uses it, and it uses it in contexts where access to the filesystem
isn't functional (e.g. in chroot.)


-hpa

[patch] i386, apic: fix 4 bit apicid assumption of mach-default

2007-08-28 Thread Siddha, Suresh B

Andi/Andrew,

Can you pick this up for your trees and, if there are no issues, push
it to mainline before .23 is released?

We have seen a boot failure on an MP platform with fewer CPU sockets
populated. A similar problem can happen on a fully populated system if
the number of CPUs is <= 8 and any of the APIC IDs is >= 16.

thanks,
suresh
---
Fix get_apic_id() in mach-default so that it uses 8 bits in the xAPIC
case and 4 bits in the legacy APIC case.

This fixes the i386 kernel's assumption that APIC IDs are less than 16 on
xAPIC platforms with 8 or fewer CPUs, and makes the kernel boot on such
platforms.

Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]>
---

diff --git a/include/asm-i386/mach-default/mach_apicdef.h 
b/include/asm-i386/mach-default/mach_apicdef.h
index 7bcb350..ae98413 100644
--- a/include/asm-i386/mach-default/mach_apicdef.h
+++ b/include/asm-i386/mach-default/mach_apicdef.h
@@ -1,11 +1,17 @@
 #ifndef __ASM_MACH_APICDEF_H
 #define __ASM_MACH_APICDEF_H
 
+#include 
+
 #define APIC_ID_MASK	(0xF<<24)
 
 static inline unsigned get_apic_id(unsigned long x) 
 { 
-   return (((x)>>24)&0xF);
+	unsigned int ver = GET_APIC_VERSION(apic_read(APIC_LVR));
+	if (APIC_XAPIC(ver))
+		return (((x)>>24)&0xFF);
+	else
+		return (((x)>>24)&0xF);
 } 
 
 #define GET_APIC_ID(x)	get_apic_id(x)
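
The effect of the patch can be checked with a user-space model of get_apic_id(). The `is_xapic` flag stands in for the patch's `APIC_XAPIC(GET_APIC_VERSION(apic_read(APIC_LVR)))` test, which cannot run outside the kernel, and the function name is hypothetical. An APIC ID of 16 (0x10) shows the failure mode: the 4-bit mask reports it as 0.

```c
/* User-space model of the patched get_apic_id(): the APIC ID sits in
 * bits 24-27 of the register on legacy APICs and bits 24-31 on xAPICs. */
static unsigned int model_get_apic_id(unsigned long reg, int is_xapic)
{
	if (is_xapic)
		return (unsigned int)((reg >> 24) & 0xFF);
	return (unsigned int)((reg >> 24) & 0xF);
}
```

For example, `model_get_apic_id(0x10000000UL, 0)` yields 0, colliding with APIC ID 0, while `model_get_apic_id(0x10000000UL, 1)` yields 16.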

Re: NFSv4 client OOPS on 2.6.22-rc3 - I meant 2.6.23-rc3

2007-08-28 Thread Trond Myklebust

On Wed, 2007-08-29 at 01:41 +0200, Michal Piotrowski wrote:
> Hi Harry,
> 
> On 28/08/07, Harry Edmon <[EMAIL PROTECTED]> wrote:
> > Typo in my last message - I meant 2.6.23-rc3, not 2.6.22-rc3.  Here it
> > is again with correction
> >
> > I had a kernel oops on my x86_64 dual quad-core Xeon system running
> > 2.6.23-rc3.  The system is an NFSv4 client to another 2.6.23-rc3
> > system.  The OOPS text is attached and the config file.
> >
> 
> Is this a regression? Does 2.6.22 work fine?

Yes and yes. It is due to a typo when I was working on correcting the
NFSv4 open() state tracking in 2.6.23-rc1. A patch is available and I'm
planning on merging it soon.

Trond


miss list_del(), bug ?

2007-08-28 Thread Yoann Padioleau


Shouldn't this code also do a list_del(e)?

in drivers/infiniband/core/iwcm.c:

static void dealloc_work_entries(struct iwcm_id_private *cm_id_priv)
{
	struct list_head *e, *tmp;

	list_for_each_safe(e, tmp, &cm_id_priv->work_free_list)
		kfree(list_entry(e, struct iwcm_work, free_list));
}
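
For comparison, here is a user-space sketch of the same teardown pattern that also unlinks each entry before freeing it, which is what the question proposes. The small doubly linked list below is a hypothetical stand-in for <linux/list.h>; the loop mirrors how list_for_each_safe() saves the next pointer before the current entry is freed. Whether the list_del() is actually required in iwcm.c depends on whether the list head is used again afterwards, which this sketch does not settle.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical minimal circular doubly linked list. */
struct list_node { struct list_node *next, *prev; };

static void list_init(struct list_node *head) { head->next = head->prev = head; }

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink n from its list (the role list_del() plays in the kernel). */
static void list_unlink(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = NULL;
}

static int list_is_empty(const struct list_node *head) { return head->next == head; }

#define node_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_work { int id; struct list_node free_list; };

/* Free every entry; 'tmp' holds the successor so freeing 'e' is safe,
 * and unlinking first leaves 'head' as a valid empty list afterwards. */
static void dealloc_all(struct list_node *head)
{
	struct list_node *e, *tmp;

	for (e = head->next, tmp = e->next; e != head; e = tmp, tmp = e->next) {
		list_unlink(e);
		free(node_entry(e, struct demo_work, free_list));
	}
}
```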


RE: [Lguest] [kvm-devel] [RFC] 9p: add KVM/QEMU pci transport

2007-08-28 Thread Dor Laor

>>
>> Nice driver. I'm hoping we can do a virtio driver using a similar
>> concept.
>>
>> > +#define PCI_VENDOR_ID_9P 0x5002
>> > +#define PCI_DEVICE_ID_9P 0x000D
>>
>> Where do these numbers come from? Can we be sure they don't conflict with
>> actual hardware?
>
>I stole the VENDOR_ID from kvm's hypercall driver. There are no any
>guarantees that it doesn't conflict with actual hardware. As it was
>discussed before, there is still no ID assigned for the virtual
>devices.


Currently 5002 is not registered to Qumranet or KVM.
We will do something about it pretty soon.

Re: [RFC] : mm : / Patch / code : Suggestion :snip kswapd _page_from_freelist() : No more no page failures. (WHY????)

2007-08-28 Thread Nick Piggin


Mitchell Erblich wrote:
> Nick Piggin wrote:
>
> Nick Piggin, et al,
>
> First diffs would generate a lot of noise, since I rip and insert
> a lot of code based on whether I think the code is REALLY
> needed for MY TEST environment. These suggestions are
> basically minimal merge suggestions between my
> development envir and the public Linux tree.

That's OK. So long as the patch is against a well known tree, it
is just less ambiguous even if it doesn't actually compile :)

> Now the why for this SUGGESTION/PATCH...
>
>> When we're in the (min,low) watermark range, we'll wake up kswapd
>> _before_ allocating anything, so what is better about the change to
>> wake up kswapd after allocating? Can you perhaps come up with an
>> example situation also to make this more clear?
>
> Answer
> Will GFP_ATOMIC alloc be failing at that point? If yes, then why
> not allow kswapd to attempt to prevent this condition from occurring?
> The existing code reads that the first call to get_page_from_freelist()
> has returned no page. Now you are going to start up something that
> is at best going to take millisecs to start helping out. Won't it first
> grab some pages to do its work? So we are going to be lower
> in free memory right when it starts up. Right?

GFP_ATOMIC will not be failing at this point (also, kswapd could
probably have reclaimed several hundred or thousand pages in 1ms,
but that's beside the point -- we do have correct buffering here).

The watermarks go roughly like this:

high  -- kswapd stops reclaiming
low   -- kswapd is started by any allocation, nothing else happens
min   -- non-GFP_ATOMIC can't go below this point; enter direct reclaim
min/X -- GFP_ATOMIC allocations fail below this point
0     -- PF_MEMALLOC fails.

--
SUSE Labs, Novell Inc.
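
Nick's watermark ladder can be written down as a toy decision function. This is a simplified user-space model, not the kernel's actual allocator logic: the watermark values are caller-supplied, and the divisor behind "min/X" is a hypothetical parameter because the message does not give X. Note that the kswapd wake-up is decided from the free count before anything is allocated, which is the point made about the (min,low) range.

```c
/* Toy model of the watermark ladder; all names are hypothetical. */
enum alloc_kind { KIND_NORMAL, KIND_ATOMIC, KIND_MEMALLOC };

enum alloc_outcome {
	OUT_OK,             /* above low: allocate, nothing else happens */
	OUT_OK_WAKE_KSWAPD, /* at or below low but above this kind's floor */
	OUT_DIRECT_RECLAIM, /* normal allocation at or below min */
	OUT_FAIL            /* atomic at or below min/X, or PF_MEMALLOC at 0 */
};

struct watermarks {
	unsigned long min, low, high;
	unsigned long atomic_divisor; /* the unspecified "X" in min/X; nonzero */
};

/* Lowest free-page level each kind of allocation may dip to. */
static unsigned long floor_for(enum alloc_kind kind, const struct watermarks *wm)
{
	switch (kind) {
	case KIND_NORMAL:
		return wm->min;
	case KIND_ATOMIC:
		return wm->min / wm->atomic_divisor;
	case KIND_MEMALLOC:
	default:
		return 0;
	}
}

/* Decide what happens to a one-page allocation given the free count. */
static enum alloc_outcome classify_alloc(unsigned long free_pages,
					 enum alloc_kind kind,
					 const struct watermarks *wm)
{
	if (free_pages > wm->low)
		return OUT_OK;
	if (free_pages > floor_for(kind, wm))
		return OUT_OK_WAKE_KSWAPD;
	return kind == KIND_NORMAL ? OUT_DIRECT_RECLAIM : OUT_FAIL;
}

/* kswapd keeps reclaiming until free pages reach the high watermark. */
static int kswapd_should_stop(unsigned long free_pages, const struct watermarks *wm)
{
	return free_pages >= wm->high;
}
```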

RE: [kvm-devel] [RFC] 9p: add KVM/QEMU pci transport

2007-08-28 Thread Dor Laor

>> > This adds a shared memory transport for a synthetic 9p device for
>> > paravirtualized file system support under KVM/QEMU.
>>
>> Nice driver. I'm hoping we can do a virtio driver using a similar
>> concept.
>>
>
>Yes.  I'm looking at the patches from Dor now, it should be pretty
>straightforward.  The PCI is interesting in its own right for other
>(non-virtual) projects we've been playing with.
>
> -eric

Great, we can add lots of pci bus shared functionality into the
kvm_pci_bus.c
--Dor

Re: Crash report 2.6.22.5

2007-08-28 Thread Pete Monroe

On 8/28/07, Michal Piotrowski <[EMAIL PROTECTED]> wrote:
> Hi Pete,
>
> On 28/08/07, Pete Monroe <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > Sorry there's not more to go on here.
> >
> > A 32-bit firewall running the kernel LVS virtual server to fan out to
> > a dozen webservers ran fine for a year using  2.6.17.13, but won't
> > last more than four hours or so with 2.6.22.5.  Another server,
> > different hardware and vendor but same purpose, also crashed with
> > 2.6.22.5 after a few hours.  It had previously run 2.6.20.11.  Nothing
> > on the screen, nothing in the logs.
> >
> > I'm attaching zipped dmesg (both kernel versions),
>
> Could you capture the bug with serial/netconsole etc.?

The servers are remote, production servers and it's a PITA when they
crash.  But I'll see what I can do.  Thanks for the pointer.

--
Pete

>
> "Collecting kernel messages"
> http://www.stardust.webpages.pl/files/handbook/handbook-en-0.3-rc1.pdf
> for more info.

Re: [PATCH -rt 1/8] introduce PICK_FUNCTION

2007-08-28 Thread Daniel Walker

On Wed, 2007-08-29 at 09:44 +1000, Nick Piggin wrote:
> Daniel Walker wrote:
> > PICK_FUNCTION() is similar to the other PICK_OP style macros, and was
> > created to replace them all. I used variable argument macros to handle
> > PICK_FUNC_2ARG/PICK_FUNC_1ARG. Otherwise the macros are similar to the
> > original macros used for semaphores. The entire system is used to do a
> > compile time switch between two different locking APIs. For example,
> > real spinlocks (raw_spinlock_t) and mutexes (or sleeping spinlocks).
> > 
> > This new macro replaces all the duplication from lock type to lock type.
> > The result of this patch, and the next two, is a fairly nice simplification,
> > and consolidation. Although the seqlock changes are larger than the originals
> > I think overall the patchset is worthwhile.
> > 
> > Incorporated peterz's suggestion to not require TYPE_EQUAL() to only
> > use pointers.
> 
> How come this is cc'ed to lkml? Is it something that is relevant to
> the mainline kernel... or?

The real time changes are usually developed on lkml; that's how it's
been in the past. I personally like CC'ing lkml since real time can
sometimes touch lots of different subsystems, so it's good to have a
diverse set of people reviewing.

Daniel
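
The compile-time switch described in the patch can be sketched in user space with the GCC/Clang builtin __builtin_types_compatible_p(), which evaluates to a constant, so the branch that does not match the lock's type is never taken and is normally removed by the compiler. The lock types, functions, and the PICK_FUNCTION_SKETCH macro below are hypothetical stand-ins, not the -rt tree's definitions; the trailing variadic arguments are forwarded to whichever function is picked, in the spirit of the variable-argument approach the patch describes.

```c
/* Hypothetical stand-ins for raw_spinlock_t and a sleeping (mutex-based) lock. */
typedef struct { int locked; } demo_raw_lock_t;
typedef struct { int owner; } demo_sleeping_lock_t;

static int last_impl; /* which implementation ran: 1 = raw, 2 = sleeping */

static void demo_raw_lock(demo_raw_lock_t *l, int subclass)
{
	(void)subclass;
	l->locked = 1;
	last_impl = 1;
}

static void demo_sleeping_lock(demo_sleeping_lock_t *l, int subclass)
{
	(void)subclass;
	l->owner = 1;
	last_impl = 2;
}

/* Nonzero at compile time when 'var' has type 'type' (GNU extension). */
#define TYPE_EQUAL(var, type) \
	__builtin_types_compatible_p(__typeof__(var), type)

/* Call raw_fn or sleep_fn depending on the static type of 'lock'. Both
 * calls must still compile, hence the casts on the untaken branch. */
#define PICK_FUNCTION_SKETCH(lock, raw_fn, sleep_fn, ...)                    \
	do {                                                                 \
		if (TYPE_EQUAL((lock), demo_raw_lock_t *))                   \
			raw_fn((demo_raw_lock_t *)(lock), __VA_ARGS__);      \
		else if (TYPE_EQUAL((lock), demo_sleeping_lock_t *))         \
			sleep_fn((demo_sleeping_lock_t *)(lock), __VA_ARGS__); \
	} while (0)
```

A caller writes `PICK_FUNCTION_SKETCH(&lock, demo_raw_lock, demo_sleeping_lock, 0)` once, and the lock's declared type selects the implementation without any runtime flag.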


Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

2007-08-28 Thread Eric W. Biederman

Christoph Hellwig <[EMAIL PROTECTED]> writes:

> Umm, no way we're ever going to remove a syscall like this.  

If someone besides me cares about more than rhetoric I will be happy
to reconsider and several years is plenty of time to find that out.

I aborted the removal last time precisely because we had not done an
adequate job of warning our users. A printk when we run a program
that uses the binary interface, and a long enough interval for the
warning to make it into the Enterprise kernels before we remove the
interface, should be sufficient.

> stop this deprecration crap.  Just make sure no ones adds more binary
> sysctls.

The sysctl_check_table function should keep out most of the problem
cases and especially it should ensure we don't add any new binary
sysctls by accident. 

However given our atrocious record at catching these kinds of
problems via code review and testing and the fact that no one
uses these things anyway, I don't see an argument for keeping
dead code in the kernel.

Over the long term the goal is to not break user space binaries.

I see a better chance of achieving the goal of not breaking user space
binaries if we remove interfaces that no known user space applications
use, in a way that a well-written application can handle, than if we
let the user space interface code succumb to bit rot and start
returning the wrong values to user space.

That is where we are at with sys_sysctl.
Almost all of the binary paths have no known users and the
implementations are succumbing to bit rot.  The binary interface and
the proc interface go through two completely separate paths so there
is little to ensure those paths don't diverge over time.

It is also true that the non-generic helper functions are diverging
over time.  Currently these things are not an issue because no one
actually uses the binary interfaces.  The empirical evidence seems
overwhelming on this point.

So just freezing us at our current set of non-broken binary sysctls
does not seem sufficient to ensure we don't break user space binaries.
Although it does seem to be a good start.

Eric

Re: Crash report 2.6.22.5

2007-08-28 Thread Michal Piotrowski

Hi Pete,

On 28/08/07, Pete Monroe <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Sorry there's not more to go on here.
>
> A 32-bit firewall running the kernel LVS virtual server to fan out to
> a dozen webservers ran fine for a year using  2.6.17.13, but won't
> last more than four hours or so with 2.6.22.5.  Another server,
> different hardware and vendor but same purpose, also crashed with
> 2.6.22.5 after a few hours.  It had previously run 2.6.20.11.  Nothing
> on the screen, nothing in the logs.
>
> I'm attaching zipped dmesg (both kernel versions),

Could you capture the bug with serial/netconsole etc.?

"Collecting kernel messages"
http://www.stardust.webpages.pl/files/handbook/handbook-en-0.3-rc1.pdf
for more info.

> .config and lspci
> -v output for one of the machines, a Dell Intel dual-Xeon box.  The
> other machine is a dual Athlon box.  Both use SCSI drives (the
> attached Dell uses MPT Fusion, the other one Adaptec.)  Intel ethernet
> on both.
>
> I did enable the Slub allocator in 2.6.22.5, figuring that if it is
> going to be the default in 2.6.23 that it's probably solid in .22.5.
>
> PLMK if any more info would be useful.
>
> Thanks,
> Pete
>
>

Regards,
Michal

-- 
LOG
http://www.stardust.webpages.pl/log/

Re: [patch 01/28] Fall back on interrupt disable in cmpxchg8b on 80386 and 80486

2007-08-28 Thread Nick Piggin


Mathieu Desnoyers wrote:
> * Nick Piggin ([EMAIL PROTECTED]) wrote:
>
>> Mathieu Desnoyers wrote:
>>
>>> Q:
>>> What's the reason to have cmpxchg64_local on 32 bit architectures?
>>> Without that need all this would just be a few simple defines.
>>>
>>> A:
>>> cmpxchg64_local on 32-bit architectures takes unsigned long long
>>> parameters, but cmpxchg_local only takes longs. Since we have cmpxchg8b
>>> to execute an 8-byte cmpxchg atomically on Pentium and later, it makes
>>> sense to provide a flavor of cmpxchg and cmpxchg_local using this
>>> instruction.
>>>
>>> Also, for 32-bit architectures lacking the 64-bit atomic cmpxchg, it
>>> makes sense _not_ to define cmpxchg64 while cmpxchg could still be
>>> available.
>>>
>>> Moreover, the fallback for cmpxchg8b on i386 for 386 and 486 is a
>>> different case than cmpxchg (which is only required for 386). Using
>>> different code makes this easier.
>>>
>>> However, cmpxchg64_local will be emulated by disabling interrupts on all
>>> architectures where it is not supported atomically.
>>>
>>> Therefore, we *could* turn cmpxchg64_local into a cmpxchg_local, but it
>>> would make the 386/486 fallbacks ugly, make its design different from
>>> cmpxchg/cmpxchg64 (which really depends on atomic operations and cannot
>>> be emulated) and require the __cmpxchg_local to be expressed as a macro
>>> rather than an inline function so the parameters would not be fixed to
>>> unsigned long long in every case.
>>>
>>> So I think cmpxchg64_local makes sense there, but I am open to
>>> suggestions.
>>
>> Every new thing like this (especially 64 bit operation on 32 bit
>> architectures) adds a tiny bit more burden for maintainers. Are
>> there any callers? If not, don't add it. It's simple to add if we
>> do get a good reason.
>
> I am actually using it in LTTng in my timestamping code. I use it to
> work around CPUs with asynchronous TSCs. I need to update 64-bit
> values atomically on this 32-bit architecture.
>
> I plan to submit this timestamping code soon.

OK fair enough. So long as there is a user (and you are sure said
user is going to get upstream -- sometimes it is easier to put
this patchset in with the one that is going to call it, but OTOH
that can turn people off reviewing).

--
SUSE Labs, Novell Inc.
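
Mathieu's remark that cmpxchg64_local will be emulated by disabling interrupts where no atomic instruction exists can be illustrated with a user-space sketch. The irq save/restore functions here are hypothetical placeholders that only count nesting, since user space cannot disable interrupts; in the kernel, local_irq_save()/local_irq_restore() would fill that role. With interrupts really off, nothing else on the same CPU can run between the read and the write, which is all a _local variant promises; it gives no protection against other CPUs.

```c
#include <stdint.h>

/* Hypothetical placeholders for local_irq_save()/local_irq_restore().
 * User space cannot disable interrupts, so these only track nesting. */
static int irq_off_depth;
static void demo_irq_save(void)    { irq_off_depth++; }
static void demo_irq_restore(void) { irq_off_depth--; }

/* Emulated 64-bit compare-and-exchange: if *ptr equals 'old', store
 * 'new_val'. Always returns the value *ptr held beforehand. */
static uint64_t demo_cmpxchg64_local(uint64_t *ptr, uint64_t old, uint64_t new_val)
{
	uint64_t prev;

	demo_irq_save();
	prev = *ptr;
	if (prev == old)
		*ptr = new_val;
	demo_irq_restore();
	return prev;
}
```

The caller detects success by comparing the return value with `old`, the same convention cmpxchg() uses.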

Re: [07/36] Use page_cache_xxx in mm/filemap_xip.c

2007-08-28 Thread Nick Piggin


Christoph Hellwig wrote:
> On Tue, Aug 28, 2007 at 09:49:38PM +0200, Jörn Engel wrote:
>
>> On Tue, 28 August 2007 12:05:58 -0700, [EMAIL PROTECTED] wrote:
>>
>>> -	index = *ppos >> PAGE_CACHE_SHIFT;
>>> -	offset = *ppos & ~PAGE_CACHE_MASK;
>>> +	index = page_cache_index(mapping, *ppos);
>>> +	offset = page_cache_offset(mapping, *ppos);
>>
>> Part of me feels inclined to merge this patch now because it makes the
>> code more readable, even if page_cache_index() is implemented as
>> #define page_cache_index(mapping, pos) ((pos) >> PAGE_CACHE_SHIFT)
>>
>> I know there is little use in yet another global search'n'replace
>> wankfest and Andrew might wash my mouth just for mentioning it.  Still,
>> hard to dislike this part of your patch.
>
> Yes, I suggested that before.  Andrew seems to somehow hate this
> patchset, but even if we don't get it in, the lowercase macros are much
> much better than the current PAGE_CACHE_* confusion.

I don't mind the change either. The open coded macros are very
recognisable, but it isn't hard to have a typo and get one
slightly wrong.

If it goes upstream now it wouldn't have the mapping argument
though, would it? Or the need to replace PAGE_CACHE_SIZE I guess.

--
SUSE Labs, Novell Inc.
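
A minimal user-space sketch of what the two helpers compute, assuming a per-mapping shift field (the hypothetical `struct demo_mapping` below) in place of the fixed PAGE_CACHE_SHIFT. With a fixed shift this reduces to the open-coded expressions in the patch, since `pos & ~PAGE_CACHE_MASK` equals `pos & (PAGE_CACHE_SIZE - 1)`.

```c
#include <stdint.h>

/* Hypothetical per-mapping page-cache geometry. */
struct demo_mapping {
	unsigned int shift; /* log2 of the page-cache page size, e.g. 12 for 4 KiB */
};

/* Index of the page-cache page that contains byte position 'pos'. */
static uint64_t demo_page_cache_index(const struct demo_mapping *m, uint64_t pos)
{
	return pos >> m->shift;
}

/* Offset of byte position 'pos' within its page-cache page. */
static unsigned int demo_page_cache_offset(const struct demo_mapping *m, uint64_t pos)
{
	return (unsigned int)(pos & ((UINT64_C(1) << m->shift) - 1));
}
```

For example, with 4 KiB pages (shift 12), byte position 8195 lies in page 2 at offset 3.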

Re: [PATCH -rt 1/8] introduce PICK_FUNCTION

2007-08-28 Thread Nick Piggin


Daniel Walker wrote:
> PICK_FUNCTION() is similar to the other PICK_OP style macros, and was
> created to replace them all. I used variable argument macros to handle
> PICK_FUNC_2ARG/PICK_FUNC_1ARG. Otherwise the macros are similar to the
> original macros used for semaphores. The entire system is used to do a
> compile time switch between two different locking APIs. For example,
> real spinlocks (raw_spinlock_t) and mutexes (or sleeping spinlocks).
>
> This new macro replaces all the duplication from lock type to lock type.
> The result of this patch, and the next two, is a fairly nice simplification,
> and consolidation. Although the seqlock changes are larger than the originals
> I think overall the patchset is worthwhile.
>
> Incorporated peterz's suggestion to not require TYPE_EQUAL() to only
> use pointers.

How come this is cc'ed to lkml? Is it something that is relevant to
the mainline kernel... or?

--
SUSE Labs, Novell Inc.

Re: Linux 2.6.23-rc4: BAD regression

2007-08-28 Thread Linus Torvalds


Len? Should we just revert it?

That commit has been very painful. First it lost all registration of the 
query methods, and now this.

Daniel - can we please have a before/after dmesg on your machine, 
preferably with ACPI debugging enabled? And for ACPI stuff, it usually 
does help to fill in a bugzilla entry, since the ACPI people actually do 
track things there...

Linus

On Wed, 29 Aug 2007, Daniel Ritz wrote:
>
> tried that one on my old toshiba tecra 8000 laptop, almost killing it.
> the fan doesn't work any more...type 'make' and see the box dying.
> luckily my CPU doesn't commit suicide...bisected it to that one:
> 
> cd8c93a4e04dce8f00d1ef3a476aac8bd65ae40b is first bad commit
> commit cd8c93a4e04dce8f00d1ef3a476aac8bd65ae40b
> Author: Alexey Starikovskiy <[EMAIL PROTECTED]>
> Date:   Fri Aug 3 17:52:48 2007 -0400
> 
> ACPI: EC: If ECDT is not found, look up EC in DSDT.
> 
> Some ASUS laptops access EC space from device _INI methods, but do not
> provide ECDT for early EC setup. In order to make them function properly,
> there is a need to find EC is DSDT before any _INI is called.
> 
> Similar functionality was turned on by acpi_fake_ecdt=1 command line
> before. Now it is on all the time.
> 
> http://bugzilla.kernel.org/show_bug.cgi?id=8598
> 
> Signed-off-by: Alexey Starikovskiy <[EMAIL PROTECTED]>
> Signed-off-by: Len Brown <[EMAIL PROTECTED]>
> 

Re: NFSv4 client OOPS on 2.6.22-rc3 - I meant 2.6.23-rc3

2007-08-28 Thread Michal Piotrowski

Hi Harry,

On 28/08/07, Harry Edmon <[EMAIL PROTECTED]> wrote:
> Typo in my last message - I meant 2.6.23-rc3, not 2.6.22-rc3.  Here it
> is again with correction
>
> I had a kernel oops on my x86_64 dual quad-core Xeon system running
> 2.6.23-rc3.  The system is an NFSv4 client to another 2.6.23-rc3
> system.  The OOPS text is attached and the config file.
>

Is this a regression? Does 2.6.22 work fine?

-- 
LOG
http://www.stardust.webpages.pl/log/

Re: 2.6.23-rc4: maxcpus still broken

2007-08-28 Thread Michal Piotrowski

Hi Alexey,

On 28/08/07, Alexey Dobriyan <[EMAIL PROTECTED]> wrote:
> Every time I try to boot with maxcpus=1 it dies in show_stat():

Is this a regression?

Hugh fixed some issues on x86-64 in commit 813409771731d80e6fa94199adf99f2269a4afc0.

Regards,
Michal

-- 
LOG
http://www.stardust.webpages.pl/log/

Re: Linux 2.6.23-rc4: BAD regression

2007-08-28 Thread Daniel Ritz

tried that one on my old toshiba tecra 8000 laptop, almost killing it.
the fan doesn't work any more...type 'make' and see the box dying.
luckily my CPU doesn't commit suicide...bisected it to that one:

cd8c93a4e04dce8f00d1ef3a476aac8bd65ae40b is first bad commit
commit cd8c93a4e04dce8f00d1ef3a476aac8bd65ae40b
Author: Alexey Starikovskiy <[EMAIL PROTECTED]>
Date:   Fri Aug 3 17:52:48 2007 -0400

ACPI: EC: If ECDT is not found, look up EC in DSDT.

Some ASUS laptops access EC space from device _INI methods, but do not
provide ECDT for early EC setup. In order to make them function properly,
there is a need to find EC is DSDT before any _INI is called.

Similar functionality was turned on by acpi_fake_ecdt=1 command line
before. Now it is on all the time.

http://bugzilla.kernel.org/show_bug.cgi?id=8598

Signed-off-by: Alexey Starikovskiy <[EMAIL PROTECTED]>
Signed-off-by: Len Brown <[EMAIL PROTECTED]>

[PATCH] v3 of IBM power meter driver

2007-08-28 Thread Darrick J. Wong

Dave Hansen complained about the magic numbers, repetitive code, and
various other minor problems with the driver code, so here's a v3 with
the magic numbers migrated to the top of the file and #define'd,
helper macros taking the place of the bit shifting/masking activities,
and the value/min/max sysfs code compressed into parameterized
functions.
--
ibm_pex: Driver to export IBM PowerExecutive power meter sensors.

Signed-off-by: Darrick J. Wong <[EMAIL PROTECTED]>
---

 drivers/hwmon/Kconfig  |   12 +
 drivers/hwmon/Makefile |1 
 drivers/hwmon/ibmpex.c |  564 
 3 files changed, 577 insertions(+), 0 deletions(-)

diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index 555f470..41ffa2e 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -275,6 +275,18 @@ config SENSORS_CORETEMP
  sensor inside your CPU. Supported all are all known variants
  of Intel Core family.
 
+config SENSORS_IBMPEX
+   tristate "IBM PowerExecutive temperature/power sensors"
+   depends on IPMI_SI
+   help
+ If you say yes here you get support for the temperature and
+ power sensors in various IBM System X servers that support
+ PowerExecutive.  So far this includes the x3550, x3650, x3655,
+ x3755, and certain HS20 blades.
+
+ This driver can also be built as a module.  If so, the module
+ will be called ibmpex.
+
 config SENSORS_IT87
tristate "ITE IT87xx and compatibles"
select HWMON_VID
diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
index a133981..31da6fe 100644
--- a/drivers/hwmon/Makefile
+++ b/drivers/hwmon/Makefile
@@ -35,6 +35,7 @@ obj-$(CONFIG_SENSORS_FSCPOS)  += fscpos.o
 obj-$(CONFIG_SENSORS_GL518SM)  += gl518sm.o
 obj-$(CONFIG_SENSORS_GL520SM)  += gl520sm.o
 obj-$(CONFIG_SENSORS_HDAPS)+= hdaps.o
+obj-$(CONFIG_SENSORS_IBMPEX)   += ibmpex.o
 obj-$(CONFIG_SENSORS_IT87) += it87.o
 obj-$(CONFIG_SENSORS_K8TEMP)   += k8temp.o
 obj-$(CONFIG_SENSORS_LM63) += lm63.o
diff --git a/drivers/hwmon/ibmpex.c b/drivers/hwmon/ibmpex.c
new file mode 100644
index 000..632f897
--- /dev/null
+++ b/drivers/hwmon/ibmpex.c
@@ -0,0 +1,564 @@
+/*
+ * A hwmon driver for the IBM PowerExecutive temperature/power sensors
+ * Copyright (C) 2007 IBM
+ *
+ * Author: Darrick J. Wong <[EMAIL PROTECTED]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define REFRESH_INTERVAL   (5 * HZ)
+#define DRVNAME            "ibmpex"
+
+#define PEX_GET_VERSION    1
+#define PEX_GET_SENSOR_COUNT   2
+#define PEX_GET_SENSOR_NAME    3
+#define PEX_GET_SENSOR_DATA    6
+
+#define PEX_NET_FUNCTION   0x3A
+#define PEX_COMMAND        0x3C
+
+static inline u16 extract_value(const char *data, int offset)
+{
+   u16 val = *(u16 *)&data[offset];
+   return be16_to_cpu(val);
+}
+
+#define PEX_INTERFACE(idx) ((idx) >> 16)
+#define PEX_SENSOR(idx)    (((idx) >> 8) & 0xFF)
+#define PEX_FUNC(idx)  ((idx) & 0xFF)
+#define PEX_INDEX(iface, num, fn)  (((iface) << 16) | ((num) << 8) | (fn))
+
+#define PEX_SENSOR_TYPE_LEN    3
+static char power_sensor_sig[] = {0x70, 0x77, 0x72};
+static char temp_sensor_sig[]  = {0x74, 0x65, 0x6D};
+
+#define PEX_MULT_LEN   2
+static char watt_sensor_sig[]  = {0x41, 0x43};
+
+#define PEX_NUM_SENSOR_FUNCS   3
+static char *sensor_name_templates[] = {
+   "%s%d_input",
+   "%s%d_min_input",
+   "%s%d_max_input"
+};
+
+static void ibmpex_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data);
+static void ibmpex_register_bmc(int iface, struct device *dev);
+static void ibmpex_bmc_gone(int iface);
+
+struct ibmpex_sensor_data {
+   int in_use;
+   s16 values[PEX_NUM_SENSOR_FUNCS];
+   int multiplier;
+
+   struct sensor_device_attribute  attr[PEX_NUM_SENSOR_FUNCS];
+};
+
+struct ibmpex_bmc_data {
+   struct list_head    list;
+   struct class_device *class_dev;
+   struct device   *bmc_device;
+   struct mutex    lock;
+   char    valid;
+   unsigned long   last_updated;

Re: 2.6.23-rc3 USB segfaults + urb status -32

2007-08-28 Thread Michal Piotrowski

Hi Lasse,

On 25/08/07, Lasse Kärkkäinen <[EMAIL PROTECTED]> wrote:
> My system is unusably unstable using this kernel.

Does 2.6.22 work fine?

> On last boot it
> started flooding urb status -32 to kernel log at a rate of several
> megabytes per second. Now it printed segfaults before the system had
> finished booting and then some other errors... The full log is here:
>
> I couldn't find information on these bugs. If you need more debug info,
> please contact me. I can also reproduce the errors without the Nvidia
> kernel module,

Yes, please reproduce this error without nvidia binary crap and CC to
[EMAIL PROTECTED]

Regards,
Michal

-- 
LOG
http://www.stardust.webpages.pl/log/

Re: [3/4] 2.6.23-rc3: known regressions v3

2007-08-28 Thread Michal Piotrowski

Hi Stephen,

On 24/08/07, Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> O
> > Subject : New wake ups from sky2
> > References  : http://lkml.org/lkml/2007/7/20/386
> > Last known good : ?
> > Submitter   : Thomas Meyer <[EMAIL PROTECTED]>
> > Caused-By   : Stephen Hemminger <[EMAIL PROTECTED]>
> >   commit eb35cf60e462491249166182e3e755d3d5d91a28
> > Handled-By  : Stephen Hemminger <[EMAIL PROTECTED]>
> > Status  : unknown
> >
> >
>
> Fix posted to netdev (sky2 1.17 series), but Jeff hasn't
> applied it.
>

commit 32c2c30085324aef9699934295281cca0161ef7e I guess

Regards,
Michal

-- 
LOG
http://www.stardust.webpages.pl/log/

Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

2007-08-28 Thread Christoph Hellwig

On Tue, Aug 28, 2007 at 04:40:15PM -0600, Eric W. Biederman wrote:
> +When:September 2010
> +Option: CONFIG_SYSCTL_SYSCALL
> +Why: The same information is available in a more convenient from
> + /proc/sys, and none of the sysctl variables appear to be
> + important performance wise.
> +
> + Binary sysctls are a long standing source of subtle kernel
> + bugs and security issues.
> +
> + When I looked several months ago all I could find after
> + searching several distributions were 5 user space programs and
> + glibc (which falls back to /proc/sys) using this syscall.

Umm, no way we're ever going to remove a syscall like this.  Please
stop this deprecation crap.  Just make sure no one adds more binary
sysctls.


Re: [Tech-board-discuss] Re: [Ksummit-2007-discuss] Re: Linux Foundation Technical Advisory Board Elections

2007-08-28 Thread Daniel Phillips

On Friday 24 August 2007 03:45, Theodore Tso wrote:
> > As I said; what's wrong with just using SPI membership?  It's not
> > like it is remotely hard for kernel hackers to gain membership in
> > SPI.  And somebody else takes care of the bureaucracy for you.
>
> Given the huge overlap between SPI membership and Debian membership,
> and then taking a look at the craziness that takes place on various
> Debian mailing lists, such as but not limited to debian-legal, I'm
> quite convinced that this would be a baad idea.

Hi Ted,

Ever watched a legislative assembly at work?  A bad idea perhaps, but 
the best that has been discovered so far.

Regards,

Daniel

Re: [PATCH] Fix find_next_best_node (Re: [BUG] 2.6.23-rc3-mm1 Kernel panic - not syncing: DMA: Memory would be corrupted)

2007-08-28 Thread Adam Litke

On Fri, 2007-08-24 at 15:53 +0900, Yasunori Goto wrote:
> I found find_next_best_node() was wrong.
> I confirmed boot up by the following patch.
> Mel-san, Kamalesh-san, could you try this?

FYI: This patch also allows the alloc-instantiate-race testcase in
libhugetlbfs to pass again :)

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center


Re: [PATCH] netlink: use container_of instead

2007-08-28 Thread David Miller

From: Denis Cheng <[EMAIL PROTECTED]>
Date: Wed, 29 Aug 2007 03:12:04 +0800

> this could make future redesign of struct netlink_sock easier.
> 
> Signed-off-by: Denis Cheng <[EMAIL PROTECTED]>

Seems reasonable, patch applied, thanks.

[PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

2007-08-28 Thread Eric W. Biederman


After adding checking to register_sysctl_table and finding a whole new
set of bugs missed by countless code reviews and testers, I have
finally lost patience with the binary sysctl interface.

The binary sysctl interface has been sort of deprecated for years, and
finding a user space program that uses the syscall is more difficult
than finding a needle in a haystack.  Problems continue to crop up
with the in-kernel implementation.  Since supporting something that
no one uses is silly, deprecate sys_sysctl with a sufficient grace
period and notice so that the handful of user space applications that
care can be fixed or replaced.

The /proc/sys sysctl interface that people use will continue to be
supported indefinitely.

This patch moves the tested warning about the sysctl system call out of
the path where sys_sysctl is compiled out and into a separate function
called from both implementations of sys_sysctl, and it adds a proper
entry to Documentation/feature-removal-schedule.txt.

This will allow us to revisit this in a couple of years' time and actually kill
sys_sysctl.
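To make the rate limiting concrete, here is a user-space sketch of the warning logic described above: log at most five times, then stay quiet. It is not the kernel code. SKETCH_MAXNAME and SKETCH_MAX_MSGS are hypothetical stand-ins for CTL_MAXNAME and the literal 5, the kernel.version exemption is left out, and the sketch rejects an nlen outside 0..SKETCH_MAXNAME before reading name[], since the kernel version copies nlen entries into a fixed-size array.

```c
#include <stdio.h>
#include <string.h>

#define SKETCH_MAXNAME  10	/* hypothetical stand-in for CTL_MAXNAME */
#define SKETCH_MAX_MSGS 5	/* the patch stops logging after 5 messages */

/*
 * Returns -1 for an out-of-range nlen, 0 when the message is suppressed
 * by the rate limit, and 1 when a warning was formatted into out.
 */
static int sysctl_warning_sketch(const char *comm, const int *name, int nlen,
				 char *out, size_t outlen)
{
	static int msg_count;
	size_t used;
	int i;

	/* Validate the length before touching name[]. */
	if (nlen < 0 || nlen > SKETCH_MAXNAME)
		return -1;

	if (msg_count >= SKETCH_MAX_MSGS)
		return 0;
	msg_count++;

	used = (size_t)snprintf(out, outlen,
			"warning: process `%s' used the deprecated sysctl "
			"system call with ", comm);
	/* Append the numeric name as "a.b.c.", like the printk loop. */
	for (i = 0; i < nlen && used < outlen; i++)
		used += (size_t)snprintf(out + used, outlen - used, "%d.",
					 name[i]);
	return 1;
}
```

The static counter mirrors the patch's `static int msg_count`, so the limit is global rather than per process.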

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
---
 Documentation/feature-removal-schedule.txt |   35 
 kernel/sysctl.c|   62 +--
 2 files changed, 74 insertions(+), 23 deletions(-)

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index a43d287..4d3097e 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -290,3 +290,38 @@ Why:   All mthca hardware also supports MSI-X, which provides
 Who:   Roland Dreier <[EMAIL PROTECTED]>
 
 ---
+
+What:  sys_sysctl
+When:  September 2010
+Option: CONFIG_SYSCTL_SYSCALL
+Why:   The same information is available in a more convenient form
+   /proc/sys, and none of the sysctl variables appear to be
+   important performance-wise.
+
+   Binary sysctls are a long standing source of subtle kernel
+   bugs and security issues.
+
+   When I looked several months ago all I could find after
+   searching several distributions were 5 user space programs and
+   glibc (which falls back to /proc/sys) using this syscall.
+
+   The man page for sysctl(2) documents it as unusable for user
+   space programs.
+
+   sysctl(2) is not generally ABI compatible for a 32bit user
+   space application running on a 64bit versus a 32bit kernel.
+
+   For the last several months the policy has been no new binary
+   sysctls and no one has put forward an argument to use them.
+
+   Binary sysctl issues seem to keep appearing, so properly
+   deprecating them (with a warning to user space) and a 2 year
+   grace period will mean we can eventually kill them and end
+   the pain.
+
+   In the meantime individual binary sysctls can be dealt with
+   in a piecewise fashion.
+
+Who:   Eric Biederman <[EMAIL PROTECTED]>
+
+---
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 6d01497..792e6fe 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1275,6 +1275,33 @@ struct ctl_table_header *sysctl_head_next(struct ctl_table_header *prev)
return NULL;
 }
 
+static int deprecated_sysctl_warning(struct __sysctl_args *args)
+{
+   static int msg_count;
+   int name[CTL_MAXNAME];
+   int i;
+
+   /* Read in the sysctl name for better debug message logging */
+   for (i = 0; i < args->nlen; i++)
+   if (get_user(name[i], args->name + i))
+   return -EFAULT;
+
+   /* Ignore accesses to kernel.version */
+   if ((args->nlen == 2) && (name[0] == CTL_KERN) && (name[1] == KERN_VERSION))
+   return 0;
+
+   if (msg_count < 5) {
+   msg_count++;
+   printk(KERN_INFO
+   "warning: process `%s' used the deprecated sysctl "
+   "system call with ", current->comm);
+   for (i = 0; i < args->nlen; i++)
+   printk("%d.", name[i]);
+   printk("\n");
+   }
+   return 0;
+}
+
 #ifdef CONFIG_SYSCTL_SYSCALL
 int do_sysctl(int __user *name, int nlen, void __user *oldval, size_t __user *oldlenp,
   void __user *newval, size_t newlen)
@@ -1310,10 +1337,15 @@ asmlinkage long sys_sysctl(struct __sysctl_args __user *args)
if (copy_from_user(&tmp, args, sizeof(tmp)))
return -EFAULT;
 
+   error = deprecated_sysctl_warning(&tmp);
+   if (error)
+   goto out;
+
lock_kernel();
error = do_sysctl(tmp.name, tmp.nlen, tmp.oldval, tmp.oldlenp,
  tmp.newval, tmp.newlen);
unlock_kernel();
+out:
return error;
 }
 #endif /* CONFIG_SYSCTL_SYSCALL */
@@ -2503,35 +2535,19 @@ int sysctl_ms_jiffies(struct ctl_table *table, int __user *name, int nlen,
 
 asmlinkage long sys_sysctl(struct __sysctl_args __user *args)
 {
-

Re: [patch] sched: fix broken smt/mc optimizations with CFS

2007-08-28 Thread Siddha, Suresh B

On Mon, Aug 27, 2007 at 12:31:03PM -0700, Siddha, Suresh B wrote:
> Essentially I observed that nice 0 tasks still endup on two cores of same
> package, with out getting spread out to two different packages. This behavior
> is same with out this fix and this fix doesn't help in any way.

Ingo, the appended patch seems to fix the issue and, as far as I can test,
seems OK to me.

This is a quick fix for .23. Peter Williams and I plan to look at
code cleanups in this area (HT/MC optimizations) after .23.

BTW, with this fix, do you want to retain the current FUZZ value?

thanks,
suresh
--

Try to fix the MC/HT scheduler optimization breakage again, without breaking
the FUZZ logic.

First fix the check
if (*imbalance + SCHED_LOAD_SCALE_FUZZ < busiest_load_per_task)
with this
if (*imbalance < busiest_load_per_task)

The current check is always false for nice 0 tasks, because
SCHED_LOAD_SCALE_FUZZ is the same as busiest_load_per_task for nice 0 tasks.

With the above change, the imbalance was getting reset to 0 in the corner
case condition, making the FUZZ logic fail. Fix it by leaving the
imbalance untouched and changing it only when we find that the
HT/MC optimization is needed.
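The arithmetic behind both fixes can be checked in isolation. This is a minimal sketch that assumes unsigned long loads, as in the scheduler, and uses 1024 as an illustrative value for both SCHED_LOAD_SCALE_FUZZ and busiest_load_per_task (the nice 0 case above). Because imbalance is unsigned, imbalance + X < X can never hold short of wraparound, so the old branch is dead. The function names are hypothetical.

```c
/* Old condition from the patch: imbalance + SCHED_LOAD_SCALE_FUZZ < busiest. */
static int old_small_imbalance(unsigned long imbalance, unsigned long fuzz,
			       unsigned long busiest_load_per_task)
{
	return imbalance + fuzz < busiest_load_per_task;
}

/* New condition from the patch: imbalance < busiest. */
static int new_small_imbalance(unsigned long imbalance,
			       unsigned long busiest_load_per_task)
{
	return imbalance < busiest_load_per_task;
}

/* Second fix: only bump the imbalance when moving a task gains
 * throughput; otherwise keep the value computed earlier instead of
 * treating the group as balanced. */
static unsigned long bump_imbalance(unsigned long imbalance,
				    unsigned long busiest_load_per_task,
				    unsigned long pwr_now, unsigned long pwr_move)
{
	if (pwr_move > pwr_now)
		imbalance = busiest_load_per_task;
	return imbalance;
}
```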

Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]>
---

diff --git a/kernel/sched.c b/kernel/sched.c
index 9fe473a..03e5e8d 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2511,7 +2511,7 @@ group_next:
 * a think about bumping its value to force at least one task to be
 * moved
 */
-   if (*imbalance + SCHED_LOAD_SCALE_FUZZ < busiest_load_per_task) {
+   if (*imbalance < busiest_load_per_task) {
unsigned long tmp, pwr_now, pwr_move;
unsigned int imbn;

@@ -2563,10 +2563,8 @@ small_imbalance:
pwr_move /= SCHED_LOAD_SCALE;

/* Move if we gain throughput */
-   if (pwr_move <= pwr_now)
-   goto out_balanced;
-
-   *imbalance = busiest_load_per_task;
+   if (pwr_move > pwr_now)
+   *imbalance = busiest_load_per_task;
}

return busiest;

Re: cpufreq affects traffic control rates

2007-08-28 Thread Stephen Hemminger

On Tue, 28 Aug 2007 09:51:55 +0200
DervishD <[EMAIL PROTECTED]> wrote:

> Hi all :)
> 
> I noticed lately that my traffic control rates were being very slow,
> about 40% less than expected, and finally spotted the problem: cpufreq.
> 
> Looks like HTB puts buckets according to the requested rate but
> assuming that the CPU is running at its default clock or something like
> that.
> 
> Any way of fixing this without disabling cpufreq?
> 
> I'm using kernel 2.6.20.14, Athlon64 1000/1800MHz, HZ=1000 and a
> combination of HTB/SFQ in my traffic control.
> 
> Thanks a lot in advance :)
> 
> Raúl Núñez de Arenas Coronado
> 

Is the problem the configuration of the network scheduler clock? In 2.6.20
and earlier, you could use the CPU cycle counter (later kernels only use
time of day), so try switching to jiffies or gettimeofday.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>

Re: [Tech-board-discuss] Re: [Ksummit-2007-discuss] Re: Linux Foundation Technical Advisory Board Elections

2007-08-28 Thread Christoph Lameter

On Mon, 27 Aug 2007, Jes Sorensen wrote:

> Right now it looks like we have a list of sane candidates up, which I
> certainly would be willing to vote for. However, it would be a shame
> that the credibility of the election is lost because of sticking to an
> undemocratic voting procedure. A procedure which it in fact was stated
> when the board was created last year, would be replaced this year.

Democracy is an ideal that is not attainable. A representative democracy 
is usually the best you can get. So you need people that have some 
competence to contribute to the endeavor. And AFAICT we approximate that 
reasonably. Many of the people that were not subject to the git commit 
quota are experienced hands that are valuable because of their experience 
with Linux and the Summit.

Re: [PATCH -mm 3/3] PM: Improve handling of ACPI system state indicator (rev. 3)

2007-08-28 Thread Rafael J. Wysocki

On Tuesday, 28 August 2007 21:57, Moore, Robert wrote:
> Since these changes appear to affect the ACPICA core in a fairly big
> way, I would like to see a short, concise description of each change and
> why it is necessary.

All right.  I'll describe the changes made by the current version of the
patches, but please note that if it's safe to run the AML interpreter with
IRQs disabled, it's better to do some of them in a different way.

1. Remove the execution of _GTS from acpi_enter_sleep_state_prep()

acpi_enter_sleep_state_prep() is called before disabling the nonboot
CPUs and _GTS should be executed after that, according to the spec.

2. Introduce acpi_enter_sleep_state_prep_late() that will execute _GTS

Necessary because of 1.

3. Split acpi_leave_sleep_state() into two functions:
   acpi_leave_sleep_state_prep() and acpi_leave_sleep_state().

acpi_leave_sleep_state_prep() contains the code that should be executed
before enabling the nonboot CPUs, most importantly the execution of
_BFS, and acpi_leave_sleep_state() contains the remaining code (the
enabling of GPEs, the execution of _WAK and the enabling of power
buttons).

4. Change the code ordering in acpi_leave_sleep_state_prep() (introduced
   in 3.) so that _SST is executed after _BFS

According to the spec, _BFS should be the first ACPI method executed
after leaving a sleep state.

5. Introduce acpi_set_sleep_state_indicator() that will execute _SST for a given
   ACPI sleep state

Needed so that we can set the state indicator independently of the
other lower-level operations.

6. Remove the execution of _SST from acpi_leave_sleep_state()

No longer needed, because we can use acpi_set_sleep_state_indicator()
to set the state indicator appropriately from higher level routines.
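The ordering constraints in the list above can be written down as a table and checked. This is only a sketch of one reading of the changes: the step labels are plain strings, nothing calls ACPICA, and placing acpi_leave_sleep_state() after the nonboot CPUs come back is inferred from the split in change 3 rather than stated explicitly.

```c
#include <string.h>

/* One reading of the suspend/resume order after the changes; labels only. */
static const char *const sleep_steps[] = {
	"acpi_enter_sleep_state_prep",		/* no longer runs _GTS (1) */
	"disable_nonboot_cpus",
	"acpi_enter_sleep_state_prep_late",	/* runs _GTS (2) */
	"acpi_enter_sleep_state",
	"_BFS",					/* first method after wakeup (4) */
	"_SST",					/* after _BFS (4) */
	"enable_nonboot_cpus",
	"acpi_leave_sleep_state",		/* GPEs, _WAK, power buttons (3) */
};

/* Position of a step in the sequence, or -1 if it is not listed. */
static int step_pos(const char *step)
{
	size_t i;

	for (i = 0; i < sizeof(sleep_steps) / sizeof(sleep_steps[0]); i++)
		if (strcmp(sleep_steps[i], step) == 0)
			return (int)i;
	return -1;
}
```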

The other changes affect only drivers/acpi/sleep/main.c and the files in
kernel/power/.

Greetings,
Rafael

[PATCH -rt 2/8] spinlocks/rwlocks: use PICK_FUNCTION()

2007-08-28 Thread Daniel Walker

Replace the old PICK_OP style macros with the new PICK_FUNCTION macro.

Signed-off-by: Daniel Walker <[EMAIL PROTECTED]>

---
 include/linux/sched.h|   13 -
 include/linux/spinlock.h |  345 ++-
 kernel/rtmutex.c |2 
 lib/dec_and_lock.c   |2 
 4 files changed, 111 insertions(+), 251 deletions(-)

Index: linux-2.6.22/include/linux/sched.h
===
--- linux-2.6.22.orig/include/linux/sched.h
+++ linux-2.6.22/include/linux/sched.h
@@ -2022,17 +2022,8 @@ extern int __cond_resched_raw_spinlock(r
 extern int __cond_resched_spinlock(spinlock_t *spinlock);
 
 #define cond_resched_lock(lock) \
-({ \
-   int __ret;  \
-   \
-   if (TYPE_EQUAL((lock), raw_spinlock_t)) \
-   __ret = __cond_resched_raw_spinlock((raw_spinlock_t *)lock);\
-   else if (TYPE_EQUAL(lock, spinlock_t))  \
-   __ret = __cond_resched_spinlock((spinlock_t *)lock); \
-   else __ret = __bad_spinlock_type(); \
-   \
-   __ret;  \
-})
+   PICK_SPIN_OP_RET(__cond_resched_raw_spinlock, __cond_resched_spinlock,\
+lock)
 
 extern int cond_resched_softirq(void);
 extern int cond_resched_softirq_context(void);
Index: linux-2.6.22/include/linux/spinlock.h
===
--- linux-2.6.22.orig/include/linux/spinlock.h
+++ linux-2.6.22/include/linux/spinlock.h
@@ -91,6 +91,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -162,7 +163,7 @@ extern void __lockfunc rt_spin_unlock_wa
 extern int __lockfunc
 rt_spin_trylock_irqsave(spinlock_t *lock, unsigned long *flags);
 extern int __lockfunc rt_spin_trylock(spinlock_t *lock);
-extern int _atomic_dec_and_spin_lock(atomic_t *atomic, spinlock_t *lock);
+extern int _atomic_dec_and_spin_lock(spinlock_t *lock, atomic_t *atomic);
 
 /*
  * lockdep-less calls, for derived types like rwlock:
@@ -243,54 +244,6 @@ do {   \
 #  define _spin_trylock_irqsave(l,f)   TSNBCONRT(l)
 #endif
 
-#undef TYPE_EQUAL
-#define TYPE_EQUAL(lock, type) \
-   __builtin_types_compatible_p(typeof(lock), type *)
-
-#define PICK_OP(op, lock)  \
-do {   \
-   if (TYPE_EQUAL((lock), raw_spinlock_t)) \
-   __spin##op((raw_spinlock_t *)(lock));   \
-   else if (TYPE_EQUAL(lock, spinlock_t))  \
-   _spin##op((spinlock_t *)(lock));\
-   else __bad_spinlock_type(); \
-} while (0)
-
-#define PICK_OP_RET(op, lock...)   \
-({ \
-   unsigned long __ret;\
-   \
-   if (TYPE_EQUAL((lock), raw_spinlock_t)) \
-   __ret = __spin##op((raw_spinlock_t *)(lock));   \
-   else if (TYPE_EQUAL(lock, spinlock_t))  \
-   __ret = _spin##op((spinlock_t *)(lock));\
-   else __ret = __bad_spinlock_type(); \
-   \
-   __ret;  \
-})
-
-#define PICK_OP2(op, lock, flags)  \
-do {   \
-   if (TYPE_EQUAL((lock), raw_spinlock_t)) \
-   __spin##op((raw_spinlock_t *)(lock), flags);\
-   else if (TYPE_EQUAL(lock, spinlock_t))  \
-   _spin##op((spinlock_t *)(lock), flags); \
-   else __bad_spinlock_type(); \
-} while (0)
-
-#define PICK_OP2_RET(op, lock, flags)  \
-({ \
-   unsigned long __ret;\
-   \
-   if (TYPE_EQUAL((lock), raw_spinlock_t)) \
-   __ret = __spin##op((raw_spinlock_t *)(lock), flags);\
-   else if (TYPE_EQUAL(lock, spinlock_t))  \
-   __ret =

[PATCH -rt 1/8] introduce PICK_FUNCTION

2007-08-28 Thread Daniel Walker

PICK_FUNCTION() is similar to the other PICK_OP style macros, and was
created to replace them all. I used variable argument macros to handle
PICK_FUNC_2ARG/PICK_FUNC_1ARG. Otherwise the macros are similar to the
original macros used for semaphores. The entire system is used to do a
compile time switch between two different locking APIs, for example
real spinlocks (raw_spinlock_t) and mutexes (or sleeping spinlocks).

This new macro removes all the duplication from lock type to lock type.
The result of this patch, and the next two, is a fairly nice simplification
and consolidation. Although the seqlock changes are larger than the originals,
I think overall the patchset is worthwhile.

Incorporated peterz's suggestion to not require TYPE_EQUAL() to only
use pointers.
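Below is a self-contained user-space sketch of the PICK_FUNCTION_RET() dispatch. Like the patch, it uses GCC's __builtin_types_compatible_p() and statement expressions, so it needs GCC or Clang. raw_lock_t, sleep_lock_t, raw_trylock(), sleep_trylock() and demo_trylock() are hypothetical stand-ins for raw_spinlock_t, spinlock_t and their operations. Only the macro shape follows include/linux/pickop.h.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for raw_spinlock_t and spinlock_t. */
typedef struct { int spins; } raw_lock_t;
typedef struct { int sleeps; } sleep_lock_t;

#define PICK_TYPE_EQUAL(var, type) \
	__builtin_types_compatible_p(__typeof__(var), type)

/* In the kernel, __bad_func_type() is only declared: if neither type
 * matches, the call survives optimization and the link fails.  It is
 * defined here so the sketch also links at -O0, where dead branches
 * are kept. */
static int bad_func_type(void)
{
	abort();
}

#define PICK_FUNCTION_RET(type1, type2, func1, func2, arg0, ...)	\
({									\
	unsigned long __ret;						\
	if (PICK_TYPE_EQUAL((arg0), type1))				\
		__ret = func1((type1)(arg0), ##__VA_ARGS__);		\
	else if (PICK_TYPE_EQUAL((arg0), type2))			\
		__ret = func2((type2)(arg0), ##__VA_ARGS__);		\
	else								\
		__ret = bad_func_type();				\
	__ret;								\
})

static unsigned long raw_trylock(raw_lock_t *l, int tag)
{
	l->spins++;
	return 100 + tag;
}

static unsigned long sleep_trylock(sleep_lock_t *l, int tag)
{
	l->sleeps++;
	return 200 + tag;
}

/* One macro, two lock APIs: the branch is chosen from the pointer type. */
#define demo_trylock(lock, tag) \
	PICK_FUNCTION_RET(raw_lock_t *, sleep_lock_t *, raw_trylock, \
			  sleep_trylock, lock, tag)
```

Because each condition is a compile-time constant, only one branch does any work at run time, and with optimization enabled the other branches are dropped entirely.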

Signed-off-by: Daniel Walker <[EMAIL PROTECTED]>

---
 include/linux/pickop.h  |   36 +
 include/linux/rt_lock.h |  129 +++-
 2 files changed, 77 insertions(+), 88 deletions(-)

Index: linux-2.6.22/include/linux/pickop.h
===
--- /dev/null
+++ linux-2.6.22/include/linux/pickop.h
@@ -0,0 +1,36 @@
+#ifndef _LINUX_PICKOP_H
+#define _LINUX_PICKOP_H
+
+#undef TYPE_EQUAL
+#define TYPE_EQUAL(var, type) \
+   __builtin_types_compatible_p(typeof(var), type *)
+
+#undef PICK_TYPE_EQUAL
+#define PICK_TYPE_EQUAL(var, type) \
+   __builtin_types_compatible_p(typeof(var), type)
+
+extern int __bad_func_type(void);
+
+#define PICK_FUNCTION(type1, type2, func1, func2, arg0, ...)   \
+do {   \
+   if (PICK_TYPE_EQUAL((arg0), type1)) \
+   func1((type1)(arg0), ##__VA_ARGS__);\
+   else if (PICK_TYPE_EQUAL((arg0), type2))\
+   func2((type2)(arg0), ##__VA_ARGS__);\
+   else __bad_func_type(); \
+} while (0)
+
+#define PICK_FUNCTION_RET(type1, type2, func1, func2, arg0, ...)   \
+({ \
+   unsigned long __ret;\
+   \
+   if (PICK_TYPE_EQUAL((arg0), type1)) \
+   __ret = func1((type1)(arg0), ##__VA_ARGS__);\
+   else if (PICK_TYPE_EQUAL((arg0), type2))\
+   __ret = func2((type2)(arg0), ##__VA_ARGS__);\
+   else __ret = __bad_func_type(); \
+   \
+   __ret;  \
+})
+
+#endif /* _LINUX_PICKOP_H */
Index: linux-2.6.22/include/linux/rt_lock.h
===
--- linux-2.6.22.orig/include/linux/rt_lock.h
+++ linux-2.6.22/include/linux/rt_lock.h
@@ -156,76 +156,40 @@ extern void fastcall rt_up(struct semaph
 
 extern int __bad_func_type(void);
 
-#undef TYPE_EQUAL
-#define TYPE_EQUAL(var, type) \
-   __builtin_types_compatible_p(typeof(var), type *)
-
-#define PICK_FUNC_1ARG(type1, type2, func1, func2, arg)   \
-do {   \
-   if (TYPE_EQUAL((arg), type1))   \
-   func1((type1 *)(arg));  \
-   else if (TYPE_EQUAL((arg), type2))  \
-   func2((type2 *)(arg));  \
-   else __bad_func_type(); \
-} while (0)
+#include 
 
-#define PICK_FUNC_1ARG_RET(type1, type2, func1, func2, arg)\
-({ \
-   unsigned long __ret;\
-   \
-   if (TYPE_EQUAL((arg), type1))   \
-   __ret = func1((type1 *)(arg));  \
-   else if (TYPE_EQUAL((arg), type2))  \
-   __ret = func2((type2 *)(arg));  \
-   else __ret = __bad_func_type(); \
-   \
-   __ret;  \
-})
-
-#define PICK_FUNC_2ARG(type1, type2, func1, func2, arg0, arg1) \
-do {   \
-   if (TYPE_EQUAL((arg0), type1))  \
-   func1((type1 *)(arg0), arg1);

[PATCH -rt 7/8] latency hist: add resetting for all timing options

2007-08-28 Thread Daniel Walker

I dropped parts of the prior reset method, and added a file called
"reset" into the /proc/latency_hist/ timing directories. It allows
any of the timing options to get their histograms reset.
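The following user-space sketch shows what writing to such a reset file does for one latency type: every CPU's histogram is cleared back to its empty state. struct hist_sketch mirrors the fields that hist_reset() clears in the patch. The bucket count and the hist_sketch_add() recorder are hypothetical simplifications, not the logic of latency_hist(), and the hist_mode toggling around the reset is omitted. Starting min_lat at ULONG_MAX is an assumption so that the first sample always becomes the minimum.

```c
#include <limits.h>
#include <string.h>

#define SKETCH_HIST_BUCKETS 16	/* hypothetical; the real size is not shown */

/* User-space mirror of the fields hist_reset() clears. */
struct hist_sketch {
	unsigned long hist_array[SKETCH_HIST_BUCKETS];
	unsigned long beyond_hist_bound_samples;
	unsigned long min_lat, max_lat;
	unsigned long total_samples, accumulate_lat, avg_lat;
};

static void hist_sketch_reset(struct hist_sketch *h)
{
	memset(h->hist_array, 0, sizeof(h->hist_array));
	h->beyond_hist_bound_samples = 0;
	h->min_lat = ULONG_MAX;	/* first sample replaces it */
	h->max_lat = 0;
	h->total_samples = 0;
	h->accumulate_lat = 0;
	h->avg_lat = 0;
}

/* Simplified recorder: one bucket per latency unit, overflow counted. */
static void hist_sketch_add(struct hist_sketch *h, unsigned long lat)
{
	if (lat < SKETCH_HIST_BUCKETS)
		h->hist_array[lat]++;
	else
		h->beyond_hist_bound_samples++;
	if (lat < h->min_lat)
		h->min_lat = lat;
	if (lat > h->max_lat)
		h->max_lat = lat;
	h->total_samples++;
	h->accumulate_lat += lat;
	h->avg_lat = h->accumulate_lat / h->total_samples;
}

/* Reset every CPU's histogram, as the write handler does per latency type. */
static void hist_sketch_reset_all(struct hist_sketch *per_cpu, int ncpus)
{
	int cpu;

	for (cpu = 0; cpu < ncpus; cpu++)
		hist_sketch_reset(&per_cpu[cpu]);
}
```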

I also fixed a couple of oddities in the code. Instead of creating a
file for all NR_CPUS, I just used num_possible_cpus(). I also dropped
a string which only held "CPU" and inserted it directly where it was used.

Signed-off-by: Daniel Walker <[EMAIL PROTECTED]>

---
 include/linux/latency_hist.h |1 
 kernel/latency_hist.c|  119 ---
 kernel/latency_trace.c   |   13 
 3 files changed, 80 insertions(+), 53 deletions(-)

Index: linux-2.6.22/include/linux/latency_hist.h
===
--- linux-2.6.22.orig/include/linux/latency_hist.h
+++ linux-2.6.22/include/linux/latency_hist.h
@@ -23,7 +23,6 @@ enum {
 
 #ifdef CONFIG_LATENCY_HIST
 extern void latency_hist(int latency_type, int cpu, unsigned long latency);
-extern void latency_hist_reset(void);
 # define latency_hist_flag 1
 #else
 # define latency_hist(a,b,c) do { (void)(cpu); } while (0)
Index: linux-2.6.22/kernel/latency_hist.c
===
--- linux-2.6.22.orig/kernel/latency_hist.c
+++ linux-2.6.22/kernel/latency_hist.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 typedef struct hist_data_struct {
atomic_t hist_mode; /* 0 log, 1 don't log */
@@ -31,8 +32,6 @@ typedef struct hist_data_struct {
 static struct proc_dir_entry * latency_hist_root = NULL;
 static char * latency_hist_proc_dir_root = "latency_hist";
 
-static char * percpu_proc_name = "CPU";
-
 #ifdef CONFIG_INTERRUPT_OFF_HIST
 static DEFINE_PER_CPU(hist_data_t, interrupt_off_hist);
 static char * interrupt_off_hist_proc_dir = "interrupt_off_latency";
@@ -56,7 +55,7 @@ static inline u64 u64_div(u64 x, u64 y)
 return x;
 }
 
-void latency_hist(int latency_type, int cpu, unsigned long latency)
+void notrace latency_hist(int latency_type, int cpu, unsigned long latency)
 {
hist_data_t * my_hist;
 
@@ -205,6 +204,69 @@ static struct file_operations latency_hi
.release = seq_release,
 };
 
+static void hist_reset(hist_data_t *hist)
+{
+   atomic_dec(&hist->hist_mode);
+
+   memset(hist->hist_array, 0, sizeof(hist->hist_array));
+   hist->beyond_hist_bound_samples = 0UL;
+   hist->min_lat = 0xUL;
+   hist->max_lat = 0UL;
+   hist->total_samples = 0UL;
+   hist->accumulate_lat = 0UL;
+   hist->avg_lat = 0UL;
+
+   atomic_inc(&hist->hist_mode);
+}
+
+ssize_t latency_hist_reset(struct file *file, const char __user *a, size_t size, loff_t *off)
+{
+   int cpu;
+   hist_data_t *hist;
+   struct proc_dir_entry *entry_ptr = PDE(file->f_dentry->d_inode);
+   int latency_type = (int)entry_ptr->data;
+
+   switch (latency_type) {
+
+#ifdef CONFIG_WAKEUP_LATENCY_HIST
+   case WAKEUP_LATENCY:
+   for_each_online_cpu(cpu) {
+   hist = &per_cpu(wakeup_latency_hist, cpu);
+   hist_reset(hist);
+   }
+   break;
+#endif
+
+#ifdef CONFIG_PREEMPT_OFF_HIST
+   case PREEMPT_LATENCY:
+   for_each_online_cpu(cpu) {
+   hist = &per_cpu(preempt_off_hist, cpu);
+   hist_reset(hist);
+   }
+   break;
+#endif
+
+#ifdef CONFIG_INTERRUPT_OFF_HIST
+   case INTERRUPT_LATENCY:
+   for_each_online_cpu(cpu) {
+   hist = &per_cpu(interrupt_off_hist, cpu);
+   hist_reset(hist);
+   }
+   break;
+#endif
+   }
+
+   return size;
+}
+
+static struct file_operations latency_hist_reset_seq_fops = {
+   .write = latency_hist_reset,
+};
+
+static struct proc_dir_entry *interrupt_off_reset;
+static struct proc_dir_entry *preempt_off_reset;
+static struct proc_dir_entry *wakeup_latency_reset;
+
 static __init int latency_hist_init(void)
 {
struct proc_dir_entry *tmp_parent_proc_dir;
@@ -214,11 +276,10 @@ static __init int latency_hist_init(void
 
latency_hist_root = proc_mkdir(latency_hist_proc_dir_root, NULL);
 
-
 #ifdef CONFIG_INTERRUPT_OFF_HIST
tmp_parent_proc_dir = proc_mkdir(interrupt_off_hist_proc_dir, latency_hist_root);
-   for (i = 0; i < NR_CPUS; i++) {
-   len = sprintf(procname, "%s%d", percpu_proc_name, i);
+   for (i = 0; i < num_possible_cpus(); i++) {
+   len = sprintf(procname, "CPU%d", i);
procname[len] = '\0';
entry[INTERRUPT_LATENCY][i] =
create_proc_entry(procname, 0, tmp_parent_proc_dir);
@@ -228,12 +289,15 @@ static __init int latency_hist_init(void

[PATCH -rt 3/8] seqlocks: use PICK_FUNCTION

2007-08-28 Thread Daniel Walker

Replace the old PICK_OP style macros with PICK_FUNCTION. The seqlock
code also contained some alien code, which I replaced as well, as can be
seen from the line count below.

Signed-off-by: Daniel Walker <[EMAIL PROTECTED]>

---
 include/linux/pickop.h  |4 
 include/linux/seqlock.h |  235 +++-
 2 files changed, 135 insertions(+), 104 deletions(-)

Index: linux-2.6.22/include/linux/pickop.h
===
--- linux-2.6.22.orig/include/linux/pickop.h
+++ linux-2.6.22/include/linux/pickop.h
@@ -1,10 +1,6 @@
 #ifndef _LINUX_PICKOP_H
 #define _LINUX_PICKOP_H
 
-#undef TYPE_EQUAL
-#define TYPE_EQUAL(var, type) \
-   __builtin_types_compatible_p(typeof(var), type *)
-
 #undef PICK_TYPE_EQUAL
 #define PICK_TYPE_EQUAL(var, type) \
__builtin_types_compatible_p(typeof(var), type)
Index: linux-2.6.22/include/linux/seqlock.h
===
--- linux-2.6.22.orig/include/linux/seqlock.h
+++ linux-2.6.22/include/linux/seqlock.h
@@ -90,6 +90,12 @@ static inline void __write_seqlock(seqlo
smp_wmb();
 }
 
+static __always_inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
+{
+   __write_seqlock(sl);
+   return 0;
+}
+
 static inline void __write_sequnlock(seqlock_t *sl)
 {
smp_wmb();
@@ -97,6 +103,8 @@ static inline void __write_sequnlock(seq
spin_unlock(&sl->lock);
 }
 
+#define __write_sequnlock_irqrestore(sl, flags)__write_sequnlock(sl)
+
 static inline int __write_tryseqlock(seqlock_t *sl)
 {
int ret = spin_trylock(&sl->lock);
@@ -149,6 +157,28 @@ static __always_inline void __write_seql
smp_wmb();
 }
 
+static __always_inline unsigned long
+__write_seqlock_irqsave_raw(raw_seqlock_t *sl)
+{
+   unsigned long flags;
+
+   local_irq_save(flags);
+   __write_seqlock_raw(sl);
+   return flags;
+}
+
+static __always_inline void __write_seqlock_irq_raw(raw_seqlock_t *sl)
+{
+   local_irq_disable();
+   __write_seqlock_raw(sl);
+}
+
+static __always_inline void __write_seqlock_bh_raw(raw_seqlock_t *sl)
+{
+   local_bh_disable();
+   __write_seqlock_raw(sl);
+}
+
 static __always_inline void __write_sequnlock_raw(raw_seqlock_t *sl)
 {
smp_wmb();
@@ -156,6 +186,27 @@ static __always_inline void __write_sequ
spin_unlock(&sl->lock);
 }
 
+static __always_inline void
+__write_sequnlock_irqrestore_raw(raw_seqlock_t *sl, unsigned long flags)
+{
+   __write_sequnlock_raw(sl);
+   local_irq_restore(flags);
+   preempt_check_resched();
+}
+
+static __always_inline void __write_sequnlock_irq_raw(raw_seqlock_t *sl)
+{
+   __write_sequnlock_raw(sl);
+   local_irq_enable();
+   preempt_check_resched();
+}
+
+static __always_inline void __write_sequnlock_bh_raw(raw_seqlock_t *sl)
+{
+   __write_sequnlock_raw(sl);
+   local_bh_enable();
+}
+
 static __always_inline int __write_tryseqlock_raw(raw_seqlock_t *sl)
 {
int ret = spin_trylock(&sl->lock);
@@ -182,60 +233,93 @@ static __always_inline int __read_seqret
 
 extern int __bad_seqlock_type(void);
 
-#define PICK_SEQOP(op, lock)   \
+/*
+ * PICK_SEQ_OP() is a small redirector to allow less typing of the lock
+ * types raw_seqlock_t, seqlock_t, at the front of the PICK_FUNCTION
+ * macro.
+ */
+#define PICK_SEQ_OP(...)   \
+   PICK_FUNCTION(raw_seqlock_t *, seqlock_t *, ##__VA_ARGS__)
+#define PICK_SEQ_OP_RET(...) \
+   PICK_FUNCTION_RET(raw_seqlock_t *, seqlock_t *, ##__VA_ARGS__)
+
+#define write_seqlock(sl) PICK_SEQ_OP(__write_seqlock_raw, __write_seqlock, sl)
+
+#define write_sequnlock(sl)\
+   PICK_SEQ_OP(__write_sequnlock_raw, __write_sequnlock, sl)
+
+#define write_tryseqlock(sl)   \
+   PICK_SEQ_OP_RET(__write_tryseqlock_raw, __write_tryseqlock, sl)
+
+#define read_seqbegin(sl)  \
+   PICK_SEQ_OP_RET(__read_seqbegin_raw, __read_seqbegin, sl)
+
+#define read_seqretry(sl, iv)  \
+   PICK_SEQ_OP_RET(__read_seqretry_raw, __read_seqretry, sl, iv)
+
+#define write_seqlock_irqsave(lock, flags) \
 do {   \
-   if (TYPE_EQUAL((lock), raw_seqlock_t))  \
-   op##_raw((raw_seqlock_t *)(lock));  \
-   else if (TYPE_EQUAL((lock), seqlock_t)) \
-   op((seqlock_t *)(lock));\
-   else __bad_seqlock_type();  \
+   flags = PICK_SEQ_OP_RET(__write_seqlock_irqsave_raw,\
+   __write_seqlock_irqsave, lock); \
 } while (0)
 
-#define PICK_SEQOP_RET(op, lock)   \
-({ \
-   unsigned long __ret;\
-   \
-   if

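The dispatch that PICK_FUNCTION performs can be illustrated in user space. The sketch below assumes GCC or Clang and uses __builtin_choose_expr() with the same __builtin_types_compatible_p() test that PICK_TYPE_EQUAL() wraps, so a single macro calls the raw or the regular variant depending on the pointer type it is given. The lock types, pick_lock() and the helper functions are hypothetical stand-ins, not the -rt definitions:

```c
/*
 * Hypothetical user-space illustration of PICK_FUNCTION-style dispatch.
 * Requires GCC or Clang (__builtin_choose_expr, __typeof__).
 */

/* Hypothetical stand-ins for raw_seqlock_t and seqlock_t. */
typedef struct { unsigned sequence; } demo_raw_lock_t;
typedef struct { unsigned sequence; } demo_lock_t;

/* Same test as PICK_TYPE_EQUAL() in include/linux/pickop.h. */
#define PICK_TYPE_EQUAL(var, type) \
	__builtin_types_compatible_p(__typeof__(var), type)

/* Each variant returns a tag so the chosen path is observable. */
static int demo_lock_raw(demo_raw_lock_t *l)
{
	l->sequence++;
	return 1;
}

static int demo_lock(demo_lock_t *l)
{
	l->sequence++;
	return 2;
}

/*
 * Select the variant from the static type of 'l'.  Only the selected
 * branch is compiled into the call; __builtin_choose_expr() discards the
 * other one, so its cast is never executed.
 */
#define pick_lock(l)							\
	__builtin_choose_expr(PICK_TYPE_EQUAL((l), demo_raw_lock_t *),	\
			      demo_lock_raw((demo_raw_lock_t *)(l)),	\
			      demo_lock((demo_lock_t *)(l)))
```

Unlike this sketch, which silently sends any other type to the second branch, the seqlock macros reject unknown types by calling __bad_seqlock_type(), which, as far as this patch shows, is only declared, so a wrong lock type should surface as a link error.
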
[PATCH -rt 8/8] stop critical timing in idle.

2007-08-28 Thread Daniel Walker

Without this, the idle routine still gets traced. This is already done
for ACPI idle, but it should also be done for the other idle routines.

Signed-off-by: Daniel Walker <[EMAIL PROTECTED]>
 
---
 arch/i386/kernel/process.c   |    9 +++++++++
 arch/x86_64/kernel/process.c |   10 ++++++++++
 2 files changed, 19 insertions(+)

Index: linux-2.6.22/arch/i386/kernel/process.c
===
--- linux-2.6.22.orig/arch/i386/kernel/process.c
+++ linux-2.6.22/arch/i386/kernel/process.c
@@ -197,8 +197,17 @@ void cpu_idle(void)
if (cpu_is_offline(cpu))
play_dead();
 
+   /*
+    * We have irqs disabled here, so stop latency tracing
+    * at this point and restart it after we return:
+    */
+   stop_critical_timing();
+
__get_cpu_var(irq_stat).idle_timestamp = jiffies;
idle();
+
+   touch_critical_timing();
+
}
local_irq_disable();
trace_preempt_exit_idle();
Index: linux-2.6.22/arch/x86_64/kernel/process.c
===
--- linux-2.6.22.orig/arch/x86_64/kernel/process.c
+++ linux-2.6.22/arch/x86_64/kernel/process.c
@@ -223,8 +223,18 @@ void cpu_idle (void)
 * Otherwise, idle callbacks can misfire.
 */
local_irq_disable();
+
+   /*
+    * We have irqs disabled here, so stop latency tracing
+    * at this point and restart it after we return:
+    */
+   stop_critical_timing();
+
enter_idle();
idle();
+
+   touch_critical_timing();
+
/* In many cases the interrupt that ended idle
   has already called exit_idle. But some idle
   loops can be woken up without interrupt. */

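A user-space model of what this patch is after, assuming stop_critical_timing() abandons the current irqs-off interval without recording it and touch_critical_timing() starts a fresh one. The crit_tracer structure and the crit_*() helpers are hypothetical, not the -rt tracer's internals, and explicit timestamps are passed in so the effect is easy to check:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a per-CPU irqs-off latency tracer. */
struct crit_tracer {
	bool active;          /* currently timing an irqs-off section */
	uint64_t start;       /* timestamp when the section began */
	uint64_t max_latency; /* longest section recorded so far */
};

static void crit_start(struct crit_tracer *t, uint64_t now)
{
	t->active = true;
	t->start = now;
}

/* Close the section and record its length if it is a new maximum. */
static void crit_end(struct crit_tracer *t, uint64_t now)
{
	if (t->active && now - t->start > t->max_latency)
		t->max_latency = now - t->start;
	t->active = false;
}

/* Model of stop_critical_timing(): drop the section without recording it. */
static void crit_stop(struct crit_tracer *t)
{
	t->active = false;
}

/* Model of touch_critical_timing(): begin a fresh section from 'now'. */
static void crit_touch(struct crit_tracer *t, uint64_t now)
{
	crit_start(t, now);
}
```

With irqs disabled at 100, idle lasting until 5100 and the section ending at 5130, the stop/touch pair records 30 units; without it, the same events would record 5030 units, almost all of it time spent idle.
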

[PATCH -rt 6/8] preempt_max_latency in all modes

2007-08-28 Thread Daniel Walker

This enables the /proc/preempt_max_latency facility for all timing modes,
even if event tracing is disabled. Previously, wakeup latency timing was
the only mode that had this feature.

Signed-off-by: Daniel Walker <[EMAIL PROTECTED]>

---
 kernel/sysctl.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.22/kernel/sysctl.c
===
--- linux-2.6.22.orig/kernel/sysctl.c
+++ linux-2.6.22/kernel/sysctl.c
@@ -392,7 +392,7 @@ static ctl_table kern_table[] = {
.proc_handler   = &proc_dointvec,
},
 #endif
-#if defined(CONFIG_WAKEUP_TIMING) || defined(CONFIG_EVENT_TRACE)
+#if defined(CONFIG_CRITICAL_TIMING)
{
.ctl_name   = CTL_UNNUMBERED,
.procname   = "preempt_max_latency",


[PATCH -rt 5/8] latency tracing: use now() consistently

2007-08-28 Thread Daniel Walker

Switch the remaining get_monotonic_cycles() calls over to now().

Signed-off-by: Daniel Walker <[EMAIL PROTECTED]>

---
 kernel/latency_trace.c |   18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

Index: linux-2.6.22/kernel/latency_trace.c
===
--- linux-2.6.22.orig/kernel/latency_trace.c
+++ linux-2.6.22/kernel/latency_trace.c
@@ -1751,7 +1751,7 @@ check_critical_timing(int cpu, struct cp
 * as long as possible:
 */
T0 = tr->preempt_timestamp;
-   T1 = get_monotonic_cycles();
+   T1 = now();
delta = T1-T0;
 
local_save_flags(flags);
@@ -1765,7 +1765,7 @@ check_critical_timing(int cpu, struct cp
 * might change it (it can only get larger so the latency
 * is fair to be reported):
 */
-   T2 = get_monotonic_cycles();
+   T2 = now();
 
delta = T2-T0;
 
@@ -1815,7 +1815,7 @@ check_critical_timing(int cpu, struct cp
printk(" =>   ended at timestamp %lu: ", t1);
print_symbol("<%s>\n", tr->critical_end);
dump_stack();
-   t1 = cycles_to_usecs(get_monotonic_cycles());
+   t1 = cycles_to_usecs(now());
printk(" =>   dump-end timestamp %lu\n\n", t1);
 #endif
 
@@ -1825,7 +1825,7 @@ check_critical_timing(int cpu, struct cp
 
 out:
tr->critical_sequence = max_sequence;
-   tr->preempt_timestamp = get_monotonic_cycles();
+   tr->preempt_timestamp = now();
tr->early_warning = 0;
reset_trace_idx(cpu, tr);
_trace_cmdline(cpu, tr);
@@ -1874,7 +1874,7 @@ __start_critical_timing(unsigned long ei
atomic_inc(&tr->disabled);
 
tr->critical_sequence = max_sequence;
-   tr->preempt_timestamp = get_monotonic_cycles();
+   tr->preempt_timestamp = now();
tr->critical_start = eip;
reset_trace_idx(cpu, tr);
tr->latency_type = latency_type;
@@ -2196,7 +2196,7 @@ check_wakeup_timing(struct cpu_trace *tr
goto out;
 
T0 = tr->preempt_timestamp;
-   T1 = get_monotonic_cycles();
+   T1 = now();
/*
 * Any wraparound or time warp and we are out:
 */
@@ -2314,7 +2314,7 @@ void __trace_start_sched_wakeup(struct t
 // if (!atomic_read(&tr->disabled)) {
atomic_inc(&tr->disabled);
tr->critical_sequence = max_sequence;
-   tr->preempt_timestamp = get_monotonic_cycles();
+   tr->preempt_timestamp = now();
tr->latency_type = WAKEUP_LATENCY;
tr->critical_start = CALLER_ADDR0;
_trace_cmdline(raw_smp_processor_id(), tr);
@@ -2426,7 +2426,7 @@ long user_trace_start(void)
 
atomic_inc(&tr->disabled);
tr->critical_sequence = max_sequence;
-   tr->preempt_timestamp = get_monotonic_cycles();
+   tr->preempt_timestamp = now();
tr->critical_start = CALLER_ADDR0;
_trace_cmdline(cpu, tr);
atomic_dec(&tr->disabled);
@@ -2486,7 +2486,7 @@ long user_trace_stop(void)
unsigned long long tmp0;
 
T0 = tr->preempt_timestamp;
-   T1 = get_monotonic_cycles();
+   T1 = now();
tmp0 = preempt_max_latency;
if (T1 < T0)
T0 = T1;

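The point of using one helper everywhere is that a latency is the difference of two timestamps, which is only meaningful if both come from the same clock. A small sketch of the start/stop pair with the clock passed in, so a fake clock can drive it deterministically; struct demo_trace, clock_fn and fake_clock() are hypothetical, and the T1 < T0 guard copies the one in user_trace_stop():

```c
#include <stdint.h>

/* Hypothetical trace state, reduced to the one field used here. */
struct demo_trace {
	uint64_t preempt_timestamp; /* stamp taken when timing started */
};

/* Hypothetical clock hook standing in for the tracer's now(). */
typedef uint64_t (*clock_fn)(void);

/* Deterministic fake clock for demonstration purposes. */
static uint64_t fake_ticks;
static uint64_t fake_clock(void)
{
	return fake_ticks;
}

static void demo_start(struct demo_trace *tr, clock_fn now)
{
	tr->preempt_timestamp = now();
}

/* Both stamps come from the same clock, so the difference is meaningful. */
static uint64_t demo_stop(const struct demo_trace *tr, clock_fn now)
{
	uint64_t t0 = tr->preempt_timestamp;
	uint64_t t1 = now();

	if (t1 < t0)	/* time warp: report zero, as user_trace_stop() does */
		t0 = t1;
	return t1 - t0;
}
```
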

[PATCH -rt 4/8] fork: desched_thread comment rework.

2007-08-28 Thread Daniel Walker

The comment lines are too long; rewrap them.

Signed-off-by: Daniel Walker <[EMAIL PROTECTED]>

---
 kernel/fork.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6.22/kernel/fork.c
===
--- linux-2.6.22.orig/kernel/fork.c
+++ linux-2.6.22/kernel/fork.c
@@ -1787,8 +1787,10 @@ static int desched_thread(void * __bind_
continue;
schedule();
 
-   /* This must be called from time to time on ia64, and is a no-op on other archs.
-* Used to be in cpu_idle(), but with the new -rt semantics it can't stay there.
+   /*
+* This must be called from time to time on ia64, and is a
+* no-op on other archs. Used to be in cpu_idle(), but with
+* the new -rt semantics it can't stay there.
 */
check_pgt_cache();
 


Re: [PATCH -mm 2/3] PM: More fine grained ACPI handling during suspend and hibernation (rev. 3)

2007-08-28 Thread Rafael J. Wysocki

On Tuesday, 28 August 2007 21:48, Len Brown wrote:
> On Monday 27 August 2007 17:51, Rafael J. Wysocki wrote:
> > According to the ACPI specification (eg. ACPI 2.0c, sec. 7.3.1, 7.3.3,
> > ACPI 3.0b, sec. 7.3.1, 7.3.3) the _GTS and _BFS global control methods
> > should be executed, respectively, right before entering a sleep state
> > (S1-S4) and right after leaving it, but we don't follow this requirement.
> > Namely, in our implementation the nonboot CPUs are disabled after
> > executing _GTS and enabled before executing _BFS, which doesn't seem to
> > be correct.
> 
> I've never encountered a BIOS that actually implements _GTS or _BFS,
> so I expect that changing how they are invoked may be somewhat academic.

It is for now, but once we have a system that implements them, we'd most
probably need to change the current code, so I think it's better to consider
that in advance.

> > [In fact, the ACPI specification requires that no physical I/O and
> > interrupt servicing be performed after the sleep state has been left and
> > before _BFS is executed as well as after executing _GTS and before the
> > sleep state is entered, but we can't follow this requirement literally,
> 
> > since our AML interpreter needs to run with interrupts enabled and we
> > need to carry out some operations with interrupts disabled before
> > entering the sleep state and after leaving it.]
> 
> This is sort of a myth.
> 
> The real requirement is that the ACPI interpreter must be able to call 
> kmalloc().
> It does this today via acpi_os_allocate(), which does this:
> 
> kmalloc(size, irqs_disabled()? GFP_ATOMIC : GFP_KERNEL);
> 
> No, we don't actually run the interpreter during device interrupts,
> but we need to be able to run it with interrupts off for boot,
> suspend, and resume.

At present, during suspend and resume we always call the AML interpreter
with interrupts enabled.

Frankly, I'd like _BFS and _GTS to be executed with interrupts disabled,
just as the specification tells us to do.  If you think that's safe, I'll
change the patch to work this way.
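
A user-space model of the two checks Len describes, for illustration only: the allocation flag choice made by acpi_os_allocate(), and a might_sleep()-style warning that, as he explains, is skipped unless system_state == SYSTEM_RUNNING. The demo_* names and the two-value state enum are hypothetical simplifications of the kernel's:

```c
#include <stdbool.h>

/* Hypothetical, simplified system states (the kernel has more). */
enum demo_system_state { DEMO_BOOTING, DEMO_RUNNING };

enum demo_gfp { DEMO_GFP_KERNEL, DEMO_GFP_ATOMIC };

/* Mirrors acpi_os_allocate(): never request a sleeping allocation with irqs off. */
static enum demo_gfp demo_acpi_gfp(bool irqs_disabled)
{
	return irqs_disabled ? DEMO_GFP_ATOMIC : DEMO_GFP_KERNEL;
}

/*
 * Model of the might_sleep() check: the "sleeping with irqs disabled"
 * warning only fires once the system is running, which is why the same
 * code path is tolerated during boot.
 */
static bool demo_might_sleep_warns(bool irqs_disabled,
				   enum demo_system_state state)
{
	return irqs_disabled && state == DEMO_RUNNING;
}
```

During resume the system is already in the running state, so in this model only the irqs_disabled() test that selects the atomic flag keeps the warning from firing.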

> So how did boot work before this hack was added?
> kmalloc() does a might_sleep(), but deep down in
> cond_sleep, there is a handy little check for
> if (system_state == SYSTEM_RUNNING)
> to disable the run-time oops.
> 
> I suggested that since it works during boot, and resume is in many
> ways similar to boot, we should just re-use system_state for early resume.
> But at the time, akpm told me not to use system_state, and so we have the 
> hack above.
> 
> I don't recall his reasoning -- it might be something that should
> be re-visited.  I don't like disabling the might_sleep() check all the time,
> I'd rather just disable it during the critical boot/suspend/resume states.
> 
> > Moreover, acpi_enable() called after restoring the system memory state
> > from a hibernation image should really be executed before enabling the
> > nonboot CPUs, since functional ACPI may be needed for that.  All of this
> > means that we need to handle ACPI in a more fine grained manner during
> > suspend and hibernation.
> 
> I don't follow the requirement to boot an ACPI-enabled resume image
> from a non-ACPI-enabled boot kernel.  Certainly this isn't a scenario
> described by the ACPI spec, which transitions between G1(S4) and G0(S0)
> without going through an ACPI-disabled state.

That actually depends on which version of the ACPI specification you consider.

In ACPI 3.0 (and later) there's section 15 "Waking and Sleeping" that
describes, among other things, the supposed system start sequence (in 15.3.3).
It clearly states that we're supposed to check if ACPI is enabled (and enable
it if not) only _after_ the hibernation image has been loaded.  After that, in
turn, we should execute _BFS and subsequently _WAK, so my interpretation is
that we should not execute any ACPI methods before that point.

Anyway, if the user passes acpi=off to the boot kernel, ACPI may not
be enabled until the image kernel gets control.  Thus, the image kernel
should always check if ACPI is enabled (and enable it, if need be) before
doing anything ACPI-related, and that should happen before the nonboot
CPUs are enabled, preferably with interrupts off, as it should be done
before we attempt to execute _BFS.

Greetings,
Rafael

Re: [PATCH 2.6.23 RESEND] cxgb3 - Fix dev->priv usage

2007-08-28 Thread Divy Le Ray


Roland Dreier wrote:

 > I take that back.  Rejected -- it breaks infiniband build.

To be more precise:

drivers/infiniband/hw/cxgb3/cxio_hal.c: In function 'cxio_rdev_open':
drivers/infiniband/hw/cxgb3/cxio_hal.c:919: error: implicit declaration of function 'T3CDEV'

it seems the problem is that T3CDEV() has been deleted and been
replaced with the dev2t3cdev() inline function.  However a simple
replacement s/T3CDEV/dev2t3cdev/ in drivers/infiniband/hw/cxgb3
doesn't work because the function has moved from t3cdev.h to
adapter.h; and moving the function back to t3cdev.h doesn't work
because it depends on more structure definitions now.

And at that point I gave up...
  


Sorry about the compilation issue and the delay to reply.
I'll post a follow up for the iw_cxgb3 driver later this evening.
I plan to move the inlined dev2t3cdev() from adapter.h to an exported
dev2t3cdev() in cxgb3_offload.c.

Divy
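
Divy's plan, replacing an inline helper in a private header with an exported out-of-line function, lets another module call it knowing only a prototype, without the structure definitions the helper dereferences. A user-space sketch of that split; every name here is hypothetical, and the actual dev2t3cdev() body is not shown in this thread:

```c
/* --- what the consumer (e.g. the RDMA driver) would see in a header --- */
struct demo_netdev;	/* opaque to the consumer */
struct demo_t3cdev;	/* opaque to the consumer */
struct demo_t3cdev *demo_dev2t3cdev(struct demo_netdev *dev);

/* --- private to the providing driver --- */
struct demo_t3cdev {
	int id;
};

struct demo_adapter {
	struct demo_t3cdev tdev;	/* offload device embedded in the adapter */
};

struct demo_netdev {
	struct demo_adapter *adapter;	/* back-pointer to the owning adapter */
};

/*
 * Out-of-line definition.  In the kernel it would sit in a .c file and be
 * followed by EXPORT_SYMBOL() so other modules can link against it.
 */
struct demo_t3cdev *demo_dev2t3cdev(struct demo_netdev *dev)
{
	return &dev->adapter->tdev;
}
```
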

Re: [kvm-devel] [RFC] 9p: add KVM/QEMU pci transport

2007-08-28 Thread Eric Van Hensbergen

On 8/28/07, Arnd Bergmann <[EMAIL PROTECTED]> wrote:
> On Tuesday 28 August 2007, Eric Van Hensbergen wrote:
>
> > This adds a shared memory transport for a synthetic 9p device for
> > paravirtualized file system support under KVM/QEMU.
>
> Nice driver. I'm hoping we can do a virtio driver using a similar
> concept.
>

Yes.  I'm looking at the patches from Dor now; it should be pretty
straightforward.  The PCI transport is interesting in its own right for
other (non-virtual) projects we've been playing with.

 -eric

Re: [patch 3/4] Linux Kernel Markers - Documentation

2007-08-28 Thread Mathieu Desnoyers

* Christoph Hellwig ([EMAIL PROTECTED]) wrote:
> On Mon, Aug 20, 2007 at 04:27:07PM -0400, Mathieu Desnoyers wrote:
> > Here is some documentation explaining what is/how to use the Linux
> > Kernel Markers.
> 
> While porting my code from an older markers version I noticed the
> marker callbacks have grown a void *private argument.  Add it to
> the documentation as well.
> 
> 
> Signed-off-by: Christoph Hellwig <[EMAIL PROTECTED]>
> 
Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>

Thanks!

> Index: linux-2.6/Documentation/marker.txt
> ===
> --- linux-2.6.orig/Documentation/marker.txt   2007-08-28 22:50:37.0 +0200
> +++ linux-2.6/Documentation/marker.txt        2007-08-28 22:51:07.0 +0200
> @@ -115,7 +115,7 @@ struct probe_data {
>  };
>  
>  void probe_subsystem_event(const struct __mark_marker *mdata,
> - const char *format, ...)
> + void *private, const char *format, ...)
>  {
>   va_list ap;
>   /* Declare args */

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

Re: [PATCH 2.6.23 RESEND] cxgb3 - Fix dev->priv usage

2007-08-28 Thread Roland Dreier

 > I take that back.  Rejected -- it breaks infiniband build.

To be more precise:

drivers/infiniband/hw/cxgb3/cxio_hal.c: In function 'cxio_rdev_open':
drivers/infiniband/hw/cxgb3/cxio_hal.c:919: error: implicit declaration of function 'T3CDEV'

it seems the problem is that T3CDEV() has been deleted and been
replaced with the dev2t3cdev() inline function.  However a simple
replacement s/T3CDEV/dev2t3cdev/ in drivers/infiniband/hw/cxgb3
doesn't work because the function has moved from t3cdev.h to
adapter.h; and moving the function back to t3cdev.h doesn't work
because it depends on more structure definitions now.

And at that point I gave up...

 - R.

Re: [1/1] Block device throttling [Re: Distributed storage.]

2007-08-28 Thread Daniel Phillips

On Tuesday 28 August 2007 10:54, Evgeniy Polyakov wrote:
> On Tue, Aug 28, 2007 at 10:27:59AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote:
> > > We do not care about one cpu being able to increase its counter
> > > higher than the limit, such inaccuracy (maximum bios in flight
> > > thus can be more than limit, difference is equal to the number of
> > > CPUs - 1) is a price for removing atomic operation. I thought I
> > > pointed it in the original description, but might forget, that if
> > > it will be an issue, that atomic operations can be introduced
> > > there. Any uber-precise measurements in the case when we are
> > > close to the edge will not give us any benefit at all, since we
> > > are already in the grey area.
> >
> > This is not just inaccurate, it is suicide.  Keep leaking throttle
> > counts and eventually all of them will be gone.  No more IO
> > on that block device!
>
> First, because number of increased and decreased operations are the
> same, so it will dance around limit in both directions.

No.  Please go and read the description of the race again.  A count
gets irretrievably lost because the write operation of the first
decrement is overwritten by the second. Data gets lost.  Atomic 
operations exist to prevent that sort of thing.  You either need to use 
them or have a deep understanding of SMP read and write ordering in 
order to preserve data integrity by some equivalent algorithm.
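
The interleaving described above can be replayed deterministically. This sketch (lost_update_replay() is a hypothetical name) spells out the plain load and store that two CPUs perform when the counter is not updated atomically:

```c
/*
 * Deterministic replay of two CPUs decrementing a shared counter with a
 * plain load/store instead of an atomic read-modify-write.
 */
static int lost_update_replay(int counter)
{
	int cpu0 = counter;	/* CPU0 loads the counter */
	int cpu1 = counter;	/* CPU1 loads the same value */

	counter = cpu0 - 1;	/* CPU0 stores its decrement */
	counter = cpu1 - 1;	/* CPU1 overwrites it: one decrement is lost */
	return counter;
}
```

Starting from 5, two decrements were issued but the result is 4 rather than 3. An atomic read-modify-write (atomic_dec() in the kernel, atomic_fetch_sub() in C11) performs the load and the store as one indivisible step, so the second CPU would observe 4 and store 3.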

> Let's solve problems in order of their appearance. If bio structure
> will be allowed to grow, then the whole patches can be done better.

How about something like the patch below?  This throttles any block driver by
implementing a throttle metric method so that each block driver can
keep track of its own resource consumption in units of its choosing.
As an (important) example, it implements a simple metric for device
mapper devices.  Other block devices will work as before, because
they do not define any metric.  Short, sweet and untested, which is
why I have not posted it until now.

This patch originally kept its accounting info in backing_dev_info,
however that structure seems to be in some and it is just a part of
struct queue anyway, so I lifted the throttle accounting up into
struct queue.  We should be able to report on the efficacy of this
patch in terms of deadlock prevention pretty soon.

--- 2.6.22.clean/block/ll_rw_blk.c  2007-07-08 16:32:17.0 -0700
+++ 2.6.22/block/ll_rw_blk.c    2007-08-24 12:07:16.0 -0700
@@ -3237,6 +3237,15 @@ end_io:
  */
 void generic_make_request(struct bio *bio)
 {
+   struct request_queue *q = bdev_get_queue(bio->bi_bdev);
+
+   if (q && q->metric) {
+   int need = bio->bi_reserved = q->metric(bio);
+   bio->queue = q;
+   wait_event_interruptible(q->throttle_wait, atomic_read(&q->available) >= need);
+   atomic_sub(&q->available, need);
+   }
+
if (current->bio_tail) {
/* make_request is active */
*(current->bio_tail) = bio;
--- 2.6.22.clean/drivers/md/dm.c    2007-07-08 16:32:17.0 -0700
+++ 2.6.22/drivers/md/dm.c  2007-08-24 12:14:23.0 -0700
@@ -880,6 +880,11 @@ static int dm_any_congested(void *conges
return r;
 }
 
+static unsigned dm_metric(struct bio *bio)
+{
+   return bio->bi_vcnt;
+}
+
 /*-
  * An IDR is used to keep track of allocated minor numbers.
  *---*/
@@ -997,6 +1002,10 @@ static struct mapped_device *alloc_dev(i
goto bad1_free_minor;
 
md->queue->queuedata = md;
+   md->queue->metric = dm_metric;
+   atomic_set(&md->queue->available, md->queue->capacity = 1000);
+   init_waitqueue_head(&md->queue->throttle_wait);
+
md->queue->backing_dev_info.congested_fn = dm_any_congested;
md->queue->backing_dev_info.congested_data = md;
blk_queue_make_request(md->queue, dm_request);
--- 2.6.22.clean/fs/bio.c   2007-07-08 16:32:17.0 -0700
+++ 2.6.22/fs/bio.c 2007-08-24 12:10:41.0 -0700
@@ -1025,7 +1025,12 @@ void bio_endio(struct bio *bio, unsigned
bytes_done = bio->bi_size;
}
 
-   bio->bi_size -= bytes_done;
+   if (!(bio->bi_size -= bytes_done) && bio->bi_reserved) {
+   struct request_queue *q = bio->queue;
+   atomic_add(&q->available, bio->bi_reserved);
+   bio->bi_reserved = 0; /* just in case */
+   wake_up(&q->throttle_wait);
+   }
bio->bi_sector += (bytes_done >> 9);
 
if (bio->bi_end_io)
--- 2.6.22.clean/include/linux/bio.h    2007-07-08 16:32:17.0 -0700
+++ 2.6.22/include/linux/bio.h  2007-08-24 11:53:51.0 -0700
@@ -109,6 +109,9 @@ struct bio {
bio_end_io_t*bi_end_io;
atomic_tbi_cnt; /* pin count */
 
+   struct request_queue

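For the throttle patch itself, here is a user-space sketch of the reserve/release accounting with C11 atomics. Two points are worth stating. First, the patch's atomic_read() check and its later subtraction are separate steps, so two submitters woken together could both pass the check; the compare-exchange loop below folds the check and the subtraction into one step. Second, the kernel's atomic_add() and atomic_sub() take the amount first and the atomic_t pointer second (for example atomic_sub(need, &q->available)). struct demo_throttle and its helpers are hypothetical, and blocking on a wait queue is left out:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-queue throttle, modeled on the q->available counter. */
struct demo_throttle {
	atomic_int available;	/* units not yet reserved by in-flight bios */
};

/*
 * Reserve 'need' units only if that many are free.  The compare-exchange
 * makes "check" and "subtract" a single atomic step, so two submitters
 * cannot both succeed on the same units.
 */
static bool demo_throttle_try_reserve(struct demo_throttle *t, int need)
{
	int cur = atomic_load(&t->available);

	while (cur >= need) {
		if (atomic_compare_exchange_weak(&t->available, &cur, cur - need))
			return true;
		/* on failure, cur now holds the latest value; re-check it */
	}
	return false;
}

/* Give units back on completion (the bio_endio() side of the patch). */
static void demo_throttle_release(struct demo_throttle *t, int units)
{
	atomic_fetch_add(&t->available, units);
}
```

A caller that fails to reserve would sleep and retry after a release, which is the role the patch gives to q->throttle_wait.
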
Re: [PATCH 1/2] sysctl: Properly register the irda binary sysctl numbers.

2007-08-28 Thread Eric W. Biederman

[EMAIL PROTECTED] writes:

> On Sat, 25 Aug 2007 11:59:53 MDT, Eric W. Biederman said:
>
>> It looks like you don't have CONFIG_SYSCTL_SYSCALL defined, and it
>> appears utsname_syscall and ipcdata_syscall both become NULL pointers
>> if they aren't needed.  So the complaint is a false positive.
>
> Yep. Nothing I actually use needs SYSCTL_SYSCALL, so I turned it off to
> see what breaks...

Other than glibc (which uses it to see if we are on an SMP system, and
has a fallback to /proc/sys), I only found 5 other application
binaries when I was looking hard.

Eric


Re: [patch 3/4] Linux Kernel Markers - Documentation

2007-08-28 Thread Christoph Hellwig

On Mon, Aug 20, 2007 at 04:27:07PM -0400, Mathieu Desnoyers wrote:
> Here is some documentation explaining what is/how to use the Linux
> Kernel Markers.

While porting my code from an older markers version I noticed the
marker callbacks have grown a void *private argument.  Add it to
the documentation as well.


Signed-off-by: Christoph Hellwig <[EMAIL PROTECTED]>

Index: linux-2.6/Documentation/marker.txt
===
--- linux-2.6.orig/Documentation/marker.txt 2007-08-28 22:50:37.0 +0200
+++ linux-2.6/Documentation/marker.txt  2007-08-28 22:51:07.0 +0200
@@ -115,7 +115,7 @@ struct probe_data {
 };
 
 void probe_subsystem_event(const struct __mark_marker *mdata,
-   const char *format, ...)
+   void *private, const char *format, ...)
 {
va_list ap;
/* Declare args */
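
A user-space sketch of a probe callback with the extended signature, showing what the new void *private argument is for: per-probe state reaches the callback without a global. struct demo_marker and struct demo_probe_data are hypothetical stand-ins (the real struct __mark_marker layout is not shown here), and the probe assumes its marker's format string is "%d %s":

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for struct __mark_marker; only the name is used. */
struct demo_marker {
	const char *name;
};

/* Hypothetical per-probe data, handed to the callback through 'private'. */
struct demo_probe_data {
	int calls;
	char last[64];
};

/* Probe with the extended signature: marker data, private pointer, format. */
static void demo_probe(const struct demo_marker *mdata, void *private,
		       const char *format, ...)
{
	struct demo_probe_data *pdata = private;
	va_list ap;
	int value;
	char *str;

	/* Declare args: this marker's format is assumed to be "%d %s". */
	(void)format;
	va_start(ap, format);
	value = va_arg(ap, int);
	str = va_arg(ap, char *);
	va_end(ap);

	pdata->calls++;
	snprintf(pdata->last, sizeof(pdata->last), "%s: %d %s",
		 mdata->name, value, str);
}
```
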

Re: [PATCH 00/23] drm: introduce drm_zalloc

2007-08-28 Thread Dave Airlie


On Tue, 28 Aug 2007, Christoph Hellwig wrote:

> On Mon, Aug 27, 2007 at 10:57:50PM +0200, [EMAIL PROTECTED] wrote:
>> Hello,
>>
>> As there are many places in drm code where drm_alloc + memset is used
>> this patch series introduces drm_zalloc and also makes use of drm_calloc where
>> needed. Most of these patches save some bytes so the benefit is a few kB saved
>> (gcc 4.1.2) with patch applied. Also some small (style, etc.) things are fixed.
>> This patch series does the conversion drm tree-wide. All patches were compile
>> tested.
>
> Please just convert it to plain kzalloc/kcalloc and kill these utterly useless
> wrappers instead.

The wrappers aren't useless: the drm alloc/free functions pass in a memory
space for debugging purposes, so we can track memory abuse when developing.

But drm_zalloc should just alias to drm_calloc really..

Dave.
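
Dave's suggestion can be sketched with user-space stand-ins: keep the area argument that the debug accounting relies on, and define the zeroing variant as a one-element calloc. The demo_* names and the per-area counter are hypothetical; the real drm_calloc() signature is not shown in this thread:

```c
#include <stdlib.h>

#define DEMO_MEM_AREAS 4

/* Hypothetical per-area byte counters, standing in for DRM's debug tracking. */
static size_t demo_area_bytes[DEMO_MEM_AREAS];

/* Zeroed array allocation charged to a memory area. */
static void *demo_calloc(size_t nmemb, size_t size, int area)
{
	void *p = calloc(nmemb, size);

	if (p)
		demo_area_bytes[area] += nmemb * size;
	return p;
}

/* The zeroing variant is just a one-element calloc in the same area. */
#define demo_zalloc(size, area) demo_calloc(1, (size), (area))
```

A matching free helper would take the size and area as well, so the counters can be decremented; that side is omitted here.
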

Re: [RFC] : mm : / Patch / code : Suggestion :snip kswapd & get_page_from_freelist() : No more no page failures. (WHY????)

2007-08-28 Thread Mitchell Erblich

Nick Piggin wrote:
>
> [EMAIL PROTECTED] wrote:
> > [EMAIL PROTECTED]
> > Sent: Friday, August 24, 2007 3:11 PM
> > Subject: Re: [RFC] : mm : / Patch / code : Suggestion :snip kswapd &
> > get_page_from_freelist() : No more no page failures.
> >
> > Mailer added a HTML subpart and chopped the earlier email :^(
>
> Hi Mitchell,
>
> Is it possible to send suggestions in the form of a unified diff, even
> if you haven't even compiled it (just add a note to let people know).
>
> Secondly, we already have a (supposedly working) system of asynch
> reclaim, with buffering and hysteresis. I don't exactly understand
> what problem you think it has that would be solved by rechecking
> watermarks after allocating a page.
>
> When we're in the (min,low) watermark range, we'll wake up kswapd
> _before_ allocating anything, so what is better about the change to
> wake up kswapd after allocating? Can you perhaps come up with an
> example situation also to make this more clear?
>
> Overhead of wakeup_kswapd isn't too much of a problem: if we _should_
> be waking it up when we currently aren't, then we should be calling
> it. However the extra checking in the allocator fastpath is something
> we want to avoid if possible, because this can be a really hot path.
>
> Thanks,
> Nick
>
> --
> SUSE Labs, Novell Inc.
> -

Nick Piggin, et al,

First, diffs would generate a lot of noise, since I rip out and insert
a lot of code based on whether I think the code is REALLY
needed for MY TEST environment. These suggestions are
basically minimal merge suggestions between my
development environment and the public Linux tree.

Now the why for this SUGGESTION/PATCH...

> When we're in the (min,low) watermark range, we'll wake up kswapd
> _before_ allocating anything, so what is better about the change to
> wake up kswapd after allocating? Can you perhaps come up with an
> example situation also to make this more clear?

Answer
Will GFP_ATOMIC alloc be failing at that point? If yes, then why
not allow kswapd to attempt to prevent this condition from occurring?
The existing code reads that the first call to get_page_from_freelist()
has returned no page. Now you are going to start up something that
is at best going to take millisecs to start helping out. Won't it first
grab some pages to do its work? So we are going to be lower
in free memory right when it starts up. Right?

So, before the change, with  high memory consumption/pressure,
various GFP_xxx allocations would fail or take an excessive
amount of time due to the simple fact of low memory and/or
Slub/slab consumption and/or first failure of
get_page_from_freelist() when in a  low free memory condition.

Once the above condition occurs the perception is that the
current mainline Linux code then on demand increases its
effort to find some memory. However, while this is happening
the system is in a low memory bind and various performance
parameters are being affected and some allocations are
sleeping or being delayed or outright failing.

What I could see is that CURR suggestions allow a new class
of GFP_xxx allocations to succeed while in low memory,
try again philosophy, waking up kswapd, etc., are all AFTER the
fact while something is WAITING for the memory. This
wait is in effect a SYNCHRONOUS wait for memory.

Assuming that kswapd is really what is mostly needed, execute it
BEFORE (JUST IN TIME) to PREVENT low memory, since I/O needs
pages, GFP_ATOMIC allocs fail and other GFP allocs sleep.

The SUGGESTION is to take the fraction of a microsec longer in the
fast path to see if it needs to be started up, and to ATTEMPT to
prevent the SLOW-PATH and low/min memory from occurring.

The 2x low memory factor is to allow some scalability and to give
kswapd ENOUGH time to do what it needs to do, since I expect a minimum
number of millisecs before it can move us away from low free memory.
As the amount of memory in a system increases, this could probably
be decreased somewhat, to maybe 1.25x.

IF the above is good then the issue is how to optimize the heck
out of the check.
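
The two wake-up policies under discussion can be put side by side. In this sketch the zone fields and function names are hypothetical; the first check follows Nick's description (kswapd is woken once free pages fall below the low watermark, before allocating) and the second follows Mitchell's proposal of checking against twice the low watermark after an allocation:

```c
#include <stdbool.h>

/* Hypothetical per-zone state, in pages. */
struct demo_zone {
	unsigned long free_pages;
	unsigned long pages_low;
};

/* Existing policy as Nick describes it: wake kswapd once below low. */
static bool demo_wake_before_alloc(const struct demo_zone *z)
{
	return z->free_pages < z->pages_low;
}

/* Mitchell's proposal: after allocating, wake early below 2x low. */
static bool demo_wake_after_alloc(const struct demo_zone *z)
{
	return z->free_pages < 2 * z->pages_low;
}
```

With a low watermark of 100 pages, the proposed check would already wake kswapd at 150 free pages, while the existing path waits until the zone drops below 100. The cost Nick points out is that the extra comparison runs on every allocation in the fast path.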

Mitchell Erblich


Re: CFS review

2007-08-28 Thread Valdis . Kletnieks

On Mon, 27 Aug 2007 22:05:37 PDT, Linus Torvalds said:
> 
> 
> On Tue, 28 Aug 2007, Al Boldi wrote:
> > 
> > No need for framebuffer.  All you need is X using the X.org vesa-driver.  
> > Then start gears like this:
> > 
> >   # gears & gears & gears &
> > 
> > Then lay them out side by side to see the periodic stallings for ~10sec.
> 
> I don't think this is a good test.
> 
> Why?
> 
> If you're not using direct rendering, what you have is the X server doing 
> all the rendering, which in turn means that what you are testing is quite 
> possibly not so much about the *kernel* scheduling, but about *X-server* 
> scheduling!

I wonder - can people who are doing this as a test please specify whether
they're using an older X that has the libX11 or the newer libxcb code? That
may have a similar impact as well.

(libxcb is pretty new - it landed in Fedora Rawhide just about a month ago,
after Fedora 7 shipped.  Not sure what other distros have it now...)


Re: [PATCH] Immediate Values - Powerpc Optimization Fix

2007-08-28 Thread Christoph Hellwig

On Tue, Aug 28, 2007 at 04:40:06PM -0400, Mathieu Desnoyers wrote:
> Immediate Values Powerpc Optimization Fix
> 
> Fix a bad call to flush_icache_range(). The second parameter is the end 
> address
> of the range, not the length.
> 
> Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>
> CC: Christoph Hellwig <[EMAIL PROTECTED]>

I've just verified that this works for me, thanks.
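
A user-space sketch of the corrected call pattern: after patching len bytes at addr, the range handed to the flush is [addr, addr + len). demo_flush_icache_range() only records its arguments and is a hypothetical stand-in for the kernel function:

```c
#include <stddef.h>

/* Hypothetical recorder standing in for flush_icache_range(start, end). */
static unsigned long demo_flushed_start, demo_flushed_end;

static void demo_flush_icache_range(unsigned long start, unsigned long end)
{
	demo_flushed_start = start;
	demo_flushed_end = end;
}

/* After patching 'len' bytes at 'addr', flush [addr, addr + len). */
static void demo_patch_done(void *addr, size_t len)
{
	unsigned long start = (unsigned long)addr;

	/* The second argument is an end address, not a length. */
	demo_flush_icache_range(start, start + len);
}
```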


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-28 Thread David Miller

From: Roland Dreier <[EMAIL PROTECTED]>
Date: Tue, 28 Aug 2007 12:38:07 -0700

> It seems that the NIC would also have to look into a TCP stream (and
> handle out of order segments etc) to find message boundaries for this
> to be equivalent to what an RDMA NIC does.

It would work for data that accumulates in-order, give or take a small
window, just like LRO does.
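As a rough illustration of the in-order accumulation described above, here is a minimal userspace sketch. It assumes a simplified, hypothetical segment record (`struct seg_sketch`) carrying only a sequence number and a payload length. Consecutive segments are merged while each one starts exactly where the previous one ended; any gap or reordering starts a new aggregate. Real LRO also checks flow identity, TCP flags and headers, and the "small window" of tolerance mentioned above, all of which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a TCP segment: only seq and payload length. */
struct seg_sketch {
	uint32_t seq;
	uint32_t len;
};

/*
 * Count how many aggregates an LRO-like coalescer would hand up the stack
 * if it merges only strictly in-order segments. Sequence arithmetic uses
 * uint32_t, so wraparound is handled by unsigned overflow.
 */
static size_t count_aggregates(const struct seg_sketch *segs, size_t n)
{
	size_t aggregates = 0;
	uint32_t next_seq = 0;

	for (size_t i = 0; i < n; i++) {
		if (aggregates == 0 || segs[i].seq != next_seq)
			aggregates++;	/* gap or reorder: start a new aggregate */
		next_seq = segs[i].seq + segs[i].len;
	}
	return aggregates;
}
```

Three back-to-back segments collapse into one aggregate, while a single hole in the sequence space splits the stream into two.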

Re: [kvm-devel] [RFC] 9p: add KVM/QEMU pci transport

2007-08-28 Thread Latchesar Ionkov

On 8/28/07, Arnd Bergmann <[EMAIL PROTECTED]> wrote:
> On Tuesday 28 August 2007, Eric Van Hensbergen wrote:
>
> > This adds a shared memory transport for a synthetic 9p device for
> > paravirtualized file system support under KVM/QEMU.
>
> Nice driver. I'm hoping we can do a virtio driver using a similar
> concept.
>
> > +#define PCI_VENDOR_ID_9P 0x5002
> > +#define PCI_DEVICE_ID_9P 0x000D
>
> Where do these numbers come from? Can we be sure they don't conflict with
> actual hardware?

I stole the VENDOR_ID from kvm's hypercall driver. There is no
guarantee that it doesn't conflict with actual hardware. As was
discussed before, there is still no ID assigned for virtual
devices.

> > +struct p9pci_trans {
> > +	struct pci_dev		*pdev;
> > +	void __iomem		*ioaddr;
> > +	void __iomem		*tx;
> > +	void __iomem		*rx;
> > +	int			irq;
> > +	int			pos;
> > +	int			len;
> > +	wait_queue_head_t	wait;
> > +};
>
> I would expect the data structure to contain an embedded struct p9_trans,
> which is how most drivers work nowadays.
>
> > +static struct p9pci_trans *p9pci_trans; /* single channel for now */
>
> As a result, it should be easier to get rid of this global. My feeling is
> that it really should not be here.
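A minimal userspace sketch of the embedding pattern suggested above. It assumes hypothetical stand-in types (`struct trans_sketch` for the generic `struct p9_trans`, `struct pci_trans_sketch` for the driver state) and a local `container_of_sketch` macro modeled on the kernel's `container_of()`. A callback receives only a pointer to the generic struct and recovers the driver-private state from it, so no file-scope global is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the kernel's container_of() (no type checking here). */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the generic transport structure. */
struct trans_sketch {
	int status;
};

/* Hypothetical driver-private state with the generic struct embedded. */
struct pci_trans_sketch {
	int irq;
	int len;
	struct trans_sketch trans;	/* embedded, not pointed to */
};

/* A callback gets only the generic pointer and recovers its private state. */
static int sketch_pending_len(struct trans_sketch *t)
{
	struct pci_trans_sketch *ts =
		container_of_sketch(t, struct pci_trans_sketch, trans);

	return ts->len;
}
```

Because the generic struct lives inside the private one, each probed device can carry its own state, which is what makes the single global unnecessary.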
>
> > +static irqreturn_t p9pci_interrupt(int irq, void *dev)
> > +{
> > + p9pci_trans = dev;
>
> This can simply use a local variable.
>
> > + p9pci_trans->len = le32_to_cpu(readl(p9pci_trans->rx));
>
> readl implies le32_to_cpu. Doing it twice on a PCI device is broken
> on big-endian hardware.
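To make the double-conversion problem concrete, here is a userspace model. It assumes a hypothetical `model_readl()` standing in for `readl()`: it interprets four register bytes as little-endian and returns a CPU-order value, which `readl()` already does. The hypothetical `model_le32_to_cpu_on_be()` models what `le32_to_cpu()` does on a big-endian CPU, namely an unconditional byte swap. On little-endian machines `le32_to_cpu()` is a no-op, which is why the extra call goes unnoticed there.

```c
#include <assert.h>
#include <stdint.h>

/* Model of readl(): device registers are little-endian; result is CPU order. */
static uint32_t model_readl(const uint8_t reg[4])
{
	return (uint32_t)reg[0] |
	       ((uint32_t)reg[1] << 8) |
	       ((uint32_t)reg[2] << 16) |
	       ((uint32_t)reg[3] << 24);
}

/* Model of le32_to_cpu() on a big-endian CPU: an unconditional byte swap. */
static uint32_t model_le32_to_cpu_on_be(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}
```

A register holding 0x12345678 reads back correctly through the `readl()` model alone; running the result through the big-endian `le32_to_cpu()` model as well yields 0x78563412.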
>
> > + P9_DPRINTK(P9_DEBUG_TRANS, "%p len %d\n", p9pci_trans->pdev,
> > + p9pci_trans->len);
> > + iowrite32(0, p9pci_trans->ioaddr + 4);
>
> Also, you should not mix iowriteXX/ioreadXX and writeX/readX calls in one
> driver. Since you use pci_iomap, iowriteXX/ioreadXX are the correct functions.
>
> > + wake_up_interruptible(&p9pci_trans->wait);
> > + return IRQ_HANDLED;
> > +}
> > +
> > +static int p9pci_read(struct p9_trans *trans, void *v, int len)
> > +{
> > + struct p9pci_trans *ts;
> > +
> > + if (!trans || trans->status == Disconnected || !trans->priv)
> > + return -EREMOTEIO;
> > +
> > + ts = trans->priv;
> > +
> > + P9_DPRINTK(P9_DEBUG_TRANS, "trans %p rx %p tx %p buf %p len %d\n",
> > + trans, ts->rx, ts->tx, v, len);
> > + if (len > ts->len)
> > + len = ts->len;
> > +
> > + if (len) {
> > + memcpy_fromio(v, ts->rx, len);
> > + ts->len = 0;
> > + /* let the host knows the message is consumed */
> > + writel(0, ts->rx);
> > + iowrite32(0, p9pci_trans->ioaddr + 4);
> > + P9_DPRINTK(P9_DEBUG_TRANS, "zero rxlen %d txlen %d\n",
> > + readl(ts->rx), readl(ts->tx));
> > + }
> > +
> > + return len;
> > +}
>
> I would expect memcpy_fromio and memcpy_toio to be relatively inefficient
> compared to virtual DMA, depending on the hypervisor. Do you have plans
> to change that, or did you have specific reasons to do the memcpy here?

No specific reasons. We wanted to start with a simple, easy transport
and make things work before optimizing it. There are many areas where
the transport can be improved; using virtual DMA sounds like a good
suggestion.

>
> > + P9_DPRINTK(P9_DEBUG_TRANS, "trans %p rx %p tx %p buf %p len %d\n",
> > + trans, ts->rx, ts->tx, v, len);
> > + P9_DPRINTK(P9_DEBUG_TRANS, "rxlen %d\n", readl(ts->rx));
> > + if (readb(ts->tx) != 0)
> > + return 0;
> > +
> > + P9_DPRINTK(P9_DEBUG_TRANS, "tx addr %p io addr %p\n", ts->tx,
> > + ts->ioaddr);
>
> All these P9_DPRINTK statements somewhat limit readability. I would suggest
> you kill them as soon as the driver is considered stable.
>
> > +static int __devinit p9pci_probe(struct pci_dev *pdev,
> > + const struct pci_device_id *ent)
> > +{
> > + int err;
> > + u8 pci_rev;
> > +
> > + if (p9pci_trans)
> > + return -1;
>
> probe should return -EBUSY or similar, not -1.
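A small sketch of the convention referred to above, assuming a hypothetical single-instance probe (`sketch_probe()`, with a `bound` flag standing in for the `p9pci_trans` global). Failures are reported as negative errno values such as `-EBUSY` rather than a bare `-1`, which on Linux equals `-EPERM` and would therefore be reported as a permission error.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for "a channel is already bound". */
static int bound;

/* Return 0 on success or a negative errno, as kernel probe routines do. */
static int sketch_probe(void)
{
	if (bound)
		return -EBUSY;	/* single channel already in use */
	bound = 1;
	return 0;
}
```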
>
> > + pci_read_config_byte(pdev, PCI_REVISION_ID, &pci_rev);
> > +
> > + if (pdev->vendor == PCI_VENDOR_ID_9P &&
> > + pdev->device == PCI_DEVICE_ID_9P)
> > + printk(KERN_INFO "pci dev %s (id %04x:%04x rev %02x) is a 9P\n",
> > +pci_name(pdev), pdev->vendor, pdev->device, pci_rev);
>
> You wouldn't be here for a different vendor/device code, so the check is
> bogus.
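The check is redundant because the PCI core only calls a driver's probe routine for devices that match an entry in its `pci_device_id` table. The userspace model below of that matching step uses hypothetical names (`struct id_sketch`, `sketch_match_id()`) and terminates the table with a zeroed entry, as kernel ID tables are terminated. It ignores subvendor, subdevice and class matching.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced form of a PCI ID table entry. */
struct id_sketch {
	uint16_t vendor;
	uint16_t device;
};

/* IDs taken from the patch under review; the zero entry ends the table. */
static const struct id_sketch sketch_ids[] = {
	{ 0x5002, 0x000D },
	{ 0, 0 }
};

/* Return the matching entry, or NULL if the device is not in the table. */
static const struct id_sketch *sketch_match_id(const struct id_sketch *ids,
					       uint16_t vendor, uint16_t device)
{
	for (; ids->vendor || ids->device; ids++)
		if (ids->vendor == vendor && ids->device == device)
			return ids;
	return NULL;
}
```

A device that fails this match never reaches probe, so repeating the vendor/device comparison inside probe can never be false.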
>
> > + P9_DPRINTK(P9_DEBUG_TRANS, "%p\n", pdev);
> > + p9pci_trans = kzalloc(sizeof(*p9pci_trans), GFP_KERNEL);
> > + p9pci_trans->irq = -1;
>
> Use

Re: [PATCH] Add documentation to some preprocessor directives in init/*.c.

2007-08-28 Thread Alexey Dobriyan

On Tue, Aug 28, 2007 at 01:56:24AM -0400, Robert P. J. Day wrote:
> Add some documentation to potentially confusing preprocessor
> directives in some source files in the init/ directory to show their
> proper association and nesting.

> --- a/init/calibrate.c
> +++ b/init/calibrate.c
> @@ -101,7 +101,7 @@ static unsigned long __devinit calibrate_delay_direct(void)
>  "estimate for loops_per_jiffy.\nProbably due to long platform interrupts. Consider using \"lpj=\" boot option.\n");
>   return 0;
>  }
> -#else
> +#else /* !ARCH_HAS_READ_CURRENT_TIMER */
>  static unsigned long __devinit calibrate_delay_direct(void) {return 0;}
>  #endif

I'm sorry, but this is not useful. You're adding comments, and the
compiler ignores comments. So one can't rely on such comments being
accurate and will go to the start of the section to recheck anyway.

[PATCH] Immediate Values - Powerpc Optimization Fix

2007-08-28 Thread Mathieu Desnoyers

Immediate Values Powerpc Optimization Fix

Fix a bad call to flush_icache_range(). The second parameter is the end address
of the range, not the length.

Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>
CC: Christoph Hellwig <[EMAIL PROTECTED]>
---
 arch/powerpc/kernel/immediate.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6-lttng/arch/powerpc/kernel/immediate.c
===
--- linux-2.6-lttng.orig/arch/powerpc/kernel/immediate.c	2007-08-28 16:36:10.0 -0400
+++ linux-2.6-lttng/arch/powerpc/kernel/immediate.c	2007-08-28 16:36:40.0 -0400
@@ -67,7 +67,7 @@ int arch_immediate_update(const struct _
memcpy((void*)immediate->immediate, (void*)immediate->var,
immediate->size);
flush_icache_range((unsigned long)immediate->immediate,
-   immediate->size);
+   (unsigned long)immediate->immediate + immediate->size);
return 0;
 }
 
@@ -99,5 +99,5 @@ void __init arch_immediate_update_early(
memcpy((void*)immediate->immediate, (void*)immediate->var,
immediate->size);
flush_icache_range((unsigned long)immediate->immediate,
-   immediate->size);
+   (unsigned long)immediate->immediate + immediate->size);
 }
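The bug is an argument mismatch: `flush_icache_range(start, end)` expects the address just past the range (start + size), while the original code passed the size itself. The hedged userspace illustration below assumes a hypothetical recorder, `fake_flush_icache_range()`, in place of the real kernel routine. It shows that passing the size yields a nonsensical range whose end lies below its start, while the patched form covers exactly the rewritten bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical recorder standing in for the kernel's flush_icache_range(). */
static unsigned long last_start, last_end;

static void fake_flush_icache_range(unsigned long start, unsigned long end)
{
	last_start = start;
	last_end = end;
}

/* Patched call pattern: pass start and start + size. */
static void sketch_update(const void *insn, size_t size)
{
	unsigned long start = (unsigned long)insn;

	fake_flush_icache_range(start, start + size);
}

/* Original (buggy) call pattern: the size is passed where an end is expected. */
static void sketch_update_buggy(const void *insn, size_t size)
{
	fake_flush_icache_range((unsigned long)insn, size);
}

/* Length of the range the flush actually saw (0 if end <= start). */
static unsigned long flushed_len(void)
{
	return last_end > last_start ? last_end - last_start : 0;
}
```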

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

Re: nmi_watchdog=2 regression in 2.6.21

2007-08-28 Thread Daniel Walker


Here's a simpler patch that fixes the boot hang.

We have to call off the IPI looping regardless of the check_nmi_watchdog()
outcome.
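The ordering issue can be modeled in userspace as follows. This sketch assumes a hypothetical `model_endflag` standing in for `endflag` and a count of responsive CPUs standing in for the NMI-active counter. In the original ordering, the early `return -1` skipped the assignment, so the waiting IPI loop never saw the flag set; setting it first covers both the success and the failure path.

```c
#include <assert.h>

/* Hypothetical stand-in for the flag that stops the IPI loop. */
static volatile int model_endflag;

/* Original ordering: the failure path returns before the flag is set. */
static int check_old(int active_cpus)
{
	if (active_cpus == 0)
		return -1;
	model_endflag = 1;
	return 0;
}

/* Patched ordering: set the flag first, then decide success or failure. */
static int check_new(int active_cpus)
{
	model_endflag = 1;
	if (active_cpus == 0)
		return -1;
	return 0;
}
```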

Signed-off-by: Daniel Walker <[EMAIL PROTECTED]>

Index: linux-2.6.22/arch/i386/kernel/nmi.c
===
--- linux-2.6.22.orig/arch/i386/kernel/nmi.c	2007-08-15 00:51:12.0 +
+++ linux-2.6.22/arch/i386/kernel/nmi.c	2007-08-28 20:27:56.0 +
@@ -122,12 +122,12 @@ static int __init check_nmi_watchdog(voi
atomic_dec(&nmi_active);
}
}
+   endflag = 1;
if (!atomic_read(&nmi_active)) {
kfree(prev_nmi_count);
atomic_set(&nmi_active, -1);
return -1;
}
-   endflag = 1;
printk("OK.\n");
 
/* now that we know it works we can reduce NMI frequency to




