Re: [PATCH 4/5] Fix the configuration dependencies

2007-11-25 Thread Andrew Morton
On Fri, 16 Nov 2007 11:33:20 +0900 "Ken'ichi Ohmichi" <[EMAIL PROTECTED]> wrote:

> 
> This patch fixes the configuration dependencies in the vmcoreinfo data.
> 
> i386's "node_data" is defined in arch/x86/mm/discontig_32.c,
> and x86_64's one is defined in arch/x86/mm/numa_64.c.
> They depend on CONFIG_NUMA:
>   arch/x86/mm/Makefile_32:7
> obj-$(CONFIG_NUMA) += discontig_32.o
>   arch/x86/mm/Makefile_64:7
> obj-$(CONFIG_NUMA) += numa_64.o
> 
> ia64's "pgdat_list" is defined in arch/ia64/mm/discontig.c,
> and it depends on CONFIG_DISCONTIGMEM or CONFIG_SPARSEMEM:
>   arch/ia64/mm/Makefile:9-10
> obj-$(CONFIG_DISCONTIGMEM) += discontig.o
> obj-$(CONFIG_SPARSEMEM)+= discontig.o
> 
> ia64's "node_memblk" is defined in arch/ia64/mm/numa.c,
> and it depends on CONFIG_NUMA:
>   arch/ia64/mm/Makefile:8
> obj-$(CONFIG_NUMA) += numa.o
> 
> Signed-off-by: Ken'ichi Ohmichi <[EMAIL PROTECTED]>
> ---
> diff -rpuN a/arch/ia64/kernel/machine_kexec.c b/arch/ia64/kernel/machine_kexec.c
> --- a/arch/ia64/kernel/machine_kexec.c	2007-11-14 15:39:06.0 +0900
> +++ b/arch/ia64/kernel/machine_kexec.c	2007-11-14 15:41:41.0 +0900
> @@ -129,10 +129,11 @@ void machine_kexec(struct kimage *image)
>  
>  void arch_crash_save_vmcoreinfo(void)
>  {
> -#if defined(CONFIG_ARCH_DISCONTIGMEM_ENABLE) && defined(CONFIG_NUMA)
> +#if defined(CONFIG_DISCONTIGMEM) || defined(CONFIG_SPARSEMEM)
>   VMCOREINFO_SYMBOL(pgdat_list);
>   VMCOREINFO_LENGTH(pgdat_list, MAX_NUMNODES);
> -
> +#endif
> +#ifdef CONFIG_NUMA
>   VMCOREINFO_SYMBOL(node_memblk);
>   VMCOREINFO_LENGTH(node_memblk, NR_NODE_MEMBLKS);
>   VMCOREINFO_STRUCT_SIZE(node_memblk_s);
> diff -rpuN a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c
> --- a/arch/x86/kernel/machine_kexec_32.c	2007-11-14 15:39:19.0 +0900
> +++ b/arch/x86/kernel/machine_kexec_32.c	2007-11-14 15:39:33.0 +0900
> @@ -151,7 +151,7 @@ NORET_TYPE void machine_kexec(struct kim
>  
>  void arch_crash_save_vmcoreinfo(void)
>  {
> -#ifdef CONFIG_ARCH_DISCONTIGMEM_ENABLE
> +#ifdef CONFIG_NUMA
>   VMCOREINFO_SYMBOL(node_data);
>   VMCOREINFO_LENGTH(node_data, MAX_NUMNODES);
>  #endif
> diff -rpuN a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> --- a/arch/x86/kernel/machine_kexec_64.c	2007-11-14 15:39:19.0 +0900
> +++ b/arch/x86/kernel/machine_kexec_64.c	2007-11-14 15:39:33.0 +0900
> @@ -235,7 +235,7 @@ void arch_crash_save_vmcoreinfo(void)
>  {
>   VMCOREINFO_SYMBOL(init_level4_pgt);
>  
> -#ifdef CONFIG_ARCH_DISCONTIGMEM_ENABLE
> +#ifdef CONFIG_NUMA
>   VMCOREINFO_SYMBOL(node_data);
>   VMCOREINFO_LENGTH(node_data, MAX_NUMNODES);
>  #endif
> _
> 

x86_64-make-sparsemem-vmemmap-the-default-memory-model-v2.patch removes the
`VMCOREINFO_SYMBOL(node_data);' from arch/x86/kernel/machine_kexec_64.c
altogether, so I dropped that part of your patch.
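To make the dependency rule above concrete, here is a minimal C sketch. It assumes that CONFIG_NUMA is just an ordinary preprocessor macro (passed with -DCONFIG_NUMA) standing in for the Kconfig option, and that node_data and save_vmcoreinfo() are simplified, hypothetical stand-ins for the kernel's node_data and arch_crash_save_vmcoreinfo(); the record format only loosely resembles vmcoreinfo. The point it shows is that the #ifdef around the reference uses the same condition as the obj-$(CONFIG_NUMA) line that builds the defining file.

```c
#include <assert.h>
#include <stdio.h>

#ifdef CONFIG_NUMA
/* Hypothetical stand-in for node_data[], which only exists when numa_64.o is built. */
static int node_data[4];
#endif

/*
 * Writes "SYMBOL(name)=address" lines into buf, loosely modelled on the
 * vmcoreinfo note, and returns the number of characters written.
 */
static size_t save_vmcoreinfo(char *buf, size_t len)
{
	size_t used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
#ifdef CONFIG_NUMA	/* same condition as obj-$(CONFIG_NUMA) in the Makefile */
	used += (size_t)snprintf(buf + used, len - used,
				 "SYMBOL(node_data)=%p\n", (void *)node_data);
#endif
	return used;
}
```

In the kernel the definition and the reference live in different files, so a guard that is wider than the Makefile condition shows up as an undefined reference at link time, and a narrower one silently drops the record; keeping both under one condition is what the patch does.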

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.24-rc3-mm1: I/O error, system hangs

2007-11-25 Thread Hannes Reinecke
On Sat, Nov 24, 2007 at 07:44:13PM +0200, James Bottomley wrote:
> Probing intermittent failures in Domain Validation, even with the fixes
> applied leads me to the conclusion that there are further problems with
> this commit:
> 
> commit fc5eb4facedbd6d7117905e775cee1975f894e79
> Author: Hannes Reinecke <[EMAIL PROTECTED]>
> Date:   Tue Nov 6 09:23:40 2007 +0100
> 
> [SCSI] Do not requeue requests if REQ_FAILFAST is set
>  
> The essence of the problems is that you're causing REQ_FAILFAST to
> terminate commands with error on requeuing conditions, some of which are
> relatively common on most SCSI devices.  While this may be the correct
> behaviour for multi-path, it's certainly wrong for the previously
> understood meaning of REQ_FAILFAST, which was "don't retry on error".
> That is why domain validation and other applications, which use it to
> control error handling but don't expect to get failures for a simple
> requeue, are now spitting errors.
> 
> I honestly can't see that, even for the multi-path case, returning an
> error when we're over queue depth is the correct thing to do (it may not
> matter to something like a symmetrix, but an array that has a non-zero
> cost associated with a path change, like a CPQ HSV or the AVT
> controllers, will show fairly large slow downs if you do this).  Even if
> this is the desired behaviour (and I think that's a policy issue),
> DID_NO_CONNECT is almost certainly the wrong error to be sending back.
> 
> This patch fixes up domain validation to work again correctly, however,
> I really think it's just a bandaid.  Do you want to rethink the above
> commit?
> 
Given the number of errors, yes, I'll have to.
But we still face the initial problem that requeued requests will be
stuck in the queue forever (i.e. until the timeout catches them), causing
failover to be painfully slow.

Anyway, I'll think it over.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
[EMAIL PROTECTED] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
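As a rough sketch of the semantics James argues for above (fail-fast suppresses retries of genuine errors, while a plain requeue condition such as being over queue depth is still requeued), the C fragment below maps an outcome to an action. The enums and dispose() are hypothetical and do not correspond to the SCSI midlayer's actual result codes or functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome and action classes, not real midlayer values. */
enum outcome { OUT_OK, OUT_REQUEUE, OUT_ERROR };
enum action  { ACT_COMPLETE, ACT_REQUEUE, ACT_RETRY, ACT_FAIL };

static enum action dispose(enum outcome o, bool failfast)
{
	assert(o <= OUT_ERROR);

	switch (o) {
	case OUT_OK:
		return ACT_COMPLETE;
	case OUT_REQUEUE:
		/* e.g. device busy or over queue depth: not treated as an error */
		return ACT_REQUEUE;
	case OUT_ERROR:
	default:
		/* fail-fast only suppresses retries of genuine errors */
		return failfast ? ACT_FAIL : ACT_RETRY;
	}
}
```

Hannes's concern is the OUT_REQUEUE branch: under this policy a request that keeps being requeued on a failed path is released only when the command timeout fires, which is what makes failover slow.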


Re: [PATCH 23/27] x86: debugctlmsr kconfig

2007-11-25 Thread Roland McGrath
> Why is it defined in configuration system instead of some *.h file?

That seems to be existing practice for this sort of thing.
I just followed what I saw.


Thanks,
Roland


[PATCH] PPC: CELLEB - fix potential NULL pointer dereference

2007-11-25 Thread Cyrill Gorcunov
This patch adds checks for NULL return values from of_get_property() to
prevent a possible NULL pointer dereference.
It also removes two unneeded 'return' statements.

Signed-off-by: Cyrill Gorcunov <[EMAIL PROTECTED]>
---
Any comments are welcome.
 arch/powerpc/platforms/celleb/pci.c |   23 ---
 1 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/celleb/pci.c b/arch/powerpc/platforms/celleb/pci.c
index 6bc32fd..9b8bb01 100644
--- a/arch/powerpc/platforms/celleb/pci.c
+++ b/arch/powerpc/platforms/celleb/pci.c
@@ -138,8 +138,6 @@ static void celleb_config_read_fake(unsigned char *config, int where,
*val = celleb_fake_config_readl(p);
break;
}
-
-   return;
 }
 
 static void celleb_config_write_fake(unsigned char *config, int where,
@@ -158,7 +156,6 @@ static void celleb_config_write_fake(unsigned char *config, int where,
celleb_fake_config_writel(val, p);
break;
}
-   return;
 }
 
 static int celleb_fake_pci_read_config(struct pci_bus *bus,
@@ -348,9 +345,25 @@ static int __init celleb_setup_fake_pci_device(struct device_node *node,
pr_debug("PCI: res assigned 0x%016lx\n", (unsigned long)*res);
 
wi0 = of_get_property(node, "device-id", NULL);
+   if (unlikely((!wi0))) {
+   printk(KERN_ERR "PCI: device-id not found.\n");
+   goto error;
+   }
wi1 = of_get_property(node, "vendor-id", NULL);
+   if (unlikely((!wi1))) {
+   printk(KERN_ERR "PCI: vendor-id not found.\n");
+   goto error;
+   }
wi2 = of_get_property(node, "class-code", NULL);
+   if (unlikely((!wi2))) {
+   printk(KERN_ERR "PCI: class-code not found.\n");
+   goto error;
+   }
wi3 = of_get_property(node, "revision-id", NULL);
+   if (unlikely((!wi3))) {
+   printk(KERN_ERR "PCI: revision-id not found.\n");
+   goto error;
+   }
 
celleb_config_write_fake(*config, PCI_DEVICE_ID, 2, wi0[0] & 0x);
celleb_config_write_fake(*config, PCI_VENDOR_ID, 2, wi1[0] & 0x);
@@ -372,6 +385,10 @@ static int __init celleb_setup_fake_pci_device(struct device_node *node,
celleb_setup_pci_base_addrs(hose, devno, fn, num_base_addr);
 
li = of_get_property(node, "interrupts", );
+   if (!li) {
+   printk(KERN_ERR "PCI: interrupts not found.\n");
+   goto error;
+   }
val = li[0];
celleb_config_write_fake(*config, PCI_INTERRUPT_PIN, 1, 1);
celleb_config_write_fake(*config, PCI_INTERRUPT_LINE, 1, val);
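The checks the patch adds all follow one pattern: look the property up, bail out to the error label if it is missing, and only then dereference it. Below is a self-contained C sketch of that pattern. get_prop() is a hypothetical stand-in for of_get_property(), which likewise returns NULL when a property is absent, and struct prop is a made-up property table; the error label here only returns -1, since the real function's error path is not shown in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical property table entry (name -> cell array). */
struct prop {
	const char *name;
	const unsigned int *value;
};

/* Like of_get_property(): returns NULL when the property is absent. */
static const unsigned int *get_prop(const struct prop *props, size_t n,
				    const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(props[i].name, name) == 0)
			return props[i].value;
	return NULL;
}

/* Reads the four IDs, bailing out through one error label if any is missing. */
static int setup_fake_device(const struct prop *props, size_t n,
			     unsigned int out[4])
{
	static const char *const names[4] = {
		"device-id", "vendor-id", "class-code", "revision-id"
	};

	assert(out != NULL);
	for (int i = 0; i < 4; i++) {
		const unsigned int *p = get_prop(props, n, names[i]);

		if (!p) {
			fprintf(stderr, "PCI: %s not found.\n", names[i]);
			goto error;
		}
		out[i] = p[0];	/* dereferenced only after the NULL check */
	}
	return 0;

error:
	/* cleanup of anything set up earlier would go here */
	return -1;
}
```

Driving the lookups from a table of names is one alternative to four copies of the same if/printk/goto block; the patch as posted keeps them separate, which works equally well.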


Re: [PATCH 58/59] sound/isa: Add missing "space"

2007-11-25 Thread Takashi Iwai
At Mon, 19 Nov 2007 17:53:45 -0800,
Joe Perches wrote:
> 
> 
> Signed-off-by: Joe Perches <[EMAIL PROTECTED]>

Applied to ALSA tree.  Thanks.


Takashi


> ---
>  sound/isa/sc6000.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/sound/isa/sc6000.c b/sound/isa/sc6000.c
> index 94daf83..bc0c379 100644
> --- a/sound/isa/sc6000.c
> +++ b/sound/isa/sc6000.c
> @@ -390,7 +390,7 @@ static int __devinit sc6000_init_board(char __iomem *vport, int irq, int dma,
>  
>   err = sc6000_init_mss(vport, config, vmss_port, mss_config);
>   if (err < 0) {
> - snd_printk(KERN_ERR "Can not initialize"
> + snd_printk(KERN_ERR "Can not initialize "
>  "Microsoft Sound System mode.\n");
>   return -ENODEV;
>   }
> -- 
> 1.5.3.5.652.gf192c
> 


Re: enable dual rng on VIA C7

2007-11-25 Thread Andrew Morton
On Sun, 11 Nov 2007 19:49:08 +0100 Udo van den Heuvel <[EMAIL PROTECTED]> wrote:

> Any reason why the second rng on the VIA C7 CPU is not enabled?
> 
> Kind regards,
> Udo
> 
> 
> [via-rng.patch  text/plain (634B)]
> --- old/drivers/char/hw_random/via-rng.c	2007-11-11 19:39:49.0 +0100
> +++ new/drivers/char/hw_random/via-rng.c	2007-11-11 19:40:41.0 +0100
> @@ -41,6 +41,7 @@
>   VIA_STRFILT_ENABLE  = (1 << 14),
>   VIA_RAWBITS_ENABLE  = (1 << 13),
>   VIA_RNG_ENABLE  = (1 << 6),
> + VIA_RNG_DUAL= (1 << 9),
>   VIA_XSTORE_CNT_MASK = 0x0F,
>  
>   VIA_RNG_CHUNK_8 = 0x00, /* 64 rand bits, 64 stored bits */
> @@ -128,6 +129,7 @@
>   lo &= ~(0x7f << VIA_STRFILT_CNT_SHIFT);
>   lo &= ~VIA_XSTORE_CNT_MASK;
>   lo &= ~(VIA_STRFILT_ENABLE | VIA_STRFILT_FAIL | VIA_RAWBITS_ENABLE);
> + lo |= VIA_RNG_DUAL;
>   lo |= VIA_RNG_ENABLE;
>  
>   if (lo != old_lo)
> 

Does the patch work?

It's missing a signed-off-by:, btw
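For reference, the bit arithmetic the patch changes can be checked in isolation. This C sketch reproduces only the mask-and-set sequence on the low 32 bits of the MSR, not the rdmsr/wrmsr calls around it. The values of VIA_STRFILT_CNT_SHIFT and VIA_STRFILT_FAIL are assumptions, because they are not visible in the patch context; the other constants are copied from the hunks above, and via_rng_new_lo() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

enum {
	VIA_STRFILT_CNT_SHIFT = 16,		/* assumed value */
	VIA_STRFILT_FAIL      = (1 << 15),	/* assumed value */
	VIA_STRFILT_ENABLE    = (1 << 14),
	VIA_RAWBITS_ENABLE    = (1 << 13),
	VIA_RNG_DUAL          = (1 << 9),	/* the bit the patch adds */
	VIA_RNG_ENABLE        = (1 << 6),
	VIA_XSTORE_CNT_MASK   = 0x0F,
};

/* Returns the new low MSR word computed from the old one. */
static uint32_t via_rng_new_lo(uint32_t lo)
{
	lo &= ~(UINT32_C(0x7f) << VIA_STRFILT_CNT_SHIFT);	/* clear filter count */
	lo &= ~(uint32_t)VIA_XSTORE_CNT_MASK;			/* clear xstore count */
	lo &= ~(uint32_t)(VIA_STRFILT_ENABLE | VIA_STRFILT_FAIL |
			  VIA_RAWBITS_ENABLE);
	lo |= VIA_RNG_DUAL;	/* proposed: enable the second generator */
	lo |= VIA_RNG_ENABLE;
	return lo;
}
```

Whatever the input, bit 9 (VIA_RNG_DUAL) and bit 6 (VIA_RNG_ENABLE) end up set and the filter and count fields end up cleared. Whether the hardware actually enables its second generator when bit 9 is set is exactly Andrew's question, and this sketch cannot answer it.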


[PATCH 2/2] ide-scsi: use print_hex_dump from

2007-11-25 Thread Denis Cheng
The utilities implemented in lib/hexdump.c are handier, so please use them.

Cc: Randy Dunlap <[EMAIL PROTECTED]>
Signed-off-by: Denis Cheng <[EMAIL PROTECTED]>
---
There are still many other private hexdump implementations in the source
tree that reinvent the wheel; they can be found with:

  $ grep -RsIn hexdump 
  ...

 drivers/scsi/ide-scsi.c |   18 --
 1 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/ide-scsi.c b/drivers/scsi/ide-scsi.c
index 8d0244c..8f3fc1d 100644
--- a/drivers/scsi/ide-scsi.c
+++ b/drivers/scsi/ide-scsi.c
@@ -242,16 +242,6 @@ static void idescsi_output_buffers (ide_drive_t *drive, idescsi_pc_t *pc, unsign
}
 }
 
-static void hexdump(u8 *x, int len)
-{
-   int i;
-
-   printk("[ ");
-   for (i = 0; i < len; i++)
-   printk("%x ", x[i]);
-   printk("]\n");
-}
-
 static int idescsi_check_condition(ide_drive_t *drive, struct request *failed_command)
 {
idescsi_scsi_t *scsi = drive_to_idescsi(drive);
@@ -282,7 +272,7 @@ static int idescsi_check_condition(ide_drive_t *drive, struct request *failed_co
pc->scsi_cmd = ((idescsi_pc_t *) failed_command->special)->scsi_cmd;
if (test_bit(IDESCSI_LOG_CMD, >log)) {
printk ("ide-scsi: %s: queue cmd = ", drive->name);
-   hexdump(pc->c, 6);
+   print_hex_dump(KERN_DEBUG, "", DUMP_PREFIX_OFFSET, 16, 1, pc->c, 6, 1);
}
rq->rq_disk = scsi->disk;
return ide_do_drive_cmd(drive, rq, ide_preempt);
@@ -337,7 +327,7 @@ static int idescsi_end_request (ide_drive_t *drive, int uptodate, int nrsecs)
idescsi_pc_t *opc = (idescsi_pc_t *) rq->buffer;
if (log) {
printk ("ide-scsi: %s: wrap up check %lu, rst = ", drive->name, opc->scsi_cmd->serial_number);
-   hexdump(pc->buffer,16);
+   print_hex_dump(KERN_DEBUG, "", DUMP_PREFIX_OFFSET, 16, 1, pc->buffer, 16, 1);
}
memcpy((void *) opc->scsi_cmd->sense_buffer, pc->buffer, SCSI_SENSE_BUFFERSIZE);
kfree(pc->buffer);
@@ -816,10 +806,10 @@ static int idescsi_queue (struct scsi_cmnd *cmd,
 
if (test_bit(IDESCSI_LOG_CMD, >log)) {
printk ("ide-scsi: %s: que %lu, cmd = ", drive->name, cmd->serial_number);
-   hexdump(cmd->cmnd, cmd->cmd_len);
+   print_hex_dump(KERN_DEBUG, "", DUMP_PREFIX_OFFSET, 16, 1, cmd->cmnd, cmd->cmd_len, 1);
if (memcmp(pc->c, cmd->cmnd, cmd->cmd_len)) {
printk ("ide-scsi: %s: que %lu, tsl = ", drive->name, cmd->serial_number);
-   hexdump(pc->c, 12);
+   print_hex_dump(KERN_DEBUG, "", DUMP_PREFIX_OFFSET, 16, 1, pc->c, 12, 1);
}
}
 
-- 
1.5.3.5
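To see what the conversion does to the log output, here is a userspace C approximation of one row as printed by print_hex_dump(KERN_DEBUG, "", DUMP_PREFIX_OFFSET, 16, 1, buf, len, 1): an 8-digit hex offset, the bytes as space-separated two-digit hex, and an ASCII column with '.' for non-printable bytes. hex_dump_row() is a hypothetical helper written for illustration; it is not the lib/hexdump.c code and may differ from it in padding details.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

#define ROWSIZE 16

/* Formats one row of up to ROWSIZE bytes; offsets are assumed below 4 GiB. */
static void hex_dump_row(char *out, size_t outlen, size_t offset,
			 const unsigned char *buf, size_t len)
{
	size_t n = len < ROWSIZE ? len : ROWSIZE;
	size_t pos;

	/* 10 (offset) + 48 (hex) + 1 (gap) + 16 (ascii) + 1 (NUL) */
	assert(out != NULL && outlen >= 76);

	pos = (size_t)snprintf(out, outlen, "%08zx: ", offset);
	for (size_t i = 0; i < ROWSIZE; i++) {
		if (i < n)
			pos += (size_t)snprintf(out + pos, outlen - pos, "%02x ", buf[i]);
		else	/* pad short rows so the ASCII column lines up */
			pos += (size_t)snprintf(out + pos, outlen - pos, "   ");
	}
	pos += (size_t)snprintf(out + pos, outlen - pos, " ");
	for (size_t i = 0; i < n; i++)
		out[pos++] = isprint(buf[i]) ? (char)buf[i] : '.';
	out[pos] = '\0';
}
```

The log format does change with these patches: the removed ide-scsi helper printed each byte with "%x " (no zero padding) between "[ " and "]", and the removed tcrypt helper printed "%02x" with no separators, whereas print_hex_dump() prints an offset, padded hex and an ASCII column, each row as its own line at the given log level.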



[PATCH 1/2] crypto test: use print_hex_dump from

2007-11-25 Thread Denis Cheng
The utilities implemented in lib/hexdump.c are handier, so please use them.

Cc: Randy Dunlap <[EMAIL PROTECTED]>
Signed-off-by: Denis Cheng <[EMAIL PROTECTED]>
---
 crypto/tcrypt.c |   21 +++--
 1 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index 24141fb..8766023 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -81,14 +81,6 @@ static char *check[] = {
"camellia", "seed", NULL
 };
 
-static void hexdump(unsigned char *buf, unsigned int len)
-{
-   while (len--)
-   printk("%02x", *buf++);
-
-   printk("\n");
-}
-
 static void tcrypt_complete(struct crypto_async_request *req, int err)
 {
struct tcrypt_result *res = req->data;
@@ -156,7 +148,8 @@ static void test_hash(char *algo, struct hash_testvec *template,
goto out;
}
 
-   hexdump(result, crypto_hash_digestsize(tfm));
+   print_hex_dump(KERN_DEBUG, "", DUMP_PREFIX_OFFSET, 16, 1, result, crypto_hash_digestsize(tfm), 1);
+
printk("%s\n",
   memcmp(result, hash_tv[i].digest,
  crypto_hash_digestsize(tfm)) ?
@@ -203,7 +196,7 @@ static void test_hash(char *algo, struct hash_testvec *template,
goto out;
}
 
-   hexdump(result, crypto_hash_digestsize(tfm));
+   print_hex_dump(KERN_DEBUG, "", DUMP_PREFIX_OFFSET, 16, 1, result, crypto_hash_digestsize(tfm), 1);
printk("%s\n",
   memcmp(result, hash_tv[i].digest,
  crypto_hash_digestsize(tfm)) ?
@@ -319,7 +312,7 @@ static void test_cipher(char *algo, int enc,
}
 
q = kmap(sg_page([0])) + sg[0].offset;
-   hexdump(q, cipher_tv[i].rlen);
+   print_hex_dump(KERN_DEBUG, "", DUMP_PREFIX_OFFSET, 16, 1, q, cipher_tv[i].rlen, 1);
 
printk("%s\n",
   memcmp(q, cipher_tv[i].result,
@@ -393,7 +386,7 @@ static void test_cipher(char *algo, int enc,
for (k = 0; k < cipher_tv[i].np; k++) {
printk("page %u\n", k);
q = kmap(sg_page([k])) + sg[k].offset;
-   hexdump(q, cipher_tv[i].tap[k]);
+   print_hex_dump(KERN_DEBUG, "", DUMP_PREFIX_OFFSET, 16, 1, q, cipher_tv[i].tap[k], 1);
printk("%s\n",
memcmp(q, cipher_tv[i].result + temp,
cipher_tv[i].tap[k]) ? "fail" :
@@ -839,7 +832,7 @@ static void test_deflate(void)
printk("fail: ret=%d\n", ret);
continue;
}
-   hexdump(result, dlen);
+   print_hex_dump(KERN_DEBUG, "", DUMP_PREFIX_OFFSET, 16, 1, result, dlen, 1);
printk("%s (ratio %d:%d)\n",
   memcmp(result, tv[i].output, dlen) ? "fail" : "pass",
   ilen, dlen);
@@ -870,7 +863,7 @@ static void test_deflate(void)
printk("fail: ret=%d\n", ret);
continue;
}
-   hexdump(result, dlen);
+   print_hex_dump(KERN_DEBUG, "", DUMP_PREFIX_OFFSET, 16, 1, result, dlen, 1);
printk("%s (ratio %d:%d)\n",
   memcmp(result, tv[i].output, dlen) ? "fail" : "pass",
   ilen, dlen);
-- 
1.5.3.5



Re: 2.6.24-rc3-mm1 (sync is slow ?)

2007-11-25 Thread KAMEZAWA Hiroyuki
On Sat, 24 Nov 2007 19:04:34 +0100
Gabriel C <[EMAIL PROTECTED]> wrote:
> >> It seems OK here from a quick test (i386, ext3-on-IDE).
> >>
> >> Maybe device driver/block breakage?
> 
> Try revert
> 
> http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commitdiff_plain;h=8655a546c83fc43f0a73416bbd126d02de7ad6c0;hp=5bc717b6bdaaf52edf365eb7d9d8c89fec79df5d
> 
> See also :
> http://lkml.org/lkml/2007/11/23/5
> 
> and search for '2.6.24-rc3-mm1: I/O error, system hangs' on LKML
> 

Thank you!
The problem was fixed by reverting the patch you pointed out.

-Kame



Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.

2007-11-25 Thread Roland Dreier
 > Except C doesn't have namespaces and this mechanism doesn't create them.  So 
 > this is just complete and utter makework; as I said before, no one's going to 
 > confuse all those udp_* functions if they're not in the udp namespace.

I don't understand why you're so opposed to organizing the kernel's
exported symbols in a more self-documenting way.  It seems pretty
clear to me that a mechanism requiring modules to state explicitly
which (semi-)internal APIs they use makes reviewing easier, makes it
easier to communicate "please don't use that API" to module authors,
and takes at least a small step towards bringing the kernel's exported
API under control.  What's the real downside?

 - R.
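To make the idea concrete, here is a toy C model of the check such a mechanism would perform at module load time: each exported symbol optionally carries a namespace, each module lists the namespaces it declares it uses, and a namespaced symbol resolves only for modules that declared it. All names here (struct ksym, struct module_desc, can_resolve()) are hypothetical; this is not the API from Andi's patch series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical exported-symbol entry. */
struct ksym {
	const char *name;
	const char *ns;		/* NULL: visible to every module */
};

/* Hypothetical module descriptor. */
struct module_desc {
	const char *const *imports;	/* namespaces the module declares it uses */
	size_t nimports;
};

static bool module_imports(const struct module_desc *m, const char *ns)
{
	for (size_t i = 0; i < m->nimports; i++)
		if (strcmp(m->imports[i], ns) == 0)
			return true;
	return false;
}

/* A symbol resolves if it is global or its namespace was declared. */
static bool can_resolve(const struct ksym *sym, const struct module_desc *m)
{
	assert(sym != NULL && m != NULL);
	return sym->ns == NULL || module_imports(m, sym->ns);
}
```

The reviewing benefit comes from the import list: a module that pulls in a restricted namespace has to say so in its own source, where a reviewer can see it.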


2.6.24-rc3-$SHA1: kernel BUG at fs/jbd/checkpoint.c:683!

2007-11-25 Thread Alexey Dobriyan
In a desperate attempt to screw up /proc one more time, I added some
proc fixes, wrote a test module which creates and removes a simple proc
file, then ran a) a modprobe/rmmod loop, b) a cat /proc/foo/bar loop,
c) an LTP loop. So far so good: it survived an overnight run.

While rebooting into the new kernel, the kernel died:

[56400.857832] kernel BUG at fs/jbd/checkpoint.c:683!
[56400.857911] invalid opcode:  [1] PREEMPT SMP 
[56400.857996] CPU 0 
[56400.858059] Modules linked in: foo
[56400.858138] Pid: 392, comm: kjournald Not tainted 2.6.24-rc3-proc #11
[56400.858227] RIP: 0010:[]  [] __journal_drop_transaction+0x110/0x120
[56400.858380] RSP: :81017f30dd58  EFLAGS: 00010286
[56400.858462] RAX: 81012ab9f210 RBX: 81017f336cd8 RCX: 81017fcbbe48
[56400.858555] RDX: 81012ab9f210 RSI: 810110eeb318 RDI: 81017f336cd8
[56400.858648] RBP: 81017aa8a2a0 R08:  R09: 81017aa8a4f8
[56400.858741] R10: 0001 R11: 8021b220 R12: 81017aa8a2a0
[56400.858834] R13: 81017aa8a2a0 R14: 81017f30ddbc R15: 81017f30ddbc
[56400.858927] FS:  () GS:804ea000() knlGS:
[56400.859070] CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
[56400.859157] CR2: 00437c50 CR3: 000104dae000 CR4: 06e0
[56400.859250] DR0:  DR1:  DR2: 
[56400.859343] DR3:  DR6: 0ff0 DR7: 0400
[56400.859436] Process kjournald (pid: 392, threadinfo 81017f30c000, task 81017fcc8ec0)
[56400.859581] Stack:  802cbf7a 81016f795c60 802cc0a8 

[56400.859734]  810110eeb318 81012ab9f210 0001 810117e4da50
[56400.859881]  81017f30ddbc 81017f336e3c 802cca0b 810117e4da50
[56400.859979] Call Trace:
[56400.860093]  [] __journal_remove_checkpoint+0x5a/0xb0
[56400.860183]  [] journal_clean_one_cp_list+0xd8/0x170
[56400.860273]  [] __journal_clean_checkpoint_list+0x4b/0xa0
[56400.860370]  [] journal_commit_transaction+0x21d/0x1110
[56400.860462]  [] lock_timer_base+0x34/0x70
[56400.860546]  [] try_to_del_timer_sync+0x53/0x60
[56400.860633]  [] kjournald+0xdf/0x240
[56400.860715]  [] autoremove_wake_function+0x0/0x30
[56400.860803]  [] kjournald+0x0/0x240
[56400.860884]  [] kthread+0x4b/0x80
[56400.860967]  [] child_rip+0xa/0x12
[56400.861047]  [] kthread+0x0/0x80
[56400.861126]  [] child_rip+0x0/0x12
[56400.861205] 
[56400.861262] 
[56400.861263] Code: 0f 0b eb fe 66 66 66 90 66 66 66 90 66 66 66 90 53 48 8b 77 
[56400.861546] RIP  [] __journal_drop_transaction+0x110/0x120
[56400.861642]  RSP 
[56400.862158] Kernel panic - not syncing: Fatal exception

Version:2.6.24-rc3-2ffbb8377c7a0713baf6644e285adc27a5654582
 + proc fixes (cumulative patch attached)
Box:Core 2 Duo E6400, 4G RAM
mount info: /dev/sda2 on / type ext3 (rw,noatime,nodiratime)
scheduler:  CFQ
.config:

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.24-rc3-proc
# Sun Nov 25 14:29:24 2007
#
CONFIG_64BIT=y
# CONFIG_X86_32 is not set
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
# CONFIG_QUICKLIST is not set
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
# CONFIG_RWSEM_XCHGADD_ALGORITHM is not set
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_ZONE_DMA32=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_X86_HT=y
# CONFIG_KTIME_SCALAR is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=15
# CONFIG_CGROUPS is not set
# CONFIG_FAIR_GROUP_SCHED is not set
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
# CONFIG_BLK_DEV_INITRD is not set
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_EMBEDDED=y
# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y

Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.

2007-11-25 Thread Roland Dreier
 > > I agree that we shouldn't make things too hard for out-of-tree
 > > modules, but I disagree with your first statement: there clearly is a
 > > large class of symbols that are used by multiple modules but which are
 > > not generically useful -- they are only useful by a certain small class
 > > of modules.
 > 
 > If it is so clear, you should be able to easily provide examples?

Sure -- Andi's example of symbols required only by TCP congestion
modules; the SCSI internals that Christoph wants to mark; the symbols
exported by my mlx4_core driver (which I admit are currently only used
by the mlx4_ib driver, but which will also be used by at least the
ethernet NIC driver for the same hardware).  I thought this was
already covered repeatedly in the thread and indeed in Andi's code so
there was no need to repeat it...


profile code added to netif_receive_skb function

2007-11-25 Thread kernel coder
hi,

I have added some code to the netif_receive_skb function. As the Linux
kernel is multithreaded, there is no guarantee that my code runs to
completion without being interrupted by some other process. The timer
interrupt handler is one example of code which might interrupt the
execution of my code.

I just want to observe which processes are disturbing my code. I
think I need to print EIP register values. How can I print cache
contents as well in the Linux kernel? Are there any tools available for
such a purpose?


thanks,
shahzad


Re: [2.6 patch] make I/O schedulers non-modular

2007-11-25 Thread Al Boldi
Andrew Morton wrote:
> (cc's lovingly restored.  Please do not do that)

Thanks!  I'm replying off list.

> On Mon, 26 Nov 2007 07:57:00 +0300 Al Boldi <[EMAIL PROTECTED]> wrote:
> > Jens Axboe wrote:
> > > On Sun, Nov 25 2007, Adrian Bunk wrote:
> > > > Is there any technical reason why we need 4 different schedulers at
> > > > all?
> > >
> > > Until we have the perfect scheduler :-)
> > >
> > > With some hard work and testing, we should be able to get rid of 'as'.
> > > It still beats cfq for some of the workloads that deadline is good at,
> > > so not quite yet.
> > >
> > > > I have the gut feeling that the usual thing happens and people e.g.
> > > > not report some cfq problems because as works for them...
> > >
> > > There's always a risk with "duplicate", like several drivers for the
> > > same hardware. I'm not disputing that.
> >
> > Actually, both 'cfq' and 'as' are broken, and have been repeatedly
> > reported as such.  Deadline is the only one that currently looks sane,
> > and seems like a good starting point for a more involved iosched.  But
> > keep in mind, the fact that 'cfq' and 'as' are broken may also point to
> > a lower-level block-io problem.  So, incrementally improving deadline
> > may help discovering the problems both 'cfq' and 'as' are plagued with.
>
> Sorry, but these are vague and unuseful assertions.
>
> Please send bug reports, preferably with testcases which developers can
> use when fixing the bugs.

http://bugzilla.kernel.org/show_bug.cgi?id=5900


Thanks again!

--
Al



Re: Small System Paging Problem - OOM-killer goes nuts

2007-11-25 Thread Josh Goldsmith

Thanks for the response Mikael.

Is your 486 running an IDE disk on a normal interface or via USB?  I wonder 
if the NSLU2 only having I/O via USB might be significant.  Also, this is a 
2.6 kernel, and I've seen scattered reports across the internet about similar 
oom-killer problems since about 2.6.7.


Thanks!
  -Josh

- Original Message - 
From: "Mikael Pettersson" <[EMAIL PROTECTED]>

To: <[EMAIL PROTECTED]>; 
Sent: Sunday, November 25, 2007 3:55 PM
Subject: Re: Small System Paging Problem - OOM-killer goes nuts



I'm no VM tuning expert, but I have and still do heavy compile
jobs on similarly configured machines, with no OOM problems:

I regularly build 2.6 kernels and occasionally also gcc on a
100MHz 486 with 28MB of RAM and perhaps 500MB of swap. It runs
a standard but stripped down Fedora Core 4 user-space, with ext3
file systems and a kernel that doesn't include anything non-essential.
The machine will swap madly, but the OOM killer never triggers.
(All system settings are FC4 defaults. I haven't touched them.)

In the past I did a fair amount of package rebuilds and test suite
runs on an NSLU2 myself, with a 2.4 Linksys/Openslug kernel, ext3,
and a 1GB or perhaps 2GB swap partition on a disk attached via a
USB2-to-PATA enclosure. Even when swapping heavily the OOM killer
wouldn't trigger.





RE: [PATCH 1/2] msi: set 'En' bit of MSI Mapping Capability on HT platform

2007-11-25 Thread Andy Currid

> Isn't there a way we can make this work for any upstream HT
> bridge, rather than only for specific NVIDIA chipsets?


The lines Peer indicates below will work for any vendor's bridge device
that implements an HT MSI mapping and is an upstream bridge of the
endpoint requesting MSI.

On some NVIDIA chipsets, the host bridge that implements HT MSI mapping
is not hierarchically upstream from the MSI endpoint; it may be a peer
on the same bus as the endpoint or the PCIe root complex that's above
the endpoint. The NVIDIA-specific code in the patch is to detect those
specific chipsets where this can occur. We have tested the patch with
both internal and PCI Express MSI endpoints on each of these NVIDIA
chipsets.

It may be that other vendors have HyperTransport chipsets with similar
requirements for HT MSI mapping, but we don't have that information or
the ability to test code on those vendors' chipsets.

Regards,

Andy
--
Andy Currid, NVIDIA Corporation
[EMAIL PROTECTED]  408 566 6743


-Original Message-
From: Peer Chen 
Sent: Sunday, November 25, 2007 20:02
To: Robert Hancock; peerchen
Cc: linux-kernel; akpm; Andy Currid
Subject: RE: [PATCH 1/2] msi: set 'En' bit of MSI Mapping Capability on HT platform

I think the following lines are suitable for other vendors' bridges as
well as NVIDIA's :)
===
+   if (pci_enable_msi_ht_cap(dev) != 0) {
+   return 0;
+   } else {
+   /* Get upstream bridge device handle */
+
+   bridge_dev = dev->bus->self;
+   while(bridge_dev != 0) {
+   if (pci_enable_msi_ht_cap(bridge_dev) != 0) {
+   return 0;
+   } else
+   bridge_dev = bridge_dev->bus->self;
+   }
+
+   return 1;
+   } 
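The loop above walks from the device toward the root through bus->self, stopping at the first device on the path whose HT MSI mapping can be enabled. Below is a self-contained C sketch of the same walk. struct dev, struct bus and enable_ht_cap() are minimal hypothetical stand-ins for struct pci_dev, struct pci_bus and pci_enable_msi_ht_cap(), keeping only the fields the walk needs, and the sketch assumes a nonzero return from the enable helper means "mapping found and enabled", which is the reading under which the snippet makes sense.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct bus;

/* Hypothetical stand-in for struct pci_dev. */
struct dev {
	struct bus *bus;	/* bus this device sits on */
	bool has_ht_msi_map;	/* device has an HT MSI mapping capability */
	bool ht_msi_enabled;
};

/* Hypothetical stand-in for struct pci_bus. */
struct bus {
	struct dev *self;	/* bridge leading to this bus; NULL for the root bus */
};

/* Returns nonzero if a mapping was found and enabled on d. */
static int enable_ht_cap(struct dev *d)
{
	if (!d->has_ht_msi_map)
		return 0;
	d->ht_msi_enabled = true;
	return 1;
}

/* Returns 0 once some device on the path to the root enables a mapping, else 1. */
static int enable_msi_mapping_upstream(struct dev *dev)
{
	assert(dev != NULL && dev->bus != NULL);
	if (enable_ht_cap(dev))
		return 0;
	for (struct dev *br = dev->bus->self; br != NULL; br = br->bus->self)
		if (enable_ht_cap(br))
			return 0;
	return 1;
}
```

As Andy points out, a walk like this only reaches bridges that are hierarchically upstream of the endpoint; a host bridge that is a peer on the same bus, as on the NVIDIA chipsets he describes, is never visited, which is why the patch keeps chipset-specific code. The sketch compares the pointer against NULL rather than 0, the usual kernel style.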


BRs
Peer Chen

-Original Message-
From: Robert Hancock [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 26, 2007 2:34 AM
To: peerchen
Cc: linux-kernel; akpm; Peer Chen; Andy Currid
Subject: Re: [PATCH 1/2] msi: set 'En' bit of MSI Mapping Capability on HT platform

peerchen wrote:
> According to the HyperTransport spec, 'En' indicates whether the MSI
> mapping is active, so it should be set when MSI is enabled.
> 
> The patch is based on kernel 2.6.24-rc3
> 
> Signed-off-by: Andy Currid <[EMAIL PROTECTED]>
> Signed-off-by: Peer Chen <[EMAIL PROTECTED]>

Isn't there a way we can make this work for any upstream HT bridge,
rather than only for specific NVIDIA chipsets?

-- 
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED] Home Page:
http://www.roberthancock.com/



Re: [linux-usb-devel] [BUG] USB_PERSIST

2007-11-25 Thread Raymano Garibaldi
The device which has the root fs is a READ-ONLY device. There is no
way for it to change between getting detached and reattached to the
computer which is suspended. In such a case there is no possibility of
hibernation because there is nothing to write back to.

I understand that this is currently considered a feature, but I am
arguing here that there should also be another feature that allows
this to work under suspend-to-RAM the same way it does with
suspend-to-disk (hibernation).

Here's a scenario:

1) You are at the airport working on a laptop without a hard drive,
which you have booted up using a live USB distro on a read-only USB
key drive.

2) You want to board your plane, so you suspend your laptop. You can't
keep the USB stick in your laptop because you cannot fit the laptop
back in the bag with the USB stick still attached. So you detach the
USB stick while the laptop is still suspended.

3) You get on the plane, and after some time, when you are allowed to
work again, you plug the USB stick back in, resume the laptop and
continue working where you left off.

This scenario is not currently possible with any kernel after
2.6.22. It is a very important missing feature.

And yes, this feature does work under the 2.6.21 kernel, precisely
because the kernel did not have the USB suspend and persist feature
available. Under the 2.6.21 kernel, the kernel is totally unaware
during suspend of what is happening to the USB device, so nothing
happens when the USB device is detached and reattached while the
computer is suspended, which makes the scenario described above
possible. I currently, and very frequently, use this feature on my
live USB distro, FaunOS, which uses kernel 2.6.21.


Thank you,

Raymano G.







On 11/25/07, Alan Stern <[EMAIL PROTECTED]> wrote:
> On Sat, 24 Nov 2007, Andrew Morton wrote:
>
> > On Tue, 20 Nov 2007 17:04:32 -0700 "Raymano Garibaldi" <[EMAIL PROTECTED]> 
> > wrote:
> >
> > > Is there any other information that I can provide which might help in
> > > resolving this bug?
> >
> > Let's cc the USB developers.
> >
> > > On 11/18/07, Raymano Garibaldi <[EMAIL PROTECTED]> wrote:
> > > > The last time I tried this and it worked was 2.6.21. Below is a
>
> Sorry, that's not possible.  2.6.21 doesn't include USB Persist
> support.  Nor does 2.6.22.
>
> There were some experimental patches with early versions of USB Persist
> for those kernels.  They are different from what eventually went into
> 2.6.23.
>
> > > > On 11/18/07, Denys Vlasenko <[EMAIL PROTECTED]> wrote:
> > > > > On Sunday 18 November 2007 20:14, Raymano Garibaldi wrote:
> > > > > > In kernel 2.6.23.8 USB_PERSIST feature does not work if the same USB
> > > > > > device is detached and reattached while computer is suspended. The
> > > > > > mount points for the USB storage device mounted before suspend are
> > > > > > lost and the device has to be remounted after resume.
>
> USB Persist was never meant to allow you to detach and reattach a
> device while the computer is suspended; it was meant to deal with
> hibernation.  So what you observed is the correct behavior, not a bug.
> Detaching and reattaching a device while the computer is suspended
> should result in exactly the same behavior as detaching and reattaching
> the device while the computer is awake.
>
> If you try doing the same thing but with the computer in hibernation
> instead of suspended, you may find it more in line with what you
> expect.
>
> Alan Stern
>
>


Re: [2.6 patch] make I/O schedulers non-modular

2007-11-25 Thread Andrew Morton

(cc's lovingly restored.  Please do not do that)

On Mon, 26 Nov 2007 07:57:00 +0300 Al Boldi <[EMAIL PROTECTED]> wrote:

> Jens Axboe wrote:
> > On Sun, Nov 25 2007, Adrian Bunk wrote:
> > > Is there any technical reason why we need 4 different schedulers at all?
> >
> > Until we have the perfect scheduler :-)
> >
> > With some hard work and testing, we should be able to get rid of 'as'.
> > It still beats cfq for some of the workloads that deadline is good at,
> > so not quite yet.
> >
> > > I have the gut feeling that the usual thing happens and people e.g. not
> > > report some cfq problems because as works for them...
> >
> > There's always a risk with "duplicate", like several drivers for the
> > same hardware. I'm not disputing that.
> 
> Actually, both 'cfq' and 'as' are broken, and have been repeatedly reported 
> as such.  Deadline is the only one that currently looks sane, and seems like 
> a good starting point for a more involved iosched.  But keep in mind, the 
> fact that 'cfq' and 'as' are broken may also point to a lower-level block-io 
> problem.  So, incrementally improving deadline may help in discovering the 
> problems both 'cfq' and 'as' are plagued with.
> 

Sorry, but these are vague and unuseful assertions.

Please send bug reports, preferably with testcases which developers can use
when fixing the bugs.


Re: [2.6 patch] make I/O schedulers non-modular

2007-11-25 Thread Al Boldi
Jens Axboe wrote:
> On Sun, Nov 25 2007, Adrian Bunk wrote:
> > Is there any technical reason why we need 4 different schedulers at all?
>
> Until we have the perfect scheduler :-)
>
> With some hard work and testing, we should be able to get rid of 'as'.
> It still beats cfq for some of the workloads that deadline is good at,
> so not quite yet.
>
> > I have the gut feeling that the usual thing happens and people e.g. not
> > report some cfq problems because as works for them...
>
> There's always a risk with "duplicate", like several drivers for the
> same hardware. I'm not disputing that.

Actually, both 'cfq' and 'as' are broken, and have been repeatedly reported 
as such.  Deadline is the only one that currently looks sane, and seems like 
a good starting point for a more involved iosched.  But keep in mind, the 
fact that 'cfq' and 'as' are broken may also point to a lower-level block-io 
problem.  So, incrementally improving deadline may help in discovering the 
problems both 'cfq' and 'as' are plagued with.


Thanks!

--
Al



[Patch 4/4] sched: Improve fairness of cpu bandwidth allocation for task groups

2007-11-25 Thread Srivatsa Vaddagiri

The current load balancing scheme isn't good for group fairness.

For example, on an 8-cpu system, I created 3 groups as follows:

a = 8 tasks (cpu.shares = 1024) 
b = 4 tasks (cpu.shares = 1024) 
c = 3 tasks (cpu.shares = 1024) 

a, b and c are task groups that have equal weight. We would expect each
of the groups to receive 33.33% of cpu bandwidth under a fair scheduler.

This is what I get with the latest scheduler git tree:


Col1 | Col2    | Col3  | Col4
-----|---------|-------|---------------------------------------------------------
a    | 277.676 | 57.8% | 54.1%  54.1%  54.1%  54.2%  56.7%  62.2%  62.8%  64.5%
b    | 116.108 | 24.2% | 47.4%  48.1%  48.7%  49.3%
c    |  86.326 | 18.0% | 47.5%  47.9%  48.5%


Explanation of output:

Col1 -> Group name
Col2 -> Cumulative execution time (in seconds) received by all tasks of
        that group in a 60sec window across 8 cpus
Col3 -> CPU bandwidth received by the group in the 60sec window, expressed
        as a percentage. Col3 is derived as:
            Col3 = 100 * Col2 / (NR_CPUS * 60)
Col4 -> CPU bandwidth received by each individual task of the group:
            Col4 = 100 * cpu_time_recd_by_task / 60

[I can share the test case that produces similar output if required]

The deviation from desired group fairness is as below:

a = +24.47%
b = -9.13%
c = -15.33%

which is quite high.

After the patch below is applied, here are the results:


Col1 | Col2    | Col3  | Col4
-----|---------|-------|---------------------------------------------------------
a    | 163.112 | 34.0% | 33.2%  33.4%  33.5%  33.5%  33.7%  34.4%  34.8%  35.3%
b    | 156.220 | 32.5% | 63.3%  64.5%  66.1%  66.5%
c    | 160.653 | 33.5% | 85.8%  90.6%  91.4%


Deviation from desired group fairness is as below:

a = +0.67%
b = -0.83%
c = +0.17%

which is far better IMO. Most other runs have yielded a deviation of at
most +-2%, which is good.

Why do we see bad (group) fairness with the current scheduler?
==============================================================

Currently a cpu's weight is just the sum of individual task weights.
This can yield incorrect results. For example, consider three groups as
below on a 2-cpu system:

CPU0        CPU1
-----------------
A (10)      B (5)
            C (5)
-----------------

Group A has 10 tasks, all on CPU0; groups B and C have 5 tasks each, all
on CPU1. Each task has the same weight (NICE_0_LOAD = 1024).

The current scheme would yield a cpu weight of 10240 (10*1024) for each cpu, so
the load balancer will think both CPUs are perfectly balanced and won't
move any tasks around. This, however, would yield the following bandwidth:

A = 50%
B = 25%
C = 25%

which is not the desired result.

What's changing in the patch?
=============================

- How cpu weights are calculated when CONFIG_FAIR_GROUP_SCHED is
  defined (see below)
- API Change 
- Two tunables introduced in sysfs (under SCHED_DEBUG) to 
  control the frequency at which the load balance monitor
  thread runs. 

The basic change made in this patch is how cpu weight (rq->load.weight) is 
calculated. It is now calculated as the summation of group weights on a cpu,
rather than the summation of task weights. Weight exerted by a group on a
cpu depends on the shares allocated to it and also on the number of
tasks the group has on that cpu compared to the total number of
(runnable) tasks the group has in the system.

Let,
W(K,i)  = Weight of group K on cpu i
T(K,i)  = Task load present in group K's cfs_rq on cpu i
T(K)= Total task load of group K across various cpus
S(K)= Shares allocated to group K
NRCPUS  = Number of online cpus in the scheduler domain to
  which group K is assigned.

Then,
W(K,i) = S(K) * NRCPUS * T(K,i) / T(K)

A load balance monitor thread is created at bootup, which periodically
runs and adjusts each group's weight on each cpu. To limit its overhead, two
min/max tunables are introduced (under SCHED_DEBUG) to control the rate at which
it runs.


Signed-off-by: Srivatsa Vaddagiri <[EMAIL PROTECTED]>

---
 include/linux/sched.h |4 
 kernel/sched.c|  265 --
 kernel/sched_fair.c   |   86 ++--
 kernel/sysctl.c   |   18 +++
 4 files changed, 334 insertions(+), 39 deletions(-)

Index: current/include/linux/sched.h
===
--- 

[Patch 3/4 v2] sched: change how cpu load is calculated

2007-11-25 Thread Srivatsa Vaddagiri

This patch changes how the cpu load exerted by fair_sched_class tasks
is calculated. Load exerted by fair_sched_class tasks on a cpu is now a
summation of the group weights, rather than summation of task weights.
Weight exerted by a group on a cpu is dependent on the shares allocated
to it.

This version of the patch (v2 of Patch 3/4) has a minor impact on code size
(but should have no runtime/functional impact) for the !CONFIG_FAIR_GROUP_SCHED 
case, but the overall code, IMHO, is neater compared to v1 of Patch 3/4
(because of fewer #ifdefs).

I prefer v2 of Patch 3/4.

Signed-off-by: Srivatsa Vaddagiri <[EMAIL PROTECTED]>

---
 kernel/sched.c  |   27 +++
 kernel/sched_fair.c |   31 +++
 kernel/sched_rt.c   |2 ++
 3 files changed, 40 insertions(+), 20 deletions(-)

Index: current/kernel/sched.c
===
--- current.orig/kernel/sched.c
+++ current/kernel/sched.c
@@ -869,6 +869,16 @@
   struct rq_iterator *iterator);
 #endif
 
+static inline void inc_cpu_load(struct rq *rq, unsigned long load)
+{
+   update_load_add(&rq->load, load);
+}
+
+static inline void dec_cpu_load(struct rq *rq, unsigned long load)
+{
+   update_load_sub(&rq->load, load);
+}
+
 #include "sched_stats.h"
 #include "sched_idletask.c"
 #include "sched_fair.c"
@@ -879,26 +889,14 @@
 
 #define sched_class_highest (&rt_sched_class)
 
-static inline void inc_load(struct rq *rq, const struct task_struct *p)
-{
-   update_load_add(&rq->load, p->se.load.weight);
-}
-
-static inline void dec_load(struct rq *rq, const struct task_struct *p)
-{
-   update_load_sub(&rq->load, p->se.load.weight);
-}
-
 static void inc_nr_running(struct task_struct *p, struct rq *rq)
 {
rq->nr_running++;
-   inc_load(rq, p);
 }
 
 static void dec_nr_running(struct task_struct *p, struct rq *rq)
 {
rq->nr_running--;
-   dec_load(rq, p);
 }
 
 static void set_load_weight(struct task_struct *p)
@@ -4070,10 +4068,8 @@
goto out_unlock;
}
on_rq = p->se.on_rq;
-   if (on_rq) {
+   if (on_rq)
dequeue_task(rq, p, 0);
-   dec_load(rq, p);
-   }
 
p->static_prio = NICE_TO_PRIO(nice);
set_load_weight(p);
@@ -4083,7 +4079,6 @@
 
if (on_rq) {
enqueue_task(rq, p, 0);
-   inc_load(rq, p);
/*
 * If the task increased its priority or is running and
 * lowered its priority, then reschedule its CPU:
Index: current/kernel/sched_fair.c
===
--- current.orig/kernel/sched_fair.c
+++ current/kernel/sched_fair.c
@@ -755,15 +755,26 @@
 static void enqueue_task_fair(struct rq *rq, struct task_struct *p, int wakeup)
 {
struct cfs_rq *cfs_rq;
-   struct sched_entity *se = &p->se;
+   struct sched_entity *se = &p->se,
+   *topse = NULL;  /* Highest schedulable entity */
+   int incload = 1;
 
for_each_sched_entity(se) {
-   if (se->on_rq)
+   topse = se;
+   if (se->on_rq) {
+   incload = 0;
break;
+   }
cfs_rq = cfs_rq_of(se);
enqueue_entity(cfs_rq, se, wakeup);
wakeup = 1;
}
+   /* Increment cpu load if we just enqueued the first task of a group on
+* 'rq->cpu'. 'topse' represents the group to which task 'p' belongs
+* at the highest grouping level.
+*/
+   if (incload)
+   inc_cpu_load(rq, topse->load.weight);
 }
 
 /*
@@ -774,16 +785,28 @@
 static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int sleep)
 {
struct cfs_rq *cfs_rq;
-   struct sched_entity *se = &p->se;
+   struct sched_entity *se = &p->se,
+   *topse = NULL;  /* Highest schedulable entity */
+   int decload = 1;
 
for_each_sched_entity(se) {
+   topse = se;
cfs_rq = cfs_rq_of(se);
dequeue_entity(cfs_rq, se, sleep);
/* Don't dequeue parent if it has other entities besides us */
-   if (cfs_rq->load.weight)
+   if (cfs_rq->load.weight) {
+   if (parent_entity(se))
+   decload = 0;
break;
+   }
sleep = 1;
}
+   /* Decrement cpu load if we just dequeued the last task of a group on
+* 'rq->cpu'. 'topse' represents the group to which task 'p' belongs
+* at the highest grouping level.
+*/
+   if (decload)
+   dec_cpu_load(rq, topse->load.weight);
 }
 
 /*
Index: current/kernel/sched_rt.c
===
--- current.orig/kernel/sched_rt.c
+++ current/kernel/sched_rt.c
@@ 

Re: bonding sysfs output

2007-11-25 Thread Andrew Morton
On Sun, 25 Nov 2007 16:12:57 +0100 Wagner Ferenc <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> Am I totally off the mark with the attached patch against
> drivers/net/bonding/bond_sysfs.c?  I'd like to receive some comments,
> as I'm not a kernel developer.

Please always cc [EMAIL PROTECTED] on networking-related matters.

> I propose it as a fix for trailing NULs and spaces like eg.
> 
> $ od -c /sys/class/net/bond0/bonding/slaves 
> 0000000   e   t   h   -   l   e   f   t       e   t   h   -   r   i   g
> 0000020   h   t      \n  \0
> 0000025
> 
> I'm afraid there are other problems with "++more++" handling, but let's
> not consider those just yet.  Find the patch attached.  The first
> hunks also rename buffer to buf, for consistency's sake.
> 
> The original version had varying behaviour for Not Applicable cases.
> This patch also settles for empty files (not even a line feed) in
> those cases, but I'm not sure about the general policy on this matter.
> 

hm, there are a lot of changes there.  Were they all actually needed to fix
the one bug which you have described?


--- bond_sysfs.c.orig   2007-11-16 19:14:27.0 +0100
+++ bond_sysfs.c2007-11-25 16:01:23.092973099 +0100
@@ -74,7 +74,7 @@
  * "show" function for the bond_masters attribute.
  * The class parameter is ignored.
  */
-static ssize_t bonding_show_bonds(struct class *cls, char *buffer)
+static ssize_t bonding_show_bonds(struct class *cls, char *buf)
 {
int res = 0;
struct bonding *bond;
@@ -86,14 +86,13 @@
/* not enough space for another interface name */
if ((PAGE_SIZE - res) > 10)
res = PAGE_SIZE - 10;
-   res += sprintf(buffer + res, "++more++");
+   res += sprintf(buf + res, "++more++ ");
break;
}
-   res += sprintf(buffer + res, "%s ",
+   res += sprintf(buf + res, "%s ",
   bond->dev->name);
}
-   res += sprintf(buffer + res, "\n");
-   res++;
+   if (res) buf[res-1] = '\n'; /* eat the leftover space */
up_read(&(bonding_rwsem));
return res;
 }
@@ -237,14 +236,13 @@
/* not enough space for another interface name */
if ((PAGE_SIZE - res) > 10)
res = PAGE_SIZE - 10;
-   res += sprintf(buf + res, "++more++");
+   res += sprintf(buf + res, "++more++ ");
break;
}
res += sprintf(buf + res, "%s ", slave->dev->name);
}
read_unlock_bh(&bond->lock);
-   res += sprintf(buf + res, "\n");
-   res++;
+   if (res) buf[res-1] = '\n'; /* eat the leftover space */
return res;
 }
 
@@ -401,7 +399,7 @@
 
return sprintf(buf, "%s %d\n",
bond_mode_tbl[bond->params.mode].modename,
-   bond->params.mode) + 1;
+   bond->params.mode);
 }
 
 static ssize_t bonding_store_mode(struct device *d,
@@ -452,17 +450,14 @@
  struct device_attribute *attr,
  char *buf)
 {
-   int count;
+   int count = 0;
struct bonding *bond = to_bond(d);
 
-   if ((bond->params.mode != BOND_MODE_XOR) &&
-   (bond->params.mode != BOND_MODE_8023AD)) {
-   // Not Applicable
-   count = sprintf(buf, "NA\n") + 1;
-   } else {
+   if ((bond->params.mode == BOND_MODE_XOR) ||
+   (bond->params.mode == BOND_MODE_8023AD)) {
count = sprintf(buf, "%s %d\n",
xmit_hashtype_tbl[bond->params.xmit_policy].modename,
-   bond->params.xmit_policy) + 1;
+   bond->params.xmit_policy);
}
 
return count;
@@ -522,7 +517,7 @@
 
return sprintf(buf, "%s %d\n",
   arp_validate_tbl[bond->params.arp_validate].modename,
-  bond->params.arp_validate) + 1;
+  bond->params.arp_validate);
 }
 
 static ssize_t bonding_store_arp_validate(struct device *d,
@@ -574,7 +569,7 @@
 {
struct bonding *bond = to_bond(d);
 
-   return sprintf(buf, "%d\n", bond->params.arp_interval) + 1;
+   return sprintf(buf, "%d\n", bond->params.arp_interval);
 }
 
 static ssize_t bonding_store_arp_interval(struct device *d,
@@ -671,10 +666,7 @@
res += sprintf(buf + res, "%u.%u.%u.%u ",
   NIPQUAD(bond->params.arp_targets[i]));
}
-   if (res)
-   res--;  /* eat the leftover space */
-   res += sprintf(buf + res, "\n");
-   res++;
+   if (res) buf[res-1] = '\n'; /* eat the leftover space */
return res;
 }
 
@@ -775,7 +767,7 @@
 {
struct bonding *bond = to_bond(d);
 
-   return 

[Patch 3/4 v1] sched: change how cpu load is calculated

2007-11-25 Thread Srivatsa Vaddagiri
This patch changes how the cpu load exerted by fair_sched_class tasks
is calculated. Load exerted by fair_sched_class tasks on a cpu is now a
summation of the group weights, rather than summation of task weights.
Weight exerted by a group on a cpu is dependent on the shares allocated to it.

This version of patch (v1 of Patch 3/4) has zero impact for 
!CONFIG_FAIR_GROUP_SCHED case.

Signed-off-by: Srivatsa Vaddagiri <[EMAIL PROTECTED]>

---
 kernel/sched.c  |   38 ++
 kernel/sched_fair.c |   31 +++
 kernel/sched_rt.c   |2 ++
 3 files changed, 59 insertions(+), 12 deletions(-)

Index: current/kernel/sched.c
===
--- current.orig/kernel/sched.c
+++ current/kernel/sched.c
@@ -869,15 +869,25 @@
   struct rq_iterator *iterator);
 #endif
 
-#include "sched_stats.h"
-#include "sched_idletask.c"
-#include "sched_fair.c"
-#include "sched_rt.c"
-#ifdef CONFIG_SCHED_DEBUG
-# include "sched_debug.c"
-#endif
+#ifdef CONFIG_FAIR_GROUP_SCHED
 
-#define sched_class_highest (&rt_sched_class)
+static inline void inc_cpu_load(struct rq *rq, unsigned long load)
+{
+   update_load_add(&rq->load, load);
+}
+
+static inline void dec_cpu_load(struct rq *rq, unsigned long load)
+{
+   update_load_sub(&rq->load, load);
+}
+
+static inline void inc_load(struct rq *rq, const struct task_struct *p) { }
+static inline void dec_load(struct rq *rq, const struct task_struct *p) { }
+
+#else  /* CONFIG_FAIR_GROUP_SCHED */
+
+static inline void inc_cpu_load(struct rq *rq, unsigned long load) { }
+static inline void dec_cpu_load(struct rq *rq, unsigned long load) { }
 
 static inline void inc_load(struct rq *rq, const struct task_struct *p)
 {
@@ -889,6 +899,18 @@
update_load_sub(&rq->load, p->se.load.weight);
 }
 
+#endif /* CONFIG_FAIR_GROUP_SCHED */
+
+#include "sched_stats.h"
+#include "sched_idletask.c"
+#include "sched_fair.c"
+#include "sched_rt.c"
+#ifdef CONFIG_SCHED_DEBUG
+# include "sched_debug.c"
+#endif
+
+#define sched_class_highest (&rt_sched_class)
+
 static void inc_nr_running(struct task_struct *p, struct rq *rq)
 {
rq->nr_running++;
Index: current/kernel/sched_fair.c
===
--- current.orig/kernel/sched_fair.c
+++ current/kernel/sched_fair.c
@@ -755,15 +755,26 @@
 static void enqueue_task_fair(struct rq *rq, struct task_struct *p, int wakeup)
 {
struct cfs_rq *cfs_rq;
-   struct sched_entity *se = &p->se;
+   struct sched_entity *se = &p->se,
+   *topse = NULL;  /* Highest schedulable entity */
+   int incload = 1;
 
for_each_sched_entity(se) {
-   if (se->on_rq)
+   topse = se;
+   if (se->on_rq) {
+   incload = 0;
break;
+   }
cfs_rq = cfs_rq_of(se);
enqueue_entity(cfs_rq, se, wakeup);
wakeup = 1;
}
+   /* Increment cpu load if we just enqueued the first task of a group on
+* 'rq->cpu'. 'topse' represents the group to which task 'p' belongs
+* at the highest grouping level.
+*/
+   if (incload)
+   inc_cpu_load(rq, topse->load.weight);
 }
 
 /*
@@ -774,16 +785,28 @@
 static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int sleep)
 {
struct cfs_rq *cfs_rq;
-   struct sched_entity *se = &p->se;
+   struct sched_entity *se = &p->se,
+   *topse = NULL;  /* Highest schedulable entity */
+   int decload = 1;
 
for_each_sched_entity(se) {
+   topse = se;
cfs_rq = cfs_rq_of(se);
dequeue_entity(cfs_rq, se, sleep);
/* Don't dequeue parent if it has other entities besides us */
-   if (cfs_rq->load.weight)
+   if (cfs_rq->load.weight) {
+   if (parent_entity(se))
+   decload = 0;
break;
+   }
sleep = 1;
}
+   /* Decrement cpu load if we just dequeued the last task of a group on
+* 'rq->cpu'. 'topse' represents the group to which task 'p' belongs
+* at the highest grouping level.
+*/
+   if (decload)
+   dec_cpu_load(rq, topse->load.weight);
 }
 
 /*
Index: current/kernel/sched_rt.c
===
--- current.orig/kernel/sched_rt.c
+++ current/kernel/sched_rt.c
@@ -31,6 +31,7 @@
 
list_add_tail(&p->run_list, array->queue + p->prio);
__set_bit(p->prio, array->bitmap);
+   inc_cpu_load(rq, p->se.load.weight);
 }
 
 /*
@@ -45,6 +46,7 @@
list_del(&p->run_list);
if (list_empty(array->queue + p->prio))
__clear_bit(p->prio, array->bitmap);
+   dec_cpu_load(rq, p->se.load.weight);
 }
 
 /*



[PATCH 2/4] sched: minor fixes for group scheduler

2007-11-25 Thread Srivatsa Vaddagiri

Minor bug fixes for group scheduler:

- Use a mutex to serialize add/remove of task groups and also when
  changing shares of a task group. Use the same mutex when printing cfs_rq
  stats for various task groups.
- Use list_for_each_entry_rcu in for_each_leaf_cfs_rq macro (when walking task 
  group list)


Signed-off-by: Srivatsa Vaddagiri <[EMAIL PROTECTED]>

---
 kernel/sched.c  |   33 +
 kernel/sched_fair.c |4 +++-
 2 files changed, 28 insertions(+), 9 deletions(-)

Index: current/kernel/sched.c
===
--- current.orig/kernel/sched.c
+++ current/kernel/sched.c
@@ -169,8 +169,6 @@ struct task_group {
/* runqueue "owned" by this group on each cpu */
struct cfs_rq **cfs_rq;
unsigned long shares;
-   /* spinlock to serialize modification to shares */
-   spinlock_t lock;
struct rcu_head rcu;
 };
 
@@ -182,6 +180,11 @@ static DEFINE_PER_CPU(struct cfs_rq, ini
 static struct sched_entity *init_sched_entity_p[NR_CPUS];
 static struct cfs_rq *init_cfs_rq_p[NR_CPUS];
 
+/* task_group_mutex serializes add/remove of task groups and also changes to
+ * a task group's cpu shares.
+ */
+static DEFINE_MUTEX(task_group_mutex);
+
 /* Default task group.
  * Every task in system belong to this group at bootup.
  */
@@ -222,9 +225,21 @@ static inline void set_task_cfs_rq(struc
p->se.parent = task_group(p)->se[cpu];
 }
 
+static inline void lock_task_group_list(void)
+{
+   mutex_lock(&task_group_mutex);
+}
+
+static inline void unlock_task_group_list(void)
+{
+   mutex_unlock(&task_group_mutex);
+}
+
 #else
 
 static inline void set_task_cfs_rq(struct task_struct *p, unsigned int cpu) { }
+static inline void lock_task_group_list(void) { }
+static inline void unlock_task_group_list(void) { }
 
 #endif /* CONFIG_FAIR_GROUP_SCHED */
 
@@ -6747,7 +6762,6 @@ void __init sched_init(void)
se->parent = NULL;
}
init_task_group.shares = init_task_group_load;
-   spin_lock_init(&init_task_group.lock);
 #endif
 
for (j = 0; j < CPU_LOAD_IDX_MAX; j++)
@@ -6987,14 +7001,15 @@ struct task_group *sched_create_group(vo
se->parent = NULL;
}
 
+   tg->shares = NICE_0_LOAD;
+
+   lock_task_group_list();
for_each_possible_cpu(i) {
rq = cpu_rq(i);
cfs_rq = tg->cfs_rq[i];
list_add_rcu(&cfs_rq->leaf_cfs_rq_list, &rq->leaf_cfs_rq_list);
}
-
-   tg->shares = NICE_0_LOAD;
-   spin_lock_init(&tg->lock);
+   unlock_task_group_list();
 
return tg;
 
@@ -7040,10 +7055,12 @@ void sched_destroy_group(struct task_gro
struct cfs_rq *cfs_rq = NULL;
int i;
 
+   lock_task_group_list();
for_each_possible_cpu(i) {
cfs_rq = tg->cfs_rq[i];
list_del_rcu(&cfs_rq->leaf_cfs_rq_list);
}
+   unlock_task_group_list();
 
BUG_ON(!cfs_rq);
 
@@ -7117,7 +7134,7 @@ int sched_group_set_shares(struct task_g
 {
int i;
 
-   spin_lock(&tg->lock);
+   lock_task_group_list();
if (tg->shares == shares)
goto done;
 
@@ -7126,7 +7143,7 @@ int sched_group_set_shares(struct task_g
set_se_shares(tg->se[i], shares);
 
 done:
-   spin_unlock(&tg->lock);
+   unlock_task_group_list();
return 0;
 }
 
Index: current/kernel/sched_fair.c
===
--- current.orig/kernel/sched_fair.c
+++ current/kernel/sched_fair.c
@@ -685,7 +685,7 @@ static inline struct cfs_rq *cpu_cfs_rq(
 
 /* Iterate thr' all leaf cfs_rq's on a runqueue */
 #define for_each_leaf_cfs_rq(rq, cfs_rq) \
-   list_for_each_entry(cfs_rq, &rq->leaf_cfs_rq_list, leaf_cfs_rq_list)
+   list_for_each_entry_rcu(cfs_rq, &rq->leaf_cfs_rq_list, leaf_cfs_rq_list)
 
 /* Do the two (enqueued) entities belong to the same group ? */
 static inline int
@@ -1126,7 +1126,9 @@ static void print_cfs_stats(struct seq_f
 #ifdef CONFIG_FAIR_GROUP_SCHED
print_cfs_rq(m, cpu, &cpu_rq(cpu)->cfs);
 #endif
+   lock_task_group_list();
for_each_leaf_cfs_rq(cpu_rq(cpu), cfs_rq)
print_cfs_rq(m, cpu, cfs_rq);
+   unlock_task_group_list();
 }
 #endif

-- 
Regards,
vatsa


[PATCH 1/4] sched: code cleanup

2007-11-25 Thread Srivatsa Vaddagiri

Minor cleanups:

- Fix coding style
- remove obsolete comment

Signed-off-by: Srivatsa Vaddagiri <[EMAIL PROTECTED]>

---
 kernel/sched.c |   21 +++--
 1 files changed, 3 insertions(+), 18 deletions(-)

Index: current/kernel/sched.c
===
--- current.orig/kernel/sched.c
+++ current/kernel/sched.c
@@ -191,12 +191,12 @@ struct task_group init_task_group = {
 };
 
 #ifdef CONFIG_FAIR_USER_SCHED
-# define INIT_TASK_GRP_LOAD	2*NICE_0_LOAD
+# define INIT_TASK_GROUP_LOAD  2*NICE_0_LOAD
 #else
-# define INIT_TASK_GRP_LOAD	NICE_0_LOAD
+# define INIT_TASK_GROUP_LOAD  NICE_0_LOAD
 #endif
 
-static int init_task_group_load = INIT_TASK_GRP_LOAD;
+static int init_task_group_load = INIT_TASK_GROUP_LOAD;
 
 /* return group to which a task belongs */
 static inline struct task_group *task_group(struct task_struct *p)
@@ -864,21 +864,6 @@ iter_move_one_task(struct rq *this_rq, i
 
 #define sched_class_highest (&rt_sched_class)
 
-/*
- * Update delta_exec, delta_fair fields for rq.
- *
- * delta_fair clock advances at a rate inversely proportional to
- * total load (rq->load.weight) on the runqueue, while
- * delta_exec advances at the same rate as wall-clock (provided
- * cpu is not idle).
- *
- * delta_exec / delta_fair is a measure of the (smoothened) load on this
- * runqueue over any given interval. This (smoothened) load is used
- * during load balance.
- *
- * This function is called /before/ updating rq->load
- * and when switching tasks.
- */
 static inline void inc_load(struct rq *rq, const struct task_struct *p)
 {
update_load_add(&rq->load, p->se.load.weight);

-- 
Regards,
vatsa


[PATCH 0/4] sched: group scheduler related patches (V3)

2007-11-25 Thread Srivatsa Vaddagiri
Here's V3 of the group scheduler related patches, which mainly address 
improved fairness of cpu bandwidth allocation for task groups.

Patch 1/4   -> coding style cleanup
Patch 2/4   -> Minor group scheduling related bug fixes

Patch 3/4 (v1)  -> Modifies how cpu load is calculated, such that there is zero
   impact on !CONFIG_FAIR_GROUP_SCHED
Patch 3/4 (v2)  -> Modifies how cpu load is calculated, such that there is a
   small impact on code size (but should have NO impact on
   functionality or runtime behavior) for
   !CONFIG_FAIR_GROUP_SCHED case. The resulting code however is
   much neater since it avoids some #ifdefs. I prefer v2.

Patch 4/4   -> Updates load balance logic to provide improved fairness for
   task groups.

To have zero impact on !CONFIG_FAIR_GROUP_SCHED case, please apply the
following patches:

- Patch 1/4
- Patch 2/4 
- Patch 3/4 (v1)
- Patch 4/4

I personally prefer v2 of Patch 3/4. Even though it has a minor impact
on code size for !CONFIG_FAIR_GROUP_SCHED case, the overall code is much
neater IMHO.

Impact on sched.o size:
=======================

!CONFIG_FAIR_GROUP_SCHED:

   text    data     bss     dec     hex filename
  36829    2766      48   39643    9adb sched.o-before-nofgs
  36829    2766      48   39643    9adb sched.o-after-v1-nofgs (v1 of Patch 3/4)
  36843    2766      48   39657    9ae9 sched.o-after-v2-nofgs (v2 of Patch 3/4)

CONFIG_FAIR_GROUP_SCHED:

   text    data     bss     dec     hex filename
  39019    3346     336   42701    a6cd sched.o-before-fgs
  40303    3482     308   44093    ac3d sched.o-after-v1-fgs (v1 of Patch 3/4)
  40303    3482     308   44093    ac3d sched.o-after-v2-fgs (v2 of Patch 3/4)


Changes since V2 of this patchset [1]

- Split the patches better and make them pass the checkpatch.pl
  script
- Fixed compile issues under different config options and also
  a suspend failure (as posted by Ingo at [2])
- Make load_balance_monitor thread run as real-time task,
  so that its execution is not limited by shares allocated to
  default task group (init_task_group).
- Reduced the minimum shares that can be allocated to a group to 1
  (from 100). This is useful if someone wants a task group
  to get very low bandwidth or to get bandwidth only when other groups
  are idle.
- Removed the tg->last_total_load check in rebalance_shares()
  (which was incorrect in V2)

Changes since V1 of this patchset [3]:

- Introduced a task_group_mutex to serialize addition/removal of task groups (as 
  pointed out by Dipankar)

Please apply if there are no major concerns.


References:

1. http://marc.info/?l=linux-kernel&m=119549585223262
2. http://lkml.org/lkml/2007/11/19/127
3. http://marc.info/?l=linux-kernel&m=119547452517055


-- 
Regards,
vatsa


Re: Question regarding naming scheme (HP Jornada 6XX/7XX)

2007-11-25 Thread Paul Mundt
On Mon, Nov 26, 2007 at 12:03:29AM +0100, Kristoffer Ericson wrote:
> For instance, an HP 620 user thought that their system was unsupported
> because everything was for '680'. Or, the other way round, 728 users
> didn't want to use 720 since they thought they would lose their extra
> RAM (the only difference between the versions).
> 
How exactly is changing from 6XX to 600 going to change this? If users
are confused, then you should be documenting this distinction better and
working on clearing up the confusion. I'm all for making things obvious
to the end user, but there gets to be a point where it just becomes
silly.

> Why I want to use 600-series/700-series instead of 6XX/7XX is simply
> because 600-series/700-series leaves no doubt.
> 
Apparently your end users are more technically apt than I am, as I have
no idea how using 00 over XX makes things any less ambiguous.

We already have a 6xx mach-type that drivers can set their dependency on.
If it's not 680-only, then that's a perfectly reasonable dependency. Feel
free to change the Kconfig text to make the description more useful, but
please don't start idly shuffling around code and symbols because users
can't work out why a driver is available that they can't support.

Besides, the kernel frowns upon recursion, and all you need is to find
two equally confused users with differing viewpoints to hit imminent
death (whether self-inflicted or otherwise).


RE: [PATCH 1/2] msi: set 'En' bit of MSI Mapping Capability on HT platform

2007-11-25 Thread Peer Chen
I think the following lines would also work for bridges other than
NVIDIA's :)
===
+   if (pci_enable_msi_ht_cap(dev) != 0)
+           return 0;
+
+   /* Get upstream bridge device handle */
+   bridge_dev = dev->bus->self;
+   while (bridge_dev != NULL) {
+           if (pci_enable_msi_ht_cap(bridge_dev) != 0)
+                   return 0;
+           bridge_dev = bridge_dev->bus->self;
+   }
+
+   return 1;
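The fragment's logic (try the device itself, then each upstream bridge in turn, until one accepts the HT MSI mapping enable) can be modelled outside the kernel. In this sketch struct dev, its fields and both functions are hypothetical stand-ins for struct pci_dev, dev->bus->self and pci_enable_msi_ht_cap(). Treating bit 0 of the capability's flags byte as 'En' is an assumption, as is reading a nonzero return from the enable helper as "capability found and enabled", which is how the quoted code appears to use it.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 'En' is bit 0 of the HT MSI Mapping capability flags byte. */
#define HT_MSI_EN 0x01u

/* Hypothetical, heavily simplified device model. */
struct dev {
    struct dev *upstream;   /* stands in for dev->bus->self; NULL at the top */
    int has_ht_msi_cap;     /* does this device carry the capability? */
    uint8_t ht_msi_flags;   /* flags byte of that capability */
};

/* Stand-in for pci_enable_msi_ht_cap(): set 'En' if the capability exists.
 * Returns nonzero when the mapping was enabled. */
static int enable_ht_msi(struct dev *d)
{
    if (!d->has_ht_msi_cap)
        return 0;
    d->ht_msi_flags |= HT_MSI_EN;
    return 1;
}

/* Start at the device and walk upstream, stopping at the first device whose
 * mapping could be enabled. Returns 0 if one was enabled and 1 if none was
 * found, following the quoted fragment's return values. */
static int enable_ht_msi_upstream(struct dev *d)
{
    for (; d != NULL; d = d->upstream)
        if (enable_ht_msi(d))
            return 0;
    return 1;
}
```

Folding the initial check on the device into the loop gives the same behaviour as the quoted if/while pair, and nothing in it is specific to one chipset vendor.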


BRs
Peer Chen

-Original Message-
From: Robert Hancock [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 26, 2007 2:34 AM
To: peerchen
Cc: linux-kernel; akpm; Peer Chen; Andy Currid
Subject: Re: [PATCH 1/2] msi: set 'En' bit of MSI Mapping Capability on
HT platform

peerchen wrote:
> According to the HyperTransport spec, 'En' indicates whether the MSI Mapping
> is active, so it should be set when enabling MSI.
> 
> The patch is based on kernel 2.6.24-rc3
> 
> Signed-off-by: Andy Currid <[EMAIL PROTECTED]>
> Signed-off-by: Peer Chen <[EMAIL PROTECTED]>

Isn't there a way we can make this work for any upstream HT bridge,
rather than only for specific NVIDIA chipsets?

-- 
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED] Home Page:
http://www.roberthancock.com/

---
This email message is for the sole use of the intended recipient(s) and may
contain confidential information.  Any unauthorized review, use, disclosure or
distribution is prohibited.  If you are not the intended recipient, please
contact the sender by reply email and destroy all copies of the original message.
---


Re: [RFC/PATCH] Export force_sig_info

2007-11-25 Thread Jeremy Kerr
Hi Andrew,

> Perhaps export it from within a powerpc-specific C file (along with
> suitable comment) to prevent people from generally relying upon the
> export?

Even better, I'll export it from a Cell-specific C file. I'll follow 
this up in my own spufs series for .25.

Cheers,


Jeremy






Re: [PATCH] net/irda/parameters.c: Trivial fixes

2007-11-25 Thread Richard Knutsson

Samuel Ortiz wrote:
> Hi Richard,
>
> On Sat, Nov 24, 2007 at 09:44:05PM +0100, Richard Knutsson wrote:
>> Make a single va_start() -> va_end() path + fixing:
>
> Ok, this should be 2 separate patches then.

I thought about it, but they were so simple that I believed they were
better merged...

> The warning fixes are all good, but I fail to see the point of the va_end()
> one. That doesn't seem to bring any sort of improvement while adding one
> variable to the stack and one loop test. Any explanation here?

Not really. Many people seem to like a single return, and since this made
it one va_end() for every va_start(), I thought it would be appropriate. But
if not, I will just filter this hit out of my va_start()->va_end() testing
and get going.

> I'll push the warning fix for now, thanks.

Alright, thank you.

> Cheers,
> Samuel.

  

cu
Richard Knutsson



Re: [RFC/PATCH] SO_NO_CHECK for IPv6

2007-11-25 Thread Herbert Xu
David Schwartz <[EMAIL PROTECTED]> wrote:
>
> Exactly. But *he* doesn't need to check that checksum, given that he already
> got the packet, since he has an upper-level checksum. He is not saying that
> his reasoning applies to everyone, just that it applies to him. He is not
> talking about disabling the send checksum, but the receive checksum. He
> knows that he does not need it.

You must be in some other thread because this one started with
a patch to disable sender checksums.

Oh and please do keep CCs on this list.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


irq on nforce4 and realtek RTL8168B

2007-11-25 Thread cecco
Hello,

I have an ASUS A6T with an nForce4 chipset and a Realtek RTL8111/8168B
ethernet controller. I am testing kernel 2.6.24-rc3-git1 on this
notebook and I have noticed strange IRQ behaviour.
I pass hpet=force and acpi_use_timer_override to enable the
APIC; otherwise the timer interrupt stays in the old XT-PIC mode.
Unfortunately, IRQ balancing on the Turion X2 doesn't work
very well and there are extra timer interrupts. The Realtek
RTL8168B shares IRQ 17 with the NVIDIA 7600 Go card, which is
not very good: in fact, if I don't use the pci=nomsi option,
the Realtek RTL8168B comes up but doesn't transmit any packets. This
is the interrupt situation without pci=nomsi:

 CPU0   CPU1   
  0: 57  29452   IO-APIC-edge  timer
  1:  0331   IO-APIC-edge  i8042
  7:  1  0   IO-APIC-edge
  8:  0  2   IO-APIC-edge  rtc
  9:180193   IO-APIC-fasteoi   acpi
 12:   8291133   IO-APIC-edge  i8042
 14:  2   2810   IO-APIC-edge  libata
 15:   5186   2646   IO-APIC-edge  libata
 17:  0683   IO-APIC-fasteoi   nvidia
 18:  0  2   IO-APIC-fasteoi   ohci1394
 19:  1 45   IO-APIC-fasteoi   ohci_hcd:usb1
 20:  0  0   IO-APIC-fasteoi   sdhci:slot0
 21:  0292   IO-APIC-fasteoi   HDA Intel
221:  0  0   PCI-MSI-edge  eth1
NMI:  0  0   Non-maskable interrupts
LOC:  29452  8   Local timer interrupts
RES:   2799   4491   Rescheduling interrupts
CAL:128102   function call interrupts
TLB:354247   TLB shootdowns
TRM:  0  0   Thermal event interrupts
SPU:  0  0   Spurious interrupts
ERR:  1
MIS:  0

Besides, these IRQ problems seem to break the local APIC with
NO_HZ. Suspend to RAM doesn't work because of the IRQ fault:
the notebook doesn't come back up after suspend to RAM.
Unfortunately, the BIOS is very buggy and I believe ASUS
should treat Linux users better.
I invite ASUS, AMD, NVIDIA and Realtek to offer better
support for Linux, and not to violate the ACPI
specification. I wish to be personally CC'ed on the
answers/comments posted to the list in response to my
posting.

Thanks

Best Regards

Francesco


Re: [PATCH] sata_nv: don't use legacy DMA in ADMA mode (v3)

2007-11-25 Thread Tejun Heo
Robert Hancock wrote:
> We need to run any DMA command with result taskfile requested in ADMA mode
> when the port is in ADMA mode, otherwise it may try to use the legacy DMA
> engine in ADMA mode which is not allowed. Enforce this with BUG_ON() since
> data corruption could potentially result if this happened. Also, fail any
> attempt to try and issue NCQ commands with result taskfile requested, since
> the hardware doesn't allow this.
> 
> Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

Acked-by: Tejun Heo <[EMAIL PROTECTED]>

Thanks.

-- 
tejun


Re: [PATCH] net/irda/parameters.c: Trivial fixes

2007-11-25 Thread Samuel Ortiz
Hi Richard,

On Sat, Nov 24, 2007 at 09:44:05PM +0100, Richard Knutsson wrote:
> Make a single va_start() -> va_end() path + fixing:
Ok, this should be 2 separate patches then.
The warning fixes are all good, but I fail to see the point of the va_end()
one. That doesn't seem to bring any sort of improvement while adding one
variable to the stack and one loop test. Any explanation here?

I'll push the warning fix for now, thanks.

Cheers,
Samuel.


>   CHECK   /home/kernel/src/net/irda/parameters.c
> /home/kernel/src/net/irda/parameters.c:466:2: warning: Using plain integer as NULL pointer
> /home/kernel/src/net/irda/parameters.c:520:2: warning: Using plain integer as NULL pointer
> /home/kernel/src/net/irda/parameters.c:573:2: warning: Using plain integer as NULL pointer
> 
> Signed-off-by: Richard Knutsson <[EMAIL PROTECTED]>
> ---
> Compile-tested on i386 with allyesconfig and allmodconfig.
> 
> 
> diff --git a/net/irda/parameters.c b/net/irda/parameters.c
> index 2627dad..bf19071 100644
> --- a/net/irda/parameters.c
> +++ b/net/irda/parameters.c
> @@ -368,10 +368,11 @@ int irda_param_pack(__u8 *buf, char *fmt, ...)
>   va_list args;
>   char *p;
>   int n = 0;
> + int retval = 0;
>  
>   va_start(args, fmt);
>  
> - for (p = fmt; *p != '\0'; p++) {
> + for (p = fmt; *p != '\0' && retval == 0; p++) {
>   switch (*p) {
>   case 'b':  /* 8 bits unsigned byte */
>   buf[n++] = (__u8)va_arg(args, int);
> @@ -392,13 +393,12 @@ int irda_param_pack(__u8 *buf, char *fmt, ...)
>   break;
>  #endif
>   default:
> - va_end(args);
> - return -1;
> + retval = -1;
>   }
>   }
>   va_end(args);
>  
> - return 0;
> + return retval;
>  }
>  EXPORT_SYMBOL(irda_param_pack);
>  
> @@ -411,10 +411,11 @@ static int irda_param_unpack(__u8 *buf, char *fmt, ...)
>   va_list args;
>   char *p;
>   int n = 0;
> + int retval = 0;
>  
>   va_start(args, fmt);
>  
> - for (p = fmt; *p != '\0'; p++) {
> + for (p = fmt; *p != '\0' && retval == 0; p++) {
>   switch (*p) {
>   case 'b':  /* 8 bits byte */
>   arg.ip = va_arg(args, __u32 *);
> @@ -436,14 +437,13 @@ static int irda_param_unpack(__u8 *buf, char *fmt, ...)
>   break;
>  #endif
>   default:
> - va_end(args);
> - return -1;
> + retval = -1;
>   }
>  
>   }
>   va_end(args);
>  
> - return 0;
> + return retval;
>  }
>  
>  /*
> @@ -463,7 +463,7 @@ int irda_param_insert(void *self, __u8 pi, __u8 *buf, int len,
>   int n = 0;
>  
>   IRDA_ASSERT(buf != NULL, return ret;);
> - IRDA_ASSERT(info != 0, return ret;);
> + IRDA_ASSERT(info != NULL, return ret;);
>  
>   pi_minor = pi & info->pi_mask;
>   pi_major = pi >> info->pi_major_offset;
> @@ -517,7 +517,7 @@ static int irda_param_extract(void *self, __u8 *buf, int len,
>   int n = 0;
>  
>   IRDA_ASSERT(buf != NULL, return ret;);
> - IRDA_ASSERT(info != 0, return ret;);
> + IRDA_ASSERT(info != NULL, return ret;);
>  
>   pi_minor = buf[n] & info->pi_mask;
>   pi_major = buf[n] >> info->pi_major_offset;
> @@ -570,7 +570,7 @@ int irda_param_extract_all(void *self, __u8 *buf, int len,
>   int n = 0;
>  
>   IRDA_ASSERT(buf != NULL, return ret;);
> - IRDA_ASSERT(info != 0, return ret;);
> + IRDA_ASSERT(info != NULL, return ret;);
>  
>   /*
>* Parse all parameters. Each parameter must be at least two bytes
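The single-exit pattern the patch introduces can be shown in a self-contained form: the loop condition also tests retval, so an unknown format character ends the loop and control always reaches the one va_end(). pack_params() is a hypothetical, simplified stand-in for irda_param_pack(); it supports only 'b' (one byte) and 'i' (a 32-bit value, written little-endian here for a deterministic layout, which is an assumption rather than what irda does).

```c
#include <stdarg.h>
#include <stdint.h>

/* Pack arguments into buf according to fmt. Returns 0 on success and -1
 * on an unknown format character; every path goes through va_end(). */
int pack_params(uint8_t *buf, const char *fmt, ...)
{
    va_list args;
    const char *p;
    int n = 0;
    int retval = 0;

    va_start(args, fmt);
    for (p = fmt; *p != '\0' && retval == 0; p++) {
        switch (*p) {
        case 'b':               /* 8-bit unsigned byte */
            buf[n++] = (uint8_t)va_arg(args, int);
            break;
        case 'i': {             /* 32-bit value, little-endian */
            uint32_t v = (uint32_t)va_arg(args, unsigned int);
            buf[n++] = (uint8_t)v;
            buf[n++] = (uint8_t)(v >> 8);
            buf[n++] = (uint8_t)(v >> 16);
            buf[n++] = (uint8_t)(v >> 24);
            break;
        }
        default:                /* unknown format: stop, but still va_end() */
            retval = -1;
        }
    }
    va_end(args);

    return retval;
}
```

As Samuel notes, the cost is one extra local and one extra test per iteration; the original early `va_end(args); return -1;` is equally correct.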



Re: [PATCH 2/9]: Reduce Log I/O latency

2007-11-25 Thread Lachlan McIlroy

Index: 2.6.x-xfs-new/fs/xfs/xfs_log.c
===
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_log.c 2007-11-22 10:47:21.945395328 +1100
+++ 2.6.x-xfs-new/fs/xfs/xfs_log.c  2007-11-22 10:53:11.556186722 +1100
@@ -1443,6 +1443,8 @@ xlog_sync(xlog_t  *log,
XFS_BUF_ZEROFLAGS(bp);
XFS_BUF_BUSY(bp);
XFS_BUF_ASYNC(bp);
+   XFS_BUF_SET_LOGBUF(bp);
+
/*
 * Do an ordered write for the log block.
 * Its unnecessary to flush the first split block in the log wrap case.


Whichever way you go with this one, Dave, you should probably add another
XFS_BUF_SET_LOGBUF() call for the buffer split case further down in the
same function.


Re: [PATCH 23/27] x86: debugctlmsr kconfig

2007-11-25 Thread Dave Jones
On Sun, Nov 25, 2007 at 02:08:02PM -0800, Roland McGrath wrote:
 > 
 > This adds the (internal) Kconfig macro CONFIG_X86_DEBUGCTLMSR,
 > to be defined when configuring to support only hardware that
 > definitely supports MSR_IA32_DEBUGCTLMSR with the BTF flag.
 > 
 > The Intel documentation says "P6 family" and later processors all have it.
 > I think the Kconfig dependencies are right to have it set for those and
 > unset for others (i.e., when 586 and earlier are supported).

What about the non-Intel vendors ?
Was this msr present on AMD K6 ? Geode? Winchip? VIA C3 ?
If not, then this patch isn't complete. 

Dave

-- 
http://www.codemonkey.org.uk


Re: "buggy cmd640" message followed by soft lockup

2007-11-25 Thread Frans Pop
(Dropped Rafael from CC)

On Sunday 25 November 2007, Bartlomiej Zolnierkiewicz wrote:
> So either something went very very wrong or the oops itself is incorrect.
>
> Please put BUG() before the put_cmd640_reg() above so the next time
> BUG happens we will know which one is it.

I've spent quite a bit of time on this issue over the weekend and have seen 
all kinds of "interesting" behavior with various kernels with different 
debug patches, but no definite clues (except confirmation that on "good" 
boots no cmd64x hardware is detected).

At some point I scrapped the virtual machine I had been using and created a 
new one. Since then I've been unable to reproduce the problem. I'm still 
quite confused by the issue exactly because it was so consistent when it 
_did_ happen and am still not sure if it can be blamed completely on the 
quirkiness of VirtualBox.

I'll keep testing new kernels in VirtualBox and will keep alert for the 
issue, but for now I think it's best to forget about it.

Bartlomiej: thanks for your feedback and suggestions.

Cheers,
FJP


Re: [RFC] Re: nozomi version 2.1d for review

2007-11-25 Thread Michael Lothian
Hi Frank

I was wondering if you had a git tree somewhere I could pull.

Thanks

Mike

On 11/11/2007, Frank Seidel <[EMAIL PROTECTED]> wrote:
> Hello,
>
> this one also holds the - little reworked and optimized -
> cleanup of the read/write_mem32 functions.
>
> Comments and any feedback is more than welcome.
>
> Thanks a lot - especially to Jiri, Alan and Greg,
> Frank


Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.

2007-11-25 Thread Rusty Russell
On Monday 26 November 2007 07:27:03 Roland Dreier wrote:
>  > This patch allows to export symbols only for specific modules by
>  > introducing symbol name spaces. A module name space has a white
>  > list of modules that are allowed to import symbols for it; all others
>  > can't use the symbols.
>  >
>  > It adds two new macros:
>  >
>  > MODULE_NAMESPACE_ALLOW(namespace, module);
>
> I definitely like the idea of organizing exported symbols into
> namespaces.  However, I feel like it would make more sense to have
> something like
>
> MODULE_NAMESPACE_IMPORT(namespace);

Except C doesn't have namespaces and this mechanism doesn't create them.  So
this is just complete and utter makework; as I said before, no one's going to
confuse all those udp_* functions if they're not in the udp namespace.

For better or worse, this is not C++.

Rusty.
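To make the proposal under discussion concrete, here is a small userspace model of the whitelist check described in the quoted patch: a symbol exported into a namespace may only be imported by modules listed for that namespace, while symbols outside any namespace stay available to everyone. Everything here (struct ns_allow, the table, ns_import_allowed()) is hypothetical and only models the idea; it is not the patch's code. The table entries and namespace name are invented, loosely following the SCSI example (sd, sg versus aic7xxx) raised in this discussion.

```c
#include <stddef.h>
#include <string.h>

/* One whitelist entry, as MODULE_NAMESPACE_ALLOW(namespace, module)
 * might record it. */
struct ns_allow {
    const char *ns;
    const char *module;
};

/* Invented example entries. */
static const struct ns_allow allow_list[] = {
    { "scsi_internal", "sd_mod" },
    { "scsi_internal", "sg" },
};

/* Return 1 if `module` may import a symbol exported into namespace `ns`
 * (NULL meaning "no namespace"), 0 otherwise. */
static int ns_import_allowed(const char *ns, const char *module)
{
    size_t i;

    if (ns == NULL)
        return 1;   /* ordinary export: any module may use it */
    for (i = 0; i < sizeof(allow_list) / sizeof(allow_list[0]); i++)
        if (strcmp(allow_list[i].ns, ns) == 0 &&
            strcmp(allow_list[i].module, module) == 0)
            return 1;
    return 0;
}
```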


Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.

2007-11-25 Thread Rusty Russell
On Monday 26 November 2007 07:29:39 Roland Dreier wrote:
>  > Yes, and if a symbol is already used by multiple modules, it's
>  > generically useful.  And if so, why restrict it to in-tree modules?
>
> I agree that we shouldn't make things too hard for out-of-tree
> modules, but I disagree with your first statement: there clearly is a
> large class of symbols that are used by multiple modules but which are
> not generically useful -- they are only useful by a certain small class
> of modules.

If it is so clear, you should be able to easily provide examples?

Rusty.



Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.

2007-11-25 Thread Rusty Russell
On Saturday 24 November 2007 23:39:43 Andi Kleen wrote:
> On Sat, Nov 24, 2007 at 03:53:34PM +1100, Rusty Russell wrote:
> > So, you're saying that there's a problem with in-tree modules using
> > symbols they shouldn't?  Can you give an example?

[ Note: no response to this ]

> > If people aren't reviewing, this won't make them review.  I don't think the
>
> With millions of LOC the primary maintainers cannot review everything.
> It's not that anybody is doing a bad job -- it is just so much code
> that explicit mechanisms are better than implicit contracts.
>
> > problem is that people are conniving to avoid review.
>
> No of course not -- it is just too much code to let everything
> be reviewed by the core subsystem maintainers. But with explicit
> marking of internal symbols they would need to look at it because
> the relationship will be clearly spelled out in the code.

No, a one-line patch adding the module to the set is all they'd see.  There's 
no reason to think this will cause more review.

> > > Several distributions have policies that require to
> > > keep the changes to these exported interfaces minimal and that
> > > is very hard with thousands of exported symbol.  With name spaces
> > > the number of truly publicly exported symbols will hopefully
> > > shrink to a much smaller, more manageable set.
> >
> > *This* makes sense.  But it's not clear that the burden should be placed
> > on kernel coders.  You can create a list yourself.  How do I tell the
> > difference between "truly publicly exported" symbols and others?
>
> Out of tree solutions generally do not scale.  Nobody else can
> keep up with 2+ Million changes each merge window.
>
> > If a symbol has more than one in-tree user, it's hard to argue against an
>
> There are still classes of drivers. e.g. for the SCSI example: SD,SG,SR
> etc. are more internal while low level drivers like aic7xxx are clearly
> external drivers.

Then mark those symbols internal and only allow concurrently-built modules to 
access them.  That's simpler and requires much less maintenance than your 
solution.

> > out-of-tree module using the symbol, unless you're arguing against *all*
> > out-of-tree modules.
>
> No, actually namespaces kind of help out of tree modules. Once they only
> use interfaces that are really generic driver interfaces and fairly stable
> their authors will have much less pain forward porting to newer kernel
> versions. But currently the authors cannot even know what is an unstable
> internal interface and what is a generic relatively stable driver level
> interface. Namespaces are a mechanism to make this all explicit.

So in your head you have a notion of a kernel API, and you're trying to make 
that API explicit in the code.

Sorry, but no.
Rusty.


Re: forcedeth ethernet driver & Low power state

2007-11-25 Thread Denys Vlasenko
On Sunday 25 November 2007 10:59, Jeroen wrote:
> On Nov 25, 2007 7:36 PM, Robert Hancock <[EMAIL PROTECTED]> wrote:
> > Are you sure forcedeth even supports that feature? I haven't seen any
> > code for it, and certainly it should never be enabled by default..
>
> The Windows driver does. I have to disable it because otherwise I have
> lots of connection speed troubles. This is also what I see when I use a
> Linux distro on the server; unfortunately I can't disable it.

You need to prepare a more extensive bug report, for starters.
What are "connection speed troubles"? Which kernel version?
Do you see any "interesting" messages in kernel log?
lspci output? ethtool output? etc...
--
vda


Re: [PATCH] iwlwifi: remove redundant declaration of 'iwl3945_priv' and 'iwl4965_priv' structs

2007-11-25 Thread Zhu Yi

On Sun, 2007-11-25 at 15:58 +0100, Miguel Botón wrote:
> This patch removes a redundant declaration of 'iwl3945_priv' and
> 'iwl4965_priv' structs.
> 
> Signed-off-by: Miguel Boton <[EMAIL PROTECTED]>

ACK.

Thanks,
-yi


Re: [PATCH 23/27] x86: debugctlmsr kconfig

2007-11-25 Thread Denys Vlasenko
On Sunday 25 November 2007 14:08, Roland McGrath wrote:
> This adds the (internal) Kconfig macro CONFIG_X86_DEBUGCTLMSR,
> to be defined when configuring to support only hardware that
> definitely supports MSR_IA32_DEBUGCTLMSR with the BTF flag.
>
> The Intel documentation says "P6 family" and later processors all have it.
> I think the Kconfig dependencies are right to have it set for those and
> unset for others (i.e., when 586 and earlier are supported).
>
> +config X86_DEBUGCTLMSR
> + bool
> + depends on !(M586MMX || M586TSC || M586 || M486 || M386)
> + default y

Why is it defined in configuration system instead of some *.h file?
--
vda


Re: [RFC] Documentation about unaligned memory access

2007-11-25 Thread Alan Cox

> mc68020+  No  No
> (mc68000/010  No  2)  (not for Linux)

Actually uClinux has been persuaded to run on m68000.
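Since the RFC is about unaligned access: a portable way to read a possibly misaligned 32-bit value in C is to go through memcpy() or to assemble it from bytes, which avoids faulting on CPUs that cannot do unaligned loads (in kernel code the get_unaligned() helpers serve this purpose). The helper names below are made up for the sketch.

```c
#include <stdint.h>
#include <string.h>

/* Host-order load from any address: memcpy() lets the compiler pick an
 * access sequence that is safe on strict-alignment CPUs. */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}

/* Little-endian load assembled byte by byte, independent of host order. */
static uint32_t load_le32_unaligned(const uint8_t *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}
```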


setting the init process's personality?

2007-11-25 Thread David Madore
Hi,

Is there a simple way (via a kernel boot option or config setting or -
if really necessary - a patch or something like that) to set the
personality for the init process?  I'm running an x86_64 kernel on a
system whose userland is almost entirely 32-bit (but needs an
occasional 64-bit process to be run, hence the choice of kernel), and
I'd like `uname -m` to be i686 unless I take special action.  So I
think that means letting init (which is indeed a 32-bit process) have
the PER_LINUX32 personality (in case I'm wrong about this, the output
of uname -m is essentially what matters to me).

So, where does the default come from?
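One userspace approach, the same mechanism setarch(8)/linux32 relies on, is to call personality(PER_LINUX32) and then exec the target program: the persona is inherited across execve(), and uname(2) then reports a 32-bit machine name (i686 on an x86_64 kernel). Applying this to init would need a small wrapper, for instance passed with the init= boot parameter, that sets the persona and execs the real init. That wrapper idea and the helper names are assumptions, and the sketch does not answer where the kernel's default comes from.

```c
#include <sys/personality.h>
#include <unistd.h>

/* Switch the calling process to the PER_LINUX32 personality.
 * Returns 0 on success, -1 on failure. */
static int set_linux32(void)
{
    return personality(PER_LINUX32) == -1 ? -1 : 0;
}

/* Hypothetical wrapper body: switch personality, then replace this process
 * with the program named by argv[0] (for example the real init).
 * Only returns on failure. */
static int exec_linux32(char *const argv[])
{
    if (set_linux32() != 0)
        return -1;
    execv(argv[0], argv);
    return -1;  /* execv() returns only on error */
}
```

Passing 0xffffffff to personality() queries the current persona without changing it, which is a convenient way to check the result.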

-- 
 David A. Madore
([EMAIL PROTECTED],
 http://www.madore.org/~david/ )


Re: [RFC][PATCH] Update REPORTING-BUGS

2007-11-25 Thread Rafael J. Wysocki
On Monday, 26 of November 2007, Adrian Bunk wrote:
> On Mon, Nov 26, 2007 at 01:04:25AM +0100, Rafael J. Wysocki wrote:
> > On Monday, 26 of November 2007, Adrian Bunk wrote:
> > > On Mon, Nov 26, 2007 at 12:00:28AM +0100, Rafael J. Wysocki wrote:
> > > > On Sunday, 25 of November 2007, Adrian Bunk wrote:
> > > >...
> > > > > I don't care whether that's done with Bugzilla, some email based bug 
> > > > > tracker like the Debian bug tracker, someone putting emails manually 
> > > > > into some bug tracker like you are doing, or whatever else.
> > > > 
> > > > That last solution doesn't scale very well ...
> > > > 
> > > > How about using the system in which it's possible to report bugs using
> > > > both email and a web interface?
> > > > 
> > > > We can request that the address of the bug tracker be added to the Cc
> > > > lists of bug reports sent by email and we can make it resend reports
> > > > filed with it to the appropriate mailing lists and with the appropriate
> > > > email headers.  This is technically doable.
> > > 
> > > You are trying to solve something that is not a problem.
> > 
> > It _is_ a problem, because many bugs are reported using email and not really
> > tracked.  The ones that I manually put into the Bugzilla are the tip of the
> > iceberg (and BTW I'd prefer not to have to do that manually).
> > 
> > Every bug reported by email and not responded to by the right people, that
> > is not a recent regression, is currently lost.  I'd like to avoid that, if
> > possible.
> 
> This is solved by many other projects by asking the submitter to open a 
> bug for the issue when he sends it in an email.
> 
> The submitter then simply copies the information from his email to his 
> newly opened bug in the bug tracker.
> 
> -> no problem
> 
> > > It does not matter which medium we choose for getting bug reports.
> > 
> > [Well, you said that we should use a web interface for that. ;-)]
> 
> I said a web interface is not worse than via email.
> And it's enough.
> 
> (And I e.g. wouldn't oppose using the Debian bug tracker where the web 
>  interface only allows reading and everything has to be done via email
>  if all kernel maintainers would agree to use this.)
> 
> > No, it doesn't, as long as the bug reports reach the right place.  Now, the
> > question is what's that.
> > 
> > IMO, ideally, for each subsystem there should be a mailing list to send bug
> > reports to.  The Bugzilla should forward the reports to these lists.  On
> > every such list there should be (at least) one person responsible for
> > responding to the bug reports, if no one else responds first, and for
> > forwarding the reports to the appropriate developers.  This person should
> > also be responsible for monitoring the status of each bug report sent to
> > his/her list.
> 
> After all discussions about crazy bug tracker features we are back at 
> the real problem:

We started to discuss them, because you argued that the Bugzilla in its current
shape was sufficient, which I didn't agree with and tried to give some
arguments.

> Where do we find the tree these people grow on?

That's a good question, but either we find these people, or we'll start losing
users at growing rates.

I'm afraid that's already happening ...

> > _Every_ bug report sent (including invalid ones) should be recorded in a bug
> > tracking system (be it the Bugzilla or whatever else) along with all of its
> > history (at least, references to the bug's history should be stored), no
> > matter how it's been handled.  Moreover, a bug can only be resolved as
> > "fixed" if there's a pointer to the exact commit fixing it in the bug's
> > history.
> 
> And back we are at crazy bug tracker features...

No, they are not bug tracker features, but parts of a process that I think we
should have in place.

> > > The only thing that matters is that we get bug reports resolved within a 
> > > reasonable amount of time.
> > 
> > I'm not sure if that's generally possible:
> > - What about the bugs that take 2 weeks or more to reproduce?
> > - What about the bugs that we _don't_ _know_ how to fix?
> 
> We will never get 100% of all bugs fixed.
> 
> Let's get back to the fact that we have many bug reports that could be 
> fixed within a reasonable amount of time but are not.

Do you have specific examples?

Rafael


Re: ipmi_watchdog can not reset the kernel panic machine

2007-11-25 Thread Corey Minyard
The watchdog is "off" by default, meaning that you have to have 
something actually start resetting the watchdog before it will start 
running.  That's why you are seeing this behavior.


There is a start_now option that will start the watchdog when it is 
loaded, but then it will reset the system unless something resets the 
watchdog periodically, and you have a limited time to start this operation.
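As described above, nothing arms the IPMI watchdog until user space keeps it alive. A minimal sketch of the two primitives such a keepalive daemon needs, assuming the generic Linux watchdog device API: any write() to the device counts as a keepalive, and writing 'V' just before close() is the "magic close" that drivers supporting it treat as a request to stop the timer. The function names are hypothetical; a real daemon would open /dev/watchdog and call watchdog_pet() well within the configured timeout (for example every 60 seconds with timeout=120, an arbitrary choice).

```c
#include <errno.h>
#include <unistd.h>

/* Send one keepalive to an open watchdog device. Returns 0 on success. */
static int watchdog_pet(int fd)
{
    ssize_t n;

    do {
        n = write(fd, "\0", 1);     /* any write counts as a keepalive */
    } while (n < 0 && errno == EINTR);
    return n == 1 ? 0 : -1;
}

/* Magic close: write 'V' and close, asking a driver that supports it to
 * stop the timer instead of letting it expire. Returns 0 on success. */
static int watchdog_magic_close(int fd)
{
    ssize_t n = write(fd, "V", 1);
    int rc = close(fd);

    return (n == 1 && rc == 0) ? 0 : -1;
}
```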


On a panic, the IPMI driver attempts to preserve the state of the 
watchdog and (if running) increase the timeout time to allow a kdump or 
something like that to occur.  That's the purpose of the code you 
reference.  It is not meant to start a reset operation on a panic.  It
used to start a reset on every panic, but that caused problems for many users.


-corey

Andrew Morton wrote:

(cc's added)

On Fri, 23 Nov 2007 20:28:41 -0800 (PST) [EMAIL PROTECTED] wrote:

  

I built kernel 2.6.24-rc3.  ipmi_watchdog cannot reset the machine after a
kernel panic, and the watchdog never records the panic information in the
IPMI SEL.

1. I disable the automatic reboot on kernel panic with:
echo "0" > /proc/sys/kernel/panic

2.  modprobe ipmi_watchdog timeout=120 action=reset

3.  Load a driver that calls panic() when its ioctl is invoked.

4.  Call into the driver via ioctl to panic the system.

In wdog_panic_handler, I printk'd the state and got
ipmi_watchdog_state=WDOG_TIMEOUT_NONE, so the watchdog never records the
panic information in the IPMI SEL.


static int wdog_panic_handler(struct notifier_block *this,
			      unsigned long         event,
			      void                  *unused)
{
	static int panic_event_handled = 0;

	/* On a panic, if we have a panic timeout, make sure to extend
	   the watchdog timer to a reasonable value to complete the
	   panic, if the watchdog timer is running.  Plus the
	   pretimeout is meaningless at panic time. */
	if (watchdog_user && !panic_event_handled &&
	    ipmi_watchdog_state != WDOG_TIMEOUT_NONE) {
		/* Make sure we do this only once. */
		panic_event_handled = 1;

		timeout = 255;
		pretimeout = 0;
		panic_halt_ipmi_set_timeout();
	}

	return NOTIFY_OK;
}





Re: kernel bugzilla is FPOS (was: Re: "buggy cmd640" message followed by soft lockup)

2007-11-25 Thread Rafael J. Wysocki
On Monday, 26 of November 2007, Adrian Bunk wrote:
> On Mon, Nov 26, 2007 at 12:28:17AM +0100, Rafael J. Wysocki wrote:
> > On Sunday, 25 of November 2007, Adrian Bunk wrote:
> > > On Sun, Nov 25, 2007 at 11:38:59PM +0100, Rafael J. Wysocki wrote:
> > > > On Sunday, 25 of November 2007, Adrian Bunk wrote:
> > > > > On Sun, Nov 25, 2007 at 10:28:06PM +0100, Rafael J. Wysocki wrote:
> > > > > > On Sunday, 25 of November 2007, Adrian Bunk wrote:
> > > > > >..
> > > > > > > First of all, Bugzilla is a quite often used bug tracker in the open
> > > > > > > source world [1], so many users already know it.
> > > > > > > 
> > > > > > > But more important, "it pretends to require them to spend" isn't true
> > > > > > > because there's no pretending - we actually often require bug reporters
> > > > > > > to spend a lot of time on the bug report (e.g. when asking for
> > > > > > > bisecting).
> > > > > > 
> > > > > > But not *initially*.
> > > > > > 
> > > > > > We should not confuse *debugging* with *reporting bugs*.  While the former is
> > > > > > actually more difficult and more time consuming than writing the code in which
> > > > > > the bug is present, the latter should be as simple as sending an email.
> > > > > 
> > > > > For hardcore geeks like you and me sending an email might be easier than
> > > > > using some web interface.
> > > > > 
> > > > > Normal humans tend to be more accustomed to web interfaces, and 
> > > > > following the instructions on some web page is _much_ easier than 
> > > > > reading three text files for knowing what to write in an email.
> > > > 
> > > > Hm, this is a good argument for having such a web interface, but IMO it
> > > > shouldn't be mandatory.  IOW, there should be a way to report a bug using plain
> > > > email, if the reporter prefers that.  We can, however, request that the address
> > > > of our bug tracking system be added to the report's Cc list.
> > > 
> > > Looking at both other open source projects and the support of commercial 
> > > software a web interface should be enough.
> > 
> > Well, IMHO the Linux kernel is exceptional in many ways ...
> 
> If your goal is not to solve our problems with bug handling but trying 
> to maximize the "being different" factor...
> 
> > > But this is not the problem - the problem is what happens after the 
> > > initial report with the bug report.
> > 
> > Not only that.
> > 
> > First, each bug report has to reach the right lists/people and that's what we
> > can't assure using the Bugzilla alone right now.  To make the Bugzilla
> > generally useful for that we need to change the way in which the target of the
> > report is selected and make it send reports to mailing lists rather than to
> > individual people.
> 
> In recent years, the default assignees of changed or new components in 
> the kernel Bugzilla have been pseudo addresses, and you can subscribe a 
> mailing list (like any other email address) to get copies of the emails 
> going to this pseudo address.

OK

Why haven't they been subscribed already, then?

I think you would agree that right now the choice of subsystems in the Bugzilla
doesn't reflect the current status of the kernel (some subsystems should be
added, some should be called differently, some should be moved to different
places etc.) and some addresses to which the bug reports are assigned by
default are not the best ones ...

> > Second, once the bug report has reached the right place, we have two problems
> > to solve:
> > (1) we need to make the developers respond and actively work on the bug
> 
> This is the one problem we have.
>
> > (2) we need to make the tracking of the bug possibly unintrusive (ie.
> > developers should be able to work with the reporter in a way that *they*
> > prefer)
> > While it's generally difficult to solve (1), we can at least make (2) happen
> > (well, in theory).
> 
> For normal communication (2) already works in the kernel Bugzilla.
>
> > > > Now, the question is what information this web interface should ask for.
> > > > 
> > > > IMO, first, it should ask for what the bug is against, ie.:
> > > > - kernel version (to be obtained from 'git describe' or from /proc/version or
> > > >   from .config, if the kernel doesn't boot)
> > > > - architecture (x86, ARM, MIPS etc.)
> > > > - subsystem and subsubsystem (that could be selectable from a menu and might
> > > >   depend on the architecture)
> > > > 
> > > > It also should ask if the problem is a regression and what was the last known
> > > > good kernel (I'd prefer that to be the last known major release selectable from
> > > > a list).
> > > > 
> > > > Also, the reporter should be required to provide a summary (subject) and
> > > > a (concise) description of the problem and a list of email addresses to
> > > > send the report to in addition 

Re: [RFC][PATCH] Update REPORTING-BUGS

2007-11-25 Thread Adrian Bunk
On Mon, Nov 26, 2007 at 01:04:25AM +0100, Rafael J. Wysocki wrote:
> On Monday, 26 of November 2007, Adrian Bunk wrote:
> > On Mon, Nov 26, 2007 at 12:00:28AM +0100, Rafael J. Wysocki wrote:
> > > On Sunday, 25 of November 2007, Adrian Bunk wrote:
> > >...
> > > > I don't care whether that's done with Bugzilla, some email based bug 
> > > > tracker like the Debian bug tracker, someone putting emails manually 
> > > > into some bug tracker like you are doing, or whatever else.
> > > 
> > > That last solution doesn't scale very well ...
> > > 
> > > How about using the system in which it's possible to report bugs using both
> > > email and a web interface?
> > > 
> > > We can request that the address of the bug tracker be added to the Cc lists of
> > > bug reports sent by email and we can make it resend reports filed with it to
> > > the appropriate mailing lists and with the appropriate email headers.  This is
> > > technically doable.
> > 
> > You are trying to solve something that is not a problem.
> 
> It _is_ a problem, because many bugs are reported using email and not really
> tracked.  The ones that I manually put into the Bugzilla are the tip of the
> iceberg (and BTW I'd prefer not to have to do that manually).
> 
> Every bug reported by email and not responded to by the right people, that is
> not a recent regression, is currently lost.  I'd like to avoid that, if possible.

This is solved by many other projects by asking the submitter to open a 
bug for the issue when he sends it in an email.

The submitter then simply copies the information from his email to his 
newly opened bug in the bug tracker.

-> no problem

> > It does not matter which medium we choose for getting bug reports.
> 
> [Well, you said that we should use a web interface for that. ;-)]

I said a web interface is not worse than via email.
And it's enough.

(And I e.g. wouldn't oppose using the Debian bug tracker where the web 
 interface only allows reading and everything has to be done via email
 if all kernel maintainers would agree to use this.)

> No, it doesn't, as long as the bug reports reach the right place.  Now, the
> question is what's that.
> 
> IMO, ideally, for each subsystem there should be a mailing list to send bug
> reports to.  The Bugzilla should forward the reports to these lists.  On every
> such list there should be (at least) one person responsible for responding to
> the bug reports, if no one else responds first, and for forwarding the reports
> to the appropriate developers.  This person should also be responsible for
> monitoring the status of each bug report sent to his/her list.

After all discussions about crazy bug tracker features we are back at 
the real problem:

Where do we find the tree these people grow on?

> _Every_ bug report sent (including invalid ones) should be recorded in a bug
> tracking system (be it the Bugzilla or whatever else) along with all of its
> history (at least, references to the bug's history should be stored), no matter
> how it's been handled.  Moreover, a bug can only be resolved as "fixed" if
> there's a pointer to the exact commit fixing it in the bug's history.

And back we are at crazy bug tracker features...

> > The only thing that matters is that we get bug reports resolved within a 
> > reasonable amount of time.
> 
> I'm not sure if that's generally possible:
> - What about the bugs that take 2 weeks or more to reproduce?
> - What about the bugs that we _don't_ _know_ how to fix?

We will never get 100% of all bugs fixed.

Let's get back to the fact that we have many bug reports that could be 
fixed within a reasonable amount of time but are not.

> Rafael

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: [RFC][PATCH] Update REPORTING-BUGS

2007-11-25 Thread Rafael J. Wysocki
On Monday, 26 of November 2007, Adrian Bunk wrote:
> On Mon, Nov 26, 2007 at 12:00:28AM +0100, Rafael J. Wysocki wrote:
> > On Sunday, 25 of November 2007, Adrian Bunk wrote:
> >...
> > > I don't care whether that's done with Bugzilla, some email based bug 
> > > tracker like the Debian bug tracker, someone putting emails manually 
> > > into some bug tracker like you are doing, or whatever else.
> > 
> > That last solution doesn't scale very well ...
> > 
> > How about using the system in which it's possible to report bugs using both
> > email and a web interface?
> > 
> > We can request that the address of the bug tracker be added to the Cc lists of
> > bug reports sent by email and we can make it resend reports filed with it to
> > the appropriate mailing lists and with the appropriate email headers.  This is
> > technically doable.
> 
> You are trying to solve something that is not a problem.

It _is_ a problem, because many bugs are reported using email and not really
tracked.  The ones that I manually put into the Bugzilla are the tip of the
iceberg (and BTW I'd prefer not to have to do that manually).

Every bug reported by email and not responded to by the right people, that is
not a recent regression, is currently lost.  I'd like to avoid that, if possible.

> It does not matter which medium we choose for getting bug reports.

[Well, you said that we should use a web interface for that. ;-)]

No, it doesn't, as long as the bug reports reach the right place.  Now, the
question is what's that.

IMO, ideally, for each subsystem there should be a mailing list to send bug
reports to.  The Bugzilla should forward the reports to these lists.  On every
such list there should be (at least) one person responsible for responding to
the bug reports, if no one else responds first, and for forwarding the reports
to the appropriate developers.  This person should also be responsible for
monitoring the status of each bug report sent to his/her list.

_Every_ bug report sent (including invalid ones) should be recorded in a bug
tracking system (be it the Bugzilla or whatever else) along with all of its
history (at least, references to the bug's history should be stored), no matter
how it's been handled.  Moreover, a bug can only be resolved as "fixed" if
there's a pointer to the exact commit fixing it in the bug's history.
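A tracker can enforce the "fixed only with a commit pointer" rule mechanically. Below is a minimal sketch under stated assumptions. The bug record, its states and the function names are hypothetical and are not taken from Bugzilla or any existing tracker. The sketch also assumes the pointer is a full 40-digit hexadecimal git SHA-1 commit id.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical bug states, for illustration only. */
enum bug_status { BUG_OPEN, BUG_FIXED, BUG_INVALID };

struct bug {
	enum bug_status status;
	char fix_commit[41];	/* 40 hex digits + NUL; "" when unset */
};

/* A full git SHA-1 object name is exactly 40 hexadecimal digits. */
static bool is_full_sha1(const char *s)
{
	if (strlen(s) != 40)
		return false;
	for (size_t i = 0; i < 40; i++)
		if (!isxdigit((unsigned char)s[i]))
			return false;
	return true;
}

/* Resolve a bug as "fixed" only when a full commit id is supplied.
 * On failure the bug record is left untouched. */
static bool resolve_fixed(struct bug *b, const char *commit)
{
	if (!commit || !is_full_sha1(commit))
		return false;
	memcpy(b->fix_commit, commit, sizeof(b->fix_commit));
	b->status = BUG_FIXED;
	return true;
}
```

Requiring the full id rather than an abbreviated one keeps the pointer unambiguous even as the repository grows.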

> The only thing that matters is that we get bug reports resolved within a 
> reasonable amount of time.

I'm not sure if that's generally possible:
- What about the bugs that take 2 weeks or more to reproduce?
- What about the bugs that we _don't_ _know_ how to fix?

Rafael


[RFC/PATCH] drm: Fix for non-coherent DMA PowerPC

2007-11-25 Thread Benjamin Herrenschmidt
This patch fixes bits of the DRM so as to make the radeon DRI work on
PowerPC processor variants with non-cache-coherent PCI DMA.

It moves the few places that need changes into wrappers so that
other architectures with similar issues can easily add their
own changes to those wrappers, at least until we have a more useful
generic kernel API.

Signed-off-by: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
---

 drivers/char/drm/ati_pcigart.c |6 ++
 drivers/char/drm/drm_scatter.c |   12 +++-
 drivers/char/drm/drm_vm.c  |   20 +++-
 3 files changed, 32 insertions(+), 6 deletions(-)

Index: linux-work/drivers/char/drm/ati_pcigart.c
===
--- linux-work.orig/drivers/char/drm/ati_pcigart.c  2007-11-26 10:07:29.0 +1100
+++ linux-work/drivers/char/drm/ati_pcigart.c   2007-11-26 10:21:33.0 +1100
@@ -214,6 +214,12 @@ int drm_ati_pcigart_init(struct drm_devi
}
}
 
+   if (gart_info->gart_table_location == DRM_ATI_GART_MAIN)
+   dma_sync_single_for_device(&dev->pdev->dev,
+  bus_address,
+  max_pages * sizeof(u32),
+  PCI_DMA_TODEVICE);
+
ret = 1;
 
 #if defined(__i386__) || defined(__x86_64__)
Index: linux-work/drivers/char/drm/drm_scatter.c
===
--- linux-work.orig/drivers/char/drm/drm_scatter.c  2007-11-26 10:07:29.0 +1100
+++ linux-work/drivers/char/drm/drm_scatter.c   2007-11-26 10:20:08.0 +1100
@@ -36,6 +36,16 @@
 
 #define DEBUG_SCATTER 0
 
+static inline void *drm_vmalloc_dma(unsigned long size)
+{
+#if defined(__powerpc__) && defined(CONFIG_NOT_COHERENT_CACHE)
+   return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM,
+PAGE_KERNEL | _PAGE_NO_CACHE);
+#else
+   return vmalloc_32(size);
+#endif
+}
+
 void drm_sg_cleanup(struct drm_sg_mem * entry)
 {
struct page *page;
@@ -104,7 +114,7 @@ int drm_sg_alloc(struct drm_device *dev,
}
memset((void *)entry->busaddr, 0, pages * sizeof(*entry->busaddr));
 
-   entry->virtual = vmalloc_32(pages << PAGE_SHIFT);
+   entry->virtual = drm_vmalloc_dma(pages << PAGE_SHIFT);
if (!entry->virtual) {
drm_free(entry->busaddr,
 entry->pages * sizeof(*entry->busaddr), DRM_MEM_PAGES);
Index: linux-work/drivers/char/drm/drm_vm.c
===
--- linux-work.orig/drivers/char/drm/drm_vm.c   2007-11-26 10:07:29.0 +1100
+++ linux-work/drivers/char/drm/drm_vm.c        2007-11-26 10:11:09.0 +1100
@@ -54,13 +54,24 @@ static pgprot_t drm_io_prot(uint32_t map
pgprot_val(tmp) |= _PAGE_NO_CACHE;
if (map_type == _DRM_REGISTERS)
pgprot_val(tmp) |= _PAGE_GUARDED;
-#endif
-#if defined(__ia64__)
+#elif defined(__ia64__)
if (efi_range_is_wc(vma->vm_start, vma->vm_end -
vma->vm_start))
tmp = pgprot_writecombine(tmp);
else
tmp = pgprot_noncached(tmp);
+#elif defined(__sparc__)
+   tmp = pgprot_noncached(tmp);
+#endif
+   return tmp;
+}
+
+static pgprot_t drm_dma_prot(uint32_t map_type, struct vm_area_struct *vma)
+{
+   pgprot_t tmp = vm_get_page_prot(vma->vm_flags);
+
+#if defined(__powerpc__) && defined(CONFIG_NOT_COHERENT_CACHE)
+   tmp |= _PAGE_NO_CACHE;
 #endif
return tmp;
 }
@@ -617,9 +628,6 @@ static int drm_mmap_locked(struct file *
offset = dev->driver->get_reg_ofs(dev);
vma->vm_flags |= VM_IO; /* not in core dump */
vma->vm_page_prot = drm_io_prot(map->type, vma);
-#ifdef __sparc__
-   vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
-#endif
if (io_remap_pfn_range(vma, vma->vm_start,
   (map->offset + offset) >> PAGE_SHIFT,
   vma->vm_end - vma->vm_start,
@@ -638,6 +646,7 @@ static int drm_mmap_locked(struct file *
page_to_pfn(virt_to_page(map->handle)),
vma->vm_end - vma->vm_start, vma->vm_page_prot))
return -EAGAIN;
+   vma->vm_page_prot = drm_dma_prot(map->type, vma);
/* fall through to _DRM_SHM */
case _DRM_SHM:
vma->vm_ops = &drm_vm_shm_ops;
@@ -650,6 +659,7 @@ static int drm_mmap_locked(struct file *
vma->vm_ops = &drm_vm_sg_ops;
vma->vm_private_data = (void *)map;
vma->vm_flags |= VM_RESERVED;
+   vma->vm_page_prot = drm_dma_prot(map->type, vma);
break;
default:
return -EINVAL; /* This should never happen. */

Re: kernel bugzilla is FPOS (was: Re: "buggy cmd640" message followed by soft lockup)

2007-11-25 Thread Adrian Bunk
On Mon, Nov 26, 2007 at 12:28:17AM +0100, Rafael J. Wysocki wrote:
> On Sunday, 25 of November 2007, Adrian Bunk wrote:
> > On Sun, Nov 25, 2007 at 11:38:59PM +0100, Rafael J. Wysocki wrote:
> > > On Sunday, 25 of November 2007, Adrian Bunk wrote:
> > > > On Sun, Nov 25, 2007 at 10:28:06PM +0100, Rafael J. Wysocki wrote:
> > > > > On Sunday, 25 of November 2007, Adrian Bunk wrote:
> > > > >..
> > > > > > First of all, Bugzilla is a quite often used bug tracker in the open
> > > > > > source world [1], so many users already know it.
> > > > > > 
> > > > > > But more important, "it pretends to require them to spend" isn't true
> > > > > > because there's no pretending - we actually often require bug reporters
> > > > > > to spend a lot of time on the bug report (e.g. when asking for
> > > > > > bisecting).
> > > > > 
> > > > > But not *initially*.
> > > > > 
> > > > > We should not confuse *debugging* with *reporting bugs*.  While the former is
> > > > > actually more difficult and more time consuming than writing the code in which
> > > > > the bug is present, the latter should be as simple as sending an email.
> > > > 
> > > > For hardcore geeks like you and me sending an email might be easier than
> > > > using some web interface.
> > > > 
> > > > Normal humans tend to be more accustomed to web interfaces, and 
> > > > following the instructions on some web page is _much_ easier than 
> > > > reading three text files for knowing what to write in an email.
> > > 
> > > Hm, this is a good argument for having such a web interface, but IMO it
> > > shouldn't be mandatory.  IOW, there should be a way to report a bug using plain
> > > email, if the reporter prefers that.  We can, however, request that the address
> > > of our bug tracking system be added to the report's Cc list.
> > 
> > Looking at both other open source projects and the support of commercial 
> > software a web interface should be enough.
> 
> Well, IMHO the Linux kernel is exceptional in many ways ...

If your goal is not to solve our problems with bug handling but trying 
to maximize the "being different" factor...

> > But this is not the problem - the problem is what happens after the 
> > initial report with the bug report.
> 
> Not only that.
> 
> First, each bug report has to reach the right lists/people and that's what we
> can't assure using the Bugzilla alone right now.  To make the Bugzilla
> generally useful for that we need to change the way in which the target of the
> report is selected and make it send reports to mailing lists rather than to
> individual people.

In recent years, the default assignees of changed or new components in 
the kernel Bugzilla have been pseudo addresses, and you can subscribe a 
mailing list (like any other email address) to get copies of the emails 
going to this pseudo address.

> Second, once the bug report has reached the right place, we have two problems
> to solve:
> (1) we need to make the developers respond and actively work on the bug

This is the one problem we have.

> (2) we need to make the tracking of the bug possibly unintrusive (ie.
> developers should be able to work with the reporter in a way that *they*
> prefer)
> While it's generally difficult to solve (1), we can at least make (2) happen
> (well, in theory).

For normal communication (2) already works in the kernel Bugzilla.

> > > Now, the question is what information this web interface should ask for.
> > > 
> > > IMO, first, it should ask for what the bug is against, ie.:
> > > - kernel version (to be obtained from 'git describe' or from /proc/version or
> > >   from .config, if the kernel doesn't boot)
> > > - architecture (x86, ARM, MIPS etc.)
> > > - subsystem and subsubsystem (that could be selectable from a menu and might
> > >   depend on the architecture)
> > > 
> > > It also should ask if the problem is a regression and what was the last known
> > > good kernel (I'd prefer that to be the last known major release selectable from
> > > a list).
> > > 
> > > Also, the reporter should be required to provide a summary (subject) and
> > > a (concise) description of the problem and a list of email addresses to
> > > send the report to in addition to the regular handling (there should be a way
> > > to verify which addresses are acceptable).
> > > 
> > > Anything else?
> > > 
> > > Next, the report should be sent to a mailing list selected on the basis of the
> > > information provided (not necessarily to individual developers, unless there
> > > are some addresses provided explicitly by the reporter).
> > 
> > The architecture choice seems to be the only thing from your list that
> > isn't already available in the "Enter a new bug report" dialog of the
> > kernel Bugzilla.
> 
> Yet, the architecture choice affects the way in which the other choices are
> made.

I can 

Re: [PATCH 1/9]: introduce radix_tree_gang_lookup_range

2007-11-25 Thread David Chinner
On Mon, Nov 26, 2007 at 10:17:24AM +1100, Nick Piggin wrote:
> On Thursday 22 November 2007 11:32, David Chinner wrote:
> > Introduce radix_tree_gang_lookup_range()
> >
> > The inode clustering in XFS requires a gang lookup on the radix tree to
> > find all the inodes in the cluster.  The gang lookup has to set the
> > maximum items to that of a fully populated cluster so we get all the
> > inodes in the cluster, but we only populate the radix tree sparsely (on
> > demand).
> >
> > As a result, the gang lookup can search way, way past the index of the end
> > of the cluster because it is looking for a fixed number of entries to
> > return.
> >
> > We know we want to terminate the search at either a specific index or a
> > maximum number of items, so we need to add a "last_index" parameter to
> > the lookup.
> 
> Yeah, this fixes one downside of the gang lookup API. For consistency
> it would be nice to do this for the tag lookup API as well...

Sure, I have need to do that as well. ;)

> > Furthermore, the existing radix_tree_gang_lookup() can use this same
> > function if we define a RADIX_TREE_MAX_INDEX value so the search is not
> > limited by the last_index.
> 
> Nit: should just define it to be ULONG_MAX.

Oh, right. Silly me. I'll post updated radix tree patches later today.

Thanks, Nick.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: 2.6.22 (gentoo + grsec) kernel BUG at mm/mlock.c:205!

2007-11-25 Thread Arjan van de Ven
On Sun, 25 Nov 2007 20:36:04 +0100
Mathias Kretschmer <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> This is an x86_64 kernel with 4GB of RAM. The incident happened when
> compiling cdrecord (or some variant of it :) in a 32-bit chroot jail
> during the 'configure' process.
> 
> alpha / # uname -a
> Linux alpha 2.6.22-hardened-r8 #10 SMP Sun Nov 25 12:52:39 CET 2007
> x86_64 AMD Processor model unknown AuthenticAMD GNU/Linux
> 
> Let me know if you need more info.
> 

You have both a heavily patched kernel and a kernel tainted by binary
kernel modules; it sounds like you're best off contacting the
support side of whoever gave you the patches and/or the binary module.
I don't think there's much lkml can do for you.


-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


Re: [2.6 patch] make I/O schedulers non-modular

2007-11-25 Thread Arjan van de Ven
On Sun, 25 Nov 2007 17:56:54 +0100
Adrian Bunk <[EMAIL PROTECTED]> wrote:

> Is there any technical reason why we need 4 different schedulers at
> all?
> 

there is at least one technical reason to need more than one: certain
types of storage (both big EMC boxes as well as solid state disks)
don't behave like disks and have no seek penalty; any CPU time spent on
avoiding seeks is wasted on those, so for these devices one really
wants to use a different I/O scheduler, one which is much lighter weight.
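The trade-off can be illustrated with a toy cost model. The model assumes seek cost is proportional to the distance between consecutive sector numbers, which roughly holds for a rotating disk and not at all for solid-state storage. Sorting the queue, as an elevator-style scheduler does, shrinks the total "head travel". On a device with no seek penalty that saving is zero, so the sorting work is pure overhead. This toy is not how any of the kernel's schedulers (noop, deadline, anticipatory, CFQ) is actually implemented.

```c
#include <stddef.h>
#include <stdlib.h>

/* Total head travel when servicing reqs[] in the given order,
 * starting with the head positioned at sector 'head'. */
static unsigned long seek_distance(const unsigned long *reqs, size_t n,
				   unsigned long head)
{
	unsigned long total = 0;

	for (size_t i = 0; i < n; i++) {
		total += reqs[i] > head ? reqs[i] - head : head - reqs[i];
		head = reqs[i];
	}
	return total;
}

static int cmp_sector(const void *a, const void *b)
{
	unsigned long x = *(const unsigned long *)a;
	unsigned long y = *(const unsigned long *)b;

	return (x > y) - (x < y);
}

/* Simplest "elevator": sort requests by sector, which is a single
 * ascending sweep when the head starts below every request.  Real
 * schedulers also merge requests and bound latency. */
static void elevator_sort(unsigned long *reqs, size_t n)
{
	qsort(reqs, n, sizeof(*reqs), cmp_sector);
}
```

For the queue {100, 10, 90, 20} with the head at sector 0, FIFO order travels 340 sectors and sorted order travels 100. On an SSD both orders cost the same, so a FIFO-like scheduler such as noop suffices. The scheduler can be chosen per device at run time through /sys/block/<dev>/queue/scheduler.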


Re: [PATCH] power: use kasprintf

2007-11-25 Thread Anton Vorontsov
On Sat, Nov 17, 2007 at 07:55:58PM +0900, Akinobu Mita wrote:
> Use kasprintf instead of kmalloc()-strcpy()-strcat().

Applied to battery-2.6.git, thanks.
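A userspace sketch shows what kasprintf() buys here. The function formats once into a zero-length buffer to learn the length, allocates exactly that length plus the terminating NUL, and then formats again. This is broadly the approach the kernel's kvasprintf() takes. The name xasprintf and the use of malloc() in place of kmalloc(GFP_KERNEL) are assumptions of this sketch, not kernel code. The old code sized its buffers by hand with strlen(psy->name) + sizeof("-charging"). That sum was exactly enough only because sizeof on a string literal already counts the NUL.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace stand-in for kasprintf(): returns a newly allocated,
 * NUL-terminated formatted string, or NULL on failure.  The caller
 * releases it with free() (kfree() for the kernel version). */
static char *xasprintf(const char *fmt, ...)
{
	va_list ap, aq;
	int len;
	char *p;

	va_start(ap, fmt);
	va_copy(aq, ap);
	len = vsnprintf(NULL, 0, fmt, aq);	/* measure only */
	va_end(aq);
	if (len < 0) {
		va_end(ap);
		return NULL;
	}
	p = malloc((size_t)len + 1);		/* +1 for the NUL */
	if (p)
		vsnprintf(p, (size_t)len + 1, fmt, ap);
	va_end(ap);
	return p;
}
```

One call such as xasprintf("%s-charging-or-full", name) then replaces the kmalloc()/strcpy()/strcat() triple, and the size arithmetic can no longer drift out of sync with the suffix.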

> Cc: Anton Vorontsov <[EMAIL PROTECTED]>
> Cc: David Woodhouse <[EMAIL PROTECTED]>
> Signed-off-by: Akinobu Mita <[EMAIL PROTECTED]>
> 
> ---
>  drivers/power/power_supply_leds.c |   25 +++--
>  1 file changed, 7 insertions(+), 18 deletions(-)
> 
> Index: 2.6-mm/drivers/power/power_supply_leds.c
> ===
> --- 2.6-mm.orig/drivers/power/power_supply_leds.c
> +++ 2.6-mm/drivers/power/power_supply_leds.c
> @@ -10,6 +10,7 @@
>   *  You may use this code as per GPL version 2
>   */
>  
> +#include 
>  #include 
>  
>  #include "power_supply.h"
> @@ -48,28 +49,20 @@ static int power_supply_create_bat_trigg
>  {
>   int rc = 0;
>  
> - psy->charging_full_trig_name = kmalloc(strlen(psy->name) +
> -   sizeof("-charging-or-full"), GFP_KERNEL);
> + psy->charging_full_trig_name = kasprintf(GFP_KERNEL,
> + "%s-charging-or-full", psy->name);
>   if (!psy->charging_full_trig_name)
>   goto charging_full_failed;
>  
> - psy->charging_trig_name = kmalloc(strlen(psy->name) +
> -   sizeof("-charging"), GFP_KERNEL);
> + psy->charging_trig_name = kasprintf(GFP_KERNEL,
> + "%s-charging", psy->name);
>   if (!psy->charging_trig_name)
>   goto charging_failed;
>  
> - psy->full_trig_name = kmalloc(strlen(psy->name) +
> -   sizeof("-full"), GFP_KERNEL);
> + psy->full_trig_name = kasprintf(GFP_KERNEL, "%s-full", psy->name);
>   if (!psy->full_trig_name)
>   goto full_failed;
>  
> - strcpy(psy->charging_full_trig_name, psy->name);
> - strcat(psy->charging_full_trig_name, "-charging-or-full");
> - strcpy(psy->charging_trig_name, psy->name);
> - strcat(psy->charging_trig_name, "-charging");
> - strcpy(psy->full_trig_name, psy->name);
> - strcat(psy->full_trig_name, "-full");
> -
>   led_trigger_register_simple(psy->charging_full_trig_name,
>   &psy->charging_full_trig);
>   led_trigger_register_simple(psy->charging_trig_name,
> @@ -120,14 +113,10 @@ static int power_supply_create_gen_trigg
>  {
>   int rc = 0;
>  
> - psy->online_trig_name = kmalloc(strlen(psy->name) + sizeof("-online"),
> - GFP_KERNEL);
> + psy->online_trig_name = kasprintf(GFP_KERNEL, "%s-online", psy->name);
>   if (!psy->online_trig_name)
>   goto online_failed;
>  
> - strcpy(psy->online_trig_name, psy->name);
> - strcat(psy->online_trig_name, "-online");
> -
>   led_trigger_register_simple(psy->online_trig_name, &psy->online_trig);
>  
>   goto success;

-- 
Anton Vorontsov
email: [EMAIL PROTECTED]
backup email: [EMAIL PROTECTED]
irc://irc.freenode.net/bd2


Re: [2.6 patch] power_supply_{leds,sysfs}.c should #include "power_supply.h"

2007-11-25 Thread Anton Vorontsov
On Mon, Nov 05, 2007 at 06:07:45PM +0100, Adrian Bunk wrote:
> Every file should include the headers containing the prototypes for
> its global functions.

Applied to battery-2.6.git, thanks.

p.s.
Sorry for the delay; I wasn't Cc'ed, so I only found out about
that patch by pure chance (by looking at the -mm series).

> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
> 
> ---
> 
>  drivers/power/power_supply_leds.c  |2 ++
>  drivers/power/power_supply_sysfs.c |2 ++
>  2 files changed, 4 insertions(+)
> 
> e34cc994731ec9102bf5b1c7d6585c0aa87d1fa2 
> diff --git a/drivers/power/power_supply_leds.c b/drivers/power/power_supply_leds.c
> index 7f8f359..80ca288 100644
> --- a/drivers/power/power_supply_leds.c
> +++ b/drivers/power/power_supply_leds.c
> @@ -12,6 +12,8 @@
>  
>  #include 
>  
> +#include "power_supply.h"
> +
>  /* Battery specific LEDs triggers. */
>  
>  static void power_supply_update_bat_leds(struct power_supply *psy)
> diff --git a/drivers/power/power_supply_sysfs.c b/drivers/power/power_supply_sysfs.c
> index 249f61b..e8ad1fd 100644
> --- a/drivers/power/power_supply_sysfs.c
> +++ b/drivers/power/power_supply_sysfs.c
> @@ -14,6 +14,8 @@
>  #include 
>  #include 
>  
> +#include "power_supply.h"
> +
>  /*
>   * This is because the name "current" breaks the device attr macro.
>   * The "current" word resolves to "(get_current())" so instead of
> 

-- 
Anton Vorontsov
email: [EMAIL PROTECTED]
backup email: [EMAIL PROTECTED]
irc://irc.freenode.net/bd2


Re: [PATCH 1/9]: introduce radix_tree_gang_lookup_range

2007-11-25 Thread Nick Piggin
On Thursday 22 November 2007 11:32, David Chinner wrote:
> Introduce radix_tree_gang_lookup_range()
>
> The inode clustering in XFS requires a gang lookup on the radix tree to
> find all the inodes in the cluster.  The gang lookup has to set the
> maximum items to that of a fully populated cluster so we get all the
> inodes in the cluster, but we only populate the radix tree sparsely (on
> demand).
>
> As a result, the gang lookup can search way, way past the index of the end
> of the cluster because it is looking for a fixed number of entries to
> return.
>
> We know we want to terminate the search at either a specific index or a
> maximum number of items, so we need to add a "last_index" parameter to
> the lookup.

Yeah, this fixes one downside of the gang lookup API. For consistency
it would be nice to do this for the tag lookup API as well...


> Furthermore, the existing radix_tree_gang_lookup() can use this same
> function if we define a RADIX_TREE_MAX_INDEX value so the search is not
> limited by the last_index.

Nit: should just define it to be ULONG_MAX.
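The proposed semantics can be shown with a small userspace model. This is only a sketch: a fixed, sparsely populated slot array stands in for the real radix tree, and `gang_lookup_range()`/`gang_lookup()` are hypothetical names, not the kernel's `radix_tree_gang_lookup_range()` itself. It does follow the thread's design, with the unbounded lookup reduced to a wrapper that passes `RADIX_TREE_MAX_INDEX` defined as `ULONG_MAX`, as Nick suggests.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for a radix tree: a sparsely populated slot
 * array in which NULL means "no item at this index". */
#define NSLOTS 64

/* Nick's suggestion: "no limit" is simply the largest possible index. */
#define RADIX_TREE_MAX_INDEX ULONG_MAX

struct sparse_tree {
	void *slots[NSLOTS];
};

/* Collect up to max_items present items whose index lies in
 * [first_index, last_index], in ascending index order. */
static unsigned int gang_lookup_range(const struct sparse_tree *t,
				      void **results,
				      unsigned long first_index,
				      unsigned long last_index,
				      unsigned int max_items)
{
	unsigned int found = 0;
	unsigned long i;

	for (i = first_index;
	     i < NSLOTS && i <= last_index && found < max_items; i++) {
		if (t->slots[i])
			results[found++] = t->slots[i];
	}
	return found;
}

/* The existing unbounded lookup becomes a wrapper with no index limit. */
static unsigned int gang_lookup(const struct sparse_tree *t, void **results,
				unsigned long first_index,
				unsigned int max_items)
{
	return gang_lookup_range(t, results, first_index,
				 RADIX_TREE_MAX_INDEX, max_items);
}
```

With items at indices 0, 1, 2 and 40 and a 32-entry cluster starting at index 0, `gang_lookup(t, res, 0, 32)` walks past the end of the cluster and also returns the item at index 40, while `gang_lookup_range(t, res, 0, 31, 32)` stops at index 31 and returns only 3 items.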

>
> Signed-off-by: Dave Chinner <[EMAIL PROTECTED]>

Otherwise, Acked-by: Nick Piggin <[EMAIL PROTECTED]>


Re: [RFC][PATCH] Update REPORTING-BUGS

2007-11-25 Thread Adrian Bunk
On Mon, Nov 26, 2007 at 12:00:28AM +0100, Rafael J. Wysocki wrote:
> On Sunday, 25 of November 2007, Adrian Bunk wrote:
>...
> > I don't care whether that's done with Bugzilla, some email based bug 
> > tracker like the Debian bug tracker, someone putting emails manually 
> > into some bug tracker like you are doing, or whatever else.
> 
> That last solution doesn't scale very well ...
> 
> How about using the system in which it's possible to report bugs using both
> email and a web interface?
> 
> We can request that the address of the bug tracker be added to the Cc lists of
> bug reports sent by email and we can make it resend reports filed with it to
> the appropriate mailing lists and with the appropriate email headers.  This is
> technically doable.

You are trying to solve something that is not a problem.

It does not matter which medium we choose for getting bug reports.

The only thing that matters is that we get bug reports resolved within a 
reasonable amount of time.

> Greetings,
> Rafael

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: kernel bugzilla is FPOS (was: Re: "buggy cmd640" message followed by soft lockup)

2007-11-25 Thread Rafael J. Wysocki
On Sunday, 25 of November 2007, Adrian Bunk wrote:
> On Sun, Nov 25, 2007 at 11:38:59PM +0100, Rafael J. Wysocki wrote:
> > On Sunday, 25 of November 2007, Adrian Bunk wrote:
> > > On Sun, Nov 25, 2007 at 10:28:06PM +0100, Rafael J. Wysocki wrote:
> > > > On Sunday, 25 of November 2007, Adrian Bunk wrote:
> > > >..
> > > > > First of all, Bugzilla is a quite often used bug tracker in the open 
> > > > > source world [1], so many users already know it.
> > > > > 
> > > > > But more important, "it pretends to require them to spend" isn't true
> > > > > because there's no pretending - we actually often require bug reporters
> > > > > to spend a lot of time on the bug report (e.g. when asking for
> > > > > bisecting).
> > > > 
> > > > But not *initially*.
> > > > 
> > > > We should not confuse *debugging* with *reporting bugs*.  While the former is
> > > > actually more difficult and more time consuming than writing the code in which
> > > > the bug is present, the latter should be as simple as sending an email.
> > > 
> > > For hardcore geeks like you and me sending an email might be easier than 
> > > using some web interface.
> > > 
> > > Normal humans tend to be more accustomed to web interfaces, and 
> > > following the instructions on some web page is _much_ easier than 
> > > reading three text files for knowing what to write in an email.
> > 
> > Hm, this is a good argument for having such a web interface, but IMO it
> > shouldn't be mandatory.  IOW, there should be a way to report a bug using plain
> > email, if the reporter prefers that.  We can, however, request that the address
> > of our bug tracking system be added to the report's Cc list.
> 
> Looking at both other open source projects and the support of commercial 
> software a web interface should be enough.

Well, IMHO the Linux kernel is exceptional in many ways ...

> But this is not the problem - the problem is what happens after the 
> initial report with the bug report.

Not only that.

First, each bug report has to reach the right lists/people and that's what we
can't assure using the Bugzilla alone right now.  To make the Bugzilla
generally useful for that we need to change the way in which the target of the
report is selected and make it send reports to mailing lists rather than to
individual people.

Second, once the bug report has reached the right place, we have two problems
to solve:
(1) we need to make the developers respond and actively work on the bug
(2) we need to make the tracking of the bug as unintrusive as possible (ie.
developers should be able to work with the reporter in a way that *they*
prefer)
While it's generally difficult to solve (1), we can at least make (2) happen
(well, in theory).

> > Now, the question is what information this web interface should ask for.
> > 
> > IMO, first, it should ask for what the bug is against, ie.:
> > - kernel version (to be obtained from 'git describe' or from /proc/version or
> >   from .config, if the kernel doesn't boot)
> > - architecture (x86, ARM, MIPS etc.)
> > - subsystem and subsubsystem (that could be selectable from a menu and might
> >   depend on the architecture)
> > 
> > It also should ask if the problem is a regression and what was the last known
> > good kernel (I'd prefer that to be the last known major release selectable from
> > a list).
> > 
> > Also, the reporter should be required to provide a summary (subject) and
> > a (concise) description of the problem and a list of email addresses to
> > send the report to in addition to the regular handling (there should be a way
> > to verify which addresses are acceptable).
> > 
> > Anything else?
> > 
> > Next, the report should be sent to a mailing list selected on the basis of the
> > information provided (not necessarily to individual developers, unless there
> > are some addresses provided explicitly by the reporter).
> 
> The architecture choice seems to be the only thing from your list that
> isn't already available in the "Enter a new bug report" dialog of the
> kernel Bugzilla.

Yet, the architecture choice affects the way in which the other choices are
made.  Also, the "sending to mailing lists" part is obviously missing.

> > IMO, it should be possible to work on the bug using both email and the web
> > interface, whichever is preferred by the participant in question, without the
> > need to stick to any of them (ie. email messages sent in the corresponding
> > email thread should be registered by the bug tracking system and comments
> > entered into it should appear as messages in the email thread with the
> > appropriate To:, From: and Cc: information).
> > 
> > There surely are more things that we'd like it to do, but the above seem to be
> > a reasonable minimum.
> 
> Except for the From: header in outgoing emails, the kernel Bugzilla
> has already offered this for years.

No, it doesn't.  You can't send the initial report by 

Question regarding naming scheme (HP Jornada 6XX/7XX)

2007-11-25 Thread Kristoffer Ericson
Greetings,

Just want some input before I start dropping patches everywhere. A simple ack 
will do nicely if you just agree.

Currently we use the name of the most typical HP Jornadas (680 and 720) to mean
all 6XX/7XX models (= 620/660/680/690 and 710/720/728).
In the past this has led to some confusion when people tried to compile their
own kernels.
For instance, an HP 620 user thought that their system was unsupported because
everything was for '680'. Or the other way round: 728 users didn't want to use
720 since they thought they would lose their extra RAM (the only difference
between the versions).

So, I want to instead use the term 600-series or 700-series. This would mean 
changing Kconfig/Makefile and driver name.

For example /drivers/input/keyboard/jornada680_kbd.c would become 
/drivers/input/keyboard/jornada600_kbd.c

The machine name tag would also return "HP Jornada 600-series" or "HP Jornada
700-series", since I know that, for instance, Opie loves to grep the machine
line. Currently this is set as "hp6xx" for the 600-series and "HP Jornada 720"
for the 700-series. They are related machines, so it would be nice to unify
their output a tad.

Why I want to use 600-series/700-series instead of 6XX/7XX is simply because 
600-series/700-series leaves no doubt.

Any objections?

Best wishes
Kristoffer Ericson



[PATCH] sata_nv: don't use legacy DMA in ADMA mode (v3)

2007-11-25 Thread Robert Hancock
We need to run any DMA command with result taskfile requested in ADMA mode
when the port is in ADMA mode, otherwise it may try to use the legacy DMA engine
in ADMA mode which is not allowed. Enforce this with BUG_ON() since data
corruption could potentially result if this happened. Also, fail any attempt to
try and issue NCQ commands with result taskfile requested, since the hardware
doesn't allow this.

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

--- linux-2.6.24-rc3-git1edit/drivers/ata/sata_nv.c.before2 2007-11-25 16:28:58.0 -0600
+++ linux-2.6.24-rc3-git1edit/drivers/ata/sata_nv.c 2007-11-25 16:31:09.0 -0600
@@ -792,11 +792,13 @@
 
 static void nv_adma_tf_read(struct ata_port *ap, struct ata_taskfile *tf)
 {
-   /* Since commands where a result TF is requested are not
-  executed in ADMA mode, the only time this function will be called
-  in ADMA mode will be if a command fails. In this case we
-  don't care about going into register mode with ADMA commands
-  pending, as the commands will all shortly be aborted anyway. */
+   /* Other than when internal or pass-through commands are executed,
+  the only time this function will be called in ADMA mode will be
+  if a command fails. In the failure case we don't care about going
+  into register mode with ADMA commands pending, as the commands will
+  all shortly be aborted anyway. We assume that NCQ commands are not
+  issued via passthrough, which is the only way that switching into
+  ADMA mode could abort outstanding commands. */
nv_adma_register_mode(ap);
 
ata_tf_read(ap, tf);
@@ -1379,11 +1381,9 @@
struct nv_adma_port_priv *pp = qc->ap->private_data;
 
/* ADMA engine can only be used for non-ATAPI DMA commands,
-  or interrupt-driven no-data commands, where a result taskfile
-  is not required. */
+  or interrupt-driven no-data commands. */
if ((pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) ||
-  (qc->tf.flags & ATA_TFLAG_POLLING) ||
-  (qc->flags & ATA_QCFLAG_RESULT_TF))
+  (qc->tf.flags & ATA_TFLAG_POLLING))
return 1;
 
if ((qc->flags & ATA_QCFLAG_DMAMAP) ||
@@ -1401,6 +1401,8 @@
   NV_CPB_CTL_IEN;
 
if (nv_adma_use_reg_mode(qc)) {
+   BUG_ON(!(pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) &&
+   (qc->flags & ATA_QCFLAG_DMAMAP));
nv_adma_register_mode(qc->ap);
ata_qc_prep(qc);
return;
@@ -1445,9 +1447,21 @@
 
VPRINTK("ENTER\n");
 
+   /* We can't handle result taskfile with NCQ commands, since
+  retrieving the taskfile switches us out of ADMA mode and would abort
+  existing commands. */
+   if (unlikely(qc->tf.protocol == ATA_PROT_NCQ &&
+(qc->flags & ATA_QCFLAG_RESULT_TF))) {
+   ata_dev_printk(qc->dev, KERN_ERR,
+   "NCQ w/ RESULT_TF not allowed\n");
+   return AC_ERR_SYSTEM;
+   }
+
if (nv_adma_use_reg_mode(qc)) {
/* use ATA register mode */
VPRINTK("using ATA register mode: 0x%lx\n", qc->flags);
+   BUG_ON(!(pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) &&
+   (qc->flags & ATA_QCFLAG_DMAMAP));
nv_adma_register_mode(qc->ap);
return ata_qc_issue_prot(qc);
} else
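The decision the patch enforces can be modelled in userspace as a sketch. Every name below (`QC_RESULT_TF`, `QC_DMAMAP`, `TF_POLLING`, `struct cmd`, `choose_issue_path()`) is a hypothetical stand-in for the libata/sata_nv identifiers, with arbitrary flag values. The DMAMAP/no-data test in `use_reg_mode()` is also an assumption about the part of `nv_adma_use_reg_mode()` that the quoted hunk cuts off.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits standing in for ATA_QCFLAG_RESULT_TF,
 * ATA_QCFLAG_DMAMAP and ATA_TFLAG_POLLING. */
#define QC_RESULT_TF	(1u << 0)
#define QC_DMAMAP	(1u << 1)
#define TF_POLLING	(1u << 2)

enum protocol { PROT_NODATA, PROT_PIO, PROT_DMA, PROT_NCQ };
enum issue_path { ISSUE_ADMA, ISSUE_REGISTER, ISSUE_REJECT };

struct cmd {
	enum protocol prot;
	unsigned int qc_flags;
	unsigned int tf_flags;
};

/* After the patch, a requested result taskfile no longer forces
 * register mode; only ATAPI setup and polling do. The second test
 * is an assumption about the truncated remainder of the function. */
static bool use_reg_mode(const struct cmd *c, bool atapi_setup)
{
	if (atapi_setup || (c->tf_flags & TF_POLLING))
		return true;
	if ((c->qc_flags & QC_DMAMAP) || c->prot == PROT_NODATA)
		return false;
	return true;
}

static enum issue_path choose_issue_path(const struct cmd *c, bool atapi_setup)
{
	/* Reading the result taskfile leaves ADMA mode, which would abort
	 * other outstanding NCQ commands, so refuse the combination. */
	if (c->prot == PROT_NCQ && (c->qc_flags & QC_RESULT_TF))
		return ISSUE_REJECT;

	if (use_reg_mode(c, atapi_setup)) {
		/* Mirrors the new BUG_ON(): a DMA-mapped non-ATAPI command
		 * must never fall back to the legacy DMA engine. */
		assert(atapi_setup || !(c->qc_flags & QC_DMAMAP));
		return ISSUE_REGISTER;
	}
	return ISSUE_ADMA;
}
```

In this model, a non-NCQ DMA command that requests a result taskfile now stays on the ADMA path instead of dropping to register mode. An NCQ command with the same request is refused up front, matching the `AC_ERR_SYSTEM` return in the patch.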



Re: [PATCH 01/27] ptrace: arch_has_single_step

2007-11-25 Thread Roland McGrath
> Why should arch_has_single_step be a function-like macro?  I can't think
> of a case where this wouldn't be a compile-time constant.  And given that
> this is hopefully a transitionary ifdef because eventually all architectures
> would use the generic code I'd prefer ifdefs in the code that clearly mark
> this as transitional in this case.

I'm not sure it's true that there is no machine where some chips support
single-step and others don't, though I do think it's true that no arch code
has a conditional like this now.  In the case of block-step (in later
patches), it is the case that a run-time check for availability of the
hardware feature comes up (on some x86 configurations).  So a main reason
is to keep the two parallel macros with the same style and semantics.

> > +static inline void user_enable_single_step(struct task_struct *task)
> 
> > +static inline void user_disable_single_step(struct task_struct *task)
> 
> And I don't think these should be provided at all as generic stubs. If
> an arch doesn't use the generic code it simply shouldn't compile the
> code using this.

The code compiles away completely with if (0)'s.  I did it this way to
avoid more #ifdef's in the generic ptrace code.  Previous patch reviews
I've read (including ones from you) have said to use header-defined stubs
in #ifdef and unconditional calls in the code.  Please be explicit in
proposing the specific alternatives you would prefer.
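The header-stub approach Roland describes can be sketched in userspace C. This is a simplified model, not the patch itself: `struct task_struct` is a hypothetical stand-in, and `generic_resume()` is a hypothetical caller that returns -1 where real generic code would return an error code. The point it shows is that the default `arch_has_single_step()` expands to `(0)`, so the caller needs no `#ifdef` and the compiler discards the dead branches. Christoph's alternative would instead wrap the caller itself in `#ifdef`.

```c
#include <assert.h>
#include <stdlib.h>

struct task_struct { int pid; };	/* hypothetical stand-in */

/* An arch header would normally provide these. This models an arch that
 * does not, so the generic defaults below take effect. */
#ifndef arch_has_single_step
#define arch_has_single_step()	(0)

static inline void user_enable_single_step(struct task_struct *task)
{
	(void)task;
	abort();	/* unreachable: callers check arch_has_single_step() */
}

static inline void user_disable_single_step(struct task_struct *task)
{
	(void)task;
	abort();	/* unreachable for the same reason */
}
#endif

/* Hypothetical generic caller: unconditional code, no #ifdef needed. */
static int generic_resume(struct task_struct *child, int want_step)
{
	if (want_step) {
		if (!arch_has_single_step())
			return -1;	/* request not supported */
		user_enable_single_step(child);
	} else if (arch_has_single_step()) {
		user_disable_single_step(child);
	}
	return 0;
}
```

On an arch that defines `arch_has_single_step()` and the two functions in its own `asm/ptrace.h`, the same caller compiles into live calls with no source change.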

> What's the reason for the user_ prefix btw, most architectures seem to
> have these functions already anyway, just without the user_ prefix.

The arch's are not consistent now, so I chose a new scheme to harmonize
on.  I think the "set_foo" names are a bit too nonspecific-sounding,
especially given that we do have other things kicking around that use
single-step functionality in kernel mode.  Also, I plan to submit some
more work harmonizing the arch-specific access to the user-mode view of
machine state, and a uniform prefix for the new, reliably coherent,
documented set of internal interfaces just seems like the right thing to
do.  (I don't really care enough to argue about the names for functions.
Anyone who, for some reason I cannot fathom, cares enough to be contrary
about the subject, is welcome to set the standard.)


Thanks,
Roland


Re: Small System Paging Problem - OOM-killer goes nuts

2007-11-25 Thread Mikael Pettersson
On Sun, 25 Nov 2007 15:02:15 -0700, Josh Goldsmith wrote:
>   I have a Linksys NSLU2 running 2.6.21 (I can replicate the problem on 
> 2.6.23 but it isn't fully supported on SlugOS).  It is an armv5teb device
> with 32MB of RAM, 400+ MB swap on its 160GB USB2 root disk.  The machine is 
> used as a fileserver and to build packages for other ARM devices.  It may be 
> underpowered by today's standard but is a whole lot faster than my first 
> Linux system (386sx20 with 4MB RAM) but the whole system with disk uses <8 
> watts and is silent.
> 
>   The problem comes when I try to untar a large file (in this case 
> linux-2.6.23.tar.bz2).  Regardless of whether I kill off every other process,
> eventually the oom-killer will appear and kill either the tar or the shell. 
> I've tried every tuning option I and my buddy Google could find including 
> (/proc/sys/vm/overcommit*) with no success.  I'm not worried about paging 
> impacting performance.
> 
>   I'd appreciate any help, pointers, or gentle taps with the cluebat.

I'm no VM tuning expert, but I have and still do heavy compile
jobs on similarly configured machines, with no OOM problems:

I regularly build 2.6 kernels and occasionally also gcc on a
100MHz 486 with 28MB of RAM and perhaps 500MB of swap. It runs
a standard but stripped down Fedora Core 4 user-space, with ext3
file systems and a kernel that doesn't include anything non-essential. 
The machine will swap madly, but the OOM killer never triggers.
(All system settings are FC4 defaults. I haven't touched them.)

In the past I did a fair amount of package rebuilds and test suite
runs on an NSLU2 myself, with a 2.4 Linksys/Openslug kernel, ext3,
and a 1GB or perhaps 2GB swap partition on a disk attached via a
USB2-to-PATA enclosure. Even when swapping heavily the OOM killer
wouldn't trigger.


Re: [RFC][PATCH] Update REPORTING-BUGS

2007-11-25 Thread Rafael J. Wysocki
On Sunday, 25 of November 2007, Adrian Bunk wrote:
> On Sun, Nov 25, 2007 at 10:51:14PM +0100, Rafael J. Wysocki wrote:
> > On Sunday, 25 of November 2007, Adrian Bunk wrote:
> > > On Sun, Nov 25, 2007 at 09:57:09PM +0100, Rafael J. Wysocki wrote:
[--snip--]
> > > 
> > > How should a newbie find the correct mailing list?
> > 
> > Read MAINTAINERS?  OK, I should have mentioned that.
> > 
> > > Benchmark:
> > > Easier than the "some more work" when using Bugzilla.
> > 
> > Nope.  Please try to file a report against libata/PATA using the Bugzilla.
> > Good luck. ;-)
> 
> $ grep PATA MAINTAINERS
> $ 

Too bad (and this is a bug BTW).

> > > >...
> > > > +It also is a good idea to notify the maintainer of the affected subsystem and
> > > > +the maintainer of the tree in which the bug is present by adding their email
> > > > +addresses to the Cc list of the bug report message.  The email addresses of
> > > > +maintainers of the majority of kernel subsystems can be found in the MAINTAINERS
> > > > +file, but you should not worry too much about getting a wrong person.
> > > 
> > > If you don't already know MAINTAINERS well then finding the right 
> > > component in Bugzilla is much easier.
> > 
> > I disagree.  How is a newbie supposed to know what AIO and DIO mean and WTH is
> > the difference between LVM2/DM and MD?
> > 
> > I took only the IO/Storage submenu as an example, but there are other things
> > like that.  For instance, what is the difference between "Flash/Memory
> > Technology Devices" and MMC/SD?  Why is "Hotplug" under "Drivers" and WTH
> > does it *mean*?  What does "W1" mean, for that matter??  Etc.
> 
> Then let's get that improved.

OK

Who's supposed to be responsible for that?

[--snip--]
> > > 
> > > Really, we must define _one_ way for people to report a bug, and how 
> > > developers are reminded is _our_ job.
> > 
> > Well, who's "we" in that context?  IOW, whose job exactly is it supposed to be?
> 
> "we" = "we kernel developers"
> 
> And Natalie seems to be the person being paid for doing such stuff...
> 
> > > >...
> > > > +Generally, the following things are appreciated in a bug report:
> > > >...
> > > 
> > > If you expect people to read and follow this, wouldn't it be easier to
> > > simply point them to open the bug in Bugzilla where we already have a
> > > template asking these questions?
> > 
> > I don't think so and please refer to the examples above.
> > 
> > > You could replace the whole contents of this file with:
> > > Go to http://bugzilla.kernel.org/ and click on "Enter a new bug report".
> > > 
> > > It's a pity that we manage to add/change an average of 100.000 bugs^Wlines
> > > of code each month, but do not have one generally accepted and working 
> > > process for bug reports.
> > 
> > It's a pity that we do not have one, indeed, and so perhaps it's a good idea
> > to try to create one?  Not necessarily focusing on the Bugzilla for a little
> > while. ;-)
> 
> I'm not focussed on Bugzilla.
> 
> But a submitter should send a bug report _once_ through one well-defined 
> medium, this should result in the bug report not being lost, and every 
> other communication of the submitter should be triggered by developers 
> requesting additional information.

I don't think it has to be only *one* medium as long as we're able to track
the bugs (see my last reply in the other thread).

> I don't care whether that's done with Bugzilla, some email based bug 
> tracker like the Debian bug tracker, someone putting emails manually 
> into some bug tracker like you are doing, or whatever else.

That last solution doesn't scale very well ...

How about using the system in which it's possible to report bugs using both
email and a web interface?

We can request that the address of the bug tracker be added to the Cc lists of
bug reports sent by email and we can make it resend reports filed with it to
the appropriate mailing lists and with the appropriate email headers.  This is
technically doable.

Greetings,
Rafael


Re: kernel bugzilla is FPOS (was: Re: "buggy cmd640" message followed by soft lockup)

2007-11-25 Thread Adrian Bunk
On Sun, Nov 25, 2007 at 11:38:59PM +0100, Rafael J. Wysocki wrote:
> On Sunday, 25 of November 2007, Adrian Bunk wrote:
> > On Sun, Nov 25, 2007 at 10:28:06PM +0100, Rafael J. Wysocki wrote:
> > > On Sunday, 25 of November 2007, Adrian Bunk wrote:
> > >..
> > > > First of all, Bugzilla is a quite often used bug tracker in the open 
> > > > source world [1], so many users already know it.
> > > > 
> > > > But more important, "it pretends to require them to spend" isn't true 
> > > > because there's no pretending - we actually often require bug reporters 
> > > > to spend a lot of time on the bug report (e.g. when asking for 
> > > > bisecting).
> > > 
> > > But not *initially*.
> > > 
> > > We should not confuse *debugging* with *reporting bugs*.  While the former is
> > > actually more difficult and more time consuming than writing the code in which
> > > the bug is present, the latter should be as simple as sending an email.
> > 
> > For hardcore geeks like you and me sending an email might be easier than 
> > using some web interface.
> > 
> > Normal humans tend to be more accustomed to web interfaces, and 
> > following the instructions on some web page is _much_ easier than 
> > reading three text files for knowing what to write in an email.
> 
> Hm, this is a good argument for having such a web interface, but IMO it
> shouldn't be mandatory.  IOW, there should be a way to report a bug using plain
> email, if the reporter prefers that.  We can, however, request that the address
> of our bug tracking system be added to the report's Cc list.

Looking at both other open source projects and the support of commercial 
software a web interface should be enough.

But this is not the problem - the problem is what happens after the 
initial report with the bug report.

> Now, the question is what information this web interface should ask for.
> 
> IMO, first, it should ask for what the bug is against, ie.:
> - kernel version (to be obtained from 'git describe' or from /proc/version or
>   from .config, if the kernel doesn't boot)
> - architecture (x86, ARM, MIPS etc.)
> - subsystem and subsubsystem (that could be selectable from a menu and might
>   depend on the architecture)
> 
> It also should ask if the problem is a regression and what was the last known
> good kernel (I'd prefer that to be the last known major release selectable from
> a list).
> 
> Also, the reporter should be required to provide a summary (subject) and
> a (concise) description of the problem and a list of email addresses to
> send the report to in addition to the regular handling (there should be a way
> to verify which addresses are acceptable).
> 
> Anything else?
> 
> Next, the report should be sent to a mailing list selected on the basis of the
> information provided (not necessarily to individual developers, unless there
> are some addresses provided explicitly by the reporter).

The architecture choice seems to be the only thing from your list that
isn't already available in the "Enter a new bug report" dialog of the
kernel Bugzilla.

> IMO, it should be possible to work on the bug using both email and the web
> interface, whichever is preferred by the participant in question, without the
> need to stick to any of them (ie. email messages sent in the corresponding
> email thread should be registered by the bug tracking system and comments
> entered into it should appear as messages in the email thread with the
> appropriate To:, From: and Cc: information).
> 
> There surely are more things that we'd like it to do, but the above seem to be
> a reasonable minimum.

Except for the From: header in outgoing emails, the kernel Bugzilla
has already offered this for years.

>...
> Greetings,
> Rafael

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: [PATCH 10/27] ptrace: generic resume

2007-11-25 Thread Roland McGrath
> Could we by any chance just force every architecture using generic code
> to implement PTRACE_SINGLESTEP and PTRACE_SYSEMU?  This will lead to
> both far less messy code and a more consistent user interface.

I'd like to look into that later after most arch's have moved to using the
generic code for their existing support.  I am thoroughly in favor, but it
requires some more groundwork that can come after this initial stage.


Thanks,
Roland


Re: kernel bugzilla is FPOS (was: Re: "buggy cmd640" message followed by soft lockup)

2007-11-25 Thread Rafael J. Wysocki
On Sunday, 25 of November 2007, Rafael J. Wysocki wrote:
> On Sunday, 25 of November 2007, Adrian Bunk wrote:
> > On Sun, Nov 25, 2007 at 10:28:06PM +0100, Rafael J. Wysocki wrote:
> > > On Sunday, 25 of November 2007, Adrian Bunk wrote:
[--snip--] 
> > Even worse:
> > Different people have different opinions what they need and what they 
> > don't want...
> 
> Let's collect these opitions, then, and try to find a solution that would
> satisfy all of them or at least the majority of them.

s/opitions/opinions/

Sorry.


Re: [PATCH 01/27] ptrace: arch_has_single_step

2007-11-25 Thread Christoph Hellwig
On Sun, Nov 25, 2007 at 01:55:07PM -0800, Roland McGrath wrote:
> This defines the new macro arch_has_single_step() in linux/ptrace.h, a
> default for when asm/ptrace.h does not define it.  It declares the new
> user_enable_single_step and user_disable_single_step functions.
> This is not used yet, but paves the way to harmonize on this interface
> for the arch-specific calls on all machines.

Why should arch_has_single_step be a function-like macro?  I can't think
of a case where this wouldn't be a compile-time constant.  And given that
this is hopefully a transitionary ifdef because eventually all architectures
would use the generic code I'd prefer ifdefs in the code that clearly mark
this as transitional in this case.

> +static inline void user_enable_single_step(struct task_struct *task)

> +static inline void user_disable_single_step(struct task_struct *task)

And I don't think these should be provided at all as generic stubs. If
an arch doesn't use the generic code it simply shouldn't compile the
code using this.

What's the reason for the user_ prefix btw, most architectures seem to
have these functions already anyway, just without the user_ prefix.



Re: [PATCH 10/27] ptrace: generic resume

2007-11-25 Thread Christoph Hellwig
On Sun, Nov 25, 2007 at 02:01:09PM -0800, Roland McGrath wrote:
> This makes ptrace_request handle all the ptrace requests that wake
> up the traced task.  These do low-level ptrace implementation magic
> that is not arch-specific and should be kept out of arch code.  The
> implementations on each arch usually do the same thing.  The new
> generic code makes use of the arch_has_single_step macro and generic
> entry points to handle PTRACE_SINGLESTEP.

Nice, I've been trying to get people to move this to common code for
a while :)


> +#ifdef PTRACE_SINGLESTEP
> +#define is_singlestep(request)   ((request) == PTRACE_SINGLESTEP)
> +#else
> +#define is_singlestep(request)   0
> +#endif
> +
> +#ifdef PTRACE_SYSEMU
> +#define is_sysemu_singlestep(request)   ((request) == PTRACE_SYSEMU_SINGLESTEP)
> +#else
> +#define is_sysemu_singlestep(request)   0
> +#endif

Could we by any chance just force every architecture using generic code
to implement PTRACE_SINGLESTEP and PTRACE_SYSEMU?  This will lead to
both far less messy code and a more consistent user interface.



Re: kernel bugzilla is FPOS (was: Re: "buggy cmd640" message followed by soft lockup)

2007-11-25 Thread Rafael J. Wysocki
On Sunday, 25 of November 2007, Adrian Bunk wrote:
> On Sun, Nov 25, 2007 at 10:28:06PM +0100, Rafael J. Wysocki wrote:
> > On Sunday, 25 of November 2007, Adrian Bunk wrote:
> >..
> > > First of all, Bugzilla is a quite often used bug tracker in the open 
> > > source world [1], so many users already know it.
> > > 
> > > But more important, "it pretends to require them to spend" isn't true 
> > > because there's no pretending - we actually often require bug reporters 
> > > to spend a lot of time on the bug report (e.g. when asking for 
> > > bisecting).
> > 
> > But not *initially*.
> > 
> > We should not confuse *debugging* with *reporting bugs*.  While the former is
> > actually more difficult and more time consuming than writing the code in which
> > the bug is present, the latter should be as simple as sending an email.
> 
> For hardcore geeks like you and me sending an email might be easier than 
> using some web interface.
> 
> Normal humans tend to be more accustomed to web interfaces, and 
> following the instructions on some web page is _much_ easier than 
> reading three text files for knowing what to write in an email.

Hm, this is a good argument for having such a web interface, but IMO it
shouldn't be mandatory.  IOW, there should be a way to report a bug using plain
email, if the reporter prefers that.  We can, however, request that the address
of our bug tracking system be added to the report's Cc list.

Now, the question is what information this web interface should ask for.

IMO, first, it should ask for what the bug is against, ie.:
- kernel version (to be obtained from 'git describe' or from /proc/version or
  from .config, if the kernel doesn't boot)
- architecture (x86, ARM, MIPS etc.)
- subsystem and subsubsystem (that could be selectable from a menu and might
  depend on the architecture)

It also should ask if the problem is a regression and what was the last known
good kernel (I'd prefer that to be the last known major release selectable from
a list).

Also, the reporter should be required to provide a summary (subject) and
a (concise) description of the problem and a list of email addresses to
send the report to in addition to the regular handling (there should be a way
to verify which addresses are acceptable).

Anything else?

Next, the report should be sent to a mailing list selected on the basis of the
information provided (not necessarily to individual developers, unless there
are some addresses provided explicitly by the reporter).

IMO, it should be possible to work on the bug using both email and the web
interface, whichever is preferred by the participant in question, without the
need to stick to any of them (ie. email messages sent in the corresponding
email thread should be registered by the bug tracking system and comments
entered into it should appear as messages in the email thread with the
appropriate To:, From: and Cc: information).

There surely are more things that we'd like it to do, but the above seem to be
a reasonable minimum.

> > > I'm also sometimes writing bug reports in different areas, and in my 
> > > experience it doesn't matter whether it's web-based Bugzilla, the 
> > > email-based Debian bug tracker or whatever other system - the time spent
> > > on a good bug report is not spent on pasting the text wherever or on
> > > clicking on a few boxes, the time is spent on tracking the issue down 
> > > and writing a good bug report.
> > 
> > Apparently, you are expecting the reporters to *debug* problems, while they
> > need not be aware of how to do that.
> > 
> > IMHO, we should make reporting problems as simple as reasonably possible and
> 
> Agreed, and as said above simple = web interface.
> 
> >...
> > > What matters for a bug reporter is to get a solution for his problem 
> > > within a reasonable amount of time.
> > 
> > Still, it's annoying if you attach tons of information to the report and
> > that information does not turn out to be useful.
> 
> Agreed.
> 
> > > > Also, some developers do not consider the Bugzilla as a useful thing and
> > > > wouldn't like to use it (which is why this thread has appeared, among
> > > > other things ;-)).
> > > >...
> > > 
> > > And that's part of the problem.
> > > 
> > > Bugzilla is a usable tool, but it isn't the only tool available.
> > > 
> > > If there was one tool all developers would be willing to use that would 
> > > be a reason why we should switch to whatever tool this is.
> > 
> > The choice of the tool should be a result of the choice of a *method*.  IOW,
> > we have to know our needs and choose the tool that satisfies them or write
> > one if it doesn't exist.
> > 
> > For now, IMHO, we don't really know what we need.
> 
> Even worse:
> Different people have different opinions what they need and what they 
> don't want...

Let's collect these opinions, then, and try to find a solution that would
satisfy all of them or at least the majority of them.


[patch 3/4] Timerfd v3 - wire the new timerfd API to the x86 family

2007-11-25 Thread Davide Libenzi
Wires up the new timerfd API to the x86 family.



Signed-off-by: Davide Libenzi <[EMAIL PROTECTED]>


- Davide


---
 arch/x86/ia32/ia32entry.S          |    4 +++-
 arch/x86/kernel/syscall_table_32.S |    4 +++-
 include/asm-x86/unistd_32.h        |    6 ++++--
 include/asm-x86/unistd_64.h        |    9 +++++++--
 4 files changed, 17 insertions(+), 6 deletions(-)

Index: linux-2.6.mod/include/asm-x86/unistd_32.h
===
--- linux-2.6.mod.orig/include/asm-x86/unistd_32.h  2007-11-23 13:55:15.0 -0800
+++ linux-2.6.mod/include/asm-x86/unistd_32.h   2007-11-24 12:49:28.0 -0800
@@ -327,13 +327,15 @@
 #define __NR_epoll_pwait   319
 #define __NR_utimensat 320
 #define __NR_signalfd  321
-#define __NR_timerfd   322
+#define __NR_timerfd_create 322
 #define __NR_eventfd   323
 #define __NR_fallocate 324
+#define __NR_timerfd_settime   325
+#define __NR_timerfd_gettime   326
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 325
+#define NR_syscalls 327
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
Index: linux-2.6.mod/include/asm-x86/unistd_64.h
===
--- linux-2.6.mod.orig/include/asm-x86/unistd_64.h  2007-11-23 13:55:15.0 -0800
+++ linux-2.6.mod/include/asm-x86/unistd_64.h   2007-11-24 12:49:28.0 -0800
@@ -629,12 +629,17 @@
 __SYSCALL(__NR_epoll_pwait, sys_epoll_pwait)
 #define __NR_signalfd  282
 __SYSCALL(__NR_signalfd, sys_signalfd)
-#define __NR_timerfd   283
-__SYSCALL(__NR_timerfd, sys_timerfd)
+#define __NR_timerfd_create 283
+__SYSCALL(__NR_timerfd_create, sys_timerfd_create)
 #define __NR_eventfd   284
 __SYSCALL(__NR_eventfd, sys_eventfd)
 #define __NR_fallocate 285
 __SYSCALL(__NR_fallocate, sys_fallocate)
+#define __NR_timerfd_settime   286
+__SYSCALL(__NR_timerfd_settime, sys_timerfd_settime)
+#define __NR_timerfd_gettime   287
+__SYSCALL(__NR_timerfd_gettime, sys_timerfd_gettime)
+
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR
Index: linux-2.6.mod/arch/x86/kernel/syscall_table_32.S
===
--- linux-2.6.mod.orig/arch/x86/kernel/syscall_table_32.S   2007-11-23 13:55:16.0 -0800
+++ linux-2.6.mod/arch/x86/kernel/syscall_table_32.S    2007-11-24 12:49:28.0 -0800
@@ -321,6 +321,8 @@
.long sys_epoll_pwait
.long sys_utimensat /* 320 */
.long sys_signalfd
-   .long sys_timerfd
+   .long sys_timerfd_create
.long sys_eventfd
.long sys_fallocate
+   .long sys_timerfd_settime   /* 325 */
+   .long sys_timerfd_gettime
Index: linux-2.6.mod/arch/x86/ia32/ia32entry.S
===
--- linux-2.6.mod.orig/arch/x86/ia32/ia32entry.S    2007-11-23 13:55:16.0 -0800
+++ linux-2.6.mod/arch/x86/ia32/ia32entry.S 2007-11-24 12:49:28.0 -0800
@@ -723,7 +723,9 @@
.quad sys_epoll_pwait
.quad compat_sys_utimensat  /* 320 */
.quad compat_sys_signalfd
-   .quad compat_sys_timerfd
+   .quad sys_timerfd_create
.quad sys_eventfd
.quad sys32_fallocate
+   .quad compat_sys_timerfd_settime    /* 325 */
+   .quad compat_sys_timerfd_gettime
 ia32_syscall_end:



[patch 4/4] Timerfd v3 - un-break CONFIG_TIMERFD

2007-11-25 Thread Davide Libenzi
Remove the broken status from CONFIG_TIMERFD.



Signed-off-by: Davide Libenzi <[EMAIL PROTECTED]>


- Davide


---
 init/Kconfig |    1 -
 1 file changed, 1 deletion(-)

Index: linux-2.6.mod/init/Kconfig
===
--- linux-2.6.mod.orig/init/Kconfig 2007-11-23 13:55:15.0 -0800
+++ linux-2.6.mod/init/Kconfig  2007-11-24 12:49:30.0 -0800
@@ -566,7 +566,6 @@
 config TIMERFD
bool "Enable timerfd() system call" if EMBEDDED
select ANON_INODES
-   depends on BROKEN
default y
help
  Enable the timerfd() system call that allows to receive timer



[patch 1/4] Timerfd v3 - introduce a new hrtimer_forward_now() function

2007-11-25 Thread Davide Libenzi
I think that advancing the timer against the timer's current "now" can
be a pretty common usage, so, w/out exposing hrtimer's internals, we add
a new hrtimer_forward_now() function.



Signed-off-by: Davide Libenzi <[EMAIL PROTECTED]>


- Davide


---
 include/linux/hrtimer.h |    7 +++++++
 1 file changed, 7 insertions(+)

Index: linux-2.6.mod/include/linux/hrtimer.h
===
--- linux-2.6.mod.orig/include/linux/hrtimer.h  2007-11-23 13:55:16.0 -0800
+++ linux-2.6.mod/include/linux/hrtimer.h   2007-11-24 12:48:05.0 -0800
@@ -298,6 +298,13 @@
 extern unsigned long
 hrtimer_forward(struct hrtimer *timer, ktime_t now, ktime_t interval);
 
+/* Forward a hrtimer so it expires after the hrtimer's current now */
+static inline unsigned long hrtimer_forward_now(struct hrtimer *timer,
+   ktime_t interval)
+{
+   return hrtimer_forward(timer, timer->base->get_time(), interval);
+}
+
 /* Precise sleep: */
 extern long hrtimer_nanosleep(struct timespec *rqtp,
  struct timespec *rmtp,



[patch 2/4] Timerfd v3 - new timerfd API

2007-11-25 Thread Davide Libenzi
This is the new timerfd API as it is implemented by the following patch:

int timerfd_create(int clockid, int flags);
int timerfd_settime(int ufd, int flags,
const struct itimerspec *utmr,
struct itimerspec *otmr);
int timerfd_gettime(int ufd, struct itimerspec *otmr);

The timerfd_create() API creates an un-programmed timerfd fd. The "clockid"
parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME.
The timerfd_settime() API applies new settings to the timerfd fd, optionally
retrieving the previous expiration time (when the "otmr" parameter is not
NULL).
The time value specified in "utmr" is absolute if the TFD_TIMER_ABSTIME bit
is set in the "flags" parameter; otherwise it is a relative time.
The timerfd_gettime() API returns the next expiration time of the timer, or
{0, 0} if the timerfd has not been set yet.
Like the previous timerfd API implementation, read(2) and poll(2) are supported
(with the same interface).
Here's a simple test program I used to exercise the new timerfd APIs:

http://www.xmailserver.org/timerfd-test2.c



Signed-off-by: Davide Libenzi <[EMAIL PROTECTED]>


- Davide


---
 fs/compat.c  |   32 ++-
 fs/timerfd.c |  199 ++-
 include/linux/compat.h   |7 +
 include/linux/syscalls.h |7 +
 4 files changed, 166 insertions(+), 79 deletions(-)

Index: linux-2.6.mod/fs/timerfd.c
===
--- linux-2.6.mod.orig/fs/timerfd.c 2007-11-23 13:55:16.0 -0800
+++ linux-2.6.mod/fs/timerfd.c  2007-11-24 12:49:21.0 -0800
@@ -25,13 +25,15 @@
struct hrtimer tmr;
ktime_t tintv;
wait_queue_head_t wqh;
+   u64 ticks;
int expired;
+   int clockid;
 };
 
 /*
  * This gets called when the timer event triggers. We set the "expired"
  * flag, but we do not re-arm the timer (in case it's necessary,
- * tintv.tv64 != 0) until the timer is read.
+ * tintv.tv64 != 0) until the timer is accessed.
  */
 static enum hrtimer_restart timerfd_tmrproc(struct hrtimer *htmr)
 {
@@ -40,13 +42,14 @@
 
spin_lock_irqsave(&ctx->wqh.lock, flags);
ctx->expired = 1;
+   ctx->ticks++;
wake_up_locked(&ctx->wqh);
spin_unlock_irqrestore(&ctx->wqh.lock, flags);
 
return HRTIMER_NORESTART;
 }
 
-static void timerfd_setup(struct timerfd_ctx *ctx, int clockid, int flags,
+static void timerfd_setup(struct timerfd_ctx *ctx, int flags,
  const struct itimerspec *ktmr)
 {
enum hrtimer_mode htmode;
@@ -57,8 +60,9 @@
 
texp = timespec_to_ktime(ktmr->it_value);
ctx->expired = 0;
+   ctx->ticks = 0;
ctx->tintv = timespec_to_ktime(ktmr->it_interval);
-   hrtimer_init(&ctx->tmr, clockid, htmode);
+   hrtimer_init(&ctx->tmr, ctx->clockid, htmode);
ctx->tmr.expires = texp;
ctx->tmr.function = timerfd_tmrproc;
if (texp.tv64 != 0)
@@ -83,7 +87,7 @@
poll_wait(file, &ctx->wqh, wait);
 
spin_lock_irqsave(&ctx->wqh.lock, flags);
-   if (ctx->expired)
+   if (ctx->ticks)
events |= POLLIN;
spin_unlock_irqrestore(&ctx->wqh.lock, flags);
 
@@ -102,11 +106,11 @@
return -EINVAL;
spin_lock_irq(&ctx->wqh.lock);
res = -EAGAIN;
-   if (!ctx->expired && !(file->f_flags & O_NONBLOCK)) {
+   if (!ctx->ticks && !(file->f_flags & O_NONBLOCK)) {
__add_wait_queue(&ctx->wqh, &wait);
for (res = 0;;) {
set_current_state(TASK_INTERRUPTIBLE);
-   if (ctx->expired) {
+   if (ctx->ticks) {
res = 0;
break;
}
@@ -121,22 +125,21 @@
__remove_wait_queue(&ctx->wqh, &wait);
__set_current_state(TASK_RUNNING);
}
-   if (ctx->expired) {
-   ctx->expired = 0;
-   if (ctx->tintv.tv64 != 0) {
+   if (ctx->ticks) {
+   ticks = ctx->ticks;
+   if (ctx->expired && ctx->tintv.tv64) {
/*
 * If tintv.tv64 != 0, this is a periodic timer that
 * needs to be re-armed. We avoid doing it in the timer
 * callback to avoid DoS attacks specifying a very
 * short timer period.
 */
-   ticks = (u64)
-   hrtimer_forward(&ctx->tmr,
-   hrtimer_cb_get_time(&ctx->tmr),
-   ctx->tintv);
+   ticks += (u64) hrtimer_forward_now(&ctx->tmr,
+  ctx->tintv) - 1;
hrtimer_restart(&ctx->tmr);
-   } else
-   ticks = 1;
+   }
+   ctx->expired = 0;

[PATCH 27/27] x86: PTRACE_SINGLEBLOCK

2007-11-25 Thread Roland McGrath

This adds the PTRACE_SINGLEBLOCK request on x86, matching the ia64 feature.
The implementation comes from the generic ptrace code and relies on the
low-level machine support provided by arch_has_block_step() et al.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 arch/x86/ia32/ptrace32.c     |    1 +
 include/asm-x86/ptrace-abi.h |    2 ++
 2 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/ia32/ptrace32.c b/arch/x86/ia32/ptrace32.c
index 5661abd..d1fe78c 100644
--- a/arch/x86/ia32/ptrace32.c
+++ b/arch/x86/ia32/ptrace32.c
@@ -212,6 +212,7 @@ asmlinkage long sys32_ptrace(long request, u32 pid, u32 addr, u32 data)
case PTRACE_KILL:
case PTRACE_CONT:
case PTRACE_SINGLESTEP:
+   case PTRACE_SINGLEBLOCK:
case PTRACE_DETACH:
case PTRACE_SYSCALL:
case PTRACE_OLDSETOPTIONS:
diff --git a/include/asm-x86/ptrace-abi.h b/include/asm-x86/ptrace-abi.h
index 7524e12..adce6b5 100644
--- a/include/asm-x86/ptrace-abi.h
+++ b/include/asm-x86/ptrace-abi.h
@@ -78,4 +78,6 @@
 # define PTRACE_SYSEMU_SINGLESTEP 32
 #endif
 
+#define PTRACE_SINGLEBLOCK 33  /* resume execution until next branch */
+
 #endif


[PATCH 25/27] x86: debugctlmsr arch_has_block_step

2007-11-25 Thread Roland McGrath

This implements user-mode step-until-branch on x86 using the BTF bit
in MSR_IA32_DEBUGCTLMSR.  It's just like single-step, only less so.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 arch/x86/kernel/step.c |   64 +--
 arch/x86/kernel/traps_32.c |6 
 arch/x86/kernel/traps_64.c |6 
 include/asm-x86/ptrace.h   |7 +
 4 files changed, 80 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/step.c b/arch/x86/kernel/step.c
index 243bff6..cf4b9da 100644
--- a/arch/x86/kernel/step.c
+++ b/arch/x86/kernel/step.c
@@ -107,7 +107,10 @@ static int is_setting_trap_flag(struct task_struct *child, struct pt_regs *regs)
return 0;
 }
 
-void user_enable_single_step(struct task_struct *child)
+/*
+ * Enable single-stepping.  Return nonzero if user mode is not using TF itself.
+ */
+static int enable_single_step(struct task_struct *child)
 {
struct pt_regs *regs = task_pt_regs(child);
 
@@ -122,7 +125,7 @@ void user_enable_single_step(struct task_struct *child)
 * If TF was already set, don't do anything else
 */
if (regs->eflags & X86_EFLAGS_TF)
-   return;
+   return 0;
 
/* Set TF on the kernel stack.. */
regs->eflags |= X86_EFLAGS_TF;
@@ -133,13 +136,68 @@ void user_enable_single_step(struct task_struct *child)
 * won't clear it by hand later.
 */
if (is_setting_trap_flag(child, regs))
-   return;
+   return 0;
 
set_tsk_thread_flag(child, TIF_FORCED_TF);
+
+   return 1;
+}
+
+/*
+ * Install this value in MSR_IA32_DEBUGCTLMSR whenever child is running.
+ */
+static void write_debugctlmsr(struct task_struct *child, unsigned long val)
+{
+   child->thread.debugctlmsr = val;
+
+   if (child != current)
+   return;
+
+#ifdef CONFIG_X86_64
+   wrmsrl(MSR_IA32_DEBUGCTLMSR, val);
+#else
+   wrmsr(MSR_IA32_DEBUGCTLMSR, val, 0);
+#endif
+}
+
+/*
+ * Enable single or block step.
+ */
+static void enable_step(struct task_struct *child, bool block)
+{
+   /*
+* Make sure block stepping (BTF) is not enabled unless it should be.
+* Note that we don't try to worry about any is_setting_trap_flag()
+* instructions after the first when using block stepping.
+* So no one should try to use debugger block stepping in a program
+* that uses user-mode single stepping itself.
+*/
+   if (enable_single_step(child) && block) {
+   set_tsk_thread_flag(child, TIF_DEBUGCTLMSR);
+   write_debugctlmsr(child, DEBUGCTLMSR_BTF);
+   } else if (test_and_clear_tsk_thread_flag(child, TIF_DEBUGCTLMSR)) {
+   write_debugctlmsr(child, 0);
+   }
+}
+
+void user_enable_single_step(struct task_struct *child)
+{
+   enable_step(child, 0);
+}
+
+void user_enable_block_step(struct task_struct *child)
+{
+   enable_step(child, 1);
 }
 
 void user_disable_single_step(struct task_struct *child)
 {
+   /*
+* Make sure block stepping (BTF) is disabled.
+*/
+   if (test_and_clear_tsk_thread_flag(child, TIF_DEBUGCTLMSR))
+   write_debugctlmsr(child, 0);
+
/* Always clear TIF_SINGLESTEP... */
clear_tsk_thread_flag(child, TIF_SINGLESTEP);
 
diff --git a/arch/x86/kernel/traps_32.c b/arch/x86/kernel/traps_32.c
index 298d13e..03d5b41 100644
--- a/arch/x86/kernel/traps_32.c
+++ b/arch/x86/kernel/traps_32.c
@@ -830,6 +830,12 @@ fastcall void __kprobes do_debug(struct pt_regs * regs, long error_code)
 
get_debugreg(condition, 6);
 
+   /*
+* The processor cleared BTF, so don't mark that we need it set.
+*/
+   clear_tsk_thread_flag(tsk, TIF_DEBUGCTLMSR);
+   tsk->thread.debugctlmsr = 0;
+
if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
SIGTRAP) == NOTIFY_STOP)
return;
diff --git a/arch/x86/kernel/traps_64.c b/arch/x86/kernel/traps_64.c
index daf35a8..ec70f5c 100644
--- a/arch/x86/kernel/traps_64.c
+++ b/arch/x86/kernel/traps_64.c
@@ -848,6 +848,12 @@ asmlinkage void __kprobes do_debug(struct pt_regs * regs,
 
get_debugreg(condition, 6);
 
+   /*
+* The processor cleared BTF, so don't mark that we need it set.
+*/
+   clear_tsk_thread_flag(tsk, TIF_DEBUGCTLMSR);
+   tsk->thread.debugctlmsr = 0;
+
if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
SIGTRAP) == NOTIFY_STOP)
return;
diff --git a/include/asm-x86/ptrace.h b/include/asm-x86/ptrace.h
index d223dec..04204f3 100644
--- a/include/asm-x86/ptrace.h
+++ b/include/asm-x86/ptrace.h
@@ -150,6 +150,13 @@ enum {
 extern void user_enable_single_step(struct task_struct *);
 extern void user_disable_single_step(struct task_struct *);
 
+extern void user_enable_block_step(struct task_struct *);

[PATCH 24/27] x86: debugctlmsr context switch

2007-11-25 Thread Roland McGrath

This adds low-level support for a per-thread value of MSR_IA32_DEBUGCTLMSR.
The per-thread value is switched in when TIF_DEBUGCTLMSR is set.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 arch/x86/kernel/process_32.c |6 +-
 arch/x86/kernel/process_64.c |3 +++
 include/asm-x86/processor_32.h   |2 ++
 include/asm-x86/processor_64.h   |2 ++
 include/asm-x86/thread_info_32.h |6 --
 include/asm-x86/thread_info_64.h |4 +++-
 6 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index f59544e..3a822e3 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -581,10 +581,14 @@ static noinline void
 __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
 struct tss_struct *tss)
 {
-   struct thread_struct *next;
+   struct thread_struct *prev, *next;
 
+   prev = &prev_p->thread;
next = &next_p->thread;
 
+   if (next->debugctlmsr != prev->debugctlmsr)
+   wrmsr(MSR_IA32_DEBUGCTLMSR, next->debugctlmsr, 0);
+
if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
set_debugreg(next->debugreg[0], 0);
set_debugreg(next->debugreg[1], 1);
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 586f88e..c1e2e9a 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -544,6 +544,9 @@ static inline void __switch_to_xtra(struct task_struct *prev_p,
prev = &prev_p->thread,
next = &next_p->thread;
 
+   if (next->debugctlmsr != prev->debugctlmsr)
+   wrmsrl(MSR_IA32_DEBUGCTLMSR, next->debugctlmsr);
+
if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
loaddebug(next, 0);
loaddebug(next, 1);
diff --git a/include/asm-x86/processor_32.h b/include/asm-x86/processor_32.h
index 34e8063..660d9b0 100644
--- a/include/asm-x86/processor_32.h
+++ b/include/asm-x86/processor_32.h
@@ -370,6 +370,8 @@ struct thread_struct {
unsigned long   iopl;
 /* max allowed port in the bitmap, in bytes: */
unsigned long   io_bitmap_max;
+/* MSR_IA32_DEBUGCTLMSR value to switch in if TIF_DEBUGCTLMSR is set.  */
+   unsigned long   debugctlmsr;
 };
 
 #define INIT_THREAD  { \
diff --git a/include/asm-x86/processor_64.h b/include/asm-x86/processor_64.h
index 2dd739a..1d6daa0 100644
--- a/include/asm-x86/processor_64.h
+++ b/include/asm-x86/processor_64.h
@@ -239,6 +239,8 @@ struct thread_struct {
int ioperm;
unsigned long   *io_bitmap_ptr;
unsigned io_bitmap_max;
+/* MSR_IA32_DEBUGCTLMSR value to switch in if TIF_DEBUGCTLMSR is set.  */
+   unsigned long   debugctlmsr;
 /* cached TLS descriptors. */
u64 tls_array[GDT_ENTRY_TLS_ENTRIES];
 } __attribute__((aligned(16)));
diff --git a/include/asm-x86/thread_info_32.h b/include/asm-x86/thread_info_32.h
index 8a6483f..d5ae1e9 100644
--- a/include/asm-x86/thread_info_32.h
+++ b/include/asm-x86/thread_info_32.h
@@ -138,6 +138,7 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_FREEZE 19  /* is freezing for suspend */
 #define TIF_NOTSC  20  /* TSC is not accessible in userland */
 #define TIF_FORCED_TF  21  /* true if TF in eflags artificially */
+#define TIF_DEBUGCTLMSR    22  /* uses thread_struct.debugctlmsr */
 
 #define _TIF_SYSCALL_TRACE (1

[PATCH 26/27] x86: debugctlmsr kprobes

2007-11-25 Thread Roland McGrath

This adjusts the x86 kprobes implementation to cope with per-thread
MSR_IA32_DEBUGCTLMSR being set for user mode.  I haven't delved deep
enough into the kprobes code to be really sure this covers all the
cases where the user-mode BTF setting needs to be cleared or restored.
It looks about right to me.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 arch/x86/kernel/kprobes_32.c |   15 +++
 arch/x86/kernel/kprobes_64.c |   15 +++
 2 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kprobes_32.c b/arch/x86/kernel/kprobes_32.c
index d87a523..f151f06 100644
--- a/arch/x86/kernel/kprobes_32.c
+++ b/arch/x86/kernel/kprobes_32.c
@@ -217,8 +217,21 @@ static void __kprobes set_current_kprobe(struct kprobe *p, struct pt_regs *regs,
kcb->kprobe_saved_eflags &= ~IF_MASK;
 }
 
+static __always_inline void clear_btf(void)
+{
+   if (test_thread_flag(TIF_DEBUGCTLMSR))
+   wrmsr(MSR_IA32_DEBUGCTLMSR, 0, 0);
+}
+
+static __always_inline void restore_btf(void)
+{
+   if (test_thread_flag(TIF_DEBUGCTLMSR))
+   wrmsr(MSR_IA32_DEBUGCTLMSR, current->thread.debugctlmsr, 0);
+}
+
 static void __kprobes prepare_singlestep(struct kprobe *p, struct pt_regs *regs)
 {
+   clear_btf();
regs->eflags |= TF_MASK;
regs->eflags &= ~IF_MASK;
/*single step inline if the instruction is an int3*/
@@ -542,6 +555,8 @@ static void __kprobes resume_execution(struct kprobe *p,
regs->eip = orig_eip + (regs->eip - copy_eip);
 
 no_change:
+   restore_btf();
+
return;
 }
 
diff --git a/arch/x86/kernel/kprobes_64.c b/arch/x86/kernel/kprobes_64.c
index 3db3611..d3be418 100644
--- a/arch/x86/kernel/kprobes_64.c
+++ b/arch/x86/kernel/kprobes_64.c
@@ -256,8 +256,21 @@ static void __kprobes set_current_kprobe(struct kprobe *p, struct pt_regs *regs,
kcb->kprobe_saved_rflags &= ~IF_MASK;
 }
 
+static __always_inline void clear_btf(void)
+{
+   if (test_thread_flag(TIF_DEBUGCTLMSR))
+   wrmsrl(MSR_IA32_DEBUGCTLMSR, 0);
+}
+
+static __always_inline void restore_btf(void)
+{
+   if (test_thread_flag(TIF_DEBUGCTLMSR))
+   wrmsrl(MSR_IA32_DEBUGCTLMSR, current->thread.debugctlmsr);
+}
+
 static void __kprobes prepare_singlestep(struct kprobe *p, struct pt_regs *regs)
 {
+   clear_btf();
regs->eflags |= TF_MASK;
regs->eflags &= ~IF_MASK;
/*single step inline if the instruction is an int3*/
@@ -534,6 +547,8 @@ static void __kprobes resume_execution(struct kprobe *p,
} else {
regs->rip = orig_rip + (regs->rip - copy_rip);
}
+
+   restore_btf();
 }
 
 int __kprobes post_kprobe_handler(struct pt_regs *regs)


[PATCH 22/27] x86: debugctlmsr constants

2007-11-25 Thread Roland McGrath

This adds constant macros for a few of the bits in MSR_IA32_DEBUGCTLMSR.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 include/asm-x86/msr-index.h |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/include/asm-x86/msr-index.h b/include/asm-x86/msr-index.h
index a494473..4045bbe 100644
--- a/include/asm-x86/msr-index.h
+++ b/include/asm-x86/msr-index.h
@@ -63,6 +63,13 @@
 #define MSR_IA32_LASTINTFROMIP 0x01dd
 #define MSR_IA32_LASTINTTOIP   0x01de
 
+/* DEBUGCTLMSR bits (others vary by model): */
+#define _DEBUGCTLMSR_LBR   0 /* last branch recording */
+#define _DEBUGCTLMSR_BTF   1 /* single-step on branches */
+
+#define DEBUGCTLMSR_LBR    (1UL << _DEBUGCTLMSR_LBR)
+#define DEBUGCTLMSR_BTF    (1UL << _DEBUGCTLMSR_BTF)
+
 #define MSR_IA32_MC0_CTL   0x0400
 #define MSR_IA32_MC0_STATUS    0x0401
 #define MSR_IA32_MC0_ADDR  0x0402


[PATCH 23/27] x86: debugctlmsr kconfig

2007-11-25 Thread Roland McGrath

This adds the (internal) Kconfig macro CONFIG_X86_DEBUGCTLMSR,
to be defined when configuring to support only hardware that
definitely supports MSR_IA32_DEBUGCTLMSR with the BTF flag.

The Intel documentation says "P6 family" and later processors all have it.
I think the Kconfig dependencies are right to have it set for those and
unset for others (i.e., when 586 and earlier are supported).

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 arch/x86/Kconfig.cpu |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index c301622..69e2ee4 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -399,3 +399,7 @@ config X86_MINIMUM_CPU_FAMILY
default "4" if X86_32 && (X86_XADD || X86_CMPXCHG || X86_BSWAP || X86_WP_WORKS_OK)
default "3"
 
+config X86_DEBUGCTLMSR
+   bool
+   depends on !(M586MMX || M586TSC || M586 || M486 || M386)
+   default y


[PATCH 20/27] ptrace: arch_has_block_step

2007-11-25 Thread Roland McGrath

This defines the new macro arch_has_block_step() in linux/ptrace.h, a
default for when asm/ptrace.h does not define it.  This is the analog
of arch_has_single_step() for step-until-branch on machines that have
it.  It declares the new user_enable_block_step function, which goes
with the existing user_enable_single_step and user_disable_single_step.
This is not used yet, but paves the way to harmonize on this interface
for the arch-specific calls on all machines.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 include/linux/ptrace.h |   37 +
 1 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index a6effc8..dd8f751 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -154,7 +154,8 @@ int generic_ptrace_pokedata(struct task_struct *tsk, long addr, long data);
  *
  * This can only be called when arch_has_single_step() has returned nonzero.
  * Set @task so that when it returns to user mode, it will trap after the
- * next single instruction executes.
+ * next single instruction executes.  If arch_has_block_step() is defined,
+ * this must clear the effects of user_enable_block_step() too.
  */
 static inline void user_enable_single_step(struct task_struct *task)
 {
@@ -165,15 +166,43 @@ static inline void user_enable_single_step(struct task_struct *task)
  * user_disable_single_step - cancel user-mode single-step
  * @task: either current or a task stopped in %TASK_TRACED
  *
- * Clear @task of the effects of user_enable_single_step().  This can
- * be called whether or not user_enable_single_step() was ever called
- * on @task, and even if arch_has_single_step() returned zero.
+ * Clear @task of the effects of user_enable_single_step() and
+ * user_enable_block_step().  This can be called whether or not either
+ * of those was ever called on @task, and even if arch_has_single_step()
+ * returned zero.
  */
 static inline void user_disable_single_step(struct task_struct *task)
 {
 }
 #endif /* arch_has_single_step */
 
+#ifndef arch_has_block_step
+/**
+ * arch_has_block_step - does this CPU support user-mode block-step?
+ *
+ * If this is defined, then there must be a function declaration or inline
+ * for user_enable_block_step(), and arch_has_single_step() must be defined
+ * too.  arch_has_block_step() should evaluate to nonzero iff the machine
+ * supports step-until-branch for user mode.  It can be a constant or it
+ * can test a CPU feature bit.
+ */
+#define arch_has_block_step() (0)
+
+/**
+ * user_enable_block_step - step until branch in user-mode task
+ * @task: either current or a task stopped in %TASK_TRACED
+ *
+ * This can only be called when arch_has_block_step() has returned nonzero,
+ * and will never be called when single-instruction stepping is being used.
+ * Set @task so that when it returns to user mode, it will trap after the
+ * next branch or trap taken.
+ */
+static inline void user_enable_block_step(struct task_struct *task)
+{
+   BUG();  /* This can never be called.  */
+}
+#endif /* arch_has_block_step */
+
 #endif
 
 #endif


[PATCH 21/27] ptrace: generic PTRACE_SINGLEBLOCK

2007-11-25 Thread Roland McGrath

This makes ptrace_request handle PTRACE_SINGLEBLOCK along with
PTRACE_CONT et al.  The new generic code makes use of the
arch_has_block_step macro and generic entry points on machines
that define them.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 kernel/ptrace.c |   15 ++-
 1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 309796a..2824726 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -373,6 +373,12 @@ static int ptrace_setsiginfo(struct task_struct *child, siginfo_t __user * data)
 #define is_singlestep(request) 0
 #endif
 
+#ifdef PTRACE_SINGLEBLOCK
+#define is_singleblock(request)    ((request) == PTRACE_SINGLEBLOCK)
+#else
+#define is_singleblock(request)    0
+#endif
+
 #ifdef PTRACE_SYSEMU
 #define is_sysemu_singlestep(request)  ((request) == PTRACE_SYSEMU_SINGLESTEP)
 #else
@@ -396,7 +402,11 @@ static int ptrace_resume(struct task_struct *child, long request, long data)
clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
 #endif
 
-   if (is_singlestep(request) || is_sysemu_singlestep(request)) {
+   if (is_singleblock(request)) {
+   if (unlikely(!arch_has_block_step()))
+   return -EIO;
+   user_enable_block_step(child);
+   } else if (is_singlestep(request) || is_sysemu_singlestep(request)) {
if (unlikely(!arch_has_single_step()))
return -EIO;
user_enable_single_step(child);
@@ -438,6 +448,9 @@ int ptrace_request(struct task_struct *child, long request,
 #ifdef PTRACE_SINGLESTEP
case PTRACE_SINGLESTEP:
 #endif
+#ifdef PTRACE_SINGLEBLOCK
+   case PTRACE_SINGLEBLOCK:
+#endif
 #ifdef PTRACE_SYSEMU
case PTRACE_SYSEMU:
case PTRACE_SYSEMU_SINGLESTEP:


[PATCH 17/27] x86-64 ptrace debugreg cleanup

2007-11-25 Thread Roland McGrath

This cleans up the 64-bit ptrace code to separate the guts of the
debug register access from the implementation of PTRACE_PEEKUSR and
PTRACE_POKEUSR.  The new functions ptrace_[gs]et_debugreg are made
global so that the ia32 code can later be changed to call them too.
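The DR0-DR3 bound check that ptrace_set_debugreg() now owns can be modelled on its own. This is a sketch, not the kernel function. `check_debug_addr()` is hypothetical, and reading dsize as "longest watch length minus one" is my interpretation of the 3/7 constants, not something the patch states.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Sketch of the DR0-DR3 address check in ptrace_set_debugreg().
 * task_size stands in for TASK_SIZE_OF(child) and is_ia32 for the
 * TIF_IA32 flag. dsize looks like "longest watch length minus one"
 * (8 bytes native, 4 bytes for a 32-bit task), so an accepted addr
 * satisfies addr + dsize < task_size.
 */
static int check_debug_addr(unsigned long task_size, bool is_ia32,
			    unsigned long addr)
{
	unsigned long dsize = is_ia32 ? 3 : 7;

	assert(task_size > dsize);	/* keeps the subtraction from wrapping */
	if (addr >= task_size - dsize)
		return -EIO;
	return 0;
}
```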

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 arch/x86/kernel/ptrace_64.c |  140 ---
 include/asm-x86/ptrace.h|3 +
 2 files changed, 69 insertions(+), 74 deletions(-)

diff --git a/arch/x86/kernel/ptrace_64.c b/arch/x86/kernel/ptrace_64.c
index 8123ecb..bad8b3c 100644
--- a/arch/x86/kernel/ptrace_64.c
+++ b/arch/x86/kernel/ptrace_64.c
@@ -183,9 +183,63 @@ static unsigned long getreg(struct task_struct *child, unsigned long regno)
 
 }
 
+unsigned long ptrace_get_debugreg(struct task_struct *child, int n)
+{
+   switch (n) {
+   case 0: return child->thread.debugreg0;
+   case 1: return child->thread.debugreg1;
+   case 2: return child->thread.debugreg2;
+   case 3: return child->thread.debugreg3;
+   case 6: return child->thread.debugreg6;
+   case 7: return child->thread.debugreg7;
+   }
+   return 0;
+}
+
+int ptrace_set_debugreg(struct task_struct *child, int n, unsigned long data)
+{
+   int i;
+
+   if (n < 4) {
+   int dsize = test_tsk_thread_flag(child, TIF_IA32) ? 3 : 7;
+   if (unlikely(data >= TASK_SIZE_OF(child) - dsize))
+   return -EIO;
+   }
+
+   switch (n) {
+   case 0: child->thread.debugreg0 = data; break;
+   case 1: child->thread.debugreg1 = data; break;
+   case 2: child->thread.debugreg2 = data; break;
+   case 3: child->thread.debugreg3 = data; break;
+
+   case 6:
+   if (data >> 32)
+   return -EIO;
+   child->thread.debugreg6 = data;
+   break;
+
+   case 7:
+   /*
+* See ptrace_32.c for an explanation of this awkward check.
+*/
+   data &= ~DR_CONTROL_RESERVED;
+   for (i = 0; i < 4; i++)
+   if ((0x5554 >> ((data >> (16 + 4*i)) & 0xf)) & 1)
+   return -EIO;
+   child->thread.debugreg7 = data;
+   if (data)
+   set_tsk_thread_flag(child, TIF_DEBUG);
+   else
+   clear_tsk_thread_flag(child, TIF_DEBUG);
+   break;
+   }
+
+   return 0;
+}
+
 long arch_ptrace(struct task_struct *child, long request, long addr, long data)
 {
-   long i, ret;
+   long ret;
unsigned ui;
 
switch (request) {
@@ -204,32 +258,14 @@ long arch_ptrace(struct task_struct *child, long request, long addr, long data)
addr > sizeof(struct user) - 7)
break;
 
-   switch (addr) {
-   case 0 ... sizeof(struct user_regs_struct) - sizeof(long):
+   tmp = 0;
+   if (addr < sizeof(struct user_regs_struct))
tmp = getreg(child, addr);
-   break;
-   case offsetof(struct user, u_debugreg[0]):
-   tmp = child->thread.debugreg0;
-   break;
-   case offsetof(struct user, u_debugreg[1]):
-   tmp = child->thread.debugreg1;
-   break;
-   case offsetof(struct user, u_debugreg[2]):
-   tmp = child->thread.debugreg2;
-   break;
-   case offsetof(struct user, u_debugreg[3]):
-   tmp = child->thread.debugreg3;
-   break;
-   case offsetof(struct user, u_debugreg[6]):
-   tmp = child->thread.debugreg6;
-   break;
-   case offsetof(struct user, u_debugreg[7]):
-   tmp = child->thread.debugreg7;
-   break;
-   default:
-   tmp = 0;
-   break;
+   else if (addr >= offsetof(struct user, u_debugreg[0])) {
+   addr -= offsetof(struct user, u_debugreg[0]);
+   tmp = ptrace_get_debugreg(child, addr / sizeof(long));
}
+
ret = put_user(tmp,(unsigned long __user *) data);
break;
}
@@ -241,63 +277,19 @@ long arch_ptrace(struct task_struct *child, long request, long addr, long data)
break;
 
case PTRACE_POKEUSR: /* write the word at location addr in the USER area */
-   {
-   int dsize = test_tsk_thread_flag(child, TIF_IA32) ? 3 : 7;
ret = -EIO;
if ((addr & 7) ||
addr > sizeof(struct user) - 7)
break;
 
-   switch 

[PATCH 19/27] x86-32 ptrace debugreg cleanup

2007-11-25 Thread Roland McGrath

This cleans up the 32-bit ptrace code to separate the guts of the
debug register access from the implementation of PTRACE_PEEKUSR and
PTRACE_POKEUSR.  The new functions ptrace_[gs]et_debugreg match the
new 64-bit entry points for parity, but they don't need to be global.
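The DR7 sanity check that moves into ptrace_set_debugreg() depends on a 16-bit "invalid nibble" mask, which is derived in the comment inside the patch. A hypothetical user-space helper can rebuild that mask from the three rules and apply it in the same way. The names are invented for illustration, and DR_CONTROL_RESERVED masking and the TIF_DEBUG update are left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Rebuild the invalid-nibble mask described in the ptrace_32.c comment.
 * Each 4-bit DR7 field holds R/Wi in bits 0-1 and LENi in bits 2-3;
 * bit v of the mask is set when nibble value v is invalid.
 */
static unsigned int build_invalid_mask(bool len_10b_invalid)
{
	unsigned int mask = 0;

	for (unsigned int v = 0; v < 16; v++) {
		unsigned int rw = v & 3;
		unsigned int len = v >> 2;

		if (len_10b_invalid && len == 2)
			mask |= 1u << v;	/* LENi == 10b: undefined outside long mode */
		if (rw == 2)
			mask |= 1u << v;	/* R/Wi == 10b: break on I/O reads or writes */
		if (rw == 0 && len != 0)
			mask |= 1u << v;	/* execution breakpoint with nonzero length */
	}
	return mask;
}

/* Same loop as ptrace_set_debugreg(): reject DR7 if any field hits the mask. */
static bool dr7_fields_valid(unsigned long dr7, unsigned int mask)
{
	assert(mask <= 0xffffu);	/* one bit per possible nibble value */

	for (int i = 0; i < 4; i++)
		if ((mask >> ((dr7 >> (16 + 4 * i)) & 0xf)) & 1)
			return false;
	return true;
}
```

With all three rules, the mask is 0x0f00 | 0x4444 | 0x1110 = 0x5f54. Dropping the LENi == 10b rule gives 0x4444 | 0x1110 = 0x5554, which is the constant the 64-bit ptrace_set_debugreg() uses.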

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 arch/x86/kernel/ptrace_32.c |  119 +--
 1 files changed, 69 insertions(+), 50 deletions(-)

diff --git a/arch/x86/kernel/ptrace_32.c b/arch/x86/kernel/ptrace_32.c
index 7c33244..0aa3756 100644
--- a/arch/x86/kernel/ptrace_32.c
+++ b/arch/x86/kernel/ptrace_32.c
@@ -119,6 +119,72 @@ static unsigned long getreg(struct task_struct *child, unsigned long regno)
 }
 
 /*
+ * This function is trivial and will be inlined by the compiler.
+ * Having it separates the implementation details of debug
+ * registers from the interface details of ptrace.
+ */
+static unsigned long ptrace_get_debugreg(struct task_struct *child, int n)
+{
+   return child->thread.debugreg[n];
+}
+
+static int ptrace_set_debugreg(struct task_struct *child,
+  int n, unsigned long data)
+{
+   if (unlikely(n == 4 || n == 5))
+   return -EIO;
+
+   if (n < 4 && unlikely(data >= TASK_SIZE - 3))
+   return -EIO;
+
+   if (n == 7) {
+   /*
+* Sanity-check data. Take one half-byte at once with
+* check = (val >> (16 + 4*i)) & 0xf. It contains the
+* R/Wi and LENi bits; bits 0 and 1 are R/Wi, and bits
+* 2 and 3 are LENi. Given a list of invalid values,
+* we do mask |= 1 << invalid_value, so that
+* (mask >> check) & 1 is a correct test for invalid
+* values.
+*
+* R/Wi contains the type of the breakpoint /
+* watchpoint, LENi contains the length of the watched
+* data in the watchpoint case.
+*
+* The invalid values are:
+* - LENi == 0x10 (undefined), so mask |= 0x0f00.
+* - R/Wi == 0x10 (break on I/O reads or writes), so
+*   mask |= 0x4444.
+* - R/Wi == 0x00 && LENi != 0x00, so we have mask |=
+*   0x1110.
+*
+* Finally, mask = 0x0f00 | 0x4444 | 0x1110 == 0x5f54.
+*
+* See the Intel Manual "System Programming Guide",
+* 15.2.4
+*
+* Note that LENi == 0x10 is defined on x86_64 in long
+* mode (i.e. even for 32-bit userspace software, but
+* 64-bit kernel), so the x86_64 mask value is 0x5454.
+* See the AMD manual no. 24593 (AMD64 System Programming)
+*/
+   int i;
+   data &= ~DR_CONTROL_RESERVED;
+   for (i = 0; i < 4; i++)
+   if ((0x5f54 >> ((data >> (16 + 4*i)) & 0xf)) & 1)
+   return -EIO;
+   if (data)
+   set_tsk_thread_flag(child, TIF_DEBUG);
+   else
+   clear_tsk_thread_flag(child, TIF_DEBUG);
+   }
+
+   child->thread.debugreg[n] = data;
+
+   return 0;
+}
+
+/*
  * Called by kernel/ptrace.c when detaching..
  *
  * Make sure the single step bit is not set.
@@ -158,7 +224,7 @@ long arch_ptrace(struct task_struct *child, long request, long addr, long data)
   addr <= (long) &dummy->u_debugreg[7]){
addr -= (long) &dummy->u_debugreg[0];
addr = addr >> 2;
-   tmp = child->thread.debugreg[addr];
+   tmp = ptrace_get_debugreg(child, addr);
}
ret = put_user(tmp, datap);
break;
@@ -188,56 +254,9 @@ long arch_ptrace(struct task_struct *child, long request, long addr, long data)
  ret = -EIO;
  if(addr >= (long) &dummy->u_debugreg[0] &&
 addr <= (long) &dummy->u_debugreg[7]){
-
- if(addr == (long) &dummy->u_debugreg[4]) break;
- if(addr == (long) &dummy->u_debugreg[5]) break;
- if(addr < (long) &dummy->u_debugreg[4] &&
-((unsigned long) data) >= TASK_SIZE-3) break;
-
- /* Sanity-check data. Take one half-byte at once with
-  * check = (val >> (16 + 4*i)) & 0xf. It contains the
-  * R/Wi and LENi bits; bits 0 and 1 are R/Wi, and bits
-  * 2 and 3 are LENi. Given a list of invalid values,
-  * we do mask |= 1 << invalid_value, so that
-  * (mask >> check) & 1 is a correct test for invalid
-  * values.
-  *
-  * R/Wi contains the 

[2.6 patch] scsi/qla2xxx/qla_os.c section fix

2007-11-25 Thread Adrian Bunk
qla2x00_remove_one() mustn't be __devexit since it's called from qla2xxx_pci_error_detected().

This patch fixes the following section mismatch:

<--  snip  -->

...
WARNING: vmlinux.o(.text+0x2a4462): Section mismatch: reference to .exit.text:qla2x00_remove_one (between 'qla2xxx_pci_error_detected' and 'qla2x00_stop_timer')
...

<--  snip  -->

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 drivers/scsi/qla2xxx/qla_os.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

764ebbed3c09f765963c20a3a326cf651685a81a 
diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index a5bcf1f..8ecc047 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -1831,7 +1831,7 @@ probe_out:
return ret;
 }
 
-static void __devexit
+static void
 qla2x00_remove_one(struct pci_dev *pdev)
 {
scsi_qla_host_t *ha;
@@ -2965,7 +2965,7 @@ static struct pci_driver qla2xxx_pci_driver = {
},
.id_table   = qla2xxx_pci_tbl,
.probe  = qla2x00_probe_one,
-   .remove = __devexit_p(qla2x00_remove_one),
+   .remove = qla2x00_remove_one,
.err_handler= &qla2xxx_err_handler,
 };
 



[PATCH 18/27] x86-64 ia32 ptrace debugreg cleanup

2007-11-25 Thread Roland McGrath

This cleans up the ia32 compat ptrace code to use shared code from
native ptrace for the implementation guts of debug register access.
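The compat code now turns a struct user32 byte offset into a debug register number by subtracting the offset of u_debugreg[0] and dividing by 4. The sketch below uses a made-up layout (`struct user32_model` and `debugreg_index32()` are hypothetical; only the 4-byte slot size matters). It uses explicit comparisons where the patch uses the GCC `case A ... B:` range extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: only the 4-byte slots of u_debugreg matter here. */
struct user32_model {
	uint32_t before[27];	/* stand-in for the fields ahead of u_debugreg */
	uint32_t u_debugreg[8];
};

static_assert(offsetof(struct user32_model, u_debugreg[0]) % 4 == 0,
	      "debug registers sit in 4-byte slots");

/*
 * Byte offset -> debug register number, as the patched getreg32() and
 * putreg32() compute it. Returns -1 outside u_debugreg[0..7].
 */
static int debugreg_index32(size_t regno)
{
	const size_t first = offsetof(struct user32_model, u_debugreg[0]);
	const size_t last = offsetof(struct user32_model, u_debugreg[7]);

	if (regno < first || regno > last)
		return -1;
	return (int)((regno - first) / 4);
}
```

The resulting index goes straight to ptrace_get_debugreg() or ptrace_set_debugreg(), so 32-bit tracees are validated by the same code as native ones.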

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 arch/x86/ia32/ptrace32.c |   63 ++
 1 files changed, 8 insertions(+), 55 deletions(-)

diff --git a/arch/x86/ia32/ptrace32.c b/arch/x86/ia32/ptrace32.c
index a9a5cd4..5661abd 100644
--- a/arch/x86/ia32/ptrace32.c
+++ b/arch/x86/ia32/ptrace32.c
@@ -40,7 +40,6 @@
 
 static int putreg32(struct task_struct *child, unsigned regno, u32 val)
 {
-   int i;
__u64 *stack = (__u64 *)task_pt_regs(child);
 
switch (regno) {
@@ -95,43 +94,10 @@ static int putreg32(struct task_struct *child, unsigned regno, u32 val)
break;
}
 
-   case offsetof(struct user32, u_debugreg[4]): 
-   case offsetof(struct user32, u_debugreg[5]):
-   return -EIO;
-
-   case offsetof(struct user32, u_debugreg[0]):
-   child->thread.debugreg0 = val;
-   break;
-
-   case offsetof(struct user32, u_debugreg[1]):
-   child->thread.debugreg1 = val;
-   break;
-
-   case offsetof(struct user32, u_debugreg[2]):
-   child->thread.debugreg2 = val;
-   break;
-
-   case offsetof(struct user32, u_debugreg[3]):
-   child->thread.debugreg3 = val;
-   break;
-
-   case offsetof(struct user32, u_debugreg[6]):
-   child->thread.debugreg6 = val;
-   break; 
-
-   case offsetof(struct user32, u_debugreg[7]):
-   val &= ~DR_CONTROL_RESERVED;
-   /* See arch/i386/kernel/ptrace.c for an explanation of
-* this awkward check.*/
-   for(i=0; i<4; i++)
-   if ((0x5454 >> ((val >> (16 + 4*i)) & 0xf)) & 1)
-  return -EIO;
-   child->thread.debugreg7 = val; 
-   if (val)
-   set_tsk_thread_flag(child, TIF_DEBUG);
-   else
-   clear_tsk_thread_flag(child, TIF_DEBUG);
-   break; 
+   case offsetof(struct user32, u_debugreg[0]) ...
+   offsetof(struct user32, u_debugreg[7]):
+   regno -= offsetof(struct user32, u_debugreg[0]);
+   return ptrace_set_debugreg(child, regno / 4, val);

default:
if (regno > sizeof(struct user32) || (regno & 3))
@@ -188,23 +154,10 @@ static int getreg32(struct task_struct *child, unsigned regno, u32 *val)
*val &= ~X86_EFLAGS_TF;
break;
 
-   case offsetof(struct user32, u_debugreg[0]): 
-   *val = child->thread.debugreg0; 
-   break; 
-   case offsetof(struct user32, u_debugreg[1]): 
-   *val = child->thread.debugreg1; 
-   break; 
-   case offsetof(struct user32, u_debugreg[2]): 
-   *val = child->thread.debugreg2; 
-   break; 
-   case offsetof(struct user32, u_debugreg[3]): 
-   *val = child->thread.debugreg3; 
-   break; 
-   case offsetof(struct user32, u_debugreg[6]): 
-   *val = child->thread.debugreg6; 
-   break; 
-   case offsetof(struct user32, u_debugreg[7]): 
-   *val = child->thread.debugreg7; 
+   case offsetof(struct user32, u_debugreg[0]) ...
+   offsetof(struct user32, u_debugreg[7]):
+   regno -= offsetof(struct user32, u_debugreg[0]);
+   *val = ptrace_get_debugreg(child, regno / 4);
break; 

default:


[2.6 patch] finish the VID_HARDWARE_* removal

2007-11-25 Thread Adrian Bunk
This patch removes a few remainders of the VID_HARDWARE_* removal.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 Documentation/DocBook/videobook.tmpl  |9 -
 drivers/media/video/usbvision/usbvision.h |4 
 2 files changed, 13 deletions(-)

643d01fb38b6f376cced035549f4e193018776e7 
diff --git a/Documentation/DocBook/videobook.tmpl b/Documentation/DocBook/videobook.tmpl
index b629da3..b3d93ee 100644
--- a/Documentation/DocBook/videobook.tmpl
+++ b/Documentation/DocBook/videobook.tmpl
@@ -96,7 +96,6 @@ static struct video_device my_radio
 {
 "My radio",
 VID_TYPE_TUNER,
-VID_HARDWARE_MYRADIO,
 radio_open.
 radio_close,
 NULL,/* no read */
@@ -119,13 +118,6 @@ static struct video_device my_radio
 way to change channel so it is tuneable.
   
   
-The VID_HARDWARE_ types are unique to each device. Numbers are assigned by
-[EMAIL PROTECTED] when device drivers are going to be released. Until then you
-can pull a suitably large number out of your hat and use it. 1 should be
-safe for a very long time even allowing for the huge number of vendors
-making new and different radio cards at the moment.
-  
-  
 We declare an open and close routine, but we do not need read or write,
 which are used to read and write video data to or from the card itself. As
 we have no read or write there is no poll function.
@@ -844,7 +836,6 @@ static struct video_device my_camera
 "My Camera",
 VID_TYPE_OVERLAY|VID_TYPE_SCALES|\
 VID_TYPE_CAPTURE|VID_TYPE_CHROMAKEY,
-VID_HARDWARE_MYCAMERA,
 camera_open.
 camera_close,
 camera_read,  /* no read */
diff --git a/drivers/media/video/usbvision/usbvision.h b/drivers/media/video/usbvision/usbvision.h
index c5b6c50..2b7c1bf 100644
--- a/drivers/media/video/usbvision/usbvision.h
+++ b/drivers/media/video/usbvision/usbvision.h
@@ -40,10 +40,6 @@
 
 #define USBVISION_DEBUG/* Turn on debug messages */
 
-#ifndef VID_HARDWARE_USBVISION
-   #define VID_HARDWARE_USBVISION 34   /* USBVision Video Grabber */
-#endif
-
 #define USBVISION_PWR_REG  0x00
#define USBVISION_SSPND_EN  (1 << 1)
#define USBVISION_RES2  (1 << 2)



[PATCH 16/27] x86-64 ptrace: use task_pt_regs

2007-11-25 Thread Roland McGrath

This cleans up the 64-bit ptrace code to use task_pt_regs instead of its
own redundant code that does the same thing a different way.
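Both halves of the 64-bit change can be tried on a toy frame: reaching a register through its byte offset, and merging a tracer-written eflags value. `struct regs_model`, `regs_model_access()` and `merge_eflags()` are hypothetical. The FLAG_MASK value is copied from ptrace_64.c. Masking the tracer's value with FLAG_MASK first is an assumption, because the hunk shows only the `regs->eflags & ~FLAG_MASK` half.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-register frame; the real struct pt_regs is larger. */
struct regs_model {
	unsigned long r15;	/* first member, like r15 in the x86-64 pt_regs */
	unsigned long r14;
	unsigned long eflags;
};

/* C11 counterpart of BUILD_BUG_ON(offsetof(struct pt_regs, r15) != 0). */
static_assert(offsetof(struct regs_model, r15) == 0, "r15 must come first");

/*
 * Model of pt_regs_access(): map a user_regs_struct-style byte offset to
 * the matching slot. Stepping a char pointer gives the same address as the
 * kernel's &regs->r15 + offset / sizeof(regs->r15) for aligned offsets.
 */
static unsigned long *regs_model_access(struct regs_model *regs,
					unsigned long offset)
{
	assert(offset % sizeof(unsigned long) == 0);
	return (unsigned long *)((char *)regs + offset);
}

/* FLAG_MASK from ptrace_64.c: the eflags bits a tracer may change. */
#define MODEL_FLAG_MASK 0x54dd5UL

/* Keep every eflags bit outside FLAG_MASK from the saved value. */
static unsigned long merge_eflags(unsigned long saved, unsigned long value)
{
	return (value & MODEL_FLAG_MASK) | (saved & ~MODEL_FLAG_MASK);
}
```

For example, a tracer that writes 0x3000 (the IOPL bits) leaves a saved eflags of 0x202 unchanged, because those bits fall outside FLAG_MASK.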

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 arch/x86/kernel/ptrace_64.c |   60 --
 1 files changed, 12 insertions(+), 48 deletions(-)

diff --git a/arch/x86/kernel/ptrace_64.c b/arch/x86/kernel/ptrace_64.c
index 85fba7b..8123ecb 100644
--- a/arch/x86/kernel/ptrace_64.c
+++ b/arch/x86/kernel/ptrace_64.c
@@ -43,44 +43,6 @@
 #define FLAG_MASK 0x54dd5UL
 
 /*
- * eflags and offset of eflags on child stack..
- */
-#define EFLAGS offsetof(struct pt_regs, eflags)
-#define EFL_OFFSET ((int)(EFLAGS-sizeof(struct pt_regs)))
-
-/*
- * this routine will get a word off of the processes privileged stack.
- * the offset is how far from the base addr as stored in the TSS.
- * this routine assumes that all the privileged stacks are in our
- * data space.
- */
-static inline unsigned long get_stack_long(struct task_struct *task, int offset)
-{
-   unsigned char *stack;
-
-   stack = (unsigned char *)task->thread.rsp0;
-   stack += offset;
-   return (*((unsigned long *)stack));
-}
-
-/*
- * this routine will put a word on the processes privileged stack.
- * the offset is how far from the base addr as stored in the TSS.
- * this routine assumes that all the privileged stacks are in our
- * data space.
- */
-static inline long put_stack_long(struct task_struct *task, int offset,
-   unsigned long data)
-{
-   unsigned char * stack;
-
-   stack = (unsigned char *) task->thread.rsp0;
-   stack += offset;
-   *(unsigned long *) stack = data;
-   return 0;
-}
-
-/*
  * Called by kernel/ptrace.c when detaching..
  *
  * Make sure the single step bit is not set.
@@ -90,11 +52,16 @@ void ptrace_disable(struct task_struct *child)
user_disable_single_step(child);
 }
 
+static unsigned long *pt_regs_access(struct pt_regs *regs, unsigned long offset)
+{
+   BUILD_BUG_ON(offsetof(struct pt_regs, r15) != 0);
+   return &regs->r15 + (offset / sizeof(regs->r15));
+}
+
 static int putreg(struct task_struct *child,
unsigned long regno, unsigned long value)
 {
-   unsigned long tmp;
-
+   struct pt_regs *regs = task_pt_regs(child);
switch (regno) {
case offsetof(struct user_regs_struct,fs):
if (value && (value & 3) != 3)
@@ -152,9 +119,7 @@ static int putreg(struct task_struct *child,
clear_tsk_thread_flag(child, TIF_FORCED_TF);
else if (test_tsk_thread_flag(child, TIF_FORCED_TF))
value |= X86_EFLAGS_TF;
-   tmp = get_stack_long(child, EFL_OFFSET);
-   tmp &= ~FLAG_MASK;
-   value |= tmp;
+   value |= regs->eflags & ~FLAG_MASK;
break;
case offsetof(struct user_regs_struct,cs):
if ((value & 3) != 3)
@@ -162,12 +127,13 @@ static int putreg(struct task_struct *child,
value &= 0xffff;
break;
}
-   put_stack_long(child, regno - sizeof(struct pt_regs), value);
+   *pt_regs_access(regs, regno) = value;
return 0;
 }
 
 static unsigned long getreg(struct task_struct *child, unsigned long regno)
 {
+   struct pt_regs *regs = task_pt_regs(child);
unsigned long val;
switch (regno) {
case offsetof(struct user_regs_struct, fs):
@@ -202,16 +168,14 @@ static unsigned long getreg(struct task_struct *child, unsigned long regno)
/*
 * If the debugger set TF, hide it from the readout.
 */
-   regno = regno - sizeof(struct pt_regs);
-   val = get_stack_long(child, regno);
+   val = regs->eflags;
if (test_tsk_thread_flag(child, TIF_IA32))
val &= 0xffffffff;
if (test_tsk_thread_flag(child, TIF_FORCED_TF))
val &= ~X86_EFLAGS_TF;
return val;
default:
-   regno = regno - sizeof(struct pt_regs);
-   val = get_stack_long(child, regno);
+   val = *pt_regs_access(regs, regno);
if (test_tsk_thread_flag(child, TIF_IA32))
val &= 0xffffffff;
return val;


[PATCH 13/27] powerpc: arch_has_single_step

2007-11-25 Thread Roland McGrath

This defines the new standard arch_has_single_step macro.  It makes the
existing set_single_step and clear_single_step entry points global, and
renames them to the new standard names user_enable_single_step and
user_disable_single_step, respectively.
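The override pattern behind arch_has_single_step() fits in a few lines. The arch header defines the macro first, and the generic header supplies a "not supported" fallback only when the macro is still undefined. This is a sketch: `MODEL_ARCH_SINGLE_STEP` is a hypothetical switch that stands in for which asm/ptrace.h gets included. The fallback layout is modelled on the `#endif /* arch_has_block_step */` block shown earlier, not copied from linux/ptrace.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical switch: 1 models an arch header that opts in, as powerpc now does. */
#define MODEL_ARCH_SINGLE_STEP 1

/* "asm/ptrace.h": seen first, may define the hook. */
#if MODEL_ARCH_SINGLE_STEP
#define arch_has_single_step()	(1)
#endif

/* "linux/ptrace.h": generic fallbacks only for hooks still undefined. */
#ifndef arch_has_single_step
#define arch_has_single_step()	(0)
#endif
#ifndef arch_has_block_step
#define arch_has_block_step()	(0)
#endif

/* This patch does not give powerpc a block-step hook, so it gets the fallback. */
static_assert(arch_has_block_step() == 0, "block step falls back to unsupported");

static bool can_single_step(void) { return arch_has_single_step(); }
static bool can_block_step(void) { return arch_has_block_step(); }
```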

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 arch/powerpc/kernel/ptrace.c |   12 ++--
 include/asm-powerpc/ptrace.h |7 +++
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index 3e17d15..b970d79 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -256,7 +256,7 @@ static int set_evrregs(struct task_struct *task, unsigned long *data)
 #endif /* CONFIG_SPE */
 
 
-static void set_single_step(struct task_struct *task)
+void user_enable_single_step(struct task_struct *task)
 {
struct pt_regs *regs = task->thread.regs;
 
@@ -271,7 +271,7 @@ static void set_single_step(struct task_struct *task)
set_tsk_thread_flag(task, TIF_SINGLESTEP);
 }
 
-static void clear_single_step(struct task_struct *task)
+void user_disable_single_step(struct task_struct *task)
 {
struct pt_regs *regs = task->thread.regs;
 
@@ -313,7 +313,7 @@ static int ptrace_set_debugreg(struct task_struct *task, unsigned long addr,
 void ptrace_disable(struct task_struct *child)
 {
/* make sure the single step bit is not set. */
-   clear_single_step(child);
+   user_disable_single_step(child);
 }
 
 /*
@@ -456,7 +456,7 @@ long arch_ptrace(struct task_struct *child, long request, long addr, long data)
clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
child->exit_code = data;
/* make sure the single step bit is not set. */
-   clear_single_step(child);
+   user_disable_single_step(child);
wake_up_process(child);
ret = 0;
break;
@@ -473,7 +473,7 @@ long arch_ptrace(struct task_struct *child, long request, long addr, long data)
break;
child->exit_code = SIGKILL;
/* make sure the single step bit is not set. */
-   clear_single_step(child);
+   user_disable_single_step(child);
wake_up_process(child);
break;
}
@@ -483,7 +483,7 @@ long arch_ptrace(struct task_struct *child, long request, long addr, long data)
if (!valid_signal(data))
break;
clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
-   set_single_step(child);
+   user_enable_single_step(child);
child->exit_code = data;
/* give it a chance to run. */
wake_up_process(child);
diff --git a/include/asm-powerpc/ptrace.h b/include/asm-powerpc/ptrace.h
index 13fccc5..3063363 100644
--- a/include/asm-powerpc/ptrace.h
+++ b/include/asm-powerpc/ptrace.h
@@ -119,6 +119,13 @@ do {  \
 } while (0)
 #endif /* __powerpc64__ */
 
+/*
+ * These are defined as per linux/ptrace.h, which see.
+ */
+#define arch_has_single_step() (1)
+extern void user_enable_single_step(struct task_struct *);
+extern void user_disable_single_step(struct task_struct *);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __KERNEL__ */


[PATCH 15/27] x86-32 ptrace: use task_pt_regs

2007-11-25 Thread Roland McGrath

This cleans up the 32-bit ptrace code to use task_pt_regs instead of its
own redundant code that does the same thing a different way.
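In the 32-bit pt_regs_access(), regno is a word index into the user area. The `if (regno > FS) --regno` step implies that pt_regs has no slot for GS. The sketch below shows that index mapping. The enum values are assumed to match the i386 ptrace ABI constants, and `pt_regs_slot()` is hypothetical.

```c
#include <assert.h>

/* i386 user-area word indices (assumed to match the ABI header). */
enum {
	EBX, ECX, EDX, ESI, EDI, EBP, EAX, DS, ES, FS, GS,
	ORIG_EAX, EIP, CS, EFL, UESP, SS, NUM_USER_REGS
};

/*
 * pt_regs has no GS slot, so every index past FS moves down by one.
 * GS is expected to be handled by its own case in putreg()/getreg()
 * and must not reach this helper.
 */
static int pt_regs_slot(int regno)
{
	assert(regno >= 0 && regno < NUM_USER_REGS && regno != GS);
	if (regno > FS)
		--regno;
	return regno;
}
```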

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 arch/x86/kernel/ptrace_32.c |   68 ++
 1 files changed, 16 insertions(+), 52 deletions(-)

diff --git a/arch/x86/kernel/ptrace_32.c b/arch/x86/kernel/ptrace_32.c
index 50882b3..7c33244 100644
--- a/arch/x86/kernel/ptrace_32.c
+++ b/arch/x86/kernel/ptrace_32.c
@@ -37,53 +37,20 @@
  */
 #define FLAG_MASK 0x00050dd5
 
-/*
- * Offset of eflags on child stack..
- */
-#define EFL_OFFSET offsetof(struct pt_regs, eflags)
-
-static inline struct pt_regs *get_child_regs(struct task_struct *task)
-{
-   void *stack_top = (void *)task->thread.esp0;
-   return stack_top - sizeof(struct pt_regs);
-}
-
-/*
- * This routine will get a word off of the processes privileged stack.
- * the offset is bytes into the pt_regs structure on the stack.
- * This routine assumes that all the privileged stacks are in our
- * data space.
- */
-static inline int get_stack_long(struct task_struct *task, int offset)
+static long *pt_regs_access(struct pt_regs *regs, unsigned long regno)
 {
-   unsigned char *stack;
-
-   stack = (unsigned char *)task->thread.esp0 - sizeof(struct pt_regs);
-   stack += offset;
-   return (*((int *)stack));
-}
-
-/*
- * This routine will put a word on the processes privileged stack.
- * the offset is bytes into the pt_regs structure on the stack.
- * This routine assumes that all the privileged stacks are in our
- * data space.
- */
-static inline int put_stack_long(struct task_struct *task, int offset,
-   unsigned long data)
-{
-   unsigned char * stack;
-
-   stack = (unsigned char *)task->thread.esp0 - sizeof(struct pt_regs);
-   stack += offset;
-   *(unsigned long *) stack = data;
-   return 0;
+   BUILD_BUG_ON(offsetof(struct pt_regs, ebx) != 0);
+   if (regno > FS)
+   --regno;
+   return &regs->ebx + regno;
 }
 
 static int putreg(struct task_struct *child,
unsigned long regno, unsigned long value)
 {
-   switch (regno >> 2) {
+   struct pt_regs *regs = task_pt_regs(child);
+   regno >>= 2;
+   switch (regno) {
case GS:
if (value && (value & 3) != 3)
return -EIO;
@@ -113,26 +80,25 @@ static int putreg(struct task_struct *child,
clear_tsk_thread_flag(child, TIF_FORCED_TF);
else if (test_tsk_thread_flag(child, TIF_FORCED_TF))
value |= X86_EFLAGS_TF;
-   value |= get_stack_long(child, EFL_OFFSET) & ~FLAG_MASK;
+   value |= regs->eflags & ~FLAG_MASK;
break;
}
-   if (regno > FS*4)
-   regno -= 1*4;
-   put_stack_long(child, regno, value);
+   *pt_regs_access(regs, regno) = value;
return 0;
 }
 
-static unsigned long getreg(struct task_struct *child,
-   unsigned long regno)
+static unsigned long getreg(struct task_struct *child, unsigned long regno)
 {
+   struct pt_regs *regs = task_pt_regs(child);
unsigned long retval = ~0UL;
 
-   switch (regno >> 2) {
+   regno >>= 2;
+   switch (regno) {
case EFL:
/*
 * If the debugger set TF, hide it from the readout.
 */
-   retval = get_stack_long(child, EFL_OFFSET);
+   retval = regs->eflags;
if (test_tsk_thread_flag(child, TIF_FORCED_TF))
retval &= ~X86_EFLAGS_TF;
break;
@@ -147,9 +113,7 @@ static unsigned long getreg(struct task_struct *child,
retval = 0xffff;
/* fall through */
default:
-   if (regno > FS*4)
-   regno -= 1*4;
-   retval &= get_stack_long(child, regno);
+   retval &= *pt_regs_access(regs, regno);
}
return retval;
 }


[PATCH 14/27] powerpc: ptrace generic resume

2007-11-25 Thread Roland McGrath

This removes the handling for PTRACE_CONT et al from the powerpc
ptrace code, so it uses the new generic code via ptrace_request.
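For reference, the resume semantics these powerpc cases implemented, which ptrace_request() is now expected to provide generically, reduce to a few state changes. This is a hypothetical model. `struct model_child` tracks only the fields the removed code touched, MODEL_NSIG stands in for _NSIG, and PTRACE_KILL is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_NSIG 64	/* stand-in for _NSIG; the real value is per-arch */

enum model_req { REQ_CONT, REQ_SYSCALL, REQ_SINGLESTEP };

/* Only the tracee state the removed powerpc cases touched. */
struct model_child {
	bool syscall_trace;	/* TIF_SYSCALL_TRACE */
	bool single_step;	/* user_enable/disable_single_step() */
	int exit_code;		/* signal to deliver on resume */
	bool woken;		/* wake_up_process() */
};

/* Same test as valid_signal(): 0 (no signal) up to and including _NSIG. */
static bool model_valid_signal(unsigned long sig)
{
	return sig <= MODEL_NSIG;
}

/* PTRACE_CONT, PTRACE_SYSCALL and PTRACE_SINGLESTEP folded into one path. */
static int model_generic_resume(struct model_child *c, enum model_req req,
				unsigned long data)
{
	assert(c);
	if (!model_valid_signal(data))
		return -EIO;
	c->syscall_trace = (req == REQ_SYSCALL);
	c->single_step = (req == REQ_SINGLESTEP);
	c->exit_code = (int)data;
	c->woken = true;
	return 0;
}
```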

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 arch/powerpc/kernel/ptrace.c |   46 --
 1 files changed, 0 insertions(+), 46 deletions(-)

diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index b970d79..8b056d2 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -445,52 +445,6 @@ long arch_ptrace(struct task_struct *child, long request, long addr, long data)
break;
}
 
-   case PTRACE_SYSCALL: /* continue and stop at next (return from) syscall */
-   case PTRACE_CONT: { /* restart after signal. */
-   ret = -EIO;
-   if (!valid_signal(data))
-   break;
-   if (request == PTRACE_SYSCALL)
-   set_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
-   else
-   clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
-   child->exit_code = data;
-   /* make sure the single step bit is not set. */
-   user_disable_single_step(child);
-   wake_up_process(child);
-   ret = 0;
-   break;
-   }
-
-/*
- * make the child exit.  Best I can do is send it a sigkill.
- * perhaps it should be put in the status that it wants to
- * exit.
- */
-   case PTRACE_KILL: {
-   ret = 0;
-   if (child->exit_state == EXIT_ZOMBIE)   /* already dead */
-   break;
-   child->exit_code = SIGKILL;
-   /* make sure the single step bit is not set. */
-   user_disable_single_step(child);
-   wake_up_process(child);
-   break;
-   }
-
-   case PTRACE_SINGLESTEP: {  /* set the trap flag. */
-   ret = -EIO;
-   if (!valid_signal(data))
-   break;
-   clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
-   user_enable_single_step(child);
-   child->exit_code = data;
-   /* give it a chance to run. */
-   wake_up_process(child);
-   ret = 0;
-   break;
-   }
-
case PTRACE_GET_DEBUGREG: {
ret = -EINVAL;
/* We only support one DABR and no IABRS at the moment */


[PATCH 11/27] x86-64: ptrace generic resume

2007-11-25 Thread Roland McGrath

This removes the handling for PTRACE_CONT et al from the 64-bit
ptrace code, so it uses the new generic code via ptrace_request.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 arch/x86/kernel/ptrace_64.c |   45 ---
 1 files changed, 0 insertions(+), 45 deletions(-)

diff --git a/arch/x86/kernel/ptrace_64.c b/arch/x86/kernel/ptrace_64.c
index d8453da..85fba7b 100644
--- a/arch/x86/kernel/ptrace_64.c
+++ b/arch/x86/kernel/ptrace_64.c
@@ -334,23 +334,6 @@ long arch_ptrace(struct task_struct *child, long request, long addr, long data)
}
break;
}
-   case PTRACE_SYSCALL: /* continue and stop at next (return from) syscall */
-   case PTRACE_CONT:/* restart after signal. */
-
-   ret = -EIO;
-   if (!valid_signal(data))
-   break;
-   if (request == PTRACE_SYSCALL)
-   set_tsk_thread_flag(child,TIF_SYSCALL_TRACE);
-   else
-   clear_tsk_thread_flag(child,TIF_SYSCALL_TRACE);
-   clear_tsk_thread_flag(child, TIF_SINGLESTEP);
-   child->exit_code = data;
-   /* make sure the single step bit is not set. */
-   user_disable_single_step(child);
-   wake_up_process(child);
-   ret = 0;
-   break;
 
 #ifdef CONFIG_IA32_EMULATION
/* This makes only sense with 32bit programs. Allow a
@@ -378,34 +361,6 @@ long arch_ptrace(struct task_struct *child, long request, long addr, long data)
ret = do_arch_prctl(child, data, addr);
break;
 
-/*
- * make the child exit.  Best I can do is send it a sigkill.
- * perhaps it should be put in the status that it wants to
- * exit.
- */
-   case PTRACE_KILL:
-   ret = 0;
-   if (child->exit_state == EXIT_ZOMBIE)   /* already dead */
-   break;
-   clear_tsk_thread_flag(child, TIF_SINGLESTEP);
-   child->exit_code = SIGKILL;
-   /* make sure the single step bit is not set. */
-   user_disable_single_step(child);
-   wake_up_process(child);
-   break;
-
-   case PTRACE_SINGLESTEP:/* set the trap flag. */
-   ret = -EIO;
-   if (!valid_signal(data))
-   break;
-   clear_tsk_thread_flag(child,TIF_SYSCALL_TRACE);
-   user_enable_single_step(child);
-   child->exit_code = data;
-   /* give it a chance to run. */
-   wake_up_process(child);
-   ret = 0;
-   break;
-
case PTRACE_GETREGS: { /* Get all gp regs from the child. */
if (!access_ok(VERIFY_WRITE, (unsigned __user *)data,
   sizeof(struct user_regs_struct))) {


Small System Paging Problem - OOM-killer goes nuts

2007-11-25 Thread Josh Goldsmith

Hi,

I have a Linksys NSLU2 running 2.6.21 (I can replicate the problem on 2.6.23, but that kernel isn't fully supported on SlugOS). It is an armv5teb device with 32MB of RAM and 400+ MB of swap on its 160GB USB2 root disk. The machine is used as a fileserver and to build packages for other ARM devices. It may be underpowered by today's standards, but it is a whole lot faster than my first Linux system (a 386sx20 with 4MB RAM), and the whole system, disk included, uses <8 watts and is silent.


The problem comes when I try to untar a large file (in this case linux-2.6.23.tar.bz2). Regardless of whether I kill off every other process, the oom-killer eventually appears and kills either the tar or the shell. I've tried every tuning option that my buddy Google and I could find, including /proc/sys/vm/overcommit*, with no success. I'm not worried about paging impacting performance.


 I'd appreciate any help, pointers, or gentle taps with the cluebat.

-Josh

Error output to console: http://www.pastebin.ca/797155

config ->  http://www.pastebin.ca/797206

slug2>$ uname -a
Linux slug2 2.6.21 #1 PREEMPT Fri Nov 9 11:54:06 MST 2007 armv5teb unknown

slug2:~$ free
             total       used       free     shared    buffers     cached
Mem:         30352      29124       1228          0      10196       9468
-/+ buffers/cache:       9460      20892
Swap:       465876          0     465876
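
As a worked check on the `free` output above: its `-/+ buffers/cache` row is the `Mem:` row with buffers and page cache counted as reclaimable rather than used (figures in kilobytes, `free`'s default unit):

```latex
\[
\begin{aligned}
\text{used} - \text{buffers} - \text{cached} &= 29124 - 10196 - 9468 = 9460 \\
\text{free} + \text{buffers} + \text{cached} &= 1228 + 10196 + 9468 = 20892
\end{aligned}
\]
```

So when this snapshot was taken, only about 9 MB was in use once buffers and cache are discounted, and no swap had been touched.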

cat /proc/swaps
Filename        Type        Size    Used    Priority
/dev/sda4       partition   465876  0       -1

slug2:~$ lsmod
Module                  Size  Used by
nfsd                  186556  8
exportfs                4320  1 nfsd
lockd                  51416  2 nfsd
sunrpc                131952  2 nfsd,lockd
reiserfs              255380  1
ixp4xx_mac             14644  0
ixp4xx_qmgr             5388  5 ixp4xx_mac
mii                     3424  1 ixp4xx_mac
ext3                  110472  2
jbd                    47784  1 ext3
mbcache                 5604  1 ext3
ohci_hcd               16804  0
ehci_hcd               30252  0

slug2>$ dmesg
<5>Linux version 2.6.21 ([EMAIL PROTECTED]) (gcc version 4.1.1) #1 PREEMPT Fri Nov 9 11:54:06 MST 2007
<4>CPU: XScale-IXP42x Family [690541f1] revision 1 (ARMv5TE), cr=39ff
<4>Machine: Linksys NSLU2
<4>Memory policy: ECC disabled, Data cache writeback
<7>On node 0 totalpages: 8192
<7>  DMA zone: 64 pages used for memmap
<7>  DMA zone: 0 pages reserved
<7>  DMA zone: 8128 pages, LIFO batch:0
<7>  Normal zone: 0 pages used for memmap
<4>CPU0: D VIVT undefined 5 cache
<4>CPU0: I cache: 32768 bytes, associativity 32, 32 byte lines, 32 sets
<4>CPU0: D cache: 32768 bytes, associativity 32, 32 byte lines, 32 sets
<4>Built 1 zonelists.  Total pages: 8128
<5>Kernel command line: rtc-x1205.probe=0,0x6f console=ttyS0,115200n8 root=/dev/mtdblock4 rootfstype=jffs2 rw init=/linuxrc noirqdebug
<6>IRQ lockup detection disabled
<4>PID hash table entries: 128 (order: 7, 512 bytes)
<4>Dentry cache hash table entries: 4096 (order: 2, 16384 bytes)
<4>Inode-cache hash table entries: 2048 (order: 1, 8192 bytes)
<6>Memory: 32MB = 32MB total
<5>Memory: 30268KB available (1940K code, 154K data, 84K init)
<7>Calibrating delay loop... 266.24 BogoMIPS (lpj=1331200)
<4>Mount-cache hash table entries: 512
<6>CPU: Testing write buffer coherency: ok
<6>NET: Registered protocol family 16
<4>IXP4xx: Using 16MiB expansion bus window size
<4>PCI: IXP4xx is host
<4>PCI: IXP4xx Using direct access for memory space
<6>PCI: bus0: Fast back to back transfers disabled
<6>dmabounce: registered device 0000:00:01.0 on pci bus
<6>dmabounce: registered device 0000:00:01.1 on pci bus
<6>dmabounce: registered device 0000:00:01.2 on pci bus
<5>SCSI subsystem initialized
<6>usbcore: registered new interface driver usbfs
<6>usbcore: registered new interface driver hub
<6>usbcore: registered new device driver usb
<6>Time: OSTS clocksource has been installed.
<6>NET: Registered protocol family 2
<4>IP route cache hash table entries: 1024 (order: 0, 4096 bytes)
<4>TCP established hash table entries: 1024 (order: 1, 8192 bytes)
<4>TCP bind hash table entries: 1024 (order: 0, 4096 bytes)
<6>TCP: Hash tables configured (established 1024 bind 1024)
<6>TCP reno registered
<4>NetWinder Floating Point Emulator V0.97 (double precision)
<6>JFFS2 version 2.2. (NAND) (C) 2001-2006 Red Hat, Inc.
<6>io scheduler noop registered
<6>io scheduler deadline registered (default)
<6>Serial: 8250/16550 driver $Revision: 1.90 $ 2 ports, IRQ sharing disabled
<6>serial8250.0: ttyS0 at MMIO 0xc8000000 (irq = 15) is a XScale
<6>serial8250.0: ttyS1 at MMIO 0xc8001000 (irq = 13) is a XScale
<4>RAMDISK driver initialized: 4 RAM disks of 10240K size 1024 blocksize
<6>IXP4XX NPE driver Version 0.3.0 initialized
<6>NFTL driver: nftlcore.c $Revision: 1.98 $, nftlmount.c $Revision: 1.41 $
<6>IXP4XX-Flash.0: Found 1 x16 devices at 0x0 in 16-bit bank
<7>IXP4XX-Flash.0: Found an alias at 0x800000 for the chip at 0x0
<4> Intel/Sharp 

[PATCH 12/27] x86-32: ptrace generic resume

2007-11-25 Thread Roland McGrath

This removes the handling for PTRACE_CONT et al from the 32-bit
ptrace code, so it uses the new generic code via ptrace_request.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 arch/x86/kernel/ptrace_32.c |   57 ---
 1 files changed, 0 insertions(+), 57 deletions(-)

diff --git a/arch/x86/kernel/ptrace_32.c b/arch/x86/kernel/ptrace_32.c
index a493017..50882b3 100644
--- a/arch/x86/kernel/ptrace_32.c
+++ b/arch/x86/kernel/ptrace_32.c
@@ -277,63 +277,6 @@ long arch_ptrace(struct task_struct *child, long request, long addr, long data)
  }
  break;
 
-   case PTRACE_SYSEMU: /* continue and stop at next syscall, which will not be executed */
-   case PTRACE_SYSCALL:    /* continue and stop at next (return from) syscall */
-   case PTRACE_CONT:   /* restart after signal. */
-   ret = -EIO;
-   if (!valid_signal(data))
-   break;
-   if (request == PTRACE_SYSEMU) {
-   set_tsk_thread_flag(child, TIF_SYSCALL_EMU);
-   clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
-   } else if (request == PTRACE_SYSCALL) {
-   set_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
-   clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
-   } else {
-   clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
-   clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
-   }
-   child->exit_code = data;
-   /* make sure the single step bit is not set. */
-   user_disable_single_step(child);
-   wake_up_process(child);
-   ret = 0;
-   break;
-
-/*
- * make the child exit.  Best I can do is send it a sigkill.
- * perhaps it should be put in the status that it wants to
- * exit.
- */
-   case PTRACE_KILL:
-   ret = 0;
-   if (child->exit_state == EXIT_ZOMBIE)   /* already dead */
-   break;
-   child->exit_code = SIGKILL;
-   /* make sure the single step bit is not set. */
-   user_disable_single_step(child);
-   wake_up_process(child);
-   break;
-
-   case PTRACE_SYSEMU_SINGLESTEP: /* Same as SYSEMU, but singlestep if not syscall */
-   case PTRACE_SINGLESTEP: /* set the trap flag. */
-   ret = -EIO;
-   if (!valid_signal(data))
-   break;
-
-   if (request == PTRACE_SYSEMU_SINGLESTEP)
-   set_tsk_thread_flag(child, TIF_SYSCALL_EMU);
-   else
-   clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
-
-   clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
-   user_enable_single_step(child);
-   child->exit_code = data;
-   /* give it a chance to run. */
-   wake_up_process(child);
-   ret = 0;
-   break;
-
case PTRACE_GETREGS: { /* Get all gp regs from the child. */
if (!access_ok(VERIFY_WRITE, datap, FRAME_SIZE*sizeof(long))) {
ret = -EIO;
-


[PATCH 10/27] ptrace: generic resume

2007-11-25 Thread Roland McGrath

This makes ptrace_request handle all the ptrace requests that wake
up the traced task.  These do low-level ptrace implementation magic
that is not arch-specific and should be kept out of arch code.  The
implementations on each arch usually do the same thing.  The new
generic code makes use of the arch_has_single_step macro and generic
entry points to handle PTRACE_SINGLESTEP.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 kernel/ptrace.c |   61 +++
 1 files changed, 61 insertions(+), 0 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 7c76f2f..309796a 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -366,6 +366,50 @@ static int ptrace_setsiginfo(struct task_struct *child, siginfo_t __user * data)
return error;
 }
 
+
+#ifdef PTRACE_SINGLESTEP
+#define is_singlestep(request) ((request) == PTRACE_SINGLESTEP)
+#else
+#define is_singlestep(request) 0
+#endif
+
+#ifdef PTRACE_SYSEMU
+#define is_sysemu_singlestep(request)  ((request) == PTRACE_SYSEMU_SINGLESTEP)
+#else
+#define is_sysemu_singlestep(request)  0
+#endif
+
+static int ptrace_resume(struct task_struct *child, long request, long data)
+{
+	if (!valid_signal(data))
+		return -EIO;
+
+	if (request == PTRACE_SYSCALL)
+		set_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
+	else
+		clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
+
+#ifdef TIF_SYSCALL_EMU
+	if (request == PTRACE_SYSEMU || request == PTRACE_SYSEMU_SINGLESTEP)
+		set_tsk_thread_flag(child, TIF_SYSCALL_EMU);
+	else
+		clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
+#endif
+
+	if (is_singlestep(request) || is_sysemu_singlestep(request)) {
+		if (unlikely(!arch_has_single_step()))
+			return -EIO;
+		user_enable_single_step(child);
+	}
+	else
+		user_disable_single_step(child);
+
+	child->exit_code = data;
+	wake_up_process(child);
+
+	return 0;
+}
+
 int ptrace_request(struct task_struct *child, long request,
   long addr, long data)
 {
@@ -390,6 +434,23 @@ int ptrace_request(struct task_struct *child, long request,
 	case PTRACE_DETACH:	 /* detach a process that was attached. */
 		ret = ptrace_detach(child, data);
 		break;
+
+#ifdef PTRACE_SINGLESTEP
+	case PTRACE_SINGLESTEP:
+#endif
+#ifdef PTRACE_SYSEMU
+	case PTRACE_SYSEMU:
+	case PTRACE_SYSEMU_SINGLESTEP:
+#endif
+	case PTRACE_SYSCALL:
+	case PTRACE_CONT:
+		return ptrace_resume(child, request, data);
+
+	case PTRACE_KILL:
+		if (child->exit_state)	/* already dead */
+			return 0;
+		return ptrace_resume(child, request, SIGKILL);
+
 	default:
 		break;
 	}
-


[PATCH 08/27] x86: single_step: share code

2007-11-25 Thread Roland McGrath

This removes the single-step code from ptrace_32.c and uses the step.c code
shared with the 64-bit kernel.  The two versions of the code were nearly
identical already, so the shared code has only a couple of simple #ifdef's.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 arch/x86/kernel/Makefile_32 |    1 +
 arch/x86/kernel/ptrace_32.c |  125 ---
 arch/x86/kernel/step.c      |   14 +
 3 files changed, 15 insertions(+), 125 deletions(-)

diff --git a/arch/x86/kernel/Makefile_32 b/arch/x86/kernel/Makefile_32
index e660584..959ad3c 100644
--- a/arch/x86/kernel/Makefile_32
+++ b/arch/x86/kernel/Makefile_32
@@ -11,6 +11,7 @@ obj-y := process_32.o signal_32.o entry_32.o traps_32.o irq_32.o \
quirks.o i8237.o topology.o alternative.o i8253.o tsc_32.o
 
 obj-y  += tls.o
+obj-y  += step.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 obj-y  += cpu/
 obj-y  += acpi/
diff --git a/arch/x86/kernel/ptrace_32.c b/arch/x86/kernel/ptrace_32.c
index d1d74e1..e599db5 100644
--- a/arch/x86/kernel/ptrace_32.c
+++ b/arch/x86/kernel/ptrace_32.c
@@ -137,131 +137,6 @@ static unsigned long getreg(struct task_struct *child,
return retval;
 }
 
-#define LDT_SEGMENT 4
-
-static unsigned long convert_eip_to_linear(struct task_struct *child, struct pt_regs *regs)
-{
-   unsigned long addr, seg;
-
-   addr = regs->eip;
-   seg = regs->xcs & 0xffff;
-   if (regs->eflags & VM_MASK) {
-   addr = (addr & 0xffff) + (seg << 4);
-   return addr;
-   }
-
-   /*
-* We'll assume that the code segments in the GDT
-* are all zero-based. That is largely true: the
-* TLS segments are used for data, and the PNPBIOS
-* and APM bios ones we just ignore here.
-*/
-   if (seg & LDT_SEGMENT) {
-   u32 *desc;
-   unsigned long base;
-
-   seg &= ~7UL;
-
-   mutex_lock(&child->mm->context.lock);
-   if (unlikely((seg >> 3) >= child->mm->context.size))
-   addr = -1L; /* bogus selector, access would fault */
-   else {
-   desc = child->mm->context.ldt + seg;
-   base = ((desc[0] >> 16) |
-   ((desc[1] & 0xff) << 16) |
-   (desc[1] & 0xff000000));
-
-   /* 16-bit code segment? */
-   if (!((desc[1] >> 22) & 1))
-   addr &= 0xffff;
-   addr += base;
-   }
-   mutex_unlock(&child->mm->context.lock);
-   }
-   return addr;
-}
-
-static inline int is_setting_trap_flag(struct task_struct *child, struct pt_regs *regs)
-{
-   int i, copied;
-   unsigned char opcode[15];
-   unsigned long addr = convert_eip_to_linear(child, regs);
-
-   copied = access_process_vm(child, addr, opcode, sizeof(opcode), 0);
-   for (i = 0; i < copied; i++) {
-   switch (opcode[i]) {
-   /* popf and iret */
-   case 0x9d: case 0xcf:
-   return 1;
-   /* opcode and address size prefixes */
-   case 0x66: case 0x67:
-   continue;
-   /* irrelevant prefixes (segment overrides and repeats) */
-   case 0x26: case 0x2e:
-   case 0x36: case 0x3e:
-   case 0x64: case 0x65:
-   case 0xf0: case 0xf2: case 0xf3:
-   continue;
-
-   /*
-* pushf: NOTE! We should probably not let
-* the user see the TF bit being set. But
-* it's more pain than it's worth to avoid
-* it, and a debugger could emulate this
-* all in user space if it _really_ cares.
-*/
-   case 0x9c:
-   default:
-   return 0;
-   }
-   }
-   return 0;
-}
-
-void user_enable_single_step(struct task_struct *child)
-{
-   struct pt_regs *regs = get_child_regs(child);
-
-   /*
-* Always set TIF_SINGLESTEP - this guarantees that
-* we single-step system calls etc..  This will also
-* cause us to set TF when returning to user mode.
-*/
-   set_tsk_thread_flag(child, TIF_SINGLESTEP);
-
-   /*
-* If TF was already set, don't do anything else
-*/
-   if (regs->eflags & X86_EFLAGS_TF)
-   return;
-
-   /* Set TF on the kernel stack.. */
-   regs->eflags |= X86_EFLAGS_TF;
-
-   /*
-* ..but if TF is changed by the instruction we will trace,
-* don't mark it as being "us" that set it, so that we
-* won't clear it by hand later.
-*/
-   if (is_setting_trap_flag(child, regs))
-   return;
-
- 

[PATCH 09/27] x86 single_step: TIF_FORCED_TF

2007-11-25 Thread Roland McGrath

This changes the single-step support to use a new thread_info flag
TIF_FORCED_TF instead of the PT_DTRACE flag in task_struct.ptrace.
This keeps arch implementations from using this non-arch field.

This changes the ptrace access to eflags to mask TF and maintain
the TIF_FORCED_TF flag directly if userland sets TF, instead of
relying on ptrace_signal_deliver.  The 64-bit and 32-bit kernels
are harmonized on this same behavior.  The ptrace_signal_deliver
approach works now, but this change makes the low-level register
access code reliable when called from different contexts than a
ptrace stop, which will be possible in the future.

The 64-bit do_debug exception handler is also changed not to clear TF
from user-mode registers.  This matches the 32-bit kernel's behavior.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 arch/x86/ia32/ptrace32.c         |   20 ++--
 arch/x86/kernel/process_32.c     |    3 ---
 arch/x86/kernel/process_64.c     |    5 -
 arch/x86/kernel/ptrace_32.c      |   17 +
 arch/x86/kernel/ptrace_64.c      |   20 
 arch/x86/kernel/signal_32.c      |   12 +---
 arch/x86/kernel/signal_64.c      |   14 +-
 arch/x86/kernel/step.c           |    9 +++--
 arch/x86/kernel/traps_64.c       |   23 +--
 include/asm-x86/signal.h         |   11 ++-
 include/asm-x86/thread_info_32.h |    2 ++
 include/asm-x86/thread_info_64.h |    3 ++-
 12 files changed, 79 insertions(+), 60 deletions(-)

diff --git a/arch/x86/ia32/ptrace32.c b/arch/x86/ia32/ptrace32.c
index 4a233ad..a9a5cd4 100644
--- a/arch/x86/ia32/ptrace32.c
+++ b/arch/x86/ia32/ptrace32.c
@@ -82,6 +82,15 @@ static int putreg32(struct task_struct *child, unsigned regno, u32 val)
case offsetof(struct user32, regs.eflags): {
__u64 *flags = &stack[offsetof(struct pt_regs, eflags)/8];
val &= FLAG_MASK;
+   /*
+* If the user value contains TF, mark that
+* it was not "us" (the debugger) that set it.
+* If not, make sure it stays set if we had.
+*/
+   if (val & X86_EFLAGS_TF)
+   clear_tsk_thread_flag(child, TIF_FORCED_TF);
+   else if (test_tsk_thread_flag(child, TIF_FORCED_TF))
+   val |= X86_EFLAGS_TF;
*flags = val | (*flags & ~FLAG_MASK);
break;
}
@@ -168,9 +177,17 @@ static int getreg32(struct task_struct *child, unsigned regno, u32 *val)
R32(eax, rax);
R32(orig_eax, orig_rax);
R32(eip, rip);
-   R32(eflags, eflags);
R32(esp, rsp);
 
+   case offsetof(struct user32, regs.eflags):
+   /*
+* If the debugger set TF, hide it from the readout.
+*/
+   *val = stack[offsetof(struct pt_regs, eflags)/8];
+   if (test_tsk_thread_flag(child, TIF_FORCED_TF))
+   *val &= ~X86_EFLAGS_TF;
+   break;
+
case offsetof(struct user32, u_debugreg[0]): 
*val = child->thread.debugreg0; 
break; 
@@ -401,4 +418,3 @@ asmlinkage long sys32_ptrace(long request, u32 pid, u32 addr, u32 data)
put_task_struct(child);
return ret;
 }
-
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index ebbbfc5..f59544e 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -796,9 +796,6 @@ asmlinkage int sys_execve(struct pt_regs regs)
(char __user * __user *) regs.edx,
&regs);
if (error == 0) {
-   task_lock(current);
-   current->ptrace &= ~PT_DTRACE;
-   task_unlock(current);
/* Make sure we don't return using sysenter.. */
set_thread_flag(TIF_IRET);
}
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 3fdbf78..586f88e 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -698,11 +698,6 @@ long sys_execve(char __user *name, char __user * __user *argv,
if (IS_ERR(filename))
return error;
error = do_execve(filename, argv, envp, &regs);
-   if (error == 0) {
-   task_lock(current);
-   current->ptrace &= ~PT_DTRACE;
-   task_unlock(current);
-   }
putname(filename);
return error;
 }
diff --git a/arch/x86/kernel/ptrace_32.c b/arch/x86/kernel/ptrace_32.c
index e599db5..a493017 100644
--- a/arch/x86/kernel/ptrace_32.c
+++ b/arch/x86/kernel/ptrace_32.c
@@ -104,6 +104,15 @@ static int putreg(struct task_struct *child,
break;
case EFL:
value &= FLAG_MASK;
+   /*
+* If the user value contains TF, mark that
+* it was not "us" (the debugger) 

[PATCH 05/27] x86: single_step moved

2007-11-25 Thread Roland McGrath

This moves the single-step support code from ptrace_64.c into a new file
step.c, verbatim.  This paves the way for consolidating this code between
64-bit and 32-bit versions.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 arch/x86/kernel/Makefile_64 |    2 +
 arch/x86/kernel/ptrace_64.c |  134 -
 arch/x86/kernel/step.c      |  140 +++
 3 files changed, 142 insertions(+), 134 deletions(-)

diff --git a/arch/x86/kernel/Makefile_64 b/arch/x86/kernel/Makefile_64
index 203a9d8..d35ee6f 100644
--- a/arch/x86/kernel/Makefile_64
+++ b/arch/x86/kernel/Makefile_64
@@ -13,6 +13,8 @@ obj-y := process_64.o signal_64.o entry_64.o traps_64.o irq_64.o \
pci-dma_64.o pci-nommu_64.o alternative.o hpet.o tsc_64.o bugs_64.o \
i8253.o
 
+obj-y  += step.o
+
 obj-$(CONFIG_IA32_EMULATION)   += tls.o
 
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
diff --git a/arch/x86/kernel/ptrace_64.c b/arch/x86/kernel/ptrace_64.c
index c2e1a13..52479b1 100644
--- a/arch/x86/kernel/ptrace_64.c
+++ b/arch/x86/kernel/ptrace_64.c
@@ -80,140 +80,6 @@ static inline long put_stack_long(struct task_struct *task, int offset,
return 0;
 }
 
-#define LDT_SEGMENT 4
-
-unsigned long convert_rip_to_linear(struct task_struct *child, struct pt_regs *regs)
-{
-   unsigned long addr, seg;
-
-   addr = regs->rip;
-   seg = regs->cs & 0xffff;
-
-   /*
-* We'll assume that the code segments in the GDT
-* are all zero-based. That is largely true: the
-* TLS segments are used for data, and the PNPBIOS
-* and APM bios ones we just ignore here.
-*/
-   if (seg & LDT_SEGMENT) {
-   u32 *desc;
-   unsigned long base;
-
-   seg &= ~7UL;
-
-   mutex_lock(&child->mm->context.lock);
-   if (unlikely((seg >> 3) >= child->mm->context.size))
-   addr = -1L; /* bogus selector, access would fault */
-   else {
-   desc = child->mm->context.ldt + seg;
-   base = ((desc[0] >> 16) |
-   ((desc[1] & 0xff) << 16) |
-   (desc[1] & 0xff000000));
-
-   /* 16-bit code segment? */
-   if (!((desc[1] >> 22) & 1))
-   addr &= 0xffff;
-   addr += base;
-   }
-   mutex_unlock(&child->mm->context.lock);
-   }
-
-   return addr;
-}
-
-static int is_setting_trap_flag(struct task_struct *child, struct pt_regs *regs)
-{
-   int i, copied;
-   unsigned char opcode[15];
-   unsigned long addr = convert_rip_to_linear(child, regs);
-
-   copied = access_process_vm(child, addr, opcode, sizeof(opcode), 0);
-   for (i = 0; i < copied; i++) {
-   switch (opcode[i]) {
-   /* popf and iret */
-   case 0x9d: case 0xcf:
-   return 1;
-
-   /* CHECKME: 64 65 */
-
-   /* opcode and address size prefixes */
-   case 0x66: case 0x67:
-   continue;
-   /* irrelevant prefixes (segment overrides and repeats) */
-   case 0x26: case 0x2e:
-   case 0x36: case 0x3e:
-   case 0x64: case 0x65:
-   case 0xf2: case 0xf3:
-   continue;
-
-   case 0x40 ... 0x4f:
-   if (regs->cs != __USER_CS)
-   /* 32-bit mode: register increment */
-   return 0;
-   /* 64-bit mode: REX prefix */
-   continue;
-
-   /* CHECKME: f2, f3 */
-
-   /*
-* pushf: NOTE! We should probably not let
-* the user see the TF bit being set. But
-* it's more pain than it's worth to avoid
-* it, and a debugger could emulate this
-* all in user space if it _really_ cares.
-*/
-   case 0x9c:
-   default:
-   return 0;
-   }
-   }
-   return 0;
-}
-
-void user_enable_single_step(struct task_struct *child)
-{
-   struct pt_regs *regs = task_pt_regs(child);
-
-   /*
-* Always set TIF_SINGLESTEP - this guarantees that
-* we single-step system calls etc..  This will also
-* cause us to set TF when returning to user mode.
-*/
-   set_tsk_thread_flag(child, TIF_SINGLESTEP);
-
-   /*
-* If TF was already set, don't do anything else
-*/
-   if (regs->eflags & X86_EFLAGS_TF)
-   return;
-
-   /* Set TF on the kernel stack.. */
-   regs->eflags |= X86_EFLAGS_TF;
-
-   /*
-* ..but if TF is changed by the instruction we will trace,
-   
