Re: [PATCH] Possible AMD8111e free irq issue

2005-03-01 Thread Panagiotis Issaris
Hi,
Jeff Garzik wrote:
diff -uprN linux-2.6.11-rc5-bk2/drivers/net/amd8111e.c linux-2.6.11-rc5-bk2-pi/drivers/net/amd8111e.c
--- linux-2.6.11-rc5-bk2/drivers/net/amd8111e.c	2005-02-28 13:44:46.0 +0100
+++ linux-2.6.11-rc5-bk2-pi/drivers/net/amd8111e.c	2005-02-28 13:45:09.0 +0100
@@ -1381,6 +1381,8 @@ static int amd8111e_open(struct net_devi
 
 	if(amd8111e_restart(dev)){
 		spin_unlock_irq(&lp->lock);
+		if (dev->irq)
+			free_irq(dev->irq, dev);
 		return -ENOMEM;

Yes, this is a needed fix.  Thanks.
Should the release of the irq happen before or after unlocking the
spinlock? I wasn't really sure about it.
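
As far as I know, free_irq() has to wait for a running handler to finish, so it should not be called while a spinlock is held with interrupts disabled; that would make "after unlocking" the safer order, as in the patch. The ordering can be sketched in userspace as below. Everything named mock_* is a hypothetical stand-in for illustration, not the amd8111e driver or a kernel API.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device state; not driver code. */
struct mock_dev {
	bool irq_held;          /* set by mock_request_irq(), cleared by mock_free_irq() */
	bool lock_held;         /* models spin_lock_irq()/spin_unlock_irq() */
	bool freed_under_lock;  /* records a free while the lock was still held */
};

static int mock_request_irq(struct mock_dev *d)
{
	d->irq_held = true;
	return 0;
}

static void mock_free_irq(struct mock_dev *d)
{
	if (d->lock_held)
		d->freed_under_lock = true;
	d->irq_held = false;
}

/* Mirrors the shape of the open path: request irq, lock, restart, unwind on failure. */
static int mock_open(struct mock_dev *d, bool restart_fails)
{
	if (mock_request_irq(d))
		return -EAGAIN;

	d->lock_held = true;
	if (restart_fails) {
		d->lock_held = false;   /* drop the lock first ... */
		mock_free_irq(d);       /* ... then release the irq */
		return -ENOMEM;
	}
	d->lock_held = false;
	return 0;
}
```

On the failure path the irq is released and never freed under the lock; on success it stays held until close.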

With friendly regards,
Takis
--
 K.U.Leuven, Mechanical Eng.,  Mechatronics & Robotics Research Group
 http://people.mech.kuleuven.ac.be/~pissaris/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] remove dead cyrix/centaur mtrr init code

2005-03-01 Thread Andries Brouwer
On Tue, Mar 01, 2005 at 11:52:44PM +, Alan Cox wrote:
> On Llu, 2005-02-28 at 19:20, Andries Brouwer wrote:
> > One such case is the mtrr code, where struct mtrr_ops has an
> > init field pointing at __init functions. Unless I overlook
> > something, this case may be easy to settle, since the .init
> > field is never used.
> 
> The failure to invoke the ->init operator appears to be the bug.
> The centaur code definitely wants the mcr init function to be called.

Yes, I expected that to be the answer. Therefore #if 0 instead of deleting.
But if calling ->init() is needed, and it has not been done the past
three years, the question arises whether there are any users.

Andries



Re: [PATCH 2.6.11-rc4-mm1] end-of-proces handling for acct-csa

2005-03-01 Thread Guillaume Thouvenin
On Tue, 2005-03-01 at 10:06 -0800, Jay Lan wrote:
> Sorry I was not clear on my point.
> 
> I was trying to point out that, an exit hook for BSD and CSA is
> essential to save accounting data before the data is gone. That
> can not be done with a netlink.
> 
> So, my patch was to keep acct_process as a wrapper, which
> would then call do_exit_csa() for CSA and call do_acct_process
> for BSD.

Is it possible to merge BSD and CSA? I mean with CSA, there is a part
that does per-process accounting. For example in the
linux-2.6.9.acct_mm.patch the two functions update_mem_hiwater() and
csa_update_integrals() update fields in the current (and parent)
process. So maybe you can improve the BSD per-process accounting or
maybe CSA can replace the BSD per-process accounting?

Guillaume  



Re: [PATCH] fix module paramater permissions in radeon_base.c

2005-03-01 Thread Benjamin Herrenschmidt
On Tue, 2005-03-01 at 23:11 -0800, Greg KH wrote:
> You really don't want -2 for the file mode in sysfs.  It creates:
>   -rwsrwsrwT  1 root root 4096 Mar  1 22:59 /sys/module/radeonfb/parameters/default_dynclk
> 
> on my box.  Here's a fix against a clean 2.6.11-rc5 kernel, please
> forward onward as you see fit.
> 
> 
> Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
> 
> 
> --- 1.27/drivers/video/aty/radeon_base.c	2005-02-24 11:40:00 -08:00
> +++ edited/drivers/video/aty/radeon_base.c	2005-03-01 23:09:12 -08:00
> @@ -2551,7 +2551,7 @@
>  MODULE_DESCRIPTION("framebuffer driver for ATI Radeon chipset");
>  MODULE_LICENSE("GPL");
>  module_param(noaccel, bool, 0);
> -module_param(default_dynclk, int, -2);
> +module_param(default_dynclk, int, 0);
>  MODULE_PARM_DESC(default_dynclk, "int: -2=enable on mobility only,-1=do not change,0=off,1=on");
>  MODULE_PARM_DESC(noaccel, "bool: disable acceleration");
>  module_param(nomodeset, bool, 0);

Right, that is bogus, thanks.

Ben.





[PATCH] fix module paramater permissions in radeon_base.c

2005-03-01 Thread Greg KH
You really don't want -2 for the file mode in sysfs.  It creates:
  -rwsrwsrwT  1 root root 4096 Mar  1 22:59 /sys/module/radeonfb/parameters/default_dynclk

on my box.  Here's a fix against a clean 2.6.11-rc5 kernel, please
forward onward as you see fit.
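
To see where that odd mode string comes from: -2 as an unsigned value has every bit set except the lowest, so its low twelve permission bits are 07776. A small userspace sketch (mode_to_string is a hypothetical helper written for this note, not a kernel function) renders those bits the way ls does:

```c
/* Render the low 12 permission bits the way ls(1) does, e.g. "rwxr-xr-x". */
static void mode_to_string(unsigned int mode, char out[10])
{
	static const char rwx[] = "rwxrwxrwx";
	int i;

	/* Nine rwx bits, most significant (owner read) first. */
	for (i = 0; i < 9; i++)
		out[i] = (mode & (0400u >> i)) ? rwx[i] : '-';

	/* setuid, setgid and sticky replace the execute column. */
	if (mode & 04000)
		out[2] = (mode & 0100) ? 's' : 'S';
	if (mode & 02000)
		out[5] = (mode & 0010) ? 's' : 'S';
	if (mode & 01000)
		out[8] = (mode & 0001) ? 't' : 'T';
	out[9] = '\0';
}
```

Feeding it (unsigned int)-2 gives setuid, setgid and sticky all set, with "other" execute clear, which is why the last column shows a capital T.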


Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>


--- 1.27/drivers/video/aty/radeon_base.c	2005-02-24 11:40:00 -08:00
+++ edited/drivers/video/aty/radeon_base.c	2005-03-01 23:09:12 -08:00
@@ -2551,7 +2551,7 @@
 MODULE_DESCRIPTION("framebuffer driver for ATI Radeon chipset");
 MODULE_LICENSE("GPL");
 module_param(noaccel, bool, 0);
-module_param(default_dynclk, int, -2);
+module_param(default_dynclk, int, 0);
 MODULE_PARM_DESC(default_dynclk, "int: -2=enable on mobility only,-1=do not change,0=off,1=on");
 MODULE_PARM_DESC(noaccel, "bool: disable acceleration");
 module_param(nomodeset, bool, 0);


SCSI Target Mode issue...... pls help

2005-03-01 Thread Nauman
Hello all the gurus out there,
I have written a simple target for a SCSI device. It is at a very early stage.
I started by handling simple commands from the INITIATOR such as INQUIRY,
READ CAPACITY and REPORT LUN.
Now I am up to READ and WRITE. I respond to READ properly. The problem
is in the WRITE command, for instance when I get multiple WRITE commands
from the INITIATOR.
I queue each command as I receive it. A CTIO has to be sent to the firmware
for each received command; in my case I send the CTIO as soon as I receive
the command. The firmware then has to send back a response for each CTIO I
sent. Here is what's happening:
I get 2 WRITE commands and send a CTIO for cmd1 and cmd2, but what I get
back from the firmware is only the response for the second command, cmd2.
cmd1 times out and never gets a response.

If anyone has done the basic handshake and handled READ and WRITE for
TARGET mode, then please share your knowledge.
Best Regards,
-- 
When the going gets tough, The tough gets going...!
Peace ,  
Nauman.


Re: [2.6.11-rc4-mm1 patch] fix buggy IEEE80211_CRYPT_* selects

2005-03-01 Thread Jeff Garzik
Adrian Bunk wrote:
+	select CRYPTO
 	select CRYPTO_AES
 	---help---
 	Include software based cipher suites in support of IEEE 802.11i 
 	(aka TGi, WPA, WPA2, WPA-PSK, etc.) for use with CCMP enabled 
 	networks.
@@ -54,10 +55,11 @@
 	"ieee80211_crypt_ccmp".
 
 config IEEE80211_CRYPT_TKIP
 	tristate "IEEE 802.11i TKIP encryption"
 	depends on IEEE80211
+	select CRYPTO
 	select CRYPTO_MICHAEL_MIC

'select CRYPTO_AES' should 'select CRYPTO' automatically, I would hope.
Jeff


Re: O_DIRECT on 2.4 ext3

2005-03-01 Thread Andreas Dilger
On Mar 01, 2005  21:34 -0800, Junfeng Yang wrote:
> I tried to read from a regular ext3 file opened as O_DIRECT, but got the
> "Invalid argument" error.  Running the same test program on a block device
> succeeded.

ext3 doesn't support the direct_IO method in 2.4 kernels, though there
was a patch at one time.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://members.shaw.ca/adilger/ http://members.shaw.ca/golinux/





Re: via 6420 pata/sata controller

2005-03-01 Thread Jeff Garzik
If I had to guess, I would try the attached patch.  The via82cxxx.c 
driver is a bit annoying in that, here we do not talk to the ISA bridge 
but to the PCI device 0x4149 itself.

If this doesn't work, I could probably whip together a quick PATA driver 
for libata that works on this hardware.

Jeff

= drivers/ide/pci/via82cxxx.c 1.27 vs edited =
--- 1.27/drivers/ide/pci/via82cxxx.c	2005-02-03 02:24:29 -05:00
+++ edited/drivers/ide/pci/via82cxxx.c	2005-03-02 01:28:26 -05:00
@@ -79,6 +79,7 @@
 	u8 rev_max;
 	u16 flags;
 } via_isa_bridges[] = {
+	{ "vt6420", 0x4149, 0x00, 0x2f, VIA_UDMA_133 | VIA_BAD_AST },
 	{ "vt8237", PCI_DEVICE_ID_VIA_8237, 0x00, 0x2f, VIA_UDMA_133 | VIA_BAD_AST },
 	{ "vt8235", PCI_DEVICE_ID_VIA_8235, 0x00, 0x2f, VIA_UDMA_133 | VIA_BAD_AST },
 	{ "vt8233a",PCI_DEVICE_ID_VIA_8233A,0x00, 0x2f, VIA_UDMA_133 | VIA_BAD_AST },
@@ -635,9 +636,10 @@
 }
 
 static struct pci_device_id via_pci_tbl[] = {
-   { PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C576_1, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0},
-   { PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C586_1, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0},
-   { 0, },
+   { PCI_DEVICE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C576_1) },
+   { PCI_DEVICE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C586_1) },
+   { PCI_DEVICE(PCI_VENDOR_ID_VIA, 0x4149) },
+   { },/* terminate list */
 };
 MODULE_DEVICE_TABLE(pci, via_pci_tbl);
 


Re: [PATCH] Possible VIA-Rhine free irq issue

2005-03-01 Thread Jeff Garzik
Panagiotis Issaris wrote:
Hi,
It seems to me that in the VIA Rhine device driver the requested irq might
not be freed in case the alloc_ring() function fails. alloc_ring()
can fail with a ENOMEM return value because of possible
pci_alloc_consistent() failures.
This patch applies to 2.6.11-rc5-bk2.
diff -uprN linux-2.6.11-rc5-bk2/drivers/net/via-rhine.c linux-2.6.11-rc5-bk2-pi/drivers/net/via-rhine.c
--- linux-2.6.11-rc5-bk2/drivers/net/via-rhine.c	2005-02-28 13:44:37.0 +0100
+++ linux-2.6.11-rc5-bk2-pi/drivers/net/via-rhine.c	2005-02-28 13:44:31.0 +0100
@@ -1198,7 +1198,10 @@ static int rhine_open(struct net_device 
 
 	rc = alloc_ring(dev);
 	if (rc)
+	{
+		free_irq(rp->pdev->irq, dev);
 		return rc;
+	}
Yes, this is a needed fix.  Thanks,
Jeff



Re: [PATCH] Possible AMD8111e free irq issue

2005-03-01 Thread Jeff Garzik
Panagiotis Issaris wrote:
Hi,
It seems to me that if in the amd8111e_open() function dev->irq isn't
zero and the irq request succeeds it might not get released anymore.
Specifically, on failure of the amd8111e_restart() call the function
returns -ENOMEM without releasing the irq. The amd8111e_restart()
function can fail because of various pci_alloc_consistent() and
dev_alloc_skb() calls in amd8111e_init_ring() which is being
called by amd8111e_restart.
	if(dev->irq ==0 || request_irq(dev->irq, amd8111e_interrupt, SA_SHIRQ,
				       dev->name, dev))
		return -EAGAIN;
	
The patch applies to 2.6.11-rc5-bk2. 

If I'm right about the above, I'm not sure if the free_irq() should
happen before or after releasing the spinlock.
With friendly regards,
Takis
diff -uprN linux-2.6.11-rc5-bk2/drivers/net/amd8111e.c linux-2.6.11-rc5-bk2-pi/drivers/net/amd8111e.c
--- linux-2.6.11-rc5-bk2/drivers/net/amd8111e.c	2005-02-28 13:44:46.0 +0100
+++ linux-2.6.11-rc5-bk2-pi/drivers/net/amd8111e.c	2005-02-28 13:45:09.0 +0100
@@ -1381,6 +1381,8 @@ static int amd8111e_open(struct net_devi
 
 	if(amd8111e_restart(dev)){
 		spin_unlock_irq(&lp->lock);
+		if (dev->irq)
+			free_irq(dev->irq, dev);
 		return -ENOMEM;
Yes, this is a needed fix.  Thanks.
Jeff



some /proc understandings

2005-03-01 Thread linux lover
Hello,
 1) I want to know how much I can write to a
/proc entry file. Is there any limit on the file
size?
 2) Also, how can I call a /proc entry file's
proc_read_myfile function from another
kernel module? What parameters do I need to pass,
and how? Say I have a read function such as

struct myfile_data_t {
  char value[8];
};
struct proc_dir_entry *myfile_file;
struct myfile_data_t myfile_data;

int proc_read_myfile(char *page, char **start, off_t off,
                     int count, int *eof, void *data)
{
  int len;

/* cast the void pointer of data to myfile_data_t*/
  struct myfile_data_t *myfile_data=(struct myfile_data_t *)data;

/* use sprintf to fill the page array with a string */
  len = sprintf(page, "%s", myfile_data->value);
return len;
}

Is it then possible to call proc_read_myfile
from another kernel module, instead of reading the file
from user level?
   3) Also, is the following code valid for creating /proc
files with different file names, passed to the
function cr_proc(fname)?

struct proc_dir_entry *entnew;
int cr_proc(char *fname)
{
if ((entnew1 = create_proc_entry(fname, S_IRUGO | S_IWUSR, NULL)) == NULL)
return -EACCES;
   entnew1->proc_fops = &proc_file_operations;
}
static struct file_operations proc_file_operations = {
open:   proc_open,
release:proc_release,
read:   proc_read,
write:  proc_write,
};

  What will happen if the dynamically named files all
use the same four functions above?
regards,
linux_lover



Re: [PATCH/RFC] I/O-check interface for driver's error handling

2005-03-01 Thread Hidetoshi Seto
Linas Vepstas wrote:
>> I'd prefer to see it as ioerr_clear(), ioerr_read() ...
>
> I'd prefer pci_io_start() and pci_io_check_err()
>
> The names should have "pci" in them.
>
> I don't like "ioerr_clear" because it implies we are clearing the io error; we are not; we are clearing the checker for io errors.

My intention was "clear/read the checker (called iochk) to check my I/O."
(A bitmask would be better for the error flag, but the bits are not defined yet.)
So I agree that ioerr_clear/read() would be a good alternative.
But I'd still prefer iochk_*, because it clears the checker, not the error.
iochecker_* would be a bit long.

And I don't think the name needs "pci", which would limit this
API's target. It would not fit if there is a recoverable device
behind some PCI-to-XXX bridge, or if some special arch has no PCI
but another recoverable bus system, or if a future bus system
is not called pci...
Currently we deal only with pci, but in the future possibly not.

> Do we really need a cookie?

Some do, some don't.
For example, if an arch has only a counter of error exceptions, saving the
value of the counter in the cookie would make sense.
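
A minimal sketch of that counter-in-a-cookie idea, assuming an arch that exposes nothing but a running count of error exceptions. The mock_ names, the cookie layout and the counter are all hypothetical; only the clear/read pairing comes from the discussion above.

```c
#include <stdbool.h>

/* Hypothetical per-arch counter, bumped by the error exception handler. */
static unsigned long mock_error_count;

/* Hypothetical cookie: just a snapshot of the counter. */
typedef struct {
	unsigned long saved_count;
} mock_iocookie;

/* Start a checked I/O section by remembering the current count. */
static void mock_iochk_clear(mock_iocookie *cookie)
{
	cookie->saved_count = mock_error_count;
}

/* Returns true if an error exception was counted since mock_iochk_clear(). */
static bool mock_iochk_read(const mock_iocookie *cookie)
{
	return mock_error_count != cookie->saved_count;
}
```

A driver would call the clear function, do its I/O, then call the read function; an arch with no such counter could make both no-ops and ignore the cookie.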

> Yes, they should be no-ops. save/restore interrupts would be a bad idea.

I expect that we should not do any operation that requires enabled interrupts
between iochk_clear and iochk_read. If their defaults are no-ops, device
maintainers who develop their driver on an arch where this is not implemented
should be more careful. Or is there any downside other than wasted steps?

Thanks,
H.Seto


Re: [patch ide-dev 8/9] make ide_task_ioctl() use REQ_DRIVE_TASKFILE

2005-03-01 Thread Jeff Garzik
Bartlomiej Zolnierkiewicz wrote:
Yes but it seems that you've assumed that ioctl == flagged taskfile
and fs/internal == normal taskfile which is _not_ what I aim for.
I want fully-flagged taskfile handling like flagged_taskfile() and "hot path"
simpler taskfile handling like do_rw_taskfile() (at least for now - we can
remove "hot path" later) where both can be used for fs/internal/ioctl requests
(depending on the flags).

There is no effective difference in performance between

	writeb()
	writeb()
	writeb()
	writeb()

and

	if (bit 1)
		writeb()
	if (bit 2)
		writeb()
	if (bit 3)
		writeb()
	if (bit 4)
		writeb()

The cost of a repeated bit test on the same unsigned long is _zero_. 
It's already in L1 cache.  The I/Os are slow, and adding bit tests will 
not measurably decrease performance.  (this is the reason why I do not 
object to using ioread32() and iowrite32()...  it just adds a simple test)
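
The single flag-driven path argued for above could look roughly like the userspace sketch below. The register count, the flag layout (bit i selects register i) and every mock_ name are assumptions made for illustration; they are not the IDE or libata definitions.

```c
#include <stdint.h>

#define TF_NREGS 4

/* Hypothetical taskfile: register values plus one flag bit per register. */
struct mock_taskfile {
	uint8_t regs[TF_NREGS];
	unsigned long flags;
};

static uint8_t mock_port[TF_NREGS];  /* stands in for the device registers */
static int mock_writes;              /* counts simulated writeb() calls */

static void mock_writeb(uint8_t val, int reg)
{
	mock_port[reg] = val;
	mock_writes++;
}

/* One path for every taskfile: a register is written only if its flag bit is set. */
static void tf_load_flagged(const struct mock_taskfile *tf)
{
	int i;

	for (i = 0; i < TF_NREGS; i++)
		if (tf->flags & (1UL << i))
			mock_writeb(tf->regs[i], i);
}
```

A "normal" taskfile is then just the case with all flag bits set, so both kinds of request exercise the same code.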

Plus, it is better to have a single path for all taskfiles, to ensure 
that the path is well-tested.

libata's ->tf_load() and ->tf_read() hooks should be updated to use the 
more fine-grained flags that Tejun is proposing.

Note that on SATA, this is largely irrelevant.  The functions 
ata_tf_read() and ata_tf_load() should be updated for flagged taskfiles, 
because these will be used with PATA drivers.

The hooks implemented in individual SATA drivers will not be updated. 
The reason is that SATA transmits an entire copy of the taskfile to/from 
the device all at once, in the form of a Frame Information Structure 
(FIS) -- essentially a SATA packet.

Jeff


Re: [PATCH 2.6.11-rc3 01/11] ide: task_end_request() fix

2005-03-01 Thread Jeff Garzik
Bartlomiej Zolnierkiewicz wrote:
If somebody implements SG_IO ioctl and SCSI command pass-through
from libata for IDE driver (and add possibility for discrete taskfiles), we can
just deprecate HDIO_DRIVE_TASKFILE, forget about it and some time later
remove this FPOS.
Can you explain what you mean by "add possibility for discrete taskfiles"?
Jeff


Undefined symbols in 2.6.11-rc5-mm1

2005-03-01 Thread Keenan Pepper
Hi everybody, I just joined the LKML!
Don't worry, this is not just a test message, I do actually have 
something to say. I just compiled 2.6.11-rc5-mm1 and got undefined 
symbols "match_int", "match_octal", "match_token", and "match_strdup" in 
several modules. This is using binutils 2.15 and gcc 3.4.4 from Debian.
I grepped around and found those functions in lib/parser.c, so I just 
looked at the output of "make V=1" and invoked "ld" manually, adding in 
lib/lib.a, and the modules work fine now. However, I don't know enough 
about the kernel build process to make a patch to fix this, so I'm just 
notifying people of the problem.

BTW, I just got a new hard disk and put Reiser4 on it. It works great! 
Keep up the good work guys!


O_DIRECT on 2.4 ext3

2005-03-01 Thread Junfeng Yang

Hi,

I tried to read from a regular ext3 file opened as O_DIRECT, but got the
"Invalid argument" error.  Running the same test program on a block device
succeeded.

uname -a shows
Linux *** 2.4.27-2-686-smp #1 SMP Thu Jan 20 11:02:39 JST 2005 i686
GNU/Linux

My test case is
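/* The header names were lost in the archive; these are assumed from the calls used below. */
#define _GNU_SOURCE	/* assumed: glibc needs it to expose O_DIRECT */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <assert.h>
#include <sys/types.h>

#define BLK (4096U)
main()
{
char buf[BLK * 2];
char *p = (char*)((((unsigned)buf) + (BLK-1)) & ~(BLK-1));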
int fd, l;

fprintf(stderr, "buf = %p, p = %p\n", buf, p);
if((fd=open("sbd0", O_RDONLY|O_DIRECT)) < 0) {
perror("open");
assert(0);
}
if((l=pread(fd, p, BLK, 0)) < 0) {
perror("pread");
assert(0);
}
fprintf(stderr, "pread returns %d\n", l);
close (fd);
}

Does anyone know what's going on?

Thanks,
-Junfeng



[PATCH] aoe: fix printk warning (sparc64)

2005-03-01 Thread Randy.Dunlap

aoeblk: mac_addr() returns u64, coerce to unsigned long long to printk it:
(sparc64 build warning)

drivers/block/aoe/aoeblk.c:245: warning: long long unsigned int format, u64 arg (arg 2)
drivers/block/aoe/aoeblk.c:31: warning: long long unsigned int format, u64 arg (arg 4)
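
The warning arises because, as far as I know, u64 on some 64-bit architectures is a plain unsigned long, which does not match %llx. Casting to unsigned long long at the call site works everywhere. A userspace sketch of the same idea, with mock_mac_addr() as a hypothetical stand-in for the driver's mac_addr():

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for mac_addr(); just returns a 48-bit value. */
static uint64_t mock_mac_addr(void)
{
	return 0x001122334455ULL;
}

/* The explicit cast makes the argument type match %llx on every ABI. */
static int format_mac(char *page, size_t len)
{
	return snprintf(page, len, "%012llx\n",
			(unsigned long long)mock_mac_addr());
}
```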

cross-compile results:
https://www.osdl.org/plm-cgi/plm?module=patch_info_id=4239

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>

diffstat:=
 drivers/block/aoe/aoeblk.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff -Naurp ./drivers/block/aoe/aoeblk.c~aoe_printk ./drivers/block/aoe/aoeblk.c
--- ./drivers/block/aoe/aoeblk.c~aoe_printk	2005-02-25 10:54:42.0 -0800
+++ ./drivers/block/aoe/aoeblk.c	2005-03-01 17:22:29.735503376 -0800
@@ -28,7 +28,8 @@ static ssize_t aoedisk_show_mac(struct g
 {
struct aoedev *d = disk->private_data;
 
-   return snprintf(page, PAGE_SIZE, "%012llx\n", mac_addr(d->addr));
+   return snprintf(page, PAGE_SIZE, "%012llx\n",
+   (unsigned long long)mac_addr(d->addr));
 }
 static ssize_t aoedisk_show_netif(struct gendisk * disk, char *page)
 {
@@ -241,7 +242,8 @@ aoeblk_gdalloc(void *vp)
aoedisk_add_sysfs(d);

printk(KERN_INFO "aoe: %012llx e%lu.%lu v%04x has %llu "
-   "sectors\n", mac_addr(d->addr), d->aoemajor, d->aoeminor,
+   "sectors\n", (unsigned long long)mac_addr(d->addr),
+   d->aoemajor, d->aoeminor,
d->fw_ver, (long long)d->ssize);
 }
 

---


Re: question about sockfd_lookup( )

2005-03-01 Thread MingJie Chang
I can't use sockfd_put(sock) directly.
I traced its code, which is

extern __inline__ void sockfd_put(struct socket *sock)
{
fput(sock->file);
}

so I use fput(sock->file) instead,

but it has problems too:

1) executing "ls" in ftp still blocks
2) the kernel prints "socki_lookup: socket file changed!"
3) executing "ftp localhost" after rmmod crashes

Also, why is sockfd_put needed after sockfd_lookup?
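
On the last question, my understanding is that sockfd_lookup() takes a reference on the underlying file, so each successful lookup has to be balanced by a put or the file can never be released. A userspace sketch of that pairing; the mock_ names and the bare integer refcount are hypothetical and are not the kernel's implementation:

```c
/* Hypothetical refcounted object standing in for struct file. */
struct mock_file {
	int refcount;
};

/* Like a lookup: hands back the object with one extra reference. */
static struct mock_file *mock_lookup(struct mock_file *f)
{
	f->refcount++;
	return f;
}

/* Like a put: drops the reference taken by the lookup. */
static void mock_put(struct mock_file *f)
{
	f->refcount--;
}

/* Look at the object, then restore the count; returns the count seen while held. */
static int mock_inspect(struct mock_file *f)
{
	struct mock_file *ref = mock_lookup(f);
	int seen = ref->refcount;

	mock_put(ref);
	return seen;
}
```

Without the put, every intercepted accept would leave the count one higher than it should be.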

Thanks again

MingChieh Chang
Taiwan
===

On Tue, 01 Mar 2005 08:56:19 +0100, Eric Dumazet <[EMAIL PROTECTED]> wrote:
> Hi
>
> Try adding sockfd_put(sock) ;
>
> MingJie Chang wrote:
> > Dear all,
> >
> > I want to get socket information by the sockfd while accepting,
> >
> > so I write a module to test sockfd_lookup(),
> >
> > but I got some problems when I test it.
> >
> > I hope someone can help me...
> >
> > Thank you
> >
> > following text is my code and error message
> > ===
> > === code ===
> >
> > int my_socketcall(int call,unsigned long *args)
> > {
> >int ret,err;
> >struct socket * sock;
> >
> >    ret = run_org_socket_call(call,args);   //original sys_sockcall()
> >
> >    if(call==SYS_ACCEPT&&ret>=0)
> >    {
> >       sock=sockfd_lookup(ret,&err);
> >   printk("lookup done\n");
>
> if (sock) sockfd_put(sock) ;
>
> >}
> >return ret;
> > }
>
> Eric Dumazet
>
>


Re: cciss CSMI via sysfs for 2.6

2005-03-01 Thread Christoph Hellwig
On Fri, Feb 18, 2005 at 12:05:52PM -0800, Greg KH wrote:
> On Fri, Feb 18, 2005 at 07:46:28PM +, Christoph Hellwig wrote:
> > >  /*
> > > + * sysfs stuff
> > > + * this should be moved to it's own file, maybe cciss_sysfs.h
> > > + */
> > > +
> > > +static ssize_t cciss_firmver_show(struct device *dev, char *buf)
> > > +{
> > > + ctlr_info_t *h = dev->driver_data;
> > > +return sprintf(buf,"%c%c%c%c\n", h->firm_ver[0], h->firm_ver[1],
> > > +h->firm_ver[2], h->firm_ver[3]);
> > > +}
> > 
> > > I really wish we had a common firmver release attribute in the driver
> > core, as mentioned in the fc transport class thread.  Greg?
> 
> For a device?  It seems a huge overkill to add this attribute for
> _every_ device in the system, when only a small minority can actually
> use it.  Just put it as a default scsi or transport class attribute
> instead.

It's not related to scsi or a transport at all.  I'd rather have the
notion of optional generic attributes so that every driver that
wants to publish it does so in the same way.



[PATCH] put newly registered shrinkers at the tail of the list

2005-03-01 Thread Christoph Hellwig
This way we actually shake dentries before inodes and thus mark more
inodes reclaimable once we shake them.
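
The effect on ordering can be sketched in userspace with a minimal circular list. This assumes the dentry cache registers its shrinker before the inode cache does, which I have not verified here; the mock_ names are hypothetical stand-ins for list_add() and list_add_tail().

```c
/* Minimal circular doubly linked list; not the kernel's list.h. */
struct mock_node {
	const char *name;
	struct mock_node *prev, *next;
};

static void mock_list_init(struct mock_node *head)
{
	head->prev = head->next = head;
}

static void mock_insert(struct mock_node *n, struct mock_node *prev,
			struct mock_node *next)
{
	n->prev = prev;
	n->next = next;
	prev->next = n;
	next->prev = n;
}

/* Head insertion: the newest entry is walked first. */
static void mock_list_add(struct mock_node *n, struct mock_node *head)
{
	mock_insert(n, head, head->next);
}

/* Tail insertion: entries are walked in registration order. */
static void mock_list_add_tail(struct mock_node *n, struct mock_node *head)
{
	mock_insert(n, head->prev, head);
}
```

With tail insertion a walk from the head meets the earlier registration first; with head insertion it meets the later one first.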


--- 1.240/mm/vmscan.c   2005-02-04 01:53:32 +01:00
+++ edited/mm/vmscan.c  2005-03-02 07:09:00 +01:00
@@ -137,7 +137,7 @@ struct shrinker *set_shrinker(int seeks,
shrinker->seeks = seeks;
shrinker->nr = 0;
down_write(&shrinker_rwsem);
-   list_add(&shrinker->list, &shrinker_list);
+   list_add_tail(&shrinker->list, &shrinker_list);
up_write(&shrinker_rwsem);
}
return shrinker;


Re: Complicated networking problem

2005-03-01 Thread Daniel Gryniewicz
On Wed, 2005-03-02 at 13:27 +1000, Jarne Cook wrote:
>On Tuesday 01 March 2005 12:35, you wrote:
>> On Monday 28 February 2005 21:02, [EMAIL PROTECTED] wrote:
>> > On Mon, 28 Feb 2005 14:59:31 +1000, Jarne Cook said:
>> > > They are both using dhcp to the same simple network.  That's right. 
>> > > Same network.  They both end up with gateway=192.168.0.1,
>> > > netmask=255.255.255.0. But ofcourse they do not have the same IP
>> > > addresses.
>> >
>> > I don't suppose your network people would be willing to change it thusly:
>> >
>> > wired ports:  gateway 192.168.0.1, netmask 255.255.128.0
>> > wireless: gateway 192.168.128.1, netmask 255.255.128.0
>> >
>> > Or move the wireless up to 192.168.1.1 if they think that would confuse
>> > things too much.
>> >
>> > There's a limit to how far we should bend over backwards to support
>> > stupid networking decisions. 192.168 *is* a /16, might as well use it. ;)
>> >
>> > If they won't, you're pretty much stuck with binding applications to one
>> > interface or another.
>>
>> If the goal is to primarily use wired link and seamlessly switch to wireless
>> then look into bonding driver in failover mode with wired interface as
>> primary. This way you have only one address and userspace does not notice
>> anything.
>
>Damn
>
>Having to configure the interfaces using bonding was not really the answer I 
>was expecting.
>
>I did not think linux would be that rigid.  I figured if poodoze is able to do 
>it (seamlessly mind you), surely linux (with some tinkering) would be able to 
>do it also.
>
>The goal was to have the networking on the laptop work as perfectly as 
>crapdoze does.  
>
>Perhaps I should add this topic to my list of software issues that no-one else 
>cares about. "man that list is getting big".  maybe one day I'll develop the 
>balls to get deep into the code.
>
>

Check out NetworkManager.  It will do what you want.

Daniel



Re: [Lse-tech] Re: A common layer for Accounting packages

2005-03-01 Thread Paul Jackson
Just a thought - perhaps you could see if Jay can test the performance
scaling of these changes on larger systems (8 to 64 CPUs, give or take,
small for SGI, but big for some vendors.)

Things like a global lock, for example, might be harmless on smaller
systems, but hurt big time on bigger systems.  I don't know if you have
any such constructs ... perhaps this doesn't matter.

At the very least, we need to know that performance and scaling are not
significantly impacted, on systems not using accounting, either because
it is obvious from the code, or because someone has tested it.

And if performance or scaling was impacted when accounting was enabled,
then at least we would want to know how much performance was impacted,
so that users would know what to expect when they use accounting.

> the process-creation/destruction performance on following three environment.

I think this is a good choice of what to measure, and where.  Thank-you.

> kernel was also locked up after 366th-fork() 

I have no idea what this is -- good luck finding it.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401


Re: Page fault scalability patch V18: Drop first acquisition of ptl

2005-03-01 Thread Christoph Lameter
The page fault handler attempts to use the page_table_lock only for short
time periods. It repeatedly drops and reacquires the lock. When the lock
is reacquired, checks are made if the underlying pte has changed before
replacing the pte value. These locations are a good fit for the use of
ptep_cmpxchg.
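
The compare-and-exchange idea can be sketched in userspace with C11 atomics. This is only an illustration of the pattern: mock_pte_t, mock_ptep_cmpxchg() and the failure counter are hypothetical names, not the kernel's ptep_cmpxchg or its statistics.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef unsigned long mock_pte_t;

/* Counts lost races, in the spirit of the diagnostics described in this mail. */
static unsigned long cmpxchg_failures;

/*
 * Install newval only if *ptep still holds the value read earlier.
 * On failure nothing is written; the caller simply lets the fault
 * happen again and retries with a fresh read.
 */
static bool mock_ptep_cmpxchg(_Atomic mock_pte_t *ptep,
			      mock_pte_t oldval, mock_pte_t newval)
{
	if (atomic_compare_exchange_strong(ptep, &oldval, newval))
		return true;
	cmpxchg_failures++;
	return false;
}
```

The check "has the pte changed since I read it" and the update become one atomic step, which is what lets the lock be dropped for these paths.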

The following patch removes the first acquisition of the page_table_lock
and uses atomic operations on the page table instead. A section
using atomic pte operations is begun with

page_table_atomic_start(struct mm_struct *)

and ends with

page_table_atomic_stop(struct mm_struct *)

Both of these become spin_lock(page_table_lock) and
spin_unlock(page_table_lock) if atomic page table operations are not
configured (CONFIG_ATOMIC_TABLE_OPS undefined).

The atomic operations with pte_xchg and pte_cmpxchg only work for the lowest
layer of the page table. Higher layers may also be populated in an atomic
way by defining pmd_test_and_populate() etc. The generic versions of these
functions fall back to the page_table_lock (populating higher level page
table entries is rare and therefore this is not likely to be performance
critical). For ia64 the definition of higher level atomic operations is
included.

This patch depends on the pte_cmpxchg patch to be applied first and will
only remove the first use of the page_table_lock in the page fault handler.
This will allow the following page table operations without acquiring
the page_table_lock:

1. Updating of access bits (handle_mm_faults)
2. Anonymous read faults (do_anonymous_page)

The page_table_lock is still acquired for creating a new pte for an anonymous
write fault and therefore the problems with rss that were addressed by splitting
rss into the task structure do not yet occur.

The patch also adds some diagnostic features by counting the number of cmpxchg
failures (useful for verification if this patch works right) and the number of
patches received that led to no change in the page table. Statistics may be
viewed via /proc/meminfo

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.10/mm/memory.c
===
--- linux-2.6.10.orig/mm/memory.c   2005-02-24 19:42:17.0 -0800
+++ linux-2.6.10/mm/memory.c2005-02-24 19:42:21.0 -0800
@@ -36,6 +36,8 @@
  * ([EMAIL PROTECTED])
  *
  * Aug/Sep 2004 Changed to four level page tables (Andi Kleen)
+ * Jan 2005 Scalability improvement by reducing the use and the length of time
+ * the page table lock is held (Christoph Lameter)
  */

 #include 
@@ -1275,8 +1277,8 @@ static inline void break_cow(struct vm_a
  * change only once the write actually happens. This avoids a few races,
  * and potentially makes it more efficient.
  *
- * We hold the mm semaphore and the page_table_lock on entry and exit
- * with the page_table_lock released.
+ * We hold the mm semaphore and have started atomic pte operations,
+ * exit with pte ops completed.
  */
 static int do_wp_page(struct mm_struct *mm, struct vm_area_struct * vma,
unsigned long address, pte_t *page_table, pmd_t *pmd, pte_t pte)
@@ -1294,7 +1296,7 @@ static int do_wp_page(struct mm_struct *
pte_unmap(page_table);
printk(KERN_ERR "do_wp_page: bogus page at address %08lx\n",
address);
-   spin_unlock(&mm->page_table_lock);
+   page_table_atomic_stop(mm);
return VM_FAULT_OOM;
}
old_page = pfn_to_page(pfn);
@@ -1306,22 +1308,25 @@ static int do_wp_page(struct mm_struct *
flush_cache_page(vma, address);
entry = maybe_mkwrite(pte_mkyoung(pte_mkdirty(pte)),
  vma);
-   ptep_set_access_flags(vma, address, page_table, entry, 1);
-   update_mmu_cache(vma, address, entry);
+   /*
+* If the bits are not updated then another fault
+* will be generated with another chance of updating.
+*/
+   if (ptep_cmpxchg(page_table, pte, entry))
+   update_mmu_cache(vma, address, entry);
+   else
+   inc_page_state(cmpxchg_fail_flag_reuse);
pte_unmap(page_table);
-   spin_unlock(&mm->page_table_lock);
+   page_table_atomic_stop(mm);
return VM_FAULT_MINOR;
}
}
pte_unmap(page_table);
+   page_table_atomic_stop(mm);

/*
 * Ok, we need to copy. Oh, well..
 */
-   if (!PageReserved(old_page))
-   page_cache_get(old_page);
-   spin_unlock(&mm->page_table_lock);
-
if (unlikely(anon_vma_prepare(vma)))
goto no_new_page;
if (old_page 

Page fault scalability patch V18: No page table lock in do_anonymous_page

2005-03-01 Thread Christoph Lameter
Do not use the page_table_lock in do_anonymous_page. This significantly
increases the parallelism in the page fault handler on SMP systems. The patch
also modifies the definitions of the *_mm_counter macros so that rss and
anon_rss become atomic when CONFIG_ATOMIC_TABLE_OPS is set.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.10/mm/memory.c
===
--- linux-2.6.10.orig/mm/memory.c   2005-02-24 19:42:21.0 -0800
+++ linux-2.6.10/mm/memory.c2005-02-24 19:42:25.0 -0800
@@ -1832,12 +1832,12 @@ do_anonymous_page(struct mm_struct *mm,
 vma->vm_page_prot)),
  vma);

-   spin_lock(&mm->page_table_lock);
+   page_table_atomic_start(mm);

if (!ptep_cmpxchg(page_table, orig_entry, entry)) {
pte_unmap(page_table);
page_cache_release(page);
-   spin_unlock(&mm->page_table_lock);
+   page_table_atomic_stop(mm);
inc_page_state(cmpxchg_fail_anon_write);
return VM_FAULT_MINOR;
}
@@ -1855,7 +1855,7 @@ do_anonymous_page(struct mm_struct *mm,

update_mmu_cache(vma, addr, entry);
pte_unmap(page_table);
-   spin_unlock(&mm->page_table_lock);
+   page_table_atomic_stop(mm);

return VM_FAULT_MINOR;
 }
Index: linux-2.6.10/include/linux/sched.h
===
--- linux-2.6.10.orig/include/linux/sched.h 2005-02-24 19:42:17.0 -0800
+++ linux-2.6.10/include/linux/sched.h  2005-02-24 19:42:25.0 -0800
@@ -203,10 +203,26 @@ arch_get_unmapped_area_topdown(struct fi
 extern void arch_unmap_area(struct vm_area_struct *area);
 extern void arch_unmap_area_topdown(struct vm_area_struct *area);

+#ifdef CONFIG_ATOMIC_TABLE_OPS
+/*
+ * Atomic page table operations require that the counters are also
+ * incremented atomically
+*/
+#define set_mm_counter(mm, member, value) atomic_set(&(mm)->member, value)
+#define get_mm_counter(mm, member) ((unsigned long)atomic_read(&(mm)->member))
+#define update_mm_counter(mm, member, value) atomic_add(value, &(mm)->member)
+#define MM_COUNTER_T atomic_t
+
+#else
+/*
+ * No atomic page table operations. Counters are protected by
+ * the page table lock
+ */
 #define set_mm_counter(mm, member, value) (mm)->member = (value)
 #define get_mm_counter(mm, member) ((mm)->member)
 #define update_mm_counter(mm, member, value) (mm)->member += (value)
 #define MM_COUNTER_T unsigned long
+#endif

 struct mm_struct {
struct vm_area_struct * mmap;   /* list of VMAs */

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Page fault scalability patch V18: abstract rss counter ops

2005-03-01 Thread Christoph Lameter
This patch extracts all the operations on rss into definitions in
include/linux/sched.h. All rss operations are performed through
the following three macros:

get_mm_counter(mm, member)  -> Obtain the value of a counter
set_mm_counter(mm, member, value)   -> Set the value of a counter
update_mm_counter(mm, member, value)-> Add a value to a counter

The simple definitions provided in this patch result in no change to
the generated code.

With this patch it becomes easier to add new counters and it is possible
to redefine the method of counter handling (e.g. the page fault scalability
patches may want to use atomic operations or split rss).
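
To make the indirection concrete, here is a small userspace sketch of the three macros with two interchangeable definitions. The names struct mm_sketch, account_new_page and the SKETCH_ATOMIC_COUNTERS switch are assumptions made for this example, and C11 atomics stand in for the kernel's atomic_t. The point is only that callers stay the same when the counter type changes.

```c
#include <assert.h>
#include <stdatomic.h>

#ifdef SKETCH_ATOMIC_COUNTERS
/* Variant for lockless page table updates: counters are atomic. */
#define MM_COUNTER_T atomic_long
#define set_mm_counter(mm, member, value)    atomic_store(&(mm)->member, (value))
#define get_mm_counter(mm, member)           ((unsigned long)atomic_load(&(mm)->member))
#define update_mm_counter(mm, member, value) atomic_fetch_add(&(mm)->member, (value))
#else
/* Simple variant: plain integers, protected by a lock held by the caller. */
#define MM_COUNTER_T unsigned long
#define set_mm_counter(mm, member, value)    ((mm)->member = (value))
#define get_mm_counter(mm, member)           ((mm)->member)
#define update_mm_counter(mm, member, value) ((mm)->member += (value))
#endif

/* Hypothetical stand-in for the counter fields of struct mm_struct. */
struct mm_sketch {
    MM_COUNTER_T rss;
    MM_COUNTER_T anon_rss;
};

/* Callers only use the macros, in the style of copy_one_pte() in the diff. */
static void account_new_page(struct mm_sketch *mm, int is_anon)
{
    assert(mm);
    update_mm_counter(mm, rss, 1);
    if (is_anon)
        update_mm_counter(mm, anon_rss, 1);
}
```

Building the same file with -DSKETCH_ATOMIC_COUNTERS switches the counter representation without touching account_new_page.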

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.10/include/linux/sched.h
===
--- linux-2.6.10.orig/include/linux/sched.h 2005-02-24 19:41:49.0 -0800
+++ linux-2.6.10/include/linux/sched.h  2005-02-24 19:42:17.0 -0800
@@ -203,6 +203,10 @@ arch_get_unmapped_area_topdown(struct fi
 extern void arch_unmap_area(struct vm_area_struct *area);
 extern void arch_unmap_area_topdown(struct vm_area_struct *area);

+#define set_mm_counter(mm, member, value) (mm)->member = (value)
+#define get_mm_counter(mm, member) ((mm)->member)
+#define update_mm_counter(mm, member, value) (mm)->member += (value)
+#define MM_COUNTER_T unsigned long

 struct mm_struct {
struct vm_area_struct * mmap;   /* list of VMAs */
@@ -219,7 +223,7 @@ struct mm_struct {
atomic_t mm_count;  /* How many references to "struct mm_struct" (users count as 1) */
int map_count;  /* number of VMAs */
struct rw_semaphore mmap_sem;
-   spinlock_t page_table_lock; /* Protects page tables, mm->rss, mm->anon_rss */
+   spinlock_t page_table_lock; /* Protects page tables and some counters */

struct list_head mmlist;/* List of maybe swapped mm's.  These are globally strung
 * together off init_mm.mmlist, and are protected
@@ -229,9 +233,13 @@ struct mm_struct {
unsigned long start_code, end_code, start_data, end_data;
unsigned long start_brk, brk, start_stack;
unsigned long arg_start, arg_end, env_start, env_end;
-   unsigned long rss, anon_rss, total_vm, locked_vm, shared_vm;
+   unsigned long total_vm, locked_vm, shared_vm;
unsigned long exec_vm, stack_vm, reserved_vm, def_flags, nr_ptes;

+   /* Special counters protected by the page_table_lock */
+   MM_COUNTER_T rss;
+   MM_COUNTER_T anon_rss;
+
unsigned long saved_auxv[42]; /* for /proc/PID/auxv */

unsigned dumpable:1;
Index: linux-2.6.10/mm/memory.c
===
--- linux-2.6.10.orig/mm/memory.c   2005-02-24 19:42:12.0 -0800
+++ linux-2.6.10/mm/memory.c2005-02-24 19:42:17.0 -0800
@@ -313,9 +313,9 @@ copy_one_pte(struct mm_struct *dst_mm,
pte = pte_mkclean(pte);
pte = pte_mkold(pte);
get_page(page);
-   dst_mm->rss++;
+   update_mm_counter(dst_mm, rss, 1);
if (PageAnon(page))
-   dst_mm->anon_rss++;
+   update_mm_counter(dst_mm, anon_rss, 1);
set_pte(dst_pte, pte);
page_dup_rmap(page);
 }
@@ -517,7 +517,7 @@ static void zap_pte_range(struct mmu_gat
if (pte_dirty(pte))
set_page_dirty(page);
if (PageAnon(page))
-   tlb->mm->anon_rss--;
+   update_mm_counter(tlb->mm, anon_rss, -1);
else if (pte_young(pte))
mark_page_accessed(page);
tlb->freed++;
@@ -1340,13 +1340,14 @@ static int do_wp_page(struct mm_struct *
spin_lock(&mm->page_table_lock);
page_table = pte_offset_map(pmd, address);
if (likely(pte_same(*page_table, pte))) {
-   if (PageAnon(old_page))
-   mm->anon_rss--;
+   if (PageAnon(old_page))
+   update_mm_counter(mm, anon_rss, -1);
if (PageReserved(old_page)) {
-   ++mm->rss;
+   update_mm_counter(mm, rss, 1);
acct_update_integrals();
update_mem_hiwater();
} else
+
page_remove_rmap(old_page);
break_cow(vma, new_page, address, page_table);
lru_cache_add_active(new_page);
@@ -1750,7 +1751,7 @@ static int do_swap_page(struct mm_struct
if (vm_swap_full())
remove_exclusive_swap_page(page);

-   mm->rss++;
+   update_mm_counter(mm, rss, 1);
acct_update_integrals();
update_mem_hiwater();

@@ -1817,7 +1818,7 

Re: Page fault scalability patch V18: atomic pte ops, pte_cmpxchg and pte_xchg

2005-03-01 Thread Christoph Lameter
The current way of updating ptes in the Linux vm includes first clearing
a pte before setting it to another value. The clearing is performed while
holding the page_table_lock to ensure that the entry will not be modified
by the CPU directly (clearing the pte clears the present bit),
by an arch specific interrupt handler or another page fault handler
running on another CPU. This approach is necessary for some
architectures that cannot perform atomic updates of page table entries.

If a page table entry is cleared then a second CPU may generate a page fault
for that entry. The fault handler on the second CPU will then attempt to
acquire the page_table_lock and wait until the first CPU has completed
updating the page table entry. The fault handler on the second CPU will then
discover that everything is ok and simply do nothing (apart from incrementing
the counters for a minor fault and marking the page again as accessed).

However, most architectures actually support atomic operations on page
table entries. The use of atomic operations on page table entries would
allow the update of a page table entry in a single atomic operation instead
of writing to the page table entry twice. There would also be no danger of
generating a spurious page fault on other CPUs.

The following patch introduces two new atomic operations ptep_xchg and
ptep_cmpxchg that may be provided by an architecture. The fallback in
include/asm-generic/pgtable.h is to simulate both operations through the
existing ptep_get_and_clear function. So there is essentially no change if
atomic operations on ptes have not been defined. Architectures that do
not support atomic operations on ptes may continue to use the clearing of
a pte for locking type purposes.
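
The difference between the two update styles can be sketched in userspace as follows. Everything named sketch_* and the pteval_t and pte_slot_t types are assumptions for this example; C11 atomics stand in for the architecture primitives, and the fallback is modeled on the description above (emulation through get-and-clear), not copied from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef unsigned long pteval_t;        /* hypothetical raw pte value */
typedef _Atomic pteval_t pte_slot_t;   /* hypothetical page table slot */

/* Atomic flavour: a single compare-and-swap, the slot is never transiently 0. */
static bool sketch_ptep_cmpxchg(pte_slot_t *ptep, pteval_t oldval, pteval_t newval)
{
    assert(ptep);
    return atomic_compare_exchange_strong(ptep, &oldval, newval);
}

/* Atomic flavour: swap in the new value and return the previous one. */
static pteval_t sketch_ptep_xchg(pte_slot_t *ptep, pteval_t newval)
{
    assert(ptep);
    return atomic_exchange(ptep, newval);
}

/* Stand-in for ptep_get_and_clear(): fetch the old value and zero the slot. */
static pteval_t sketch_ptep_get_and_clear(pte_slot_t *ptep)
{
    return atomic_exchange(ptep, 0);
}

/*
 * Fallback flavour: emulate cmpxchg by clearing first. The slot is zero
 * between the two steps, so the caller must hold the page_table_lock.
 */
static bool sketch_ptep_cmpxchg_fallback(pte_slot_t *ptep, pteval_t oldval,
                                         pteval_t newval)
{
    pteval_t cur = sketch_ptep_get_and_clear(ptep);
    bool same = (cur == oldval);

    /* Install the new value on a match, otherwise restore what was there. */
    atomic_store(ptep, same ? newval : cur);
    return same;
}
```

A failed sketch_ptep_cmpxchg leaves the slot untouched, which is why a fault handler can simply return and let the access fault again.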

Atomic operations may be enabled in the kernel configuration on
i386, ia64 and x86_64 if a suitable CPU is configured in SMP mode.
Generic atomic definitions for ptep_xchg and ptep_cmpxchg
have been provided based on the existing xchg() and cmpxchg() functions
that already work atomically on many platforms. It is very
easy to implement this for any architecture by adding the appropriate
definitions to arch/xx/Kconfig.

The provided generic atomic functions may be overridden as usual by defining
the appropriate __HAVE_ARCH_xxx constant and providing an implementation.

My aim to reduce the use of the page_table_lock in the page fault handler
relies on a pte never being clear if the pte is in use, even when the
page_table_lock is not held. Clearing a pte before setting it to another
value could result in a situation in which a fault generated by
another CPU installs a pte which is then immediately overwritten by
the first CPU setting the pte to a valid value again. This patch is
important for future work on reducing the use of spinlocks in the vm.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.10/mm/rmap.c
===
--- linux-2.6.10.orig/mm/rmap.c 2005-02-24 19:41:50.0 -0800
+++ linux-2.6.10/mm/rmap.c  2005-02-24 19:42:12.0 -0800
@@ -575,11 +575,6 @@ static int try_to_unmap_one(struct page

/* Nuke the page table entry. */
flush_cache_page(vma, address);
-   pteval = ptep_clear_flush(vma, address, pte);
-
-   /* Move the dirty bit to the physical page now the pte is gone. */
-   if (pte_dirty(pteval))
-   set_page_dirty(page);

if (PageAnon(page)) {
swp_entry_t entry = { .val = page->private };
@@ -594,11 +589,15 @@ static int try_to_unmap_one(struct page
list_add(&mm->mmlist, &init_mm.mmlist);
spin_unlock(&mmlist_lock);
}
-   set_pte(pte, swp_entry_to_pte(entry));
+   pteval = ptep_xchg_flush(vma, address, pte, swp_entry_to_pte(entry));
BUG_ON(pte_file(*pte));
mm->anon_rss--;
-   }
+   } else
+   pteval = ptep_clear_flush(vma, address, pte);

+   /* Move the dirty bit to the physical page now that the pte is gone. */
+   if (pte_dirty(pteval))
+   set_page_dirty(page);
mm->rss--;
acct_update_integrals();
page_remove_rmap(page);
@@ -691,15 +690,15 @@ static void try_to_unmap_cluster(unsigne
if (ptep_clear_flush_young(vma, address, pte))
continue;

-   /* Nuke the page table entry. */
flush_cache_page(vma, address);
-   pteval = ptep_clear_flush(vma, address, pte);

/* If nonlinear, store the file page offset in the pte. */
if (page->index != linear_page_index(vma, address))
-   set_pte(pte, pgoff_to_pte(page->index));
+   pteval = ptep_xchg_flush(vma, address, pte, pgoff_to_pte(page->index));
+   else
+   pteval = ptep_clear_flush(vma, address, pte);

-   /* Move the 

Page fault scalability patch V18: Overview

2005-03-01 Thread Christoph Lameter
Is there any chance that this patchset could go into mm now? This has been
discussed since last August.

Changelog:

V17->V18 Rediff against 2.6.11-rc5-bk4
V16->V17 Do not increment page_count in do_wp_page. Performance data
posted.
V15->V16 of this patch: Redesign to allow full fallback
for architectures that do not support atomic operations.

An introduction to what this patch does and a patch archive can be found on
http://oss.sgi.com/projects/page_fault_performance. The archive also has the
result of various performance tests (LMBench, Microbenchmark and
kernel compiles).

The basic approach in this patchset is the same as used in SGI's 2.4.X
based kernels which have been in production use in ProPack 3 for a long time.

The patchset is composed of 4 patches (and was tested against 2.6.11-rc5-bk4):

1/4: ptep_cmpxchg and ptep_xchg to avoid intermittent zeroing of ptes

The current way of synchronizing with the CPU or arch specific
interrupts updating page table entries is to first set a pte
to zero before writing a new value. This patch uses ptep_xchg
and ptep_cmpxchg to avoid writing the zero for certain
configurations.

The patch introduces CONFIG_ATOMIC_TABLE_OPS that may be
enabled as an experimental feature during kernel configuration
if the hardware is able to support atomic operations and if
an SMP kernel is being configured. A Kconfig update for i386,
x86_64 and ia64 has been provided. On i386 this option is
restricted to CPUs better than a 486 and non-PAE mode (that
way all the cmpxchg issues on old i386 CPUs and the problems
with 64bit atomic operations on recent i386 CPUs are avoided).

If CONFIG_ATOMIC_TABLE_OPS is not set then ptep_xchg and
ptep_cmpxchg are realized by falling back to clearing a pte
before updating it.

The patch does not change the use of mm->page_table_lock and
the only performance improvement is the replacement of
xchg-with-zero-and-then-write-new-pte-value with an xchg with
the new value for SMP on some architectures if
CONFIG_ATOMIC_TABLE_OPS is configured. It should not do anything
major to VM operations.

2/4: Macros for mm counter manipulation

There are various approaches to handling mm counters if the
page_table_lock is no longer acquired. This patch defines
macros in include/linux/sched.h to handle these counters and
makes sure that these macros are used throughout the kernel
to access and manipulate rss and anon_rss. There should be
no change to the generated code as a result of this patch.

3/4: Drop the first use of the page_table_lock in handle_mm_fault

The patch introduces two new functions:

page_table_atomic_start(mm), page_table_atomic_stop(mm)

that fall back to the use of the page_table_lock if
CONFIG_ATOMIC_TABLE_OPS is not defined.

If CONFIG_ATOMIC_TABLE_OPS is defined those functions may
be used to prep the CPU for atomic table ops (i386 in PAE mode
may f.e. get the MMX register ready for 64bit atomic ops) but
are simply empty by default.

Two operations may then be performed on the page table without
acquiring the page table lock:

a) updating access bits in a pte
b) anonymous read faults installing a mapping to the zero page.

All counters are still protected with the page_table_lock thus
avoiding any issues there.

Some additional statistics are added to /proc/meminfo. They also
count spurious faults with no effect. There is a surprisingly
high number of those on ia64 (used to populate the cpu caches
with the pte??)

4/4: Drop the use of the page_table_lock in do_anonymous_page

The second acquisition of the page_table_lock is removed
from do_anonymous_page, so that an anonymous write fault
can be handled without the page_table_lock.

The macros for manipulating rss and anon_rss in include/linux/sched.h
are changed if CONFIG_ATOMIC_TABLE_OPS is set to use atomic
operations for rss and anon_rss (safest solution for now, other
solutions may easily be implemented by changing those macros).

This patch typically yields significant increases in page fault
performance for threaded applications on SMP systems.




Re: Complicated networking problem

2005-03-01 Thread Kyle Moffett
On Mar 01, 2005, at 22:27, Jarne Cook wrote:
> Damn
>
> Having to configure the interfaces using bonding was not really the answer
> I was expecting.
>
> I did not think linux would be that rigid.  I figured if poodoze is able to
> do it (seamlessly mind you), surely linux (with some tinkering) would be
> able to do it also.
>
> The goal was to have the networking on the laptop work as perfectly as
> crapdoze does.
>
> Perhaps I should add this topic to my list of software issues that no-one
> else cares about. "man that list is getting big".  maybe one day I'll
> develop the balls to get deep into the code.

Well, what exactly is the desired behavior for you?  If you have two network
interfaces to the same local network, the default config will pick a random
one (they're both equal-cost unless you tell it otherwise) and send ARPs and
everything else through that one interface.  If you take it down, it may
require a minute or so to update the rest of the network to the new hardware
address, but eventually they will figure it out.  I suppose if that is the
expected config, you could tell the box to send out a gratuitous ARP packet
when you reconfigure interfaces, but that's a userspace issue in any case.

As far as networking is concerned, a subnet is an atomic networking unit.
Everything on it is considered directly and equally attached to everything
else, unless informed otherwise via a switch protocol.  Any system that
doesn't follow that rule is broken.

Cheers,
Kyle Moffett
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/U d- s++: a18 C>$ UB/L/X/*(+)>$ P+++()>$
L(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b(++) DI+ D+ G e->$ h!*()>++$ r !y?(-)
------END GEEK CODE BLOCK------



Re: Complicated networking problem

2005-03-01 Thread Jarne Cook
On Tuesday 01 March 2005 12:35, you wrote:
> On Monday 28 February 2005 21:02, [EMAIL PROTECTED] wrote:
> > On Mon, 28 Feb 2005 14:59:31 +1000, Jarne Cook said:
> > > They are both using dhcp to the same simple network.  That's right. 
> > > Same network.  They both end up with gateway=192.168.0.1,
> > > netmask=255.255.255.0. But ofcourse they do not have the same IP
> > > addresses.
> >
> > I don't suppose your network people would be willing to change it thusly:
> >
> > wired ports:  gateway 192.168.0.1, netmask 255.255.128.0
> > wireless: gateway 192.168.128.1, netmask 255.255.128.0
> >
> > Or move the wireless up to 192.168.1.1 if they think that would confuse
> > things too much.
> >
> > There's a limit to how far we should bend over backwards to support
> > stupid networking decisions. 192.168 *is* a /16, might as well use it. ;)
> >
> > If they won't, you're pretty much stuck with binding applications to one
> > interface or another.
>
> If the goal is to primarily use wired link and seamlessly switch to wireless
> then look into bonding driver in failover mode with wired interface as
> primary. This way you have only one address and userspace does not notice
> anything.

Damn

Having to configure the interfaces using bonding was not really the answer I 
was expecting.

I did not think linux would be that rigid.  I figured if poodoze is able to do 
it (seamlessly mind you), surely linux (with some tinkering) would be able to 
do it also.

The goal was to have the networking on the laptop work as perfectly as 
crapdoze does.  

Perhaps I should add this topic to my list of software issues that no-one else 
cares about. "man that list is getting big".  maybe one day I'll develop the 
balls to get deep into the code.


-- 
Jarne Cook <[EMAIL PROTECTED]>
Siliconriver.com.au


Re: Network speed Linux-2.6.10

2005-03-01 Thread Paul Dickson
On Wed, 02 Mar 2005 01:02:50 +, Baruch Even wrote:

> > Might this be related to the broken BicTCP implementations in the 2.6.6+
> > kernels?  A fix was added around 2.6.11-rc3 or 4.
> 
> Unlikely, the problem with BIC would have shown itself only at high 
> speeds over long latency links, not over a lan connection.

I only mentioned the possibility because I saw the same profile given by
the PDF (the link was mentioned in the patch) while downloading gnoppix
via my cable modem.  The oscillations of speed varied from 40K to 500+K.
The average ended up around 270K.  (I was using wget for the download).

-Paul



Re: [PATCH/RFC] I/O-check interface for driver's error handling

2005-03-01 Thread Hidetoshi Seto
Jesse Barnes wrote:
> This was my thought too last time we had this discussion.  A completely
> asynchronous call is probably needed in addition to Hidetoshi's proposed API,
> since as you point out, the driver may not be running when an error occurs
> (e.g. in the case of a DMA error or more general bus problem).  The async
> ->error callback could do a total reset of the card, or something along those
> lines as Jeff suggests, while the inline ioerr_clear/ioerr_check API could
> potentially deal with errors as they happen (probably in the case of PIO
> related errors), when the additional context may allow us to be smarter about
> recovery.

Depending on the bridge implementation, special error handling for PCI-X
would be available in the case of a DMA error.

The PCI-X Command register has an Uncorrectable Data Error Recovery Enable
bit that avoids asserting SERR on error. Some bridges generate poisoned data
and pass it to the destination instead of asserting an error or passing
broken data.

The device driver would be interrupted on completion of the DMA, and would
check the status register of the controlling device to find an error that
occurred during the DMA. If there was an error, the driver could attempt to
recover from it.

I don't know whether this is actually possible or not, nor whether there are
upcoming drivers implementing such special handling.

Still, when and how we should call drivers to do device-specific stuff is
one of the problems. My API would at least provide "a chance" that the
driver can define.

Thanks,
H.Seto


Re: [PATCH] explicitly bind idle tasks

2005-03-01 Thread Zwane Mwaikambo
On Tue, 1 Mar 2005, Nathan Lynch wrote:

> On Sun, Feb 27, 2005 at 02:49:28PM -0800, Andrew Morton wrote:
> > Benjamin Herrenschmidt <[EMAIL PROTECTED]> wrote:
> > >
> > > > -   if (cpu_is_offline(smp_processor_id()) &&
> > >  > +  if (cpu_is_offline(_smp_processor_id()) &&
> > >  >system_state == SYSTEM_RUNNING)
> > >  >cpu_die();
> > >  >}
> > >  > _
> > > 
> > >  This is the idle loop. Is that ever supposed to be preempted ?
> > 
> > Nope, it's a false positive.  We had to do the same in x86's idle loop and
> > probably others will hit it.
> 
> Perhaps I'm missing something, but is there any reason we can't do
> the following?  I've tested it on ppc64, doesn't seem to break anything.
> 
> With hotplug cpu and preempt, we tend to see smp_processor_id warnings
> from idle loop code because it's always checking whether its cpu has
> gone offline.  Replacing every use of smp_processor_id with
> _smp_processor_id in all idle loop code is one solution; another way
> is explicitly binding idle threads to their cpus (the smp_processor_id
> warning does not fire if the caller is bound only to the calling cpu).
> This has the (admittedly slight) advantage of letting us know if an
> idle thread ever runs on the wrong cpu.

Makes sense to me, for some reason i thought the smp_processor_id() 
function did a cpu_rq->idle check of some sort.

Thanks,
Zwane



Re: [linux-pm] [PATCH] Custom power states for non-ACPI systems

2005-03-01 Thread Benjamin Herrenschmidt
On Tue, 2005-03-01 at 18:03 -0800, Todd Poynor wrote:
> Advertise custom sets of system power states for non-ACPI systems.
> Currently, /sys/power/state shows and accepts a static set of choices
> that are not necessarily meaningful on all platforms (for example,
> suspend-to-disk is an option even on diskless embedded systems, and the
> meaning of standby vs. suspend-to-mem is not well-defined on
> non-ACPI-systems).  This patch allows the platform to register power
> states with meaningful names that correspond to the platform's
> conventions (for example, "big sleep" and "deep sleep" on TI OMAP), and
> only those states that make sense for the platform.
> .../...

Note that I'd like to rework the whole notion of power states
ultimately. Devices themselves need custom state if we want anything
sane other than global system wide suspend.

Ben.




Re: [PATCH] Custom power states for non-ACPI systems

2005-03-01 Thread Todd Poynor
An example of custom power states for the TI OMAP family.
/sys/power/state supports a state named "deepsleep", which corresponds
to the platform state actually entered by the present-day system suspend
handler.  It no longer offers the option of "disk" suspend which would
not normally be available in an OMAP-based system, nor does it offer the
choices "standby" or "mem", which are currently somewhat arbitrarily
mapped to actual platform power states on OMAPs.  In the future the OMAP
could be extended to offer the choice of "big sleep" as well, another
platform-specific low-power mode that falls under the general category
of suspend-to-mem, once it is feasible to no longer use the same set of
system suspend state values for all platforms and drivers (as mentioned
in the base note).
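
For illustration, matching a name written by the user against such a platform table might look like the userspace sketch below. The struct layout mirrors the two fields used in the diff that follows (.name and .state), but struct sketch_suspend_method, the SKETCH_SUSPEND_* values and sketch_lookup_state are hypothetical names for this example, not the interface from the base patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the generic suspend state values. */
enum sketch_suspend_state {
    SKETCH_SUSPEND_ON = 0,
    SKETCH_SUSPEND_STANDBY,
    SKETCH_SUSPEND_MEM,
    SKETCH_SUSPEND_DISK,
};

/* One advertised power state: platform-specific name plus generic state. */
struct sketch_suspend_method {
    const char *name;
    int state;
};

/* Table terminated by an entry with a NULL name, as in the OMAP example. */
static const struct sketch_suspend_method omap_methods[] = {
    { .name = "deepsleep", .state = SKETCH_SUSPEND_MEM },
    { .name = NULL },
};

/*
 * Return the state registered for the len-byte string in buf,
 * or -1 if the platform does not advertise that name.
 */
static int sketch_lookup_state(const struct sketch_suspend_method *m,
                               const char *buf, size_t len)
{
    assert(m && buf);
    for (; m->name; m++) {
        /* Require an exact match so that "deep" does not select "deepsleep". */
        if (strlen(m->name) == len && strncmp(buf, m->name, len) == 0)
            return m->state;
    }
    return -1;
}
```

With a table like this, names such as "disk" are rejected on a platform that never registered them.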

Index: linux-2.6.10/arch/arm/mach-omap/pm.c
===
--- linux-2.6.10.orig/arch/arm/mach-omap/pm.c   2005-03-02 01:10:27.0 +
+++ linux-2.6.10/arch/arm/mach-omap/pm.c 2005-03-02 01:13:41.0 +
@@ -576,8 +576,20 @@
 }
 
 
+static struct pm_suspend_method omap_pm_suspend_methods[] = {
+   {
+   .name = "deepsleep",
+   .state = PM_SUSPEND_MEM,
+   },
+   {
+   .name = NULL,
+   },
+};
+
+
 struct pm_ops omap_pm_ops ={
.pm_disk_mode = 0,
+   .pm_suspend_methods = omap_pm_suspend_methods,
 .prepare= omap_pm_prepare,
 .enter  = omap_pm_enter,
 .finish = omap_pm_finish,


[SATA] libata-dev queue updated

2005-03-01 Thread Jeff Garzik
A minor update, mostly to update to the latest kernel.
BK users:

bk pull bk://gkernel.bkbits.net/libata-dev-2.6

Patch:
http://www.kernel.org/pub/linux/kernel/people/jgarzik/libata/2.6.11-rc5-bk4-libata-dev1.patch.bz2

This will update the following files:

 drivers/scsi/Kconfig |   18 
 drivers/scsi/Makefile|2 
 drivers/scsi/ahci.c  |  107 -
 drivers/scsi/ata_adma.c  |  778 +++
 drivers/scsi/libata-core.c   |  312 +
 drivers/scsi/libata-scsi.c   |  701 --
 drivers/scsi/libata.h|6 
 drivers/scsi/pata_pdc2027x.c |  771 ++
 drivers/scsi/sata_promise.c  |   84 
 drivers/scsi/sata_qstor.c|   20 -
 drivers/scsi/sata_vsc.c  |4 
 include/linux/ata.h  |   15 
 include/linux/libata.h   |   10 
 include/scsi/scsi.h  |3 
 14 files changed, 2531 insertions(+), 300 deletions(-)

through these ChangeSets:

:
  o [libata scsi] support 12-byte passthru CDB
  o [libata scsi] passthru CDB check condition processing
  o T10/04-262 ATA pass thru - patch

:
  o [libata sata_promise] support PATA ports on SATA controllers

:
  o drivers/scsi/ahci: Use the DMA_{64,32}BIT_MASK constants
  o drivers/scsi/sata_vsc: Use the DMA_{64,32}BIT_MASK constants

Adam J. Richter:
  o ata_pci_remove_one used freed memory

Albert Lee:
  o [libata] use init-device-params ATA command where needed
  o [libata] ata_scsi_verify_xlat() fix
  o pdc2027x timing register fix for 100MHz
  o [libata] CHS support: add CHS support to ata_scsi_verify_xlat(), 
ata_scsi_rw_xlat() and ata_scsiop_read_cap().
  o [libata] CHS support: reorganize read/write translation in 
ata_scsi_rw_xlat()
  o [libata] CHS support: rename vars (s/sector/block/) in 
ata_scsi_verify_xlat()
  o [libata] CHS support: detect C/H/S at IDENTIFY DEVICE time
  o [libata] CHS support: add definitions to headers
  o pdc2027x timing register bug fix
  o [libata pdc2027x] fix incorrect pio and mwdma masks
  o [libata pdc2027x] remove quirks and ROM enable
  o [libata] add driver for Promise PATA 2027x

Brad Campbell:
  o libata basic detection and errata for PATA->SATA bridges

Jeff Garzik:
  o [libata ahci] Print out port id on error messages
  o [libata ahci] support PCI MSI interrupt vector
  o [libata adma] Add init code, fix CPB submission code
  o [libata ahci] finish ATAPI support
  o [libata adma] trivial whitespace cleanup
  o [libata dma] fix DMA mode config; add some more initialization code
  o [libata adma] add support for configuring PIO/DMA modes
  o [libata] turn on ATAPI support
  o [libata sata_promise] merge Tobias Lorenz' pdc20619 patch, part 2
  o [libata] small cleanups
  o [libata] remove unused execute-device-diagnostic reset method
  o [libata] add new driver ata_adma
  o [libata pdc2027x] update for upstream struct device conversion
  o [libata sata_promise] fix merge bugs
  o [libata] fix build breakage
  o [libata] fix SATA->PATA bridge detect compile breakage
  o [libata] fix printk warning

John W. Linville:
  o libata: update ATA pass thru opcodes
  o libata: minor style changes in ata_scsi_pass_thru
  o libata: filter SET_FEATURES - XFER MODE from ATA pass thru
  o libata: sync SMART ioctls with ATA pass thru spec (T10/04-262r7)
  o libata: fix command queue leak when xlat_func fails
  o libata: SMART support via ATA pass-thru

Mark Lord:
  o [libata qstor] minor update per LKML comments

Tobias Lorenz:
  o [libata sata_promise] pdc20619 (PATA) support
  o libata-scsi: get-identity ioctl support



Re: [PATCH/RFC] I/O-check interface for driver's error handling

2005-03-01 Thread Hidetoshi Seto
Matthew Wilcox wrote:
> I think what Jeff meant was "this new API handles none of this".
> And that's true, it doesn't handle DMA errors.  But I think that's just
> something that hasn't been written/designed yet.

Yes, this API just supports drivers wanting to be more RAS-aware.
It would be good if the implementation could be split into two parts:
 - arch-specific part
   Capability would depend on the arch; it can do only generic things and
   cannot be device specific. Device/bus isolation would be possible (with
   the help of hotplug and so on), but re-enabling them would not be easy.
 - generic part
   Capability would depend on drivers, and should be more device specific.
How to divide and connect them is still under discussion and consideration.

> So how should we handle it?  Obviously the driver may not be executing
> when a PCI parity error occurs, so we probably get to find out about
> this through some architecture-specific whole-system error, let's call
> it an MCA.
> The MCA handler has to go and figure out what the hell just happened
> (was it a DIMM error, PCI bus error, etc).  OK, fine, it finds that it
> was an error on PCI bus 73.  At this point, I think the architecture
> error handler needs to call into the PCI subsystem and say "Hey, there
> was an error, you deal with it".
> If we're lucky, we get all the information that allows us to figure
> out which device it was (eg a destination address that matches a BAR),
> then we could have a ->error method in the pci_driver that handles it.
> If there's no ->error method, at least call ->remove so one device only
> takes itself down.
> Does this make sense?

Note that there is a difficulty here: the MCA handler on some archs runs in
a special context, the MCA environment. In other words, since some MCA
handlers are called from a non-maskable interrupt (e.g. NMI), it is difficult
to call a driver's callback that uses protected kernel locks from MCA context.
Therefore all the MCA handler can do is indicate that an error occurred,
through something like a status flag that drivers can refer to. After a
possible delay, we would then be able to call the callbacks.
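
The flag-then-callback idea above can be sketched in plain C. This is only
a user-space model under stated assumptions: struct io_dev, mca_note_error()
and io_check_deferred() are hypothetical names, not an existing kernel
interface, and C11 atomics stand in for whatever lock-free primitive the
arch code would really use.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-device record; none of these names come from the kernel. */
struct io_dev {
	atomic_int error_pending;            /* set from the MCA/NMI path */
	void (*error)(struct io_dev *dev);   /* optional recovery hook */
	void (*remove)(struct io_dev *dev);  /* fallback: take the device down */
	int recovered;                       /* demo counters only */
	int removed;
};

/* MCA context: no locks and no callbacks, only a lock-free flag store. */
void mca_note_error(struct io_dev *dev)
{
	atomic_store(&dev->error_pending, 1);
}

/*
 * Ordinary context, some time later: consume the flag, then call back.
 * Prefers ->error and falls back to ->remove when no ->error is set.
 * Returns 1 if an error had been recorded, 0 otherwise.
 */
int io_check_deferred(struct io_dev *dev)
{
	if (!atomic_exchange(&dev->error_pending, 0))
		return 0;
	if (dev->error)
		dev->error(dev);
	else if (dev->remove)
		dev->remove(dev);
	return 1;
}

/* Demo callbacks that only count how often they ran. */
void demo_error(struct io_dev *dev)  { dev->recovered++; }
void demo_remove(struct io_dev *dev) { dev->removed++; }
```

The point of the split is that the only work done in MCA context is a single
atomic store; everything that might sleep or take a lock runs later.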
Thanks,
H.Seto


[PATCH] Custom power states for non-ACPI systems

2005-03-01 Thread Todd Poynor
Advertise custom sets of system power states for non-ACPI systems.
Currently, /sys/power/state shows and accepts a static set of choices
that are not necessarily meaningful on all platforms (for example,
suspend-to-disk is an option even on diskless embedded systems, and the
meaning of standby vs. suspend-to-mem is not well-defined on
non-ACPI-systems).  This patch allows the platform to register power
states with meaningful names that correspond to the platform's
conventions (for example, "big sleep" and "deep sleep" on TI OMAP), and
only those states that make sense for the platform.

For the time being, the canned set of PM_SUSPEND_STANDBY/MEM/DISK
etc. symbols is preserved, since knowledge of the meanings of those
values has crept into drivers.  There is a separate effort underway to
divorce driver suspend flags from the platform suspend state
identifiers.  Once that is accomplished, we can then replace the suspend
states available with an entirely custom set.  For example, various
embedded platforms have multiple power states that roughly correspond to
suspend-to-mem, and each could be advertised and requested via the PM
interfaces, once drivers no longer look for the one and only
PM_SUSPEND_MEM system suspend state.

If the platform does not register a custom set of power states then the
present-day set remains available as a default.  Will send separately a
patch for an embedded platform to show usage.  Comments appreciated.

Index: linux-2.6.10/include/linux/pm.h
===
--- linux-2.6.10.orig/include/linux/pm.h2005-03-02 00:41:43.0 
+
+++ linux-2.6.10/include/linux/pm.h 2005-03-02 01:12:14.0 +
@@ -216,8 +216,14 @@
 #definePM_DISK_REBOOT  ((__force suspend_disk_method_t) 4)
 #definePM_DISK_MAX ((__force suspend_disk_method_t) 5)
 
+struct pm_suspend_method {
+   char *name;
+   suspend_state_t state;
+};
+
 struct pm_ops {
suspend_disk_method_t pm_disk_mode;
+   struct pm_suspend_method *pm_suspend_methods;
int (*prepare)(suspend_state_t state);
int (*enter)(suspend_state_t state);
int (*finish)(suspend_state_t state);
Index: linux-2.6.10/kernel/power/main.c
===
--- linux-2.6.10.orig/kernel/power/main.c   2005-03-02 00:41:41.0 
+
+++ linux-2.6.10/kernel/power/main.c2005-03-02 01:15:21.0 +
@@ -228,11 +228,22 @@
 
 
 
-char * pm_states[] = {
-   [PM_SUSPEND_STANDBY]= "standby",
-   [PM_SUSPEND_MEM]= "mem",
-   [PM_SUSPEND_DISK]   = "disk",
-   NULL,
+struct pm_suspend_method pm_default_suspend_methods[] = {
+   {
+   .name = "standby",
+   .state = PM_SUSPEND_STANDBY,
+   },
+   {
+   .name = "mem",
+   .state = PM_SUSPEND_MEM,
+   },
+   {
+   .name = "disk",
+   .state = PM_SUSPEND_DISK,
+   },
+   {
+   .name = NULL,
+   },
 };
 
 
@@ -324,19 +335,22 @@
 {
int i;
char * s = buf;
+   struct pm_suspend_method *methods = pm_ops->pm_suspend_methods;
+
+   if (! methods)
+   methods = pm_default_suspend_methods;
+
+   for (i=0; methods[i].name; i++)
+   s += sprintf(s,"%s ",methods[i].name);
 
-   for (i = 0; i < PM_SUSPEND_MAX; i++) {
-   if (pm_states[i])
-   s += sprintf(s,"%s ",pm_states[i]);
-   }
s += sprintf(s,"\n");
return (s - buf);
 }
 
 static ssize_t state_store(struct subsystem * subsys, const char * buf, size_t n)
 {
-   suspend_state_t state = PM_SUSPEND_STANDBY;
-   char ** s;
+   struct pm_suspend_method *methods = pm_ops->pm_suspend_methods;
+   int i;
char *p;
int error;
int len;
@@ -344,12 +358,15 @@
p = memchr(buf, '\n', n);
len = p ? p - buf : n;
 
-   for (s = &pm_states[state]; state < PM_SUSPEND_MAX; s++, state++) {
-   if (*s && !strncmp(buf, *s, len))
+   if (! methods)
+   methods = pm_default_suspend_methods;
+
+   for (i = 0; methods[i].name; i++) {
+   if (!strncmp(buf, methods[i].name, len))
break;
}
-   if (*s)
-   error = enter_state(state);
+   if (methods[i].name)
+   error = enter_state(methods[i].state);
else
error = -EINVAL;
return error ? error : n;
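
For readers who want to try the table lookup outside the kernel, here is a
minimal user-space sketch of the same idea: a NULL-terminated table of
name/state pairs with a default fallback. All demo_* names and the numeric
state values are stand-ins invented for this illustration. Unlike the
strncmp() call in the patch, which also accepts a prefix of a state name
(and a zero-length write), this sketch requires an exact match; that is a
deliberate difference, not a description of the patch.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for the kernel's suspend_state_t values (illustrative only). */
enum { DEMO_SUSPEND_STANDBY = 1, DEMO_SUSPEND_MEM = 2, DEMO_SUSPEND_DISK = 3 };

struct demo_suspend_method {
	const char *name;   /* a NULL name terminates the table */
	int state;
};

const struct demo_suspend_method demo_default_methods[] = {
	{ "standby", DEMO_SUSPEND_STANDBY },
	{ "mem",     DEMO_SUSPEND_MEM },
	{ "disk",    DEMO_SUSPEND_DISK },
	{ NULL, 0 },
};

const struct demo_suspend_method demo_omap_methods[] = {
	{ "deepsleep", DEMO_SUSPEND_MEM },
	{ NULL, 0 },
};

/*
 * Look up the state named by buf[0..n), ignoring anything from the first
 * newline on. Falls back to the default table when the platform has
 * registered none. Returns the state, or -1 if the name is not advertised.
 */
int demo_lookup_state(const struct demo_suspend_method *methods,
		      const char *buf, size_t n)
{
	const char *nl = memchr(buf, '\n', n);
	size_t len = nl ? (size_t)(nl - buf) : n;
	size_t i;

	if (!methods)
		methods = demo_default_methods;

	for (i = 0; methods[i].name; i++) {
		/* Exact match: same length and same bytes. */
		if (strlen(methods[i].name) == len &&
		    !memcmp(buf, methods[i].name, len))
			return methods[i].state;
	}
	return -1;
}
```

With the OMAP-style table, "deepsleep" resolves to the suspend-to-mem value
and "disk" is rejected, which is the behaviour the description asks for.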



[PATCH] explicitly bind idle tasks

2005-03-01 Thread Nathan Lynch
On Sun, Feb 27, 2005 at 02:49:28PM -0800, Andrew Morton wrote:
> Benjamin Herrenschmidt <[EMAIL PROTECTED]> wrote:
> >
> > > - if (cpu_is_offline(smp_processor_id()) &&
> >  > +if (cpu_is_offline(_smp_processor_id()) &&
> >  >  system_state == SYSTEM_RUNNING)
> >  >  cpu_die();
> >  >  }
> >  > _
> > 
> >  This is the idle loop. Is that ever supposed to be preempted ?
> 
> Nope, it's a false positive.  We had to do the same in x86's idle loop and
> probably others will hit it.

Perhaps I'm missing something, but is there any reason we can't do
the following?  I've tested it on ppc64, doesn't seem to break anything.

With hotplug cpu and preempt, we tend to see smp_processor_id warnings
from idle loop code because it's always checking whether its cpu has
gone offline.  Replacing every use of smp_processor_id with
_smp_processor_id in all idle loop code is one solution; another way
is explicitly binding idle threads to their cpus (the smp_processor_id
warning does not fire if the caller is bound only to the calling cpu).
This has the (admittedly slight) advantage of letting us know if an
idle thread ever runs on the wrong cpu.
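
To make the condition concrete, here is a toy user-space model of the check
described above. It is a sketch only: demo_cpumask_t, struct demo_task and
demo_cpu_id_is_stable() are made-up names, the mask is a plain 64-bit
integer, and the real debug check in the kernel looks at more state than
this.

```c
#include <stdint.h>

/* One bit per CPU; a toy stand-in for the kernel's cpumask_t (max 64 CPUs). */
typedef uint64_t demo_cpumask_t;

#define DEMO_CPU_MASK_ALL (~(demo_cpumask_t)0)

demo_cpumask_t demo_cpumask_of_cpu(unsigned int cpu)
{
	return cpu < 64 ? (demo_cpumask_t)1 << cpu : 0;
}

struct demo_task {
	demo_cpumask_t cpus_allowed;
	int preemptible;
};

/*
 * Returns 1 when reading the current CPU number is safe for this task:
 * either it cannot be preempted, or it is bound to exactly this CPU and
 * so cannot migrate. Returns 0 where a debug check would warn.
 */
int demo_cpu_id_is_stable(const struct demo_task *t, unsigned int this_cpu)
{
	if (!t->preemptible)
		return 1;
	return t->cpus_allowed == demo_cpumask_of_cpu(this_cpu);
}
```

An idle task bound to cpu 2 passes the check on cpu 2 and fails it on cpu 3,
which is the "runs on the wrong cpu" case mentioned above.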


Signed-off-by: Nathan Lynch <[EMAIL PROTECTED]>

Index: linux-2.6.11-rc5-mm1/init/main.c
===
--- linux-2.6.11-rc5-mm1.orig/init/main.c   2005-03-02 00:12:07.0 
+
+++ linux-2.6.11-rc5-mm1/init/main.c2005-03-02 00:53:04.0 +
@@ -638,6 +638,10 @@
 {
lock_kernel();
/*
+* init can run on any cpu.
+*/
+   set_cpus_allowed(current, CPU_MASK_ALL);
+   /*
 * Tell the world that we're going to be the grim
 * reaper of innocent orphaned children.
 *
Index: linux-2.6.11-rc5-mm1/kernel/sched.c
===
--- linux-2.6.11-rc5-mm1.orig/kernel/sched.c2005-03-02 00:12:07.0 
+
+++ linux-2.6.11-rc5-mm1/kernel/sched.c 2005-03-02 00:47:14.0 +
@@ -4092,6 +4092,7 @@
idle->array = NULL;
idle->prio = MAX_PRIO;
idle->state = TASK_RUNNING;
+   idle->cpus_allowed = cpumask_of_cpu(cpu);
set_task_cpu(idle, cpu);
 
spin_lock_irqsave(&rq->lock, flags);


[PATCH 2.6.10 1/1] netfilter: fix crash on nat+icmp packets

2005-03-01 Thread mukesh agrawal
This patch fixes a kernel crashing bug when using NAT. The crash occurs in 
the case when we send out a UDP packet to a closed port on another host, 
with the UDP packet being SNATed. The remote host replies with an ICMP 
port unreachable (type 3, code 3). We need to adjust the ICMP packet, 
because the UDP packet was SNATed.

The cause of the crash is that udp_manip_pkt reads *pskb into iph before 
calling skb_ip_make_writable, and fails to update iph after the call. 
Since skb_ip_make_writable may delete the original skb when it makes a 
copy, a page fault may occur when udp_manip_pkt later dereferences iph.

I suspect that normally, the relevant skbuff holds a UDP packet, and 
either the skbuff is unshared (so no copy is made, and the skbuff 
remains valid), or is shared, but remains so while udp_manip_pkt is 
running (and hence, the old reference to the skbuff is still okay).

When we get an ICMP reply for a SNATed UDP packet, ip_nat_proto_udp is 
asked to modify the UDP header inside the payload of the ICMP packet. (The 
call chain is ip_nat_fn -> icmp_reply_translation -> ip_nat_manip_pkt -> 
udp_manip_pkt.)

Since the UDP header is beyond the ICMP header, skb_ip_make_writable 
copies the skbuff, and deletes the original. Then udp_manip_pkt's iph is 
invalid, and dereferencing it causes a page fault.

Glancing at the code for tcp_manip_pkt, I think it would have the same 
problem, but I haven't tested that case. The patch below fixes the UDP 
case only.
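
The failure pattern is easy to reproduce outside the kernel. The sketch
below is a user-space model, not netfilter code: struct demo_buf,
demo_make_writable() and demo_manip() are hypothetical stand-ins for the
skbuff, skb_ip_make_writable() and udp_manip_pkt(), and the "shared" flag
is a simplification of the conditions under which a copy is made.

```c
#include <stdlib.h>
#include <string.h>

/* Toy packet buffer; a stand-in for struct sk_buff, not the real thing. */
struct demo_buf {
	unsigned char *data;
	size_t len;
	int shared;          /* non-zero: must be copied before writing */
};

struct demo_buf *demo_buf_new(const void *src, size_t len, int shared)
{
	struct demo_buf *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	b->data = malloc(len);
	if (!b->data) {
		free(b);
		return NULL;
	}
	memcpy(b->data, src, len);
	b->len = len;
	b->shared = shared;
	return b;
}

void demo_buf_free(struct demo_buf *b)
{
	if (b) {
		free(b->data);
		free(b);
	}
}

/*
 * Make *pb privately writable. If the buffer is shared, replace it with a
 * copy and free the original, so every pointer derived from the old
 * (*pb)->data is dangling afterwards. Returns 1 on success, 0 on failure.
 */
int demo_make_writable(struct demo_buf **pb)
{
	struct demo_buf *copy;

	if (!(*pb)->shared)
		return 1;
	copy = demo_buf_new((*pb)->data, (*pb)->len, 0);
	if (!copy)
		return 0;
	demo_buf_free(*pb);
	*pb = copy;
	return 1;
}

/* Rewrite one header byte the safe way: derive pointers after the call. */
int demo_manip(struct demo_buf **pb, size_t hdroff, unsigned char value)
{
	unsigned char *hdr;

	if (hdroff >= (*pb)->len || !demo_make_writable(pb))
		return 0;
	hdr = (*pb)->data + hdroff;   /* re-read *pb: it may have changed */
	*hdr = value;
	return 1;
}
```

Had demo_manip() computed hdr before calling demo_make_writable(), the
write would have gone to freed memory whenever a copy was made.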

(For the record, my kernel tree also has the tproxy patch from 
http://www.balabit.com/downloads/tproxy/linux-2.4/cttproxy-2.6.10-2.0.0.tar.gz 
applied. But I think this bug is independent of that patch.)

diff -uprN linux-2.6.10.orig/net/ipv4/netfilter/ip_nat_proto_udp.c 
linux-2.6.10.fixed/net/ipv4/netfilter/ip_nat_proto_udp.c
--- linux-2.6.10.orig/net/ipv4/netfilter/ip_nat_proto_udp.c 2004-12-24 
16:34:01.0 -0500
+++ linux-2.6.10.fixed/net/ipv4/netfilter/ip_nat_proto_udp.c2005-03-01 
19:32:21.0 -0500
@@ -95,6 +95,9 @@ udp_manip_pkt(struct sk_buff **pskb,
if (!skb_ip_make_writable(pskb, hdroff + sizeof(hdr)))
return 0;
+   /* skb_ip_make_writable may have copied the skb, and deleted
+  the original */
+   iph = (struct iphdr *)((*pskb)->data + iphdroff);
hdr = (void *)(*pskb)->data + hdroff;
if (maniptype == IP_NAT_MANIP_SRC) {


Trap number: how a system software recognise it?

2005-03-01 Thread tony osborne
Hi,
I wish to be personally CC'ed on the answers/comments posted to the list in
response to this post.

I have done some reading about system calls and memory management but some 
issues are not yet that clear for me, so I hope some of you will assist 
me...

PART1
--
Assume a C user program contains a read (fid, buf, nbytes) procedure
call. According to Tanenbaum's book, this calls a C library read function,
also known as the C stub function read_stb.
read_stb includes some initial instructions and, in particular, a trap
number used to make a system call to the kernel. This number is used to
index the trap vector or table stored in the kernel in order to retrieve the
address of the trap handler routine which performs the read operation.

My question is: how does read_stb know the trap number used by the
kernel to perform a read operation from the device?
Although my question refers to the C library, it holds for other system
software that might perform the same operation.
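
As an illustration of where such a number can come from, here is a minimal
sketch assuming Linux with glibc. On that combination the number for read
is a constant (SYS_read) exported through the system headers, and the
generic syscall() wrapper performs the trap; demo_read_stub() is a
hypothetical name chosen for this example. The same number is used whatever
device the file descriptor refers to; the kernel picks the driver from the
open file, not from the trap number.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* for the syscall() declaration in glibc */
#endif
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * A hand-written read stub: pass the system call number for read,
 * taken from the exported headers, to the generic trap wrapper.
 */
ssize_t demo_read_stub(int fd, void *buf, size_t nbytes)
{
	return syscall(SYS_read, fd, buf, nbytes);
}
```

So the stub does not discover the number at run time; it is fixed per
architecture when the C library is built against the kernel headers.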

Does the read operation from a disk have a fixed trap number? What about
other I/O peripherals (scanner, webcam, digital camera, printer)?

For the above peripherals, we need generally to install a device driver. If 
we take the scanner as an example and say the user wants to zoom out a 
scanned section. This operation is associated with some initial instructions 
and a system call through a trap number call. This trap number will be used 
to point to the relevant device driver routine as explained above.

Will this variable get assigned a value during the installation of 
peripheral device drivers?

We know that each device has its interface commands that the kernel can 
call. Each procedure will be stored at a particular location in the kernel 
memory. Once saved, I presume that the OS updates its trap table and 
allocate a trap number to each device procedure. Is that right?

What about the devices which are recognised without installing device 
drivers, have they a fixed location and trap numbers? Is this documented for 
 whoever want to write a device driver?

PART 2
--
The main CPU initiates the I/O operation by instructing the device 
controller with a high level commands (writing to the device registers and 
so on). Such high level commands are then translated to lower instruction by 
the device controller. Then it is up to the device processor to take these 
commands and branch to the relevant device driver code at the kernel space 
to perform the low level instructions. Upon completion an interrupt will be 
sent to the CPU to flag (hopefully) the completion of the task.

So could we deduce that the device processor has *full* access to the
kernel memory (full privileges)?

If we take as an example: reading a block of data from a stored file. Each
file is associated with a File Control Block that contains the file's
metadata, i.e. among others the description of file organisation.

Will the command sent by the CPU to the disk controller (after inspecting 
the file FCB) be similar to RETRIEVE BLOCK X. and this will be translated by 
the device controller to cylinder C, PLATTER P, SECTOR S?

What if the OS wants to retrieve more blocks? Will they just be written into
the device controller memory?

Many thanks


2.6.11-rc5-mm1: (seemingly non-fatal) NULL pointer dereference on startup

2005-03-01 Thread Bernhard Rosenkraenzer
I got this right after the initramfs script was finished and the root 
filesystem was mounted:

Unable to handle kernel NULL pointer dereference at virtual address 
printing eip:
c02f52fa
*pde = 
Oops: 0002 [#1]
PREEMPT
Modules linked in:
CPU:0
EIP:0060:[]Not tainted VLI
EFLAGS: 00010246   (2.6.11-0.rc5.1ark)
EIP is at __down+0x10a/0x130
eax:    ebx: cf63652c   ecx: cfd6d020   edx: 
esi: 0286   edi: cfbcd000   ebp: cfbb3020   esp: cfbcddc0
ds: 007b   es: 007b   ss: 0068
Process hotplug (pid: 286, threadinfo=cfbcd000 task=cfbb3020)
Stack: cf636534 0001 cfbb3020 c0116cb0 00100100 00200200 cfbcde04 9bbf6ac4
   d70e78df cf727d14 cfbcdf54 cfbcde6c cffe4140 c02f51c7 0010 cf6364bc
   c01711ad cf727d14 cfbcde6c c016c7b6 cf6364bc 0001 cf6349fa cfbcde6c
Call Trace:
[] default_wake_function+0x0/0x20
[] __down_failed+0x7/0xc
[] .text.lock.namei+0x8/0x1db
[] permission+0xe6/0xf0
[] link_path_walk+0x882/0xf80
[] __up+0x1c/0x20
[] .text.lock.namei+0x22/0x1db
[] link_path_walk+0x9c1/0xf80
[] handle_mm_fault+0x1ea/0x540
[] path_lookup+0x83/0x150
[] open_namei+0x8f/0x620
[] filp_open+0x3b/0x70
[] get_unused_fd+0x2c/0xd0
[] sys_open+0x57/0xf0
[] syscall_call+0x7/0xb
Code: ff 21 e0 ff 48 14 8b 40 08 a8 08 75 19 c7 45 00 00 00 00 00 83 c4 24 
5b5e 5f 5d c3 e8 00 07 00 00 e9 73 ff ff ff e8 f6 06 00 00 <00> 00 00 00 00 
00 eb da 0f 0b a4 00 7b 48 30 c0 eb 8e 0f 0b a5


Same box, same kernel, with hotplug disabled boots up fine and produces a 
similar oops later:

Unable to handle kernel NULL pointer dereference at virtual address 
printing eip:
c02f52fa
*pde = 
Oops: 0002 [#1]
PREEMPT
Modules linked in: usbkbd usbhid snd_cmipci gameport snd_pcm snd_page_alloc 
snd_opl3_lib snd_timer snd_hwdep snd_mpu401_uart snd_rawmidi snd_seq_device 
snd soundcore psmouse binfmt_misc lp parport md5 ipv6 8139too mii af_packet 
8250 serial_core ide_cd cdrom ohci_hcd usbcore video thermal sony_acpi 
processor pcc_acpi fan container button battery ac genrtc
CPU:0
EIP:0060:[]Not tainted VLI
EFLAGS: 00210246   (2.6.11-0.rc5.1ark)
EIP is at __down+0x10a/0x130
eax:    ebx: c0346644   ecx: cda20540   edx: 
esi: 00200286   edi: c6b6f000   ebp: c4c68020   esp: c6b6fed4
ds: 007b   es: 007b   ss: 0068
Process iwconfig (pid: 3080, threadinfo=c6b6f000 task=c4c68020)
Stack: c034664c 0001 c4c68020 c0116cb0 00100100 00200200 30746973 
   0001 cfe71ee0 cfe71ee0 c55006e0 c55006e0 c02f51c7  c4c68020
   c02f6c94 c019279f    c019288f  c55006e0
Call Trace:
[] default_wake_function+0x0/0x20
[] __down_failed+0x7/0xc
[] .text.lock.kernel_lock+0x28/0x37
[] de_put+0xf/0xa0
[] proc_delete_inode+0x5f/0xc0
[] proc_delete_inode+0x0/0xc0
[] generic_delete_inode+0xb5/0x190
[] iput+0x3c/0x90
[] dput+0x6b/0x2b0
[] __fput+0x11e/0x1c0
[] filp_close+0x52/0xa0
[] sys_close+0x58/0xa0
[] sysenter_past_esp+0x54/0x75
Code: ff 21 e0 ff 48 14 8b 40 08 a8 08 75 19 c7 45 00 00 00 00 00 83 c4 24 
5b5e 5f 5d c3 e8 00 07 00 00 e9 73 ff ff ff e8 f6 06 00 00 <00> 00 00 00 00 
00 eb da 0f 0b a4 00 7b 48 30 c0 eb 8e 0f 0b a5


[BK] cvs export

2005-03-01 Thread Larry McVoy
A while back someone complained about the CVS exporter because it
sometimes groups a pile of BK changesets into one commit.  That's true,
it does.

I've been running tests over the BK tree and I think we can do better.
Here's the scoop: when we do an export we are going from a very bushy
graph structure to a linear graph structure.  The BK graph structure
represents what happened in all the BK repos that ever came together,
the CVS graph structure is more like what would happen if all the work had
been done in CVS.  What that means in practice is that the linearization
sometimes results in a single CVS commit which has multiple changesets
in it.  Pavel or someone complained that the problem with that is that
if you are looking for a bug and you are searching through commits, that
works fine *unless* your bug happens to be in one of the commits which
is really a pile of changesets.  Is that accurate Pavel/Andrea/Roman/etc?

In the last flamefest about BK there was all this fuss that there wasn't
enough info in the CVS export and I think that the problem described
above is the basis for 99% (or maybe 100%) of the flameage.  Is that
also accurate?

I tried to point out the following and think it was lost in the noise:
while the repository commits themselves construct a very bushy graph,
the files are not at all bushy, they are extremely linear.  So what does
that actually mean?  The history of the repository is very parallel,
that's what creates a bushy revision history graph, but in spite of that,
there is very little parallelism in the actual files.  The result of
that is that when we do the CVS export, it is true that the number of
commits is about half the number of BK commits.  But the number of file
deltas is only 4% less than the number of file deltas in BK.

Pavel/Andrea/Roman/etc still were unhappy and they are justified in being
unhappy because even though we have almost all of the file history what
we don't have is all of the patch boundaries.  And when you are hunting
down a bug, if you look at Documentation/BUG-HUNTING (which I wrote back
in 1996 amusingly enough) the idea is to do binary search over a range
of changes in order to narrow down the cause.

Which leads us back to the problem.  If you narrow things down but where
you land is one of the clustered commits which has many changesets in it
then you are stuck with having to wade through a big pile of diffs to
find the bug, those diffs consisting of multiple patches.  Sound right
to you Pavel/Andrea/Roman/etc?

If so, I have to agree with you, this is a limitation of the CVS exporter.
So I've been thinking about how to fix this and have the following idea.
I want all the CVS export users to pay close attention because this either
should make you happy or not and I want to know the answer.

When we do the export we do a couple of things to make things pleasant
for you.  We make sure that the timestamps on all the files in the
same commit are the same, that makes timestamp based tools work.
We also shove a comment into each file's history that looks like so:
(Logical change 1.12345) so that tools that try and group things based
on comments can work.

It's that second feature that I think we can use to solve the problem,
we're finally getting to the idea.  If we have a commit that is really 200
patches which touch 400 files then we can do better.  Suppose that the
files in the patches are disjoint, i.e., each patch touches a different
set of files, there is no overlap.  If that's true then we could change
the comment to (Logical change 1.12345.$PATCH).  It's still all one CVS
commit but if you need to go working through that commit to get at the
individual patches you could, right?

One problem is that the set of files in patches may not be disjoint,
the same file may participate in multiple patches.  I think we can handle
that in the following way, we put multiple comments, one for each patch,
so you'd see

(Logical change 1.12345.5)
(Logical change 1.12345.11)
(Logical change 1.12345.79)

That's not a perfect answer because now that file participates in
multiple patches and if it's the one that has the problem you'll have
to wade through the diffs for that file for that commit.  But that's an
extreme corner case as far as I can tell (I have faith I'll be "educated"
if I'm wrong about that).
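
To show what the proposed comments would look like per file, here is a
small self-contained sketch. It is not the exporter's code: struct
demo_patch and demo_file_comments() are invented for this illustration, and
the patch ids and file names in any usage are made up.

```c
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_FILES 8

/* One patch inside a clustered commit: an id and the files it touches. */
struct demo_patch {
	int id;
	const char *files[DEMO_MAX_FILES];   /* NULL-terminated if shorter */
};

/*
 * Write into out the comment lines for one file: one
 * "(Logical change <rev>.<id>)" line per patch that touches it.
 * Returns the number of patches that touch the file.
 */
int demo_file_comments(const char *rev, const struct demo_patch *patches,
		       int npatches, const char *file,
		       char *out, size_t outsz)
{
	int i, j, hits = 0;
	size_t used = 0;

	if (outsz)
		out[0] = '\0';
	for (i = 0; i < npatches; i++) {
		for (j = 0; j < DEMO_MAX_FILES && patches[i].files[j]; j++) {
			if (strcmp(patches[i].files[j], file) != 0)
				continue;
			if (used < outsz) {
				/* snprintf truncates safely if out is too small */
				int w = snprintf(out + used, outsz - used,
						 "(Logical change %s.%d)\n",
						 rev, patches[i].id);
				if (w > 0)
					used += (size_t)w;
			}
			hits++;
			break;   /* count each patch once per file */
		}
	}
	return hits;
}
```

A file touched by a single patch gets exactly one comment (the disjoint
case); a file touched by several patches gets one line per patch, which is
the corner case described above.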

So, everyone including the Pavel/Andrea/Roman/etc camp, how do you feel
about this?  If we were to hack the exporter to add this info do you think
that would address the problems you have with the exporter?  The reason
I ask is that while I was going to just hack this in, I went to do that
and it turned into a nasty problem, both engineering and CPU wise when
exporting.  So if this isn't what you wanted then I won't bother to do it.

I'm not asking if this is the same as GPLing BK or giving people free 
access to 100% of the BK internal data structures, etc.  What I'm asking
is if this will make the CVS export tree something you 

Re: Network speed Linux-2.6.10

2005-03-01 Thread Baruch Even
Paul Dickson wrote:
> On Tue, 1 Mar 2005 14:29:24 -0500 (EST), linux-os wrote:
>
>> Intel NIC e100 device driver. Two identical machines.
>> Private network, no other devices. Connected using a Netgear switch.
>> Test data is the same thing sent from memory on one machine
>> to a discard server on another, using TCP/IP SOCK_STREAM.
>>
>> If I set both machines to auto-negotiation OFF and half duplex,
>> I get about 9 to 9.5 megabytes/second across the private wire
>> network.
>>
>> If I set one machine to full duplex and the other to half-duplex
>> I get 10 to 11 megabytes/second transfer across the network,
>> regardless of direction.
>>
>> If I set both machines to auto-negotiation OFF and full duplex,
>> I get 300 to 400 kilobytes/second regardless of the direction.
>
> Might this be related to the broken BicTCP implementations in the 2.6.6+
> kernels?  A fix was added around 2.6.11-rc3 or 4.

Unlikely, the problem with BIC would have shown itself only at high
speeds over long latency links, not over a lan connection.

Baruch


Re: Network speed Linux-2.6.10

2005-03-01 Thread Paul Dickson
On Tue, 1 Mar 2005 14:29:24 -0500 (EST), linux-os wrote:

> Intel NIC e100 device driver. Two identical machines.
> Private network, no other devices. Connected using a Netgear switch.
> Test data is the same thing sent from memory on one machine
> to a discard server on another, using TCP/IP SOCK_STREAM.
> 
> If I set both machines to auto-negotiation OFF and half duplex,
> I get about 9 to 9.5 megabytes/second across the private wire
> network.
> 
> If I set one machine to full duplex and the other to half-duplex
> I get 10 to 11 megabytes/second transfer across the network,
> regardless of direction.
> 
> If I set both machines to auto-negotiation OFF and full duplex,
> I get 300 to 400 kilobytes/second regardless of the direction.

Might this be related to the broken BicTCP implementations in the 2.6.6+
kernels?  A fix was added around 2.6.11-rc3 or 4.

-Paul



Re: [PATCH] New operation for kref to help avoid locks

2005-03-01 Thread Nick Piggin
Corey Minyard wrote:
> Nick Piggin wrote:
>
>> Is get_with_check actually going to be useful for anything? It
>> seems like it promotes complex and potentially unsafe schemes.
>
> It is certainly more complex to use this, and I'm guessing that's why
> Greg rejected it.  Certainly a valid problem.
>
>> eg. In your queue example, it would usually be better to have
>> a refcount for being on queue, and entry_completed would remove
>> the entry from the queue and accordingly drop the refcount. The
>> release function would then just free it.
>
> True.  But if things picked up entries of the queue and incremented
> their refcount, then you would need a lock.  The same technique would
> apply.  But your example would be the more common one, I would think.

Well, but you take a lock in your system too, to protect the
queue (ie. in get_working_entry()).
Nick


Re: [PATCH] New operation for kref to help avoid locks

2005-03-01 Thread Corey Minyard
Nick Piggin wrote:
> Corey Minyard wrote:
>> Arjan van de Ven wrote:
>>>> Just doing an atomic operation is not faster than doing a lock, an
>>>> atomic operation, then an unlock?  Am I missing something?
>>>
>>> if the lock and the atomic are on the same cacheline they're the same
>>> cost on most modern cpus...
>>
>> Ah, I see.  Not likely to ever be the case with this.  The lock will
>> likely be with the main data structure (the list, or whatever) and
>> the refcount will be in the individual item in the main data
>> structure (list entry).
>
> Is get_with_check actually going to be useful for anything? It
> seems like it promotes complex and potentially unsafe schemes.

It is certainly more complex to use this, and I'm guessing that's why
Greg rejected it.  Certainly a valid problem.

> eg. In your queue example, it would usually be better to have
> a refcount for being on queue, and entry_completed would remove
> the entry from the queue and accordingly drop the refcount. The
> release function would then just free it.

True.  But if things picked up entries of the queue and incremented
their refcount, then you would need a lock.  The same technique would
apply.  But your example would be the more common one, I would think.

Thanks,
-Corey


[PATCH] sysfs: fix signedness problem

2005-03-01 Thread Greg KH

count is size_t, fill_write_buffer() may return a negative number
which would evade the 'count > 0' checks and do bad things.

found by the Coverity tool

Signed-off-by: Alexander Nyberg <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

--- 1.22/fs/sysfs/file.c	2004-11-04 03:04:14 +01:00
+++ edited/fs/sysfs/file.c	2005-02-26 15:48:19 +01:00
@@ -231,15 +231,16 @@ static ssize_t
 sysfs_write_file(struct file *file, const char __user *buf, size_t count, loff_t *ppos)
 {
 	struct sysfs_buffer * buffer = file->private_data;
+	ssize_t len;
 
 	down(&buffer->sem);
-	count = fill_write_buffer(buffer,buf,count);
-	if (count > 0)
-		count = flush_write_buffer(file->f_dentry,buffer,count);
-	if (count > 0)
-		*ppos += count;
+	len = fill_write_buffer(buffer, buf, count);
+	if (len > 0)
+		len = flush_write_buffer(file->f_dentry, buffer, len);
+	if (len > 0)
+		*ppos += len;
 	up(&buffer->sem);
-	return count;
+	return len;
 }
 
 static int check_perm(struct inode * inode, struct file * file)
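
[Illustration, not part of the patch.] The failure mode described in the
changelog can be shown in user space. This is only a minimal sketch: fake_fill()
is a hypothetical stand-in for fill_write_buffer() that always fails, and the
two helper functions are made up for the comparison.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for fill_write_buffer(): always reports an error. */
ssize_t fake_fill(void)
{
	return -14;	/* a negative errno-style value */
}

/* Old pattern: the result lands in a size_t, so the negative value wraps
 * to a huge positive one and the "> 0" check treats it as success. */
int old_check_passes(void)
{
	size_t count = (size_t)fake_fill();

	return count > 0;
}

/* New pattern: the result stays in a signed ssize_t, so the error is
 * correctly rejected by the "> 0" check. */
int new_check_passes(void)
{
	ssize_t len = fake_fill();

	return len > 0;
}
```

Calling old_check_passes() yields 1 (the error slips through), while
new_check_passes() yields 0, which is the behaviour the patch is after.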





Re: [PATCH] New operation for kref to help avoid locks

2005-03-01 Thread Nick Piggin
Corey Minyard wrote:
> Arjan van de Ven wrote:
>>> Just doing an atomic operation is not faster than doing a lock, an
>>> atomic operation, then an unlock?  Am I missing something?
>>
>> if the lock and the atomic are on the same cacheline they're the same
>> cost on most modern cpus...
>
> Ah, I see.  Not likely to ever be the case with this.  The lock will
> likely be with the main data structure (the list, or whatever) and the
> refcount will be in the individual item in the main data structure (list
> entry).

Is get_with_check actually going to be useful for anything? It
seems like it promotes complex and potentially unsafe schemes.

eg. In your queue example, it would usually be better to have
a refcount for being on queue, and entry_completed would remove
the entry from the queue and accordingly drop the refcount. The
release function would then just free it.
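
[Illustration.] The scheme suggested above might look roughly like this in
user space. It is a sketch under assumptions: it uses C11 atomics instead of
kref, the struct and function names (apart from entry_completed, which the
thread mentions) are invented, and the queue itself is left unlocked for
brevity where real code would hold the queue lock.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct entry {
	atomic_int refs;	/* the queue itself holds one reference */
	struct entry *next;
};

struct queue {
	struct entry *head;	/* a real queue would be protected by a lock */
};

int entries_freed;		/* illustration only: counts releases */

/* Release function: the entry is already off the queue, so just free it. */
void entry_release(struct entry *e)
{
	entries_freed++;
	free(e);
}

void entry_get(struct entry *e)
{
	atomic_fetch_add(&e->refs, 1);
}

void entry_put(struct entry *e)
{
	/* atomic_fetch_sub returns the value before the decrement */
	if (atomic_fetch_sub(&e->refs, 1) == 1)
		entry_release(e);
}

/* Allocate an entry and queue it; the queue owns the initial reference. */
struct entry *queue_add(struct queue *q)
{
	struct entry *e = malloc(sizeof(*e));

	if (!e)
		return NULL;
	atomic_init(&e->refs, 1);
	e->next = q->head;
	q->head = e;
	return e;
}

/* Unlink the entry and drop the reference held for being on the queue. */
void entry_completed(struct queue *q, struct entry *e)
{
	struct entry **pp;

	for (pp = &q->head; *pp; pp = &(*pp)->next) {
		if (*pp == e) {
			*pp = e->next;
			entry_put(e);
			return;
		}
	}
}
```

A caller that took its own reference with entry_get() keeps the entry alive
after entry_completed(); the memory goes away only on the final entry_put().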


Re: [PATCH] remove dead cyrix/centaur mtrr init code

2005-03-01 Thread Alan Cox
On Llu, 2005-02-28 at 19:20, Andries Brouwer wrote:
> One such case is the mtrr code, where struct mtrr_ops has an
> init field pointing at __init functions. Unless I overlook
> something, this case may be easy to settle, since the .init
> field is never used.

The failure to invoke the ->init operator appears to be the bug.



Re: x86_64: 32bit emulation problems

2005-03-01 Thread Andreas Schwab
Bernd Schubert <[EMAIL PROTECTED]> writes:

> Hmm, after compiling with -D_FILE_OFFSET_BITS=64 it works fine. But why does 
> it work without this option on a 32bit kernel, but not on a 64bit kernel?

See nfs_fileid_to_ino_t for why the inode number is different between
32bit and 64bit kernels.
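
[Illustration.] A rough model of the difference, assuming (from my reading of
the helper, not confirmed in this thread) that nfs_fileid_to_ino_t folds the
upper half of the 64-bit file id into the lower half only when ino_t is
narrower than the file id. The function names below are made up for the
sketch.

```c
#include <stdint.h>

/* 32-bit kernel model: ino_t is 32 bits wide, so the upper half of the
 * 64-bit file id is folded into the lower half with XOR. */
uint32_t fold_fileid_32(uint64_t fileid)
{
	return (uint32_t)fileid ^ (uint32_t)(fileid >> 32);
}

/* 64-bit kernel model: ino_t is 64 bits wide, nothing is folded. */
uint64_t keep_fileid_64(uint64_t fileid)
{
	return fileid;
}

/* Does this inode number fit the 32-bit st_ino of a plain struct stat? */
int fits_in_32bit_stat(uint64_t ino)
{
	return ino <= UINT32_MAX;
}
```

With a file id such as 0x100000002, the folded value is 3 and fits a 32-bit
st_ino, while the unfolded 64-bit value does not.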

Andreas.

-- 
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


[2.6.11-rc5-mm1 patch] reiser4 Kconfig help cleanup

2005-03-01 Thread Adrian Bunk
The current reiser4 help texts have two disadvantages:
1. they are more marketing speech than technical speech with
   some debatable statements
2. they are too long


Examples for what I call "debatable statements":


  ReiserFS V3 is the stablest Linux filesystem, and V4 is the fastest.


All people I know who talked about serious data loss due to a filesystem 
damage were using reiser3.

That's not an objective measurement, but I have yet to see the proof 
that reiserfs plus reiserfs fsck is able to beat ext2 plus ext2 fsck
in terms of stability.


  In regards to claims by ext2 that they are the de facto
  standard Linux filesystem, the most polite thing to say is that
  many persons disagree, and it is interesting that those persons
  seem to include the distros that are growing in market share.
  See http://www.namesys.com/benchmarks.html for why many disagree.


ext2 was the de facto standard Linux filesystem at the time when this 
was written into the ext2 help text. I just checked kernel 2.0 (released 
more than eight and a half years ago) and this text was already there at 
that time (and it was accurate at that time). It was simply a leftover.

Unfortunately, most people will no longer be able to understand this 
"joke" in the reiser4 help text since my patch removing this from the 
ext2 help text because it's no longer accurate was included in kernel 
2.6.8 released more than half a year ago.




This patch shortens the help texts to include only the interesting 
technical information and to bring the help text to a length that is 
comparable to the help texts of other filesystems.

The pointer to http://www.namesys.com is still there, and people can 
get all the other information formerly in the help text from there 
if they are interested in more details.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 fs/reiser4/Kconfig |   81 +
 1 files changed, 10 insertions(+), 71 deletions(-)

--- linux-2.6.11-rc5-mm1-full/fs/reiser4/Kconfig.old2005-03-01 
23:49:13.0 +0100
+++ linux-2.6.11-rc5-mm1-full/fs/reiser4/Kconfig2005-03-02 
00:31:02.0 +0100
@@ -1,27 +1,11 @@
 config REISER4_FS
-   tristate "Reiser4 (EXPERIMENTAL very fast general purpose filesystem)"
+   tristate "Reiser4 (EXPERIMENTAL)"
depends on EXPERIMENTAL && !4KSTACKS
help
- Reiser4 is more than twice as fast for both reads and writes as
- ReiserFS V3, and is the fastest Linux filesystem, by a lot,
- for typical IO intensive workloads.  [It is slow at fsync
- intensive workloads as it is not yet optimized for fsync
- (sponsors are welcome for that work), and it is instead
- optimized for atomicity, see below.]  Benchmarks that define
- "a lot" are at http://www.namesys.com/benchmarks.html.
-
- It is the storage layer of what will become a general purpose naming
- system --- like what Microsoft wants WinFS to be except designed with 
a
- clean new semantic layer rather than being SQL based like WinFS.
- For details read http://www.namesys.com/whitepaper.html
-
- It performs all filesystem operations as atomic transactions, which
- means that it either performs a write, or it does not, and in the
- event of a crash it does not partially perform it or corrupt it.
- Many applications that currently use fsync don't need to if they use
- reiser4, and that means a lot for performance.  An API for performing
- multiple file system operations as one high performance atomic write
- is almost finished.
+ Reiser4 is a filesystem that performs all filesystem operations
+ as atomic transactions, which means that it either performs a
+ write, or it does not, and in the event of a crash it does not
+ partially perform it or corrupt it.
 
  It stores files in dancing trees, which are like balanced trees but
  faster.  It packs small files together so that they share blocks
@@ -30,45 +14,9 @@
  hassling you with anachronisms like having a maximum number of
  inodes, and wasting space if you use less than that number.
 
- It can handle really large directories, because its search
- algorithms are logarithmic with size not linear.  With Reiser4 you
- should use subdirectories because they help YOU, not because they
- help your filesystem's performance, or because your filesystem won't
- be able to shrink a directory once you have let it grow.  For squid
- and similar applications, everything in one directory should perform
- better.
-
- It has a plugin-based infrastructure, which means that you can easily
- invent new kinds of files, and so can other people, so it will evolve
- rapidly.
-
- We will be adding a variety of 

Re: x86_64: 32bit emulation problems

2005-03-01 Thread Andreas Schwab
Bernd Schubert <[EMAIL PROTECTED]> writes:

> Hmm, after compiling with -D_FILE_OFFSET_BITS=64 it works fine. But why does 
> it work without this option on a 32bit kernel, but not on a 64bit kernel?

Most likely the inode number (which is the only non-filesize related item
that is different between struct stat and struct stat64) overflows ino_t.
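
[Illustration.] The overflow can be modelled as a repacking step from a wide
stat result into a narrow one. This is a sketch only: struct wide_stat,
struct narrow_stat and narrow_from_wide() are hypothetical names and are not
the real libc or kernel types; only errno and EOVERFLOW are real.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-ins for struct stat64 / struct stat. */
struct wide_stat {
	uint64_t ino;
	int64_t size;
};

struct narrow_stat {
	uint32_t ino;
	int32_t size;
};

/* Repack a wide result into the narrow layout.
 * Returns 0 on success, or -1 with errno set to EOVERFLOW when a field
 * does not fit. */
int narrow_from_wide(const struct wide_stat *w, struct narrow_stat *n)
{
	if (w->ino > UINT32_MAX || w->size > INT32_MAX || w->size < INT32_MIN) {
		errno = EOVERFLOW;
		return -1;
	}
	n->ino = (uint32_t)w->ino;
	n->size = (int32_t)w->size;
	return 0;
}
```

In this model a small file (size 2704) still fails the conversion as soon as
its inode number needs more than 32 bits.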

Andreas.

-- 
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

2005-03-01 Thread Pavel Machek
Hi!

> > > Relocating pagedir | 
> > > Reading image data (8157 pages): 100% 8157 done.
> > > Stopping tasks: |   
> > > Freeing memory... done (0 pages freed)
> > > Freezing CPUs (at 1)...Sleeping in:   
> > >  [] dump_stack+0x19/0x20 
> > >  [] smp_pause+0x1f/0x54 
> > >  [] smp_call_function_interrupt+0x3b/0x60
> > >  [] call_function_interrupt+0x1c/0x24
> > >  [] cpu_idle+0x55/0x64   
> > >  [] start_secondary+0x71/0x78
> > >  [<>] 0x0  
> > >  [] 0xcffa5fbc
> > > ok  
> > > double fault, gdt at c1203260 [255 bytes]
> > > NMI Watchdog detected LOCKUP on CPU1, eip c0133c96, registers:
> 
> Note the double fault.

Yes, I can see it, it scares me. SMP swsusp is not in good state
because I do not have easy access to SMP or HT hardware. I guess I'll
just have to get into suse at the night and steal some P4 ;-).

Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!


Re: 2.6.11-rc5: Promise SATA150 TX4 failure

2005-03-01 Thread J.A. Magallon

On 03.01, Joerg Sommrey wrote:
> Hi all,
> 
> a problem that was introduced between 2.6.10-ac9 and 2.6.10-ac11 made
> it's way into 2.6.11-rc5.  While taking a backup onto a SCSI-streamer one
> of my RAID1-arrays gets corrupted.  Afterwards the system hangs and
> isn't even bootable.  Need to raidhotadd the failed partition in single
> user mode to get the box working again. Error messages:
> 

Me too :(. Just a slightly different case.
I have a server with 6x250Gb SATA drives, hanging off a pair of Promise
PDC20319 (FastTrak S150 TX4) (rev 02) controllers (each has 4 ports).
Main use for the box is as a smb/atalk/nfs server.

With 2.6.20-rc3-mm2+libata-dev2, the box is stable, we can drop
gigs of files through samba and it works.
Anything newer than that makes the box hang silently, no messages,
no oops. It also happened to me with just a local wget of a big
file (oofice-2.0-beta), after download the box locked hard.

I tried to apply libata-dev1 on top of newer kernels, but part of it
is already there, and the rest drops too many rejects/offsets for
me.

I also have one other problem with flock, but that's a subject for another
post...

Any ideas about what changed wrt sata ?

--
J.A. Magallon  \   Software is like sex:
werewolf!able!es \ It's better when it's free
Mandrakelinux release 10.2 (Cooker) for i586
Linux 2.6.10-jam12 (gcc 3.4.3 (Mandrakelinux 10.2 3.4.3-3mdk)) #1












Re: kernel BUG at drivers/serial/8250.c:1256!

2005-03-01 Thread Russell King
On Wed, Mar 02, 2005 at 12:09:46AM +0100, Karol Kozimor wrote:
> I've finally got around to test latest kernels and managed to find a bug in 
> the serial subsystem, which happens during suspend.

Yes, serial_cs is claiming that we don't have a device associated with
the port, so we're treating it as a legacy port.  However, serial_cs is
implementing the suspend/resume methods.  This is wrong, since that
means the port will be suspended twice, and hence causes this bug.

serial_cs needs to register the ports along with the PCMCIA device to
which the port belongs.  This will stop it being treated as a legacy
serial port.

Unfortunately, it's too late tonight for me to dig into PCMCIA to work
out how we get at the device structure - I can't find any examples off
hand either.  Therefore, it may be a while before I can produce a patch
to resolve this.

-- 
Russell King
 Linux kernel2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core


Re: [PATCH] New operation for kref to help avoid locks

2005-03-01 Thread Corey Minyard
Arjan van de Ven wrote:
>> Just doing an atomic operation is not faster than doing a lock, an
>> atomic operation, then an unlock?  Am I missing something?
>
> if the lock and the atomic are on the same cacheline they're the same
> cost on most modern cpus...

Ah, I see.  Not likely to ever be the case with this.  The lock will 
likely be with the main data structure (the list, or whatever) and the 
refcount will be in the individual item in the main data structure (list 
entry).

-Corey


Re: Bootsplash for 2.6.11-rc4

2005-03-01 Thread Pavel Machek
Hi!

> Hmm, maybe I should change the vesafb test in the bootsplash code
> to test if fb_imageblit == cfb_imageblit. This would make Pavel
> very happy, I guess ;-)

Yes, I like that one. Also it is likely going to be cleaner than
vesafb_ops hack.
Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!


Re: cyrix_arr_init and centaur_mcr_init unused?

2005-03-01 Thread Alan Cox
On Sad, 2005-02-26 at 21:50, [EMAIL PROTECTED] wrote:
> arch/i386/kernel/cpu/mtrr/cyrix.c has a routine cyrix_arr_init(), and
> arch/i386/kernel/cpu/mtrr/centaur.c has a routine centaur_mcr_init().
> At first sight it looks like these are unused.
> Do I overlook something?
> 
> (They occur as the .init fields of some struct, and I did not find any
> calls of ->init().)

Does look like a bug to me - and the centaur code definitely wants the
mcr init function to be called.




Re: x86_64: 32bit emulation problems

2005-03-01 Thread Andreas Schwab
Andi Kleen <[EMAIL PROTECTED]> writes:

> On Tue, Mar 01, 2005 at 11:10:38PM +0100, Andreas Schwab wrote:
>> That's because there are some values in the stat64 buffer delivered by the
>> kernel which cannot be packed into the stat buffer that you pass to stat.
>> Use stat64 or _FILE_OFFSET_BITS=64.
>
> If that had been the case strace would have reported EOVERFLOW
> or E2BIG.

No, the values are ok for stat64.

Andreas.

-- 
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: x86_64: 32bit emulation problems

2005-03-01 Thread Bernd Schubert
On Tuesday 01 March 2005 23:10, Andreas Schwab wrote:
> Bernd Schubert <[EMAIL PROTECTED]> writes:
> >> It is most likely some kind of user space problem.  I would change
> >> it to int err = stat(dir, );
> >> and then go through it with gdb and see what value err gets assigned.
> >>
> >> I cannot see any kernel problem.
> >
> > The err value will become -1 here.
>
> That's because there are some values in the stat64 buffer delivered by the
> kernel which cannot be packed into the stat buffer that you pass to stat.
> Use stat64 or _FILE_OFFSET_BITS=64.

Hmm, after compiling with -D_FILE_OFFSET_BITS=64 it works fine. But why does 
it work without this option on a 32bit kernel, but not on a 64bit kernel?

32bit kernel, 32bit binary: always works
64bit kernel, 64bit binary: always works

64bit kernel, 32bit binary:
 - always works on knfsd mount points
 - always works with -D_FILE_OFFSET_BITS=64
 - only works on unfs3 mount points with _FILE_OFFSET_BITS=64 


Do I really have to write a bug report for every single debian package that 
accesses /etc and /var to make the maintainers recompile it with 
-D_FILE_OFFSET_BITS=64? 
Btw, what about SuSE: are all packages there compiled with this option? ;)


Cheers, 
(a completely confused) Bernd


kernel BUG at drivers/serial/8250.c:1256!

2005-03-01 Thread Karol Kozimor
Hi,

I've finally got around to test latest kernels and managed to find a bug in 
the serial subsystem, which happens during suspend.

I use a 3Com PC Card Bluetooth adapter that needs serial_cs and hci_uart
modules. Whenever I try to suspend using 2.6.10 or a newer kernel, the
following bug appears. Note that 2.6.9 works perfectly.

#v+ handwritten, 2.6.11-rc5
kernel BUG at drivers/serial/8250.c:1256!
invalid operand:  [#1]
PREEMPT
[...]
EIP is at serial_unlink_irq_chain+0x4b/0x60 [8250]
[...]
Call Trace:
uart_suspend_port [serial_core]
serial_suspend [serial_cs]
serial_event [serial_cs]
send_event_callback [pcmcia]
__bus_for_each_dev
bus_for_each_dev
send_event_callback [pcmcia]
send_event [pcmcia]
send_event_callback [pcmcia]
handle_event [pcmcia]
ds_event [pcmcia]
send_event [pcmcia_core]
socket_suspend [pcmcia_core]
#v-

Photos are available here (sorry for the quality):
http://hell.org.pl/~sziwan/bug_8250-1.jpg
http://hell.org.pl/~sziwan/bug_8250-2.jpg
http://hell.org.pl/~sziwan/bug_8250-3.jpg

I'll be happy to provide whatever information is needed.

Best regards,

-- 
Karol 'sziwan' Kozimor
[EMAIL PROTECTED]


Re: x86_64: 32bit emulation problems

2005-03-01 Thread Andi Kleen
> stat64("/mnt/test/yp", {st_mode=S_IFDIR|0755, st_size=2704, ...}) = 0

It returns 0. No error.  Someone else in user space must be adding the 
EOVERFLOW.
glibc code does quite a lot of strange things with stat, perhaps
it comes from there.

> write(2, "err = -1\n", 9err = -1
> )   = 9
> write(2, "stat for /mnt/test/yp failed \n", 30stat for /mnt/test/yp failed
> ) = 30
> write(2, "ernno: 75 (Value too large for d"..., 50ernno: 75 (Value too large 
> for defined data type)
> ) = 50
> exit_group(0)   = ?

-Andi



Re: SPARC64: Modular floppy?

2005-03-01 Thread David S. Miller
On Tue, 01 Mar 2005 16:26:05 -0300
Horst von Brand <[EMAIL PROTECTED]> wrote:

> Right. But where? I was thinking under arch/sparc64/drivers/floppy.S or
> such. And then there would need to be some make magic for it to get picked
> up and included only for sparc64. Sounds doable, if somewhat messy.

Sparc 32-bit has the same problem btw.  It's a direct IRQ handler that
doesn't need to save any trap state.


Re: [PATCH/RFC] I/O-check interface for driver's error handling

2005-03-01 Thread Benjamin Herrenschmidt

> In fact, I'd argue that even a driver that _uses_ the interface should not
> necessarily shut itself down on error. Obviously, it should always log the
> error, but outside of that it might be good if the operator can decide and
> set a flag whether it should try to re-try (which may not always be
> possible, of course), shut down, or just continue.

In lots of cases, you don't have an operator smart enough to make this
decision nowadays, and even if you had, for things like your SCSI
adapter, you just can't expect userland to be operational. The error
recovery policy should be buildable into the driver. If it's not, however,
then I agree that userland intervention is probably a good idea.

Note that on pSeries, we have no choice. On error, the slot is isolated.
So we can't let the driver continue anyway.

Ben.
  



Re: x86_64: 32bit emulation problems

2005-03-01 Thread Bernd Schubert
> strace didn't say so, and normally it doesn't lie about things like this.

Well, I show you the updated source code and strace output and if you still 
don't believe me, ask me for a login to our system ;)


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>


int main(int argc, char **argv)
{
	char *dir;
	struct stat *buf;
	int err;

	dir = argv[1];

	buf = malloc(sizeof(struct stat));

	errno = 0;

	err = stat(dir, buf);
	if ( err ) {
		fprintf(stderr, "err = %i\n", err);
		fprintf(stderr, "stat for %s failed \n", dir);
		fprintf(stderr, "ernno: %i (%s)\n", errno, strerror(errno));
	} else
		fprintf(stderr, "stat() works fine.\n");

	return (0);
}


>
> > > [EMAIL PROTECTED] tests>./test_stat32 /mnt/test/yp
> > > stat for /mnt/test/yp failed
> > > ernno: 75 (Value too large for defined data type)
>
> errno is undefined unless a system call returned -1 before or
> you set it to 0 before.

See above.

>
> > > But why does stat64() on a 64-bit kernel tries to fill in larger data
> > > than
>
> A 64bit kernel has no stat64(). All stats are 64bit.

[EMAIL PROTECTED] tests>strace32 ./test_stat32 /mnt/test/yp
execve("./test_stat32", ["./test_stat32", "/mnt/test/yp"], [/* 43 vars */]) = 
0
uname({sys="Linux", node="hitchcock", ...}) = 0
brk(0)  = 0x80ad000
brk(0x80ce000)  = 0x80ce000
stat64("/mnt/test/yp", {st_mode=S_IFDIR|0755, st_size=2704, ...}) = 0
write(2, "err = -1\n", 9err = -1
)   = 9
write(2, "stat for /mnt/test/yp failed \n", 30stat for /mnt/test/yp failed
) = 30
write(2, "ernno: 75 (Value too large for d"..., 50ernno: 75 (Value too large 
for defined data type)
) = 50
exit_group(0)   = ?

You certainly know much better than me, but I think strace shows that it's 
calling stat64.

>
> > > on a 32-bit kernel and larger data also only for nfs-mount points? Hmm,
> > > I will tomorrow compare the tcp packets sent by the server.
> >
> > So I still think thats a kernel bug.
>
> Your data so far doesn't support this assertion.

I have to admit that knfsd mount points are not affected, but on the other 
hand, I really can't see anything in the ethereal captures. If someone 
should be interested, I have uploaded them:

http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/nfs-stat/


Cheers,
 Bernd


-- 
Bernd Schubert
Physikalisch Chemisches Institut / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: [EMAIL PROTECTED]


Re: [2.6 patch] SCSI: possible cleanups

2005-03-01 Thread Luben Tuikov
On 03/01/05 17:17, Christoph Hellwig wrote:
> Doing it in the core means less duplication and avoiding updating
> all drivers.

I agree.

	Luben


Re: [PATCH/RFC] I/O-check interface for driver's error handling

2005-03-01 Thread Benjamin Herrenschmidt
On Tue, 2005-03-01 at 12:33 -0600, Linas Vepstas wrote:

> The current proposal (and prototype) has a "master recovery thread"
> to handle the coordinated reset of the pci controller.  This master
> recovery thread makes three calls in struct pci_driver:
> 
>void (*frozen) (struct pci_dev *);  /* called when dev is first frozen */
>void (*thawed) (struct pci_dev *);  /* called after card is reset */
>void (*perm_failure) (struct pci_dev *);  /* called if card is dead */

See my other emails. I think only one callback is enough, and I think we
need more parameters.

> The master recovery thread runs in the kernel.  Earlier suggestions said
> "run it in user space, use pci hotplug, use udev, etc." However, if
> you get a pci error on a scsi card, you can't shell script 
> "umount /dev/sdX; rmmod scsi; clear_pci_error; insmod scsi; mount /dev/sdX"
> because you can't umount an open filesystem, and you can't really close
> it (I fiddled with prototyping some of this, but its ugly and painful
> and bizarre and outside my area of expertise :)
> 
> FWIW, the current prototype tries to do a pci hotplug if the above
> routines aren't implemented in struct pci_driver.  It can recover 
> from pci errors on ethernet cards, and I have one scsi driver that
> successfully recovers with above API, and am working on adding recovery
> to the symbios driver.
> 
> --linas
-- 
Benjamin Herrenschmidt <[EMAIL PROTECTED]>



Re: [PATCH/RFC] I/O-check interface for driver's error handling

2005-03-01 Thread Benjamin Herrenschmidt
On Tue, 2005-03-01 at 18:19 +0100, Andi Kleen wrote:
> Hidetoshi Seto <[EMAIL PROTECTED]> writes:
> 
> >
> > int sample_read_with_iochk(struct pci_dev *dev, u32 *buf, int words)
> > {
> > unsigned long ofs = pci_resource_start(dev, 0) + DATA_OFFSET;
> > int i;
> >
> > /* Create magical cookie on the stack */
> > iocookie cookie;
> >
> > /* Critical section start */
> > iochk_clear(, );
> > {
> > /* Get the whole packet of data */
> > for (i = 0; i < words; i++)
> > *buf++ = ioread32(dev, ofs);
> > }
> > /* Critical section end. Did we have any trouble? */
> > if ( iochk_read() ) return -1;
> 
> Looks good for handling PCI-Express errors.
> 
> But what would the default handling be? It would be nice if there
> was a simple way for a driver to say "just shut me down on an error"
> without adding iochk_* to each function. Ideally this would be just
> a standard callback that knows how to clean up the driver.

I think that would be the lack of a callback, see other messages.
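
[Illustration.] For readers skimming the thread, the bracket pattern in the
quoted sample (clear, do the I/O, then check) can be modelled in user space as
below. This is a sketch under assumptions: every name ending in _model, the
bus_error_count variable and fake_ioread32() are invented for the example and
are not the proposed kernel interface.

```c
#include <stdint.h>

typedef unsigned long iocookie_model;

/* Illustration only: bumped by the simulated hardware on every error. */
unsigned long bus_error_count;

/* Start of the critical section: remember how many errors we had so far. */
void iochk_clear_model(iocookie_model *cookie)
{
	*cookie = bus_error_count;
}

/* End of the critical section: non-zero if an error happened in between. */
int iochk_read_model(const iocookie_model *cookie)
{
	return bus_error_count != *cookie;
}

/* Simulated register read; a failing read returns all ones. */
uint32_t fake_ioread32(int fail)
{
	if (fail) {
		bus_error_count++;
		return 0xffffffffu;
	}
	return 0x1234;
}

/* Read a packet of words; fail_at selects the word whose read fails
 * (use -1 for no failure). Returns 0 on success, -1 on an I/O error. */
int read_words_checked(uint32_t *buf, int words, int fail_at)
{
	iocookie_model cookie;
	int i;

	iochk_clear_model(&cookie);
	for (i = 0; i < words; i++)
		buf[i] = fake_ioread32(i == fail_at);
	return iochk_read_model(&cookie) ? -1 : 0;
}
```

The point of the pattern is that the caller checks once per packet instead of
after every single read.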

> > +void iochk_clear(iocookie *cookie, struct pci_dev *dev)
> > +{
> > +   local_irq_save(*cookie);
> > +}
> > +
> > +int iochk_read(iocookie *cookie)
> > +{
> > +   local_irq_restore(*cookie);
> > +   return 0;
> > +}
> 
> These should be inlined.
> 
> > +EXPORT_SYMBOL(iochk_init);
> 
> This doesn't need to be exported.
> 
> -Andi
-- 
Benjamin Herrenschmidt <[EMAIL PROTECTED]>



Re: [PATCH/RFC] I/O-check interface for driver's error handling

2005-03-01 Thread Benjamin Herrenschmidt
On Tue, 2005-03-01 at 09:10 -0800, Jesse Barnes wrote:
> On Tuesday, March 1, 2005 8:59 am, Matthew Wilcox wrote:
> > The MCA handler has to go and figure out what the hell just happened
> > (was it a DIMM error, PCI bus error, etc).  OK, fine, it finds that it
> > was an error on PCI bus 73.  At this point, I think the architecture
> > error handler needs to call into the PCI subsystem and say "Hey, there
> > was an error, you deal with it".
> >
> > If we're lucky, we get all the information that allows us to figure
> > out which device it was (eg a destination address that matches a BAR),
> > then we could have a ->error method in the pci_driver that handles it.
> > If there's no ->error method, at least call ->remove so one device only
> > takes itself down.
> >
> > Does this make sense?
> 
> This was my thought too last time we had this discussion.  A completely 
> asynchronous call is probably needed in addition to Hidetoshi's proposed API, 
> since as you point out, the driver may not be running when an error occurs 
> (e.g. in the case of a DMA error or more general bus problem).  The async 
> ->error callback could do a total reset of the card, or something along those 
> lines as Jeff suggests, while the inline ioerr_clear/ioerr_check API could 
> potentially deal with errors as they happen (probably in the case of PIO 
> related errors), when the additional context may allow us to be smarter about 
> recovery.

What I think we need is an async call that takes:

 - an opaque blob with the error information and accessors (see my
reply to Jeff)

 - a slot state (slot isolated, slot has been reset, slot has IOs /DMA
enabled/disabled, must probably be a bitmask)

 - a bit mask of possible actions the driver can request (nothing, reset
slot, re-enable IOs, ...)

I'm afraid tho that the combinatorial explosion will make it difficult for
drivers to do the right thing. Maybe we can simplify the mechanism
to archs that can just 1) re-enable IOs, 2) reset slot.

Note that the reason to re-enable IOs before trying to reset the slot is
that some devices will allow you to extract diagnostic information, and
it's always useful to gather as much information as possible upon an
error of this type.
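
[Illustration.] One possible shape for such a callback, written as plain C so
it can be compiled outside the kernel. Everything here is hypothetical: the
enum values, struct error_info and example_error_cb() are made-up names that
only mirror the three parameters listed above, not an existing interface.

```c
#include <stddef.h>

/* Hypothetical slot-state bits passed to the driver. */
enum slot_state {
	SLOT_ISOLATED    = 1 << 0,
	SLOT_WAS_RESET   = 1 << 1,
	SLOT_IO_ENABLED  = 1 << 2,
	SLOT_DMA_ENABLED = 1 << 3,
};

/* Hypothetical actions the driver may request in return. */
enum slot_action {
	ACTION_NONE      = 0,
	ACTION_ENABLE_IO = 1 << 0,
	ACTION_RESET     = 1 << 1,
};

/* Opaque error blob; a real design would add accessor functions. */
struct error_info;

/* Example driver policy: first ask for IOs back so diagnostics can be
 * read, then ask for a slot reset, otherwise request nothing. Only
 * actions present in the "allowed" mask are ever returned. */
unsigned int example_error_cb(const struct error_info *info,
			      unsigned int state, unsigned int allowed)
{
	(void)info;	/* unused in this sketch */

	if (!(state & SLOT_IO_ENABLED) && (allowed & ACTION_ENABLE_IO))
		return ACTION_ENABLE_IO;
	if (!(state & SLOT_WAS_RESET) && (allowed & ACTION_RESET))
		return ACTION_RESET;
	return ACTION_NONE;
}
```

An arch that can only reset the slot would pass just ACTION_RESET in the
allowed mask, and the same driver callback would still behave sensibly.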

Ben.




Re: [PATCH/RFC] I/O-check interface for driver's error handling

2005-03-01 Thread Benjamin Herrenschmidt
On Tue, 2005-03-01 at 08:49 -0800, Linus Torvalds wrote:
> 
> On Tue, 1 Mar 2005, Jeff Garzik wrote:
> > 
> > A new API handles none of this.
> 
> Ehh? 
> 
> The new API is what _allows_ a driver to care. It doesn't handle DMA, but
> I think that's because nobody knows how to handle it (ie it's probably
> hw-dependent and all existing implementations would thus be
> driver-specific anyway).

We do on pSeries. Additionally, another device on the same physical
segment can trigger an error and cause a slot isolation on us even when
we do no IOs, so we also need asynchronous notification.

> And in the sense of "any general new api handles none of it", your 
> argument doesn't make sense. The _old_ IO API's clearly don't handle it. 
> So if you seem to say that "A new API" can't handle it either, then that 
> translates to "no API can ever handle it". Fair enough, if you think it's 
> impossible, but clearly you can handle some things.
> 
> And yes, CLEARLY drivers will have to do all the heavy lifting. 

I think Seto has a good start, though it's not enough. See my other
mail to Jeff for my other thoughts on the issue. I'm still trying to come
up with an API I'm happy with myself, however, since it's not a simple issue.

> I don't expect most drivers to care. In fact, I expect about five to ten
> drivers to be converted to really care, and then for some foreseeable time
> you'll have to be very picky about your hardware if you care about PCI 
> parity errors etc. Most people don't, and most drivers won't be written in 
> environments where they can be reasonably tested.

On pSeries, we'll default, for drivers that don't care, to triggering a
PCI unplug, slot reset, PCI re-plug. This is perfect for ethernet
drivers for example. But it's a real pain for block devices since they
won't be able to recover (and we can't even force unmount the dangling
filesystem afaik).

For drivers like IPR SCSI, we want to expose a richer API so they can
make use of the features provided by the platform, like resetting the
slot.

But the above also needs to be able to differentiate a driver that cares
from a driver that doesn't. I think an additional callback in pci_driver
for notifying of async events (error happened, slot has been reset, IOs
are re-enabled, whatever we define ...) is a good idea, since the
presence of a callback is a good enough indication to the core of whether
the driver has its own recovery strategy or not.
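
The "presence of a callback" test could be sketched as follows. This is
an illustration only: example_pci_driver is a stand-in for the relevant
part of pci_driver, and the error_event member and the recovery_path
values are assumptions, not anything that exists today.

```c
#include <stddef.h>

enum recovery_path {
    RECOVERY_DEFAULT_REPLUG, /* unplug, slot reset, re-plug */
    RECOVERY_DRIVER_DRIVEN   /* driver has its own strategy */
};

/* Hypothetical stand-in for the relevant part of pci_driver. */
struct example_pci_driver {
    const char *name;
    /* Hypothetical async hook; NULL if the driver does not care. */
    int (*error_event)(void *dev, int event);
};

/* Trivial callback so the example has a driver that "cares". */
int example_error_event(void *dev, int event)
{
    (void)dev;
    (void)event;
    return 0;
}

/* Core-side decision: the callback's presence is the only signal used. */
enum recovery_path choose_recovery(const struct example_pci_driver *drv)
{
    if (drv == NULL || drv->error_event == NULL)
        return RECOVERY_DEFAULT_REPLUG;
    return RECOVERY_DRIVER_DRIVEN;
}
```

A driver that leaves error_event unset gets the default
unplug/reset/re-plug handling; one that fills it in is left to drive its
own recovery.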

> That's just a fact. Anybody who expects all drivers to suddenly start 
> doing IO checks is just living in a dream world.
> 
>   Linus
-- 
Benjamin Herrenschmidt <[EMAIL PROTECTED]>



Re: x86_64: 32bit emulation problems

2005-03-01 Thread Andi Kleen
On Tue, Mar 01, 2005 at 11:10:38PM +0100, Andreas Schwab wrote:
> That's because there are some values in the stat64 buffer delivered by the
> kernel which cannot be packed into the stat buffer that you pass to stat.
> Use stat64 or _FILE_OFFSET_BITS=64.

If that had been the case strace would have reported EOVERFLOW
or E2BIG. But it returned 0 according to the log that was posted.

-Andi


Re: [PATCH/RFC] I/O-check interface for driver's error handling

2005-03-01 Thread Benjamin Herrenschmidt

> I have been thinking about PCI system and parity errors, and how to 
> handle them.  I do not think this is the correct approach.
> 
> A simple retry is... too simple.  If you are having a massive problem on 
> your PCI bus, more action should be taken than a retry.

It goes beyond that, see below.

> In my opinion each driver needs to be aware of PCI sys/parity errs, and 
> handle them.  For network drivers, this is rather simple -- check the 
> hardware, then restart the DMA engine.  Possibly turning off 
> TSO/checksum to guarantee that bad packets are not accepted.  For SATA 
> and SCSI drivers, this is more complex, as one must retry a number of 
> queued disk commands, after resetting the hardware.
> 
> A new API handles none of this.

On IBM pSeries machines (and I'm trying to figure out an API to deal with
that generically for drivers), upon a PCI error (either MMIO error or
DMA error), the slot is put in isolation automatically.

From this point, we can instruct the firmware to 1) re-enable MMIO, 2)
re-enable DMA, 3) proceed to a slot reset and re-enable MMIO & DMA.

That allows all sorts of recovery strategies. However, obviously, not all
architectures provide those facilities.

So I'm looking into a way to expose a generic API to drivers that would
allow them to use those facilities when present, and/or fall back to
whatever they can do when not (or just retry or even no recovery).

I have some ideas, but am not fully happy with them yet. But part of the
problem is the notification of the driver.

Checking IOs is one thing, what to do once a failure is detected is
another. Also, we need asynchronous notification, since a driver may
well be idle, not doing any IO, while the bus segment on which it's
sitting is getting isolated because another card on the same segment (or
another function on the same card) triggered an error.

Then, we need at least several back-and-forth callbacks. I'm thinking
about an additional callback in struct pci_driver with a message and a
state indicating what happened, and returning whether to proceed or not.
I'll try to write down the details in a later email.

Finally, another issue is the type of error information. Various systems
may provide various details; for example, some systems, upon a DMA error,
can provide you with the actual address that faulted. That information can
be very useful for diagnosing the issue (since some errors are actual bugs;
for example, we spent a lot of time chasing issues with e1000 vs.
barriers). An "error cookie" is I think a good idea, possibly with
various accessors to extract data from it, and maybe a function to dump
the content in ASCII form into some buffer...
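
One possible shape for such an "error cookie", as a sketch only. The
structure layout, the accessor names and the dump format are all
assumptions made for the example; drivers would be expected to see the
cookie as opaque and use only the accessors.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical cookie contents; platform code fills this in. */
struct example_err_cookie {
    int      is_dma;      /* 1 = DMA error, 0 = MMIO error */
    int      has_addr;    /* platform reported a faulting address */
    uint64_t fault_addr;
};

int example_err_is_dma(const struct example_err_cookie *c)
{
    return c->is_dma;
}

/* Returns 1 and fills *addr if the platform provided an address. */
int example_err_fault_addr(const struct example_err_cookie *c, uint64_t *addr)
{
    if (!c->has_addr)
        return 0;
    *addr = c->fault_addr;
    return 1;
}

/* Dump the cookie in ASCII form; returns what snprintf() returns. */
int example_err_dump(const struct example_err_cookie *c, char *buf, size_t len)
{
    const char *kind = c->is_dma ? "DMA" : "MMIO";

    if (c->has_addr)
        return snprintf(buf, len, "%s error at 0x%llx", kind,
                        (unsigned long long)c->fault_addr);
    return snprintf(buf, len, "%s error, no address", kind);
}
```

Because callers go through accessors, a platform that cannot report the
faulting address just returns 0 from example_err_fault_addr() instead of
forcing a different structure on the driver.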

Ben.






Re: [2.6 patch] SCSI: possible cleanups

2005-03-01 Thread Christoph Hellwig
On Tue, Mar 01, 2005 at 09:40:48AM -0500, Luben Tuikov wrote:
> On 03/01/05 03:14, Douglas Gilbert wrote:
> >>  - scsi_error.c: scsi_normalize_sense
> >
> >
> >I introduced scsi_normalize_sense() recently, Christoph H.
> >proposed it should be static but Luben Tuikov (aic7xxx
> >maintainer) said he wished to use it in the future.
> >Hence it was left global.
> 
> Hi guys,
> 
> I think the idea of normalized sense is very good.
> Basically the question is if LLDD would submit normalized
> sense to SCSI Core or whether they would submit a pointer
> to raw sense data as returned by the device and let SCSI
> Core decipher it.
> 
> If the former, then it should be global, if the latter then
> it should be static to SCSI Core.

Doing it in the core means less duplication and avoiding updating
all drivers.
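
For readers unfamiliar with the term, "normalizing" sense data means
pulling the sense key, ASC and ASCQ out of either the fixed format
(response codes 0x70/0x71) or the descriptor format (0x72/0x73) into one
common structure. The sketch below is a simplified user-space
illustration of that idea, not the kernel's scsi_normalize_sense(); the
names are made up and the additional-length byte is not consulted.

```c
#include <stddef.h>
#include <stdint.h>

struct example_sense {
    uint8_t response_code;   /* 0x70..0x73 when valid */
    uint8_t sense_key;
    uint8_t asc;
    uint8_t ascq;
};

/* Returns 1 if the buffer held recognisable sense data, else 0. */
int example_normalize_sense(const uint8_t *buf, size_t len,
                            struct example_sense *out)
{
    out->response_code = 0;
    out->sense_key = 0;
    out->asc = 0;
    out->ascq = 0;

    if (buf == NULL || len < 1)
        return 0;

    out->response_code = buf[0] & 0x7f;
    if (out->response_code < 0x70 || out->response_code > 0x73)
        return 0;

    if (out->response_code >= 0x72) {
        /* Descriptor format: key, ASC, ASCQ in bytes 1..3. */
        if (len > 1)
            out->sense_key = buf[1] & 0x0f;
        if (len > 2)
            out->asc = buf[2];
        if (len > 3)
            out->ascq = buf[3];
    } else {
        /* Fixed format: key in byte 2, ASC/ASCQ in bytes 12/13. */
        if (len > 2)
            out->sense_key = buf[2] & 0x0f;
        if (len > 13) {
            out->asc = buf[12];
            out->ascq = buf[13];
        }
    }
    return 1;
}
```

Whichever layer does this, the caller afterwards only looks at one
structure and no longer needs to know which format the device used.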



Re: [PATCH] New operation for kref to help avoid locks

2005-03-01 Thread Arjan van de Ven

> Just doing an atomic operation is not faster than doing a lock, an 
> atomic operation, then an unlock?  Am I missing something?

if the lock and the atomic are on the same cacheline they're the same
cost on most modern cpus...





Re: x86_64: 32bit emulation problems

2005-03-01 Thread Andreas Schwab
Bernd Schubert <[EMAIL PROTECTED]> writes:

>> It is most likely some kind of user space problem.  I would change
>> it to int err = stat(dir, );
>> and then go through it with gdb and see what value err gets assigned.
>>
>> I cannot see any kernel problem.
>
> The err value will become -1 here.

That's because there are some values in the stat64 buffer delivered by the
kernel which cannot be packed into the stat buffer that you pass to stat.
Use stat64 or _FILE_OFFSET_BITS=64.
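
A minimal sketch of the _FILE_OFFSET_BITS=64 route, assuming a glibc-style
C library. With the macro defined before the first system header, stat()
and struct stat use the 64-bit variants, so a 32-bit program no longer
gets EOVERFLOW for values that do not fit the old structure. The helper
name stat_path is made up for the example.

```c
#define _FILE_OFFSET_BITS 64   /* must come before any system header */

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Returns 0 on success, or the errno value reported by stat(). */
int stat_path(const char *path, struct stat *st)
{
    if (stat(path, st) == -1) {
        int saved = errno;   /* save before any other libc call */

        fprintf(stderr, "stat(%s): %s\n", path, strerror(saved));
        return saved;
    }
    return 0;
}
```

The same effect can be had without touching the source by passing
-D_FILE_OFFSET_BITS=64 on the compiler command line.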

>  Trond Myklebust already suggested to look at the results of errno:
>
> On Tuesday 01 March 2005 00:43, Bernd Schubert wrote:
>> On Monday 28 February 2005 23:26, you wrote:
>> > Given that strace shows that both syscalls (stat64() and stat())
>> > succeed,

The trace does not say anything about the user-level stat().

>> [EMAIL PROTECTED] tests>./test_stat32 /mnt/test/yp
>> stat for /mnt/test/yp failed
>> ernno: 75 (Value too large for defined data type)
>>
>> But why does stat64() on a 64-bit kernel tries to fill in larger data than
>> on a 32-bit kernel and larger data also only for nfs-mount points? Hmm, I
>> will tomorrow compare the tcp-packges sent by the server.
>
> So I still think thats a kernel bug.

This has nothing to do with the kernel.

Andreas.

-- 
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [PATCH] ppc32: uninorth-agp suspend support

2005-03-01 Thread Benjamin Herrenschmidt
On Tue, 2005-03-01 at 22:23 +0100, Pavel Machek wrote:
> Hi!
> 
> > (This is for -mm, to be merged along with the aty128fb and radeonfb
> > related patches).
> > 
> > This patch adds suspend/resume support to the Apple UniNorth AGP bridge
> > to make sure AGP is properly disabled when the machine goes to sleep.
> > Without this, the r300 based laptops will fail to wakeup from sleep when
> > using the new experimental r300 DRI driver. It should also improve
> > reliability in general with other chips.
> 
> > --- linux-work.orig/drivers/char/agp/uninorth-agp.c 2005-03-01 
> > 13:53:32.0 +1100
> > +++ linux-work/drivers/char/agp/uninorth-agp.c  2005-03-01 
> > 14:36:54.0 +1100
> > @@ -155,6 +161,56 @@
> > uninorth_tlbflush(NULL);
> >  }
> >  
> > +#ifdef CONFIG_PM
> > +static int agp_uninorth_suspend(struct pci_dev *pdev, u32 state)
> 
> pm_message_t state, please.

Oops :) 

Date: Tue, 1 Mar 2005 11:57:58 +1100
From: Paul Mackerras <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: agp sleep patch

Index: linux-work/drivers/char/agp/uninorth-agp.c
===
--- linux-work.orig/drivers/char/agp/uninorth-agp.c 2005-03-01 
13:53:32.0 +1100
+++ linux-work/drivers/char/agp/uninorth-agp.c  2005-03-02 09:01:00.0 
+1100
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include "agp.h"
@@ -51,6 +52,11 @@
 
 static void uninorth_cleanup(void)
 {
+   u32 tmp;
+
+   pci_read_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, &tmp);
+   if (!(tmp & UNI_N_CFG_GART_ENABLE))
+   return;
pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL,
UNI_N_CFG_GART_ENABLE | UNI_N_CFG_GART_INVAL);
pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL,
@@ -155,6 +161,59 @@
uninorth_tlbflush(NULL);
 }
 
+#ifdef CONFIG_PM
+static int agp_uninorth_suspend(struct pci_dev *pdev, pm_message_t state)
+{
+   u32 cmd;
+   u8 agp;
+   struct pci_dev *device = NULL;
+
+   if (state != PMSG_SUSPEND)
+   return 0;
+
+   /* turn off AGP on the video chip, if it was enabled */
+   for_each_pci_dev(device) {
+   /* Don't touch the bridge yet, device first */
+   if (device == pdev)
+   continue;
+   /* Only deal with devices on the same bus here, no Mac has a P2P
+* bridge on the AGP port, and mucking around the entire PCI tree
+* is source of problems on some machines because of a bug in
+* some versions of pci_find_capability() when hitting a dead device
+*/
+   if (device->bus != pdev->bus)
+   continue;
+   agp = pci_find_capability(device, PCI_CAP_ID_AGP);
+   if (!agp)
+   continue;
+   pci_read_config_dword(device, agp + PCI_AGP_COMMAND, &cmd);
+   if (!(cmd & PCI_AGP_COMMAND_AGP))
+   continue;
+   printk("uninorth-agp: disabling AGP on device %s\n", pci_name(device));
+   cmd &= ~PCI_AGP_COMMAND_AGP;
+   pci_write_config_dword(device, agp + PCI_AGP_COMMAND, cmd);
+   }
+   
+   /* turn off AGP on the bridge */
+   agp = pci_find_capability(pdev, PCI_CAP_ID_AGP);
+   pci_read_config_dword(pdev, agp + PCI_AGP_COMMAND, &cmd);
+   if (cmd & PCI_AGP_COMMAND_AGP) {
+   printk("uninorth-agp: disabling AGP on bridge %s\n", pci_name(pdev));
+   cmd &= ~PCI_AGP_COMMAND_AGP;
+   pci_write_config_dword(pdev, agp + PCI_AGP_COMMAND, cmd);
+   }
+   /* turn off the GART */
+   uninorth_cleanup();
+
+   return 0;
+}
+
+static int agp_uninorth_resume(struct pci_dev *pdev)
+{
+   return 0;
+}
+#endif
+
 static int uninorth_create_gatt_table(void)
 {
char *table;
@@ -369,6 +428,10 @@
.id_table   = agp_uninorth_pci_table,
.probe  = agp_uninorth_probe,
.remove = agp_uninorth_remove,
+#ifdef CONFIG_PM
+   .suspend= 

Re: 2.6.11-rc5-mm1

2005-03-01 Thread Andrew Morton
Chris Wright <[EMAIL PROTECTED]> wrote:
>
> * Andrew Morton ([EMAIL PROTECTED]) wrote:
> > - I seem to be getting a lot of patches which don't compile if you breathe
> >   on the .config file, let alone if you try them on another architecture.  
> > It
> >   would be nice to receive less such patches, please.
> 
> The ia64 audit bit is likely my fault from the audit header detangle.

I figured something like that, but the change is a good one anyway.  IOW:
your cleanup exposed a prior problem...



Re: [Fastboot] Re: [RFC][PATCH 3/3] Kdump: Export crash notes section address through sysfs

2005-03-01 Thread Randy.Dunlap
Andrew Morton wrote:
> Vivek Goyal <[EMAIL PROTECTED]> wrote:
>> o Following patch exports kexec global variable "crash_notes" to user space
>>   through sysfs as kernel attribute in /sys/kernel.
>
> It breaks the x86_64 build.  A fix for that is below.
>
> Please test kexec/kdump patches on all three architectures, both with your
> config option enabled and with it disabled.  There are cross-compilers at
> http://developer.osdl.org/dev/plm/cross_compile/

BTW:
You can download the cross_compile tools and run them yourself or you
can submit a patch to the PLM tool and it will run 8 arch. builds
for you.
--
~Randy


Re: 2.6.11-rc5

2005-03-01 Thread Mws
On Tuesday 01 March 2005 22:38, Benjamin Herrenschmidt wrote:
> On Tue, 2005-03-01 at 12:36 +0100, Mws wrote:
> > hi benjamin
> > 
> > now i had some spare time to do some investigation
> > 
> > booting the 2.6.11-rc5 with radeonfb.default_dynclk=0 or with -1
> > brings up a framebuffer console. everything is fine.
> > starting xorg-x11 with Ati binary only drivers just brings up a black screen
> > without a mouse cursor and freezes the whole machine. even network etc.
> > is no more reachable from outside the machine. worst thing out of that
> > a tail on the log files (on another machine) does immediately stop - also no
> > output is written to syslog :/
> > 
> > next scenario - test 2.6.11-rc5 with radeonfb.default_dynclk=0 and -1
> > starting xorg-x11 with Xorg Radeon driver. 
> > a grey screen comes up - mouse cursor is visible and also able to move for
> > 5 - 8 seconds after screen display - then freezes the whole machine again.
> 
> Ok, so it's not dynamic clocks. At this point, i have no idea what's
> going on. I don't yet have any access to PCI Express hardware. You
> should report this to X.org list where others can try to help me track
> this down.
> 
> Ben.

it's possible to do so, but i also will try to find out what's going on.
i don't know if i will have success.

regards
marcel





Re: [PATCH/RFC] A method for clearing out page cache

2005-03-01 Thread Pavel Machek
Hi!

> So what it comes down to is
> 
> sys_free_node_memory(long node_id, long pages_to_make_free, long what_to_free)
> 
> where `what_to_free' consists of a bunch of bitflags (unmapped pagecache,
> mapped pagecache, anonymous memory, slab, ...).

Heh, swsusp needs shrink_all_memory() and I'd like to use something
more generic as shrink_all_memory() does not seem to work properly. I
guess that loop over all node_ids should be easy ;-).
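
The "loop over all node_ids" idea could look roughly like this. It is a
sketch under stated assumptions: the FREE_* flag names are invented, the
per-node call is passed in as a function pointer because the proposed
sys_free_node_memory() does not exist yet, and fake_free_node() is only a
stub so the example can run.

```c
/* Hypothetical bit flags for the proposed what_to_free argument. */
#define FREE_UNMAPPED_PAGECACHE (1L << 0)
#define FREE_MAPPED_PAGECACHE   (1L << 1)
#define FREE_ANON               (1L << 2)
#define FREE_SLAB               (1L << 3)

/* Shape of the proposed per-node call; returns pages actually freed. */
typedef long (*free_node_fn)(long node_id, long pages_to_make_free,
                             long what_to_free);

/* Stub: pretends each node can give up at most 10 pagecache pages. */
long fake_free_node(long node_id, long pages_to_make_free, long what_to_free)
{
    (void)node_id;
    if (!(what_to_free & FREE_UNMAPPED_PAGECACHE))
        return 0;
    return pages_to_make_free < 10 ? pages_to_make_free : 10;
}

/* Ask every node in turn until 'pages' pages have been freed in total. */
long shrink_all_nodes(free_node_fn free_node, long nr_nodes,
                      long pages, long what)
{
    long freed = 0;
    long node;

    for (node = 0; node < nr_nodes && freed < pages; node++) {
        long got = free_node(node, pages - freed, what);

        if (got > 0)
            freed += got;
    }
    return freed;
}
```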

Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!


Re: 2.6.11-rc5-mm1

2005-03-01 Thread Andrew Morton
Adrian Bunk <[EMAIL PROTECTED]> wrote:
>
> On Tue, Mar 01, 2005 at 01:27:41AM -0800, Andrew Morton wrote:
> >...
> > All 728 patches:
> >...
> > reiser4-rcu-barrier.patch
> >   reiser4: add rcu_barrier() synchronization point
> 
> Considering the patent situation at least in the USA, the 
> EXPORT_SYMBOL(rcu_barrier) has to become an EXPORT_SYMBOL_GPL.

I'll make that change.

> > reiser4-export-inode_lock.patch
> >   reiser4: export inode_lock to modules
> >...
> 
> __iget seems to be no longer used by reiser4.
> This part of the patch can therefore be dropped.

And that one.


Re: [PATCH] New operation for kref to help avoid locks

2005-03-01 Thread Corey Minyard
Arjan van de Ven wrote:
> On Tue, 2005-03-01 at 12:15 -0800, Greg KH wrote:
>> On Sat, Feb 26, 2005 at 04:23:04PM -0600, Corey Minyard wrote:
>>> Add a routine to kref that allows the kref_put() routine to be
>>> unserialized even when the get routine attempts to kref_get()
>>> an object without first holding a valid reference to it.  This is
>>> useful in situations where this happens multiple times without
>>> freeing the object, as it will avoid having to do a lock/semaphore
>>> except on the final kref_put().
>>>
>>> This also adds some kref documentation to the Documentation
>>> directory.
>>
>> I like the first part of the documentation, that's nice.
>>
>> But I don't like the new kref_get_with_check() function that you
>> implemented.  If you look in the -mm tree, kref_put() now returns if
>> this was the last put on the reference count or not, to help with lists
>> of objects with a kref in it.
>>
>> Perhaps you can use that to implement what you need instead?

Yes, that helps a lot.  I had actually already implemented something 
like that :).  But that's a different thing than avoiding the lock.

It's just that with the I2C stuff, you may be calling kref_put() 20-30 
times for a single operation.  That's a lot of lock/unlock operations.  
But it is weird, so I understand.  Thanks.

> note that I'm not convinced the "lockless" implementation actually is
> faster. It still uses an atomic variable, which is just as expensive as
> taking a lock normally...

Just doing an atomic operation is not faster than doing a lock, an 
atomic operation, then an unlock?  Am I missing something?

-Corey


Re: 2.6.11-rc5-mm1

2005-03-01 Thread Chris Wright
* Andrew Morton ([EMAIL PROTECTED]) wrote:
> - I seem to be getting a lot of patches which don't compile if you breathe
>   on the .config file, let alone if you try them on another architecture.  It
>   would be nice to receive less such patches, please.

The ia64 audit bit is likely my fault from the audit header detangle.
I did try w/ and w/out CONFIG_AUDITSYSCALL set, but only on one arch.
I also did a grep, but was only looking at audit_putname/getname, and
missed the others.  I just did a more complete grep of the symbols that
can get config'd away (including CONFIG_AUDIT as well), and I think
there's a few more missing pieces.  Sorry about that.  Jeff, Ralf,
Martin, these look ok?

thanks,
-chris
--

A closer sweep of configurable audit symbols shows the following need
audit.h included when audit.h is detangled from fs.h as in -mm.

arch/um/kernel/ptrace.c    for audit_syscall_entry/exit
arch/s390/kernel/ptrace.c  for audit_syscall_entry/exit
arch/mips/kernel/ptrace.c  for audit_syscall_entry/exit

Signed-off-by: Chris Wright <[EMAIL PROTECTED]>

= arch/um/kernel/ptrace.c 1.21 vs edited =
--- 1.21/arch/um/kernel/ptrace.c    2005-01-20 20:59:15 -08:00
+++ edited/arch/um/kernel/ptrace.c  2005-03-01 13:23:06 -08:00
@@ -9,6 +9,7 @@
 #include "linux/smp_lock.h"
 #include "linux/security.h"
 #include "linux/ptrace.h"
+#include "linux/audit.h"
 #ifdef CONFIG_PROC_MM
 #include "linux/proc_mm.h"
 #endif
= arch/s390/kernel/ptrace.c 1.28 vs edited =
--- 1.28/arch/s390/kernel/ptrace.c  2005-01-04 18:48:19 -08:00
+++ edited/arch/s390/kernel/ptrace.c    2005-03-01 13:22:45 -08:00
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include <linux/audit.h>
 
 #include 
 #include 
= arch/mips/kernel/ptrace.c 1.15 vs edited =
--- 1.15/arch/mips/kernel/ptrace.c  2005-01-30 22:20:14 -08:00
+++ edited/arch/mips/kernel/ptrace.c    2005-03-01 13:24:48 -08:00
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include <linux/audit.h>
 
 #include 
 #include 


Re: x86_64: 32bit emulation problems

2005-03-01 Thread Andi Kleen
On Tue, Mar 01, 2005 at 10:07:01PM +0100, Bernd Schubert wrote:
> Hello Andi,
> 
> sorry, due to some mail sending/refusing problems, I had to resend to the 
> nfs-list, which prevented the answers there to be posted to the other CCs.
> 
> > It is most likely some kind of user space problem.  I would change
> > it to int err = stat(dir, );
> > and then go through it with gdb and see what value err gets assigned.
> >
> > I cannot see any kernel problem.
> 
> The err value will become -1 here.

strace didn't say so, and normally it doesn't lie about things like this.

> > [EMAIL PROTECTED] tests>./test_stat32 /mnt/test/yp
> > stat for /mnt/test/yp failed
> > ernno: 75 (Value too large for defined data type)

errno is undefined unless a system call returned -1 before or
you set it to 0 before.
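
The "set it to 0 before" half of that rule matters for calls whose return
value alone cannot signal failure. A small illustration using strtol(),
which is unrelated to the stat() problem here and chosen only because its
return value is ambiguous; parse_long is a made-up helper name.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a long. Returns 0 on success, ERANGE on overflow, EINVAL if no
 * digits were consumed. errno is cleared first because strtol() can
 * legitimately return LONG_MAX or LONG_MIN, so a stale errno from an
 * earlier call would otherwise be mistaken for a fresh error.
 */
int parse_long(const char *s, long *out)
{
    char *end;

    errno = 0;
    *out = strtol(s, &end, 10);
    if (errno == ERANGE)
        return ERANGE;
    if (end == s)
        return EINVAL;
    return 0;
}
```

For stat() the other half applies: check for a -1 return first and only
then look at errno.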

> > But why does stat64() on a 64-bit kernel tries to fill in larger data than

A 64bit kernel has no stat64(). All stats are 64bit.

> > on a 32-bit kernel and larger data also only for nfs-mount points? Hmm, I
> > will tomorrow compare the tcp-packges sent by the server.
> 
> So I still think thats a kernel bug.

Your data so far doesn't support this assertion.

-Andi


Re: [PATCH 1/8] lib/sort: Heapsort implementation of sort()

2005-03-01 Thread Andrew Morton
Matt Mackall <[EMAIL PROTECTED]> wrote:
>
> I'll queue this
> up for after the sort and ACL stuff gets merged.

Whew!

I don't know how long the ACL changes will take to get merged up - it's up to
Trond and he had quite a lot of rather robust comments on the first
iteration.



Re: 2.6.11-rc5

2005-03-01 Thread Benjamin Herrenschmidt
On Tue, 2005-03-01 at 12:36 +0100, Mws wrote:
> hi benjamin
> 
> now i had some spare time to do some investigation
> 
> booting the 2.6.11-rc5 with radeonfb.default_dynclk=0 or with -1
> brings up a framebuffer console. everything is fine.
> starting xorg-x11 with Ati binary only drivers just brings up a black screen
> without a mouse cursor and freezes the whole machine. even network etc.
> is no more reachable from outside the machine. worst thing out of that
> a tail on the log files (on another machine) does immediately stop - also no 
> output is written to syslog :/
> 
> next scenario - test 2.6.11-rc5 with radeonfb.default_dynclk=0 and -1
> starting xorg-x11 with Xorg Radeon driver. 
> a grey screen comes up - mouse cursor is visible and also able to move for
> 5 - 8 seconds after screen display - then freezes the whole machine again.

Ok, so it's not dynamic clocks. At this point, i have no idea what's
going on. I don't yet have any access to PCI Express hardware. You
should report this to X.org list where others can try to help me track
this down.

Ben.




Re: [PATCH] sparc: fix compile failure ("struct resource" related)

2005-03-01 Thread Andrew Morton
Alexey Dobriyan <[EMAIL PROTECTED]> wrote:
>
> Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>

Thanks.  Many of these fixups are due to a 64-bit-resource patch in Greg's
bk-pci tree which he has now reverted.  That being said:

- That patch will come back sometime

- Fixes like the below make sense anyway and can be merged any time.

- All the fixes which were only applicable when the 64-bit-resource patch
  is present have been sent to Greg for when that patch reemerges.
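
Why named initializers are the safer form regardless of that patch: a
positional initializer silently depends on member order and type, while a
designated one does not. The two layouts below are invented stand-ins
(they are not the kernel's struct resource, and the reordering is only an
assumption about what a future change might do); the same initializer
text works for both.

```c
/* Layout A: name first, as positional initializers would expect. */
struct example_res_a {
    const char *name;
    unsigned long start, end;
};

/* Layout B: hypothetical later layout, wider and reordered members. */
struct example_res_b {
    unsigned long long start, end;
    const char *name;
};

#define EXAMPLE_VADDR 0x1000UL
#define EXAMPLE_END   0x2000UL

/* Identical initializer text, correct for either layout. */
struct example_res_a res_a = {
    .name = "example", .start = EXAMPLE_VADDR, .end = EXAMPLE_END - 1
};
struct example_res_b res_b = {
    .name = "example", .start = EXAMPLE_VADDR, .end = EXAMPLE_END - 1
};
```

With a positional { "example", EXAMPLE_VADDR, EXAMPLE_END - 1 } the
second definition would try to put the string into .start.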


> --- linux-2.6.11-rc5-mm1/arch/sparc/kernel/ioport.c.orig  2005-03-01 
> 21:11:30.0 +0200
> +++ linux-2.6.11-rc5-mm1/arch/sparc/kernel/ioport.c   2005-03-01 
> 21:12:48.0 +0200
> @@ -54,11 +54,11 @@ static void _sparc_free_io(struct resour
>  
>  /* This points to the next to use virtual memory for DVMA mappings */
>  static struct resource _sparc_dvma = {
> - "sparc_dvma", DVMA_VADDR, DVMA_END - 1
> + .name = "sparc_dvma", .start = DVMA_VADDR, .end = DVMA_END - 1
>  };
>  /* This points to the start of I/O mappings, cluable from outside. */
>  /*ext*/ struct resource sparc_iomap = {
> - "sparc_iomap", IOBASE_VADDR, IOBASE_END - 1
> + .name = "sparc_iomap", .start = IOBASE_VADDR, .end = IOBASE_END - 1
>  };
>  
>  /*


Re: [PATCH] New operation for kref to help avoid locks

2005-03-01 Thread Greg KH
On Tue, Mar 01, 2005 at 10:02:43PM +0100, Arjan van de Ven wrote:
> On Tue, 2005-03-01 at 12:15 -0800, Greg KH wrote:
> > On Sat, Feb 26, 2005 at 04:23:04PM -0600, Corey Minyard wrote:
> > > Add a routine to kref that allows the kref_put() routine to be
> > > unserialized even when the get routine attempts to kref_get()
> > > an object without first holding a valid reference to it.  This is
> > > useful in situations where this happens multiple times without
> > > freeing the object, as it will avoid having to do a lock/semaphore
> > > except on the final kref_put().
> > > 
> > > This also adds some kref documentation to the Documentation
> > > directory.
> > 
> > I like the first part of the documentation, that's nice.
> > 
> > But I don't like the new kref_get_with_check() function that you
> > implemented.  If you look in the -mm tree, kref_put() now returns if
> > this was the last put on the reference count or not, to help with lists
> > of objects with a kref in it.
> > 
> > Perhaps you can use that to implement what you need instead?
> 
> note that I'm not convinced the "lockless" implementation actually is
> faster. It still uses an atomic variable, which is just as expensive as
> taking a lock normally...

I have never stated it would be "faster" that I know of, and you still
need a lock to protect some of the paths.  But that is documented in my
2004 ols paper about kref.
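
For illustration, a user-space sketch of a put that reports whether it
dropped the last reference, written with C11 atomics. It is not the
kernel's kref code and all example_* names are made up. It also shows
where a lock is still needed: looking an object up in a shared list and
taking a reference on it must be serialized against the final put by the
caller, which this sketch does not attempt.

```c
#include <stdatomic.h>

struct example_ref {
    atomic_int count;
};

/* Counts release calls so the behaviour can be observed. */
int example_released;

void example_release(struct example_ref *r)
{
    (void)r;
    example_released++;
}

void example_ref_init(struct example_ref *r)
{
    atomic_init(&r->count, 1);
}

/* Caller must already hold a valid reference. */
void example_ref_get(struct example_ref *r)
{
    atomic_fetch_add(&r->count, 1);
}

/* Returns 1 if this call dropped the last reference and ran release(). */
int example_ref_put(struct example_ref *r,
                    void (*release)(struct example_ref *))
{
    if (atomic_fetch_sub(&r->count, 1) == 1) {
        release(r);
        return 1;
    }
    return 0;
}
```

A caller holding a list lock can use the return value to decide whether
the object also has to be unlinked.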

thanks,

greg k-h


Re: [PATCH] ppc32: uninorth-agp suspend support

2005-03-01 Thread Pavel Machek
Hi!

> (This is for -mm, to be merged along with the aty128fb and radeonfb
> related patches).
> 
> This patch adds suspend/resume support to the Apple UniNorth AGP bridge
> to make sure AGP is properly disabled when the machine goes to sleep.
> Without this, the r300 based laptops will fail to wakeup from sleep when
> using the new experimental r300 DRI driver. It should also improve
> reliability in general with other chips.

> --- linux-work.orig/drivers/char/agp/uninorth-agp.c   2005-03-01 
> 13:53:32.0 +1100
> +++ linux-work/drivers/char/agp/uninorth-agp.c2005-03-01 
> 14:36:54.0 +1100
> @@ -155,6 +161,56 @@
>   uninorth_tlbflush(NULL);
>  }
>  
> +#ifdef CONFIG_PM
> +static int agp_uninorth_suspend(struct pci_dev *pdev, u32 state)

pm_message_t state, please.

-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!


[PATCH 3/3] openfirmware: implements hotplug for macio devices

2005-03-01 Thread Jeffrey Mahoney
This patch adds the hotplug routine for generating hotplug events when
devices are seen on the macio bus. It uses the attributes created by the
sysfs nodes to generate the hotplug environment vars for userspace.

In order for hotplug to work with macio devices, patches to module-init-tools
and hotplug must be applied. Those patches are available at:

ftp://ftp.suse.com/pub/people/jeffm/linux/macio-hotplug/

Signed-off-by: Jeff Mahoney <[EMAIL PROTECTED]>
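
For reference, the general pattern of packing "KEY=value" strings back to
back into one buffer and pointing envp[] entries at them can be sketched
in user space as below. This is not the patch's code: struct env_buf and
env_add() are invented for the example, and the variable values are
placeholders. The point it illustrates is that the write position
advances by the length of the string just written plus its NUL, tracked
in a single 'used' counter.

```c
#include <stdarg.h>
#include <stdio.h>

struct env_buf {
    char **envp;     /* array to fill, NULL-terminated */
    int num_envp;    /* number of slots in envp */
    int i;           /* next free slot */
    char *buf;       /* backing storage for the strings */
    int size;        /* size of buf in bytes */
    int used;        /* bytes of buf consumed so far */
};

/* Append one formatted string; returns 0 on success, -1 if out of room. */
int env_add(struct env_buf *e, const char *fmt, ...)
{
    va_list ap;
    int n;
    int room = e->size - e->used;

    /* Keep one slot free for the terminating NULL pointer. */
    if (e->i >= e->num_envp - 1 || room <= 0)
        return -1;

    va_start(ap, fmt);
    n = vsnprintf(e->buf + e->used, (size_t)room, fmt, ap);
    va_end(ap);
    if (n < 0 || n >= room)
        return -1;               /* output would have been truncated */

    e->envp[e->i++] = e->buf + e->used;
    e->used += n + 1;            /* step past this string and its NUL */
    e->envp[e->i] = NULL;
    return 0;
}
```

Usage would be one env_add() call per variable, for example
env_add(&e, "OF_NAME=%s", name), stopping at the first -1.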

diff -rupN -X dontdiff linux-2.6.8/drivers/macintosh/macio_asic.c linux-2.6.8.devel/drivers/macintosh/macio_asic.c
--- linux-2.6.8/drivers/macintosh/macio_asic.c  2004-08-14 01:36:45.0 -0400
+++ linux-2.6.8.devel/drivers/macintosh/macio_asic.c    2004-09-16 17:12:37.242920008 -0400
@@ -126,11 +126,77 @@ static int macio_device_resume(struct de
return 0;
 }
 
+static int macio_hotplug (struct device *dev, char **envp, int num_envp,
+  char *buffer, int buffer_size)
+{
+   struct macio_dev * macio_dev;
+   struct of_device * of;
+   char *scratch, *compat;
+   int i = 0;
+   int length = 0;
+   int cplen, seen = 0;
+
+   if (!dev)
+   return -ENODEV;
+
+   macio_dev = to_macio_device(dev);
+   if (!macio_dev)
+   return -ENODEV;
+
+   of = &macio_dev->ofdev;
+   scratch = buffer;
+
+   /* stuff we want to pass to /sbin/hotplug */
+   envp[i++] = scratch;
+   length += scnprintf (scratch, buffer_size - length, "OF_NAME=%s",
+of->node->name);
+   if ((buffer_size - length <= 0) || (i >= num_envp))
+   return -ENOMEM;
+   ++length;
+   scratch += length;
+
+   envp[i++] = scratch;
+   length += scnprintf (scratch, buffer_size - length, "OF_TYPE=%s",
+of->node->type);
+   if ((buffer_size - length <= 0) || (i >= num_envp))
+   return -ENOMEM;
+   ++length;
+   scratch += length;
+
+   envp[i++] = scratch;
+   length += scnprintf (scratch, buffer_size - length,
+"OF_COMPATIBLE=");
+   if ((buffer_size - length <= 0) || (i >= num_envp))
+   return -ENOMEM;
+   ++length;
+   scratch += length;
+
+   compat = (char *) get_property(of->node, "compatible", &cplen);
+   while (compat && cplen > 0) {
+   int l;
+   length += scnprintf (scratch, buffer_size - length,
+"%s%s", seen ? "," : "", compat);
+   if ((buffer_size - length <= 0) || (i >= num_envp))
+   return -ENOMEM;
+   length++;
+   scratch += length;
+   l = strlen (compat) + 1;
+   compat += l;
+   cplen -= l;
+   seen++;
+   }
+
+   envp[i] = NULL;
+
+   return 0;
+
+}
 extern struct device_attribute macio_dev_attrs[];
 
 struct bus_type macio_bus_type = {
.name   = "macio",
.match  = macio_bus_match,
+   .hotplug = macio_hotplug,
.suspend= macio_device_suspend,
.resume = macio_device_resume,
.dev_attrs = macio_dev_attrs,
 };
 
 static int __init macio_bus_driver_init(void)
-- 
Jeff Mahoney
SuSE Labs


[PATCH 1/3] openfirmware: generate device table for userspace

2005-03-01 Thread Jeffrey Mahoney
This patch converts the usage of struct of_match to struct of_device_id,
similar to pci_device_id. This allows a device table to be generated, which 
can be parsed by depmod(8) to generate a map file for module loading.
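
To illustrate the matching convention the conversion relies on, here is a small userspace sketch: table entries hold fixed-size character arrays, an empty string acts as a wildcard, and an all-empty entry terminates the table. The struct names, field sizes and match_sketch() are assumptions made for the example, and "compatible" is reduced to a single string compare rather than the kernel's device_is_compatible().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for the device-id structure; sizes are assumed. */
struct of_id_sketch {
	char name[32];
	char type[32];
	char compatible[128];
};

/* Illustrative stand-in for a device node. */
struct node_sketch {
	const char *name;
	const char *type;
	const char *compatible;	/* one string only, for simplicity */
};

/* Returns the first table entry whose non-empty fields all match the node,
 * or NULL. The table ends at the first entry with all fields empty. */
const struct of_id_sketch *match_sketch(const struct of_id_sketch *m,
					const struct node_sketch *n)
{
	assert(m != NULL);
	if (!n)
		return NULL;

	for (; m->name[0] || m->type[0] || m->compatible[0]; m++) {
		int match = 1;

		if (m->name[0])
			match &= n->name && !strcmp(m->name, n->name);
		if (m->type[0])
			match &= n->type && !strcmp(m->type, n->type);
		if (m->compatible[0])
			match &= n->compatible &&
				 !strcmp(m->compatible, n->compatible);
		if (match)
			return m;
	}
	return NULL;
}
```

Because the fields are arrays rather than pointers, a field left out of an initializer is simply an empty string, which is why the entries no longer need an explicit OF_ANY_MATCH.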

In order for hotplug to work with macio devices, patches to module-init-tools
and hotplug must be applied. Those patches are available at:

ftp://ftp.suse.com/pub/people/jeffm/linux/macio-hotplug/

Signed-off-by: Jeff Mahoney <[EMAIL PROTECTED]>

diff -rupN -X dontdiff linux-2.6.8/arch/ppc/syslib/of_device.c linux-2.6.8.devel/arch/ppc/syslib/of_device.c
--- linux-2.6.8/arch/ppc/syslib/of_device.c 2004-08-14 01:38:10.0 -0400
+++ linux-2.6.8.devel/arch/ppc/syslib/of_device.c   2004-09-16 17:12:37.212924568 -0400
@@ -3,6 +3,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -15,20 +16,20 @@
  * Used by a driver to check whether an of_device present in the
  * system is in its list of supported devices.
  */
-const struct of_match * of_match_device(const struct of_match *matches,
+const struct of_device_id * of_match_device(const struct of_device_id *matches,
const struct of_device *dev)
 {
if (!dev->node)
return NULL;
-   while (matches->name || matches->type || matches->compatible) {
+   while (matches->name[0] || matches->type[0] || matches->compatible[0]) {
int match = 1;
-   if (matches->name && matches->name != OF_ANY_MATCH)
+   if (matches->name[0])
match &= dev->node->name
&& !strcmp(matches->name, dev->node->name);
-   if (matches->type && matches->type != OF_ANY_MATCH)
+   if (matches->type[0])
match &= dev->node->type
&& !strcmp(matches->type, dev->node->type);
-   if (matches->compatible && matches->compatible != OF_ANY_MATCH)
+   if (matches->compatible[0])
match &= device_is_compatible(dev->node,
matches->compatible);
if (match)
@@ -42,7 +43,7 @@ static int of_platform_bus_match(struct 
 {
struct of_device * of_dev = to_of_device(dev);
struct of_platform_driver * of_drv = to_of_platform_driver(drv);
-   const struct of_match * matches = of_drv->match_table;
+   const struct of_device_id * matches = of_drv->match_table;
 
if (!matches)
return 0;
@@ -75,7 +76,7 @@ static int of_device_probe(struct device
int error = -ENODEV;
struct of_platform_driver *drv;
struct of_device *of_dev;
-   const struct of_match *match;
+   const struct of_device_id *match;
 
drv = to_of_platform_driver(dev->driver);
of_dev = to_of_device(dev);
diff -rupN -X dontdiff linux-2.6.8/drivers/i2c/busses/i2c-keywest.c linux-2.6.8.devel/drivers/i2c/busses/i2c-keywest.c
--- linux-2.6.8/drivers/i2c/busses/i2c-keywest.c  2004-09-16 17:10:39.329845520 -0400
+++ linux-2.6.8.devel/drivers/i2c/busses/i2c-keywest.c  2004-09-16 17:12:37.215924112 -0400
@@ -694,7 +694,7 @@ dispose_iface(struct device *dev)
 }
 
 static int
-create_iface_macio(struct macio_dev* dev, const struct of_match *match)
+create_iface_macio(struct macio_dev* dev, const struct of_device_id *match)
 {
return create_iface(dev->ofdev.node, &dev->ofdev.dev);
 }
@@ -706,7 +706,7 @@ dispose_iface_macio(struct macio_dev* de
 }
 
 static int
-create_iface_of_platform(struct of_device* dev, const struct of_match *match)
+create_iface_of_platform(struct of_device* dev, const struct of_device_id *match)
 {
return create_iface(dev->node, &dev->dev);
 }
@@ -717,10 +717,9 @@ dispose_iface_of_platform(struct of_devi
return dispose_iface(&dev->dev);
 }
 
-static struct of_match i2c_keywest_match[] = 
+static struct of_device_id i2c_keywest_match[] = 
 {
{
-   .name   = OF_ANY_MATCH,
.type   = "i2c",
.compatible = "keywest"
},
diff -rupN -X dontdiff linux-2.6.8/drivers/ide/ppc/pmac.c linux-2.6.8.devel/drivers/ide/ppc/pmac.c
--- linux-2.6.8/drivers/ide/ppc/pmac.c  2004-09-16 17:11:14.056566256 -0400
+++ linux-2.6.8.devel/drivers/ide/ppc/pmac.c  2004-09-16 17:12:37.221923200 -0400
@@ -1279,7 +1279,7 @@ pmac_ide_setup_device(pmac_ide_hwif_t *p
  * Attach to a macio probed interface
  */
 static int __devinit
-pmac_ide_macio_attach(struct macio_dev *mdev, const struct of_match *match)
+pmac_ide_macio_attach(struct macio_dev *mdev, const struct of_device_id *match)
 {
unsigned long base, regbase;
int irq;
@@ -1500,27 +1500,19 @@ pmac_ide_pci_resume(struct pci_dev *pdev
return rc;
 }
 
-static struct of_match pmac_ide_macio_match[] = 
+static struct of_device_id pmac_ide_macio_match[] = 
 {
{
.name   = "IDE",
-   .type   = OF_ANY_MATCH,
-   .compatible

[PATCH 2/3] openfirmware: adds sysfs nodes for openfirmware devices

2005-03-01 Thread Jeffrey Mahoney
This patch adds sysfs nodes that the hotplug userspace can use to load the
appropriate modules.

In order for hotplug to work with macio devices, patches to module-init-tools
and hotplug must be applied. Those patches are available at:

ftp://ftp.suse.com/pub/people/jeffm/linux/macio-hotplug/

Signed-off-by: Jeff Mahoney <[EMAIL PROTECTED]>

diff -rupN -X dontdiff linux-2.6.8/drivers/macintosh/macio_sysfs.c linux-2.6.8.devel/drivers/macintosh/macio_sysfs.c
--- linux-2.6.8/drivers/macintosh/macio_sysfs.c 1969-12-31 19:00:00.0 -0500
+++ linux-2.6.8.devel/drivers/macintosh/macio_sysfs.c   2004-09-16 17:12:37.244919704 -0400
@@ -0,0 +1,49 @@
+#include 
+#include 
+#include 
+#include 
+
+
+#define macio_config_of_attr(field, format_string)  \
+static ssize_t  \
+field##_show (struct device *dev, char *buf)\
+{   \
+struct macio_dev *mdev = to_macio_device (dev); \
+return sprintf (buf, format_string, mdev->ofdev.node->field); \
+}
+
+static ssize_t
+compatible_show (struct device *dev, char *buf)
+{
+struct of_device *of;
+char *compat;
+int cplen;
+int length = 0;
+
+of = &to_macio_device (dev)->ofdev;
+   compat = (char *) get_property(of->node, "compatible", &cplen);
+   if (!compat) {
+   *buf = '\0';
+   return 0;
+   }
+   while (cplen > 0) {
+   int l;
+   length += sprintf (buf, "%s%s", length ? "," : "", compat);
+   buf += length;
+   l = strlen (compat) + 1;
+   compat += l;
+   cplen -= l;
+   }
+
+   return length;
+}
+
+macio_config_of_attr (name, "%s");
+macio_config_of_attr (type, "%s");
+
+struct device_attribute macio_dev_attrs[] = {
+   __ATTR_RO(name),
+   __ATTR_RO(type),
+   __ATTR_RO(compatible),
+   __ATTR_NULL
+};
diff -rupN -X dontdiff linux-2.6.8/drivers/macintosh/Makefile linux-2.6.8.devel/drivers/macintosh/Makefile
--- linux-2.6.8/drivers/macintosh/Makefile  2004-08-14 01:37:40.0 -0400
+++ linux-2.6.8.devel/drivers/macintosh/Makefile  2004-09-16 17:12:37.252918488 -0400
@@ -4,7 +4,7 @@
 
 # Each configuration option enables a list of files.
 
-obj-$(CONFIG_PPC_PMAC) += macio_asic.o
+obj-$(CONFIG_PPC_PMAC) += macio_asic.o macio_sysfs.o
 
 obj-$(CONFIG_PMAC_PBOOK)   += mediabay.o
 obj-$(CONFIG_MAC_SERIAL)   += macserial.o

diff -rupN -X dontdiff linux-2.6.8/drivers/macintosh/macio_asic.c linux-2.6.8.devel/drivers/macintosh/macio_asic.c
--- linux-2.6.8/drivers/macintosh/macio_asic.c  2004-08-14 01:36:45.0 -0400
+++ linux-2.6.8.devel/drivers/macintosh/macio_asic.c  2004-09-16 17:12:37.242920008 -0400
@@ -126,10 +126,13 @@ static int macio_device_resume(struct de
return 0;
 }
 
+extern struct device_attribute macio_dev_attrs[];
+
 struct bus_type macio_bus_type = {
.name   = "macio",
.match  = macio_bus_match,
.suspend= macio_device_suspend,
.resume = macio_device_resume,
+   .dev_attrs = macio_dev_attrs,
 };
 
 static int __init macio_bus_driver_init(void)
-- 
Jeff Mahoney
SuSE Labs


[PATCH 0/3] openfirmware/macio: implements hotplug for macio devices

2005-03-01 Thread Jeffrey Mahoney

Hello all -

I posted these patches a while ago, and let them fall by the wayside.

The following 3 patches, combined with the userspace patches referenced below,
implement hotplug events for Open Firmware/macio devices such as Apple AirPort
wireless Ethernet cards.

* 01-openfirmware-device-table.diff
  - Converts struct of_match to a MODULE_DEVICE_TABLE-compatible
struct of_device_id
  - Uses the information to generate a device table parsable by
depmod(8)

* 02-openfirmware-sysfs.diff
  - Exports openfirmware variables via sysfs so that coldplug can read and
take appropriate action

* 03-openfirmware-hotplug.diff
  - Adds the hotplug routine for generating hotplug events. Uses the
information published to provide the hotplug environment variables to
userspace.

In addition to the kernel patches, userspace patches for hotplug and
module-init-tools are also required. These patches, including the kernel
patches, are available here:

ftp://ftp.suse.com/pub/people/jeffm/linux/macio-hotplug/

I'd appreciate any comments.

-Jeff

-- 
Jeff Mahoney
SuSE Labs




[PATCH] Bad page state mapcount

2005-03-01 Thread Hugh Dickins
A small change to the tests for "Bad page state", to avoid one class of
the page_remove_rmap BUG reports, giving more information while letting
the system continue: check page_mapcount (_mapcount != -1) rather than
page_mapped (_mapcount >= 0).

And how does _mapcount go bad?  In the case under study, it looks sure
now that an overheating(?) Pentium III sometimes gets confused by a pair
of instructions in the no-buddy-bitmap __free_pages_bulk, and clears the
PG_private bit from the _mapcount field while buddying around - changing
PG_private value changes the bit cleared from _mapcount.  Bad page state
mapcount:-4096 would have tracked this down much sooner, and will be
recognizable if other cpus show the same aberrant reaction to 2.6.11.
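
The arithmetic behind that -4096 can be checked with a small userspace sketch. It assumes the relationships stated above (page_mapped tests _mapcount >= 0, page_mapcount is nonzero whenever _mapcount != -1, i.e. _mapcount + 1) and uses bit 12 as a stand-in for PG_private; that bit number is an assumption chosen because it reproduces the quoted value. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: bit number picked so the result matches the -4096 quoted in
 * the message; it stands in for PG_private. */
#define PG_PRIVATE_SKETCH 12

/* An unmapped page has a raw _mapcount of -1. */
int page_mapped_sketch(int32_t raw)
{
	return raw >= 0;
}

int page_mapcount_sketch(int32_t raw)
{
	return raw + 1;
}

/* Clear one bit in the raw counter, as the misbehaving CPU appears to do. */
int32_t clear_bit_sketch(int32_t raw, int bit)
{
	assert(bit >= 0 && bit < 32);
	return (int32_t)((uint32_t)raw & ~(UINT32_C(1) << bit));
}
```

Clearing bit 12 of -1 gives -4097: the old page_mapped() style test sees a negative value and reports nothing, while the page_mapcount() style test sees -4096, which is nonzero and therefore flagged.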

The page_remove_rmap BUG does need to be replaced by more permissive and
informative handling, but I'm not yet ready to finalize such a patch.

Please admit Colin Harrison to the Order of the Iridescent Penguin,
for his tireless testing.

Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]>

--- 2.6.11-rc5-bk4/mm/page_alloc.c  2005-02-24 19:44:06.0 +
+++ linux/mm/page_alloc.c   2005-03-01 19:58:44.0 +
@@ -276,7 +276,7 @@ static inline void __free_pages_bulk (st
 
 static inline void free_pages_check(const char *function, struct page *page)
 {
-   if (page_mapped(page) ||
+   if (page_mapcount(page) ||
page->mapping != NULL ||
page_count(page) != 0 ||
(page->flags & (
@@ -404,7 +404,7 @@ void set_page_refs(struct page *page, in
  */
 static void prep_new_page(struct page *page, int order)
 {
-   if (page->mapping || page_mapped(page) ||
+   if (page->mapping || page_mapcount(page) ||
(page->flags & (
1 << PG_private |
1 << PG_locked  |


Re: x86_64: 32bit emulation problems

2005-03-01 Thread Bernd Schubert
Hello Andi,

sorry, due to some mail sending/refusing problems I had to resend to the
nfs-list, which prevented the answers there from being posted to the other CCs.

> It is most likely some kind of user space problem.  I would change
> it to int err = stat(dir, );
> and then go through it with gdb and see what value err gets assigned.
>
> I cannot see any kernel problem.

The err value will become -1 here.

Trond Myklebust already suggested looking at the value of errno:

On Tuesday 01 March 2005 00:43, Bernd Schubert wrote:
> On Monday 28 February 2005 23:26, you wrote:
> > Given that strace shows that both syscalls (stat64() and stat())
> > succeed, I expect the "problem" is probably just glibc setting an
> > EOVERFLOW error in the 32-bit case. That's what it is supposed to do if
> > a 64 bit value overflows the 32-bit buffers.
>
> Right, thanks.
>
> > Have you tried looking at errno?
>
> [EMAIL PROTECTED] tests>./test_stat32 /mnt/test/yp
> stat for /mnt/test/yp failed
> ernno: 75 (Value too large for defined data type)
>
> But why does stat64() on a 64-bit kernel try to fill in larger data than
> on a 32-bit kernel, and only for NFS mount points? Hmm, tomorrow I will
> compare the TCP packets sent by the server.

So I still think that's a kernel bug.
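
For reference, the errno check discussed above can be written along these lines. This is only a sketch of the kind of test program involved, not the actual test_stat32 source; stat_errno() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 0 if stat() succeeds, otherwise the errno
 * value stat() left behind, and reports it on stderr. */
int stat_errno(const char *path)
{
	struct stat st;
	int err;

	assert(path != NULL);
	errno = 0;
	if (stat(path, &st) == 0)
		return 0;

	err = errno;	/* save it before another library call can change it */
	fprintf(stderr, "stat for %s failed\nerrno: %d (%s)\n",
		path, err, strerror(err));
	return err;
}
```

In a 32-bit build, glibc's stat() is expected to report EOVERFLOW when a returned field does not fit the 32-bit struct stat; building with -D_FILE_OFFSET_BITS=64 selects the 64-bit layout and is the usual way to tell a userspace overflow apart from a kernel problem.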


Thanks,
 Bernd

-- 
Bernd Schubert
Physikalisch Chemisches Institut / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: [EMAIL PROTECTED]


Re: [PATCH] New operation for kref to help avoid locks

2005-03-01 Thread Arjan van de Ven
On Tue, 2005-03-01 at 12:15 -0800, Greg KH wrote:
> On Sat, Feb 26, 2005 at 04:23:04PM -0600, Corey Minyard wrote:
> > Add a routine to kref that allows the kref_put() routine to be
> > unserialized even when the get routine attempts to kref_get()
> > an object without first holding a valid reference to it.  This is
> > useful in situations where this happens multiple times without
> > freeing the object, as it will avoid having to do a lock/semaphore
> > except on the final kref_put().
> > 
> > This also adds some kref documentation to the Documentation
> > directory.
> 
> I like the first part of the documentation, that's nice.
> 
> But I don't like the new kref_get_with_check() function that you
> implemented.  If you look in the -mm tree, kref_put() now returns if
> this was the last put on the reference count or not, to help with lists
> of objects with a kref in it.
> 
> Perhaps you can use that to implement what you need instead?

note that I'm not convinced the "lockless" implementation actually is
faster. It still uses an atomic variable, which is just as expensive as
taking a lock normally...
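
To make the point concrete, here is a userspace sketch of a reference count whose put operation reports whether it dropped the last reference, in the spirit of the kref_put() behaviour mentioned above. It uses C11 atomics rather than the kernel's atomic_t, and every name in it is hypothetical. Each get and put is still one atomic read-modify-write on a shared variable, which is the cost being questioned.

```c
#include <assert.h>
#include <stdatomic.h>

struct ref_sketch {
	atomic_int count;
};

/* Counts release calls so the example can be checked. */
int released;

void count_release(struct ref_sketch *r)
{
	(void)r;
	released++;
}

void ref_init(struct ref_sketch *r)
{
	atomic_init(&r->count, 1);
}

/* Caller must already hold a valid reference. */
void ref_get(struct ref_sketch *r)
{
	int old = atomic_fetch_add(&r->count, 1);

	assert(old > 0);
}

/* Returns 1 if this call dropped the last reference and ran release(),
 * 0 otherwise. */
int ref_put(struct ref_sketch *r, void (*release)(struct ref_sketch *))
{
	if (atomic_fetch_sub(&r->count, 1) == 1) {
		release(r);
		return 1;
	}
	return 0;
}
```

A caller that keeps objects on a list can use the return value to decide whether the entry must also be unlinked, taking the list lock only in that case.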



Re: [Linux-fbdev-devel] Re: 2.6.11-rc5, rivafb i2c oops, bogus error handling

2005-03-01 Thread Olaf Hering
 On Mon, Feb 28, Antonino A. Daplas wrote:

> On Monday 28 February 2005 04:32, Olaf Hering wrote:
> >  On Wed, Feb 23, Linus Torvalds wrote:
> > > This time it's really supposed to be a quickie, so people who can, please
> > > check it out, and we'll make the real 2.6.11 asap.
> >
> > Here is another one, probably not new.
> > Is riva_get_EDID_i2c a bit too optimistic by not having a $i2cadapter_ok
> > member in riva_par->riva_i2c_chan? It calls riva_probe_i2c_connector
> > even if riva_create_i2c_busses fails to register all 3 busses.

Side note:

<[EMAIL PROTECTED]>: connect to
lists.surfsouth.com[216.128.200.12]: Connection timed out

Is this one supposed to work? perhaps the MAINTAINERS entry needs an
update.

