from:"Martin Diehl"

Re: forcibly unmap pages in driver?

2001-06-06 Thread Martin Diehl

On Tue, 5 Jun 2001, Pete Wyckoff wrote:

> [EMAIL PROTECTED] said:
> > user-space program is still running. I need to remove the user-space
> > mapping -- otherwise the user process would still have access to the
> > now-freed pages. I need an inverse of remap_page_range().
> 
> That seems a bit perverse.  How will the poor userspace program know
> not to access the pages you have yanked away from it?  If you plan
> to kill it, better to do that directly.  If you plan to signal it
> that the mapping is gone, it can just call munmap() itself.

I can very well imagine situations where you want to unmap a buffer
from beneath the userspace program so that it dies when it does not
follow the rules - see below.

> However, do_munmap() will call zap_page_range() for you and take care of
> cache and TLB flushing if you're going to do this in the kernel.
> 
> Your driver mmap function is called by do_mmap_pgoff() which takes
> care of those issues, and there is no (*munmap) in file_operations---
> perhaps you are the first driver writer to want to unmap in the kernel.

Well, I don't know whether he is the first one, but if so, I'm probably
the other one thinking about (temporarily) unmapping mmap(2)ed memory
directly on behalf of the kernel. The idea is as follows:

Let's assume the mmap'ed buffer should be either accessible by userland
or (mutually exclusive) dedicated to I/O or DMA. So you need some
synchronisation, like userland calling ioctl(2) to release the buffer and
then blocking on select(2) or read(2) until buffer access is re-granted
after DMA has finished. Let's further assume you want to make absolutely
sure the userland app would really _never_, under any circumstances,
access the buffer at the wrong moment where it would read corrupted data,
i.e. after the ioctl() but before the file gets ready for nonblocking read
again.

Well, you could validate all the paths in the app that might access the
mmap'ed buffer at some point and make sure this is either impossible or
will be detected somehow. Furthermore, for any future change you have to
do some re-validation. Or you simply make the mapping inaccessible
inside the critical window so the application dies immediately if it
violates the rule. Note that it is not sufficient to require the app
to do the unmap itself after calling ioctl() because this would not solve
the problem - so you pay two syscalls for nothing.

I believe this is a good reason to do the unmap in the driver's
ioctl() and remap again after DMA has finished. Basically, one could also
temporarily switch to PROT_NONE to get the same behaviour, but I
personally prefer unmapping because it is more flexible.

Although I have not tested it so far, I think it should be possible by
simply calling do_munmap(), which is exported by mm/mmap.c. AFAICS the only
thing to take care of is the correct handling of mm->mmap_sem. The question
remains, however, whether there is any reason not to do so, e.g. with
regard to future changes.
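
The effect of the PROT_NONE variant mentioned above can be tried from user
space. The following is only a sketch under assumptions: it uses mprotect(2)
on an anonymous mapping instead of a driver revoking access, and the helper
names access_faults() and prot_none_demo() are made up for illustration. A
forked child touches the buffer; while the pages are PROT_NONE the child is
killed (normally by SIGSEGV), which is the "die immediately" behaviour the
driver would enforce during the DMA window.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that reads buf[0].
 * Returns 1 if the child did not exit cleanly (it faulted),
 * 0 if the access went through, -1 on fork/wait errors. */
static int access_faults(volatile char *buf)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        char c = buf[0];        /* faults while the page is PROT_NONE */
        (void)c;
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return !(WIFEXITED(status) && WEXITSTATUS(status) == 0);
}

/* Returns 0 if the buffer was accessible, then inaccessible inside the
 * simulated DMA window, then accessible again with its data intact. */
static int prot_none_demo(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    memset(buf, 0x5a, len);

    int ok = access_faults(buf) == 0;           /* userland owns buffer */

    if (mprotect(buf, len, PROT_NONE) != 0)     /* "ioctl: start DMA"   */
        ok = 0;
    ok = ok && access_faults(buf) == 1;         /* any access is fatal  */

    if (mprotect(buf, len, PROT_READ | PROT_WRITE) != 0) /* "DMA done"  */
        ok = 0;
    ok = ok && access_faults(buf) == 0 && buf[0] == 0x5a;

    munmap(buf, len);
    return ok ? 0 : -1;
}
```

Unlike a real unmap, mprotect() keeps the address range reserved, so the
"remap" step cannot end up at a different address.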

Regards
Martin

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 2.4.2-ac27

2001-03-29 Thread Martin Diehl

(Linus cc'ed - related thread: 243-pre[78]: mmap changes (breaks) /proc)

On Wed, 28 Mar 2001, Alan Cox wrote:

> 2.4.2-ac27
> o Revert mmap change that broke assumptions (and  (Martin Diehl)
>   it seems SuS) 

The reason to suggest keeping the test was not the len=0 behaviour of
mmap in general, as your comment suggests. The breakage that I've
seen was due to mmap not returning -ENODEV for files from /proc despite
the lack of a valid f_op->mmap (because the test was moved behind the
len==0 check). The point is that sed(1) first tries to mmap(2) the file
and falls back to read(2) in case of -ENODEV (and probably other errors
too). This is important for /proc since most files there are stat'ed
size=0 but return stuff when reading. Not getting an error for mmap of a
len=0 file makes sed assume EOF. Anyway, the revert did not address cases
where f_op->mmap is valid but the request is to mmap len=0 - we still
return the startaddr parameter in that case:

$ touch nullfile
$ strace sed 's/./X/' nullfile
open("nullfile", O_RDONLY|O_LARGEFILE)  = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap2(NULL, 0, PROT_READ, MAP_PRIVATE, 4, 0) = 0

This is consistent throughout all 2.4.x at least. From your comment I've
learnt SuS v2 requires -ENODEV for the len=0 case. While this would
resolve the /proc issue as well, there might be some chance to break code
which expects mmap(len=0) to succeed.
BTW, man-pages (1.31) say mmap(2) returns -EINVAL if called with bad
start/length/offset values - but make no claim whether len=0 would be
valid or not.
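
For illustration, a reader that does not depend on what mmap(2) returns for
len=0 could look like the sketch below. This is not sed's actual code; the
helper names slurp() and slurp_read() are hypothetical. It uses mmap(2) only
when fstat(2) reports a non-zero size and the mapping succeeds, and in every
other case (size 0 as on /proc, or any mmap error such as ENODEV) it falls
back to read(2) until EOF.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Read everything from fd with read(2) into a growing buffer. */
static char *slurp_read(int fd, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    char *data = malloc(cap + 1);
    if (!data)
        return NULL;
    for (;;) {
        if (len == cap) {
            char *tmp = realloc(data, cap * 2 + 1);
            if (!tmp) {
                free(data);
                return NULL;
            }
            data = tmp;
            cap *= 2;
        }
        ssize_t n = read(fd, data + len, cap - len);
        if (n < 0) {
            free(data);
            return NULL;
        }
        if (n == 0)             /* real EOF, not a guess from st_size */
            break;
        len += (size_t)n;
    }
    data[len] = '\0';
    *out_len = len;
    return data;
}

/* Hypothetical helper: return a NUL-terminated, malloc'ed copy of the
 * file contents; the caller frees it. */
static char *slurp(const char *path, size_t *out_len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    char *data = NULL;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        size_t len = (size_t)st.st_size;
        void *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (map != MAP_FAILED) {
            data = malloc(len + 1);
            if (data) {
                memcpy(data, map, len);
                data[len] = '\0';
                *out_len = len;
            }
            munmap(map, len);
            close(fd);
            return data;
        }
    }
    /* st_size == 0 or mmap failed: do not assume the file is empty */
    data = slurp_read(fd, out_len);
    close(fd);
    return data;
}
```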

In case we want to follow what you've said about SuS, the right thing
might simply be something along these lines:

--- linux-243-pre8/mm/mmap.c	Wed Mar 28 13:14:19 2001
+++ linux-243p8-md/mm/mmap.c	Thu Mar 29 09:49:34 2001
@@ -204,8 +204,12 @@
 	int correct_wcount = 0;
 	int error;
 
+	/* We need to error mmaps of 0 length. The apps rely on this and
+	   SuS v2 says that we return -ENODEV in this case without mentioning
+	   returning 0 for 0 length mmap */
+
 	if ((len = PAGE_ALIGN(len)) == 0)
-		return addr;
+		return -ENODEV;
 
 	if (len > TASK_SIZE || addr > TASK_SIZE-len)
 		return -EINVAL;

I really have no idea what code, if any, might rely on the old
behaviour. Some commonly used tools may get confused by stuff from the
/var/lock/subsys tree, for example. So it's probably better not to field
this change before 2.5, to be able to identify such things in time.
In this case we could add some clarification to your comment, saying that
returning an error on len=0 is a future thing, leave the current len=0
semantics untouched, and keep the f_op->mmap test at the very beginning
(and drop the moved one a few lines below).

Regards
Martin


243-pre[78]: mmap changes (breaks) /proc

2001-03-28 Thread Martin Diehl



Hi Linus,

I had some strange userland/proc problems appearing during sysinit.
Symptoms are a "Malformed setting kernel.printk=" error message from
sysctl(8) and a hanging linuxconf (SAK resolves this). The common thing
is several sed(1) calls silently failing when matching against files
from /proc. cat'ing the /proc files works as usual. Copying their contents
to some file on ext2 and executing the same sed command on it works
as expected, in contrast to direct execution on /proc.

Using "sed 's/^\(.\).*/\1/' /proc/sys/kernel/printk" as test case I've
tracked this down as follows:

strace'ing the above command for 2.4.0 ... 2.4.3-pre6 yields

open("/proc/sys/kernel/printk", O_RDONLY|O_LARGEFILE) = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap2(NULL, 0, PROT_READ, MAP_PRIVATE, 4, 0) = -1 ENODEV (No such device)

whereas for 2.4.3-pre[78] one gets

open("/proc/sys/kernel/printk", O_RDONLY|O_LARGEFILE) = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap2(NULL, 0, PROT_READ, MAP_PRIVATE, 4, 0) = 0

i.e. mmap used to return ENODEV and now returns success having mapped
len=0 bytes from the file on procfs - which triggers the sed failure.
(glibc is 2.1.3 and sed is 3.02 in case it matters)

This change is apparently due to the merge-mmap-patch which provides:

mm/mmap.c:
-   if (file && (!file->f_op || !file->f_op->mmap))
-   return -ENODEV;
 
Putting these two lines back in place restores the old behaviour, thus
solving the problem. Although not tested, my impression is that all files
not implementing f_op->mmap might be affected in a similar way - at least
potentially. On the other hand, anonymous mappings have file=0 so they are
not caught by the above test. Therefore, unless the change was
intentional, my suggestion is to keep the test for a valid f_op->mmap
returning -ENODEV.
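
Why the order of the two checks matters can be shown with a toy model. This
is a sketch, not the kernel code: struct toy_file, struct toy_fops and both
functions are stand-ins invented here, the PAGE_ALIGN() step is reduced to a
plain len == 0 test, and TOY_WOULD_MAP just marks "the real mapping work
would follow".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_WOULD_MAP 1L        /* placeholder for the rest of mmap */

/* Toy stand-ins for the kernel structures; only what the check needs. */
struct toy_fops { int (*mmap)(void); };
struct toy_file { const struct toy_fops *f_op; };

/* Order in 2.4.0 .. 2.4.3-pre6: unmappable files are rejected first. */
static long mmap_check_old(const struct toy_file *file,
                           unsigned long addr, unsigned long len)
{
    if (file && (!file->f_op || !file->f_op->mmap))
        return -ENODEV;
    if (len == 0)
        return (long)addr;
    return TOY_WOULD_MAP;
}

/* Order in 2.4.3-pre7/pre8 as described above: the len == 0 shortcut
 * runs first, so the -ENODEV test is never reached for size-0 files. */
static long mmap_check_new(const struct toy_file *file,
                           unsigned long addr, unsigned long len)
{
    if (len == 0)
        return (long)addr;
    if (file && (!file->f_op || !file->f_op->mmap))
        return -ENODEV;
    return TOY_WOULD_MAP;
}
```

With a file that has no mmap method and len 0 (the /proc case), the old
order yields -ENODEV and the new order yields 0, matching the two strace
outputs above.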

Regards
Martin


issue with 243-pre8: strange userland/proc breakage

2001-03-27 Thread Martin Diehl



Hi,

Something not so obvious (at least for me ;-) seems to be broken in
userland. I could narrow it down to triggers like a failing sed, for
example (from /etc/rc.d/init.d/usb)

PKLVL=`sed 's/^\(.\).*/\1/' < /proc/sys/kernel/printk`

Another probably related thing happens when rc.sysinit sed's from
/proc/cmdline.

Common in both cases:
sed operating on the /proc tree, no error from sed, but it fails to work as
expected - i.e. instead of returning the first character/number from the
kernel.printk sysctl it returns an empty string. For 2.4.0 it is ok.
A simple "cat /proc/sys/kernel/printk" doesn't show any difference.

symptoms:
- Box hangs at rc.sysinit when broken "sed ... /proc/cmdline" tries to
  start linuxconf. Probably some kind of deadlock: all processes sleeping,
  according to SysRq: PC always in cpu_idle. Hitting SAK solves the
  issue: booting finishes without any observable degradation.

- "Malformed setting kernel.printk=" error message from sysctl(8) when
  starting/stopping usb. No harm, simply fails playing games with
  kernel.printk.

System is K6-II UP. Using egcs-2.91.66 or gcc-2.95.3(pre) doesn't make any
difference. Playing with .config to exclude fb(ati64), devfs, scsi, bttv,
usb didn't change anything either.
Unfortunately the step from 2.4.0 is quite big and I had no time yet to
narrow it down - going to try later.

Regards
Martin


issue with 243-pre8: undefined symbols from net_init.c

2001-03-27 Thread Martin Diehl



Hi,

For me, vanilla 2.4.3-pre8 refuses to load NIC modules (8390, ne2k-pci) due
to an unresolved symbol: ether_setup.
This might be related to the latest changes in drivers/net/net_init.c

from /proc/ksyms I get:

c024dc8c kbd_ledfunc_Rfa67cc5f
c01ebf8c keyboard_tasklet_R28aa0faa
c024dc84 sysrq_power_off_R0c257849
c0166da4 __VERSIONED_SYMBOL(init_etherdev)
c0166dc0 __VERSIONED_SYMBOL(alloc_etherdev)
c0166e38 __VERSIONED_SYMBOL(ether_setup)
c0166ec4 __VERSIONED_SYMBOL(register_netdev)
c0166f34 __VERSIONED_SYMBOL(unregister_netdev)
c0167030 autoirq_setup_R5a5a2280
c016703c autoirq_report_R84530c53
c024f080 ide_hwifs_R11123430

All the bogus __VERSIONED_SYMBOL stuff is from net_init.c

Regards
Martin


Re: IDE on 2.4.2

2001-03-13 Thread Martin Diehl

On Mon, 12 Mar 2001, Steven Walter wrote:

> The big man himself, Andre Hedrick, has stated that the SiS5513 should
> work in UDMA/66 mode, as is evidenced by my setup.

right, but depending on the chipset that provides the SiS5513 function

> SIS5513: chipset revision 208
> SIS5513: not 100% native mode: will probe irqs later
> SiS530

So you have a SiS530 northbridge which does provide UDMA66 support. The
SiS5591 and SiS5597 however are not specified for ATA66 and sis5513.c
does not support UDMA66 for them. What puzzles me somewhat is that all
these chipsets apparently report the same SiS5513 rev. d0. The
datasheet for the SiS5591 however only talks about UDMA33 and marks the
UDMA66 related bits reserved. It looks like there were different flavours
of the same PCI vendor/device/revision combination for the IDE controller
function included in the different chipsets.
Anyway, the problem is not to get UDMA66 running on chipsets for which it
might work although not specified, but to solve the "hang during cd-drive
initialisation when sis5513-autotuning enabled" issue.

> And, as you've requested, here is the lspci output from my system, which
> is working and in UDMA66.
[..]

well, it shows some differences wrt. non-ATA66 related bits that are
marked "reserved" in the SiS5591 datasheet. But including some fixup code
into the pci_init_sis5513() doesn't help. So chances are this is a silicon
related issue or similar. My feeling is it's not worth the effort to go
any further because there is a simple workaround: either not including
CONFIG_BLK_DEV_SIS5513 or disabling the autotune on cdrom-drives prevents
the hang (at least for me). Using hdparm provides reasonable ATA33
performance AFAICS.

Martin


Re: IDE on 2.4.2

2001-03-11 Thread Martin Diehl

On Sun, 11 Mar 2001, Steven Walter wrote:

> > on insmod). This is with SiS5513 rev 208 IDE function from SiS5591
> > chipset with CONFIG_BLK_DEV_SIS5513 and autotune enabled (default).
> 
> I have this exact same chip on my board (a PCChips M599-LMR or something
> like that) which works flawlessly on 2.4.2, even with UDMA66.

Do you have CONFIG_BLK_DEV_SIS5513 and autotuning enabled at the
same time? Unless I enable them both it works flawlessly for me too - up
to UDMA33. In fact, I've never seen any docs claiming the 5591/5513 would
even provide UDMA66 support. How do you program the controller to do
UDMA66 cycles?
Anyway, it might be interesting to have a look at your
lspci -d:5513 -vvvxxx report from working UDMA33/66 setups!

Martin


Re: broken(?) Lucent Venus chipset and Linux 2.4

2001-03-11 Thread Martin Diehl


On Fri, 9 Mar 2001, W. Michael Petullo wrote:

> If you have or have access to a Linux box with a Venus-based modem,
> answering any of these questions would be very helpful:
> 

Well, I'm not absolutely sure if we are talking about the same thing: what 
I have is a re-labeled PC-Card modem which identifies according to

cardctl ident:
  product info: "LUCENT-VENUS", "PCMCIA 56K DataFax"
  manfid: 0x0200, 0x0001
  function: 2 (serial)

ATI:
Venus K56FLEX V.90 kfav163 PCMCIA p52198

> o Does your modem work flawlessly with Linux 2.4?

Yes, for me it does - with the standard serial.c driver (and
PCMCIA's serial_cs of course) under Linux 2.0/2.2/2.4.

HTH.
Martin


Re: IDE on 2.4.2

2001-03-11 Thread Martin Diehl

On Fri, 9 Mar 2001, Lawrence MacIntyre wrote:

> Uniform MultiPlatform E-IDE driver Revision 6.31
> ide: assuminmg 33 MHz system bus speed for PIO modes: override with
> idebus=xx
> SIS5513: IDE controller on PCI bus 00 dev 09
> PCI: Assigned IRQ 14 for device 00:01.1
> SIS5513: chipset revision 208
> SIS5513: not 100% native mode: will probe irqs later
> SIS5597
> ide0: BM-DMA at 0xd000-0xd007, BIOS settings: hda:DMA, hdb:pio
> ide1: BM-DMA at 0xd008-0xd00f, BIOS settings: hdc:DMA, hdd:pio
> hda: Maxtor 90640D4, ATA DISK drive
> hdc: CD-ROM CDU55E, ATAPI CD/DVD-ROM drive
> ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> ide1 at 0x170-0x177,0x376 on irq 15
> 
> At this point, the machine hangs...

Interesting, I see the same thing, except that it hangs not before the
disk drives are initialized but afterwards, when initializing the CD-ROM
drives. (Compiling ide-cd as a module permits a successful boot but hangs
on insmod.) This is with the SiS5513 rev 208 IDE function from the SiS5591
chipset with CONFIG_BLK_DEV_SIS5513 and autotune enabled (default).

For me, the workaround is disabling either one of the above (i.e. not
including SiS5513 support in the kernel or appending "hdx=noautotune"
for the cdrom drives) and everything is fine again. You may want to use
hdparm to get udma2 working. Doing so provides reliable >14MB/s for a
5.4kRPM drive in UDMA(33), so my impression is this is only a tuning
issue.

> PCI devices found:
>   Bus  0, device   0, function  0:
> Host bridge: Silicon Integrated Systems 5597/5598 Host (rev 2).
>   Medium devsel.  Master Capable.  Latency=32.  
>   Bus  0, device   1, function  0:
> ISA bridge: Silicon Integrated Systems 85C503 (rev 1).
>   Medium devsel.  Master Capable.  No bursts.  
>   Bus  0, device   1, function  1:
> IDE interface: Silicon Integrated Systems 85C5513 (rev 208).
>   Fast devsel.  IRQ 14.  Master Capable.  Latency=32.  
>   I/O at 0xe400 [0xe401].
>   I/O at 0xe000 [0xe001].
>   I/O at 0xd800 [0xd801].
>   I/O at 0xd400 [0xd401].
>   I/O at 0xd000 [0xd001].

I'm not absolutely sure, but I'm wondering why the driver enabled all
BARs including the relocatable port areas which are useful in native
mode only. IMHO the driver should force compatibility mode. For me, only
the last BAR containing the BM registers at 0xd000 is enabled.

Martin


Re: [PATCH] minor ne2k-pci irq fix

2001-02-02 Thread Martin Diehl

(apologies in case anybody gets this twice - it was caught by the DUL
blocker again. Seems time to change my mail routing anyway...)

On Thu, 1 Feb 2001, Jeff Garzik wrote:

> > Probably I've missed this because the last time I hit such a thing was
> > when my ob800 bios mapped the cardbus memory BAR's into bogus legacy
> > 0xe area. Hence there was good reason to read and correct this before
> > trying to enable the device.
> 
> This is a PCI fixup, the driver shouldn't have to worry about this..

Agreed. The point was that when I discovered the broken BAR location the
BIOS had set, it was late in 2.4.0-test12. So I preferred a simple fix in
the yenta driver without touching other stuff like PCI, just in case Linus
would have liked it for 2.4.

> > BTW, will it ever happen the kernel starts remapping BAR's when enabling
> > devices?
> 
> huh?  The two steps do not occur simultaneously.  The enabling should
> occur first, at which point the BARs should be useable.  The remapping
> occurs after that.  If the BARs are not usable after remapping, that is
> a PCI quirk that needs to be added to the list [probably].

Sorry, I wasn't clear enough. I meant the kernel (PCI stuff) changing
the BAR bus address in the config space when enabling the device (i.e.
the bus address value which is used for later mapping). Doing so would
make the pci_resource_start() value bogus (when obtained before enabling
the device) - even without accessing/ioremap()ing it.
My guess is this might happen, but I'm not sure when. Probably if our PCI
stuff assigned another BAR without an initial bus address to overlap with
what the BIOS suggested for some initially disabled BAR. Or for real PCI
hotplugging in general.

I just want to understand whether it's more than a cosmetic bug if a
driver saves this before the device is up...

Martin


Re: PCI IRQ routing problem in 2.4.0 (updated patch)

2001-01-31 Thread Martin Diehl

(cc's shortened, not to trash Linus et al)

On Thu, 1 Feb 2001, Robert Siemer wrote:

> Is it possible to directly ask the 'IRQ-router' (namely the
> ISA-bridge) for what it is set up for? - I mean which IRQ is routed to
> what without the help of the BIOS?

It's written in the PCI config registers of the router. That's what I've
tried to document in the patch according to the chipset datasheet.
The BIOS in contrast uses link values, which are vendor-specific,
undocumented and sometimes wrong ;-)
But we have to rely on these unless we have the chipset docs to make it
better - hopefully.

> There is a BIOS update for my board out there. Are you interested in
> the difference? - I would give it a try.

Might be interesting _if_ you find something unexpected like new link
values. But I don't expect any surprise. You should end up with something
similar to Aaron - including the misleading mutual IDE/USB conflict
warning. But everything is fine.

> What is the relation between IRQ routing in the ISA-brigde and the
> APIC?

APIC is a different approach to route IRQ's which is used on PII based
systems and newer (IIRC). So it doesn't matter in your case.

Martin


Re: [PATCH] minor ne2k-pci irq fix

2001-01-31 Thread Martin Diehl

On Wed, 31 Jan 2001 [EMAIL PROTECTED] wrote:

> I think it would be better to move the pci_enable_device(pdev);
> above all this, as we should enable the device before reading the
> pdev->resource[] too iirc.

Probably I've missed this because the last time I hit such a thing was
when my ob800 bios mapped the cardbus memory BAR's into bogus legacy
0xe area. Hence there was good reason to read and correct this before
trying to enable the device.
The normal case however would be like you've suggested, IMHO.

BTW, will it ever happen the kernel starts remapping BAR's when enabling
devices?

Regards
Martin


[PATCH] minor ne2k-pci irq fix

2001-01-31 Thread Martin Diehl



Hi,

while testing the SiS routing code, at some point my ne2k-pci didn't work
anymore. This happened when the BIOS was set for PnP OS and the IRQ set to
"auto" for the NIC. I had to rmmod/insmod the ne2k-pci module after reboot
to make it work again.

Reason: we fetched the irq too early, before calling pci_enable_device(),
so it was bogus after initial routing.
Patch below (prepared for 2.4.0 - should be fine for 2.4.1 too).

Regards
Martin

-

--- linux-2.4.0/drivers/net/ne2k-pci.c.orig Tue Jan 30 23:21:48 2001
+++ linux-2.4.0/drivers/net/ne2k-pci.c  Tue Jan 30 23:22:35 2001
@@ -203,7 +203,6 @@
printk(KERN_INFO "%s" KERN_INFO "%s", version1, version2);
 
ioaddr = pci_resource_start (pdev, 0);
-   irq = pdev->irq;
 
if (!ioaddr || ((pci_resource_flags (pdev, 0) & IORESOURCE_IO) == 0)) {
printk (KERN_ERR "ne2k-pci: no I/O resource at PCI BAR #0\n");
@@ -213,6 +212,7 @@
i = pci_enable_device (pdev);
if (i)
return i;
+   irq = pdev->irq;
 
if (request_region (ioaddr, NE_IO_EXTENT, "ne2k-pci") == NULL) {
printk (KERN_ERR "ne2k-pci: I/O resource 0x%x @ 0x%lx busy\n",
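
The ordering problem fixed by this patch can be reproduced with a toy model.
The sketch below is not driver code: struct toy_pci_dev and
toy_pci_enable_device() are stand-ins invented here, under the assumption
(taken from the description above) that the IRQ only becomes valid once the
device has been enabled and routed.

```c
#include <assert.h>

/* Toy stand-in for struct pci_dev; only the IRQ handling is modeled. */
struct toy_pci_dev {
    unsigned int irq;         /* what the driver sees in pdev->irq      */
    unsigned int routed_irq;  /* what IRQ routing will assign on enable */
};

/* Toy stand-in for pci_enable_device(): routing fills in the IRQ. */
static int toy_pci_enable_device(struct toy_pci_dev *pdev)
{
    pdev->irq = pdev->routed_irq;
    return 0;
}

/* Old order: the irq is cached before the device is enabled. */
static unsigned int probe_irq_early(struct toy_pci_dev *pdev)
{
    unsigned int irq = pdev->irq;       /* still the pre-routing value */
    if (toy_pci_enable_device(pdev))
        return 0;
    return irq;
}

/* Patched order: enable first, then read the irq. */
static unsigned int probe_irq_late(struct toy_pci_dev *pdev)
{
    if (toy_pci_enable_device(pdev))
        return 0;
    return pdev->irq;
}
```

For a device that starts with irq 0 and gets routed to, say, IRQ 11, the
early variant hands 0 to the rest of the driver while the late variant
returns 11.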



Re: 2.4.0-test12: SiS pirq handling..

2001-01-31 Thread Martin Diehl

On Mon, 29 Jan 2001, Linus Torvalds wrote:

> Now, will the two people in the world who know the pirq black magic now
> stand up and confirm or deride my interpretation?

Since I'm certainly not the other one, I'm not going to confirm it ;-)
But, besides that it sounds reasonable, I could give some more ammunition:

my IDE controller is located in the SiS5591 hostbridge (device 00:00)
and the BIOS didn't provide a routing table entry for this device.
Hence, instead of the confusing conflict messages, I get:

SIS5513: IDE controller on PCI bus 00 dev 01
IRQ for 00:00.1:0 -> not found in routing table

which you may take as a converse proof of your explanation.

Martin


Re: PCI IRQ routing problem in 2.4.0 (updated patch)

2001-01-31 Thread Martin Diehl

On Tue, 30 Jan 2001, Robert Siemer wrote:

> > Below is the updated patch. It should handle both (0x01/0x41
> > like) mappings. I can (and did) only test the 0x01 case.
> > USBIRQ routing (0x62) supported, IDE/ACPI/DAQ untouched.
> 
> I don't really understand your note above, but your patch alone does
> not fix my problem. - Linus diff over pci-irq.c does.

Yes, I know - in fact it couldn't, because your BIOS' irq routing is not
only subject to the 0x01/0x41 ambiguity but also wrong wrt. the USBIRQ,
which gets routed using link value 0x62 in any other case but yours.

Your routing table is:

00:0c slot=01 0:01/1eb8 1:02/1eb8 2:03/1eb8 3:04/1eb8
00:0b slot=02 0:02/1eb8 1:03/1eb8 2:04/1eb8 3:01/1eb8
00:0a slot=03 0:03/1eb8 1:04/1eb8 2:01/1eb8 3:02/1eb8
00:09 slot=04 0:04/1eb8 1:01/1eb8 2:02/1eb8 3:03/1eb8
00:01 slot=00 0:01/1eb8 1:02/1eb8 2:03/1eb8 3:04/1eb8 >>> no 0x62 here!
00:13 slot=00 0:01/1eb8 1:02/1eb8 2:03/1eb8 3:04/1eb8

suggesting the ISA-bridge (00:01) would be routed exactly like a normal
PCI device, namely your SCSI-HA in slot 1. Since the ISA-bridge provides
both the IDE and USB functions and they pretend to use pin A, all the
kernel can do is believe the BIOS, and so these 3 devices end up
inseparable from each other on link/pirq 0x01 - which we assign to some
IRQ when needed. Your USB however *is* already routed by the BIOS to IRQ 9
using link/pirq value 0x62, which makes the BIOS provided routing table
really crap:

00:01.0 ISA bridge: Silicon Integrated Systems [SiS] 85C503/5513 (rev 01)
60: ff 80 49 00 88 00 00 02 00 80 80 00 20 19 00 00
          ^^
Linus' patch helps you, because it makes us trust the device's config
space over the routing table. Probably a good idea as long as BIOSes
don't start to set wrong values in config space too...
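
The numbers quoted above can be decoded by hand or with a few lines of C.
This is a sketch under assumptions: the register layout (bit 7 = routing
disabled, low nibble = IRQ) is the one described in the SiS patch comment,
and the second field of a routing table entry such as "0:01/1eb8" is read as
a bitmask of permitted IRQs. The function names are made up for this
example.

```c
#include <assert.h>
#include <stdint.h>

/* Decode one SiS routing register byte: bit 7 set means the mapping is
 * disabled, otherwise the low nibble holds the IRQ. Returns 0 if
 * disabled. */
static int sis_reg_to_irq(uint8_t x)
{
    return (x & 0x80) ? 0 : (x & 0x0f);
}

/* Check whether an IRQ is permitted by a routing table bitmap
 * (bit n set = IRQ n allowed). */
static int irq_allowed(uint16_t bitmap, int irq)
{
    return irq >= 0 && irq < 16 && ((bitmap >> irq) & 1);
}
```

The byte at offset 0x62 in the dump is 0x49, which decodes to IRQ 9, in line
with the statement that USB is already routed there. The bytes at 0x60 and
0x61 (0xff, 0x80) have bit 7 set and decode as disabled. Under the bitmask
reading, 0x1eb8 permits IRQs 3, 4, 5, 7, 9, 10, 11 and 12.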

So, unless you get a working BIOS update there is no way to get it right.

Another solution might be to put your NIC into slot 1 and configure your
BIOS to share the NIC's IRQ with USB. This way you would set up the
system exactly the way your BIOS falsely describes it to the kernel.

> The kernel still does not think what the bios states; it's like the
> vanilla 2.4.0 in this regard. (--> on my box: kernel panic after

In fact vanilla 2.4.0 did believe what the bios states, namely the broken
routing table. It didn't believe however what the device's config space
reports - which turned out to be correct.

You should be happy with 2.4.1 which contains both Linus' and the
0x01/0x41 fix.

Martin


Re: PCI IRQ routing problem in 2.4.0 (updated patch)

2001-01-29 Thread Martin Diehl


On Mon, 29 Jan 2001, Linus Torvalds wrote:

>   reg = pirq;
>   if (reg < 5)
>   reg += 0x40;

or adding the 0x41..0x44 cases to the switch statement in my patch?

> > BTW: I was wondering, why we did not update the PCI_INTERRUPT_LINE in
> 
> I would prefer _not_ to see this.
> 
> Why? Because it's (a) real information what the PCI config space was, and
> it might help debug things in the future. And (b) I've seen to many broken
> BIOSes that do not re-initialize hardware fully over a soft boot, that I
> worry that you'll get different behaviour after doing a "shutdown -r" with
> this.

Ok, good reason, I believe - so I've dropped this again.

Below is the updated patch. It should handle both (0x01/0x41
like) mappings. I can (and did) only test the 0x01 case.
USBIRQ routing (0x62) supported, IDE/ACPI/DAQ untouched.

Martin

-

--- linux-2.4.0/arch/i386/kernel/pci-irq.c.orig Mon Jan  8 14:45:35 2001
+++ linux-2.4.0/arch/i386/kernel/pci-irq.c  Mon Jan 29 19:56:44 2001
@@ -234,22 +234,107 @@
return 1;
 }
 
+/*
+ * PIRQ routing for SiS 85C503 router used in several SiS chipsets
+ * According to the SiS 5595 datasheet (preliminary V1.0, 12/24/1997)
+ * the related registers work as follows:
+ * 
+ * general: one byte per re-routable IRQ,
+ *  bit 7  IRQ mapping enabled (0) or disabled (1)
+ *  bits [6:4] reserved
+ *  bits [3:0] IRQ to map to
+ *  allowed: 3-7, 9-12, 14-15
+ *  reserved: 0, 1, 2, 8, 13
+ *
+ * individual registers in device config space:
+ *
+ * 0x41/0x42/0x43/0x44:PCI INT A/B/C/D - bits as in general case
+ *
+ * 0x61:   IDEIRQ: bits as in general case - but:
+ * bits [6:5] must be written 01
+ * bit 4 channel-select primary (0), secondary (1)
+ *
+ * 0x62:   USBIRQ: bits as in general case - but:
+ * bit 4 OHCI function disabled (0), enabled (1)
+ * 
+ * 0x6a:   ACPI/SCI IRQ - bits as in general case
+ *
+ * 0x7e:   Data Acq. Module IRQ - bits as in general case
+ *
+ * Apparently there are systems implementing PCI routing table using both
+ * link values 0x01-0x04 and 0x41-0x44 for PCI INTA..D, but register offsets
+ * like 0x62 as link values for USBIRQ e.g. So there is no simple
+ * "register = offset + pirq" relation.
+ * Currently we support PCI INTA..D and USBIRQ and try our best to handle
+ * both link mappings.
+ * IDE/ACPI/DAQ mapping is currently unsupported (left untouched as set by BIOS).
+ */
+
 static int pirq_sis_get(struct pci_dev *router, struct pci_dev *dev, int pirq)
 {
u8 x;
-   int reg = 0x41 + (pirq - 'A') ;
+   int reg = pirq;
 
-   pci_read_config_byte(router, reg, &x);
+   switch(pirq) {
+   case 0x01:
+   case 0x02:
+   case 0x03:
+   case 0x04:
+   reg += 0x40;
+   case 0x41:
+   case 0x42:
+   case 0x43:
+   case 0x44:
+   case 0x62:
+   pci_read_config_byte(router, reg, &x);
+   if (reg != 0x62)
+   break;
+   if (!(x & 0x40))
+   return 0;
+   break;
+   case 0x61:
+   case 0x6a:
+   case 0x7e:
+   printk("SiS pirq: advanced IDE/ACPI/DAQ mapping not yet implemented\n");
+   return 0;
+   default:
+   printk("SiS router pirq escape (%d)\n", pirq);
+   return 0;
+   }
return (x & 0x80) ? 0 : (x & 0x0f);
 }
 
 static int pirq_sis_set(struct pci_dev *router, struct pci_dev *dev, int pirq, int irq)
 {
u8 x;
-   int reg = 0x41 + (pirq - 'A') ;
+   int reg = pirq;
 
-   pci_read_config_byte(router, reg, &x);
-   x = (pirq & 0x20) ? 0 : (irq & 0x0f);
+   switch(pirq) {
+   case 0x01:
+   case 0x02:
+   case 0x03:
+   case 0x04:
+   reg += 0x40;
+   case 0x41:
+   case 0x42:
+   case 0x43:
+   case 0x44:
+   case 0x62:
+   x = (irq&0x0f) ? (irq&0x0f) : 0x80;
+   if (reg != 0x62)
+   break;
+   /* always mark OHCI enabled, as nothing else knows about this */
+   x |= 0x40;
+   break;
+   case 0x61:
+   case 0x6a:
+   case 0x7e:
+   printk("advanced SiS pirq mapping not yet implemented\n");
+   return 0;
+   default:

Re: PCI IRQ routing problem in 2.4.0 (fwd)

2001-01-29 Thread Martin Diehl

Apparently this didn't get through, since my mailer was blocked by the
dial-up blacklists (I never observed this before on l-k!).

So I'm resending it.

Martin

-- Forwarded message --
Date: Mon, 29 Jan 2001 18:22:43 +0100 (CET)
From: Martin Diehl <[EMAIL PROTECTED]>
To: Jeff Garzik <[EMAIL PROTECTED]>
Cc: Linus Torvalds <[EMAIL PROTECTED]>,
 Robert Siemer <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
Subject: Re: PCI IRQ routing problem in 2.4.0

On Mon, 29 Jan 2001, Jeff Garzik wrote:

> And from what we're seeing in this thread, it looks like there are
> two different types of SiS link values from two different BIOSen;
> or perhaps one BIOS is using some of the link value bits for other
> purposes.

Right, it seems to be the 0x41/0x01 thing. I have the 0x01 case with a SiS
85C503 router, rev. 01. Hopefully the 0x41 boards have a different revision.
My fear, however, is that this is due to the BIOS implementation of the
routing table.

Using the docs of the 85C503 function from the SiS5595 southbridge
datasheet I've written a patch to get things right - at least for the 0x01
case. The mapping on my box appears as follows:

link/pirq value        config-reg             function
0x01/0x02/0x03/0x04    0x41/0x42/0x43/0x44    PCI INTA..D
0x61                   0x61                   5513 onboard IDE
0x62                   0x62                   onboard USB (OHCI)
0x6a                   0x6a                   onboard ACPI
0x7e                   0x7e                   onboard data acquisition
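
To make the register layout behind this table concrete, here is a small
stand-alone sketch of the general byte format as I read it from the
datasheet (bit 7 set means routing disabled, bits [3:0] hold the IRQ).
The helper names are invented for this example and it does not touch the
extra USB/IDE bits.

```c
#include <stdint.h>

#define SIS_PIRQ_DISABLED 0x80u   /* bit 7: mapping disabled */
#define SIS_PIRQ_IRQ_MASK 0x0fu   /* bits [3:0]: IRQ to map to */

/* Returns the routed IRQ, or 0 if routing is disabled. */
static int sis_decode_irq(uint8_t reg)
{
    return (reg & SIS_PIRQ_DISABLED) ? 0 : (int)(reg & SIS_PIRQ_IRQ_MASK);
}

/* Builds the register byte; IRQ 0 is taken to mean "disable routing". */
static uint8_t sis_encode_irq(int irq)
{
    uint8_t low = (uint8_t)(irq & 0x0f);
    return low ? low : (uint8_t)SIS_PIRQ_DISABLED;
}

/* Allowed targets per the datasheet: 3-7, 9-12, 14-15. */
static int sis_irq_allowed(int irq)
{
    return (irq >= 3 && irq <= 7) || (irq >= 9 && irq <= 12) ||
           irq == 14 || irq == 15;
}
```

The decode helper mirrors the return statement of the get routine in the
patch, and the encode helper mirrors what the set routine computes.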

The patch below deals with the PCI INT and USB case and works fine on my
box. It will fail (printk) on the 0x41 case.
So please, everyone with SiS chipset (seems all of them using the same
router function): Test it - regardless whether having problems or not.
Probably we'll need a kernel parameter to specify which mapping to use.

More details as commented in the code.

Attached: dmesg/lspci/dump_pirq output with BIOS-setting PNP OS=yes and
USB (enabled, auto-irq) to see the router at work (PCI-DEBUG).

BTW: I was wondering why we did not update the PCI_INTERRUPT_LINE in
config space when we re-route dev->irq. Documentation/pci.txt says we should
trust dev->irq over config space; still, keeping lspci and friends from
confusing us would not be bad either. So I've included a one-liner to fix
this.

--

--- linux-2.4.0/arch/i386/kernel/pci-irq.c.orig Mon Jan  8 14:45:35 2001
+++ linux-2.4.0/arch/i386/kernel/pci-irq.c  Mon Jan 29 17:23:25 2001
@@ -234,22 +234,114 @@
return 1;
 }

+/*
+ * PIRQ routing for SiS 85C503 router used in several SiS chipsets
+ * According to the SiS 5595 datasheet (preliminary V1.0, 12/24/1997)
+ * the related registers work as follows:
+ * 
+ * general: one byte per re-routable IRQ,
+ *  bit 7  IRQ mapping enabled (0) or disabled (1)
+ *  bits [6:4] reserved
+ *  bits [3:0] IRQ to map to
+ *  allowed: 3-7, 9-12, 14-15
+ *  reserved: 0, 1, 2, 8, 13
+ *
+ * individual registers in device config space:
+ *
+ * 0x41/0x42/0x43/0x44:PCI INT A/B/C/D - bits as in general case
+ *
+ * 0x61:   IDEIRQ: bits as in general case - but:
+ * bits [6:5] must be written 01
+ * bit 4 channel-select primary (0), secondary (1)
+ *
+ * 0x62:   USBIRQ: bits as in general case - but:
+ * bit 4 OHCI function disabled (0), enabled (1)
+ * 
+ * 0x6a:   ACPI/SCI IRQ - bits as in general case
+ *
+ * 0x7e:   Data Acq. Module IRQ - bits as in general case
+ *
+ * Tested using SiS 5595 southbridge (vendor:device:rev=1039:0008:01)
+ * Testbox:
+ * BIOS Vendor: Award Software, Inc.
+ * BIOS Version: #401A0-0103xl-7 
+ * BIOS Release: 06/12/98
+ * System Vendor: System Manufacturer.
+ * Product Name: System Name.
+ * Version System Version.
+ * Serial Number SYS-1234567890.
+ * Board Vendor: ASUSTeK Computer INC..
+ * Board Name: SP98AGP-X.
+ * Board Version: REV 1.XX.
+ * Asset Tag: Asset-1234567890.
+ *
+ * BIOS routing table uses link 0x01-0x04 for PCI IRQ A-D, but register
+ * offsets like 0x62 as link values for USBIRQ e.g. So there is no simple
+ * "register = offset + pirq" relation. To make things even more confusing,
+ * reading USB OHCI's PCI_INTERRUPT_PIN config register (at 0x3d) returns 1
+ * suggesting the USB OHCI were connected to PCI INTA, which is not the case.
+ * One can map USBIRQ to a value that differs from all PCI INTA..D!
+ * Same holds for the other onboar

[PATCH] yenta, pm - part 2

2000-12-18 Thread Martin Diehl



This is part 2 of the yenta+pm updates for 2.4.0-t12 - to be applied after
part 1. Touching yenta.c only it provides:

- yenta_validate_base() to check and try to fix in case the BIOS has
  mapped the cardbus base registers to the legacy area <1M.
  IMHO, this would be better placed in the early pci initialization,
  but I preferred keeping all changes inside yenta_socket at pre-2.4.0.
- writing back the cardbus bridges' memory and io windows at resume in
  case they were lost. As it turned out this is the only additional thing
  to do for the TI1131, I've put that in the general yenta_config_init()
  as there might be other controllers with the same problem and it should
  not hurt anyway.
- adding pci_set_master() to yenta_open().
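
As a rough illustration of the window write-back (a user-space model, not
the driver code): the four bridge windows sit at consecutive base/limit
register pairs starting at 0x1c (PCI_CB_MEMORY_BASE_0 in the kernel's
pci.h), 8 bytes apart. The config space is modelled as a byte array here,
and struct window, put_le32() and friends are names made up for this
sketch.

```c
#include <stdint.h>

#define CB_MEMORY_BASE_0 0x1c   /* first CardBus window base register */

struct window {                 /* hypothetical stand-in for struct resource */
    uint32_t start;
    uint32_t end;
};

/* Store a 32-bit value little-endian, as PCI config space does. */
static void put_le32(uint8_t *cfg, unsigned off, uint32_t v)
{
    cfg[off]     = (uint8_t)(v & 0xff);
    cfg[off + 1] = (uint8_t)((v >> 8) & 0xff);
    cfg[off + 2] = (uint8_t)((v >> 16) & 0xff);
    cfg[off + 3] = (uint8_t)((v >> 24) & 0xff);
}

static uint32_t get_le32(const uint8_t *cfg, unsigned off)
{
    return (uint32_t)cfg[off] | ((uint32_t)cfg[off + 1] << 8) |
           ((uint32_t)cfg[off + 2] << 16) | ((uint32_t)cfg[off + 3] << 24);
}

/* Base register offset of window i (0,1 = memory; 2,3 = io). */
static unsigned cb_window_base_off(unsigned i)
{
    return CB_MEMORY_BASE_0 + 8 * i;
}

/* Write back all four windows, base first, then limit. */
static void restore_windows(uint8_t *cfg, const struct window w[4])
{
    unsigned i;

    for (i = 0; i < 4; i++) {
        put_le32(cfg, cb_window_base_off(i), w[i].start);
        put_le32(cfg, cb_window_base_off(i) + 4, w[i].end);
    }
}
```

Doing this unconditionally is cheap, which is why it can live in the common
init path rather than in a TI1131-specific hook.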

This should fix problems caused by bad BIOS configuration or
configuration loss at suspend.

Testing: Both parts together on HP Omnibook 800: Regardless of what the
BIOS has done (Cardbus enable/disable, PnP OS yes/no), bad things are detected
and fixed and yenta is now working with pm, even when suspending
immediately after boot and modprobing the yenta stuff later - i.e. we do
not depend on saving something to restore before we suspend for the first
time. Tested with 16bit modem and ne2k card.

Thank you for pointing me in the right direction.

Regards
Martin

-
--- v2.4.0-t12-yenta1/driver/pcmcia/yenta.c Sun Dec 17 20:00:17 2000
+++ v2.4.0-t12-yenta2/driver/pcmcia/yenta.c Mon Dec 18 11:50:42 2000
@@ -623,14 +623,24 @@
 
 /*
  * Initialize the standard cardbus registers
+ * and write back bridge windows in case controller forgot it.
  */
 static void yenta_config_init(pci_socket_t *socket)
 {
u16 bridge;
struct pci_dev *dev = socket->dev;
+   struct resource *res;
+   u32 offset;
+   unsigned i;
 
config_writel(socket, CB_LEGACY_MODE_BASE, 0);
config_writel(socket, PCI_BASE_ADDRESS_0, dev->resource[0].start);
+   for (i = 0; i < 4; i++) {
+   res = socket->dev->resource + PCI_BRIDGE_RESOURCES + i;
+   offset = PCI_CB_MEMORY_BASE_0 + 8 * i;
+   config_writel(socket, offset, res->start);
+   config_writel(socket, offset+4, res->end);
+   }
config_writew(socket, PCI_COMMAND,
PCI_COMMAND_IO |
PCI_COMMAND_MEMORY |
@@ -676,17 +686,6 @@
 static int yenta_suspend(pci_socket_t *socket)
 {
yenta_set_socket(socket, &dead_socket);
-
-   /*
-* This does not work currently. The controller
-* loses too much information during D3 to come up
-* cleanly. We should probably fix yenta_init()
-* to update all the critical registers, notably
-* the IO and MEM bridging region data.. That is
-* something that pci_set_power_state() should
-* probably know about bridges anyway.
-*/
-
return 0;
 }
 
@@ -796,11 +795,66 @@
 
 #define NR_OVERRIDES (sizeof(cardbus_override)/sizeof(struct cardbus_override_struct))
 
+/* BIOS might have mapped devices' base resource to a bogus memory area
+ * and - even worse - the hostbridge loses this window during suspend.
+ * So we try to detect and fix this by re-assigning the resource if we
+ * find the base resource mapped to legacy area <1M. But we don't try
+ * this, if the obsolete MEM_TYPE_1M flag is set, just in case...
+ */
+
+static void yenta_validate_base(struct pci_dev *dev)
+{
+   struct resource *res;
+   u32 temp, size;
+
+   res = &dev->resource[0];
+   if (!res || !(res->flags&IORESOURCE_MEM) || res->start>=0x0010)
+   return;
+
+   pci_set_power_state(dev,0);
+   pci_read_config_dword(dev, PCI_BASE_ADDRESS_0, &temp);
+   if (temp & PCI_BASE_ADDRESS_MEM_TYPE_1M) {
+   printk("yenta: found pci memory mapped to <1M legacy area\n");
+   printk("yenta: not touched since (obsolete) 1M type set\n");
+   return;
+   }
+   printk("yenta: fixing bogus pci memory mapping %08lx\n", res->start);
+
+   pci_write_config_dword(dev, PCI_BASE_ADDRESS_0, ~0x0);
+   pci_read_config_dword(dev, PCI_BASE_ADDRESS_0, &temp);
+   if (temp & PCI_BASE_ADDRESS_SPACE_IO) {
+   printk("yenta: pci memory mutated to io - giving up!\n");
+   pci_write_config_dword(dev, PCI_BASE_ADDRESS_0, res->start);
+   return;
+   }
+
+   /* semi-optimal - old values lost if failure (pretty unlikely).
+* save & restore would require access to the resource list lock,
+* which is private in kernel/resource.c
+*/
+
+   release_resource(res);
+
+   size =  ~(u32)(temp & PCI_BASE_ADDRESS_MEM_MASK);
+   res->name = dev->name;
+   res->start = 0;
+   res->end = res->start + size;
+   res->flags = IORESOURCE_MEM;
+   if (temp & PCI_BASE_ADDRESS_MEM_PREFETCH)
+   res->flags |= IORESOURCE_PREFETCH; 
+   res->parent = res->child = res->sibling = NULL;
+

[PATCH] yenta, pm - part 1

2000-12-18 Thread Martin Diehl


On Fri, 15 Dec 2000, Linus Torvalds wrote:

> I suspect that the suspend/resume will do something bad to the BAT
> registers, which control the BIOS area mapping behaviour, and it just
> kills the forwarding of the legacy region to the PCI bus, or something.

FYI: I've identified a single byte in the host bridge's config space which
is altered after resume. Blindly restoring it makes the 0xe6000 pci bus
address mapping accessible again. But I think that's not the Right way to
fix it.

> I wonder if the PCI cardbus init code should just notice this, and force
> all cardbus windows to be re-initialized. That legacy area address really
> doesn't look right.

That's what I've done now. So I'm sending the modifications I made in case
you would still like them for 2.4.0. I've separated them into 2 (almost
independent) patches meant to be applied in series against 2.4.0-t12 (final).

This is part 1. It provides:

- pm_state, pm_lock for cardbus socket to prevent multiple suspend/resume,
  especially the reentrant case due to some driver sleeping in resume.
- cardbus_change_pm_state() to handle pm state transition. Suspend is
  handled synchronously but resume schedules an asynchronous completion
  handler since pcmcia_resume_socket() may sleep.
- placing the pci_set_power_state() calls at the right place

Mainly touching pci_socket.*, it fixes the suspend/resume re-entry
issues due to some driver (notably cardbus itself) sleeping in resume.
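
The intended state handling can be sketched as a tiny single-threaded model.
This is only an illustration of the transitions, not of the locking: in the
patch the pm_lock stays held until the deferred handler has run, which is
modelled here by a resume_pending flag. All names are invented for the
sketch.

```c
/* Hypothetical model of the per-socket pm state: 0 = D0, 3 = D3. */
struct socket_pm {
    int state;            /* current power state */
    int resume_pending;   /* deferred resume scheduled but not yet finished */
};

/* Suspend is refused unless the socket is fully up. */
static int pm_suspend(struct socket_pm *s)
{
    if (s->state != 0 || s->resume_pending)
        return -1;
    s->state = 3;
    return 0;
}

/* Resume only starts from D3 and only once; completion happens later. */
static int pm_resume_begin(struct socket_pm *s)
{
    if (s->state != 3 || s->resume_pending)
        return -1;
    s->resume_pending = 1;
    return 0;
}

/* Called from the deferred handler after the sleeping part is done. */
static void pm_resume_complete(struct socket_pm *s)
{
    s->state = 0;
    s->resume_pending = 0;
}
```

A second resume arriving while the first one is still pending is simply
rejected, which is the re-entry case this part is about.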

Regards
Martin

-
diff -Nur v2.4.0-test12/drivers/pcmcia/pci_socket.c v2.4.0-t12-yenta1/driver/pcmcia/pci_socket.c
--- v2.4.0-test12/drivers/pcmcia/pci_socket.c   Wed Nov 29 21:47:10 2000
+++ v2.4.0-t12-yenta1/driver/pcmcia/pci_socket.c   Sun Dec 17 19:59:27 2000
@@ -178,6 +178,8 @@
socket->op = ops;
dev->driver_data = socket;
spin_lock_init(&socket->event_lock);
+   socket->pm_state = 0;
+   spin_lock_init(&socket->pm_lock);
return socket->op->open(socket);
 }
 
@@ -212,16 +214,83 @@
dev->driver_data = 0;
 }
 
+/* Delayed handler scheduled to complete the D3->D0 transition in the
+ * upper layers. We may sleep in pcmcia_resume_socket() with pm_lock
+ * held - so we are safe from resume re-entry due to other drivers
+ * sleeping in pci_pm resume handling.
+ */
+
+static void cardbus_resume_bh(void *data)
+{
+   pci_socket_t  *socket = (pci_socket_t *)data;
+
+   pcmcia_resume_socket(socket->pcmcia_socket);
+   socket->pm_state = 0;
+   spin_unlock(&socket->pm_lock);
+   MOD_DEC_USE_COUNT;
+}
+
+/* We are forced to implement asynch resume semantics because the
+ * pcmcia_resume path sleeps and we might get screwed by a second
+ * pci_pm_resume_device() hitting us in the middle of the first one.
+ * Which might happen anyway, if other drivers do not cooperate!
+ * So it's good to know we are protected by our socket->pm_lock.
+ */
+
+static void cardbus_change_pm_state(pci_socket_t *socket, int newstate)
+{
+   switch (newstate) {
+   case 3:
+   pcmcia_suspend_socket(socket->pcmcia_socket);
+   break;
+
+   case 0:
+   socket->tq_resume.routine = cardbus_resume_bh;
+   socket->tq_resume.data = socket;
+   MOD_INC_USE_COUNT;
+   schedule_task(&socket->tq_resume);
+   break;
+   default:
+   printk("cardbus: undefined power state\n");
+   break;
+   }
+}
+
+
 static void cardbus_suspend (struct pci_dev *dev)
 {
pci_socket_t *socket = (pci_socket_t *) dev->driver_data;
-   pcmcia_suspend_socket (socket->pcmcia_socket);
+
+   spin_lock(&socket->pm_lock);
+   if (socket->pm_state != 0) {
+   spin_unlock(&socket->pm_lock);
+   printk("cardbus: suspend of already suspended socket blocked\n");
+   return;
+   }
+   cardbus_change_pm_state(socket,3);
+   pci_set_power_state(dev,3);
+   socket->pm_state = 3;
+   spin_unlock(&socket->pm_lock);
 }
 
 static void cardbus_resume (struct pci_dev *dev)
 {
pci_socket_t *socket = (pci_socket_t *) dev->driver_data;
-   pcmcia_resume_socket (socket->pcmcia_socket);
+   
+   spin_lock(&socket->pm_lock);
+   if (socket->pm_state != 3) {
+   spin_unlock(&socket->pm_lock);
+   printk("cardbus: resume of non-suspended socket blocked\n");
+   return;
+   }
+   pci_set_power_state(dev,0);
+   cardbus_change_pm_state(socket, 0);
+
+   /* we intentionally leave with socket->pm_state not updated
+* and socket->pm_lock still acquired!
+* Will be released by the pending cardbus_resume_bh()
+* Needed to protect against resume re-entry.
+*/
 }
 
 
diff -Nur v2.4.0-test12/drivers/pcmcia/pci_socket.h v2.4.0-t12-yenta1/driver/pcmcia/pci_socket.h
--- v2.4.0-test12/drivers/pcmcia/pci_soc

Re: yenta, pm, ioremap(!) problems (was: PCI irq routing..)

2000-12-15 Thread Martin Diehl


On Fri, 15 Dec 2000, Linus Torvalds wrote:

> I'm surprised: "yenta_init()" will re-initialize the yenta
> PCI_BASE_ADDRESS_0 register, but maybe there's something wrong there. Try

right - but it is just writing back the bogus 0xe6000 thing.

> adding a pci_enable_device() to turn the device on and also re-route the
> interrupts if necessary.

Tried: nothing changed. For the TI1131 only the bridge windows are lost,
not resource 0. It's still there and appears valid to the kernel's best
knowledge - no need for re-negotiation.

> The above is fairly strange, though. I wonder if the problem is that
> 0xe6000 value: that's a pretty bogus address for a PCI window, as it's in
> the BIOS legacy area. 

I've just hacked down the pci resource allocation (namely pci-i386.c) in
such a way to always assign insane 0xe6000/0xe7000 to the base resource 0
(register memory) similar to what the BIOS does. Same result: working
until suspend, identical RO garbage thereafter. Seems it's really a bad
choice to map PCI memory to this area - at least for this box.

> I suspect that the suspend/resume will do something bad to the BAT
> registers, which control the BIOS area mapping behaviour, and it just
> kills the forwarding of the legacy region to the PCI bus, or something.

sounds reasonable wrt what I've seen - Don't trust the BIOS.

> I wonder if the PCI cardbus init code should just notice this, and force
> all cardbus windows to be re-initialized. That legacy area address really
> doesn't look right.

Should work - identify all (badly mapped) regions, free them, and let
pci_enable_device() make things fine. However, I'd suggest doing this at
initial pci device scan since
- not only cardbus devices might be misconfigured by the BIOS
- no need for sanity check in every pci-capable driver.
- similar stuff needed for transparent hotplugging
Losing part of this later at suspend is a different issue which may
deserve fixing on a per-driver basis. But broken PCI memory mapping to the
BIOS legacy area should be corrected from the very beginning, I believe.
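
Just to illustrate the kind of check I have in mind for the scan (a
user-space sketch with made-up types, not kernel code): walk the memory
resources and flag every PCI memory region that starts below 1M.

```c
#include <stddef.h>
#include <stdint.h>

#define LEGACY_LIMIT 0x00100000u   /* 1M boundary */

struct mem_res {                   /* hypothetical, simplified resource */
    uint32_t start;
    uint32_t end;
    int      is_pci_mem;           /* nonzero for a PCI memory resource */
};

/* True if a PCI memory resource was placed in the legacy area. */
static int res_in_legacy_area(const struct mem_res *r)
{
    return r->is_pci_mem && r->start < LEGACY_LIMIT;
}

/* Count the resources that would need to be released and re-assigned. */
static size_t count_legacy_pci(const struct mem_res *res, size_t n)
{
    size_t i, hits = 0;

    for (i = 0; i < n; i++)
        if (res_in_legacy_area(&res[i]))
            hits++;
    return hits;
}
```

On my box this would catch exactly the two PCI1131 entries at 0xe6000 and
0xe7000 and leave the video ROM and the other non-PCI ranges alone.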

Regards
Martin

-
FYI - /proc/iomem showing broken iomapping from BIOS
(the 53c810 might better be page-adjusted too):

-0009fbff : System RAM
0009fc00-0009 : reserved
000a-000b : Video RAM area
000c-000c7fff : Video ROM
000e6000-000e6fff : Texas Instruments PCI1131
000e7000-000e7fff : Texas Instruments PCI1131 (#2)
000f-000f : System ROM
0010-02ff : System RAM
  0010-0020afaf : Kernel code
  0020afb0-0021f7a3 : Kernel data
1000-103f : PCI CardBus #80
1040-107f : PCI CardBus #80
1080-10bf : PCI CardBus #81
10c0-10ff : PCI CardBus #81
1f00-1fff : Symbios Logic Inc. (formerly NCR) 53c810
2000-2fff : PCI Bus #01
3000-3fff : PCI Bus #01
c000-c03f : Neomagic Corporation NM2093 [MagicGraph 128ZV]
  c000-c010 : vesafb
fff0- : reserved


yenta, pm, ioremap(!) problems (was: PCI irq routing..)

2000-12-15 Thread Martin Diehl

On Thu, 7 Dec 2000, Linus Torvalds wrote:

> Ok, definitely needs some more work. Thanks for testing - I have no
> hardware where this is needed.

Well, so I've tried to go on since my box has this "feature". Seems I
finally got the thing tracked down to several issues with mutual influence
and thus really hard to reproduce.
Apparently there are not (yet?) that many people hurt by it, so my belief is
the required cleanup would be post-2.4.0(final). Just want to give you
some idea of the interesting things I've seen so far:

1) cardbus_resume() gets invoked more than once, even at test12, where the
"bridges hit twice" case from pci_pm stuff is fixed.
The reason is pcmcia_core stuff sleeping in the resume path waiting for
card detection. So we are scheduled with the resume still pending and
cardbus_resume() is entered again from kapm-idled (first was from userland
apmd context). So we get screwed when doing a second init while already
waiting for the card to finish interrogation.
My solution: asynch semantics for cardbus_resume() wrt pcmcia_core
using a scheduled resume_bh. So we finish the pm callback before
sleeping.

2) While 1) prevents us from fooling ourselves there might be other
drivers sleeping in resume. According to Documentation/pm.txt it is legal
to do so. Probably, instead of speaking of some unspecified advantage to
finish fast, it should be stated as strongly discouraged to sleep.
Otherwise one single driver could trigger the multiple resume case for all
others. Anyway, best solution might be a clean state machine which handles
the pm transitions (and pci hotplugging). IMHO this is 2.5 stuff so I've
tried to protect the yenta stuff by its own (lockable) state flag.

3) The TI1131 is apparently not PCI PM 1.0 compliant. At least it seems it
has been replaced by the 12xx series at the moment some major player
required PCI PM 1.0 to get his "Designed for ..." label in '98 ;-)
So I had to add some code to save and restore things like memory and io
windows of the bridge which were lost after resume. This is implemented as
a controller specific addon to the common yenta operations similar to the
open/init case.

4) The final bang was when I realized that after all that was done the
content of the CardBus/ExCA register space was total garbage after
resume. And, even worse, it completely failed to restore - not even
0's written to it could be read back as such. This turned out to be a
io-mapping issue! Believe it or not - my solution is to disable the
cardbus controller in BIOS setup. The rationale is as follows:

- When controller is enabled the BIOS assigns BASE_0 to 0xe6000/0xe7000.
  This is mapped to 0xc00e6000 by ioremap(). Everything works fine until
  we suspend. Furthermore I've proved by use of virt_to_bus() and vice
  versa the mapping is still there after resume. However the content is
  not writeable anymore and contains some arbitrary garbage - which always
  stays the same, even over cold reboot. But no Oops or so - just as if
  you were writing to /dev/null and reading some hardwired bytes.
  Even unmapping it at suspend and remapping after resume did not help.

- With controller disabled on the other hand the BIOS does not assign
  BASE_0. So we do it during pciscan (btw., that's why I needed the VLSI
  router stuff first, since the IRQ is unrouted too in this case). This
  assigns bus-address like 0x1000 to the guy which we are mapping
  to 0xc3-somewhere - fine. This mapping however does not only survive the
  suspend/resume like the first one, its content also remains valid -
  i.e. no garbage and writeable - here we go :)

Well, at the end yenta is now working together with pm if 1-4) applied.
So I would stop here with this workaround for me and things to be
addressed later at 2.5. Of course I could prepare 2 or 3 patches in case
it might be helpful at pre-2.4. All changes are to yenta_socket only, so
it would at least not break anything else.

However, I don't see what makes bus address 0xe6000 differ from 0x1000
- except we are crossing the 1M barrier.
From the i386/ioremap() code I've seen the 640k-1M range is handled
separately since it's always mapped. Some chance to lose something here
during suspend? Pagetables/-caches are expected to remain valid - right?
Btw, all access to the cardbus/exca registers go to the inlines at the top
of yenta.c using read[bwl]() - which is (for the i386) defined to simply
dereference __io_virt(addr). But we have addr pointing somewhere to the
cardbus registers already memory mapped, so we could simply say *addr.
Just a minor notational inconsistency or is there good reason to access
iomem one way or the other (aliasing, caching,...)?

Regards
Martin


[PATCH] VLSI irq router (was: PCI irq routing..)

2000-12-10 Thread Martin Diehl


On Thu, 7 Dec 2000, Linus Torvalds wrote:

> > btw, I'm thinking I could guess the routing from the VLSI config space,
> > but I don't have any doc's. Would it be worth to try to add some specific
> 
> Please do. You might leave them commented out right now, but this is

Ok. Apparently it's the "pirq is nibble index in config space" kind of
routing, which turns guessing into a change-BIOS-settings-and-run-lspci
procedure. Patch vs. 2.4.0-t12p8 below. Tested as (in)completely as my BIOS
permits. It has worked fine for several days and correctly assigns IRQs when
they are left unassigned due to "pnp os". So I feel confident enough to not
leave it commented out.
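
For readers not familiar with the nibble scheme, here is a stand-alone
model of it. Assumptions: the config space is a plain byte array, even
nibble indices live in the low half of a byte and odd ones in the high
half, and the vlsi_get()/vlsi_set() names are made up for this sketch.
The lower bound check on pirq is an addition of the sketch.

```c
#include <stdint.h>

#define VLSI_PIRQ_BASE 0x74   /* nibble array starts at this config offset */

/* Read nibble number nr from the array starting at offset. */
static unsigned read_nibble(const uint8_t *cfg, unsigned offset, unsigned nr)
{
    uint8_t x = cfg[offset + (nr >> 1)];

    return (nr & 1) ? (unsigned)(x >> 4) : (unsigned)(x & 0x0f);
}

/* Write nibble number nr, leaving the neighbouring nibble untouched. */
static void write_nibble(uint8_t *cfg, unsigned offset, unsigned nr,
                         unsigned val)
{
    unsigned reg = offset + (nr >> 1);
    uint8_t x = cfg[reg];

    if (nr & 1)
        x = (uint8_t)((x & 0x0f) | ((val & 0x0f) << 4));
    else
        x = (uint8_t)((x & 0xf0) | (val & 0x0f));
    cfg[reg] = x;
}

/* PIRQ 1..8 maps to nibble 0..7; returns 0 for out-of-range pirq. */
static int vlsi_get(const uint8_t *cfg, int pirq)
{
    if (pirq < 1 || pirq > 8)
        return 0;
    return (int)read_nibble(cfg, VLSI_PIRQ_BASE, (unsigned)(pirq - 1));
}

static int vlsi_set(uint8_t *cfg, int pirq, int irq)
{
    if (pirq < 1 || pirq > 8)
        return 0;
    write_nibble(cfg, VLSI_PIRQ_BASE, (unsigned)(pirq - 1), (unsigned)irq);
    return 1;
}
```

Under these assumptions, routing PIRQ 1 to IRQ 5 and PIRQ 4 to IRQ 9 leaves
0x05 in the byte at 0x74 and 0x90 in the byte at 0x75.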
Test example attached.

Regards
Martin

-

diff -Nur linux-2.4.0-t12p8/arch/i386/kernel/pci-irq.c linux-2.4.0-t12p8-md/arch/i386/kernel/pci-irq.c
--- linux-2.4.0-t12p8/arch/i386/kernel/pci-irq.c Mon Dec 11 00:29:42 2000
+++ linux-2.4.0-t12p8-md/arch/i386/kernel/pci-irq.c Mon Dec 11 00:58:48 2000
@@ -298,6 +298,33 @@
return 1;
 }
 
+/*
+ * VLSI: nibble offset 0x74 - educated guess due to routing table and
+ *   config space of VLSI 82C534 PCI-bridge/router (1004:0102)
+ *   Tested on HP OmniBook 800 covering PIRQ 1, 2, 4, 8 for onboard
+ *   devices, PIRQ 3 for non-pci(!) soundchip and (untested) PIRQ 6
+ *   for the busbridge to the docking station.
+ */
+
+static int pirq_vlsi_get(struct pci_dev *router, struct pci_dev *dev, int pirq)
+{
+   if (pirq > 8) {
+   printk("VLSI router pirq escape (%d)\n", pirq);
+   return 0;
+   }
+   return read_config_nybble(router, 0x74, pirq-1);
+}
+
+static int pirq_vlsi_set(struct pci_dev *router, struct pci_dev *dev, int pirq, int irq)
+{
+   if (pirq > 8) {
+   printk("VLSI router pirq escape (%d)\n", pirq);
+   return 0;
+   }
+   write_config_nybble(router, 0x74, pirq-1, irq);
+   return 1;
+}
+
 #ifdef CONFIG_PCI_BIOS
 
 static int pirq_bios_set(struct pci_dev *router, struct pci_dev *dev, int pirq, int irq)
@@ -329,6 +356,7 @@
 
{ "NatSemi", PCI_VENDOR_ID_CYRIX, PCI_DEVICE_ID_CYRIX_5520, pirq_cyrix_get, pirq_cyrix_set },
{ "SIS", PCI_VENDOR_ID_SI, PCI_DEVICE_ID_SI_503, pirq_sis_get, pirq_sis_set },
+{ "VLSI 82C534", PCI_VENDOR_ID_VLSI, PCI_DEVICE_ID_VLSI_82C534, pirq_vlsi_get, pirq_vlsi_set },
{ "default", 0, 0, NULL, NULL }
 };




PCI: BIOS32 Service Directory structure at 0xc00ec060
PCI: BIOS32 Service Directory entry at 0xeefb0
PCI: BIOS probe returned s=00 hw=11 ver=02.10 l=01
PCI: PCI BIOS revision 2.10 entry at 0xeefc2, last bus=1
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Scanning for ghost devices on bus 0
PCI: Scanning for ghost devices on bus 1
PCI: IRQ init
PCI: Interrupt Routing Table found at 0xc00f36e0
00:03 slot=00 0:00/ 1:00/ 2:00/ 3:00/
00:04 slot=00 0:01/8eb8 1:04/8eb8 2:04/ 3:04/
00:05 slot=00 0:08/8eb8 1:08/ 2:08/ 3:08/
00:06 slot=00 0:02/8eb8 1:02/ 2:02/ 3:02/
01:00 slot=00 0:08/8eb8 1:08/ 2:08/ 3:08/
01:06 slot=01 0:06/8eb8 1:06/8eb8 2:06/8eb8 3:06/8eb8
PCI: Using IRQ router VLSI 82C534 [1004/0102] at 00:01.0
PCI: IRQ fixup
00:03.0: ignoring bogus IRQ 255
00:04.0: ignoring bogus IRQ 255
00:04.1: ignoring bogus IRQ 255
IRQ for 00:03.0(0) via 00:03.0 -> not routed
IRQ for 00:04.0(0) via 00:04.0 -> PIRQ 01, mask 8eb8, excl  -> newirq=0 ... failed
IRQ for 00:04.1(1) via 00:04.1 -> PIRQ 04, mask 8eb8, excl  -> newirq=0 ... failed
PCI: Allocating resources
PCI: Resource c000-c03f (f=1208, d=0, p=0)
PCI: Resource 3100-31ff (f=101, d=0, p=0)
PCI: Resource 1f00-1fff (f=200, d=0, p=0)
PCI: Resource 3000-301f (f=101, d=0, p=0)
  got res[1000:1fff] for resource 0 of Texas Instruments PCI1131
  got res[10001000:10001fff] for resource 0 of Texas Instruments PCI1131 (#2)
PCI: Sorting device list...

Linux PCMCIA Card Services 3.1.22
  options:  [pci] [cardbus] [pm]
PCI: Enabling device 00:04.0 ( -> 0002)
IRQ for 00:04.0(0) via 00:04.0 -> PIRQ 01, mask 8eb8, excl  -> newirq=5 -> assigning IRQ 5 ... OK
PCI: Assigned IRQ 5 for device 00:04.0
PCI: Enabling device 00:04.1 ( -> 0002)
IRQ for 00:04.1(1) via 00:04.1 -> PIRQ 04, mask 8eb8, excl  -> newirq=9 -> assigning IRQ 9 ... OK
PCI: Assigned IRQ 9 for device 00:04.1
Yenta IRQ list 0ad8, PCI irq5
Socket status: 3110
Yenta IRQ list 08d8, PCI irq9
Socket status: 3010

00:00.0 Host bridge: VLSI Technology Inc 82C535 (rev 03)
Flags: bus master, medium devsel, latency 0

00:01.0 PCI bridge: VLSI Technology Inc 82C534 (rev 03) (prog-if 00 [Normal decode])
Flags: bus master, medium devsel, latency 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 4000-7fff
Memory behind bridge: 2000-2fff
P

Re: PCI irq routing..

2000-12-07 Thread Martin Diehl


On Thu, 7 Dec 2000, Linus Torvalds wrote:

> > btw, I'm thinking I could guess the routing from the VLSI config space,
> 
> Please do. You might leave them commented out right now, but this is
> actually how most of the pirq router entries have been created: by looking
> at various pirq tables and matching them up with other (non-pirq)
> documentation and testing. The pirq "link" value is basically completely
> NDA'd, and is per-chipset-specific. Nobody documents it except in their
> bios writers guide, if even there.

Ok - will do it. Unfortunately the BIOS of this notebook has no
customizable routing option which I could use to play with.
So testing here will hardly cover orthogonal cases.

> > The reason for this is in drivers/pci.c where bridges are touched
> > twice: once as a device on a bus and once via ->self from the bus behind.
> 
> Not intended behaviour. The self case should be removed.

I was wondering whether there might be bridges which have to be awoken
from both sides because they have different config spaces there.
Is bus->children->self guaranteed to be identical to bus->device for
all kinds of bridge devices?
Sure, dividing bridges wouldn't make too much sense - at least I don't see
what half a bridge might be good for, but ...

Removing self cases is straightforward - pci_pm-2.4.0-t9p3-patch below.
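
To show the double visit in isolation, here is a toy model of the bus walk
(plain C, invented types, no relation to the real struct pci_bus layout
beyond the idea). walk_old() visits the bridge once as a device on the
parent bus and once more as the self of the bus behind it; walk_root() only
touches self for a root bus.

```c
#include <stddef.h>

struct dev { int visits; };

struct bus {
    struct dev  *self;       /* bridge leading to this bus, NULL for root */
    struct dev **devices;    /* devices on this bus, bridges included */
    size_t       ndev;
    struct bus **children;   /* buses behind the bridges on this bus */
    size_t       nchild;
};

static void visit(struct dev *d)
{
    if (d)
        d->visits++;
}

/* Old scheme: every bus also handles its own bridge. */
static void walk_old(struct bus *b)
{
    size_t i;

    for (i = 0; i < b->nchild; i++)
        walk_old(b->children[i]);
    for (i = 0; i < b->ndev; i++)
        visit(b->devices[i]);
    visit(b->self);
}

/* New scheme: a bus only handles the devices sitting on it. */
static void walk_bus(struct bus *b)
{
    size_t i;

    for (i = 0; i < b->nchild; i++)
        walk_bus(b->children[i]);
    for (i = 0; i < b->ndev; i++)
        visit(b->devices[i]);
}

/* Root buses still get their controller handled, exactly once. */
static void walk_root(struct bus *root)
{
    walk_bus(root);
    visit(root->self);
}
```

With one bridge between bus 0 and bus 1, the old walk counts two visits for
the bridge and the new walk counts one.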

> Ok, definitely needs some more work. Thanks for testing - I have no
> hardware where this is needed.

Could do some more testing if a day or two for feedback is ok.

Two more things I've noticed:

- when all pcmcia/yenta stuff is in modules and doing suspend/resume
immediately after fresh cold reboot there is nothing our cardbus stuff
might have set up which was lost in suspend. Nevertheless, what happens is
the pcmcia_core/yenta_socket/ds modules get loaded without problem but the
"Socket status" printk from yenta_open_bh() is completely garbage. This is
not the case when the modules are loaded before the suspend.
Despite the garbage, subsequent cardmgr startup does not give any error
message - but the cards in the slots are not recognized (no beeps, no
status to retrieve from cardctl). Reboot is the only solution.
My conclusion is, the reason must be in the init-path doing or forgetting
something prohibited/required after suspend - or the TI1131 is broken.

- when, after yenta sockets became unusable due to pm suspend, I try to
eject/insert the cards from a slot, the box freezes. This turned out
to be a loop in yenta_interrupt being called endlessly. Apparently the
yenta_bh() -> pcmcia-handler path somehow triggers the next IRQ.
But this might be a consequence of the former issue.

According to the forecasts, next weekend will be rainy, so...

Thank you for the time!

Regards
Martin

-
--- linux-2.4.0-t12p3/drivers/pci/pci.c.orig Mon Dec  4 14:21:26 2000
+++ linux-2.4.0-t12p3/drivers/pci/pci.c Fri Dec  8 00:17:50 2000
@@ -1089,6 +1089,9 @@
return 0;
 }
 
+
+/* take care to suspend/resume bridges only once */
+
 static int pci_pm_suspend_bus(struct pci_bus *bus)
 {
struct list_head *list;
@@ -1100,9 +1103,6 @@
/* Walk the device children list */
list_for_each(list, &bus->devices)
pci_pm_suspend_device(pci_dev_b(list));
-
-   /* Suspend the bus controller.. */
-   pci_pm_suspend_device(bus->self);
return 0;
 }
 
@@ -1110,8 +1110,6 @@
 {
struct list_head *list;
 
-   pci_pm_resume_device(bus->self);
-
/* Walk the device children list */
list_for_each(list, &bus->devices)
pci_pm_resume_device(pci_dev_b(list));
@@ -1125,18 +1123,26 @@
 static int pci_pm_suspend(void)
 {
struct list_head *list;
+   struct pci_bus *bus;
 
-   list_for_each(list, &pci_root_buses)
-   pci_pm_suspend_bus(pci_bus_b(list));
+   list_for_each(list, &pci_root_buses) {
+   bus = pci_bus_b(list);
+   pci_pm_suspend_bus(bus);
+   pci_pm_suspend_device(bus->self);
+   }
return 0;
 }
 
 static int pci_pm_resume(void)
 {
struct list_head *list;
+   struct pci_bus *bus;
 
-   list_for_each(list, &pci_root_buses)
-   pci_pm_resume_bus(pci_bus_b(list));
+   list_for_each(list, &pci_root_buses) {
+   bus = pci_bus_b(list);
+   pci_pm_resume_device(bus->self);
+   pci_pm_resume_bus(bus);
+   }
return 0;
 }



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: PCI irq routing..

2000-12-07 Thread Martin Diehl

On Wed, 6 Dec 2000, Linus Torvalds wrote:

> On Wed, 6 Dec 2000, Martin Diehl wrote:
> > 
> [Cardbus config space lost after APM suspend/resume]
> 
> Can you remind me in a day or two if I haven't gotten back to you? I don't
> have any machines that need this, but I've seen ones that do, and if
> you're willing to test..

sure, I'm willing to do the testing (and reminding ;-)

> Yes, this is expected for routers that we don't know about: we will still
> use the irq that the device claims it has, but we will obviously fail to
> try to route it (but it still works if the BIOS had already routed it -
> which is how the old code always worked anyway).

btw, I'm thinking I could guess the routing from the VLSI config space,
but I don't have any docs. Would it be worth trying to add some specific
get/set methods for this device? What about testers (or people who have
access to the docs)?

> Anyway, for the suspend-resume thing, if you want to go ahead on your own
> without a real patch from me, the fix is along the lines of

well, took me some time to follow all the paths thru cardbus/pcmcia stuff
wrt suspend/resume from pm - but ended up at:

>  - add two functions:
> 
>   static void yenta_save_config(pci_socket_t *socket)
>   static void yenta_restore_config(pci_socket_t *socket)

That's the crucial point, imho. The PCI layer forwards the PM events to
the cardbus-driver's suspend/resume methods, which are calling 
pcmcia_suspend/resume_socket(). The latter in turn will call back the
appropriate yenta_operations which are registered to it. So much for sure.

However, there is no pcmcia_resume path forwarded to yenta since the
traditional pccard_operations did not provide such a method and pcmcia
simply re-initialized its sockets. My assumption is that you didn't mean
to add a do-nothing resume to all the pcmcia stuff (including i82365,
tcic) just to allow yenta to register for a resume operation which would
be there for cardbus only.
So my suggestion is to have cardbus_save/restore_config() do exactly
what you've said for yenta_*.
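
A minimal sketch of the save/restore idea, written as a user-space toy model rather than driver code. The names toy_socket, toy_save_config and toy_restore_config are hypothetical, the 16-dword size is an assumption, and the hw_config array merely stands in for the bridge's PCI config space; a real driver would presumably read and write the config space through the kernel's PCI accessors instead of memcpy().

```c
#include <stdint.h>
#include <string.h>

#define TOY_CFG_DWORDS 16  /* assumption: first 64 bytes of config space */

/* Hypothetical stand-in for the per-socket state (not the real pci_socket_t). */
struct toy_socket {
	uint32_t hw_config[TOY_CFG_DWORDS];    /* models the live config space */
	uint32_t saved_config[TOY_CFG_DWORDS]; /* copy taken at suspend time */
};

/* Called on suspend: remember what the bridge looked like. */
static void toy_save_config(struct toy_socket *s)
{
	memcpy(s->saved_config, s->hw_config, sizeof(s->saved_config));
}

/* Called first thing on resume: write the remembered values back. */
static void toy_restore_config(struct toy_socket *s)
{
	memcpy(s->hw_config, s->saved_config, sizeof(s->hw_config));
}

/* Simulate one suspend/resume cycle that wipes the config space.
 * Returns 1 if the restored contents match the original ones. */
static int toy_config_roundtrip(void)
{
	struct toy_socket s;
	uint32_t before[TOY_CFG_DWORDS];
	int i;

	for (i = 0; i < TOY_CFG_DWORDS; i++)
		s.hw_config[i] = 0x1000u + (uint32_t)i;
	memcpy(before, s.hw_config, sizeof(before));

	toy_save_config(&s);
	memset(s.hw_config, 0, sizeof(s.hw_config)); /* "lost" during suspend */
	toy_restore_config(&s);

	return memcmp(before, s.hw_config, sizeof(before)) == 0;
}
```

The point of the ordering is that the restore has to happen before anything else touches the bridge on resume, since every later access relies on the memory and io windows being valid again.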

>  - do a "yenta_save_config()" in "yenta_suspend()" and a
>"yenta_restore_config()" at the top of "yenta_resume()"

yenta_resume() does not exist.
yenta_*() replaced by cardbus_*() as explained.

>  - test. Also test with the "pci_set_power_state(3)" in suspend enabled,
>because it may/should actually work with that enabled too.

same point: pci_set_power_state(3) should go to cardbus_suspend(), not
yenta.

A first try of this ended oopsing at pm suspend somewhere below
pci_pm_*(). It turned out the reason was the cardbus_suspend() (and resume
too) method which was *invoked several times* in a row!

The reason for this is in drivers/pci/pci.c where bridges are touched
twice: once as a device on a bus and once via ->self from the bus behind.
I'm not sure whether this is the intended behavior - but it definitely
calls cardbus_suspend/resume() twice which breaks when forwarding to
pcmcia_suspend/resume_socket().

So I've tentatively worked around using a "once" flag added to
pci_socket_t. This solves the problems during suspend/resume and the
cardbus' config space appears to be restored as intended - good.
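
The double invocation and the "once" workaround can be illustrated with a small user-space model. Everything here is hypothetical (toy_dev, toy_bus, the suspended flag); it only assumes, as described above, that a bridge is reached once as a device on its parent bus and once via ->self of the bus behind it, and that child buses are walked recursively.

```c
#include <stddef.h>

/* Hypothetical toy types, not the kernel's struct pci_dev / struct pci_bus. */
struct toy_dev {
	int suspended;      /* the "once" flag */
	int suspend_calls;  /* how often the suspend hook really ran */
};

struct toy_bus {
	struct toy_dev *self;       /* bridge leading to this bus, NULL for a root bus */
	struct toy_dev **devices;   /* devices sitting on this bus */
	size_t ndevices;
	struct toy_bus **children;  /* buses behind bridges on this bus */
	size_t nchildren;
};

static void toy_suspend_device(struct toy_dev *dev, int use_once_flag)
{
	if (dev == NULL)
		return;
	if (use_once_flag && dev->suspended)
		return;  /* already handled: ignore the second visit */
	dev->suspended = 1;
	dev->suspend_calls++;
}

/* Assumed walk order: child buses, then devices, then ->self. */
static void toy_suspend_bus(struct toy_bus *bus, int use_once_flag)
{
	size_t i;

	for (i = 0; i < bus->nchildren; i++)
		toy_suspend_bus(bus->children[i], use_once_flag);
	for (i = 0; i < bus->ndevices; i++)
		toy_suspend_device(bus->devices[i], use_once_flag);
	toy_suspend_device(bus->self, use_once_flag);
}

/* One root bus carrying one bridge; returns how often the bridge was suspended. */
static int toy_count_bridge_suspends(int use_once_flag)
{
	struct toy_dev bridge = { 0, 0 };
	struct toy_dev *root_devices[] = { &bridge };
	struct toy_bus behind = { &bridge, NULL, 0, NULL, 0 };
	struct toy_bus *root_children[] = { &behind };
	struct toy_bus root = { NULL, root_devices, 1, root_children, 1 };

	toy_suspend_bus(&root, use_once_flag);
	return bridge.suspend_calls;
}
```

Without the flag the bridge's hook runs twice in this model; with it, the second visit is silently dropped. A flag of this kind only hides the duplicate call, it does not change the traversal itself.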

The bad news however is, the sockets are still broken after
resume. Unfortunately there are several candidates I've spotted:

- calling yenta_init() stuff at resume - is this sufficient?
  Probably we have to forward the pm-triggered resume along
  pci -> cardbus -> pcmcia -> yenta (last link currently missing,
  because the pcmcia layer switches from incoming resume notification
  to init path)

- some content of the mem/io regions might need to be preserved

- some TI1131 oddity wrt CSC-INTs - requested IRQs show up correctly
  in /proc/interrupts and are properly triggered and handled at card
  insert/eject. But after pm suspend/resume the box froze when inserting
  or ejecting the cards (no response to SysRq anymore).

I'll try to continue on this.

Regards
Martin


Re: PCI irq routing..

2000-12-06 Thread Martin Diehl

On Tue, 5 Dec 2000, Linus Torvalds wrote:

> Anybody else who has had problems with PCI interrupt routing (tends to be
> "new" devices like CardBus, ACPI, USB etc), can you verify that this
> either fixes things or at least doesn't break a setup that had started
> working earlier..

problems with recent 2.4.0-test1* on my HP OmniBook 800 are probably
combined PCMCIA(CB) / PCI / APM issues. The point is my 16bit cards
(modem+ne2k) are working perfectly fine with yenta sockets until the first
suspend/resume. Afterwards the PCI config space of the Cardbus
bridge(s) is completely messed up forcing me to reboot.

So I just applied your patch vs. 2.4.0-t12p3 (had to clean up one rejected
hunk due to an eisa_set_level_irq() which is already there).
pcmcia-cs is 3.1.22.

result: issue remains unchanged but nothing seems to be broken so far.
The only difference I've noticed is the following two lines appearing when
modprobing the pcmcia_core/yenta stuff:

IRQ for 00:04.0(0) via 00:04.0 -> PIRQ 01, mask 8eb8, \
excl  -> newirq=9 ... failed
IRQ for 00:04.1(1) via 00:04.1 -> PIRQ 04, mask 8eb8, \
excl  -> newirq=7 ... failed

My guess: might be due to the PCI-IRQ-router (VLSI 82C534 PCI-bridge, 
id=1004:0102) without special support (defaults to r->get == NULL).
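
For reading the "mask 8eb8" in the two lines above: assuming the mask is the usual bitmap in which bit n set means IRQ n may be used for that link, it can be expanded as sketched below. The helper names are made up for this illustration.

```c
#include <stddef.h>

/* Expand a 16-bit IRQ bitmask (bit n set = IRQ n allowed) into a list.
 * Writes at most 16 entries to irqs[] and returns how many were written. */
static size_t irq_mask_to_list(unsigned int mask, int irqs[16])
{
	size_t n = 0;
	int irq;

	for (irq = 0; irq < 16; irq++)
		if (mask & (1u << irq))
			irqs[n++] = irq;
	return n;
}

/* Is a particular IRQ permitted by the mask? */
static int irq_allowed(unsigned int mask, int irq)
{
	return irq >= 0 && irq < 16 && (mask & (1u << irq)) != 0;
}
```

Under that assumption 0x8eb8 allows IRQs 3, 4, 5, 7, 9, 10, 11 and 15, so both proposed values (newirq=9 and newirq=7) lie inside the mask; the "failed" would then come from the missing router support rather than from the choice of IRQ.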

Furthermore, I've noticed at 2.4.0-t10 the PCI-IRQ's of the CB-bridges
were lost (reset to 0) during suspend/resume whereas at 2.4.0-t12p3 they
survive (-t11 not tried). However memory and io-mapping get lost.
I'd consider this the main reason for the failure, but I'm not sure
whether it's the Cardbus bridges' fault or a PCI or APM issue.

But nothing - neither fixed nor broken - has changed for me by this patch,
except for the two lines which apparently do not matter anyway.

attached: dmesg and lspci traces with some comments.

What more information/debugging would be helpful?

Regards
Martin

# dmesg identical (except trivial stuff) for:

Linux version 2.4.0-test12 ([EMAIL PROTECTED]) (gcc version 2.95.3 19991030 
(prerelease)) #4 Mon Dec 4 16:50:54 CET 2000

# and

Linux version 2.4.0-test12-pci-irq ([EMAIL PROTECTED]) (gcc version 2.95.3 19991030 
(prerelease)) #5 Wed Dec 6 01:04:14 CET 2000

BIOS-provided physical RAM map:
 BIOS-e820: 0009fc00 @  (usable)
 BIOS-e820: 0400 @ 0009fc00 (reserved)
 BIOS-e820: 0002 @ 000e (reserved)
 BIOS-e820: 02f0 @ 0010 (usable)
 BIOS-e820: 0010 @ fff0 (reserved)
Scan SMP from c000 for 1024 bytes.
Scan SMP from c009fc00 for 1024 bytes.
Scan SMP from c00f for 65536 bytes.
Scan SMP from c009fc00 for 4096 bytes.
On node 0 totalpages: 12288
zone(0): 4096 pages.
zone(1): 8192 pages.
zone(2): 0 pages.
mapped APIC to e000 (010cd000)
Kernel command line: BOOT_IMAGE=linux.2412p ro root=301 video=vesa
Initializing CPU#0
Detected 164.661 MHz processor.
Console: colour dummy device 80x25
Calibrating delay loop... 328.50 BogoMIPS
Memory: 46516k/49152k available (929k kernel code, 2248k reserved, 76k data, 200k 
init, 0k highmem)
Dentry-cache hash table entries: 8192 (order: 4, 65536 bytes)
Buffer-cache hash table entries: 1024 (order: 0, 4096 bytes)
Page-cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 4096 (order: 3, 32768 bytes)
VFS: Diskquotas version dquot_6.4.0 initialized
CPU: Before vendor init, caps: 008001bf  , vendor = 0
Intel Pentium with F0 0F bug - workaround enabled.
CPU: After vendor init, caps: 008001bf   
CPU: After generic, caps: 008001bf   
CPU: Common caps: 008001bf   
CPU: Intel Pentium MMX stepping 03
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
PCI: BIOS32 Service Directory structure at 0xc00ec060
PCI: BIOS32 Service Directory entry at 0xeefb0
PCI: BIOS probe returned s=00 hw=11 ver=02.10 l=01
PCI: PCI BIOS revision 2.10 entry at 0xeefc2, last bus=1
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Scanning for ghost devices on bus 0
PCI: Scanning for ghost devices on bus 1
PCI: IRQ init
PCI: Interrupt Routing Table found at 0xc00f36e0
00:03 slot=00 0:00/ 1:00/ 2:00/ 3:00/
00:04 slot=00 0:01/8eb8 1:04/8eb8 2:04/ 3:04/
00:05 slot=00 0:08/8eb8 1:08/ 2:08/ 3:08/
00:06 slot=00 0:02/8eb8 1:02/ 2:02/ 3:02/
01:00 slot=00 0:08/8eb8 1:08/ 2:08/ 3:08/
01:06 slot=01 0:06/8eb8 1:06/8eb8 2:06/8eb8 3:06/8eb8
PCI: Using IRQ router default [1004/0102] at 00:01.0
PCI: IRQ fixup
00:03.0: ignoring bogus IRQ 255
IRQ for 00:03.0(0) via 00:03.0 -> not routed
PCI: Allocating resources
PCI: Resource c000-c03f (f=1208, d=0, p=0)
PCI: Resource 000e6000-000e6fff (f=200, d=0, p=0)
PCI: Resource 000e7000-000e7fff (f=200, d=0, p=0)
PCI: Resource 3000-301f (f=101, d=0, p=0)
PCI: Sorting device

[RFC] udp_err compliance: RFC1122 vs. BSD

2000-10-08 Thread Martin Diehl



Hi,

just thinking about the RFC1122 vs. BSD compliance issue wrt error 
reporting on unconnected udp sockets i'd like to make a proposal for some
kind of solution:

Synopsis:
  Approach to have a default policy whether error reporting on udp
  sockets follows the official internet standard from RFC1122 or
  traditional BSD. Depending on this choice, code following the
  corresponding design will work without any changes. Other code
  will work after minor compatibility changes. Nothing new here.

Background:
  There is a continuing effort to make Linux IP protocols comply with
  RFC1122 which can be seen from the "Status" comments at the beginning
  of various protocol (icmp/tcp/udp) specific files in net/ipv4. From 1.x
  on Linux implemented what is required by rule 4.1.3.3 of RFC1122:
  "UDP MUST pass to the application layer all ICMP error messages that it
  receives from the IP layer." On the other hand, for IPv4/UDP there is
  a different approach from BSD-like applications which do not pass icmp
  errors on unconnected udp sockets to the application layer. Linux
  implemented a SOL_SOCKET level option called SO_BSDCOMPAT to switch
  from default RFC1122 compliant behaviour to BSD-like on a per-socket
  basis. The described behaviour was consistently realized including latest
  2.2.18-pre* kernel versions. It is documented (as of man-pages-1.29:
  socket(4), socket(7), udp(4)) exactly this way. Especially, fixing
  BSD-like error handling instead of setting SO_BSDCOMPAT is encouraged.
  Furthermore, SO_BSDCOMPAT is claimed to be scheduled for future removal,
  i.e. RFC1122 compliant behaviour being the only option then.

Issue:
  Linux' handling of icmp errors on unconnected udp sockets was changed
  somewhere at 2.3.4x. It now favours BSD-like behaviour while dropping
  RFC1122 conformance. This change is hard-coded into the respective
  socket layer functions, i.e. there is no option to change this without
  kernel patch. While SO_BSDCOMPAT is still syntactically there, its
  semantics disappeared. Especially with SO_BSDCOMPAT set to 0 (regardless
  whether by default or explicitly) the socket still behaves BSD-like and
  violates RFC1122.
  This change potentially breaks code relying on previous Linux behaviour.
  While probably most of the arising pitfalls might be attributed to
  misconfigured systems, there might be a number of network
  applications designed to rely on RFC1122 to minimize network stalls.
  On the other hand, RFC1122 compliant error handling was shown to cause
  significant trouble at high network load due to delayed errors from
  former requests passed to the wrong socket.
  Taken together, both approaches are somehow buggy depending on the
  individual point of view. While the former implementation allowed
  overriding the default behaviour by changing it to BSD-like on a
  per-socket basis there is no such opportunity from user space with
  the new one.

Goal:
  Both approaches are desirable for their respective applications but
  do mutually exclude each other. Hence there is some need for a
  solution which allows the user to specify a default policy for
  error handling on unconnected udp sockets. It should be feasible to
  override this default behaviour to help porting from both ends.

Detailed requirements:
  Q1: There must be a system wide error handling policy for ipv4/udp
  sockets.
  Q2: The allowed policies are "RFC1122 compliant" or "BSD-like".
  Q3: udp sockets must be created to honor the default policy at the
  moment they are created.
  Q4: Once created, it must be possible to change the way each individual
  socket of this kind is handled at any moment any number of times.
  Q5: If an icmp error is received for a udp socket it is tentatively
  assigned to it. If the socket behaves "BSD-like" at the moment when
  the error arrives, it is disregarded unless the socket is connected
  or IP_RECVERR is set. If the socket is RFC1122 compliant however,
  the error must be passed to the application layer in any case.
  Q6: The handling of potential races (default policy changed while socket
  is created for example) is unspecified.
  Comment: This shouldn't be much of an issue since default policy should
  be set once when booting. Sharing sockets between processes on
  SMP boxes would be another candidate not taken into
  consideration.
  Q7: There must be a hard-coded value for setting the default policy at
  bootup. This should make everything behave like 2.2.x.
  Q8: Non ipv4/udp sockets must not be influenced by this approach.
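
Requirements Q3 to Q5 boil down to a small decision function. The following is only a sketch of that logic; the struct and function names are hypothetical, and only "bsdish" and the udp_rfc1122 default are taken from the proposal itself.

```c
#include <stdbool.h>

/* Hypothetical per-socket state; field names are made up for this sketch. */
struct toy_udp_sock {
	bool bsdish;     /* true: BSD-like, false: RFC1122 compliant */
	bool connected;  /* socket has been connect(2)ed */
	bool recverr;    /* IP_RECVERR is set */
};

/* Q3: a new socket picks up the system-wide default at creation time. */
static struct toy_udp_sock toy_udp_create(bool default_rfc1122)
{
	struct toy_udp_sock sk = { !default_rfc1122, false, false };
	return sk;
}

/* Q5: should an incoming ICMP error be passed up to the application? */
static bool toy_udp_report_error(const struct toy_udp_sock *sk)
{
	if (!sk->bsdish)
		return true;                   /* RFC1122: always report */
	return sk->connected || sk->recverr;   /* BSD-like: only if asked for */
}
```

Q4 corresponds to simply flipping sk.bsdish later on; because the decision is evaluated when the error arrives, the change takes effect for the next ICMP error without any further bookkeeping.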

Design outline:
  D1: Introduce a new sysctl option called "udp_rfc1122" to be placed at
  sysctl net.ipv4 with boolean semantics. (covers Q1+Q2)
  D2: Set the present "bsdish" socket property when creating ipv4/udp
  sockets. (covers Q3)
  D3: Keep the SO_BSDCOMPAT socket option which corresponds to the
  "bsdish" socket property and is managed by ge

Re: poll(2) semantics changed in 2.4.0-? vs. 2.2.16?

2000-10-08 Thread Martin Diehl

On Fri, 6 Oct 2000, Andi Kleen wrote:

[icmp errors on unconnected udp sockets not passed to application layer]
> 
> Alexey Kuznetsov ([EMAIL PROTECTED]) changed it. Ask him why he did it,
> I agree with you that it would make more sense to keep the old behaviour
> (even though it is differing from most other BSD sockets implementations) 
> 
> To answer your question: you'll only get the error reported now when the
> UDP socket is either connect(2)ed or when you enabled asynchronous
> error reporting using IP_RECVERR.

Thanks, both work fine. To understand the reason for this change I've
browsed the discussion on l-k when it was introduced in 2.3.4x.
My impression is, there are two approaches (RFC1122 vs. BSD) which are
mutually exclusive and could each be regarded as broken from the other's
point of view.
I'm just going to formulate an idea I have not to solve but to work around
this ambiguity and retain maximum compatibility and honor both sides of
the story :)
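
The two ways of getting the error reported that were mentioned above, connect(2) and IP_RECVERR, might look roughly like this in an application. This is a sketch with made-up helper names; IP_RECVERR is Linux-specific, and how the queued error is then read back is not shown here.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Ask for ICMP errors on an otherwise unconnected UDP socket.
 * Returns 0 on success, -1 on failure (errno set by setsockopt). */
static int udp_enable_recverr(int fd)
{
	int on = 1;

	return setsockopt(fd, IPPROTO_IP, IP_RECVERR, &on, sizeof(on));
}

/* Alternative: fix the peer with connect(2) so errors for that peer
 * are reported on this socket. Returns 0 on success, -1 on failure. */
static int udp_connect_peer(int fd, const struct sockaddr_in *peer)
{
	return connect(fd, (const struct sockaddr *)peer, sizeof(*peer));
}
```

Either call would go right after socket() and before the first sendto(); connect(2) has the side effect of restricting the socket to a single peer, which IP_RECVERR does not.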

Regards,
Martin


Re: poll(2) semantics changed in 2.4.0-? vs. 2.2.16?

2000-10-05 Thread Martin Diehl

On Thu, 5 Oct 2000, David S. Miller wrote:

> Fix your /etc/nsswitch.conf to not try to use NIS/NIS+ if you
> do not have these services available.

Yes, of course you are right - incorrect nsswitch.conf will reveal this
problem too - but:
No, the issue I've tried to demonstrate with my polltest.c program is
completely independent. The point is:

udp-sendmsg() to an unbound port on reachable peer (localhost:12345 e.g. -
will try eth0 too) results in returning icmp: port unreachable. Ok so far.
The following poll() on the sending fd should IMHO immediately return
POLLERR due to pending ECONNREFUSED instead of blocking until timeout -
right? At least, that is the behavior described in man udp(4)/RFC1122 and
realized in 2.2.16 (at least since 2.0.x I think and probably up to
2.2.18pre?, as long as there is no corresponding 2.4 backport there,
I believe - not tested). It's still working this way if I add a sleep(5)
or so between sendto() and poll() in 2.2. Setting or unsetting
SO_BSDCOMPAT doesn't change anything either.

For 2.4.0-test9 however the poll() ignores the returned (still pending)
icmp error and blocks until timeout (or forever), no matter if 
SO_BSDCOMPAT is set or not.

So, for me the 2.4.0-test9 behavior does not only differ from 2.2 and what
the manpages say - I'm also wondering how to detect the unreachable peer
port. A poll() timeout means no response at all, which is something
different and forces blocking for some time. Nonblocking recvfrom() without
poll() wouldn't help, since the pending error isn't passed to it either.
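
The distinction drawn here, timeout versus pending error versus data, can be written down as a small helper that interprets the result of poll() on a single fd. The enum and function names are invented for this sketch.

```c
#include <poll.h>

enum poll_outcome {
	POLL_OUTCOME_FAILED,   /* poll() itself returned -1 */
	POLL_OUTCOME_TIMEOUT,  /* nothing happened before the timeout */
	POLL_OUTCOME_ERROR,    /* error condition on the fd, e.g. pending ECONNREFUSED */
	POLL_OUTCOME_DATA      /* data can be read */
};

/* Interpret poll()'s return value and the revents of the one polled fd. */
static enum poll_outcome classify_poll(int poll_ret, const struct pollfd *pfd)
{
	if (poll_ret < 0)
		return POLL_OUTCOME_FAILED;
	if (poll_ret == 0)
		return POLL_OUTCOME_TIMEOUT;
	if (pfd->revents & (POLLERR | POLLHUP | POLLNVAL))
		return POLL_OUTCOME_ERROR;
	return POLL_OUTCOME_DATA;
}
```

In these terms, the 2.2.16 behaviour described above yields POLL_OUTCOME_ERROR right away, whereas 2.4.0-test9 yields POLL_OUTCOME_TIMEOUT, which cannot be told apart from a peer that simply never answers.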

Sorry if my first post wasn't clear enough about what I meant - a
general poll() semantics question - which indeed might be hit pretty
hard by an incorrect /etc/nsswitch.conf, for example.

Regards,
Martin


poll(2) semantics changed in 2.4.0-? vs. 2.2.16?

2000-10-05 Thread Martin Diehl


Hi,
had some long network stalls during initscripts in 2.4.0-test9.
newaliases appeared to block until timeout in do_poll() on udp socket. 
I've written a demo program to show what's going on without depending
on initscript and config (DNS/RPC/NIS) issues - attached polltest.c
It sends something to an *unbound* udp port on localhost and polls for
reply. In 2.2.16 it returned immediately with ECONNREFUSED while for
2.4.0-t9 it blocks until timeout, despite the ICMP port unreachable packet
returned in both cases - tcpdump -i lo gives:

14:05:13.591422 localhost.localdomain.32768 > localhost.localdomain.12345:
   udp 8 (DF)
14:05:13.591422 localhost.localdomain.32768 > localhost.localdomain.12345:
   udp 8 (DF)
14:05:13.591527 localhost.localdomain > localhost.localdomain: icmp:
   localhost.localdomain udp port 12345 unreachable (DF) [tos 0xc0]
14:05:13.591527 localhost.localdomain > localhost.localdomain: icmp:
   localhost.localdomain udp port 12345 unreachable (DF) [tos 0xc0]

(i believe everything appears twice because tcpdump sees both ends of lo)
So my impression is, the returned ICMP packet is disregarded somehow.
I also include a diff of the straces for polltest on 2.2.16 vs.
2.4.0-t9 (trivial things like getpid, gettimeofday, fstat64 removed):

--- polltrace-2.2.16	Thu Oct  5 13:57:13 2000
+++ polltrace-2.4.0-t9  Thu Oct  5 14:05:23 2000
 socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 3
 bind(3, {sin_family=AF_INET, sin_port=htons(0), 
sin_addr=inet_addr("0.0.0.0")}}, 16) = 0
 sendto(3, "foo bar\0", 8, 0, {sin_family=AF_INET, sin_port=htons(12345),
sin_addr=inet_addr("127.0.0.1")}}, 16) = 8
-poll([{fd=3, events=POLLIN, revents=POLLERR}], 1, 1) = 1
-recvfrom(3, 0xbcbc, 100, 0, 0xbd24, 0xbcb8) = -1 ECONNREFUSED
(Connection refused)
+poll([{fd=3, events=POLLIN}], 1, 1) = 0
-write(1, "recvfrom: errno=111 - Connection"..., 41) = 41
+write(1, "poll - timeout\n", 15)= 15

To reproduce the problem just start polltest on 2.2.16 (or similar ?) and
recent 2.4 to compare the results. It accepts an optional command line
argument which is the remote port to poll on. This should be an unbound
port and defaults to 12345 if not given.

Is this an intentional change? Is it required by SuS e.g.?
Am I completely wrong when believing this might break a number of
network programs (traceroute e.g.)?

Haven't tried either non-loopback connections or other
tcp/udp/poll/select combinations. But sane semantics should be consistent
IMHO. The testbox was UP in case that matters (lost events at scheduling
points???)

What am I missing?

Regards
Martin



#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Print msg together with errno and exit if retcode signals failure (-1). */
static int CheckErrOut( int retcode, const char *msg )
{
  if (retcode != -1)
    return 0;
  printf("%s: errno=%d - %s\n", msg, errno, strerror(errno));
  exit(1);
}

int main(int argc, char *argv[])
{
  const char  send_msg[] = "foo bar";
  int ret = 0;
  int fd = -1;
  struct pollfd   pfd;
  struct sockaddr_in  from, to;
  int port = 0;

  /* optional argument: the remote (unbound) port, default 12345 */
  if (argc <= 1  ||  sscanf(argv[1], " %d", &port) != 1)
    port = 12345;

  fd = socket(PF_INET,SOCK_DGRAM,IPPROTO_UDP);
  CheckErrOut(fd,"socket");

  /* bind to any local address, kernel picks the port */
  memset(&from,0,sizeof(from));
  from.sin_family = AF_INET;
  from.sin_addr.s_addr = htonl(INADDR_ANY);
  from.sin_port = 0;
  ret = bind(fd,(struct sockaddr *)&from,sizeof(from));
  CheckErrOut(ret,"bind");

  /* send to the (hopefully unbound) port on localhost */
  memset(&to,0,sizeof(to));
  to.sin_family = AF_INET;
  to.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  to.sin_port = htons((unsigned short)port);
  ret = sendto(fd,send_msg,sizeof(send_msg),0,(struct sockaddr *)&to,sizeof(to));
  CheckErrOut(ret,"sendto");

  memset(&pfd,0,sizeof(pfd));
  pfd.fd = fd;
  pfd.events = POLLIN;
  ret = poll(&pfd,1,1);   /* timeout is given in milliseconds */
  CheckErrOut(ret,"poll");

  if (ret != 0) {
    char   recv_buf[100];
    socklen_t  addrlen = sizeof(to);   /* must hold the buffer size on entry */

    ret = recvfrom(fd,recv_buf,sizeof(recv_buf),0,(struct sockaddr *)&to,&addrlen);
    CheckErrOut(ret,"recvfrom");
    printf("received: %d bytes\n", ret);
  }
  else
    printf("poll - timeout\n");

  close(fd);
  return 0;
}

IDE problems 2.4.0-t9p8 and later

2000-10-04 Thread Martin Diehl



Hi,
the following change from t9p7->t9p8 in ide-pci.c
 
-   if ((dev->class & ~(0xfa)) != ((PCI_CLASS_STORAGE_IDE << 8) | 5)) {
+   if ((dev->class & ~(0xff)) != (PCI_CLASS_STORAGE_IDE << 8)) {

causes a lot of trouble to me. Seems to be the same thing, which has
already been reported to l-k, but to my best knowledge it's unsolved.
So I had a look into this issue:
My IDE-Chipset is a SiS 5513 integrated into SiS 5591 Northbridge.
dev->class is 0x01018a. Hence the old test (there from 2.2.* to
2.4.0-t9p7) said "true" (due to the "|5") while the new one says "false".
So the chipset mode is now identified "native" although it's in
compatibility mode. PCI-IRQ 14 is used for both ports instead of IRQ 15
for ide1 in compatibility mode. Needless to say, everything (including
BM-DMA) works fine for me before t9p8 but now hangs when initializing the
devices on the 2nd ide port. (hdc/hdd: lost interrupt). Reverting the
changes cures everything.
I've double checked the crucial "|5" change against the documentation
for the SiS 5591 chipset, which I have here. Value=0x8a in PCI-register 9
definitely means "bus master capable" (0x80) and "operating mode is
programmable" (0x0a) and "compatibility mode" (~0x05) for both channels.
So the old code was the correct one.
I've seen several complaints on this to l-k during the last days which
appeared to be misunderstood as broken changes wrt PCI_CLASS_STORAGE_IDE
(the leading 0x0101 in dev->class). The crucial point however is the
"|5" on the trailing byte.
So, may I ask if there was some good reason for this change?
What have I missed?
Comments?
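
The effect of the two tests on dev->class = 0x01018a can be checked in isolation. The sketch below copies the two conditions quoted above into stand-alone functions; the function names are invented, and the decoding of the low byte follows the description given in this mail (0x80 bus master, 0x0a mode programmable, 0x05 clear meaning compatibility mode on both channels).

```c
#include <stdint.h>

#define PCI_CLASS_STORAGE_IDE 0x0101u  /* base class 0x01, subclass 0x01 */

/* Condition as it stood up to 2.4.0-t9p7. */
static int ide_test_old(uint32_t dev_class)
{
	return (dev_class & ~0xfau) != ((PCI_CLASS_STORAGE_IDE << 8) | 5u);
}

/* Condition as of 2.4.0-t9p8. */
static int ide_test_new(uint32_t dev_class)
{
	return (dev_class & ~0xffu) != (PCI_CLASS_STORAGE_IDE << 8);
}

/* Low byte of the class word: the programming interface. */
static int ide_bus_master(uint32_t dev_class)
{
	return (dev_class & 0x80u) != 0;
}

static int ide_mode_programmable(uint32_t dev_class)
{
	return (dev_class & 0x0au) == 0x0au;
}

static int ide_both_channels_compat(uint32_t dev_class)
{
	return (dev_class & 0x05u) == 0;
}
```

For 0x01018a the old condition evaluates to true (0x010100 differs from 0x010105) and the new one to false (0x010100 equals 0x010100). The new mask throws away the whole low byte, so the native/compatibility bits no longer influence the result at all.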

Regards
Martin


Re: 2.4.0-test9-pre8

2000-10-04 Thread Martin Diehl

On Tue, 3 Oct 2000, Rik van Riel wrote:
> On Tue, 3 Oct 2000, Martin Diehl wrote:
> > 
> > Just tried 2.4.0-t9p8 + t9p8-vmpatch: No change here. Box
> > appears to hang upon "init 2" (or higher) when starting several
> > things (sendmail, xfs e.g.) with (according to SysRq+p)
> > idle_task being the only one R.
> 
> Now that I think of it ... this could be a new (old?) case
> of a UP-only bug. Is anybody seeing this upon booting their
> SMP system with 'mem=8m' ??

First I realized the behaviour is pretty much the same no matter whether
mem=8M or mem=192M. The next unexpected observation was apache with 5
preforked children starting within seconds (at mem=8M) while sendmail needed
3 minutes or so, even with mem=192M. So I started to execute everything from
the initscripts by hand, step by step.
Result: it's not VM-related! Several processes (newaliases e.g.) are
blocking in do_poll on udp socket for minutes - probably until timeout.
This behavior started from 2.4.0-t9p8 onwards. Had to hack the box' config
a little bit, but now it's ok (read: working, but not yet understood).
So for me with 2.4.0-test9 (final) on UP there is no VM-related deadlock
anymore (except the rather esoteric swap-to-file-on-ext2-on-ramdisk thing)

Sorry for the noise wrt the init-hang!

Btw, does anybody have an idea, which change (sysctl interface f.e.) on
t9p7->t9p8 may have caused this? Probably I should try to write a test
program to gather more information...

Regards,
Martin


Re: 2.4.0-test9-pre8

2000-10-03 Thread Martin Diehl

On Tue, 3 Oct 2000, Rik van Riel wrote:

> On Tue, 3 Oct 2000, Martin Diehl wrote:
> 
> > * deadlock in initscripts (even for runlevel 2). SysRq shows idle_task
> >   being the only one ever getting the CPU when deadlocked.
> 
> This suggests tasks yielding the CPU while task->state !=
> TASK_RUNNABLE, which results in them never being rescheduled
> again ...

Just tried 2.4.0-t9p8 + t9p8-vmpatch: No change here. Box appears
to hang upon "init 2" (or higher) when starting several things (sendmail,
xfs e.g.) with (according to SysRq+p) idle_task being the only one R.
However, if I just wait for some 15 minutes or so it finally reaches
the console login prompt - but I'm unable to login, since this seems
to require the same duration, so the login timeout is killing me.
Things seem to speed up when *continuously* pressing SysRq+[tpm]:
Besides screen scrolling there is some minor disk activity, but always
EIP in idle_task. Looks similar to what somebody else reported.
The problem appeared for the first time when switching from vanilla t9p7 to
t9p8+t9p[78]-vmpatch (Haven't tried vanilla t9p[89]).
Will try 2.4.0-test9 (final).

> (time to hunt down the rescheduling points)

Willing to try the patch for this when available.

Regards,
Martin


Re: 2.4.0-test9-pre8

2000-10-02 Thread Martin Diehl

Hi,

On Mon, 2 Oct 2000, Rik van Riel wrote:

> On Sun, 1 Oct 2000, Linus Torvalds wrote:
> 
> >  - pre8:
> > - quintela: fix the synchronous wait on kmem_cache_shrink().
> >   This should fix the mmap02 lockup.
> 
> It probably doesn't. People will want to apply my patch
> (on http://www.surriel.com/patches/) to -test9-pre8 to
> see if that really makes the box solid.

just repeated the tests which caused deadlocks on my UP-box (192M RAM,
500M swap) using 2.4.0-t9p8 + 2.4.0-t9p7-vmpatch. All tests done in
single-user (why? - see below):

- mmap002: no deadlock anymore

- swptst (provided by Mark Galbraith - basically ipc001 with shmget()
+friends replaced by anonymous mmap()):  no deadlocks anymore

- boot with mem=8M and doing (several simultaneous) dd's if=/dev/urandom
of=/dev/null with bs=10M: fine too

- boot with mem=8M and make bzImage: works too

So much for the good news - however, there is some bad news too, as I still
have 3 "box lockup" situations. The first one (not covered here) is at
IDE-initialization when booting and needs more investigation.
The other 2 are VM-related:

* deadlock in initscripts (even for runlevel 2). SysRq shows idle_task
  being the only one ever getting the CPU when deadlocked. I think I'll
  have to hack my initscripts to analyze this step by step to provide
  more information, if I'm the only one, hanging there.

* Following a suggestion from Jeff Garzik to save the disk from heavy
  thrashing during my mem=8M test, I've tried to use a ramdisk for
  swapping - Yes, I know, this is pretty stupid in normal use and might
  even be illegal (i.e. not expected to work by design). Anyway, I've
  tried it and it worked when used as a swap device (size=64M, bs=4k).
  Added with priority 0 and the normal swap partition kept for fallback
  with prio=-1. No problems. It did even gracefully swapoff the ramdisk
  while it was already filled and the box was swapping to disk.
  To make things even more stupid, I've tried a second thing: create
  an ext2-fs (bs=4k) on the ramdisk, mount it, and use a swapfile on
  top of it. This deadlocks (with kswapd being current forever) at the
  very moment the swapfile is filled and swapping has to go to the
  fallback raw swap partition.
  As already said, I wouldn't be surprised, if swapping to rd were
  broken. But swapping to a rd-partition appears solid while a rd-based
  swapfile deadlocks. Could the difference be explained somehow or might
  it indicate some deadlock path due to VM-fs interaction not
  covered otherwise - so far?

More comments:

2.4.0-t9p8 + 2.4.0-t9p7-vmpatch appears to be a big step in the
right direction. What did impress me most, was the performance boost:
make bzImage with mem=8M needed about 2h to complete - whereas for
t9p7 it was 6-7h! According to vmstat the difference goes in parallel
with CPU-utilisation (u/s/i=10/30/60 for t9p8, was something like 2/8/90
for t9p7).
Haven't tried other combinations (vanilla t9p8 or t9p7+t9p7-vmpatch f.e.)
Waiting for Rik's next patch recently announced to look for the initscript
issue - if it's still there then.

Regards
Martin


Re: more testing on 2.4.0-t9p[456] VM deadlocks

2000-09-27 Thread Martin Diehl

On Tue, 26 Sep 2000, Ingo Molnar wrote:

> > Test of 2.4.0-t9p6 + vmfixes-2.4.0-test9-B2 + vmfixes-B2-deadlock.patch
> 
> note that this is effectively test9-pre7 (with a couple of more fixes and
> the new multiqueue stuff), so you might want to test that as well.

Hi,

have tried the same test (mem=8M, make bzImage on UP-box) with vanilla
2.4.0-t9p7. Seems to work for me too: no VM-problems (deadlock, fatal OOM)
after more than 10h of heavy paging. X-3.3.6 apparently has no problems
either, although far from being useable with mem=8M.
However, there is one thing I've changed besides using t9p7: to save
my disk the swap partition on another disk on the second IDE-channel was
used instead of the one on the first, where all mounted ext2-fs's are.
Hence I might have benefited from some parallel IO. But I believe the
deadlocks happened within the paging code path, thus the parallel access
to the normal fs for .c/.o/tmp rw wouldn't help. Rereading flushed pages
from gcc-binaries on /usr/bin may have relaxed the stress to some extent.
The total time for make bzImage however was not reduced significantly.

BTW, some numbers about scalability - using make bzImage of t9p7 with
identical .config on fresh-booted box as some kind of benchmark:
mem=   total duration   max. swap used   kswapd-time   CPU-idle
128M   10 min           0                0             0%
 32M   11 min           5M               1 sec         <5% (but peaks)
 16M   32 min           20M              45 sec        osc. 40+-30%
  8M   6.5 h            13M              28 min        75% (+-20%)

These numbers must not be over-interpreted - just to give an idea.

Regards
Martin


Re: more testing on 2.4.0-t9p[456] VM deadlocks

2000-09-26 Thread Martin Diehl



On Mon, 25 Sep 2000, Marcelo Tosatti wrote:

> There is a known deadlock with Ingo's patch.
> 
> I'm attaching a patch which should fix it. (on top of
> vmfixes-2.4.0-test9-B2) 

Hi,

Thank You! Seems to be much better now:

A test of 2.4.0-t9p6 + vmfixes-2.4.0-test9-B2 + vmfixes-B2-deadlock.patch
did not reveal any deadlock or other VM-related problem during a 10h
test (UP, mem=8M, make bzImage and friends).
Compared to my previous tests there was even more stress this time:
- cron.daily started to logrotate/tmpwatch/slocate... This caused
  a jump of the av. compilation time for a single kernel .c-file from
  about 1 min to 30 min or more.
- while the box was paging almost all the time I was able to switch
  swap from raw swap-partition to swapfile on ext2-fs and back without
  problem or fs-related locking issue.
Shm/DMA related stuff not tested so far.
Haven't tried to compare the performance under this next-to-OOM situation
to the old VM.

Taken together, the problems I've seen when testing the new VM on UP box
under low memory/high pressure conditions are apparently solved now.
Well done!

Regards
Martin


Re: more testing on 2.4.0-t9p[456] VM deadlocks

2000-09-25 Thread Martin Diehl




On Mon, 25 Sep 2000, Martin Diehl wrote:

> PS: vmfixes-2.4.0-test9-B2 not yet tested - will do later.

Hi - done now:

Using 2.4.0-t9p6 + vmfixes-2.4.0-test9-B2 I ended up with the box
deadlocked again! The test was "make bzImage" on a UP box booted with mem=8M.
After about 4 hours at load 2-3 and almost continuous paging the box
is apparently locked up. SysRq+t still shows several processes, including
kswapd, being scheduled as "current" (one after the other, of course).

Mem-Info (retyped from SysRq+m):
Active: 847 / inactive dirty: 67 / inactive clean: 0 / free: 64
2x16 + 1x32 + 1x64 + 1x128 = 256kB
Swap cache: add 3353996, delete 3353209, find 2300336/9605753
Free swap: 496144kB
2048 pages of RAM
0 pages of HIGHMEM
490 reserved pages
74 pages shared
787 pages cached
0 pages in page table cache
Buffer memory: 236kB

No change on this at all, despite the scheduling activity still observed.

I've looked up several EIP-values (given by SysRq+p) vs. System.map to get
an idea what is still going on. The functions I've recorded (this # often):

page_launder(10)
try_to_free_buffer(5)
deactivate_page_nolock(4)
refill_inactive_scan(3)
nr_free_pages(1)
wakeup_kswapd(1)
__wake_up(1)
kmem_cache_reap(1)
sys_fstatfs(1)
sys_statfs(1)

The results of this very rudimentary "profiling the deadlock" are far from
statistically significant, of course. The only ordering in this list is
the number of occurrences - i.e., I don't see any pattern or call
chain there.
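
The lookup described above (mapping each sampled EIP to the enclosing
function via System.map and counting the hits) could be done along these
lines. This is only a userspace sketch, not a tool used in the original
test: the struct and function names are made up for illustration, and the
symbol table is assumed to be sorted by address, as System.map is.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One System.map-style entry: start address and symbol name. */
struct sym {
    unsigned long addr;
    const char *name;
};

/*
 * Return the index of the last symbol whose address is <= eip, or -1 if
 * eip lies below the first symbol. The table must be sorted by address.
 */
int sym_lookup(const struct sym *tab, size_t n, unsigned long eip)
{
    size_t lo = 0, hi = n;      /* search window [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tab[mid].addr <= eip)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (int)lo - 1;
}

/* Count, per symbol, how many sampled EIP values fall into it. */
void tally_samples(const struct sym *tab, size_t n,
                   const unsigned long *eips, size_t n_eips,
                   unsigned int *hits)
{
    size_t i;

    assert(tab != NULL && eips != NULL && hits != NULL);
    memset(hits, 0, n * sizeof(*hits));
    for (i = 0; i < n_eips; i++) {
        int idx = sym_lookup(tab, n, eips[i]);

        if (idx >= 0)           /* ignore samples below the first symbol */
            hits[idx]++;
    }
}
```

Sorting the resulting hits[] array by count would give a list like the one
above; with only a few dozen samples it remains a rough hint at best.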

Finally, SysRq+e solved the problem: hanging processes were terminated, the
VM deadlock was released, and the box seems to be as usable as after boot.

Comments?

Regards
Martin


more testing on 2.4.0-t9p[456] VM deadlocks

2000-09-24 Thread Martin Diehl



Hi,

I want to summarize my observations wrt the VM-deadlock issue. Everything
was tested on a UP box booted with mem=8M and 500M swap.

2.4.0-t9p4 (vanilla)
 deadlocks almost everywhere (even in initscripts!); a simple dd with
 large enough bs deadlocks as soon as page_out should start - i.e. no
 swapspace is ever used. Fortunately, the deadlock can be cured by SysRq+e.

2.4.0-t9p5+vmfixes-2.4.0-test9-A1
 much better, but still some deadlocks. Much harder to trigger in a
 reproducible way. "make dep" on /usr/src/linux seems to be a good
 candidate, initscripts too. At some point some (undetermined) process
 hangs - this process remains the only one ever being current!
 However, EIP still changes according to SysRq+p. This behaviour cannot
 be resolved by SysRq+[ekil]. SysRq+[sb] is not successful either,
 so it's time for an unclean reboot and fsck (which deadlocks too).

2.4.0-t9p6 (vanilla)
 very similar to the last one. I put some printk() calls into it and found
 that it ends up looping forever due to the goto try_again in
 __alloc_pages().
 Just to give it a try, I've added a counter restricting the
 number of retries (to some arbitrary number - 5 in my case).
 Once this limit is reached, I continue with the code following the goto.
 Result: no "deadlocks" anymore! I was even able to run the
 initscripts and make bzImage. However, from time to time some
 processes were killed due to lack of memory. sysctl/overcommit_memory
 was 0 and there was plenty of free swapspace.
 Finally I followed Ingo's suggestion to change __GFP_IO to __GFP_WAIT
 in refill_inactive(). With this, no process was killed anymore!
 I was able to complete make bzImage without anything going wrong.
 Furthermore, it was possible to start X and log in to KDE - although
 the system performance was far from usable, there was no
 problem (besides disk activity). The box ran at av. load 2 for
 several hours doing almost nothing but page_in/out. My try_again
 escape was hit quite often (several 1000 times), but the fallback to
 the critical case seems to work. As I printk()'ed the value of
 memory_pressure in this case, I can tell the box was running at a
 memory_pressure of 3-4 for more than 5 hours, with peaks
 beyond 5 when starting X/KDE, for example - just in case these numbers
 tell you something.
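
The retry limit described above can be modelled in userspace roughly as
follows. This is a sketch of the control flow only, not kernel code:
struct pool_model, its fields and the pool_* functions are invented for
illustration, and the "reserve" merely stands in for whatever the
critical-case path after the goto does.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RETRIES 5   /* arbitrary bound, as in the experiment */

enum alloc_result { ALLOC_FAILED, ALLOC_NORMAL, ALLOC_FROM_RESERVE };

/* Hypothetical model of the allocator state. */
struct pool_model {
    int free_pages;     /* pages the normal path can hand out */
    int reclaimable;    /* pages reclaim is still able to free */
    int reserve_pages;  /* pages only the critical fallback may use */
    int reclaim_calls;  /* how often reclaim was attempted */
    int fallbacks;      /* how often the retry limit was hit */
};

/* Normal allocation attempt: succeeds only if a free page exists. */
int pool_try_alloc(struct pool_model *p)
{
    if (p->free_pages <= 0)
        return 0;
    p->free_pages--;
    return 1;
}

/* Stand-in for try_to_free_pages(): frees one page if it can. */
void pool_reclaim(struct pool_model *p)
{
    p->reclaim_calls++;
    if (p->reclaimable > 0) {
        p->reclaimable--;
        p->free_pages++;
    }
}

/*
 * Retry the normal path after reclaim, but give up after MAX_RETRIES
 * attempts and fall through to the critical case instead of looping
 * forever.
 */
enum alloc_result pool_alloc(struct pool_model *p)
{
    int retry = MAX_RETRIES;

    assert(p != NULL);
    for (;;) {
        if (pool_try_alloc(p))
            return ALLOC_NORMAL;
        if (--retry <= 0)
            break;              /* retry limit reached */
        pool_reclaim(p);
    }
    p->fallbacks++;
    if (p->reserve_pages > 0) {
        p->reserve_pages--;
        return ALLOC_FROM_RESERVE;
    }
    return ALLOC_FAILED;
}
```

With retry starting at 5 and the test being --retry > 0, a request that
never succeeds makes five attempts and four reclaim calls before it falls
through. When reclaim makes no progress at all, the unbounded version of
the loop would never terminate.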

Taken together, the described fix seems to do the job on my system. The
patch vs. vanilla 2.4.0-t9p6 is included below. I would not claim it
to be "Right" or even more than a hack to find out what's going wrong.
Just want to give you some feedback.

PS: vmfixes-2.4.0-test9-B2 not yet tested - will do later.

Regards
Martin


diff -Nur linux-2.4.0-t9p6.orig/mm/page_alloc.c linux-2.4.0-t9p6/mm/page_alloc.c
--- linux-2.4.0-t9p6.orig/mm/page_alloc.c   Mon Sep 25 01:08:52 2000
+++ linux-2.4.0-t9p6/mm/page_alloc.c        Mon Sep 25 01:16:50 2000
@@ -295,6 +295,7 @@
         int direct_reclaim = 0;
         unsigned int gfp_mask = zonelist->gfp_mask;
         struct page * page = NULL;
+        int  retry = 5;
 
         /*
          * Allocations put pressure on the VM subsystem.
@@ -444,9 +445,14 @@
                  * processes, etc).
                  */
                 if (gfp_mask & __GFP_WAIT) {
-                        try_to_free_pages(gfp_mask);
-                        memory_pressure++;
-                        goto try_again;
+                        if (--retry > 0) {
+                                try_to_free_pages(gfp_mask);
+                                memory_pressure++;
+                                goto try_again;
+                        }
+                        printk(KERN_CRIT "critical memory fallback\n");
+                        printk("%s - memory_pressure = %i\n",
+                                        __FUNCTION__, memory_pressure);
                 }
         }
 
diff -Nur linux-2.4.0-t9p6.orig/mm/vmscan.c linux-2.4.0-t9p6/mm/vmscan.c
--- linux-2.4.0-t9p6.orig/mm/vmscan.c   Mon Sep 25 01:08:40 2000
+++ linux-2.4.0-t9p6/mm/vmscan.c        Sun Sep 24 13:14:14 2000
@@ -891,7 +891,7 @@
         do {
                 made_progress = 0;
 
-                if (current->need_resched && (gfp_mask & __GFP_IO)) {
+                if (current->need_resched && (gfp_mask & __GFP_WAIT)) {
                         __set_current_state(TASK_RUNNING);
                         schedule();
                 }


Re: [patch *] VM deadlock fix

2000-09-22 Thread Martin Diehl

On Thu, 21 Sep 2000, Rik van Riel wrote:

> I've found and fixed the deadlocks in the new VM. They turned out 
> to be single-cpu only bugs, which explains why they didn't crash my
> SMP tesnt box ;)

Hi,

tried
> http://www.surriel.com/patches/2.4.0-t9p2-vmpatch
applied to 2.4.0-t9p4 on UP box booted with mem=8M.

The deadlock behaviour appears to be somewhat different compared
to vanilla 2.4.0-t9p4 - however, for me it makes things even worse:

I booted into singleuser and used

dd if=/dev/urandom of=/dev/null count=1 bs=x

to trigger the issue by increasing bs-values. As soon as bs is big
enough to force swapping (about 3M in my case) the box "deadlocks".
What has become worse is that SysRq+e (or k) doesn't help anymore
with this patch applied. So I had to use SysRq+b and ended up fscking
(but there was no fs corruption). Without the patch this was not a problem.

Some more points I've noticed:

* apparently, the deadlock happens when the box begins to swap. I never
  found any used swapspace with the new VM from 2.4.0-t9p*. If memory
  requests force the use of swapspace, the machine deadlocks.

* when, after deadlocking, I pressed SysRq+t several times I found
  - either dd or kswapd being current task in vanilla 2.4.0-t9p4
  - neither dd nor kswapd ever being current with this patch

* as a printk() in the main loop shows, kreclaimd *never* awoke

* My impression was similar to what somebody has already reported:
  seems something related to refill_inactive_scan() is recursing to
  infinity when the "deadlock" happens.

* the behaviour of kswapd without this last patch differs significantly
  before and after the first deadlock happens (and is released by SysRq+e):
  only *after* pressing SysRq+e (or k) did kswapd wake up once per second
  on the idle box. This is strange since it should sleep with timeout=HZ
  in its main loop.

Especially the last point suggests to me there might be a problem at
initialization. I'm not sure, whether everything called from kswapd
is properly initialized at the time when the kswapd-thread is created.
To check this, I've tentatively added an additional
interruptible_sleep_on_timeout() before kswapd's main loop to delay it
until initialization has finished. Probably it would be more "Right" to
move the sleep from the end of the main loop to its beginning - however,
I just tried a quick hack and did not check if the *_shortage() stuff is
ready to be called at init time.

The additional sleep before kswapd enters its main loop was a major
improvement for me:

* my dd-tests did not deadlock anymore - even with bs=100M and mem=8M

* swap space was really used now.

* I was able to advance beyond singleuser with 2.4.0-t9p* and mem=8M
  for the very first time (it always deadlocked in the init scripts before)

* I was even able to make bzImage - it dumped core after about 15 min
  for an unknown reason (probably out of memory), but without any deadlock.
  The box was at av. load 3 with 15M swap used at this time.

* I found kreclaimd *was* awoken several times.

* however, kswapd is still not waking up every second after a fresh boot.
  Now it begins to wake up as soon as real swapping starts.

So, my conclusion is that the "deadlock" issue might be mainly an
initialization problem. Probably some more special handling is needed
at swapon later. Currently my guess is that there is an initialization
problem when kswapd starts and some kind of blocking when
refill_inactive_scan() is called before swapon.

Comments?
Will do some more tests (including your latest patch).

Regards
Martin


Re: Oops: Unable to handle kernel null pointer

2000-09-21 Thread Martin Diehl



On Wed, 20 Sep 2000, Sebastian Willing wrote:

It's almost impossible to extract some useful information from your
oops without your kernel symbols (Documentation/oops-tracing.txt).
However, to make a guess, this

> Unable to handle kernel NULL pointer dereference at
> virtual address 0034

looks similar to the quota issue, for which I posted a fix some days
ago. If, after applying ksymoops to your trace, you find that the oops
happens in check_idq() called from dquot_transfer(), this patch might
help (search for [PATCH] and dquot_transfer in l-k).

Martin


Re: [Oops] Unable to handle kernel NULL pointer dereference

2000-09-21 Thread Martin Diehl




On Wed, 20 Sep 2000, J Brook wrote:

> >>EIP; c01527b9<= 
> Trace; c015357b  

this is the quota issue for which I've posted a fix some days ago.
It's (as of 2.4.0-t9p5) waiting on the TODO list to be merged.
I'd consider it "critical" (wrt what Linus accepts for 2.4.0) as
processes calling sys_chown() may be trapped in D-state forever so you end
up fscking.

Martin

--- linux-2.4.0-test8/fs/dquot.c.orig   Mon Sep 11 01:42:56 2000
+++ linux-2.4.0-test8/fs/dquot.c        Mon Sep 11 02:12:04 2000
@@ -1285,12 +1285,15 @@
                 blocks = isize_to_blocks(inode->i_size, BLOCK_SIZE_BITS);
         else
                 blocks = (inode->i_blocks >> 1);
-        for (cnt = 0; cnt < MAXQUOTAS; cnt++)
+        for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
+                if (transfer_to[cnt] == NODQUOT)
+                        continue;
                 if (check_idq(transfer_to[cnt], 1) == NO_QUOTA ||
                     check_bdq(transfer_to[cnt], blocks, 0) == NO_QUOTA) {
                         cnt = MAXQUOTAS;
                         goto put_all;
                 }
+        }
 
         if ((error = notify_change(dentry, iattr)))
                 goto put_all;



Re: Update Linux 2.4 Status/TODO list

2000-09-13 Thread Martin Diehl



On Tue, 12 Sep 2000, David Ford wrote:

> Please add 'Quota support causes OOPS'  Someone posted a patch but I don't
> have the reference offhand.  That patch appears to have fixed one person's
> problems.
[..]
> 
> >  * Oops in dquot_transfer (David Ford, Martin Diehl) (Jan Kara has a
> >potential patch)
> 
> I believe this would be the referenced patch.

after getting some positive feedback on this patch I've just sent it to
Linus. Haven't CC'd to l-k as it was already there. I'm reposting it below
in case somebody is missing it. Alan did already include it in 2.2.18pre5
which had the same Oops introduced in 2.2.18pre4.

Martin

--- linux-2.4.0-test8/fs/dquot.c.orig   Mon Sep 11 01:42:56 2000
+++ linux-2.4.0-test8/fs/dquot.c        Mon Sep 11 02:12:04 2000
@@ -1285,12 +1285,15 @@
                 blocks = isize_to_blocks(inode->i_size, BLOCK_SIZE_BITS);
         else
                 blocks = (inode->i_blocks >> 1);
-        for (cnt = 0; cnt < MAXQUOTAS; cnt++)
+        for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
+                if (transfer_to[cnt] == NODQUOT)
+                        continue;
                 if (check_idq(transfer_to[cnt], 1) == NO_QUOTA ||
                     check_bdq(transfer_to[cnt], blocks, 0) == NO_QUOTA) {
                         cnt = MAXQUOTAS;
                         goto put_all;
                 }
+        }
 
         if ((error = notify_change(dentry, iattr)))
                 goto put_all;


Re: Linux 2.2.18pre4

2000-09-11 Thread Martin Diehl

On Sun, 10 Sep 2000, Alan Cox wrote:

> 2.2.18pre4
> o Fix some of the dquot races (Jan Kara)

this appears to be basically the same patch as was applied to 2.4.0t8 vs. t7,
producing an Oops in dquot_transfer(). This issue can (at least) be
triggered by chown'ing a file on a (userquota && !groupquota) filesystem
from a user with unlimited quota to one who is restricted.
I sent a fix for this to the diskquota maintainer ([EMAIL PROTECTED])
and l-k yesterday. With a few lines offset the same patch should be
applicable to 2.2.18pre4 - but I haven't tested it in this context!

Martin

--- linux-2.4.0-test8/fs/dquot.c.orig   Mon Sep 11 01:42:56 2000
+++ linux-2.4.0-test8/fs/dquot.c        Mon Sep 11 02:12:04 2000
@@ -1285,12 +1285,15 @@
                 blocks = isize_to_blocks(inode->i_size, BLOCK_SIZE_BITS);
         else
                 blocks = (inode->i_blocks >> 1);
-        for (cnt = 0; cnt < MAXQUOTAS; cnt++)
+        for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
+                if (transfer_to[cnt] == NODQUOT)
+                        continue;
                 if (check_idq(transfer_to[cnt], 1) == NO_QUOTA ||
                     check_bdq(transfer_to[cnt], blocks, 0) == NO_QUOTA) {
                         cnt = MAXQUOTAS;
                         goto put_all;
                 }
+        }
 
         if ((error = notify_change(dentry, iattr)))
                 goto put_all;


[PATCH] (was: [OOPS] dquot_transfer() - 2.4.0-test8)

2000-09-10 Thread Martin Diehl



On Mon, 11 Sep 2000, Martin Diehl wrote:

> transfer_to[cnt] is initialized to NODQUOT from the first loop
> (due to several continue's e.g.) when entering the second loop.
> Unfortunately I do not feel familiar enough to the quota code to
> provide a patch for this problem.

well, I was a little bit too pessimistic. After a look at the code
I'm pretty sure the obvious check will solve it - successfully tested
on a local UP box.
Somebody with better knowledge of the logic behind dquot_transfer()
should please check whether any special treatment is needed.

Martin

--- linux-2.4.0-test8/fs/dquot.c.orig   Mon Sep 11 01:42:56 2000
+++ linux-2.4.0-test8/fs/dquot.c        Mon Sep 11 02:12:04 2000
@@ -1285,12 +1285,15 @@
                 blocks = isize_to_blocks(inode->i_size, BLOCK_SIZE_BITS);
         else
                 blocks = (inode->i_blocks >> 1);
-        for (cnt = 0; cnt < MAXQUOTAS; cnt++)
+        for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
+                if (transfer_to[cnt] == NODQUOT)
+                        continue;
                 if (check_idq(transfer_to[cnt], 1) == NO_QUOTA ||
                     check_bdq(transfer_to[cnt], blocks, 0) == NO_QUOTA) {
                         cnt = MAXQUOTAS;
                         goto put_all;
                 }
+        }
 
         if ((error = notify_change(dentry, iattr)))
                 goto put_all;



[OOPS] dquot_transfer() - 2.4.0-test8

2000-09-10 Thread Martin Diehl



I got a reproducible oops with 2.4.0-test8 when trying to log in via kdm
as a user with restricted quota on a local fs - ssh/telnet do not trigger
this issue. 2.4.0-test7 was fine too.
The enclosed trace shows a NULL pointer dereference of an unchecked
struct dquot * passed to check_idq() - called from dquot_transfer().
Looking at the diffs of test7 vs. test8, I believe the reason might
be the new cnt=0..MAXQUOTAS loop from which check_idq() is called.
Since it is located after the first loop of this kind, it might happen that
transfer_to[cnt] is still set to NODQUOT from the first loop
(e.g. due to one of its several continue statements) when entering the
second loop.
Unfortunately I do not feel familiar enough with the quota code to
provide a patch for this problem.

Martin

PS: chown of a root-owned file (no quota for root) to some user with
quota triggers the same problem. After several repetitions the chown
ended up in 'D' state even prohibiting sync'ing the disks.

output from ksymoops as follows:
---
ksymoops 2.3.3 on i586 2.4.0-test8.  Options used
 -V (default)
 -k /proc/ksyms (default)
 -l /proc/modules (default)
 -o /lib/modules/2.4.0-test8/ (default)
 -m /boot/System.map-2.4.0-test8 (specified)

Sep 11 00:36:47 srv kernel: Unable to handle kernel NULL pointer
   dereference at virtual address 0034 
Sep 11 00:36:47 srv kernel: c015e131 
Sep 11 00:36:47 srv kernel: *pde =  
Sep 11 00:36:47 srv kernel: Oops:  
Sep 11 00:36:47 srv kernel: CPU:0 
Sep 11 00:36:47 srv kernel: EIP:0010:[check_idq+13/304] 
Sep 11 00:36:47 srv kernel: EFLAGS: 00010202 
Sep 11 00:36:47 srv kernel: eax:    ebx:    ecx: 0001
   edx: 0001 
Sep 11 00:36:47 srv kernel: esi: 8180   edi: 0004   ebp: c2f7df24
   esp: c2f7dee8 
Sep 11 00:36:47 srv kernel: ds: 0018   es: 0018   ss: 0018 
Sep 11 00:36:47 srv kernel: Process kdm (pid: 889, stackpage=c2f7d000) 
Sep 11 00:36:47 srv kernel: Stack:  c015ee77  0001
 c2f7df54 8180 c2fc71c0 bfffea6c  
Sep 11 00:36:47 srv kernel:0001 c2f7df2c 000b c01346a2
 ff86 df58 c2fe27e0   
Sep 11 00:36:47 srv kernel:   c012aba2
 c2fc71c0 c2f7df54 c2fe27e0   
Sep 11 00:36:47 srv kernel: Call Trace: [dquot_transfer+615/1168]
[cached_lookup+14/80]
[chown_common+254/280]
[__user_walk+75/84]
[sys_chown+47/68]
[sys_chown16+47/52]
[system_call+51/64]  
Sep 11 00:36:47 srv kernel: Code: f6 43 34 40 74 09 31 c0 e9 11 01 00 00
  89 f6 8b 53 48 85 d2  
Using defaults from ksymoops -t elf32-i386 -a i386

Code;   Before first symbol
 <_EIP>:
Code;   Before first symbol
   0:   f6 43 34 40       testb  $0x40,0x34(%ebx)
Code;  0004 Before first symbol
   4:   74 09             je     f <_EIP+0xf> 000f Before first symbol
Code;  0006 Before first symbol
   6:   31 c0             xor    %eax,%eax
Code;  0008 Before first symbol
   8:   e9 11 01 00 00    jmp    11e <_EIP+0x11e> 011e Before first symbol
Code;  000d Before first symbol
   d:   89 f6             mov    %esi,%esi
Code;  000f Before first symbol
   f:   8b 53 48          mov    0x48(%ebx),%edx
Code;  0012 Before first symbol
  12:   85 d2             test   %edx,%edx
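
The first decoded instruction, testb $0x40,0x34(%ebx), reads one byte at
offset 0x34 from the pointer held in ebx. If that pointer were NULL, the
access would land at address 0x34, which fits the address in the oops
message. The sketch below models this arithmetic and the effect of skipping
NODQUOT slots before touching them. Everything in it is an assumption made
for illustration: struct dquot_model and its padding are invented and are
not the real struct dquot layout, and the helper names are hypothetical.
The trace as shown does not include the ebx value, so NULL is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define MAXQUOTAS   2
#define DQ_FLAG_BIT 0x40    /* the bit tested by "testb $0x40" */

/*
 * Hypothetical stand-in for struct dquot. The padding is chosen only so
 * that the flags byte sits at offset 0x34, as in the decoded instruction.
 */
struct dquot_model {
    unsigned char pad[0x34];
    unsigned char flags;
};

#define NODQUOT ((struct dquot_model *)NULL)

/* Address touched when reading ->flags through a given base pointer value. */
unsigned long flags_access_addr(unsigned long base)
{
    return base + offsetof(struct dquot_model, flags);
}

/*
 * Walk the transfer_to[] slots, skipping NODQUOT entries before they are
 * dereferenced. Returns how many slots were actually inspected and stores
 * in *flagged how many of those had DQ_FLAG_BIT set.
 */
int inspect_slots(struct dquot_model *transfer_to[MAXQUOTAS], int *flagged)
{
    int cnt, inspected = 0;

    assert(flagged != NULL);
    *flagged = 0;
    for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
        if (transfer_to[cnt] == NODQUOT)
            continue;       /* nothing to check for this quota type */
        if (transfer_to[cnt]->flags & DQ_FLAG_BIT)
            (*flagged)++;
        inspected++;
    }
    return inspected;
}
```

Without the NODQUOT test, the loop body would read ->flags through a NULL
pointer for the unused slot, which is the access pattern the trace points
to.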


Re: Hmm.. "notify_parent()".

2000-08-30 Thread Martin Diehl




On Mon, 28 Aug 2000, Linus Torvalds wrote:

> "notify_parent()" uses p->p_pptr without any locking. As far as I can
> tell, that is wrong. It looks like it should have a read-lock on the
> tasklist_lock in order to not be racy (perhaps the parent does an exit on
> another CPU at just this moment), but it gets slightly ugly because it is
> already called occasionally from contexts that already have it, and in
> other places from contexts that do _not_ have it.
> 
> Is there some reason you can see why this isn't a bug? Fixing it looks
> simple, but either involves making all callers of "notify_parent()" get
> the tasklist lock, or by using a separate "already locked" version for the
> case where we have the lock before (ie "do_notify_parent()"). Issues?

FYI:
A few days ago somebody on a local list discovered the following message
in his syslog while running 2.2.14 on an SMP machine:

kernel: eh? notify_parent with state 0?

It appears to me that tsk->state changed to TASK_RUNNING probably due to
a race. Although he did not observe any harmful impact on his system,
this might be the kind of bug you are talking about.
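
For reference, the "separate already-locked version" mentioned in the
quoted mail is a common calling convention, and it can be sketched without
any kernel code. The model below is single-threaded and only checks the
convention. It does not demonstrate the race itself. toy_lock, task_model
and the *_model functions are hypothetical names, and the counter merely
stands in for a real read lock such as tasklist_lock.

```c
#include <assert.h>
#include <stddef.h>

/* Toy lock: counts holders so the convention can be checked without threads. */
struct toy_lock {
    int readers;
};

void toy_read_lock(struct toy_lock *l)
{
    l->readers++;
}

void toy_read_unlock(struct toy_lock *l)
{
    assert(l->readers > 0);
    l->readers--;
}

struct task_model {
    struct task_model *parent;  /* models p->p_pptr */
    int pending_signals;        /* models a signal queued for this task */
};

struct toy_lock tasklist_lock_model;    /* zero-initialized: no holders */

/* "Already locked" variant: the caller must hold the lock. */
void do_notify_parent_model(struct task_model *tsk)
{
    assert(tasklist_lock_model.readers > 0);
    if (tsk->parent)
        tsk->parent->pending_signals++;
}

/* Variant for callers that do not hold the lock yet. */
void notify_parent_model(struct task_model *tsk)
{
    toy_read_lock(&tasklist_lock_model);
    do_notify_parent_model(tsk);
    toy_read_unlock(&tasklist_lock_model);
}
```

Callers that already hold the lock use do_notify_parent_model() directly;
all others go through notify_parent_model(), so the parent pointer is only
ever read with the lock held.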

Martin
