Re: Attansic L1 page corruption (was: 2.6.22-rc5: pdflush oops under heavy disk load)

2007-06-25 Thread Luca

On 6/25/07, Jay L. T. Cornwall <[EMAIL PROTECTED]> wrote:

> Jay Cliburn wrote:
>
> > For reasons not yet clear to me, it appears the L1 driver has a bug or
> > the device itself has trouble with DMA in high memory.  This patch,
> > drafted by Luca Tettamanti, is being explored as a workaround.  I'd be
> > interested to know if it fixes your problem.
>
> Yes, it certainly seems to. Now running with this patch and 4GB active,
> I've transferred about 15GB with no problem so far. It usually oopses
> after a GB or two.
>
> I guess it's not an ideal solution, architecturally speaking, but it's a
> good deal better than an unstable driver.


It may cause a "bounce" (i.e. data is copied to another buffer in
lower memory) when an skb is allocated in high memory. Furthermore - at
least on AMD systems - it should be possible to use the IOMMU to remap
the memory to a bus address < 4GB.

Xiong, can you comment on this issue? To recap: users are seeing hard
locks when the L1 driver does DMA to/from a high memory area (physical
address > 4GB). Limiting DMA to the lower 4GB with:

pci_set_dma_mask(pdev, DMA_32BIT_MASK);

cures the issue. Does L1 have any known problem decoding 64-bit addresses?
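
As a rough illustration of the address arithmetic behind the mask (this is
not code from the atl1 driver), the sketch below checks whether a buffer is
reachable under a given DMA mask, and therefore whether it would need a
bounce copy or an IOMMU remap. All sketch_ names are hypothetical; the two
mask values are assumed to match the kernel's DMA_32BIT_MASK and
DMA_64BIT_MASK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the kernel's DMA_32BIT_MASK / DMA_64BIT_MASK values. */
#define SKETCH_DMA_32BIT_MASK 0x00000000ffffffffULL
#define SKETCH_DMA_64BIT_MASK 0xffffffffffffffffULL

/*
 * True if every byte of [addr, addr + len) can be addressed by a device
 * whose DMA mask is dma_mask. For a mask of the form 2^n - 1 it is enough
 * to test the last byte of the buffer.
 */
static bool sketch_dma_reachable(uint64_t addr, uint64_t len, uint64_t dma_mask)
{
	uint64_t last;

	if (len == 0)
		return true;
	last = addr + (len - 1);
	if (last < addr)	/* address range wrapped around */
		return false;
	return (last & ~dma_mask) == 0;
}

/*
 * True if the buffer lies (partly) outside the mask, so the data would
 * have to be copied to low memory or remapped before the device sees it.
 */
static bool sketch_needs_bounce(uint64_t addr, uint64_t len, uint64_t dma_mask)
{
	return !sketch_dma_reachable(addr, len, dma_mask);
}
```

With the 32-bit mask, a buffer at physical address 0x100000000 (exactly
4GB) needs a bounce, while the same buffer under the 64-bit mask does not.
Whether a copy or a remap actually happens depends on the platform; the
sketch only covers the comparison against the mask.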

Luca
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Attansic L1 page corruption (was: 2.6.22-rc5: pdflush oops under heavy disk load)

2007-06-25 Thread Jay L. T. Cornwall
Jay Cliburn wrote:

> For reasons not yet clear to me, it appears the L1 driver has a bug or
> the device itself has trouble with DMA in high memory.  This patch,
> drafted by Luca Tettamanti, is being explored as a workaround.  I'd be
> interested to know if it fixes your problem.

Yes, it certainly seems to. Now running with this patch and 4GB active,
I've transferred about 15GB with no problem so far. It usually oopses
after a GB or two.

I guess it's not an ideal solution, architecturally speaking, but it's a
good deal better than an unstable driver. If there are any other patches
you'd like me to test or traces to capture, I'm happy to help out.
Otherwise I'll run with this one for now since it does the job!

Thanks.

-- 
Jay L. T. Cornwall, http://www.esuna.co.uk/~jay/
PhD Student
Imperial College London


Re: 2.6.22-rc5: pdflush oops under heavy disk load

2007-06-24 Thread Jesper Juhl

On 22/06/07, Chuck Ebbert <[EMAIL PROTECTED]> wrote:
[snip]


> Step 1: run fsck on the filesystem.


I agree that running fsck on the filesystem is a good idea, but still,
even a corrupt filesystem should never be able to cause an Oops. In
fact, nothing done from userspace should be able to cause an Oops.

--
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html


Re: 2.6.22-rc5: pdflush oops under heavy disk load

2007-06-24 Thread Jay Cliburn
On Sun, 24 Jun 2007 21:31:36 +0100
"Jay L. T. Cornwall" <[EMAIL PROTECTED]> wrote:

> Jay Cliburn wrote:
> 
> >> The common factor here seems to be the buffer_head circular list
> >> leading to invalid pointers in bh->b_this_page.
> >>
> >> I'm beginning to suspect the Attansic L1 Gigabit Ethernet driver
> >> (marked as EXPERIMENTAL in 2.6.22-rc5). I can't reproduce these
> >> panics on disk-to-disk copies or SCP across the localhost
> >> interface. However, SCP from a server onto either of two different
> >> HDDs hits these oopses fairly quickly.
> 
> > How much RAM is installed in your machine?  If it's 4GB or more,
> > does your problem go away if you boot with mem=3000M?
> 
> Intriguing. Yes, this machine has 4GB of RAM. If I boot with mem=3000M
> the problem does indeed go away - I can't induce an oops even after
> transferring tens of GB across the interface.
> 
> I'm not sure I follow why that would be the case, except that it
> relates to pci_map_page behaviour. But I guess you have an inkling?
> 

For reasons not yet clear to me, it appears the L1 driver has a bug or
the device itself has trouble with DMA in high memory.  This patch,
drafted by Luca Tettamanti, is being explored as a workaround.  I'd be
interested to know if it fixes your problem.

[Aside: For future reference, [EMAIL PROTECTED] is a
mailing list devoted to L1 driver development.]

Jay



diff --git a/drivers/net/atl1/atl1_main.c b/drivers/net/atl1/atl1_main.c
index 6862c11..a600601 100644
--- a/drivers/net/atl1/atl1_main.c
+++ b/drivers/net/atl1/atl1_main.c
@@ -2104,15 +2104,12 @@ static int __devinit atl1_probe(struct pci_dev *pdev,
 	if (err)
 		return err;
 
-	err = pci_set_dma_mask(pdev, DMA_64BIT_MASK);
+	err = pci_set_dma_mask(pdev, DMA_32BIT_MASK);
 	if (err) {
-		err = pci_set_dma_mask(pdev, DMA_32BIT_MASK);
-		if (err) {
-			dev_err(&pdev->dev, "no usable DMA configuration\n");
-			goto err_dma;
-		}
-		pci_using_64 = false;
+		dev_err(&pdev->dev, "no usable DMA configuration\n");
+		goto err_dma;
 	}
+	pci_using_64 = false;
 	/* Mark all PCI regions associated with PCI device
 	 * pdev as being reserved by owner atl1_driver_name
 	 */



Re: 2.6.22-rc5: pdflush oops under heavy disk load

2007-06-24 Thread Jay L. T. Cornwall
Jay Cliburn wrote:

>> The common factor here seems to be the buffer_head circular list
>> leading to invalid pointers in bh->b_this_page.
>>
>> I'm beginning to suspect the Attansic L1 Gigabit Ethernet driver
>> (marked as EXPERIMENTAL in 2.6.22-rc5). I can't reproduce these
>> panics on disk-to-disk copies or SCP across the localhost interface.
>> However, SCP from a server onto either of two different HDDs hits
>> these oopses fairly quickly.

> How much RAM is installed in your machine?  If it's 4GB or more, does
> your problem go away if you boot with mem=3000M?

Intriguing. Yes, this machine has 4GB of RAM. If I boot with mem=3000M
the problem does indeed go away - I can't induce an oops even after
transferring tens of GB across the interface.

I'm not sure I follow why that would be the case, except that it relates
to pci_map_page behaviour. But I guess you have an inkling?
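
One possible explanation, sketched under stated assumptions: on a machine
with 4GB fitted, the chipset typically relocates the RAM that would overlap
the PCI MMIO hole below 4GB to physical addresses above 4GB, so some RAM
ends up beyond the 32-bit boundary. The sketch assumes that mem= caps the
highest physical address the kernel uses. The hole start (3GB) is a made-up
value for illustration, not one read from this machine, and the sketch_
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MIB	(1024ULL * 1024ULL)
#define SKETCH_4GIB	(4096ULL * SKETCH_MIB)

/*
 * End (exclusive) of the physical address range occupied by RAM when all
 * RAM that would overlap [hole_start, 4 GiB) is relocated above 4 GiB.
 */
static uint64_t sketch_top_of_ram(uint64_t ram_bytes, uint64_t hole_start)
{
	assert(hole_start <= SKETCH_4GIB);
	if (ram_bytes <= hole_start)
		return ram_bytes;
	return SKETCH_4GIB + (ram_bytes - hole_start);
}

/*
 * Bytes of RAM left above 4 GiB once a mem=-style cap on the highest
 * usable physical address is applied. A cap of 0 means "no cap".
 */
static uint64_t sketch_ram_above_4g(uint64_t ram_bytes, uint64_t hole_start,
				    uint64_t mem_cap)
{
	uint64_t top = sketch_top_of_ram(ram_bytes, hole_start);

	if (mem_cap != 0 && top > mem_cap)
		top = mem_cap;
	return top > SKETCH_4GIB ? top - SKETCH_4GIB : 0;
}
```

Under these assumptions, 4096MB of RAM with a hole starting at 3072MB
leaves 1024MB above 4GB, and a 3000MB cap leaves none, which would keep
every page within reach of a 32-bit DMA address.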

-- 
Jay L. T. Cornwall, http://www.esuna.co.uk/~jay/
PhD Student
Imperial College London


Re: 2.6.22-rc5: pdflush oops under heavy disk load

2007-06-24 Thread Jay Cliburn
On Sat, 23 Jun 2007 13:14:40 +0100
"Jay L. T. Cornwall" <[EMAIL PROTECTED]> wrote:


> The common factor here seems to be the buffer_head circular list
> leading to invalid pointers in bh->b_this_page.
> 
> I'm beginning to suspect the Attansic L1 Gigabit Ethernet driver
> (marked as EXPERIMENTAL in 2.6.22-rc5). I can't reproduce these
> panics on disk-to-disk copies or SCP across the localhost interface.
> However, SCP from a server onto either of two different HDDs hits
> these oopses fairly quickly.

How much RAM is installed in your machine?  If it's 4GB or more, does
your problem go away if you boot with mem=3000M?

Jay


Re: (Last oops is Tainted: P) Re: 2.6.22-rc5: pdflush oops under heavy disk load

2007-06-24 Thread Oleg Verych
On Sun, Jun 24, 2007 at 11:10:09AM +0100, Jay L. T. Cornwall wrote:
> Oleg Verych wrote:
> 
> >> That sounds like a good theory: you're getting easily-hit oopses in one of
> >> the kernel's most-used codepaths which hasn't changed much in a long
> >> time.  So Something Odd Has Happened.
> 
> > Maybe this time it's just "Tainted: P"?
> 
> That's the NVIDIA module, which isn't doing much with X shut down
> regardless. It was bad form to forget this, of course, but is unrelated
> to the problem.
> 
> > And the oops has no ext3, like the prev. one.
> 
> I know. This isn't ext3 related and I'm fairly certain drivers/net/atl1
> is trashing... something. Perhaps the page table because:

The last oops log was tainted (as the subject reflects); before that I
saw ext3 and the "run fsck" reply. So a really clean oops log with all the
details, unlike the one you have just posted, may be useful.

> [  153.785325] Bad page state in process 'scp'
> [  153.785327] page:81000308d020 flags:0x0040ad41dc050845
> mapping:53dfe57d17cc59cf mapcount:16885953 count:292554304
> [  153.785329] Trying to fix it up, but a reboot is needed
> 
> This one dismisses a reference counting issue because the page data here
> looks like garbage. And a panic in VLC, playing a video across the
> network hits a similar problem:

OK, I see you are in Windows now, but I would still ask you to make a
testcase using `netcat' or `curl'. If the hardware is at fault, stressing
the network alone could probably trigger it. A clean test with *one*
script and no X (or other stuff) will surely help.

[ Netiquette here means acting as a voluntary noise filter after joining ]
[ any thread, because reply-to-all is how the LKML communicates          ]



Re: (Last oops is Tainted: P) Re: 2.6.22-rc5: pdflush oops under heavy disk load

2007-06-24 Thread Jay L. T. Cornwall
Oleg Verych wrote:

>> That sounds like a good theory: you're getting easily-hit oopses in one of
>> the kernel's most-used codepaths which hasn't changed much in a long
>> time.  So Something Odd Has Happened.

> Maybe this time it's just "Tainted: P"?

That's the NVIDIA module, which isn't doing much with X shut down
regardless. It was bad form to forget this, of course, but is unrelated
to the problem.

> And the oops has no ext3, like the prev. one.

I know. This isn't ext3 related and I'm fairly certain drivers/net/atl1
is trashing... something. Perhaps the page table because:

[  153.785325] Bad page state in process 'scp'
[  153.785327] page:81000308d020 flags:0x0040ad41dc050845
mapping:53dfe57d17cc59cf mapcount:16885953 count:292554304
[  153.785329] Trying to fix it up, but a reboot is needed

This one dismisses a reference counting issue because the page data here
looks like garbage. And a panic in VLC, playing a video across the
network hits a similar problem:

[ 9194.281809]  [] page_remove_rmap+0x53/0x110
[ 9194.281819]  [] unmap_vmas+0x4ec/0x7c0
[ 9194.281852]  [] unmap_region+0xcc/0x170
[ 9194.281867]  [] do_munmap+0x22a/0x2f0
[ 9194.281877]  [] __down_write_nested+0x12/0xb0
[ 9194.281892]  [] sys_shmdt+0xb6/0x150
[ 9194.281903]  [] system_call+0x7e/0x83
[ 9194.281921]
[ 9194.281924]
[ 9194.281925] Code: 48 2b ba 98 21 00 00 48 c1 ff 03 48 0f af f8 48 03
ba a8 21
[ 9194.281973] RIP  [] page_to_pfn+0x19/0x40
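
The "looks like garbage" judgement on the Bad page state report can be
partly mechanised. The sketch below tests whether a 64-bit value is a
canonical x86-64 address, assuming 48-bit virtual addressing (bits 63..47
all equal); a kernel pointer field that fails this test cannot be a valid
pointer. The function name is hypothetical, and the check is only a
plausibility test, not something the kernel is claimed to perform here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if addr is canonical under 48-bit virtual addressing: the top 17
 * bits (63..47) must be all zeros or all ones.
 */
static bool sketch_is_canonical48(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the 17 most significant bits */

	return top == 0 || top == 0x1ffffULL;
}
```

The mapping value 53dfe57d17cc59cf from the report fails this test, which
fits the view that the page structure was overwritten rather than
miscounted. On x86-64, dereferencing a non-canonical address raises a
general protection fault rather than a page fault.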

> Jay, check your oops for the "Tainted: P" flag, which is not supported
> here, and do not drop people who assisted you from the CC list.

My apologies, I had thought the etiquette was to only include
maintainers on the CC list.

I'll try and locate a maintainer for the Attansic driver a bit later,
but I've only seen people loosely related to it. In any case we may as
well let this thread die because it's not related to a filesystem bug
(which the CC list is presumably interested in).

-- 
Jay L. T. Cornwall, http://www.esuna.co.uk/~jay/
PhD Student
Imperial College London


(Last oops is Tainted: P) Re: 2.6.22-rc5: pdflush oops under heavy disk load

2007-06-24 Thread Oleg Verych
* From: Andrew Morton
* Newsgroups: linux.kernel
* Date: Sat, 23 Jun 2007 10:23:18 -0700
>
> On Sat, 23 Jun 2007 13:14:40 +0100 "Jay L.  T.  Cornwall" <[EMAIL PROTECTED]>
> wrote:
>
>> I'm beginning to suspect the Attansic L1 Gigabit Ethernet driver (marked
>> as EXPERIMENTAL in 2.6.22-rc5). I can't reproduce these panics on
>> disk-to-disk copies or SCP across the localhost interface. However, SCP
>> from a server onto either of two different HDDs hits these oopses fairly
>> quickly.
>
> That sounds like a good theory: you're getting easily-hit oopses in one of
> the kernel's most-used codepaths which hasn't changed much in a long
> time.  So Something Odd Has Happened.

Maybe this time it's just "Tainted: P"?

|-*- <[EMAIL PROTECTED]> -*-
...
i2c_algo_bit dib3000mc nvidia(P) dibx000_common snd_timer tveeprom atl1
compat_ioctl32 i2c_core videodev mii psmouse snd_seq_device v4l1_compat
video_buf v4l2_common btcx_risc pcspkr shpchp snd soundcore
snd_page_alloc intel_agp pci_hotplug serio_raw tsdev evdev sr_mod cdrom
ext3 jbd mbcache sg sd_mod pata_jmicron usbhid hid ata_generic ata_piix
ahci libata scsi_mod generic ehci_hcd uhci_hcd usbcore thermal
processor fan
[  628.139866] Pid: 201, comm: kswapd0 Tainted: P
...
|-*-

And the oops has no ext3, like the prev. one.

[ as you know we have no automatic noise tracking system, and ]
[ developers were not so productive in last discussion of it  ]

Jay, check your oops for the "Tainted: P" flag, which is not supported
here, and do not drop people who assisted you from the CC list.



Re: 2.6.22-rc5: pdflush oops under heavy disk load

2007-06-23 Thread Andrew Morton
On Sat, 23 Jun 2007 13:14:40 +0100 "Jay L.  T.  Cornwall" <[EMAIL PROTECTED]>
wrote:

> Jay L. T. Cornwall wrote:
> 
> > Already done. The filesystem came back as clean after the first oops,
> > but I forced a recheck with fsck to be safe - it found no problems.
> > 
> > This is reproducible on a clean filesystem.
> 
> Following up on this, I've now extracted another oops (at the bottom of
> this mail).
> 
> The common factor here seems to be the buffer_head circular list leading
> to invalid pointers in bh->b_this_page.
> 
> I'm beginning to suspect the Attansic L1 Gigabit Ethernet driver (marked
> as EXPERIMENTAL in 2.6.22-rc5). I can't reproduce these panics on
> disk-to-disk copies or SCP across the localhost interface. However, SCP
> from a server onto either of two different HDDs hits these oopses fairly
> quickly.

That sounds like a good theory: you're getting easily-hit oopses in one of
the kernel's most-used codepaths which hasn't changed much in a long
time.  So Something Odd Has Happened.

> Is it even possible for the Ethernet driver to corrupt ext3 data
> structures, short of trashing memory?

I suppose so.

I'd suggest that you enable every kernel debugging feature you can get your
hands on (in the Kernel Hacking menu) and see if that turns anything up.

Failing that, if you can whack a different network card in that machine it
would help to confirm or deny your suspicion.



Re: 2.6.22-rc5: pdflush oops under heavy disk load

2007-06-23 Thread Jay L. T. Cornwall
Jay L. T. Cornwall wrote:

> Already done. The filesystem came back as clean after the first oops,
> but I forced a recheck with fsck to be safe - it found no problems.
> 
> This is reproducible on a clean filesystem.

Following up on this, I've now extracted another oops (at the bottom of
this mail).

The common factor here seems to be the buffer_head circular list leading
to invalid pointers in bh->b_this_page.
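
To make the failure mode concrete, here is a minimal userspace sketch of
walking a circular per-page buffer list. The struct is a hypothetical
stand-in that models only the b_this_page link, not the kernel's real
struct buffer_head, and the step limit is an assumption added for
illustration. It shows how a ring whose links were overwritten stops
closing on its head.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: only the circular link is modelled. */
struct sketch_bh {
	struct sketch_bh *b_this_page;	/* next buffer on the same page */
};

/*
 * Count the buffers in the ring that starts at head. Returns -1 if the
 * ring contains a NULL link or does not return to head within max_steps
 * links, i.e. if the list no longer looks circular.
 */
static int sketch_ring_len(const struct sketch_bh *head, int max_steps)
{
	const struct sketch_bh *bh = head;
	int n = 0;

	if (head == NULL)
		return -1;
	do {
		if (n == max_steps)
			return -1;	/* never came back to head */
		bh = bh->b_this_page;
		n++;
		if (bh == NULL)
			return -1;	/* broken link */
	} while (bh != head);
	return n;
}
```

A check like this can only catch NULL or non-closing links. A link that was
overwritten with arbitrary data is dereferenced as-is, which is the kind of
access that ends in a fault like the ones in these traces.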

I'm beginning to suspect the Attansic L1 Gigabit Ethernet driver (marked
as EXPERIMENTAL in 2.6.22-rc5). I can't reproduce these panics on
disk-to-disk copies or SCP across the localhost interface. However, SCP
from a server onto either of two different HDDs hits these oopses fairly
quickly.

Is it even possible for the Ethernet driver to corrupt ext3 data
structures, short of trashing memory?

[  628.135241] general protection fault:  [1] SMP
[  628.135422] CPU 1
[  628.135522] Modules linked in: usb_storage libusual netconsole
binfmt_misc rfcomm l2cap bluetooth ppdev capability commoncap
acpi_cpufreq cpufreq_stats cpufreq_userspace cpufreq_ondemand
cpufreq_conservative cpufreq_powersave freq_table video container
battery dock asus_acpi ac sbs button af_packet ipv6 nls_utf8 ntfs
w83627ehf i2c_isa parport_pc lp parport fuse snd_hda_intel snd_pcm_oss
snd_mixer_oss mt2060 snd_pcm snd_seq_dummy cx22702 snd_seq_oss cx88_dvb
cx88_vp3054_i2c video_buf_dvb snd_seq_midi snd_rawmidi
snd_seq_midi_event snd_seq cx8800 nls_cp437 dvb_usb_dib0700 dib7000m
dib7000p dvb_usb cx8802 cx88xx dvb_core cifs ir_common dvb_pll
i2c_algo_bit dib3000mc nvidia(P) dibx000_common snd_timer tveeprom atl1
compat_ioctl32 i2c_core videodev mii psmouse snd_seq_device v4l1_compat
video_buf v4l2_common btcx_risc pcspkr shpchp snd soundcore
snd_page_alloc intel_agp pci_hotplug serio_raw tsdev evdev sr_mod cdrom
ext3 jbd mbcache sg sd_mod pata_jmicron usbhid hid ata_generic ata_piix
ahci libata scsi_mod generic ehci_hcd uhci_hcd usbcore thermal processor fan
[  628.139866] Pid: 201, comm: kswapd0 Tainted: P   2.6.22-rc5-edge #1
[  628.139952] RIP: 0010:[]  []
free_block+0x10e/0x160
[  628.140108] RSP: 0018:8101322ebaf0  EFLAGS: 00010046
[  628.140190] RAX: 3eac08c8be1ff284 RBX: 810039616f68 RCX:
810127524c00
[  628.140278] RDX: bc10c1d4beae1915 RSI: 810039616000 RDI:
dc050844
[  628.140366] RBP: 8101322ebb40 R08: 81013b07cc07 R09:
ffe9
[  628.140455] R10: 81013c68e1c0 R11: 88108740 R12:
81013b81b800
[  628.140542] R13: 0001 R14: 0001 R15:
0640
[  628.140630] FS:  () GS:81013b07cac0()
knlGS:
[  628.140750] CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
[  628.140832] CR2: 2ae5699e CR3: 0001001c5000 CR4:
06e0
[  628.140921] Process kswapd0 (pid: 201, threadinfo 8101322ea000,
task 81013b15a3c0)
[  628.141039] Stack:  8100e380 81013b078400
0640 0640
[  628.141320]  81013b81b800 0246 8101322ebd90
80292846
[  628.141563]  810039616f68 810039616f68 810039616f68
81012ad8d598
[  628.141746] Call Trace:
[  628.141886]  [] kmem_cache_free+0x1b6/0x1d0
[  628.141979]  [] free_buffer_head+0x20/0x50
[  628.142063]  [] try_to_free_buffers+0x64/0xa0
[  628.142153]  [] shrink_inactive_list+0x82a/0x960
[  628.142274]  [] shrink_active_list+0x421/0x4e0
[  628.142395]  [] shrink_zone+0xcb/0x140
[  628.142484]  [] kswapd+0x3ea/0x560
[  628.142578]  [] autoremove_wake_function+0x0/0x30
[  628.142679]  [] kswapd+0x0/0x560
[  628.142762]  [] kthread+0x4b/0x80
[  628.142846]  [] child_rip+0xa/0x12
[  628.142942]  [] kthread+0x0/0x80
[  628.143023]  [] child_rip+0x0/0x12
[  628.143106]
[  628.143171]
[  628.143172] Code: 48 89 30 0f 85 44 ff ff ff 48 83 c4 08 5b 5d 41 5c
41 5d 41
[  628.144015] RIP  [] free_block+0x10e/0x160
[  628.144130]  RSP 

-- 
Jay L. T. Cornwall, http://www.esuna.co.uk/~jay/
PhD Student
Imperial College London


Re: 2.6.22-rc5: pdflush oops under heavy disk load

2007-06-22 Thread Jay L. T. Cornwall
Chuck Ebbert wrote:

> On 06/21/2007 08:07 PM, Jay L. T. Cornwall wrote:
>> [  724.350222] general protection fault:  [1] SMP
>> [  724.350413] CPU 1
>> 
>> [  724.355028] Pid: 199, comm: pdflush Not tainted 2.6.22-rc5-edge #1
>> [  724.355125] RIP: 0010:[]  []
>> :ext3:walk_page_buffers+0x34/0x90

> Step 1: run fsck on the filesystem.

Already done. The filesystem came back as clean after the first oops,
but I forced a recheck with fsck to be safe - it found no problems.

This is reproducible on a clean filesystem.

-- 
Jay L. T. Cornwall, http://www.esuna.co.uk/~jay/
PhD Student
Imperial College London


Re: 2.6.22-rc5: pdflush oops under heavy disk load

2007-06-22 Thread Chuck Ebbert
On 06/21/2007 08:07 PM, Jay L. T. Cornwall wrote:
> Hi,
> 
> Kernel version: 2.6.22-rc5 (confirmed also on 2.6.20)
> Kernel config : Ubuntu 7.04 default (SMP)
> 
> Relevant hardware:
>   Asus P5K (Intel P35 chipset)
>   Core 2 Duo E6600 2.4GHz
>   Western Digital 10KRPM 150GB HDD on JMicron 20360/20363 AHCI
> 
> Netconsoled dump:
> 
> [  724.350222] general protection fault:  [1] SMP
> [  724.350413] CPU 1
> [  724.350520] Modules linked in: usb_storage libusual netconsole
> binfmt_misc rfcomm l2cap bluetooth ppdev capability commoncap
> acpi_cpufreq cpufreq_stats cpufreq_userspace cpufreq_ondemand
> cpufreq_conservative cpufreq_powersave freq_table video container
> battery dock asus_acpi ac sbs button af_packet nls_utf8 ntfs w83627ehf
> i2c_isa parport_pc lp parport fuse mt2060 snd_hda_intel snd_pcm_oss
> snd_mixer_oss snd_pcm cx22702 snd_seq_dummy snd_seq_oss dvb_usb_dib0700
> dib7000m dib7000p dvb_usb cx88_dvb cx88_vp3054_i2c snd_seq_midi
> snd_rawmidi video_buf_dvb dvb_core ipv6 snd_seq_midi_event snd_seq
> snd_timer dvb_pll cx8800 cx8802 cx88xx sr_mod ir_common snd_seq_device
> cdrom i2c_algo_bit dib3000mc dibx000_common tveeprom atl1 usbhid psmouse
> videodev compat_ioctl32 hid mii i2c_core v4l2_common v4l1_compat
> btcx_risc video_buf serio_raw snd soundcore pcspkr shpchp pci_hotplug
> snd_page_alloc intel_agp tsdev evdev ext3 jbd mbcache sg sd_mod
> pata_jmicron ata_generic ata_piix ahci libata scsi_mod ehci_hcd generic
> uhci_hcd usbcore thermal processor fan
> [  724.355028] Pid: 199, comm: pdflush Not tainted 2.6.22-rc5-edge #1
> [  724.355125] RIP: 0010:[]  []
> :ext3:walk_page_buffers+0x34/0x90

Step 1: run fsck on the filesystem.



2.6.22-rc5: pdflush oops under heavy disk load

2007-06-21 Thread Jay L. T. Cornwall
Hi,

Kernel version: 2.6.22-rc5 (confirmed also on 2.6.20)
Kernel config : Ubuntu 7.04 default (SMP)

Relevant hardware:
  Asus P5K (Intel P35 chipset)
  Core 2 Duo E6600 2.4GHz
  Western Digital 10KRPM 150GB HDD on JMicron 20360/20363 AHCI

Netconsoled dump:

[  724.350222] general protection fault:  [1] SMP
[  724.350413] CPU 1
[  724.350520] Modules linked in: usb_storage libusual netconsole
binfmt_misc rfcomm l2cap bluetooth ppdev capability commoncap
acpi_cpufreq cpufreq_stats cpufreq_userspace cpufreq_ondemand
cpufreq_conservative cpufreq_powersave freq_table video container
battery dock asus_acpi ac sbs button af_packet nls_utf8 ntfs w83627ehf
i2c_isa parport_pc lp parport fuse mt2060 snd_hda_intel snd_pcm_oss
snd_mixer_oss snd_pcm cx22702 snd_seq_dummy snd_seq_oss dvb_usb_dib0700
dib7000m dib7000p dvb_usb cx88_dvb cx88_vp3054_i2c snd_seq_midi
snd_rawmidi video_buf_dvb dvb_core ipv6 snd_seq_midi_event snd_seq
snd_timer dvb_pll cx8800 cx8802 cx88xx sr_mod ir_common snd_seq_device
cdrom i2c_algo_bit dib3000mc dibx000_common tveeprom atl1 usbhid psmouse
videodev compat_ioctl32 hid mii i2c_core v4l2_common v4l1_compat
btcx_risc video_buf serio_raw snd soundcore pcspkr shpchp pci_hotplug
snd_page_alloc intel_agp tsdev evdev ext3 jbd mbcache sg sd_mod
pata_jmicron ata_generic ata_piix ahci libata scsi_mod ehci_hcd generic
uhci_hcd usbcore thermal processor fan
[  724.355028] Pid: 199, comm: pdflush Not tainted 2.6.22-rc5-edge #1
[  724.355125] RIP: 0010:[]  []
:ext3:walk_page_buffers+0x34/0x90
[  724.355305] RSP: 0018:8101322e7bb0  EFLAGS: 00010202
[  724.355394] RAX:  RBX: 9d8145bd RCX:
1000
[  724.355491] RDX: 9d8145bd RSI: 908553557cc5eb6f RDI:
81012e1052a0
[  724.355587] RBP: 3b028b7a R08:  R09:
880f1ba0
[  724.355684] R10:  R11: 0001 R12:
9d8145bd
[  724.355780] R13: 908553557cc5eb6f R14: 8100369a5200 R15:

[  724.357278] FS:  () GS:81013b07cac0()
knlGS:
[  724.357410] CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
[  724.357501] CR2: 2b776e178000 CR3: 00013a245000 CR4:
06e0
[  724.357598] Process pdflush (pid: 199, threadinfo 8101322e6000,
task 81013b15aaa0)
[  724.357730] Stack:  880f1ba0 1000
81012e1052a0 81013de27c38
[  724.358031]  81012e1052a0 2e1052a0 8100369a5200
8101322e7e50
[  724.358292]  000e 880f4fca 81012e545b08
0003
[  724.358489] Call Trace:
[  724.358638]  [] :ext3:bget_one+0x0/0x10
[  724.358742]  [] :ext3:ext3_ordered_writepage+0xea/0x190
[  724.358846]  [] __writepage+0xa/0x30
[  724.358937]  [] write_cache_pages+0x224/0x350
[  724.359030]  [] __writepage+0x0/0x30
[  724.359147]  [] do_writepages+0x2b/0x40
[  724.359239]  [] __writeback_single_inode+0xa6/0x3e0
[  724.359348]  [] sync_sb_inodes+0x1f6/0x2f0
[  724.359445]  [] writeback_inodes+0xbf/0x100
[  724.359542]  [] background_writeout+0xa9/0xe0
[  724.359648]  [] pdflush+0x0/0x220
[  724.359739]  [] pdflush+0x140/0x220
[  724.359829]  [] background_writeout+0x0/0xe0
[  724.359927]  [] kthread+0x4b/0x80
[  724.360018]  [] child_rip+0xa/0x12
[  724.360120]  [] kthread+0x0/0x80
[  724.360208]  [] child_rip+0x0/0x12
[  724.360298]
[  724.360369]
[  724.360370] Code: 4c 8b 6e 08 41 8d 1c 14 76 39 89 d8 44 29 e0 3b 44
24 08 73
[  724.361260] RIP  [] :ext3:walk_page_buffers+0x34/0x90
[  724.361395]  RSP 

The system runs stably under light load. Heavy disk writes, here induced
by 200Mbit scp's onto the drive, cause the oops within a minute or two.
It's entirely reproducible and appears to give the same trace each time.

I'll have a go at digging up the root of this problem, but anyone with
more experience is welcome to pitch in!

-- 
Jay L. T. Cornwall, http://www.esuna.co.uk/~jay/
PhD Student
Imperial College London