Re: 3.16.49 Oops, does not boot on two socket server
Hello,

just want to give a follow-up. I have tested this with 3.16.51 and the
problem still exists. It seems the 3.16.x tree is no longer usable for
two-socket servers :-(

Regards,
Holger

PS: here is the panic with 3.16.51:

smpboot: Total of 24 processors activated (95963.71 BogoMIPS)
[ cut here ]
WARNING: CPU: 0 PID: 1 at kernel/sched/core.c:5811 init_overlap_sched_group+0x114/0x120()
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.16.51-1.el6.x86_64 #1
Hardware name: HP ProLiant DL380p Gen8, BIOS P70 08/02/2014
 880fe96c7da8 815432dc 16b3 880fe96c7de8
 8104cc72 880fff803c00 880fe8d05650 881fe96ba3a8
 880fe96af540
Call Trace:
 [] dump_stack+0x4e/0x6a
 [] warn_slowpath_common+0x82/0xb0
 [] warn_slowpath_null+0x15/0x20
 [] init_overlap_sched_group+0x114/0x120
 [] build_overlap_sched_groups+0x134/0x1e0
 [] build_sched_domains+0x159/0x330
 [] sched_init_smp+0x65/0xf8
 [] kernel_init_freeable+0xb2/0x12d
 [] ? rest_init+0x80/0x80
 [] kernel_init+0x9/0xf0
 [] ret_from_fork+0x58/0x90
 [] ? rest_init+0x80/0x80
---[ end trace 207206398bdf8ddb ]---
BUG: unable to handle kernel paging request at 01024a7f
IP: [] init_overlap_sched_group+0xae/0x120
PGD 0
Oops: [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW 3.16.51-1.el6.x86_64 #1
Hardware name: HP ProLiant DL380p Gen8, BIOS P70 08/02/2014
task: 880fe96d ti: 880fe96c4000 task.ti: 880fe96c4000
RIP: 0010:[] [] init_overlap_sched_group+0xae/0x120
RSP: :880fe96c7e08 EFLAGS: 00010246
RAX: 0100 RBX: 880fe8d05650 RCX: 0020
RDX: 00014a80 RSI: 0020 RDI: 0020
RBP: 880fe96c7e28 R08: 880fe96af558 R09:
R10: 0002 R11: 0001 R12: 881fe96ba3a8
R13: 880fe96af540 R14: R15: 881fe96ba3a8
FS: () GS:880fffc0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 01024a7f CR3: 01714000 CR4: 000407f0
Stack:
 880fe8d05650 880fe96c7ea8 81079b04 0011
 880fe96af540 cd68
Call Trace:
 [] build_overlap_sched_groups+0x134/0x1e0
 [] build_sched_domains+0x159/0x330
 [] sched_init_smp+0x65/0xf8
 [] kernel_init_freeable+0xb2/0x12d
 [] ? rest_init+0x80/0x80
 [] kernel_init+0x9/0xf0
 [] ret_from_fork+0x58/0x90
 [] ? rest_init+0x80/0x80
Code: 60 83 00 85 c0 74 70 49 8d 75 18 48 c7 c2 38 f9 8a 81 bf ff ff ff ff e8 31 f9 1f 00 49 8b 54 24 10 48 98 48 8b 04 c5 a0 fc 78 81 <48> 8b 14 10 b8 01 00 00 00 49 89 55 10 f0 0f c1 02 85 c0 75 0f
RIP [] init_overlap_sched_group+0xae/0x120
RSP
CR2: 01024a7f
---[ end trace 207206398bdf8ddc ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009

On Wed, 18 Oct 2017, Holger Kiehl wrote:
> Hello,
>
> just tried to boot 3.16.49 on a 2 socket server and it fails with the
> following error:
>
> smpboot: Total of 24 processors activated (95818.36 BogoMIPS)
> [ cut here ]
> WARNING: CPU: 0 PID: 1 at kernel/sched/core.c:5811 init_overlap_sched_group+0x114/0x120()
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.16.49-1.el6.x86_64 #1
> Hardware name: HP ProLiant DL380p Gen8, BIOS P70 08/02/2014
>  880bfd6d3da8 81542f1c 16b3 880bfd6d3de8
>  8104cd72 880c0f803c00 880bfcc69650 8817fd695ca8
>  880bfd6e2300
> Call Trace:
>  [] dump_stack+0x4e/0x6a
>  [] warn_slowpath_common+0x82/0xb0
>  [] warn_slowpath_null+0x15/0x20
>  [] init_overlap_sched_group+0x114/0x120
>  [] build_overlap_sched_groups+0x134/0x1e0
>  [] build_sched_domains+0x159/0x330
>  [] sched_init_smp+0x65/0xf8
>  [] kernel_init_freeable+0xb2/0x12d
>  [] ? rest_init+0x80/0x80
>  [] kernel_init+0x9/0xf0
>  [] ret_from_fork+0x58/0x90
>  [] ? rest_init+0x80/0x80
> ---[ end trace a491a27c866dd06e ]---
> BUG: unable to handle kernel paging request at 010247bf
> IP: [] init_overlap_sched_group+0xae/0x120
> PGD 0
> Oops: [#1] SMP
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW 3.16.49-1.el6.x86_64 #1
> Hardware name: HP ProLiant DL380p Gen8, BIOS P70 08/02/2014
> task: 8817fd6a8000 ti: 880bfd6d task.ti: 880bfd6d
> RIP: 0010:[] [] init_overlap_sched_group+0xae/0x120
> RSP: :880bfd6d3e08 EFLAGS: 00010246
> RAX: 0100 RBX: 880bfcc69650 RCX:
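In case it helps anyone to narrow this down: since 3.16.48 boots and 3.16.49 does not, a bisect over the stable tree should find the offending backport in a handful of builds. A minimal sketch, assuming a checkout of a tree that carries the v3.16.x stable tags and my attached .config:

    git bisect start v3.16.49 v3.16.48
    # at each step: build, install and boot the candidate kernel
    make olddefconfig && make -j24 && make modules_install install
    git bisect bad      # if it panics as above; otherwise: git bisect good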
3.16.49 Oops, does not boot on two socket server
Hello,

just tried to boot 3.16.49 on a 2 socket server and it fails with the
following error:

smpboot: Total of 24 processors activated (95818.36 BogoMIPS)
[ cut here ]
WARNING: CPU: 0 PID: 1 at kernel/sched/core.c:5811 init_overlap_sched_group+0x114/0x120()
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.16.49-1.el6.x86_64 #1
Hardware name: HP ProLiant DL380p Gen8, BIOS P70 08/02/2014
 880bfd6d3da8 81542f1c 16b3 880bfd6d3de8
 8104cd72 880c0f803c00 880bfcc69650 8817fd695ca8
 880bfd6e2300
Call Trace:
 [] dump_stack+0x4e/0x6a
 [] warn_slowpath_common+0x82/0xb0
 [] warn_slowpath_null+0x15/0x20
 [] init_overlap_sched_group+0x114/0x120
 [] build_overlap_sched_groups+0x134/0x1e0
 [] build_sched_domains+0x159/0x330
 [] sched_init_smp+0x65/0xf8
 [] kernel_init_freeable+0xb2/0x12d
 [] ? rest_init+0x80/0x80
 [] kernel_init+0x9/0xf0
 [] ret_from_fork+0x58/0x90
 [] ? rest_init+0x80/0x80
---[ end trace a491a27c866dd06e ]---
BUG: unable to handle kernel paging request at 010247bf
IP: [] init_overlap_sched_group+0xae/0x120
PGD 0
Oops: [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW 3.16.49-1.el6.x86_64 #1
Hardware name: HP ProLiant DL380p Gen8, BIOS P70 08/02/2014
task: 8817fd6a8000 ti: 880bfd6d task.ti: 880bfd6d
RIP: 0010:[] [] init_overlap_sched_group+0xae/0x120
RSP: :880bfd6d3e08 EFLAGS: 00010246
RAX: 0100 RBX: 880bfcc69650 RCX: 0020
RDX: 000147c0 RSI: 0020 RDI: 0020
RBP: 880bfd6d3e28 R08: 880bfd6e2318 R09:
R10: 0002 R11: 0001 R12: 8817fd695ca8
R13: 880bfd6e2300 R14: R15: 8817fd695ca8
FS: () GS:880c0fc0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 010247bf CR3: 001714000 CR4: 000407f0
Stack:
 880bfcc69650 880bfd6d3ea8 81079974 0011
 880bfd6e2300 cac8
Call Trace:
 [] build_overlap_sched_groups+0x134/0x1e0
 [] build_sched_domains+0x159/0x330
 [] sched_init_smp+0x65/0xf8
 [] kernel_init_freeable+0xb2/0x12d
 [] ? rest_init+0x80/0x80
 [] kernel_init+0x9/0xf0
 [] ret_from_fork+0x58/0x90
 [] ? rest_init+0x80/0x80
Code: 61 83 00 85 c0 74 70 49 8d 75 18 48 c7 c2 38 f9 8a 81 bf ff ff ff ff e8 51 fa 1f 00 49 8b 54 24 10 48 98 48 8b 04 c5 a0 fc 78 81 <48> 8b 14 10 b8 01 00 00 00 49 89 55 10 f0 0f c1 02 85 c0 75 0f
RIP [] init_overlap_sched_group+0xae/0x120
RSP
CR2: 010247bf
---[ end trace a491a27c866dd06f ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009
Rebooting in 5 seconds..

This happened on three different systems. On a similar system with just
one CPU in a socket it boots fine. The last kernel of this series I tried
was 3.16.48 and that worked fine.

Any idea what is wrong? In case it is useful I have attached my kernel
config.

Regards,
Holger

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 3.16.49 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_HAVE_INTEL_TXT=y
CONFIG_X86_64_SMP=y
CONFIG_X86_HT=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11"
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not se
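PS: since it only hits machines with two populated sockets, the NUMA topology that triggers the warning can be double-checked with standard tools; a quick sketch (numactl comes from the numactl package, names may differ per distribution):

    lscpu | grep -E '^(Socket|NUMA)'
    numactl --hardware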
Re: [PATCH] MD: make bio mergeable
On Thu, 28 Apr 2016, Shaohua Li wrote:
> On Thu, Apr 28, 2016 at 08:00:22PM +0000, Holger Kiehl wrote:
> > Hello,
> >
> > On Mon, 25 Apr 2016, Shaohua Li wrote:
> >
> > > blk_queue_split marks bio unmergeable, which makes sense for normal bio.
> > > But if dispatching the bio to underlayer disk, the blk_queue_split
> > > checks are invalid, hence it's possible the bio becomes mergeable.
> > >
> > > In the reported bug, this bug causes trim against raid0 performance slash
> > > https://bugzilla.kernel.org/show_bug.cgi?id=117051
> > >
> > This patch makes a huge difference. On a system with two Samsung 850 Pro
> > in a MD Raid0 setup the time for fstrim went down from ~30min to 18sec!
> >
> > However, on another system with two Intel P3700 1.6TB NVMe PCIe SSD's
> > also setup as one big MD Raid0, the patch does not make any difference
> > at all. fstrim takes more then 4 hours!
>
> Does the raid0 cross two partitions or two SSD?
>
Two SSDs. On the system where it works, the two Samsung 850 Pro SATA
SSDs are combined via partitions.

> can you post blktrace data in the bugzilloa, I'll track the bug there.
>
I did the blktrace on the two NVMe devices backing the md raid0,
/dev/nvme[01]n1, for 2 minutes and attached them to bug 117051 as a
tar.bz2 file:

https://bugzilla.kernel.org/show_bug.cgi?id=117051

Please just ask if I have forgotten anything. And many thanks for
looking at this and all the good work!

Regards,
Holger
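PS: for reference, a minimal sketch of how such traces can be gathered (assuming the blktrace package is installed; device and mount point names are just placeholders for this system):

    blktrace -d /dev/nvme0n1 -d /dev/nvme1n1 -w 120 &
    time fstrim -v /data        # /data stands for the raid0 mount point
    wait                        # let blktrace finish its 120 seconds
    blkparse nvme0n1 nvme1n1 > fstrim-trace.txt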
Re: [PATCH] MD: make bio mergeable
Hello,

On Mon, 25 Apr 2016, Shaohua Li wrote:
> blk_queue_split marks bio unmergeable, which makes sense for normal bio.
> But if dispatching the bio to underlayer disk, the blk_queue_split
> checks are invalid, hence it's possible the bio becomes mergeable.
>
> In the reported bug, this bug causes trim against raid0 performance slash
> https://bugzilla.kernel.org/show_bug.cgi?id=117051
>
This patch makes a huge difference. On a system with two Samsung 850 Pro
in a MD Raid0 setup the time for fstrim went down from ~30min to 18sec!

However, on another system with two Intel P3700 1.6TB NVMe PCIe SSD's
also setup as one big MD Raid0, the patch does not make any difference
at all. fstrim takes more than 4 hours!

Any idea what could be wrong?

Regards,
Holger

> Reported-by: Park Ju Hyung
> Fixes: 6ac45aeb6bca (block: avoid to merge splitted bio)
> Cc: sta...@vger.kernel.org (v4.3+)
> Cc: Ming Lei
> Cc: Jens Axboe
> Cc: Neil Brown
> Signed-off-by: Shaohua Li
> ---
>  drivers/md/md.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 194580f..14d3b37 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -284,6 +284,8 @@ static blk_qc_t md_make_request(struct request_queue *q, struct bio *bio)
>  	 * go away inside make_request
>  	 */
>  	sectors = bio_sectors(bio);
> +	/* bio could be mergeable after passing to underlayer */
> +	bio->bi_rw &= ~REQ_NOMERGE;
>  	mddev->pers->make_request(mddev, bio);
>
>  	cpu = part_stat_lock();
> --
> 2.8.0.rc2
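For anyone wanting to reproduce the measurement, the timing above was taken simply with fstrim; a sketch (the mount point is a placeholder for the filesystem on the raid0):

    time fstrim -v /data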
Re: Filesystem corruption MD (imsm) Raid0 via 2 SSD's + discard
On Thu, 21 May 2015, NeilBrown wrote:

On Thu, 21 May 2015 06:44:27 + (UTC) Holger Kiehl wrote:

On Thu, 21 May 2015, NeilBrown wrote:

On Thu, 21 May 2015 01:32:13 +0500 Roman Mamedov wrote:

On Wed, 20 May 2015 20:12:31 + (UTC) Holger Kiehl wrote:

The kernel I was running when I discovered the problem was 4.0.2 from kernel.org. However, after reinstalling from DVD I updated to Fedora's latest kernel, which was 3.19.? (I do not remember the last numbers). So that kernel seems also affected, but I assume it contains many 'fixes' from 4.0.x. As filesystem I use ext4, distribution is Fedora 21 and hardware is: Xeon E3-1275, 16GB ECC Ram. My system seems to be now running stable for some days with kernel.org kernel 4.0.3 and with discard DISABLED. But I am still unsure what could be the real cause.

It is a bug in the 4.0.2 kernel, fixed in 4.0.3.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=785672
https://bbs.archlinux.org/viewtopic.php?id=197400
https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux-stable/+/d2dc317d564a46dfc683978a2e5a4f91434e9711

I suspect that is a different bug. I think this one is

https://bugzilla.kernel.org/show_bug.cgi?id=98501

Should there not be a big fat warning going around telling users to disable discard on Raid 0 until this is fixed? This breaks the filesystem completely and I believe there is absolutely no way one can get back the data.

Probably. Would you like to do that?

Is this fixed in 4.0.4? And which kernels are affected? There could be many people running systems that have not noticed this and don't know in what dangerous situation they are when they delete data.

The patch was only added to my tree today. I will send to Linus tomorrow so it should appear in the next -rc. Any -stable kernel released since mid-April probably has the bug. It was caused by commit 47d68979cc968535cb87f3e5f2e6a3533ea48fbd. Once the fix gets into Linus' tree, it should get into subsequent -stable releases. The fix is here:

http://git.neil.brown.name/?p=md.git;a=commitdiff;h=a81157768a00e8cf8a7b43b5ea5cac931262374f

commit id should remain unchanged.

I would like to confirm that with this patch and discard enabled, I no longer see any corruption. Many thanks for the quick fix!

Regards,
Holger
WARNING: Software Raid 0 on SSD's and discard corrupts data
Hello,

all users using a Software Raid 0 on SSD's with discard should disable discard if they use any recent kernel since mid-April 2015. The bug was introduced by commit 47d68979cc968535cb87f3e5f2e6a3533ea48fbd and the fix is not yet in Linus' tree. The fix can be found here:

http://git.neil.brown.name/?p=md.git;a=commitdiff;h=a81157768a00e8cf8a7b43b5ea5cac931262374f

Users should immediately remove the discard option from any mounted software Raid 0 filesystems. Any delete or modification of files can lead to random corruption of the filesystem. Use the remount option of the mount command to remove the discard option. Do not do it via editing /etc/fstab if your root filesystem is on a software Raid 0.

Regards,
Holger
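PS: a sketch of the remount, assuming ext4 and that /data is an affected mount point; repeat for every filesystem currently mounted with discard:

    mount -o remount,nodiscard /data
    grep discard /proc/mounts    # verify nothing is still mounted with discard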
Re: Filesystem corruption MD (imsm) Raid0 via 2 SSD's + discard
On Thu, 21 May 2015, NeilBrown wrote:

On Thu, 21 May 2015 01:32:13 +0500 Roman Mamedov wrote:

On Wed, 20 May 2015 20:12:31 + (UTC) Holger Kiehl wrote:

The kernel I was running when I discovered the problem was 4.0.2 from kernel.org. However, after reinstalling from DVD I updated to Fedora's latest kernel, which was 3.19.? (I do not remember the last numbers). So that kernel seems also affected, but I assume it contains many 'fixes' from 4.0.x. As filesystem I use ext4, distribution is Fedora 21 and hardware is: Xeon E3-1275, 16GB ECC Ram. My system seems to be now running stable for some days with kernel.org kernel 4.0.3 and with discard DISABLED. But I am still unsure what could be the real cause.

It is a bug in the 4.0.2 kernel, fixed in 4.0.3.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=785672
https://bbs.archlinux.org/viewtopic.php?id=197400
https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux-stable/+/d2dc317d564a46dfc683978a2e5a4f91434e9711

I suspect that is a different bug. I think this one is

https://bugzilla.kernel.org/show_bug.cgi?id=98501

Should there not be a big fat warning going around telling users to disable discard on Raid 0 until this is fixed? This breaks the filesystem completely and I believe there is absolutely no way one can get back the data.

Is this fixed in 4.0.4? And which kernels are affected? There could be many people running systems that have not noticed this and don't know in what dangerous situation they are when they delete data.

Regards,
Holger
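One way to answer the "which kernels are affected" question is to ask git which releases contain the offending commit; a sketch, assuming a clone of the linux-stable tree:

    git tag --contains 47d68979cc968535cb87f3e5f2e6a3533ea48fbd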
Filesystem corruption MD (imsm) Raid0 via 2 SSD's + discard
Hello,

I had a terrible weekend recovering my home system. Always when files were deleted, some data got corrupted. At first I did not notice it, but when I rebooted the system would not come up again; systemd crashed with SIGSEGV and that was it. Booting from a USB stick I saw that some glibc lib had a different size from that in the original RPM. So all I did was reinstall that lib from the USB stick and everything was fine after rebooting from the Raid 0. But I then wanted to make sure that no other files were corrupted, so I checked and found more. So again I reinstalled those RPM's and rebooted. To my big surprise the system was again broken and failed to boot. I again tried to recover my system from the USB stick, but this time did not manage to recover the system. So I decided to reinstall the system completely from DVD.

Everything looked good until the moment when I had activated the discard option in /etc/fstab. After doing some more work (adding and removing things) I rebooted and again the system failed to boot. Booting from the USB stick I saw that /etc/fstab was all filled with NULLs. This gave me the clue that there must be some problem with discard (trim).

My system is using a software raid 0 IMSM (intel 'fake' raid) on two Samsung SSD 840 pro. A Windows system on the same disks (that is why I am using IMSM raid) was not affected by this problem. I have checked the ram with memtest86 and everything is ok.

The kernel I was running when I discovered the problem was 4.0.2 from kernel.org. However, after reinstalling from DVD I updated to Fedora's latest kernel, which was 3.19.? (I do not remember the last numbers). So that kernel seems also affected, but I assume it contains many 'fixes' from 4.0.x. As filesystem I use ext4, distribution is Fedora 21 and hardware is: Xeon E3-1275, 16GB ECC Ram.

My system seems to be now running stable for some days with kernel.org kernel 4.0.3 and with discard DISABLED. But I am still unsure what could be the real cause.

Regards,
Holger
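PS: for anyone in the same situation, corrupted packaged files can be found without comparing sizes by hand; a sketch using rpm's verify mode (run from rescue media, optionally with --root pointing at the installed system):

    rpm -Va | grep '^..5'    # a '5' in the third column marks a digest mismatch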
qlcnic very high TX values, as of 3.13.x
Hello,

upgrading from 3.10.x to the next stable series 3.14.x I noticed that ifconfig reports very high TX values. Taking the qlcnic source from 3.15.5 and compiling it under 3.14.12, the problem remains. Going backwards, always just copying the qlcnic source from the older kernels to the 3.14.12 tree, I noticed that the 3.12.x kernel was the last version that does not generate those high TX values. So the problem started with the qlcnic driver in 3.13.x. However, comparing 3.13.x and 3.14.x, the numbers go up much quicker in 3.14.x. In 3.14.x I get TX values in Terabytes very quickly after boot. I once even got Petabyte values!

Hardware is the following:

HP ProLiant DL380 G7
2 x Intel Xeon X5690 (24 cores with hyperthreading)
106 GByte Ram
1 x NC523SFP 10Gb 2-port Server Adapter Board Chip rev 0x54 (qlcnic)
1 x Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (ixgbe)

The qlcnic and ixgbe cards are bonded together in fault-tolerance (active-backup) mode. And even when I switch to the Intel card after I get crazy TX values on the qlcnic card, the TX values on this card still go up at a very quick rate. This only stops when I reset the card (reload the module). Also, there is no difference if I compile the driver in or use it as a module. There are no strange messages in /var/log/messages or dmesg.

Here the output with the 3.13.x driver in 3.14.12 when the system boots:

[ 18.229195] QLogic 1/10 GbE Converged/Intelligent Ethernet Driver v5.3.52
[ 18.229415] qlcnic :1a:00.0: 2048KB memory map
[ 18.854134] qlcnic :1a:00.0: Default minidump capture mask 0x1f
[ 19.602491] qlcnic :1a:00.0: FW dump enabled
[ 19.631257] qlcnic :1a:00.0: Supports FW dump capability
[ 19.667072] qlcnic :1a:00.0: Driver v5.3.52, firmware v4.14.26
[ 19.704279] qlcnic :1a:00.0: Set 4 Tx rings
[ 19.733001] qlcnic :1a:00.0: Set 4 SDS rings
[ 19.898808] qlcnic: 2c:27:d7:50:04:48: NC523SFP 10Gb 2-port Server Adapter Board Chip rev 0x54
[ 19.949325] qlcnic :1a:00.0: irq 129 for MSI/MSI-X
[ 19.949329] qlcnic :1a:00.0: irq 130 for MSI/MSI-X
[ 19.949333] qlcnic :1a:00.0: irq 131 for MSI/MSI-X
[ 19.949336] qlcnic :1a:00.0: irq 132 for MSI/MSI-X
[ 19.949340] qlcnic :1a:00.0: irq 133 for MSI/MSI-X
[ 19.949343] qlcnic :1a:00.0: irq 134 for MSI/MSI-X
[ 19.949347] qlcnic :1a:00.0: irq 135 for MSI/MSI-X
[ 19.949350] qlcnic :1a:00.0: irq 136 for MSI/MSI-X
[ 19.949369] qlcnic :1a:00.0: using msi-x interrupts
[ 19.982782] qlcnic :1a:00.0: Set 4 Tx queues
[ 20.055099] qlcnic :1a:00.0: eth2: XGbE port initialized
[ 20.090408] qlcnic :1a:00.1: 2048KB memory map
[ 20.179836] qlcnic :1a:00.1: Default minidump capture mask 0x1f
[ 20.217848] qlcnic :1a:00.1: FW dump enabled
[ 20.246979] qlcnic :1a:00.1: Supports FW dump capability
[ 20.282318] qlcnic :1a:00.1: Driver v5.3.52, firmware v4.14.26
[ 20.320238] qlcnic :1a:00.1: Set 4 Tx rings
[ 20.350038] qlcnic :1a:00.1: Set 4 SDS rings
[ 20.429714] qlcnic :1a:00.1: irq 137 for MSI/MSI-X
[ 20.429718] qlcnic :1a:00.1: irq 138 for MSI/MSI-X
[ 20.429722] qlcnic :1a:00.1: irq 139 for MSI/MSI-X
[ 20.429726] qlcnic :1a:00.1: irq 140 for MSI/MSI-X
[ 20.429729] qlcnic :1a:00.1: irq 141 for MSI/MSI-X
[ 20.429732] qlcnic :1a:00.1: irq 142 for MSI/MSI-X
[ 20.429736] qlcnic :1a:00.1: irq 143 for MSI/MSI-X
[ 20.429739] qlcnic :1a:00.1: irq 144 for MSI/MSI-X
[ 20.429757] qlcnic :1a:00.1: using msi-x interrupts
[ 20.458895] qlcnic :1a:00.1: Set 4 Tx queues
[ 20.486907] qlcnic :1a:00.1: eth3: XGbE port initialized

My kernel config can be downloaded here:

ftp://ftp.dwd.de/pub/afd/test/.config

Please, just ask if I need to provide more details and please CC me, since I am not on the list.

Thanks,
Holger
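PS: if it helps, the runaway counters can be sampled directly from sysfs to see how fast they grow; a sketch (eth2 is the qlcnic port from the log above):

    while sleep 1; do cat /sys/class/net/eth2/statistics/tx_bytes; done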
Kernel panic with 3.10.33 and possible hpwdt watchdog
Hello,

I use a plain kernel.org kernel 3.10.33 and when I do a HP ILO (proprietary embedded server management technology) reset of my ProLiant 380p server, the system hangs. Unfortunately I cannot do a serial trace, so I copied by hand everything I could read from the console:

[] ? vga_set_palette+0xd1/0x130
[] ? panic+0x18c/0x1c7
[] ? panic+0xf4/0x1c7
[] ? hpwdt_pretimeout+0xc5/0xd0 [hpwdt]
[] ? nmi_handle+0x59/0x80
[] ? default_do_nmi+0x12f/0x2a0
[] ? do_nmi+0x88/0xd0
[] ? end_repeat_nmi+0x1e/0x2e
[] ? intel_idle+0xb6/0x120
[] ? intel_idle+0xb6/0x120
[] ? intel_idle+0xb6/0x120
<>
[] ? cpuidle_enter_state+0x3d/0xd0
[] ? cpuidle_idle_call+0xba/0x140
[] ? __tick_nohz_idle_enter+0x8d/0x120
[] ? arch_cpu_idle+0x9/0x30
[] ? cpu_idle_loop+0x92/0x160
[] ? cpu_startup_entry+0x6b/0x70
[] ? start_kernel+0x3e2/0x3ed
[] ? repair_env_string+0x5e/0x5e
[] ? x86_64_start_kernel+0x12a/0x130
---[ end trace 2a7f5aee76758ec0 ]---
dmar: DRHD: handling fault status reg 2
dmar: DMAR:[DMA Read] Request device [01:00.2] fault addr e9000
DMAR:[fault reason 06] PTE Read access is not set

If I remove the hpwdt driver and then reset the HP ILO system, the system also hangs, but continuously, at an interval of approx. 2 seconds, writes the following to the console:

NMI: IOCK error (debug interrupt?) for reason 61 on CPU 0.
NMI: IOCK error (debug interrupt?) for reason 61 on CPU 0.
NMI: IOCK error (debug interrupt?) for reason 61 on CPU 0.
NMI: IOCK error (debug interrupt?) for reason 61 on CPU 0.
NMI: IOCK error (debug interrupt?) for reason 71 on CPU 0.
NMI: IOCK error (debug interrupt?) for reason 71 on CPU 0.
NMI: IOCK error (debug interrupt?) for reason 61 on CPU 0.
NMI: IOCK error (debug interrupt?) for reason 61 on CPU 0.
NMI: IOCK error (debug interrupt?) for reason 71 on CPU 0.
NMI: IOCK error (debug interrupt?) for reason 61 on CPU 0.
NMI: IOCK error (debug interrupt?) for reason 61 on CPU 0.
NMI: IOCK error (debug interrupt?) for reason 71 on CPU 0.
NMI: IOCK error (debug interrupt?) for reason 71 on CPU 0.
NMI: IOCK error (debug interrupt?) for reason 71 on CPU 0.
NMI: IOCK error (debug interrupt?) for reason 61 on CPU 0.
NMI: IOCK error (debug interrupt?) for reason 71 on CPU 0.
NMI: IOCK error (debug interrupt?) for reason 61 on CPU 0.
NMI: IOCK error (debug interrupt?) for reason 61 on CPU 0.

Also, setting nmi_watchdog=0 does not change anything. This does not happen when I take the default kernel of the distribution (Scientific Linux 6.5), 2.6.32-431.5.1.el6.x86_64.

The bad thing is that when the hpwdt driver is loaded, the watchdog does not reset the system, i.e. it hangs forever. And I cannot use the Intel TCO WatchDog Timer Driver since it is disabled in the BIOS.

Please, can someone give me a hint where the error could be and what I can do so I can continue to use the kernel.org kernel.

Many thanks in advance,
Holger

PS: Please CC me since I am not subscribed
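PS2: for now my workaround is simply to keep the driver out of the way; a sketch, assuming hpwdt is built as a module (the modprobe.d path may vary per distribution):

    rmmod hpwdt                                              # at runtime
    echo 'blacklist hpwdt' > /etc/modprobe.d/blacklist-hpwdt.conf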
Re: Need help in bug in isolate_migratepages_range
On Mon, 3 Feb 2014, David Rientjes wrote:

On Mon, 3 Feb 2014, Vlastimil Babka wrote:

It seems to come from balloon_page_movable() and its test page_count(page) == 1. Hmm, I think it might be because compound_head() == NULL here.

Holger, this looks like a race condition when allocating a compound page, did you only see it once or is it actually reproducible?

No, this only happened once. It is not reproducible; the system was running for four days without problems. And before this kernel, five years without any problems.

Thanks,
Holger
Re: Need help in bug in isolate_migratepages_range
On Mon, 3 Feb 2014, Michal Hocko wrote: On Mon 03-02-14 14:29:22, Holger Kiehl wrote: I have attached it. Please, tell me if you do not get the attachment. I hoped it would help me to get a closer compiled code to yours but I am probably using too different gcc. I have an old gcc, it is 4.4.1-2. Anyway I've tried to check whether I can hook on something and it seems that this is a race with thp merge/split or something like that. [...] Jan 31 13:07:43 asterix kernel: BUG: unable to handle kernel NULL pointer dereference at 001c Jan 31 13:07:43 asterix kernel: IP: [] isolate_migratepages_range+0x32d/0x653 Jan 31 13:07:43 asterix kernel: PGD 7d3074067 PUD 7d3073067 PMD 0 Jan 31 13:07:43 asterix kernel: Oops: [#1] SMP Jan 31 13:07:43 asterix kernel: Modules linked in: drbd lru_cache coretemp ipmi_devintf bonding nf_conntrack_ftp binfmt_misc usbhid i2c_i801 sg ehci_pci i2c_core ehci_hcd uhci_hcd i5000_edac i5k_amb ipmi_si ipmi_msghandler usbcore usb_common [last unloaded: microcode] Jan 31 13:07:43 asterix kernel: CPU: 5 PID: 14164 Comm: java Not tainted 3.12.9 #1 Jan 31 13:07:43 asterix kernel: Hardware name: FUJITSU SIEMENS PRIMERGY RX300 S4 /D2519, BIOS 4.06 Rev. 1.04.2519 07/30/2008 Jan 31 13:07:43 asterix kernel: task: 8807d30b08c0 ti: 8807d30b2000 task.ti: 8807d30b2000 Jan 31 13:07:43 asterix kernel: RIP: 0010:[] [] isolate_migratepages_range+0x32d/0x653 Jan 31 13:07:43 asterix kernel: RSP: :8807d30b3928 EFLAGS: 00010286 Jan 31 13:07:43 asterix kernel: RAX: RBX: 0020ec09 RCX: 0002 Jan 31 13:07:43 asterix kernel: RDX: 2c008000 RSI: 0004 RDI: 006c Jan 31 13:07:43 asterix kernel: RBP: 8807d30b39f8 R08: 88083fbde390 R09: 0001 Jan 31 13:07:43 asterix kernel: R10: R11: ea000733a000 R12: 8807d30b3a58 Jan 31 13:07:43 asterix kernel: R13: ea000733a1f8 R14: R15: 88083ffe1d80 Jan 31 13:07:43 asterix kernel: FS: 7f9d9e72f910() GS:88083fd4() knlGS: Jan 31 13:07:43 asterix kernel: CS: 0010 DS: ES: CR0: 8005003b Jan 31 13:07:43 asterix kernel: CR2: 001c CR3: 0007d307 CR4: 000407e0 Jan 31 13:07:43 asterix kernel: Stack: Jan 31 13:07:43 asterix kernel: 0009 88083ffe16c0 ea2e6af0 8807d30b3998 Jan 31 13:07:43 asterix kernel: 8807d30b2010 00ff8807d30b08c0 8807d30b08c0 0020f000 Jan 31 13:07:43 asterix kernel: 083b 000a 8807d30b3a68 Jan 31 13:07:43 asterix kernel: Call Trace: Jan 31 13:07:43 asterix kernel: [] ? lru_add_drain_cpu+0x25/0x97 Jan 31 13:07:43 asterix kernel: [] compact_zone+0x2b5/0x319 Jan 31 13:07:43 asterix kernel: [] ? put_super+0x20/0x2c Jan 31 13:07:43 asterix kernel: [] compact_zone_order+0xad/0xc4 Jan 31 13:07:43 asterix kernel: [] try_to_compact_pages+0x91/0xe8 Jan 31 13:07:43 asterix kernel: [] ? page_alloc_cpu_notify+0x3e/0x3e Jan 31 13:07:43 asterix kernel: [] __alloc_pages_direct_compact+0xae/0x195 Jan 31 13:07:43 asterix kernel: [] __alloc_pages_nodemask+0x772/0x7b5 Jan 31 13:07:43 asterix kernel: [] alloc_pages_vma+0xd6/0x101 Jan 31 13:07:43 asterix kernel: [] do_huge_pmd_anonymous_page+0x199/0x2ee Jan 31 13:07:43 asterix kernel: [] handle_mm_fault+0x1b7/0xceb Jan 31 13:07:43 asterix kernel: [] ? __dequeue_entity+0x2e/0x33 Jan 31 13:07:43 asterix kernel: [] __do_page_fault+0x3bd/0x3e4 Jan 31 13:07:43 asterix kernel: [] ? mprotect_fixup+0x1c9/0x1fb Jan 31 13:07:43 asterix kernel: [] ? vm_mmap_pgoff+0x6d/0x8f Jan 31 13:07:43 asterix kernel: [] ? 
SyS_futex+0x103/0x13d Jan 31 13:07:43 asterix kernel: [] do_page_fault+0x9/0xb Jan 31 13:07:43 asterix kernel: [] page_fault+0x22/0x30 Jan 31 13:07:43 asterix kernel: Code: 00 41 f7 45 00 ff ff ff 01 0f 85 43 02 00 00 41 8b 45 18 85 c0 0f 89 37 02 00 00 49 8b 55 00 4c 89 e8 66 85 d2 79 04 49 8b 45 30 <8b> 40 1c 83 f8 01 0f 85 1b 02 00 00 49 8b 55 08 30 c0 48 85 d2 Jan 31 13:07:43 asterix kernel: RIP [] isolate_migratepages_range+0x32d/0x653 Jan 31 13:07:43 asterix kernel: RSP Jan 31 13:07:43 asterix kernel: CR2: 001c Jan 31 13:07:43 asterix kernel: ---[ end trace fba75c5b0b9175ea ]--- This seems to match: 17027: 49 8b 17mov(%r15),%rdx # page->flags 1702a: 4c 89 f8mov%r15,%rax 1702d: 80 e6 80and$0x80,%dh # PageTail test 17030: 74 04 je 17036 17032: 49 8b 47 30 mov0x30(%r15),%rax # page = page->first_page 17036: 8b 40 1cmov0x1c(%rax),%eax <<< page->_count 17039: ff c8 dec%eax Which seems to be inlined comp
Need help in bug in isolate_migratepages_range
Hello,

today one of our systems got a kernel bug message. It kept on running, but more and more processes began to be stuck in D state (e.g. a simple w command would never return) and I eventually had to reboot. Here the full message:

Jan 31 13:07:43 asterix kernel: BUG: unable to handle kernel NULL pointer dereference at 001c
Jan 31 13:07:43 asterix kernel: IP: [] isolate_migratepages_range+0x32d/0x653
Jan 31 13:07:43 asterix kernel: PGD 7d3074067 PUD 7d3073067 PMD 0
Jan 31 13:07:43 asterix kernel: Oops: [#1] SMP
Jan 31 13:07:43 asterix kernel: Modules linked in: drbd lru_cache coretemp ipmi_devintf bonding nf_conntrack_ftp binfmt_misc usbhid i2c_i801 sg ehci_pci i2c_core ehci_hcd uhci_hcd i5000_edac i5k_amb ipmi_si ipmi_msghandler usbcore usb_common [last unloaded: microcode]
Jan 31 13:07:43 asterix kernel: CPU: 5 PID: 14164 Comm: java Not tainted 3.12.9 #1
Jan 31 13:07:43 asterix kernel: Hardware name: FUJITSU SIEMENS PRIMERGY RX300 S4 /D2519, BIOS 4.06 Rev. 1.04.2519 07/30/2008
Jan 31 13:07:43 asterix kernel: task: 8807d30b08c0 ti: 8807d30b2000 task.ti: 8807d30b2000
Jan 31 13:07:43 asterix kernel: RIP: 0010:[] [] isolate_migratepages_range+0x32d/0x653
Jan 31 13:07:43 asterix kernel: RSP: :8807d30b3928 EFLAGS: 00010286
Jan 31 13:07:43 asterix kernel: RAX: RBX: 0020ec09 RCX: 0002
Jan 31 13:07:43 asterix kernel: RDX: 2c008000 RSI: 0004 RDI: 006c
Jan 31 13:07:43 asterix kernel: RBP: 8807d30b39f8 R08: 88083fbde390 R09: 0001
Jan 31 13:07:43 asterix kernel: R10: R11: ea000733a000 R12: 8807d30b3a58
Jan 31 13:07:43 asterix kernel: R13: ea000733a1f8 R14: R15: 88083ffe1d80
Jan 31 13:07:43 asterix kernel: FS: 7f9d9e72f910() GS:88083fd4() knlGS:
Jan 31 13:07:43 asterix kernel: CS: 0010 DS: ES: CR0: 8005003b
Jan 31 13:07:43 asterix kernel: CR2: 001c CR3: 0007d307 CR4: 000407e0
Jan 31 13:07:43 asterix kernel: Stack:
Jan 31 13:07:43 asterix kernel: 0009 88083ffe16c0 ea2e6af0 8807d30b3998
Jan 31 13:07:43 asterix kernel: 8807d30b2010 00ff8807d30b08c0 8807d30b08c0 0020f000
Jan 31 13:07:43 asterix kernel: 083b 000a 8807d30b3a68
Jan 31 13:07:43 asterix kernel: Call Trace:
Jan 31 13:07:43 asterix kernel: [] ? lru_add_drain_cpu+0x25/0x97
Jan 31 13:07:43 asterix kernel: [] compact_zone+0x2b5/0x319
Jan 31 13:07:43 asterix kernel: [] ? put_super+0x20/0x2c
Jan 31 13:07:43 asterix kernel: [] compact_zone_order+0xad/0xc4
Jan 31 13:07:43 asterix kernel: [] try_to_compact_pages+0x91/0xe8
Jan 31 13:07:43 asterix kernel: [] ? page_alloc_cpu_notify+0x3e/0x3e
Jan 31 13:07:43 asterix kernel: [] __alloc_pages_direct_compact+0xae/0x195
Jan 31 13:07:43 asterix kernel: [] __alloc_pages_nodemask+0x772/0x7b5
Jan 31 13:07:43 asterix kernel: [] alloc_pages_vma+0xd6/0x101
Jan 31 13:07:43 asterix kernel: [] do_huge_pmd_anonymous_page+0x199/0x2ee
Jan 31 13:07:43 asterix kernel: [] handle_mm_fault+0x1b7/0xceb
Jan 31 13:07:43 asterix kernel: [] ? __dequeue_entity+0x2e/0x33
Jan 31 13:07:43 asterix kernel: [] __do_page_fault+0x3bd/0x3e4
Jan 31 13:07:43 asterix kernel: [] ? mprotect_fixup+0x1c9/0x1fb
Jan 31 13:07:43 asterix kernel: [] ? vm_mmap_pgoff+0x6d/0x8f
Jan 31 13:07:43 asterix kernel: [] ? SyS_futex+0x103/0x13d
Jan 31 13:07:43 asterix kernel: [] do_page_fault+0x9/0xb
Jan 31 13:07:43 asterix kernel: [] page_fault+0x22/0x30
Jan 31 13:07:43 asterix kernel: Code: 00 41 f7 45 00 ff ff ff 01 0f 85 43 02 00 00 41 8b 45 18 85 c0 0f 89 37 02 00 00 49 8b 55 00 4c 89 e8 66 85 d2 79 04 49 8b 45 30 <8b> 40 1c 83 f8 01 0f 85 1b 02 00 00 49 8b 55 08 30 c0 48 85 d2
Jan 31 13:07:43 asterix kernel: RIP [] isolate_migratepages_range+0x32d/0x653
Jan 31 13:07:43 asterix kernel: RSP
Jan 31 13:07:43 asterix kernel: CR2: 001c
Jan 31 13:07:43 asterix kernel: ---[ end trace fba75c5b0b9175ea ]---

Kernel is a plain kernel.org kernel 3.12.9 and it uses drbd to replicate data to another host. Any idea what the cause of this bug is? Could it be hardware? The system has been running now for five years without any problems.

Please CC me since I am not on the list. Many thanks in advance.

Regards,
Holger
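PS: since the trace goes through do_huge_pmd_anonymous_page, one thing I could try in the meantime (just a guess on my side, not a confirmed fix) is to keep transparent hugepages from triggering compaction at all:

    echo never > /sys/kernel/mm/transparent_hugepage/enabled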
RE: Problems with ixgbe driver
Hello,

first, thank you for the quick help!

On Fri, 14 Jun 2013, Tantilov, Emil S wrote:

-Original Message-
From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On Behalf Of Holger Kiehl
Sent: Friday, June 14, 2013 4:50 AM
To: e1000-de...@lists.sf.net
Cc: linux-kernel; net...@vger.kernel.org
Subject: Problems with ixgbe driver

Hello,

I have dual port 10Gb Intel network card on a 2 socket (Xeon X5690) with a total of 12 cores. Hyperthreading is enabled so there are 24 cores. The problem I have is that when other systems send large amount of data the network with the intel ixgbe driver gets very slow. Ping times go up from 0.2ms to appr. 60ms. Some FTP connections stall for more then 2 minutes. What is strange is that heatbeat is configured on the system with a serial connection to another node and kernel always reports

If the network slows down so much there should be some indication in dmesg. Like Tx hangs perhaps. Can you provide the output of dmesg and ethtool -S from the offending interface after the issue occurs?

No, there is absolutely no indication in dmesg or /var/log/messages. But here is the ethtool output when the ping times go up:

root@helena:~# ethtool -S eth6
NIC statistics:
 rx_packets: 4410779
 tx_packets: 8902514
 rx_bytes: 2014041824
 tx_bytes: 13199913202
 rx_errors: 0
 tx_errors: 0
 rx_dropped: 0
 tx_dropped: 0
 multicast: 4245
 collisions: 0
 rx_over_errors: 0
 rx_crc_errors: 0
 rx_frame_errors: 0
 rx_fifo_errors: 0
 rx_missed_errors: 28143
 tx_aborted_errors: 0
 tx_carrier_errors: 0
 tx_fifo_errors: 0
 tx_heartbeat_errors: 0
 rx_pkts_nic: 2401276937
 tx_pkts_nic: 3868619482
 rx_bytes_nic: 868282794731
 tx_bytes_nic: 5743382228649
 lsc_int: 4
 tx_busy: 0
 non_eop_descs: 743957
 broadcast: 1745556
 rx_no_buffer_count: 0
 tx_timeout_count: 0
 tx_restart_queue: 425
 rx_long_length_errors: 0
 rx_short_length_errors: 0
 tx_flow_control_xon: 171
 rx_flow_control_xon: 0
 tx_flow_control_xoff: 277
 rx_flow_control_xoff: 0
 rx_csum_offload_errors: 0
 alloc_rx_page_failed: 0
 alloc_rx_buff_failed: 0
 lro_aggregated: 0
 lro_flushed: 0
 rx_no_dma_resources: 0
 hw_rsc_aggregated: 1153374
 hw_rsc_flushed: 129169
 fdir_match: 2424508153
 fdir_miss: 1706029
 fdir_overflow: 33
 os2bmc_rx_by_bmc: 0
 os2bmc_tx_by_bmc: 0
 os2bmc_tx_by_host: 0
 os2bmc_rx_by_host: 0
 tx_queue_0_packets: 470182
 tx_queue_0_bytes: 690123121
 tx_queue_1_packets: 797784
 tx_queue_1_bytes: 1203968369
 tx_queue_2_packets: 648692
 tx_queue_2_bytes: 950171718
 tx_queue_3_packets: 647434
 tx_queue_3_bytes: 948647518
 tx_queue_4_packets: 263216
 tx_queue_4_bytes: 394806409
 tx_queue_5_packets: 426786
 tx_queue_5_bytes: 629387628
 tx_queue_6_packets: 253708
 tx_queue_6_bytes: 371774276
 tx_queue_7_packets: 544634
 tx_queue_7_bytes: 812223169
 tx_queue_8_packets: 279056
 tx_queue_8_bytes: 407792510
 tx_queue_9_packets: 735792
 tx_queue_9_bytes: 1092693961
 tx_queue_10_packets: 393576
 tx_queue_10_bytes: 583283986
 tx_queue_11_packets: 712565
 tx_queue_11_bytes: 1037740789
 tx_queue_12_packets: 264445
 tx_queue_12_bytes: 386010613
 tx_queue_13_packets: 246828
 tx_queue_13_bytes: 370387352
 tx_queue_14_packets: 191789
 tx_queue_14_bytes: 281160607
 tx_queue_15_packets: 384581
 tx_queue_15_bytes: 579890782
 tx_queue_16_packets: 175119
 tx_queue_16_bytes: 261312970
 tx_queue_17_packets: 151219
 tx_queue_17_bytes: 220259675
 tx_queue_18_packets: 467746
 tx_queue_18_bytes: 707472612
 tx_queue_19_packets: 30642
 tx_queue_19_bytes: 44896997
 tx_queue_20_packets: 157957
 tx_queue_20_bytes: 238772784
 tx_queue_21_packets: 287819
 tx_queue_21_bytes: 434965075
 tx_queue_22_packets: 269298
 tx_queue_22_bytes: 407637986
 tx_queue_23_packets: 102344
 tx_queue_23_bytes: 145542751
 rx_queue_0_packets: 219438
 rx_queue_0_bytes: 273936020
 rx_queue_1_packets: 398269
 rx_queue_1_bytes: 52080243
 rx_queue_2_packets: 285870
 rx_queue_2_bytes: 102299543
 rx_queue_3_packets: 347238
 rx_queue_3_bytes: 145830086
 rx_queue_4_packets: 118448
 rx_queue_4_bytes: 17515218
 rx_queue_5_packets: 228029
 rx_queue_5_bytes: 114142681
 rx_queue_6_packets: 94285
 rx_queue_6_bytes: 107618165
 rx_queue_7_packets: 289615
 rx_queue_7_bytes: 168428647
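To see whether the slowdown correlates with flow control or missed frames, the suspicious counters can be sampled repeatedly; a sketch:

    watch -n 2 "ethtool -S eth6 | grep -E 'missed|xoff|xon|restart_queue'"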
Problems with ixgbe driver
Hello,

I have a dual-port 10Gb Intel network card in a 2-socket system (Xeon X5690) with a total of 12 cores. Hyperthreading is enabled so there are 24 cores. The problem I have is that when other systems send large amounts of data, the network with the intel ixgbe driver gets very slow. Ping times go up from 0.2ms to approx. 60ms. Some FTP connections stall for more than 2 minutes. What is strange is that heartbeat is configured on the system with a serial connection to another node and the kernel always reports

ttyS0: 4 input overrun(s)

when a lot of data is sent and the ping time goes up. On the network there are three VLANs configured. The network is bonded (active-backup) together with another HP NC523SFP 10Gb 2-port Server Adapter. When I switch the network to this card the problem goes away. Also the ttyS0 input overruns disappear. Note also both network cards are connected to the same switch.

The system uses Scientific Linux 6.4 with a kernel.org kernel. I noticed this behavior with kernel 3.9.5 and 3.9.6-rc1. Before, I did not notice it because traffic always went over the HP NC523SFP qlcnic card.

In search of a solution to the problem I found a newer ixgbe driver 3.15.1 (3.9.6-rc1 has 3.11.33-k) and tried that. But it has the same problem. However, when I load the module as follows:

modprobe ixgbe RSS=8,8

the problem goes away. The kernel.org ixgbe driver does not offer this option. Why? It seems that both drivers have problems on systems with 24 cpu's. But I cannot believe that I am the only one who noticed this, since ixgbe is widely used. It would really be nice if one could set the RSS=8,8 option for the kernel.org ixgbe driver too. Or if someone could tell me where I can force the driver's Receive Side Scaling to 8, even if it means editing the source code.

Below I have added some additional information. Please CC me since I am not subscribed to any of these lists. And please do not hesitate to ask if more information is needed. Many thanks in advance.

Regards,
Holger

Loading ixgbe module 3.15.1 without any options:

2013-06-14T10:01:15.001506+00:00 helena kernel: [74474.075411] Intel(R) 10 Gigabit PCI Express Network Driver - version 3.15.1
2013-06-14T10:01:15.033866+00:00 helena kernel: [74474.116422] Copyright (c) 1999-2013 Intel Corporation.
2013-06-14T10:01:15.204956+00:00 helena kernel: [74474.319440] ixgbe :10:00.0: (PCI Express:5.0GT/s:Width x4) 90:e2:ba:2b:40:80
2013-06-14T10:01:15.317447+00:00 helena kernel: [74474.362568] ixgbe :10:00.0 eth6: MAC: 2, PHY: 15, SFP+: 5, PBA No: E68785-006
2013-06-14T10:01:15.317465+00:00 helena kernel: [74474.394068] bonding: bond0: Adding slave eth6.
2013-06-14T10:01:15.317468+00:00 helena kernel: [74474.431805] ixgbe :10:00.0 eth6: Enabled Features: RxQ: 24 TxQ: 24 FdirHash RSC
2013-06-14T10:01:15.519117+00:00 helena kernel: [74474.599206] 8021q: adding VLAN 0 to HW filter on device eth6
2013-06-14T10:01:15.592853+00:00 helena kernel: [74474.633370] bonding: bond0: enslaving eth6 as a backup interface with a down link.
2013-06-14T10:01:15.592864+00:00 helena kernel: [74474.666823] ixgbe :10:00.0 eth6: detected SFP+: 5
2013-06-14T10:01:15.634509+00:00 helena kernel: [74474.707900] ixgbe :10:00.0 eth6: Intel(R) 10 Gigabit Network Connection
2013-06-14T10:01:15.888030+00:00 helena kernel: [74474.917771] ixgbe :10:00.1: (PCI Express:5.0GT/s:Width x4) 90:e2:ba:2b:40:81
2013-06-14T10:01:15.888032+00:00 helena kernel: [74474.918516] ixgbe :10:00.0 eth6: NIC Link is Up 10 Gbps, Flow Control: RX/TX
2013-06-14T10:01:15.981283+00:00 helena kernel: [74475.001538] ixgbe :10:00.1 eth7: MAC: 2, PHY: 15, SFP+: 6, PBA No: E68785-006
2013-06-14T10:01:15.981293+00:00 helena kernel: [74475.006351] bonding: bond0: link status definitely up for interface eth6, 1 Mbps full duplex.
2013-06-14T10:01:16.025063+00:00 helena kernel: [74475.094633] ixgbe :10:00.1 eth7: Enabled Features: RxQ: 24 TxQ: 24 FdirHash RSC
2013-06-14T10:01:16.067357+00:00 helena kernel: [74475.138402] ixgbe :10:00.1 eth7: Intel(R) 10 Gigabit Network Connection

Loading ixgbe module 3.15.1 with RSS=8,8:

2013-06-14T10:04:24.790464+00:00 helena kernel: [74663.558702] Intel(R) 10 Gigabit PCI Express Network Driver - version 3.15.1
2013-06-14T10:04:24.790484+00:00 helena kernel: [74663.601435] Copyright (c) 1999-2013 Intel Corporation.
2013-06-14T10:04:24.853174+00:00 helena kernel: [74663.630652] ixgbe: Receive-Side Scaling (RSS) set to 8
2013-06-14T10:04:25.043310+00:00 helena kernel: [74663.813984] ixgbe :10:00.0: (PCI Express:5.0GT/s:Width x4) 90:e2:ba:2b:40:80
2013-06-14T10:04:25.113547+00:00 helena kernel: [74663.853937] ixgbe :10:00.0 eth6: MAC: 2, PHY: 15, SFP+: 5, PBA No: E68785-006
2013-06-14T10:04:25.113561+00:00 helena kernel: [74663.882910] bonding: bond0: Adding slave eth6.
2013-06-14T10:04:25.159260+00:00 helena kernel: [74663.924060] ixgbe :10:
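If the in-tree driver exposes ethtool channel control (I have not verified that the ixgbe in 3.9 does), the same effect as RSS=8,8 might be achievable without module options; a sketch:

    ethtool -l eth6              # show current queue/channel counts
    ethtool -L eth6 combined 8   # limit to 8 queues, if supported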
Re: Enabling hardlink restrictions to the Linux VFS in 3.6 by default
Hello Kees,

first, many thanks for trying to help!

On Thu, 25 Oct 2012, Kees Cook wrote:

Hi Holger,

On Thu, Oct 25, 2012 at 12:13:40PM +, Holger Kiehl wrote:

as of linux 3.6 hardlink restrictions to the Linux VFS have been enabled by default. This breaks the application AFD [1] of which I am the author.

Sorry this created a problem for you!

Internally it uses hardlinks to distribute files. The reason for hardlinks is that AFD can distribute one file to many destinations and for each distributing process it creates a directory with hardlinks to the original file. That way AFD itself never needs to copy the content of a file. Another nice feature about hardlinks was that there is no need for any logic in the code for AFD to know where the original file was; each distributing process could delete its hardlink and the last one would delete the real file. This way AFD could distribute files at rates of more than 2 files per second (in benchmarks). This has worked from the first linux kernel up to 3.5.7 and with solaris, hpux, aix, ftx, irix. As of 3.6 this does not work for files where AFD does not have write permissions. It was always sufficient to just have read permission on a file it wants to distribute.

Just to clarify, not even read access was needed for hardlinks:

$ whoami
kees
$ ls -l /etc/shadow
-r--r- 1 root shadow 3112 Oct 22 17:02 /etc/shadow
$ ln /etc/shadow /tmp/ohai
$ ls -l /tmp/ohai
-r--r- 2 root shadow 3112 Oct 22 17:02 ohai

Correct, but when AFD wants to distribute the file via for example FTP it must have read access on the file, because it needs to read the file when it wants to send it on a socket.

You mention "the last one would delete the real file". That would have required AFD to have write permission to the directory where the original file existed? Maybe there is something in your architecture that could take advantage of that? Directory group-write set-gid? I haven't taken a look at AFD's code.

Right, it must have write permission on the directory that is monitored by AFD. When it detects a file it moves (rename()) it to an internal directory where AFD works. So this step still works. But from there it creates hardlinks for each distributing job. But this no longer works if AFD does not have write access on the file itself. So even if set-gid is set, this would still not work if the file does not have write permission for the group.

The fix for the "at" daemon [2] mentioned in the commitdiff [3] cannot be used for AFD since it is not run with root privileges. Is there any other way I can "fix" my application? I currently can see no other way than doing it via:

echo 0 > /proc/sys/fs/protected_hardlinks

You said you have read access to these files, so perhaps you can make a copy when you have read but not write, and then all the subsequent duplication would be able to hardlink?

This is exactly what AFD tries to avoid. AFD is used on systems where it distributes Terabytes of data daily and if it would need to copy the file first, imagine the strain it imposes on those servers.

If you wanted to turn off the sysctl, you could have AFD ship files in /etc/sysctl.d/ (or your distro equivalent) to turn it off.

Yes, that could be done. However, I do not want, as the maintainer of one software package, to disable or enable anything in the kernel by default. I do not think the system administrators would like this.

I'm sure there are plenty of options available.

Sorry, I cannot see them. But please, if you or others have more ideas, I am certainly open to change AFD if it can be done efficiently.

Why is such a fundamental change to the linux kernel activated by default?

Based on about two years of testing in Ubuntu, the number of problems was vanishingly small, so the security benefit is seen to outweigh the downside.

Ubuntu is known to be very user friendly and mostly used by users on their laptops/pc's and is not so common in server environments such as Redhat, SLES, etc. So I question the statement "vanishingly small" when you enable it in those environments by default. And I think there is a real benefit in that one can do hardlinks on a file that one does not own, which I think was not seen by those that disable this feature now by default.

Would it not be better if it is the other way around, that the system administrator or distributions enable this?

Virtually all distributions would have turned this on by default, so it seemed better to many people to just make it the default in the kernel. Only unusual corner-cases would need it disabled.

So you too would say not all distributions would enable it by default. Would it then not be better for them to first try this and see if the number of problems is really "vanishingly small". And then if all distributions enable this by default one can do it in the kernel by default as well. Has it not always
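Coming back to the /etc/sysctl.d/ suggestion, the drop-in itself would be tiny; a sketch (the file name is chosen arbitrarily):

    # /etc/sysctl.d/50-afd.conf
    fs.protected_hardlinks = 0

loaded at boot, or applied immediately with sysctl -p /etc/sysctl.d/50-afd.conf.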
Enabling hardlink restrictions to the Linux VFS in 3.6 by default
Hello,

as of linux 3.6 hardlink restrictions to the Linux VFS have been enabled by default. This breaks the application AFD [1] of which I am the author. Internally it uses hardlinks to distribute files. The reason for hardlinks is that AFD can distribute one file to many destinations and for each distributing process it creates a directory with hardlinks to the original file. That way AFD itself never needs to copy the content of a file. Another nice feature about hardlinks was that there is no need for any logic in the code for AFD to know where the original file was; each distributing process could delete its hardlink and the last one would delete the real file. This way AFD could distribute files at rates of more than 2 files per second (in benchmarks). This has worked from the first linux kernel up to 3.5.7 and with solaris, hpux, aix, ftx, irix.

As of 3.6 this does not work for files where AFD does not have write permissions. It was always sufficient to just have read permission on a file it wants to distribute. The fix for the "at" daemon [2] mentioned in the commitdiff [3] cannot be used for AFD since it is not run with root privileges. Is there any other way I can "fix" my application? I currently can see no other way than doing it via:

echo 0 > /proc/sys/fs/protected_hardlinks

Why is such a fundamental change to the linux kernel activated by default? Would it not be better if it is the other way around, that the system administrator or distributions enable this?

Regards,
Holger

PS: Please CC me as I am not on the list.

[1] http://www.dwd.de/AFD
[2] http://anonscm.debian.org/gitweb/?p=collab-maint/at.git;a=commitdiff;h=f4114656c3a6c6f6070e315ffdf940a49eda3279
[3] https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=800179c9b8a1e796e441674776d11cd4c05d61d7
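To illustrate what changed: with fs.protected_hardlinks=1, hardlinking a file the caller does not own requires read and write access, so AFD's read-only case now fails. A sketch of the failure (file and user names are made up):

    $ ls -l input/msg.txt
    -r--r--r-- 1 produser users 1234 Oct 25 12:00 input/msg.txt
    $ ln input/msg.txt work/job1/msg.txt    # as user 'afd', read access only
    ln: failed to create hard link 'work/job1/msg.txt' => 'input/msg.txt': Operation not permitted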
What happened to TRIM support for raid linear/0/1/10?
Hello,

I have been using the patches posted by Shaohua Li on 16th March 2012:

http://lkml.indiana.edu/hypermail/linux/kernel/1203.2/00048.html

for several months on a very busy file server (serving 9 million files with 5.3 TiB daily) without any problems. Is there any chance that these patches will go into the official kernel? Or what is the reason that these patches are not applied?

I have attached the patch set in one big patch for 3.5. Please do not use it since I am not sure if it is correct. Shaohua, could you please take a look if it is correct and maybe post a new one? Personally, I would think that TRIM support in MD would be a very good thing.

Regards,
Holger

diff -u --recursive --new-file linux-3.5.orig/drivers/md/linear.c linux-3.5/drivers/md/linear.c
--- linux-3.5.orig/drivers/md/linear.c 2012-07-21 20:58:29.0 +
+++ linux-3.5/drivers/md/linear.c 2012-07-27 06:53:39.507121434 +
@@ -138,6 +138,7 @@
 	struct linear_conf *conf;
 	struct md_rdev *rdev;
 	int i, cnt;
+	bool discard_supported = false;

 	conf = kzalloc (sizeof (*conf) + raid_disks*sizeof(struct dev_info), GFP_KERNEL);
@@ -171,6 +172,8 @@
 		conf->array_sectors += rdev->sectors;
 		cnt++;
+
+		if (blk_queue_discard(bdev_get_queue(rdev->bdev)))
+			discard_supported = true;
 	}
 	if (cnt != raid_disks) {
 		printk(KERN_ERR "md/linear:%s: not enough drives present. Aborting!\n",
@@ -178,6 +181,11 @@
 		goto out;
 	}

+	if (!discard_supported)
+		queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+	else
+		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+
 	/*
 	 * Here we calculate the device offsets.
 	 */
@@ -326,6 +334,14 @@
 	bio->bi_sector = bio->bi_sector - start_sector + tmp_dev->rdev->data_offset;
 	rcu_read_unlock();
+
+	if (unlikely((bio->bi_rw & REQ_DISCARD) &&
+		     !blk_queue_discard(bdev_get_queue(bio->bi_bdev)))) {
+		/* Just ignore it */
+		bio_endio(bio, 0);
+		return;
+	}
+
 	generic_make_request(bio);
 }
diff -u --recursive --new-file linux-3.5.orig/drivers/md/raid0.c linux-3.5/drivers/md/raid0.c
--- linux-3.5.orig/drivers/md/raid0.c 2012-07-21 20:58:29.0 +
+++ linux-3.5/drivers/md/raid0.c 2012-07-27 06:53:39.507121434 +
@@ -88,6 +88,7 @@
 	char b[BDEVNAME_SIZE];
 	char b2[BDEVNAME_SIZE];
 	struct r0conf *conf = kzalloc(sizeof(*conf), GFP_KERNEL);
+	bool discard_supported = false;

 	if (!conf)
 		return -ENOMEM;
@@ -195,6 +196,9 @@
 		if (!smallest || (rdev1->sectors < smallest->sectors))
 			smallest = rdev1;
 		cnt++;
+
+		if (blk_queue_discard(bdev_get_queue(rdev1->bdev)))
+			discard_supported = true;
 	}
 	if (cnt != mddev->raid_disks) {
 		printk(KERN_ERR "md/raid0:%s: too few disks (%d of %d) - "
@@ -272,6 +276,11 @@
 	blk_queue_io_opt(mddev->queue, (mddev->chunk_sectors << 9) * mddev->raid_disks);

+	if (!discard_supported)
+		queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+	else
+		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+
 	pr_debug("md/raid0:%s: done.\n", mdname(mddev));
 	*private_conf = conf;
@@ -422,6 +431,7 @@
 	if (md_check_no_bitmap(mddev))
 		return -EINVAL;
 	blk_queue_max_hw_sectors(mddev->queue, mddev->chunk_sectors);
+	blk_queue_max_discard_sectors(mddev->queue, mddev->chunk_sectors);

 	/* if private is not null, we are here after takeover */
 	if (mddev->private == NULL) {
@@ -509,7 +519,7 @@
 	sector_t sector = bio->bi_sector;
 	struct bio_pair *bp;
 	/* Sanity check -- queue functions should prevent this happening */
-	if (bio->bi_vcnt != 1 ||
+	if ((bio->bi_vcnt != 1 && bio->bi_vcnt != 0) ||
 	    bio->bi_idx != 0)
 		goto bad_map;
 	/* This is a one page bio that upper layers
@@ -535,6 +545,13 @@
 	bio->bi_sector = sector_offset + zone->dev_start + tmp_dev->data_offset;
+
+	if (unlikely((bio->bi_rw & REQ_DISCARD) &&
+		     !blk_queue_discard(bdev_get_queue(bio->bi_bdev)))) {
+		/* Just ignore it */
+		bio_endio(bio, 0);
+		return;
+	}
+
 	generic_make_request(bio);
 	return;
diff -u --recursive --new-file linux-3.5.orig/drivers/md/raid10.c linux-3.5/drivers/md/raid10.c
--- linux-3.5.orig/drivers/md/raid10.c 2012-07-21 20:58:29.0 +
+++ linux-3.5/drivers/md/raid10.c 2012-07-27
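To check whether a given md array actually advertises discard after applying something like this, the queue limits can be inspected directly; a sketch:

    cat /sys/block/md0/queue/discard_max_bytes    # 0 means no discard support
    lsblk -D /dev/md0                             # needs a recent util-linux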
Re: Where is the performance bottleneck?
On Wed, 31 Aug 2005, Holger Kiehl wrote: On Thu, 1 Sep 2005, Nick Piggin wrote: Holger Kiehl wrote: meminfo.dump: MemTotal: 8124172 kB MemFree: 23564 kB Buffers: 7825944 kB Cached: 19216 kB SwapCached: 0 kB Active: 25708 kB Inactive: 7835548 kB HighTotal: 0 kB HighFree:0 kB LowTotal: 8124172 kB LowFree: 23564 kB SwapTotal:15631160 kB SwapFree: 15631160 kB Dirty: 3145604 kB Hmm OK, dirty memory is pinned pretty much exactly on dirty_ratio so maybe I've just led you on a goose chase. You could echo 5 > /proc/sys/vm/dirty_background_ratio echo 10 > /proc/sys/vm/dirty_ratio To further reduce dirty memory in the system, however this is a long shot, so please continue your interaction with the other people in the thread first. Yes, this does make a difference, here the results of running dd if=/dev/full of=/dev/sd?1 bs=4M count=4883 on 8 disks at the same time: 34.273340 33.938829 33.598469 32.970575 32.841351 32.723988 31.559880 29.778112 That's 32.710568 MB/s on average per disk with your change and without it it was 24.958557 MB/s on average per disk. I will do more tests tomorrow. Just rechecked those numbers. Did a fresh boot and run the test several times. With defaults (dirty_background_ratio=10, dirty_ratio=40) I get for the dd write tests an average of 24.559491 MB/s (8 disks in parallel) per disk. With the suggested values (dirty_background_ratio=5, dirty_ratio=10) 32.390659 MB/s per disk. I then did a SW raid0 over all disks with the following command: mdadm -C /dev/md3 -l0 -n8 /dev/sd[cdefghij]1 (dirty_background_ratio=10, dirty_ratio=40) 223.955995 MB/s (dirty_background_ratio=5, dirty_ratio=10) 234.318936 MB/s So the differnece is not so big anymore. Something else I notice while doing the dd over 8 disks is the following (top just before they are finished): top - 08:39:11 up 2:03, 2 users, load average: 23.01, 21.48, 15.64 Tasks: 102 total, 2 running, 100 sleeping, 0 stopped, 0 zombie Cpu(s): 0.0% us, 17.7% sy, 0.0% ni, 0.0% id, 78.9% wa, 0.2% hi, 3.1% si Mem: 8124184k total, 8093068k used,31116k free, 7831348k buffers Swap: 15631160k total,13352k used, 15617808k free, 5524k cached PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 3423 root 18 0 55204 460 392 R 12.0 0.0 1:15.55 dd 3421 root 18 0 55204 464 392 D 11.3 0.0 1:17.36 dd 3418 root 18 0 55204 464 392 D 10.3 0.0 1:10.92 dd 3416 root 18 0 55200 464 392 D 10.0 0.0 1:09.20 dd 3420 root 18 0 55204 464 392 D 10.0 0.0 1:10.49 dd 3422 root 18 0 55200 460 392 D 9.3 0.0 1:13.58 dd 3417 root 18 0 55204 460 392 D 7.6 0.0 1:13.11 dd 158 root 15 0 000 D 1.3 0.0 1:12.61 kswapd3 159 root 15 0 000 D 1.3 0.0 1:08.75 kswapd2 160 root 15 0 000 D 1.0 0.0 1:07.11 kswapd1 3419 root 18 0 51096 552 476 D 1.0 0.0 1:17.15 dd 161 root 15 0 000 D 0.7 0.0 0:54.46 kswapd0 1 root 16 0 4876 372 332 S 0.0 0.0 0:01.15 init 2 root RT 0 000 S 0.0 0.0 0:00.00 migration/0 3 root 34 19 000 S 0.0 0.0 0:00.00 ksoftirqd/0 4 root RT 0 000 S 0.0 0.0 0:00.00 migration/1 5 root 34 19 000 S 0.0 0.0 0:00.00 ksoftirqd/1 6 root RT 0 000 S 0.0 0.0 0:00.00 migration/2 7 root 34 19 000 S 0.0 0.0 0:00.00 ksoftirqd/2 8 root RT 0 000 S 0.0 0.0 0:00.00 migration/3 9 root 34 19 000 S 0.0 0.0 0:00.00 ksoftirqd/3 A loadaverage of 23 for 8 dd's seems a bit high. Also why is kswapd working so hard? Is that correct. Please just tell me if there is anything else I can test or dumps that could be useful. 
Thanks, Holger
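Nick's observation above, that dirty memory sits pinned at dirty_ratio, can be checked by hand: Dirty at 3145604 kB out of a MemTotal of 8124172 kB is about 38.7% of RAM, right at the default dirty_ratio of 40. A small stand-alone checker in that spirit (an illustrative sketch, not a program from this thread) reads /proc/meminfo and prints the percentage:

/* dirtyratio.c - print Dirty as a percentage of MemTotal (illustrative sketch) */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void)
{
	FILE *fp = fopen("/proc/meminfo", "r");
	char line[256];
	long memtotal = -1, dirty = -1;

	if (fp == NULL) {
		perror("fopen /proc/meminfo");
		return 1;
	}
	while (fgets(line, sizeof(line), fp) != NULL) {
		if (strncmp(line, "MemTotal:", 9) == 0)
			memtotal = strtol(line + 9, NULL, 10);
		else if (strncmp(line, "Dirty:", 6) == 0)
			dirty = strtol(line + 6, NULL, 10);
	}
	fclose(fp);
	if (memtotal <= 0 || dirty < 0) {
		fprintf(stderr, "could not parse /proc/meminfo\n");
		return 1;
	}
	printf("Dirty: %ld kB of %ld kB total = %.1f%%\n",
	       dirty, memtotal, 100.0 * dirty / memtotal);
	return 0;
}

Watching this value while the dd's run shows directly whether lowering dirty_background_ratio/dirty_ratio actually moves the pin point.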
Re: Where is the performance bottleneck?
On Thu, 1 Sep 2005, Nick Piggin wrote:

Holger Kiehl wrote:

meminfo.dump:
MemTotal:      8124172 kB
MemFree:         23564 kB
Buffers:       7825944 kB
Cached:          19216 kB
SwapCached:          0 kB
Active:          25708 kB
Inactive:      7835548 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      8124172 kB
LowFree:         23564 kB
SwapTotal:    15631160 kB
SwapFree:     15631160 kB
Dirty:         3145604 kB

Hmm OK, dirty memory is pinned pretty much exactly on dirty_ratio, so maybe I've just led you on a goose chase. You could

echo 5 > /proc/sys/vm/dirty_background_ratio
echo 10 > /proc/sys/vm/dirty_ratio

to further reduce dirty memory in the system. However, this is a long shot, so please continue your interaction with the other people in the thread first.

Yes, this does make a difference. Here are the results of running dd if=/dev/full of=/dev/sd?1 bs=4M count=4883 on 8 disks at the same time:

34.273340 33.938829 33.598469 32.970575 32.841351 32.723988 31.559880 29.778112

That's 32.710568 MB/s on average per disk with your change; without it, it was 24.958557 MB/s on average per disk. I will do more tests tomorrow.

Thanks, Holger
Re: Where is the performance bottleneck?
On Wed, 31 Aug 2005, Dr. David Alan Gilbert wrote:

* Holger Kiehl ([EMAIL PROTECTED]) wrote:

On Wed, 31 Aug 2005, Jens Axboe wrote:

Full vmstat session can be found under:

Have you got iostat? iostat -x 10 might be interesting to see for a period while it is going.

The following is the result from all 8 disks at the same time with the command dd if=/dev/sd?1 of=/dev/null bs=256k count=78125. There is however one difference: here I had set /sys/block/sd?/queue/nr_requests to 4096.

avg-cpu: %user %nice  %sys %iowait %idle
          0.10  0.00 21.85   58.55 19.50

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 0.00 0.30 0.00 2.40 0.00 1.20 8.00 0.00 1.00 1.00 0.03
sdb 0.70 0.00 0.10 0.30 6.40 2.40 3.20 1.20 22.00 0.00 4.25 4.25 0.17
sdc 8276.90 0.00 267.10 0.00 68352.00 0.00 34176.00 0.00 255.90 1.95 7.29 3.74 100.02
sdd 9098.50 0.00 293.50 0.00 75136.00 0.00 37568.00 0.00 256.00 1.93 6.59 3.41 100.03
sde 10428.40 0.00 336.40 0.00 86118.40 0.00 43059.20 0.00 256.00 1.92 5.71 2.97 100.02
sdf 11314.90 0.00 365.10 0.00 93440.00 0.00 46720.00 0.00 255.93 1.92 5.26 2.74 99.98
sdg 7973.20 0.00 257.20 0.00 65843.20 0.00 32921.60 0.00 256.00 1.94 7.53 3.89 100.01
sdh 9436.30 0.00 304.70 0.00 77928.00 0.00 38964.00 0.00 255.75 1.93 6.35 3.28 100.01
sdi 10604.80 0.00 342.40 0.00 87577.60 0.00 43788.80 0.00 255.78 1.92 5.62 2.92 100.02
sdj 10914.30 0.00 352.20 0.00 90132.80 0.00 45066.40 0.00 255.91 1.91 5.43 2.84 100.00
md0 0.00 0.00 0.00 0.10 0.00 0.80 0.00 0.40 8.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.80 0.00 6.40 0.00 3.20 0.00 8.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

avg-cpu: %user %nice  %sys %iowait %idle
          0.07  0.00 24.49   66.81  8.62

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.40 0.00 1.00 0.00 11.20 0.00 5.60 11.20 0.00 1.30 0.50 0.05
sdb 0.00 0.40 0.00 1.00 0.00 11.20 0.00 5.60 11.20 0.00 1.50 0.70 0.07
sdc 8161.90 0.00 263.70 0.00 67404.80 0.00 33702.40 0.00 255.61 1.95 7.38 3.79 100.02
sdd 9157.30 0.00 295.50 0.00 75622.40 0.00 37811.20 0.00 255.91 1.93 6.53 3.38 100.00
sde 10505.60 0.00 339.20 0.00 86758.40 0.00 43379.20 0.00 255.77 1.93 5.68 2.95 99.99
sdf 11212.50 0.00 361.90 0.00 92595.20 0.00 46297.60 0.00 255.86 1.91 5.28 2.76 100.00
sdg 7988.40 0.00 258.00 0.00 65971.20 0.00 32985.60 0.00 255.70 1.93 7.49 3.88 99.98
sdh 9436.20 0.00 304.40 0.00 77924.80 0.00 38962.40 0.00 255.99 1.92 6.32 3.28 99.99
sdi 10406.10 0.00 336.30 0.00 85939.20 0.00 42969.60 0.00 255.54 1.92 5.70 2.97 100.00
sdj 11027.00 0.00 356.00 0.00 91064.00 0.00 45532.00 0.00 255.80 1.92 5.40 2.81 99.96
md0 0.00 0.00 0.00 1.00 0.00 8.00 0.00 4.00 8.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

avg-cpu: %user %nice  %sys %iowait %idle
          0.08  0.00 22.23   60.44 17.25

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 0.00 0.30 0.00 2.40 0.00 1.20 8.00 0.00 1.00 1.00 0.03
sdb 0.00 0.00 0.00 0.30 0.00 2.40 0.00 1.20 8.00 0.00 0.67 0.67 0.02
sdc 8204.50 0.00 264.76 0.00 67754.15 0.00 33877.08 0.00 255.90 1.95 7.38 3.78 100.12
sdd 9166.47 0.00 295.90 0.00 75698.10 0.00 37849.05 0.00 255.83 1.94 6.55 3.38 100.12
sde 10534.93 0.00 339.94 0.00 86999.00 0.00 43499.50 0.00 255.92 1.93 5.67 2.95 100.12
sdf 11282.68 0.00 364.16 0.00 93174.77 0.00 46587.39 0.00 255.86 1.92 5.28 2.75 100.10
sdg 8114.61 0.00 261.76 0.00 67011.01 0.00 33505.51 0.00 256.00 1.95 7.44 3.82 100.11
sdh 9380.68 0.00 302.60 0.00 77466.27 0.00 38733.13 0.00 256.00 1.93 6.38
Re: Where is the performance bottleneck?
On Wed, 31 Aug 2005, Jens Axboe wrote:

On Wed, Aug 31 2005, Holger Kiehl wrote:

# ./oread /dev/sdX

and it will read 128k chunks direct from that device. Run on the same drives as above, reply with the vmstat info again.

Using kernel 2.6.12.5 again, here are the results: [snip]

Ok, reads as expected, like the buffered io but using less system time. And you are still 1/3 off the target data rate, hmmm... With the reads, how does the aggregate bandwidth look when you add 'clients'? Same as with writes, gradually decreasing per-device throughput?

I performed the following tests with this command: dd if=/dev/sd?1 of=/dev/null bs=256k count=78125

Single disk tests:
/dev/sdc1 74.954715 MB/s
/dev/sdg1 74.973417 MB/s

Following disks in parallel:

2 disks on same channel
/dev/sdc1 75.034191 MB/s
/dev/sdd1 74.984643 MB/s

3 disks on same channel
/dev/sdc1 75.027850 MB/s
/dev/sdd1 74.976583 MB/s
/dev/sde1 75.278276 MB/s

4 disks on same channel
/dev/sdc1 58.343166 MB/s
/dev/sdd1 62.993059 MB/s
/dev/sde1 66.940569 MB/s
/dev/sdf1 70.986072 MB/s

2 disks on different channels
/dev/sdc1 74.954715 MB/s
/dev/sdg1 74.973417 MB/s

4 disks on different channels
/dev/sdc1 74.959030 MB/s
/dev/sdd1 74.877703 MB/s
/dev/sdg1 75.009697 MB/s
/dev/sdh1 75.028138 MB/s

6 disks on different channels
/dev/sdc1 49.640743 MB/s
/dev/sdd1 55.935419 MB/s
/dev/sde1 58.795241 MB/s
/dev/sdg1 50.280864 MB/s
/dev/sdh1 54.210705 MB/s
/dev/sdi1 59.413176 MB/s

So this looks different from writing: only as of four disks does the performance begin to drop. I just noticed: did you want me to do these tests with the oread program?

Thanks, Holger
Re: Where is the performance bottleneck?
On Wed, 31 Aug 2005, Jens Axboe wrote:

On Wed, Aug 31 2005, Holger Kiehl wrote:

On Wed, 31 Aug 2005, Jens Axboe wrote:

Nothing sticks out here either. There's plenty of idle time. It smells like a driver issue. Can you try the same dd test, but read from the drives instead? Use a bigger blocksize here, 128 or 256k.

I used the following command reading from all 8 disks in parallel: dd if=/dev/sd?1 of=/dev/null bs=256k count=78125

Here vmstat output (I just cut something out in the middle):

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b swpd  free    buff   cache si so  bi     bo  in   cs   us sy id wa
 3  7 4348 42640 7799984  9612 0 0 322816 0 3532 4987 0 22  0 78
 1  7 4348 42136 7800624  9584 0 0 322176 0 3526 4987 0 23  4 74
 0  8 4348 39912 7802648  9668 0 0 322176 0 3525 4955 0 22 12 66
 1  7 4348 38912 7803700  9636 0 0 322432 0 3526 5078 0 23

Ok, so that's somewhat better than the writes but still off from what the individual drives can do in total. You might want to try the same with direct io, just to eliminate the costly user copy. I don't expect it to make much of a difference though, feels like the problem is elsewhere (driver, most likely).

Sorry, I don't know how to do this. Do you mean using a C program that sets some flag to do direct io, or how can I do that?

I've attached a little sample for you, just run ala

# ./oread /dev/sdX

and it will read 128k chunks direct from that device. Run on the same drives as above, reply with the vmstat info again.

Using kernel 2.6.12.5 again, here are the results:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b swpd    free  buff  cache si so  bi     bo  in   cs  us sy  id wa
 0  0    0 8009648  4764 40592 0 0      0  0 1011   32 0 0 100  0
 0  0    0 8009648  4764 40592 0 0      0  0 1011   34 0 0 100  0
 0  0    0 8009648  4764 40592 0 0      0  0 1008   61 0 0 100  0
 0  0    0 8009648  4764 40592 0 0      0  0 1006   26 0 0 100  0
 0  8    0 8006372  4764 40592 0 0 120192  0 1944 1929 0 1  89 10
 2  8    0 8006372  4764 40592 0 0 319488  0 3502 4999 0 2  75 24
 0  8    0 8006372  4764 40592 0 0 319488  0 3506 4995 0 2  75 24
 0  8    0 8006372  4764 40592 0 0 319744  0 3504 4999 0 1  75 24
 0  8    0 8006372  4764 40592 0 0 319488  0 3507 5009 0 2  75 23
 0  8    0 8006372  4764 40592 0 0 319616  0 3506 5011 0 2  75 24
 0  8    0 8005124  4800 41100 0 0 319976  0 3536 4995 0 2  73 25
 0  8    0 8005124  4800 41100 0 0 323584  0 3534 5000 0 2  75 23
 0  8    0 8005124  4800 41100 0 0 323968  0 3540 5035 0 1  75 24
 0  8    0 8005124  4800 41100 0 0 319232  0 3506 4811 0 1  75 24
 0  8    0 8005504  4800 41100 0 0 317952  0 3498 4747 0 1  75 24
 0  8    0 8005504  4800 41100 0 0 318720  0 3495 4672 0 2  75 23
 1  8    0 8005504  4800 41100 0 0 318720  0 3509 4707 0 1  75 24
 0  8    0 8005504  4800 41100 0 0 318720  0 3499 4667 0 2  75 23
 0  8    0 8005504  4808 41092 0 0 318848 40 3509 4674 0 1  75 24
 0  8    0 8005380  4808 41092 0 0 318848  0 3497 4693 0 2  72 26
 0  8    0 8005380  4808 41092 0 0 318592  0 3500 4646 0 2  75 23
 0  8    0 8005380  4808 41092 0 0 318592  0 3495 4828 0 2  61 37
 0  8    0 8005380  4808 41092 0 0 318848  0 3499 4827 0 1  62 37
 1  8    0 8005380  4808 41092 0 0 318464  0 3495 4642 0 2  75 23
 0  8    0 8005380  4816 41084 0 0 318848 32 3511 4672 0 1  75 24
 0  8    0 8005380  4816 41084 0 0 320640  0 3512 4877 0 2  75 23
 0  8    0 8005380  4816 41084 0 0 322944  0 3533 5047 0 2  75 24
 0  8    0 8005380  4816 41084 0 0 322816  0 3531 5053 0 1  75 24
 0  8    0 8005380  4816 41084 0 0 322944  0 3531 5048 0 2  75 23
 0  8    0 8005380  4816 41084 0 0 322944  0 3529 5043 0 1  75 24
 0  0    0 8008360  4816 41084 0 0 266880  0 3112 4224 0 2  78 20
 0  0    0 8008360  4816 41084 0 0      0  0 1012   28 0 0 100  0

Holger
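Jens's oread attachment itself is not preserved in this archive. For reference, a minimal O_DIRECT reader in the same spirit would look roughly like this (an illustrative sketch, not the original program; the 128 KiB chunk size matches what Jens describes, and O_DIRECT requires an aligned buffer, hence posix_memalign()):

/* oread-like sketch: read 128 KiB chunks from a device with O_DIRECT */
#define _GNU_SOURCE		/* for O_DIRECT */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

#define CHUNK (128 * 1024)

int main(int argc, char *argv[])
{
	int fd;
	void *buf;
	ssize_t n;
	long long total = 0;

	if (argc != 2) {
		fprintf(stderr, "Usage: %s <device>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* O_DIRECT needs an aligned buffer; 4096 covers common sector sizes */
	if (posix_memalign(&buf, 4096, CHUNK) != 0) {
		perror("posix_memalign");
		return 1;
	}
	while ((n = read(fd, buf, CHUNK)) > 0)
		total += n;
	if (n < 0)
		perror("read");
	printf("read %lld bytes\n", total);
	close(fd);
	free(buf);
	return 0;
}

Because the reads bypass the page cache, the copy_user overhead disappears from the profile, which is exactly what the second vmstat above shows (system time drops while bi stays around 320 MB/s aggregate).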
Re: Where is the performance bottleneck?
On Wed, 31 Aug 2005, Nick Piggin wrote:

Holger Kiehl wrote:

3236497 total                      1.4547
2507913 default_idle           52248.1875
 158752 shrink_zone               43.3275
 121584 copy_user_generic_c     3199.5789
  34271 __wake_up_bit            713.9792
  31131 __make_request            23.1629
  22096 scsi_request_fn           18.4133
  21915 rotate_reclaimable_page   80.5699
        ^
I don't think this function should be here. This indicates that lots of writeout is happening due to pages falling off the end of the LRU. There was a bug recently causing memory estimates to be wrong on Opterons that could cause this, I think. Can you send in 2 dumps of /proc/vmstat taken 10 seconds apart while you're writing at full speed (with 2.6.13 or the latest -git tree)?

I took 2.6.13; there were no git snapshots at www.kernel.org when I looked. With 2.6.13 I must load the Fusion MPT driver as a module. Compiled in, it does not detect the drives correctly; as a module there is no problem. Here is what I did:

#!/bin/bash
time dd if=/dev/full of=/dev/sdc1 bs=4M count=4883 &
time dd if=/dev/full of=/dev/sdd1 bs=4M count=4883 &
time dd if=/dev/full of=/dev/sde1 bs=4M count=4883 &
time dd if=/dev/full of=/dev/sdf1 bs=4M count=4883 &
time dd if=/dev/full of=/dev/sdg1 bs=4M count=4883 &
time dd if=/dev/full of=/dev/sdh1 bs=4M count=4883 &
time dd if=/dev/full of=/dev/sdi1 bs=4M count=4883 &
time dd if=/dev/full of=/dev/sdj1 bs=4M count=4883 &
sleep 20
cat /proc/vmstat > /root/vmstat-1.dump
sleep 10
cat /proc/vmstat > /root/vmstat-2.dump
cat /proc/zoneinfo > /root/zoneinfo.dump
cat /proc/meminfo > /root/meminfo.dump
exit 0

vmstat-1.dump:
nr_dirty 787282 nr_writeback 44317 nr_unstable 0 nr_page_table_pages 633 nr_mapped 6373 nr_slab 53030 pgpgin 263362 pgpgout 5260352 pswpin 0 pswpout 0 pgalloc_high 0 pgalloc_normal 2448628 pgalloc_dma 1041 pgfree 2457343 pgactivate 5775 pgdeactivate 2113 pgfault 465679 pgmajfault 321 pgrefill_high 0 pgrefill_normal 5940 pgrefill_dma 33 pgsteal_high 0 pgsteal_normal 148759 pgsteal_dma 0 pgscan_kswapd_high 0 pgscan_kswapd_normal 153813 pgscan_kswapd_dma 1089 pgscan_direct_high 0 pgscan_direct_normal 0 pgscan_direct_dma 0 pginodesteal 0 slabs_scanned 0 kswapd_steal 148759 kswapd_inodesteal 0 pageoutrun 5304 allocstall 0 pgrotated 0 nr_bounce 0

vmstat-2.dump:
nr_dirty 786397 nr_writeback 44233 nr_unstable 0 nr_page_table_pages 640 nr_mapped 6406 nr_slab 53027 pgpgin 263382 pgpgout 7835732 pswpin 0 pswpout 0 pgalloc_high 0 pgalloc_normal 3091687 pgalloc_dma 2420 pgfree 3101327 pgactivate 5817 pgdeactivate 2918 pgfault 466269 pgmajfault 322 pgrefill_high 0 pgrefill_normal 28265 pgrefill_dma 150 pgsteal_high 0 pgsteal_normal 789909 pgsteal_dma 1388 pgscan_kswapd_high 0 pgscan_kswapd_normal 904101 pgscan_kswapd_dma 4950 pgscan_direct_high 0 pgscan_direct_normal 0 pgscan_direct_dma 0 pginodesteal 0 slabs_scanned 1152 kswapd_steal 791297 kswapd_inodesteal 0 pageoutrun 28299 allocstall 0 pgrotated 562 nr_bounce 0

zoneinfo.dump:
Node 3, zone Normal
  pages free 899, min 726, low 907, high 1089, active 3996, inactive 490989, scanned 0 (a: 16 i: 0), spanned 524287, present 524287
  protection: (0, 0, 0)
  pagesets
    cpu: 0 pcp: 0  count: 2  low: 62  high: 186  batch: 31
    cpu: 0 pcp: 1  count: 0  low: 0   high: 62   batch: 31
    numa_hit: 10186 numa_miss: 3313 numa_foreign: 0 interleave_hit: 10136 local_node: 0 other_node: 13499
    cpu: 1 pcp: 0  count: 13  low: 62  high: 186  batch: 31
    cpu: 1 pcp: 1  count: 0   low: 0   high: 62   batch: 31
    numa_hit: 6559 numa_miss: 1668 numa_foreign: 0 interleave_hit: 6559 local_node: 0 other_node: 8227
    cpu: 2 pcp: 0  count: 84  low: 62  high: 186  batch: 31
    cpu: 2 pcp: 1  count: 0   low: 0
high: 62
Re: Where is the performance bottleneck?
On Wed, 31 Aug 2005, Jens Axboe wrote:

Nothing sticks out here either. There's plenty of idle time. It smells like a driver issue. Can you try the same dd test, but read from the drives instead? Use a bigger blocksize here, 128 or 256k.

I used the following command reading from all 8 disks in parallel: dd if=/dev/sd?1 of=/dev/null bs=256k count=78125

Here vmstat output (I just cut something out in the middle):

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b swpd  free    buff  cache si so  bi     bo  in   cs   us sy id wa
 3  7 4348 42640 7799984  9612 0 0 322816  0 3532 4987 0 22  0 78
 1  7 4348 42136 7800624  9584 0 0 322176  0 3526 4987 0 23  4 74
 0  8 4348 39912 7802648  9668 0 0 322176  0 3525 4955 0 22 12 66
 1  7 4348 38912 7803700  9636 0 0 322432  0 3526 5078 0 23  7 70
 2  6 4348 37552 7805120  9644 0 0 322432  0 3527 4908 0 23 12 64
 0  8 4348 41152 7801552  9608 0 0 322176  0 3524 5018 0 24  6 70
 1  7 4348 41644 7801044  9572 0 0 322560  0 3530 5175 0 23  0 76
 1  7 4348 37184 7805396  9640 0 0 322176  0 3525 4914 0 24 18 59
 3  7 4348 41704 7800376  9832 0 0 322176 20 3531 5080 0 23  4 73
 1  7 4348 40652 7801700  9732 0 0 323072  0 3533 5115 0 24 13 64
 1  7 4348 40284 7802224  9616 0 0 322560  0 3527 4967 0 23  1 76
 0  8 4348 40156 7802356  9688 0 0 322560  0 3528 5080 0 23  2 75
 6  8 4348 41896 7799984  9816 0 0 322176  0 3530 4945 0 24 20 57
 0  8 4348 39540 7803124  9600 0 0 322560  0 3529 4811 0 24 21 55
 1  7 4348 41520 7801084  9600 0 0 322560  0 3532 4843 0 23 22 55
 0  8 4348 40408 7802116  9588 0 0 322560  0 3527 5010 0 23  4 72
 0  8 4348 38172 7804300  9580 0 0 322176  0 3526 4992 0 24  7 69
 4  7 4348 42264 7799784  9812 0 0 322688  0 3529 5003 0 24  8 68
 1  7 4348 39908 7802520  9660 0 0 322700  0 3529 4963 0 24 14 62
 0  8 4348 37428 7805076  9620 0 0 322420  0 3528 4967 0 23 15 62
 0  8 4348 37056 7805348  9688 0 0 322048  0 3525 4982 0 24 26 50
 1  7 4348 37804 7804456  9696 0 0 322560  0 3528 5072 0 24 16 60
 0  8 4348 38416 7804084  9660 0 0 323200  0 3533 5081 0 24 23 53
 0  8 4348 40160 7802300  9676 0 0 323200 28 3543 5095 0 24 17 59
 1  7 4348 37928 7804612  9608 0 0 323072  0 3532 5175 0 24  7 68
 2  6 4348 38680 7803724  9612 0 0 322944  0 3531 4906 0 25 24 51
 1  7 4348 40408 7802192  9648 0 0 322048  0 3524 4947 0 24 19 57

Full vmstat session can be found under: ftp://ftp.dwd.de/pub/afd/linux_kernel_debug/vmstat-256k-read

And here the profile data:

2106577 total                       0.9469
1638177 default_idle            34128.6875
 179615 copy_user_generic_c      4726.7105
  27670 end_buffer_async_read     108.0859
  26055 shrink_zone                 7.
  23199 __make_request             17.2612
  17221 kmem_cache_free           153.7589
  11796 drop_buffers               52.6607
  11016 add_to_page_cache          52.9615
   9470 __wake_up_bit             197.2917
   8760 buffered_rmqueue           12.4432
   8646 find_get_page              90.0625
   8319 __do_page_cache_readahead  11.0625
   7976 kmem_cache_alloc          124.6250
   7463 scsi_request_fn             6.2192
   7208 try_to_free_buffers        40.9545
   6716 create_empty_buffers       41.9750
   6432 __end_that_request_first   11.8235
   6044 test_clear_page_dirty      25.1833
   5643 scsi_dispatch_cmd           9.7969
   5588 free_hot_cold_page         19.4028
   5479 submit_bh                  18.0230
   3903 __alloc_pages               3.2965
   3671 file_read_actor             9.9755
   3425 thread_return              14.2708
        generic_make_request        5.6301
   3294 bio_alloc_bioset            7.6250
   2868 bio_put                    44.8125
   2851 mpt_interrupt               2.8284
   2697 mempool_alloc               8.8717
   2642 block_read_full_page        3.9315
   2512 do_generic_mapping_read     2.1216
   2394 set_page_refs             149.6250
   2235 alloc_page_buffers          9.9777
   1992 __pagevec_lru_add           8.3000
   1859 __memset                    9.6823
   1791 page_waitqueue
Re: Where is the performance bottleneck?
On Wed, 31 Aug 2005, Vojtech Pavlik wrote:

On Tue, Aug 30, 2005 at 08:06:21PM +, Holger Kiehl wrote:

How does one determine the PCI-X bus speed?

Usually only the card (in your case the Symbios SCSI controller) can tell. If it does, it'll be most likely in 'dmesg'.

There is nothing in dmesg:

Fusion MPT base driver 3.01.20
Copyright (c) 1999-2004 LSI Logic Corporation
ACPI: PCI Interrupt :02:04.0[A] -> GSI 24 (level, low) -> IRQ 217
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator,Target}
ACPI: PCI Interrupt :02:04.1[B] -> GSI 25 (level, low) -> IRQ 225
mptbase: Initiating ioc1 bringup
ioc1: 53C1030: Capabilities={Initiator,Target}
Fusion MPT SCSI Host driver 3.01.20

To find where the bottleneck is, I'd suggest trying without the filesystem at all, and just filling a large part of the block device using the 'dd' command. Also, trying without the RAID, and just running 4 (and 8) concurrent dd's to the separate drives could show whether it's the RAID that's slowing things down.

Ok, I did run the following dd command in different combinations: dd if=/dev/zero of=/dev/sd?1 bs=4k count=500

I think a bs of 4k is way too small and will cause huge CPU overhead. Can you try with something like 4M? Also, you can use /dev/full to avoid the pre-zeroing.

Ok, I now use the following command: dd if=/dev/full of=/dev/sd?1 bs=4M count=4883

Here the results for all 8 disks in parallel:

/dev/sdc1 24.957257 MB/s
/dev/sdd1 25.290177 MB/s
/dev/sde1 25.046711 MB/s
/dev/sdf1 26.369777 MB/s
/dev/sdg1 24.080695 MB/s
/dev/sdh1 25.008803 MB/s
/dev/sdi1 24.202202 MB/s
/dev/sdj1 24.712840 MB/s

A little bit faster, but not much.

Holger
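The dd test above is easy to reproduce in a few lines of C when finer control is wanted; a sketch that writes zero-filled 4 MiB blocks to a device and reports the resulting throughput (illustrative only, not a program from this thread; run one instance per disk, as with the dd's, and note that it overwrites the target device):

/* wtest.c - sequential 4 MiB writes to a device, reporting MB/s (sketch) */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

#define BS   (4 * 1024 * 1024)	/* 4 MiB, as in the dd test */
#define CNT  4883		/* ~20 GB total, as in the dd test */

int main(int argc, char *argv[])
{
	int fd, i;
	char *buf;
	struct timespec t0, t1;
	double secs;

	if (argc != 2) {
		fprintf(stderr, "Usage: %s <device>   (DESTROYS data!)\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_WRONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	buf = calloc(1, BS);	/* zero-filled, like reading /dev/full */
	if (buf == NULL) {
		perror("calloc");
		return 1;
	}
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < CNT; i++)
		if (write(fd, buf, BS) != BS) {
			perror("write");
			return 1;
		}
	fsync(fd);		/* include the time to flush dirty pages */
	clock_gettime(CLOCK_MONOTONIC, &t1);
	secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
	printf("%s: %.2f MB/s\n", argv[1], (double)BS * CNT / secs / 1e6);
	close(fd);
	free(buf);
	return 0;
}

The fsync() before the second timestamp matters: without it the measurement mostly reflects how fast dirty pages accumulate up to dirty_ratio, not how fast the disks actually write.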
Re: Where is the performance bottleneck?
On Wed, 31 Aug 2005, Jens Axboe wrote:

On Wed, Aug 31 2005, Vojtech Pavlik wrote:

On Tue, Aug 30, 2005 at 08:06:21PM +, Holger Kiehl wrote:

How does one determine the PCI-X bus speed?

Usually only the card (in your case the Symbios SCSI controller) can tell. If it does, it'll be most likely in 'dmesg'.

There is nothing in dmesg:

Fusion MPT base driver 3.01.20
Copyright (c) 1999-2004 LSI Logic Corporation
ACPI: PCI Interrupt :02:04.0[A] -> GSI 24 (level, low) -> IRQ 217
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator,Target}
ACPI: PCI Interrupt :02:04.1[B] -> GSI 25 (level, low) -> IRQ 225
mptbase: Initiating ioc1 bringup
ioc1: 53C1030: Capabilities={Initiator,Target}
Fusion MPT SCSI Host driver 3.01.20

To find where the bottleneck is, I'd suggest trying without the filesystem at all, and just filling a large part of the block device using the 'dd' command. Also, trying without the RAID, and just running 4 (and 8) concurrent dd's to the separate drives could show whether it's the RAID that's slowing things down.

Ok, I did run the following dd command in different combinations: dd if=/dev/zero of=/dev/sd?1 bs=4k count=500

I think a bs of 4k is way too small and will cause huge CPU overhead. Can you try with something like 4M? Also, you can use /dev/full to avoid the pre-zeroing.

That was my initial thought as well, but since he's writing, the io side should look correct. I doubt 8 dd's writing 4k chunks will gobble that much CPU as to make this much difference. Holger, we need vmstat 1 info while the dd's are running. A simple profile would be nice as well: boot with profile=2 and do a readprofile -r; run tests; readprofile > foo and send the first 50 lines of foo to this list.

Here vmstat for 8 dd's still with 4k blocksize:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b swpd  free    buff   cache si so bi   bo     in   cs   us sy id wa
 9  2 5244 38272 7738248 10400 0 0  3  11444 3902    4 0  5 75 20
 5 10 5244 30824 7747680  8684 0 0  0 265672 2582 1917 1 95  0  4
 2 12 5244 30948 7747248  8708 0 0  0 222620 2858  292 0 33  0 67
 4 10 5244 31072 7747516  8644 0 0  0 236400 3132  326 0 43  0 57
 2 12 5244 31320 7747792  8512 0 0  0 250204 3225  285 0 37  0 63
 1 13 5244 30948 7747412  8552 0 0 24 227600 3261  312 0 41  0 59
 2 12 5244 32684 7746124  8616 0 0  0 235392 3219  274 0 32  0 68
 1 13 5244 30948 7747940  8568 0 0  0 228020 3394  296 0 37  0 63
 0 14 5244 31196 7747680  8624 0 0  0 232932 3389  300 0 32  0 68
 3 12 5244 31072 7747904  8536 0 0  0 233096 3545  312 0 33  0 67
 1 13 5244 31072 7747852  8520 0 0  0 226992 3381  290 0 31  0 69
 1 13 5244 31196 7747704  8396 0 0  0 230112 3372  265 0 28  0 72
 0 14 5244 31072 7747928  8512 0 0  0 240652 3491  295 0 33  0 67
 3 13 5244 31072 7748104  8608 0 0  0 222944 3433  269 0 27  0 73
 1 13 5244 31072 7748000  8508 0 0  0 207944 3470  294 0 28  0 72
 0 14 5244 31072 7747980  8528 0 0  0 234608 3496  272 0 31  0 69
 2 12 5244 31196 7748148  8496 0 0  0 228760 3480  280 0 28  0 72
 0 14 5244 30948 7748568  8620 0 0  0 214372 3551  302 0 29  0 71
 1 13 5244 31072 7748392  8524 0 0  0 226732 3494  284 0 29  0 71
 0 14 5244 31072 7748004  8640 0 0  0 229628 3604  273 0 26  0 74
 1 13 5244 30948 7748392  8660 0 0  0 212868 3563  266 0 28  0 72
 1 13 5244 30948 7748600  8520 0 0  0 228244 3568  294 0 30  0 70
 1 13 5244 31196 7748228  8416 0 0  0 221692 3543  258 0 27  0 73
 1 13 5244 31072 7748192  8520 0 0  0 241040 3983  330 0 25  0 74
 1 13 5244 31196 7748288  8560 0 0  0 217108 3676  276 0 28  0 72
.
.
. This goes on up to the end.
.
.
 0  3 5244  825096 6949252 8596 0 0  0 241244 2683  223 0  7 71 22
 0  2 5244  825108 6949252 8596 0 0  0 229764 2683  214 0  7 73 20
 0  3 5244  826348 6949252 8596 0 0  0 116840 2046  450 0  4 71 26
 0  3 5244  826976 6949252 8596 0 0  0 141992 1887   97 0  4 73 23
 0  3 5244  827100 6949252 8596 0 0  0 137716 1871   93 0  4 70 26
 0  3 5244  827100 6949252 8596 0 0  0 137032 1894   96 0  4 75 21
 0  3 5244  827224 6949252 8596 0 0  0 131332 1860  288 0  4 73 23
 0  1 5244 1943732 5833756 8620 0 0  0  72404 1560  481 0 24 61 16
 0  2
Re: Where is the performance bottleneck?
On Mon, 29 Aug 2005, Vojtech Pavlik wrote:

On Mon, Aug 29, 2005 at 06:20:56PM +, Holger Kiehl wrote:

Hello

I have a system with the following setup:

Board is Tyan S4882 with AMD 8131 Chipset
4 Opterons 848 (2.2GHz)
8 GB DDR400 Ram (2GB for each CPU)
1 onboard Symbios Logic 53c1030 dual channel U320 controller
2 SATA disks put together as a SW Raid1 for system, swap and spares
8 SCSI U320 (15000 rpm) disks, where 4 disks (sdc, sdd, sde, sdf) are on one channel and the other four (sdg, sdh, sdi, sdj) on the other channel.

The U320 SCSI controller has a 64 bit PCI-X bus for itself; there is no other device on that bus. Unfortunately I was unable to determine at what speed it is running, here the output from lspci -vv:

How does one determine the PCI-X bus speed?

Usually only the card (in your case the Symbios SCSI controller) can tell. If it does, it'll be most likely in 'dmesg'.

There is nothing in dmesg:

Fusion MPT base driver 3.01.20
Copyright (c) 1999-2004 LSI Logic Corporation
ACPI: PCI Interrupt :02:04.0[A] -> GSI 24 (level, low) -> IRQ 217
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator,Target}
ACPI: PCI Interrupt :02:04.1[B] -> GSI 25 (level, low) -> IRQ 225
mptbase: Initiating ioc1 bringup
ioc1: 53C1030: Capabilities={Initiator,Target}
Fusion MPT SCSI Host driver 3.01.20

Anyway, I thought with this system I would get theoretically 640 MB/s using both channels.

You can never use the full theoretical bandwidth of the channel for data. A lot of overhead remains for other signalling. Similarly for PCI.

I tested several software raid setups to get the best possible write speeds for this system. But testing shows that the absolute maximum I can reach with software raid is only approx. 270 MB/s for writing. Which is very disappointing.

I'd expect somewhat better (in the 300-400 MB/s range), but this is not too bad. To find where the bottleneck is, I'd suggest trying without the filesystem at all, and just filling a large part of the block device using the 'dd' command. Also, trying without the RAID, and just running 4 (and 8) concurrent dd's to the separate drives could show whether it's the RAID that's slowing things down.
Ok, I did run the following dd command in different combinations: dd if=/dev/zero of=/dev/sd?1 bs=4k count=500

Here the results:

Each disk alone
/dev/sdc1 59.094636 MB/s
/dev/sdd1 58.686592 MB/s
/dev/sde1 55.282807 MB/s
/dev/sdf1 62.271240 MB/s
/dev/sdg1 60.872891 MB/s
/dev/sdh1 62.252781 MB/s
/dev/sdi1 59.145637 MB/s
/dev/sdj1 60.921119 MB/s

sdc + sdd in parallel (2 disks on same channel)
/dev/sdc1 42.512287 MB/s
/dev/sdd1 43.118483 MB/s

sdc + sdg in parallel (2 disks on different channels)
/dev/sdc1 42.938186 MB/s
/dev/sdg1 43.934779 MB/s

sdc + sdd + sde in parallel (3 disks on same channel)
/dev/sdc1 35.043501 MB/s
/dev/sdd1 35.686878 MB/s
/dev/sde1 34.580457 MB/s

Similar results for three disks (sdg + sdh + sdi) on the other channel
/dev/sdg1 36.381137 MB/s
/dev/sdh1 37.541758 MB/s
/dev/sdi1 35.834920 MB/s

sdc + sdd + sde + sdf in parallel (4 disks on same channel)
/dev/sdc1 31.432914 MB/s
/dev/sdd1 32.058752 MB/s
/dev/sde1 31.393455 MB/s
/dev/sdf1 33.208165 MB/s

And here for the four disks on the other channel
/dev/sdg1 31.873028 MB/s
/dev/sdh1 33.277193 MB/s
/dev/sdi1 31.91 MB/s
/dev/sdj1 32.626744 MB/s

All 8 disks in parallel
/dev/sdc1 24.120545 MB/s
/dev/sdd1 24.419801 MB/s
/dev/sde1 24.296588 MB/s
/dev/sdf1 25.609548 MB/s
/dev/sdg1 24.572617 MB/s
/dev/sdh1 25.552590 MB/s
/dev/sdi1 24.575616 MB/s
/dev/sdj1 25.124165 MB/s

So from these results I may assume that md is not the cause of the problem. What comes as a big surprise is that I lose 25% performance with only two disks, each hanging on its own channel! With two disks on different channels there is no shared SCSI bus at all, yet the per-disk rate still drops from about 60 MB/s to about 43 MB/s, which points at something common to both channels rather than at the disks or md. Is this normal? I wonder if other people have the same problem with other controllers or the same one. What can I do next to find out if this is a kernel, driver or hardware problem?

Thanks, Holger
Re: Where is the performance bottleneck?
On Mon, 29 Aug 2005, Al Boldi wrote:

Holger Kiehl wrote:

Why do I only get 247 MB/s for writing and 227 MB/s for reading (from the bonnie++ results) for a Raid0 over 8 disks? I was expecting to get nearly three times those numbers if you take the numbers from the individual disks. What limit am I hitting here?

You may be hitting a 2.6 kernel bug, which has something to do with readahead; ask Jens Axboe about it! (see "[git patches] IDE update" thread)

Sadly, 2.6.13 did not fix it either. I did read that thread, but due to my limited understanding of kernel code I don't see the relation to my problem. But I am willing to try any patches to solve the problem.

Did you try 2.4.31?

No. Will give this a try if the problem is not found.

Thanks, Holger
Re: Where is the performance bottleneck?
On Mon, 29 Aug 2005, Mark Hahn wrote:

The U320 SCSI controller has a 64 bit PCI-X bus for itself, there is no other device on that bus. Unfortunately I was unable to determine at what speed it is running, here the output from lspci -vv:
...
Status: Bus=2 Dev=4 Func=0 64bit+ 133MHz+ SCD- USC-, DC=simple,

the "133MHz+" is a good sign. OTOH the latency (72) seems rather low - my understanding is that that would noticeably limit the size of burst transfers.

I have tried with 128 and 144, but the transfer rate is only a little bit higher, barely measurable. Or what values should I try?

Version 1.03        --Sequential Output--        --Sequential Input-  --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size  K/sec %CP  K/sec %CP  K/sec %CP K/sec %CP  K/sec %CP  /sec %CP
Raid0 (8 disk) 15744M 54406  96 247419  90 100752  25 60266  98 226651  29 830.2  1
Raid0s(4 disk) 15744M 54915  97 253642  89  73976  18 59445  97 198372  24 659.8  1
Raid0s(4 disk) 15744M 54866  97 268361  95  72852  17 59165  97 187183  22 666.3  1

you're obviously saturating something already with 2 disks. did you play with "blockdev --setra" settings?

Yes, I did play a little bit with it, but this only changed read performance; it made no measurable difference when writing.

Thanks, Holger
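blockdev --setra is essentially a thin wrapper around the BLKRASET ioctl, with the value given in 512-byte sectors. A small sketch of getting and setting a device's readahead directly (illustrative only, not from this thread):

/* ra.c - get/set block device readahead via ioctl (illustrative sketch) */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>		/* BLKRAGET, BLKRASET */

int main(int argc, char *argv[])
{
	int fd;
	long ra;

	if (argc < 2) {
		fprintf(stderr, "Usage: %s <device> [sectors]\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (ioctl(fd, BLKRAGET, &ra) < 0) {
		perror("BLKRAGET");
		return 1;
	}
	printf("current readahead: %ld sectors\n", ra);
	/* setting requires root, just as blockdev --setra does */
	if (argc == 3 && ioctl(fd, BLKRASET, strtoul(argv[2], NULL, 10)) < 0)
		perror("BLKRASET");
	close(fd);
	return 0;
}

This matches Holger's observation above: readahead only shapes the read path, so changing it cannot move the write numbers.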
Where is the performance bottleneck?
Hello

I have a system with the following setup:

Board is Tyan S4882 with AMD 8131 Chipset
4 Opterons 848 (2.2GHz)
8 GB DDR400 Ram (2GB for each CPU)
1 onboard Symbios Logic 53c1030 dual channel U320 controller
2 SATA disks put together as a SW Raid1 for system, swap and spares
8 SCSI U320 (15000 rpm) disks, where 4 disks (sdc, sdd, sde, sdf) are on one channel and the other four (sdg, sdh, sdi, sdj) on the other channel.

The U320 SCSI controller has a 64 bit PCI-X bus for itself; there is no other device on that bus. Unfortunately I was unable to determine at what speed it is running, here the output from lspci -vv:

02:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-
        Subsystem: LSI Logic / Symbios Logic: Unknown device 1000
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Step
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort-

/*************************************/
/*      File Write Performance       */
/*      ====================---      */
/*************************************/
#include <stdio.h>     /* printf()                                 */
#include <string.h>    /* strcmp()                                 */
#include <stdlib.h>    /* exit(), atoi(), calloc(), free()         */
#include <unistd.h>    /* write(), sysconf(), close(), fsync()     */
#include <sys/times.h> /* times(), struct tms                      */
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <stdarg.h>

#define MAXLINE           4096
#define BUFSIZE           512
#define DEFAULT_FILE_SIZE 31457280
#define TEST_FILE         "test.file"
#define FILE_MODE         (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH)

static void err_doit(int, char *, va_list),
            err_quit(char *, ...),
            err_sys(char *, ...);

/*############################ main() ############################*/
int main(int argc, char *argv[])
{
   register int n, loops, rest;
   int          fd, oflag, blocksize = BUFSIZE;
   off_t        filesize = DEFAULT_FILE_SIZE;
   clock_t      start, end, syncend;
   long         clktck;
   char         *buf;
   struct tms   tmsdummy;

   if ((argc > 1) && (argc < 5))
   {
      filesize = (off_t)atoi(argv[1]) * 1024;
      if (argc == 3)
         blocksize = atoi(argv[2]);
      else if (argc == 4)
              err_quit("Usage: %s [filesize] [blocksize]", argv[0]);
   }
   else if (argc != 1)
           err_quit("Usage: %s [filesize] [blocksize]", argv[0]);

   if ((clktck = sysconf(_SC_CLK_TCK)) < 0)
      err_sys("sysconf error");

   /* If clktck = 0 it doesn't make sense to run the test. */
   if (clktck == 0)
   {
      (void)printf("0\n");
      exit(0);
   }

   if ((buf = calloc(blocksize, sizeof(char))) == NULL)
      err_sys("calloc error");
   for (n = 0; n < blocksize; n++)
      buf[n] = 'T';

   loops = filesize / blocksize;
   rest = filesize % blocksize;
   oflag = O_WRONLY | O_CREAT;
   if ((fd = open(TEST_FILE, oflag, FILE_MODE)) < 0)
      err_quit("Could not open %s", TEST_FILE);

   if ((start = times(&tmsdummy)) == -1)
      err_sys("Could not get start time");
   for (n = 0; n < loops; n++)
      if (write(fd, buf, blocksize) != blocksize)
         err_sys("write error");
   if (rest > 0)
      if (write(fd, buf, rest) != rest)
         err_sys("write error");
   if ((end = times(&tmsdummy)) == -1)
      err_sys("Could not get end time");
   (void)fsync(fd);
   if ((syncend = times(&tmsdummy)) == -1)
      err_sys("Could not get end time");
   (void)close(fd);
   free(buf);
   (void)printf("%f %f\n",
                (double)filesize / ((double)(end - start) / (double)clktck),
                (double)filesize / ((double)(syncend - start) / (double)clktck));
   exit(0);
}

static void err_sys(char *fmt, ...)
{
   va_list ap;

   va_start(ap, fmt);
   err_doit(1, fmt, ap);
   va_end(ap);
   exit(1);
}

static void err_quit(char *fmt, ...)
{
   va_list ap;

   va_start(ap, fmt);
   err_doit(0, fmt, ap);
   va_end(ap);
   exit(1);
}

static void err_doit(int errnoflag, char *fmt, va_list ap)
{
   int  errno_save;
   char buf[MAXLINE];

   errno_save = errno;
   (void)vsprintf(buf, fmt, ap);
   if (errnoflag)
      (void)sprintf(buf + strlen(buf), ": %s", strerror(errno_save));
   (void)strcat(buf, "\n");
   fflush(stdout);
   (void)fputs(buf, stderr);
   fflush(NULL); /* Flushes all stdio output streams */

   return;
}
RE: As of 2.6.13-rc1 Fusion-MPT very slow
On Mon, 8 Aug 2005, Moore, Eric Dean wrote:

On Sunday, August 07, 2005 8:30 AM, James Bottomley wrote:

On Sun, 2005-08-07 at 05:59 +, Holger Kiehl wrote:

Thanks, removing those it compiles fine. This patch also solves my problem, here the output of dmesg:

Well ... the transport class was supposed to help diagnose the problem rather than fix it. However, what it shows is that the original problem is in the fusion internal domain validation somewhere, but that we still don't know where... James

I was corresponding with Mr Holger Kiehl in private email. What I understood the problem to be was that when he compiled the drivers into the kernel, instead of as modules, we would get some drives negotiating as async narrow on the 2nd channel.

It's always the first channel that has the problem. There are four disks, and the first always negotiated as wide and has the full speed. Disks 2 to 4 are always narrow and give me only 2MB/s. On the 2nd channel everything is always ok; here all 4 disks have the full speed.

What I was trying to do was reproduce the issue here, and I was unable to. Has Mr Holger Kiehl tried compiling your patch with the drivers compiled statically into the kernel, instead of as modules?

It was compiled statically into the kernel.

Anyways - my last suggestion was that he change the scsi cable, and reset the parameters in the bios configuration utility. I don't believe that fixed it.

No. I exchanged cables, still always the same results. Also on a second system that has identical hardware, as soon as I put kernel 2.6.13-rc1 on it I get the same problem.

Here's my next suggestion. Recompile the driver with domain validation debugging enabled. Then send me the output of dmesg so I can analyze it. This brings us closer to the root of the problem, I think.

With domain validation debugging enabled, this problem is no longer reliably reproducible. I once even saw that only the fourth disk on the first channel had the slow performance. Booting several times gave me most of the time full speed for all four disks on the first channel. But the results were not stable. I then took out some unused drivers (hardware watchdog and IPMI) and the system would always come up with all four disks at full speed. I then removed domain validation debugging, but then the problem was there again. So I put in a msleep(2000) in ./drivers/block/elevator.c just after it prints out what elevator it used, and enabled domain validation debugging again. Booting with this kernel I managed to capture the debugging output with disks 2 to 4 having only 2MB/s. So I think there is some timing problem somewhere. I also have the output without the msleep(), that is, with all four disks having full speed on the first channel. Please tell me if this is of interest, then I will post it as well.
Thanks, Holger

---
Bootdata ok (command line is ro root=/dev/md0)
Linux version 2.6.13-rc5-git3 ([EMAIL PROTECTED]) (gcc version 4.0.1 20050727 (Red Hat 4.0.1-5)) #6 SMP Tue Aug 9 11:14:17 GMT 2005
BIOS-provided physical RAM map:
 BIOS-e820: - 0009a000 (usable)
 BIOS-e820: 0009a000 - 000a (reserved)
 BIOS-e820: 000d2000 - 0010 (reserved)
 BIOS-e820: 0010 - f7f7 (usable)
 BIOS-e820: f7f7 - f7f76000 (ACPI data)
 BIOS-e820: f7f76000 - f7f8 (ACPI NVS)
 BIOS-e820: f7f8 - f800 (reserved)
 BIOS-e820: fec0 - fec00400 (reserved)
 BIOS-e820: fee0 - fee01000 (reserved)
 BIOS-e820: fff8 - 0001 (reserved)
 BIOS-e820: 0001 - 0002 (usable)
ACPI: RSDP (v002 PTLTD ) @ 0x000f6a70
ACPI: XSDT (v001 PTLTD XSDT 0x0604 LTP 0x) @ 0xf7f72e3b
ACPI: FADT (v003 AMDHAMMER 0x0604 PTEC 0x000f4240) @ 0xf7f72f97
ACPI: SRAT (v001 AMDHAMMER 0x0604 AMD 0x0001) @ 0xf7f75904
ACPI: SSDT (v001 PTLTD POWERNOW 0x0604 LTP 0x0001) @ 0xf7f75a3c
ACPI: HPET (v001 AMDHAMMER 0x0604 PTEC 0x) @ 0xf7f75dac
ACPI: SSDT (v001 AMD-K8 AMD-ACPI 0x0604 AMD 0x0001) @ 0xf7f75de4
ACPI: SSDT (v001 AMD-K8 AMD-ACPI 0x0604 AMD 0x0001) @ 0xf7f75e81
ACPI: MADT (v001 PTLTD APIC 0x0604 LTP 0x) @ 0xf7f75f1e
ACPI: SPCR (v001 PTLTD $UCRTBL$ 0x0604 PTL 0x0001) @ 0xf7f75fb0
ACPI: DSDT (v001 AMD-K8 AMDACPI 0x0604 MSFT 0x010e) @ 0x
SRAT: PXM 0 -> APIC 0 -> CPU 0 -> Node 0
SRAT: PXM 1 -> APIC 1 -> CPU 1 -> Node 1
SRAT: PXM 2 -> APIC 2 -> CPU 2 -> Node 2
SRAT: PXM 3 -> APIC 3 -> CPU 3 -> Node 3
SRAT: Node 0 PXM 0 0-9
SRAT: Node 0 PXM 0 0-7fff
SRAT: Node 1 PXM 1 8000-f7ff
SRAT: Node 2 PXM 2 1-17fff
SRAT: Node 3 PX
RE: As of 2.6.13-rc1 Fusion-MPT very slow
On Sat, 6 Aug 2005, James Bottomley wrote:

On Sat, 2005-08-06 at 21:12 +, Holger Kiehl wrote:

drivers/message/fusion/mptspi.c:505: error: unknown field 'get_hold_mcs' specified in initializer
drivers/message/fusion/mptspi.c:505: warning: excess elements in struct initializer
drivers/message/fusion/mptspi.c:505: warning: (near initialization for 'mptspi_transport_functions')
drivers/message/fusion/mptspi.c:506: error: unknown field 'set_hold_mcs' specified in initializer
drivers/message/fusion/mptspi.c:506: warning: excess elements in struct initializer
drivers/message/fusion/mptspi.c:506: warning: (near initialization for 'mptspi_transport_functions')
drivers/message/fusion/mptspi.c:507: error: unknown field 'show_hold_mcs' specified in initializer
drivers/message/fusion/mptspi.c:507: warning: excess elements in struct initializer
drivers/message/fusion/mptspi.c:507: warning: (near initialization for 'mptspi_transport_functions')

This is actually because -mm is slightly behind the scsi-misc tree. It looks like the hold_mcs parameters haven't propagated into the -mm tree yet. You should be able to correct this by cutting these three lines:

	.get_hold_mcs	= mptspi_read_parameters,
	.set_hold_mcs	= mptspi_write_hold_mcs,
	.show_hold_mcs	= 1,

out of the code at lines 505-507. You'll get a warning about mptspi_write_hold_mcs() being defined but not used, which you can ignore.

Thanks, removing those it compiles fine. This patch also solves my problem, here the output of dmesg:

Fusion MPT base driver 3.03.02
Copyright (c) 1999-2005 LSI Logic Corporation
Fusion MPT SPI Host driver 3.03.02
ACPI: PCI Interrupt :02:04.0[A] -> GSI 24 (level, low) -> IRQ 217
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator,Target}
scsi4 : ioc0: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=217
Vendor: FUJITSU Model: MAS3735NP Rev: 0104
Type: Direct-Access ANSI SCSI revision: 03
target4:0:0: Beginning Domain Validation
target4:0:0: Ending Domain Validation
target4:0:0: FAST-160 WIDE SCSI 320.0 MB/s DT IU (6.25 ns, offset 127)
SCSI device sdc: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdc: drive cache: write back
SCSI device sdc: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdc: drive cache: write back
sdc: sdc1
Attached scsi disk sdc at scsi4, channel 0, id 0, lun 0
Vendor: FUJITSU Model: MAS3735NP Rev: 0104
Type: Direct-Access ANSI SCSI revision: 03
target4:0:1: Beginning Domain Validation
target4:0:1: Ending Domain Validation
target4:0:1: FAST-160 WIDE SCSI 320.0 MB/s DT IU (6.25 ns, offset 127)
SCSI device sdd: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdd: drive cache: write back
SCSI device sdd: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdd: drive cache: write back
sdd: sdd1
Attached scsi disk sdd at scsi4, channel 0, id 1, lun 0
Vendor: FUJITSU Model: MAS3735NP Rev: 0104
Type: Direct-Access ANSI SCSI revision: 03
target4:0:2: Beginning Domain Validation
target4:0:2: Ending Domain Validation
target4:0:2: FAST-160 WIDE SCSI 320.0 MB/s DT IU (6.25 ns, offset 127)
SCSI device sde: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sde: drive cache: write back
SCSI device sde: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sde: drive cache: write back
sde: sde1
Attached scsi disk sde at scsi4, channel 0, id 2, lun 0
Vendor: FUJITSU Model: MAS3735NP Rev: 0104
Type: Direct-Access ANSI SCSI revision: 03
target4:0:3: Beginning Domain Validation
target4:0:3: Ending Domain Validation
target4:0:3: FAST-160 WIDE SCSI 320.0 MB/s DT IU (6.25 ns, offset 127)
SCSI device sdf: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdf: drive cache: write back
SCSI device sdf: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdf: drive cache: write back
sdf: sdf1
Attached scsi disk sdf at scsi4, channel 0, id 3, lun 0
ACPI: PCI Interrupt :02:04.1[B] -> GSI 25 (level, low) -> IRQ 225
mptbase: Initiating ioc1 bringup
ioc1: 53C1030: Capabilities={Initiator,Target}
scsi5 : ioc1: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=225
Vendor: FUJITSU Model: MAS3735NP Rev: 0104
Type: Direct-Access ANSI SCSI revision: 03
target5:0:0: Beginning Domain Validation
target5:0:0: Ending Domain Validation
target5:0:0: FAST-160 WIDE SCSI 320.0 MB/s DT IU (6.25 ns, offset 127)
SCSI device sdg: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdg: drive cache: write back
SCSI device sdg: 143552136 512-byte hdwr sectors (73499 MB)
SCSI d
RE: As of 2.6.13-rc1 Fusion-MPT very slow
On Sat, 6 Aug 2005, James Bottomley wrote:

On Mon, 2005-08-01 at 15:40 +, Holger Kiehl wrote:

No I did not get it. Can you please send it to me or tell me where I can download it?

OK, since this has stalled, how about trying a different approach. If you apply the attached patch it will cause fusion to use the transport class domain validation. That should show us which parameters are causing the problem and exactly what the negotiations said. We can also tell you how to tweak the parameters. It should apply to any recent -mm (unless Andrew does a turn to pick up the fusion module rework).

I tried from 2.6.13-rc2-mm2 up to 2.6.13-rc4-mm1 and always get the following error when applying this patch:

CC drivers/message/fusion/mptbase.o
CC drivers/message/fusion/mptscsih.o
CC drivers/message/fusion/mptspi.o
drivers/message/fusion/mptspi.c: In function 'mptspi_target_alloc':
drivers/message/fusion/mptspi.c:113: error: invalid storage class for function 'mptspi_write_offset'
drivers/message/fusion/mptspi.c:114: error: invalid storage class for function 'mptspi_write_width'
drivers/message/fusion/mptspi.c:131: warning: implicit declaration of function 'mptspi_write_width'
drivers/message/fusion/mptspi.c: At top level:
drivers/message/fusion/mptspi.c:453: warning: conflicting types for 'mptspi_write_width'
drivers/message/fusion/mptspi.c:453: error: static declaration of 'mptspi_write_width' follows non-static declaration
drivers/message/fusion/mptspi.c:131: error: previous implicit declaration of 'mptspi_write_width' was here
drivers/message/fusion/mptspi.c:505: error: unknown field 'get_hold_mcs' specified in initializer
drivers/message/fusion/mptspi.c:505: warning: excess elements in struct initializer
drivers/message/fusion/mptspi.c:505: warning: (near initialization for 'mptspi_transport_functions')
drivers/message/fusion/mptspi.c:506: error: unknown field 'set_hold_mcs' specified in initializer
drivers/message/fusion/mptspi.c:506: warning: excess elements in struct initializer
drivers/message/fusion/mptspi.c:506: warning: (near initialization for 'mptspi_transport_functions')
drivers/message/fusion/mptspi.c:507: error: unknown field 'show_hold_mcs' specified in initializer
drivers/message/fusion/mptspi.c:507: warning: excess elements in struct initializer
drivers/message/fusion/mptspi.c:507: warning: (near initialization for 'mptspi_transport_functions')
make[3]: *** [drivers/message/fusion/mptspi.o] Error 1
make[2]: *** [drivers/message/fusion] Error 2
make[1]: *** [drivers/message] Error 2
make: *** [drivers] Error 2

The first errors I was able to resolve by placing the function prototype definitions (lines 113 and 114) outside the function. I am using gcc 4.0.1. But for the errors at line 505 onwards I don't know what to do. Should I take an earlier -mm release?

Thanks, Holger
RE: As of 2.6.13-rc1 Fusion-MPT very slow
No I did not get it. Can you please send it to me or tell me where I can download it? Thanks, Holger -- On Mon, 1 Aug 2005, Moore, Eric Dean wrote: I provided an application called getspeed as an attachment in the email I sent last Friday. Did you receive that, or do I need to resend? If possible, can run that application and send me the output. Regards, Eric Moore On Monday, August 01, 2005 4:16 AM, Holger Kiehl wrote: On Fri, 29 Jul 2005, Andrew Morton wrote: "Moore, Eric Dean" <[EMAIL PROTECTED]> wrote: Regarding the 1st issue, can you try this patch out. It maybe in the -mm branch. Andrew cc'd on this email can confirm. ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/ 2.6.13-rc3/2.6 .13-rc3-mm3/broken-out/mpt-fusion-dv-fixes.patch Yes, that's part of 2.6.13-rc3-mm3. The patch makes no difference. Still get the following results when fusion is compiled in: sdc 74MB/s sdd2MB/s sde2MB/s sdf2MB/s On second channel: sdg 74MB/s sdh 74MB/s sdi 74MB/s sdj 74MB/s The patch was applied to linux-2.6.13-rc4-git3. Here part of dmesg output: Fusion MPT base driver 3.03.02 Copyright (c) 1999-2005 LSI Logic Corporation Fusion MPT SPI Host driver 3.03.02 ACPI: PCI Interrupt :02:04.0[A] -> GSI 24 (level, low) -> IRQ 217 mptbase: Initiating ioc0 bringup ioc0: 53C1030: Capabilities={Initiator,Target} scsi4 : ioc0: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=217 Vendor: FUJITSU Model: MAS3735NP Rev: 0104 Type: Direct-Access ANSI SCSI revision: 03 SCSI device sdc: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sdc: drive cache: write back SCSI device sdc: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sdc: drive cache: write back sdc: sdc1 Attached scsi disk sdc at scsi4, channel 0, id 0, lun 0 Vendor: FUJITSU Model: MAS3735NP Rev: 0104 Type: Direct-Access ANSI SCSI revision: 03 SCSI device sdd: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sdd: drive cache: write back SCSI device sdd: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sdd: drive cache: write back sdd: sdd1 Attached scsi disk sdd at scsi4, channel 0, id 1, lun 0 Vendor: FUJITSU Model: MAS3735NP Rev: 0104 Type: Direct-Access ANSI SCSI revision: 03 SCSI device sde: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sde: drive cache: write back SCSI device sde: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sde: drive cache: write back sde: sde1 Attached scsi disk sde at scsi4, channel 0, id 2, lun 0 Vendor: FUJITSU Model: MAS3735NP Rev: 0104 Type: Direct-Access ANSI SCSI revision: 03 SCSI device sdf: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sdf: drive cache: write back SCSI device sdf: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sdf: drive cache: write back sdf: sdf1 Attached scsi disk sdf at scsi4, channel 0, id 3, lun 0 ACPI: PCI Interrupt :02:04.1[B] -> GSI 25 (level, low) -> IRQ 225 mptbase: Initiating ioc1 bringup ioc1: 53C1030: Capabilities={Initiator,Target} scsi5 : ioc1: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=225 Vendor: FUJITSU Model: MAS3735NP Rev: 0104 Type: Direct-Access ANSI SCSI revision: 03 SCSI device sdg: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sdg: drive cache: write back SCSI device sdg: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sdg: drive cache: write back sdg: sdg1 Attached scsi disk sdg at scsi5, channel 0, id 0, lun 0 Vendor: FUJITSU Model: MAS3735NP Rev: 0104 Type: Direct-Access ANSI SCSI revision: 03 SCSI device sdh: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sdh: drive cache: 
write back SCSI device sdh: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sdh: drive cache: write back sdh: sdh1 Attached scsi disk sdh at scsi5, channel 0, id 1, lun 0 Vendor: FUJITSU Model: MAS3735NP Rev: 0104 Type: Direct-Access ANSI SCSI revision: 03 SCSI device sdi: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sdi: drive cache: write back SCSI device sdi: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sdi: drive cache: write back sdi: sdi1 Attached scsi disk sdi at scsi5, channel 0, id 2, lun 0 Vendor: FUJITSU Model: MAS3735NP Rev: 0104 Type: Direct-Access ANSI SCSI revision: 03 SCSI device sdj: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sdj: drive
Re: As of 2.6.13-rc1 Fusion-MPT very slow
On Fri, 29 Jul 2005, Andrew Morton wrote: "Moore, Eric Dean" <[EMAIL PROTECTED]> wrote: Regarding the 1st issue, can you try this patch out. It maybe in the -mm branch. Andrew cc'd on this email can confirm. ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.13-rc3/2.6 .13-rc3-mm3/broken-out/mpt-fusion-dv-fixes.patch Yes, that's part of 2.6.13-rc3-mm3. The patch makes no difference. Still get the following results when fusion is compiled in: sdc 74MB/s sdd2MB/s sde2MB/s sdf2MB/s On second channel: sdg 74MB/s sdh 74MB/s sdi 74MB/s sdj 74MB/s The patch was applied to linux-2.6.13-rc4-git3. Here part of dmesg output: Fusion MPT base driver 3.03.02 Copyright (c) 1999-2005 LSI Logic Corporation Fusion MPT SPI Host driver 3.03.02 ACPI: PCI Interrupt :02:04.0[A] -> GSI 24 (level, low) -> IRQ 217 mptbase: Initiating ioc0 bringup ioc0: 53C1030: Capabilities={Initiator,Target} scsi4 : ioc0: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=217 Vendor: FUJITSU Model: MAS3735NP Rev: 0104 Type: Direct-Access ANSI SCSI revision: 03 SCSI device sdc: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sdc: drive cache: write back SCSI device sdc: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sdc: drive cache: write back sdc: sdc1 Attached scsi disk sdc at scsi4, channel 0, id 0, lun 0 Vendor: FUJITSU Model: MAS3735NP Rev: 0104 Type: Direct-Access ANSI SCSI revision: 03 SCSI device sdd: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sdd: drive cache: write back SCSI device sdd: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sdd: drive cache: write back sdd: sdd1 Attached scsi disk sdd at scsi4, channel 0, id 1, lun 0 Vendor: FUJITSU Model: MAS3735NP Rev: 0104 Type: Direct-Access ANSI SCSI revision: 03 SCSI device sde: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sde: drive cache: write back SCSI device sde: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sde: drive cache: write back sde: sde1 Attached scsi disk sde at scsi4, channel 0, id 2, lun 0 Vendor: FUJITSU Model: MAS3735NP Rev: 0104 Type: Direct-Access ANSI SCSI revision: 03 SCSI device sdf: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sdf: drive cache: write back SCSI device sdf: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sdf: drive cache: write back sdf: sdf1 Attached scsi disk sdf at scsi4, channel 0, id 3, lun 0 ACPI: PCI Interrupt :02:04.1[B] -> GSI 25 (level, low) -> IRQ 225 mptbase: Initiating ioc1 bringup ioc1: 53C1030: Capabilities={Initiator,Target} scsi5 : ioc1: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=225 Vendor: FUJITSU Model: MAS3735NP Rev: 0104 Type: Direct-Access ANSI SCSI revision: 03 SCSI device sdg: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sdg: drive cache: write back SCSI device sdg: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sdg: drive cache: write back sdg: sdg1 Attached scsi disk sdg at scsi5, channel 0, id 0, lun 0 Vendor: FUJITSU Model: MAS3735NP Rev: 0104 Type: Direct-Access ANSI SCSI revision: 03 SCSI device sdh: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sdh: drive cache: write back SCSI device sdh: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sdh: drive cache: write back sdh: sdh1 Attached scsi disk sdh at scsi5, channel 0, id 1, lun 0 Vendor: FUJITSU Model: MAS3735NP Rev: 0104 Type: Direct-Access ANSI SCSI revision: 03 SCSI device sdi: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sdi: drive cache: write back SCSI device sdi: 143552136 512-byte hdwr sectors (73499 MB) 
SCSI device sdi: drive cache: write back sdi: sdi1 Attached scsi disk sdi at scsi5, channel 0, id 2, lun 0 Vendor: FUJITSU Model: MAS3735NP Rev: 0104 Type: Direct-Access ANSI SCSI revision: 03 SCSI device sdj: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sdj: drive cache: write back SCSI device sdj: 143552136 512-byte hdwr sectors (73499 MB) SCSI device sdj: drive cache: write back sdj: sdj1 Attached scsi disk sdj at scsi5, channel 0, id 3, lun 0 Anything else I can try or provide? Holger - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
As of 2.6.13-rc1 Fusion-MPT very slow
Hello

On a four CPU Opteron with Fusion-MPT compiled in, I get the following results (up to 2.6.13-rc3-git7) with hdparm on the first channel with four disks:

sdc 74 MB/s
sdd  2 MB/s
sde  2 MB/s
sdf  2 MB/s

On the second channel, also with the same type of disks:

sdg 74 MB/s
sdh 74 MB/s
sdi 74 MB/s
sdj 74 MB/s

All disks are of the same type. Compiling Fusion-MPT as a module for the same kernel, I get 74 MB/s for all eight disks. Taking kernel 2.6.12.2 and compiling it in, all eight disks give the expected performance of 74 MB/s.

When I exchange the two cables, putting the first cable on the second channel and the second cable on the first channel, always sdd, sde and sdf will only get approx. 2 MB/s with any 2.6.13-* kernels.

Another problem observed with 2.6.13-rc3-git7 and Fusion-MPT compiled in: when making an ext3 filesystem over those eight disks (software Raid10), mke2fs hangs for a very long time in D-state and /var/log/messages gets a lot of these messages:

mptscsih: ioc0: >> Attempting task abort! (sc=81014ead3ac0)
mptscsih: ioc0: >> Attempting task abort! (sc=81014ead38c0)
mptscsih: ioc0: >> Attempting task abort! (sc=81014ead36c0)
mptscsih: ioc0: >> Attempting task abort! (sc=81014ead34c0)
.
.
.

And finally, when I do a halt or powerdown just after all filesystems are unmounted, the fusion driver tells me that it puts the two controllers in power save mode. Then the kernel wants to flush the SCSI disks but hangs forever. This does not happen when doing a reboot.

Holger
RE: Fusion-MPT much faster as module
On Tue, 22 Mar 2005, Chen, Kenneth W wrote:

On Mon, 21 Mar 2005, Andrew Morton wrote:

Holger, this problem remains unresolved, does it not? Have you done any more experimentation? I must say that something funny seems to be happening here. I have two MPT-based Dell machines, neither of which is using a modular driver:

akpm:/usr/src/25> 0 hdparm -t /dev/sda
/dev/sda:
Timing buffered disk reads: 64 MB in 5.00 seconds = 12.80 MB/sec

Holger Kiehl wrote on Tuesday, March 22, 2005 12:31 AM

Got the same result when compiled in, always between 12 and 13 MB/s. As module it is approx. 75 MB/s.

Half guess, half with data to prove: it must be the variable driver_setup initialization. If compiled as built-in, driver_setup is initialized to zero for all of its member variables, which isn't the fastest setting. If compiled as module, it gets first class treatment with a shiny performance setting. Goofing around, this patch appears to be giving higher throughput.

Yes, that fixes it. Many thanks!

Holger
Re: Fusion-MPT much faster as module
On Mon, 21 Mar 2005, Andrew Morton wrote:

> Holger Kiehl <[EMAIL PROTECTED]> wrote:
> > Hello
> >
> > On a four CPU Opteron, compiling the Fusion-MPT as a module gives
> > much better performance than compiling it in; here some bonnie++
> > results:
> >
> > Version  1.03       --Sequential Output-- --Sequential Input- --Random-
> >                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> > Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> > compiled in  15872M 38366  71  65602  22 18348   4 53276  84  57947   7 905.4   2
> > module       15872M 51246  96 204914  70 57236  14 59779  96 264171  33 923.0   2
> >
> > This happens with 2.6.10, 2.6.11 and 2.6.11-bk2. Controller is a
> > Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI.
> >
> > Why is there such a large difference?
>
> Holger, this problem remains unresolved, does it not? Have you done
> any more experimentation?

No. For now I just leave it as module.

> I must say that something funny seems to be happening here. I have two
> MPT-based Dell machines, neither of which is using a modular driver:
>
> akpm:/usr/src/25> 0 hdparm -t /dev/sda
> /dev/sda:
>  Timing buffered disk reads:  64 MB in  5.00 seconds =  12.80 MB/sec

Got the same result when compiled in, always between 12 and 13 MB/s. As module it is approx. 75 MB/s. Hope that LSI Logic will find the problem.

Another question I have: is there a way to see in what SCSI mode (320, 160, etc.) the Fusion-MPT is running? I could not find anything in /proc or dmesg. Adaptec has the following information in dmesg (and more in /proc):

(scsi1:A:0): 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)

Or has the Fusion-MPT some other tool to show this information?

Holger
Fusion-MPT much faster as module
Hello

On a four CPU Opteron, compiling the Fusion-MPT as a module gives much better performance than compiling it in; here some bonnie++ results:

Version  1.03       --Sequential Output-- --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
compiled in  15872M 38366  71  65602  22 18348   4 53276  84  57947   7 905.4   2
module       15872M 51246  96 204914  70 57236  14 59779  96 264171  33 923.0   2

This happens with 2.6.10, 2.6.11 and 2.6.11-bk2. Controller is a Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI.

Why is there such a large difference?

Holger
[PATCH/RFC] IPMI watchdog more verbose
Hello

This makes the IPMI watchdog more verbose during initialization. It prints the values of timeout and whether nowayout is set or not. Currently there is no way to see what these values are once initialized. Please check if this is the correct place to put the printk.

Holger

--- linux-2.6.10/drivers/char/ipmi/ipmi_watchdog.c.original	2005-02-21 10:02:38.289344538 +0000
+++ linux-2.6.10/drivers/char/ipmi/ipmi_watchdog.c	2005-02-21 10:10:38.925872976 +0000
@@ -944,9 +944,6 @@
 {
 	int rv;
 
-	printk(KERN_INFO PFX "driver version "
-	       IPMI_WATCHDOG_VERSION "\n");
-
 	if (strcmp(action, "reset") == 0) {
 		action_val = WDOG_TIMEOUT_RESET;
 	} else if (strcmp(action, "none") == 0) {
@@ -1031,6 +1028,9 @@
 	register_reboot_notifier(&wdog_reboot_notifier);
 	notifier_chain_register(&panic_notifier_list, &wdog_panic_notifier);
 
+	printk(KERN_INFO PFX "initialized (%s). timeout=%d sec (nowayout=%d)\n",
+	       IPMI_WATCHDOG_VERSION, timeout, nowayout);
+
 	return 0;
 }
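With this patch applied, the settings become visible at boot. Assuming PFX expands to "IPMI Watchdog: " and a 10 second timeout with nowayout off, the new message would look something like the following (the version string is just whatever IPMI_WATCHDOG_VERSION expands to):

IPMI Watchdog: initialized (<IPMI_WATCHDOG_VERSION>). timeout=10 sec (nowayout=0)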
What do these SCSI error messages mean?
Hello

I have a SW-Raid 5 running across 6 IBM DNES-309170W (one disk is hot spare) on an AIC-7890/1 Ultra2 SCSI host adapter (onboard) under 2.2.19. I use the aic driver that comes with 2.2.19, and Tagged Command Queueing is enabled and set to 24. This system was running for about 2 years without any problems, until one disk had medium errors and I had to exchange this disk with a DPSS-309170N of the same size. Another thing I did was to try 2.4.5 with the new aic driver, but I went back to 2.2.19. Since then I am getting the following errors in my syslog when the system is under heavy disk load:

scsi : aborting command due to timeout : pid 718083, scsi0, channel 0, id 1, lun 0
        Write (10) 00 00 c4 48 76 00 00 80 00
(scsi0:0:1:0) SCSISIGI 0x4, SEQADDR 0x61, SSTAT0 0x0, SSTAT1 0x2
(scsi0:0:1:0) SG_CACHEPTR 0x2c, SSTAT2 0x40, STCNT 0x5fc
scsi : aborting command due to timeout : pid 718084, scsi0, channel 0, id 1, lun 0
        Write (10) 00 00 c4 48 f6 00 00 80 00
scsi : aborting command due to timeout : pid 718085, scsi0, channel 0, id 1, lun 0
        Write (10) 00 00 c4 49 7e 00 00 30 00
scsi : aborting command due to timeout : pid 718086, scsi0, channel 0, id 2, lun 0
        Write (10) 00 00 c4 47 76 00 00 80 00
scsi : aborting command due to timeout : pid 718087, scsi0, channel 0, id 2, lun 0
        Write (10) 00 00 c4 47 f6 00 00 80 00
scsi : aborting command due to timeout : pid 718088, scsi0, channel 0, id 2, lun 0
        Write (10) 00 00 c4 48 76 00 00 80 00
scsi : aborting command due to timeout : pid 718089, scsi0, channel 0, id 2, lun 0
        Write (10) 00 00 c4 48 f6 00 00 80 00
scsi : aborting command due to timeout : pid 718090, scsi0, channel 0, id 2, lun 0
        Read (10) 00 00 c4 49 76 00 00 08 00
scsi : aborting command due to timeout : pid 718091, scsi0, channel 0, id 2, lun 0
        Write (10) 00 00 c4 49 7e 00 00 30 00
scsi : aborting command due to timeout : pid 718092, scsi0, channel 0, id 3, lun 0
        Write (10) 00 00 c4 47 76 00 00 80 00
scsi : aborting command due to timeout : pid 718093, scsi0, channel 0, id 3, lun 0
        Write (10) 00 00 c4 47 f6 00 00 80 00
scsi : aborting command due to timeout : pid 718094, scsi0, channel 0, id 3, lun 0
        Write (10) 00 00 c4 48 76 00 00 80 00
scsi : aborting command due to timeout : pid 718095, scsi0, channel 0, id 3, lun 0
        Write (10) 00 00 c4 48 f6 00 00 80 00
scsi : aborting command due to timeout : pid 718096, scsi0, channel 0, id 3, lun 0
        Write (10) 00 00 c4 49 7e 00 00 30 00
scsi : aborting command due to timeout : pid 718097, scsi0, channel 0, id 4, lun 0
        Write (10) 00 00 c4 47 76 00 00 80 00
scsi : aborting command due to timeout : pid 718098, scsi0, channel 0, id 4, lun 0
        Write (10) 00 00 c4 47 f6 00 00 80 00
scsi : aborting command due to timeout : pid 718099, scsi0, channel 0, id 4, lun 0
        Write (10) 00 00 c4 48 76 00 00 80 00
scsi : aborting command due to timeout : pid 718100, scsi0, channel 0, id 4, lun 0
        Write (10) 00 00 c4 48 f6 00 00 80 00
scsi : aborting command due to timeout : pid 718101, scsi0, channel 0, id 4, lun 0
        Write (10) 00 00 c4 49 7e 00 00 30 00
scsi : aborting command due to timeout : pid 718102, scsi0, channel 0, id 2, lun 0
        Read (10) 00 00 c4 49 ae 00 00 08 00
scsi : aborting command due to timeout : pid 718103, scsi0, channel 0, id 3, lun 0
        Read (10) 00 00 c3 76 86 00 00 20 00
scsi : aborting command due to timeout : pid 718104, scsi0, channel 0, id 0, lun 0
        Read (10) 00 00 28 6b 76 00 00 80 00
scsi : aborting command due to timeout : pid 718105, scsi0, channel 0, id 0, lun 0
        Read (10) 00 00 28 6b f6 00 00 80 00
scsi : aborting command due to timeout : pid 718106, scsi0, channel 0, id 1, lun 0
        Read (10) 00 00 28 6b 76 00 00 80 00
scsi : aborting command due to timeout : pid 718107, scsi0, channel 0, id 1, lun 0
        Read (10) 00 00 28 6b f6 00 00 40 00
scsi : aborting command due to timeout : pid 718108, scsi0, channel 0, id 2, lun 0
        Read (10) 00 00 28 6b 76 00 00 80 00
scsi : aborting command due to timeout : pid 718109, scsi0, channel 0, id 2, lun 0
        Read (10) 00 00 28 6c 36 00 00 40 00
scsi : aborting command due to timeout : pid 718110, scsi0, channel 0, id 3, lun 0
        Read (10) 00 00 28 6b 4e 00 00 68 00
scsi : aborting command due to timeout : pid 718111, scsi0, channel 0, id 3, lun 0
        Read (10) 00 00 28 6b f6 00 00 60 00
scsi : aborting command due to timeout : pid 718112, scsi0, channel 0, id 4, lun 0
        Read (10) 00 00 28 6b 36 00 00 40 00
scsi : aborting command due to timeout : pid 718113, scsi0, channel 0, id 4, lun 0
        Read (10) 00 00 28 6b b6 00 00 80 00
scsi : aborting command due to timeout : pid 718114, scsi0, channel 0, id 1, lun 0
        Read (10) 00 00 c2 ed 1e 00 00 08 00
SCSI host 0 abort (pid 718083) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
(scsi0:0:0:0) Synchronous at 80.0 Mbyte/sec, offset 31.
(scsi0:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 31.
(scsi0:0:3:0) Synchronous at 80.0 Mbyte/sec, offset 31.
(scsi0:0:4:0) Synchronous at 80.0 Mbyte/sec, offset 31.
(scsi0:0:2:0) Synchronous at 80.0 Mbyte/sec, offset 31.
VLAN in kernel?
Hello

Some time ago Ben Greear posted a patch to include VLAN support in the 2.4 kernel. I and many others have been using this patch with great success and without any problems for a very long time. What is the reason that this patch is not included in the kernel?

Thanks,
Holger
Re: [PATCH] - filesystem corruption on soft RAID5 in 2.4.0+
On Mon, 22 Jan 2001, Neil Brown wrote:
>
> There have been assorted reports of filesystem corruption on raid5 in
> 2.4.0, and I have finally got a patch - see below.
> I don't know if it addresses everybody's problems, but it fixed a very
> real problem that is very reproducible.
>
> The problem is that parity can be calculated wrongly when doing a
> read-modify-write update cycle. If you have a fully functional array,
> you won't notice this problem as the parity block is never used to
> return data. But if you have a degraded array, you will get corruption
> very quickly.
> So I think this will solve the reported corruption with ext2fs, as I
> think they were mostly on degraded arrays. I have no idea whether it
> will address the reiserfs problems as I don't think anybody reporting
> those problems described their array.
>
> In any case, please apply, and let me know of any further problems.
>

I did test this patch with 2.4.1-pre9 for about 16 hours and I no longer get the ext2 errors in syslog. Though I must say that neither of the two machines I tested had a degraded array (but both do have corruption without the patch).

During my last test, on one of the nodes a disk started to get "medium errors", however everything worked fine: the raid code removed the bad disk, started recalculating parity to set up the spare disk, and everything kept on running with no intervention and no errors in syslog. Very nice!

However, forcing a check with e2fsck -f still produces the following:

root@florix:~# e2fsck -f /dev/md2
e2fsck 1.19, 13-Jul-2000 for EXT2 FS 0.5b, 95/08/09
Pass 1: Checking inodes, blocks, and sizes
Special (device/socket/fifo) inode 3630145 has non-zero size.  Fix? yes
Special (device/socket/fifo) inode 3630156 has non-zero size.  Fix? yes
Special (device/socket/fifo) inode 3630176 has non-zero size.  Fix? yes
Special (device/socket/fifo) inode 3630184 has non-zero size.  Fix? yes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -3394 -3395 -3396 -3397 -3398 -3399 -3400 -3429
-3430 -3431 -3432 -3433 -3434 -3435 -3466 -3467 -3468 -3469 -3470 -3471
-3472 -3477 -3478 -3479 -3480 -3481 -3482 -3483 -3586 -3587 -3588 -3589
-3590 -3591 -3592 -3627 -3628 -3629 -3630 -3631 -3632 -3633 -3668 -3669
-3670 -3671 -3672 -3673 -3674 -3745 -3746 -3747 -3748 -3749 -3750 -3751
-3756 -3757 -3758 -3759 -3760 -3761 -3762 -3765 -3766 -3767 -3768 -3769
-3770 -3771 -3840 -3841 -3842 -3843 -3844 -3845 -3846  Fix? yes
Free blocks count wrong for group #0 (27874, counted=27951).  Fix? yes
Free blocks count wrong (7802000, counted=7802077).  Fix? yes

/dev/md2: ***** FILE SYSTEM WAS MODIFIED *****
/dev/md2: 7463/4006240 files (12.7% non-contiguous), 206243/8008320 blocks

Is this something I need to worry about? Yesterday I already reported that sometimes I only get the ones with "has non-zero size". What is the meaning of this?

Another thing I observed in the syslog is the following:

Jan 22 23:48:21 cube kernel: __alloc_pages: 2-order allocation failed.
Jan 22 23:48:42 cube last message repeated 32 times
Jan 22 23:49:54 cube last message repeated 48 times
Jan 22 23:58:09 cube kernel: __alloc_pages: 2-order allocation failed.
Jan 22 23:58:13 cube last message repeated 12 times
Jan 23 00:11:08 cube kernel: __alloc_pages: 2-order allocation failed.
Jan 23 00:11:10 cube last message repeated 43 times
Jan 23 00:19:35 cube kernel: __alloc_pages: 2-order allocation failed.
Jan 23 00:19:39 cube last message repeated 30 times
Jan 23 00:40:05 cube -- MARK --
Jan 23 00:53:36 cube kernel: __alloc_pages: 2-order allocation failed.
Jan 23 00:53:50 cube last message repeated 16 times

This happens under a very high load (120) and is probably not raid related. What is the meaning of this?

Thanks,
Holger
Re: [PATCH] - filesystem corruption on soft RAID5 in 2.4.0+
On Sun, 21 Jan 2001, Manfred Spraul wrote:
>
> I've attached Holger's testcase (ext2, SMP, raid5).
> Boot with "mem=64M" and run the attached script.
> The script creates and deletes 9 directories with 10.000 files in each
> dir. Neil, could you run it? I don't have a raid 5 array - SMP+ext2
> without raid5 is ok.
>
> Holger, what's your ext2 block size, and do you run with a degraded
> array?
>

No, I do not have a degraded array, and the blocksize of ext2 is 4096. Here is what /proc/mdstat looks like:

afdbench@florix:~/testdir$ cat /proc/mdstat
Personalities : [raid1] [raid5]
read_ahead 1024 sectors
md3 : active raid1 sdc1[1] sdb1[0]
      136448 blocks [2/2] [UU]
md4 : active raid1 sde1[1] sdd1[0]
      136448 blocks [2/2] [UU]
md0 : active raid1 sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1] sda2[0]
      24000 blocks [5/5] [UUUUU]
md1 : active raid5 sdf3[5] sde3[4] sdd3[3] sdc3[2] sdb3[1] sda3[0]
      3148288 blocks level 5, 64k chunk, algorithm 0 [5/5] [UUUUU]
md2 : active raid5 sdf4[5] sde4[4] sdd4[3] sdc4[2] sdb4[1] sda4[0]
      32033280 blocks level 5, 32k chunk, algorithm 0 [5/5] [UUUUU]
unused devices: <none>

What I do have is a spare disk, and I am running swap on raid1. However, my machine at home, which experiences the same problems, does not have swap on raid and is also not degraded.

I applied Neil's patch to 2.4.1-pre9 and reran the test, again with filesystem corruption. I then pressed the reset button, had all parity recalculated under 2.2.18, and rebooted again into 2.4.1-pre9 to rerun the test. Now I do not see any more filesystem corruption in syslog; however, forcing a check with e2fsck produces the following:

root@florix:~# e2fsck -f /dev/md2
e2fsck 1.19, 13-Jul-2000 for EXT2 FS 0.5b, 95/08/09
Pass 1: Checking inodes, blocks, and sizes
Special (device/socket/fifo) inode 3630145 has non-zero size.  Fix? yes
Special (device/socket/fifo) inode 3630156 has non-zero size.  Fix? yes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/md2: ***** FILE SYSTEM WAS MODIFIED *****
/dev/md2: 20002/4006240 files (4.8% non-contiguous), 219556/8008320 blocks

Doing this three times, two of the runs reported the same inodes with non-zero size. One run went without any problem (the first time ever under 2.4.x). Now I am not sure if this still is filesystem corruption, and why the corruptions were so bad before the parity recalculation under 2.2.18. I do remember that the first time I ran 2.4.x with a much larger testset, it corrupted my system so badly that I had to push the reset button, and parity was recalculated under 2.4.1-pre3.

I will now run my other testset, but this always takes 8 hours. When it is done I will report back.

Holger
Re: Serious file system corruption with RAID5+SMP and kernels above 2.4.0
On Sat, 20 Jan 2001, Otto Meier wrote:
>
> Two days ago I tried new kernels on my SMP SW RAID5 system and
> experienced serious file system corruption with kernels 2.4.1-pre8,9
> as well as 2.4.0-ac8,9,10. The same error has been reported by other
> people on this list. With the 2.4.0 release everything runs fine. So I
> stepped back to it and have had no error since.
>

I just tried 2.4.0 and still get filesystem corruption. My system is also SMP and SW Raid5. So far I have tried 2.4.0, 2.4.1-pre3,8 and 2.4.0-ac10, and all corrupt my filesystem. 2.2.18 is ok. With the help of Manfred Spraul I can now reproduce this problem within 10 minutes.

Holger
Re: More filesystem corruption under 2.4.1-pre8 and SW Raid5
On Fri, 19 Jan 2001, Manfred Spraul wrote:
>
> I don't see a corruption - neither with 192MB ram nor with 48 MB ram.
> SMP, no SW Raid, ext2, but only 1024 byte/file and only 12500
> files/directory.
>
> > With 10000 I also had no problem, my next step was 50000.
>
> 10000 files need ~180MB, that fits into the cache.
> 50000 files need ~900MB, that doesn't fit into the cache.
>
> I'd try 10000 files, but now with "mem=64m"
>

You are right! I first tried with 20000 files and 256MB and it was ok. Then I tried with 10000 files and "mem=64m" and I get the corruption.

So if I conclude correctly: we both have SMP + ext2, you do not have SW raid and I do, so it's definitely a SW raid bug?

Holger
Re: More filesystem corruption under 2.4.1-pre8 and SW Raid5
On Fri, 19 Jan 2001, Manfred Spraul wrote:
>
> > Another thing I notice is that the responsiveness of the machine
> > decreases dramatically as the test progresses until it is nearly
> > useless. After the test is done everything is back to normal.
> > The same behavior was observed under 2.2.18.
>
> That's expected: ext2 performs linear searches through the directory,
> and with 50 000 entries that's very slow.
>

Would reiserfs be better, and does it now work with SW Raid5?

> I'm running a few quick tests, but I don't have a large enough spare
> partition (~ 1GB?) for a full test.
>
> How much main memory do you have, how large is your raid5 partition?
>

On the two machines I have tried, both have 256 MB of memory; one has an 8GB Raid5 partition and the other a 30GB one.

> Could you try to reproduce the problem with fewer files and less main
> memory?
>

I will try.

> I'm running your test with 48 MB ram, 12500 files, 9 processes in a
> 156 MB partition (swapoff, here is the test partition ;-).
> With 192MB Ram I don't see the corruption.
>

I am not sure if I understand you correctly: with 48MB you do get corruption and with 192MB not? And if you do see corruption, are you using SW Raid, SMP?

With 10000 I also had no problem; my next step was 50000.

Holger
More filesystem corruption under 2.4.1-pre8 and SW Raid5
Hello

Trying to find a quick way to reproduce the filesystem corruption I reported earlier, I have written a short program that simply creates a certain number of files in a given directory. If I start this program 9 times, each instance creating 50000 files (each 2048 bytes) in one of 9 different directories, and then delete these files again, I always get filesystem corruption. I admit that creating 50000 files in one directory is not something very common, but in my other test there are simply too many processes creating and deleting files, and it took too long to reproduce. My assumption is that something goes wrong somewhere as soon as a certain number of files have been created.

The tests were done on two different machines, both SMP, SW Raid 5 and ext2 filesystem. Under 2.4.1-pre3 and pre8 I always get filesystem corruption. This does NOT happen under 2.2.18. I don't know if this is due to a problem in the Raid 5 code, the ext2 filesystem or elsewhere in the kernel. Also, I do not currently have a system with 2.4.x without raid5.

For this reason I have attached two files (one C program and a script) with the code that corrupts my filesystem. To run it you need to issue the following commands:

cc -o fsd fsd.c
mkdir testdir
cp fsd start_fsd testdir
cd testdir
chmod 755 start_fsd
./start_fsd

Now you need to wait 3 or 4 hours and you should see some ext2 errors in your syslog.

WARNING: This corrupts your filesystem really badly! Sometimes only the files in the testdir are affected; however, I had cases where other files were also affected. The system sometimes behaves very strangely after the test: programs that have always worked just crash. Reconstruction with fsck does not always work properly; sometimes there are very strange files scattered over the whole filesystem afterwards. So be warned: do this on a test filesystem and reboot the machine after the test!

Another thing I notice is that the responsiveness of the machine decreases dramatically as the test progresses, until it is nearly useless. After the test is done everything is back to normal. The same behavior was observed under 2.2.18.
Holger

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <dirent.h>
#include <sys/types.h>
#include <sys/stat.h>

static void create_files(int, int, char *),
            delete_files(char *);

/* fsd */
int main(int argc, char *argv[])
{
   int  no_of_files, file_size;
   char dirname[1024];

   if (argc == 4)
   {
      no_of_files = atoi(argv[1]);
      file_size = atoi(argv[2]);
      (void)strcpy(dirname, argv[3]);
   }
   else
   {
      (void)fprintf(stderr, "Usage: %s <no_of_files> <file_size> <dirname>\n",
                    argv[0]);
      exit(1);
   }
   create_files(no_of_files, file_size, dirname);
   delete_files(dirname);
   exit(0);
}

/* create_files() */
static void create_files(int no_of_files, int file_size, char *dirname)
{
   int  i, fd;
   char *ptr;

   ptr = dirname + strlen(dirname);
   *ptr++ = '/';
   for (i = 0; i < no_of_files; i++)
   {
      (void)sprintf(ptr, "this_is_dummy_file_%d", i);
      if ((fd = open(dirname, O_CREAT|O_RDWR, S_IRUSR|S_IWUSR)) == -1)
      {
         (void)fprintf(stderr, "Failed to open() %s : %s\n",
                       dirname, strerror(errno));
         exit(1);
      }
      /* Seek to the last byte and write a single byte, so the file
         gets the requested size without writing all of its data. */
      if (lseek(fd, file_size - 1, SEEK_SET) == -1)
      {
         (void)fprintf(stderr, "Failed to lseek() %s : %s\n",
                       dirname, strerror(errno));
         exit(1);
      }
      if (write(fd, "", 1) != 1)
      {
         (void)fprintf(stderr, "Failed to write() to %s : %s\n",
                       dirname, strerror(errno));
         exit(1);
      }
      if (close(fd) == -1)
      {
         (void)fprintf(stderr, "Failed to close() %s : %s\n",
                       dirname, strerror(errno));
      }
   }
   ptr[-1] = 0;

   return;
}

/* delete_files() */
static void delete_files(char *dirname)
{
   char          *ptr;
   struct dirent *dirp;
   DIR           *dp;

   ptr = dirname + strlen(dirname);
   if ((dp = opendir(dirname)) == NULL)
   {
      (void)fprintf(stderr, "Failed to opendir() %s : %s\n",
                    dirname, strerror(errno));
      exit(1);
   }
   *ptr++ = '/';
   while ((dirp = readdir(dp)) != NULL)
   {
      if (dirp->d_name[0] != '.')
      {
         (void)strcpy(ptr, dirp->d_name);
         if (unlink(dirname) == -1)
         {
            (void)fprintf(stderr, "Failed to unlink() %s : %s\n",
                          dirname, strerror(errno));
            exit(1);
         }
      }
   }
   ptr[-1] = 0;
   if (closedir(dp) == -1)
   {
      (void)fprintf(stderr, "Failed to closedir() %s : %s\n",
                    dirname, strerror(errno));
   }

   return;
}
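The start_fsd script referred to above did not survive in the archive. Based only on the description (9 instances, each creating 50000 files of 2048 bytes in its own directory), an equivalent launcher would look roughly like the following; it is written in C here just to stay in one language, the original attachment was a shell script, and the directory names are made up:

/* start_fsd.c - hypothetical reconstruction of the lost launcher:
 * start 9 fsd processes in parallel, one directory each, then wait. */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
   char dir[32];
   int  i;

   for (i = 0; i < 9; i++)
   {
      (void)sprintf(dir, "dir_%d", i);
      if ((mkdir(dir, S_IRWXU) == -1) && (errno != EEXIST))
      {
         perror(dir);
         exit(1);
      }
      switch (fork())
      {
         case -1:
            perror("fork");
            exit(1);
         case 0:
            (void)execl("./fsd", "fsd", "50000", "2048", dir, (char *)NULL);
            perror("execl");
            _exit(1);
      }
   }
   while (wait(NULL) > 0)   /* wait for all nine children to finish */
      ;
   return 0;
}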
PROBLEM: More filesystem corruption with 2.4.1-pre3 and SW raid5
Hello

Doing further tests I have experienced more filesystem corruption, this time on another node, but also with SMP and SW raid5. The machine has run the same test several times under 2.2.18, 2.2.17, 2.2.14 and 2.2.12 with no problems. This was the first time the test was run under 2.4.1, and it gave me filesystem corruption. I observed the same thing on my machine at home.

The test I am doing is copying/linking thousands of files around and deleting them again. The test starts off with 58 processes copying 600 files (SMALL), then 135 processes copy around 9000 files (MEDIUM), and in the last test 325 processes copy 80000 files (BIG). Each of the three tests (SMALL, MEDIUM, BIG) is further divided into one test where the files get transmitted via FTP (localhost) and another where the files are just being linked from one directory to another. And the corruption always starts when I come to the linking test. The link rate is about 2000 files/s.

Here follows some data on what syslog reported:

Jan 13 17:09:03 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (1881249), 0
Jan 13 17:09:03 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (1881250), 0
Jan 13 17:09:03 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (1881251), 0
.
.
.
Jan 13 17:19:56 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: bit already cleared for block 6688150
Jan 13 17:19:57 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (3338561), 0
Jan 13 17:19:57 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (3338562), 0
Jan 13 17:19:57 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (3338563), 0
.
.
.
Jan 13 17:20:00 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (3338647), 0
Jan 13 17:20:00 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: bit already cleared for block 6688139
Jan 13 17:20:00 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: bit already cleared for block 6688136
Jan 13 17:20:00 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: bit already cleared for block 6688182
Jan 13 17:26:34 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (3361022), 0
Jan 13 17:26:34 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (3361023), 0
Jan 13 17:26:34 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (3361024), 0
.
.
.
Jan 13 17:26:35 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (3361023), 0
Jan 13 17:26:35 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (3361024), 0
Jan 13 17:29:20 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: bit already cleared for block 918960
Jan 13 17:29:20 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: bit already cleared for block 918961
Jan 13 17:29:20 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: bit already cleared for block 918962
.
.
.
Jan 13 17:30:57 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: bit already cleared for block 3808052
Jan 13 17:30:57 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: bit already cleared for block 3808053
Jan 13 17:30:57 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: bit already cleared for block 3808054
Jan 13 17:32:56 florix kernel: EXT2-fs error (device md(9,2)): ext2_readdir: bad entry in directory #2894349: rec_len % 4 != 0 - offset=0, inode=270105152, rec_len=1397, name_len=39
Jan 13 17:32:56 florix kernel: EXT2-fs warning (device md(9,2)): empty_dir: bad directory (dir #2894349) - no `.' or `..'
Jan 13 17:37:22 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (1940635), 0
Jan 13 17:37:22 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (1940636), 0
Jan 13 17:37:22 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (1940637), 0
Jan 13 17:37:22 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (1940638), 0
.
.
.
Jan 13 19:34:27 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (1933469), 0
Jan 13 19:34:27 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (1933471), 0
Jan 13 19:34:27 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (1933472), 0

At this point I was not able t
PROBLEM: Filesystem corruption with 2.4.1-pre3 and raid5
Hello

Doing some tests where lots of small files (and some large ones) get copied around, I experienced filesystem corruption with 2.4.1-pre3. The system has an ASUS P2B-DS (onboard Adaptec controller) with two P2-350s, 256MB (one module) PC-100 222 SDRAM with ECC, and 4 SCSI disks plus one IDE disk put together as one big SW Raid5 disk, SuSE 6.4 with the following:

Linux cube 2.4.1-pre3 #3 SMP Sun Jan 14 14:19:02 CET 2001 i686 unknown
Kernel modules         2.3.24
Gnu C                  2.95.2
Gnu Make               3.78.1
Binutils               2.9.5.0.24
Linux C Library        -rwxr-xr-x 1 root root 4061504 Mar 11  2000 /lib/libc.so.6
Dynamic linker         ldd (GNU libc) 2.1.3
Procps                 2.0.6
Mount                  2.10r
Net-tools              1.54
Kbd                    0.99
Sh-utils               2.0
Modules Loaded

I know my modutils are not up to date, but all relevant things (SCSI, filesystem, raid) were compiled in.

Here are some messages from syslog:

Jan 14 18:50:00 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: Deleting nonexistent file (613512), 0
Jan 14 18:56:19 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: Deleting nonexistent file (613533), 0
Jan 14 18:56:20 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: Deleting nonexistent file (613510), 0
Jan 14 18:57:14 cube kernel: attempt to access beyond end of device
Jan 14 18:57:14 cube kernel: 09:01: rw=1, want=1753106892, limit=8449536
Jan 14 18:57:14 cube kernel: attempt to access beyond end of device
Jan 14 18:57:14 cube kernel: 09:01: rw=1, want=1635361196, limit=8449536
.
.
.
Jan 14 18:57:14 cube kernel: attempt to access beyond end of device
Jan 14 18:57:14 cube kernel: 09:01: rw=1, want=127799040, limit=8449536
Jan 14 18:57:14 cube kernel: attempt to access beyond end of device
Jan 14 18:57:14 cube kernel: 09:01: rw=1, want=1004451972, limit=8449536
Jan 14 19:09:05 cube -- MARK --
Jan 14 19:29:05 cube -- MARK --
Jan 14 19:32:55 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: Deleting nonexistent file (145947), 0
Jan 14 19:32:55 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: Deleting nonexistent file (145948), 0
Jan 14 19:32:55 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: Deleting nonexistent file (145949), 0
.
.
.
Jan 14 19:33:18 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: Deleting nonexistent file (145945), 0
Jan 14 19:33:18 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: Deleting nonexistent file (145946), 0
Jan 14 19:49:06 cube -- MARK --
Jan 14 19:53:36 cube kernel: __alloc_pages: 2-order allocation failed.
Jan 14 19:53:39 cube last message repeated 8 times
Jan 14 20:09:06 cube -- MARK --
Jan 14 20:10:52 cube kernel: EXT2-fs error (device md(9,1)): ext2_readdir: bad entry in directory #929061: rec_len is smaller than minimal - offset=4056, inode=0, rec_len=0, name_len=0
Jan 14 20:10:52 cube kernel: EXT2-fs error (device md(9,1)): empty_dir: bad entry in directory #929061: rec_len is smaller than minimal - offset=4056, inode=0, rec_len=0, name_len=0
Jan 14 20:30:20 cube -- MARK --
Jan 14 20:50:24 cube -- MARK --
Jan 14 21:10:06 cube kernel: EXT2-fs error (device md(9,1)): ext2_free_blocks: bit already cleared for block 1402395
Jan 14 21:10:06 cube kernel: EXT2-fs error (device md(9,1)): ext2_free_blocks: bit already cleared for block 1438368
Jan 14 21:11:57 cube kernel: EXT2-fs error (device md(9,1)): ext2_free_blocks: bit already cleared for block 1439021
Jan 14 21:11:57 cube kernel: EXT2-fs error (device md(9,1)): ext2_free_blocks: bit already cleared for block 1435690
Jan 14 21:27:01 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: Deleting nonexistent file (698429), 0
.
.
.
Jan 14 21:27:03 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: Deleting nonexistent file (698429), 0
Jan 14 21:30:02 cube nscd: 175: cannot stat() file `/etc/group': No such file or directory
Jan 14 21:35:38 cube /usr/sbin/gpm[113]: oops() invoked from gpm.c(508)
Jan 14 21:35:38 cube /usr/sbin/gpm[113]: get_shift_state: Inappropriate ioctl for device

At this point I could still log into the system. I noticed after killing all processes with SysRq+I that something (I assume the kernel) was eating my memory:

ps aux
USER       PID %CPU %MEM  VSZ  RSS TTY STAT START  TIME COMMAND
root         1  0.0  0.0  344  200 ?   S    14:48  0:09 init
root         2  0.0  0.0    0    0 ?   SW   14:48  0:00 [keventd]
root         4  0.0  0.0    0    0 ?   SW   14:48  0:23 [kswapd]
root         5  0.0  0.0    0    0 ?   SW   14:48  0:03 [kreclaimd]
root         6  0.7  0.0    0    0 ?   SW   14:48  2:59 [bdflush]
root         7  0.3
Why is LINK_MAX so low?
Hello

Why is LINK_MAX in Linux only 127? The values for other operating systems are as follows:

solaris  32767
hpux     32767
irix     30000

In reality LINK_MAX for ext2 is 32000, so why is this constant only 127?

Please cc me since I am not on this list.

Thanks,
Holger
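As an aside, the limit that actually applies on a given filesystem can be queried at run time with the standard pathconf() interface instead of trusting the compile-time constant; a minimal sketch:

/* linkmax.c - print the compile-time LINK_MAX and the per-filesystem
 * limit reported by pathconf() for a given path (default: "."). */
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
   const char *path = (argc > 1) ? argv[1] : ".";
   long       lm;

#ifdef LINK_MAX
   (void)printf("compile-time LINK_MAX: %d\n", LINK_MAX);
#endif
   lm = pathconf(path, _PC_LINK_MAX);
   if (lm == -1)
   {
      perror("pathconf");
      return 1;
   }
   (void)printf("pathconf() LINK_MAX for %s: %ld\n", path, lm);
   return 0;
}

Run against a directory on ext2 this should report the filesystem's real limit (32000), regardless of what the header constant claims.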