Re: iSCSI Errors - SLES10 + Infortrend
Hi Hannes,

I guess it was only a matter of time before you found this thread (-:

I've had a look at the source through git and I must admit I'm fairly lost - I was a qualified (embedded) software engineer at one stage, but I've dropped that in favor of systems / network administration. This is the first time I've used git, too; I much prefer the software I was using previously to manage code, as you could clearly see the project milestone versions.

If Novell / SuSE are prepared to support the latest open-iscsi code (from git) then I'm happy to stick with what works. I haven't been able to crash / corrupt anything, and I've been thrashing it to the limits of 1Gbit ethernet. I'll come back when I have problems with 10GbE!

Thanks all!
Stuart.

On Mar 6, 11:10 pm, Hannes Reinecke [EMAIL PROTECTED] wrote:
> Sparqz wrote:
> > FYI: If anyone is keen to have a look at my bugzilla submission at SuSE
> > https://bugzilla.novell.com/show_bug.cgi?id=366492
>
> Careful what you say here; I'm listening in :-)
>
> No, seriously: If you could do a git bisect to find out which commit fixed this I'd be more than happy. The last commit that went into SLES10 SP1 is b4a62d156e793115d973d6841e060b5a5e77e57c or svn r768.
> But then I've updated the open-iscsi for SLES10 SP2 to git latest, so we should be fine there. Or that's the hope, at least.
>
> Cheers,
> Hannes
> --
> Dr. Hannes Reinecke zSeries Storage
> [EMAIL PROTECTED] +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Markus Rex, HRB 16746 (AG Nürnberg)

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups open-iscsi group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---
Re: iSCSI Errors - SLES10 + Infortrend
Sparqz wrote:
> On Feb 29, 9:41 am, Mike Christie [EMAIL PROTECTED] wrote:
> > Sparqz wrote:
> > > On Feb 28, 12:09 pm, Sparqz [EMAIL PROTECTED] wrote:
> > > > I can repeat the problem with bonnie on SuSE 10 SP1.
> > > > I can also repeat the problem with HP's open-iscsi-2.0.707-0.25b.src.rpm (supplied from their website).
> > > > I will try out bnx2i driver (HP supplied iSCSI offload driver for onboard ethernet).
> > >
> > > Turns out the hardware I've 'borrowed' for this testing doesn't have a bnx2 ethernet adapter, it's tg3 ...
> > >
> > > > Then I will try the latest and greatest from open-iscsi
> > >
> > > I'm trying the latest semi-stable from the open-iscsi website, and it _is_ stable: in my bonnie and bonnie++ tests there are no SCSI errors. HOWEVER - the throughput is maybe 20~50% less?
> >
> > There should not be any performance regressions. Are you using the current open-iscsi.org code with the same 2.6.16.54-0.2.5-bigsmp kernel?
>
> open-iscsi-2.0-865.15.tar.gz + 2.6.16.54-0.2.5-smp (SLES 10 SP1)
>
> Were there any default parameters changed from open-iscsi-2.0-707 and open-iscsi-2.0-865 ?

None that should affect you in the way you are seeing.

Could you try this tarball
http://kernel.org/pub/linux/kernel/people/mnc/open-iscsi/releases/open-iscsi.perftest.tar.bz2
and try it with different IO schedulers by doing

cat /sys/block/sdb/queue/scheduler
echo $ONE_OF_THE_VALUES_FROM_THE_CAT > /sys/block/sdb/queue/scheduler

Please be careful with the tarball. It is not stable, and is just for a quick performance test.
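The scheduler step Mike describes can be sketched in Python as below. This is only an illustration, not part of the thread: the function names are hypothetical, the sample string `noop anticipatory deadline [cfq]` is an assumed example of what the sysfs file might contain (the real list depends on the kernel build), and writing to sysfs needs root.

```python
from pathlib import Path


def parse_schedulers(text):
    """Return (available, active) from the contents of a queue/scheduler file.

    The kernel lists every available scheduler on one line and wraps the
    currently active one in square brackets.
    """
    tokens = text.split()
    available = [t.strip("[]") for t in tokens]
    active = next((t.strip("[]") for t in tokens if t.startswith("[")), None)
    return available, active


def set_scheduler(device, name, sysfs_root="/sys/block"):
    """Select an IO scheduler for a block device (needs root on a real system)."""
    path = Path(sysfs_root) / device / "queue" / "scheduler"
    available, _ = parse_schedulers(path.read_text())
    if name not in available:
        raise ValueError(f"{name!r} is not one of {available}")
    # Same effect as: echo NAME > /sys/block/DEVICE/queue/scheduler
    path.write_text(name)


if __name__ == "__main__":
    # Illustrative file contents only (an assumption, not taken from the thread).
    print(parse_schedulers("noop anticipatory deadline [cfq]\n"))
```

Checking the requested name against the list first avoids writing a value the kernel would reject; for a benchmark comparison one would call `set_scheduler("sdb", name)` once per scheduler and rerun the test each time.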
Re: iSCSI Errors - SLES10 + Infortrend
Sparqz wrote:
> The default run for bonnie++ on a server with 32GB of memory takes forever...!
>
> But it looks like the performance is up on the production server (still SLES10 SP1) with the semi-stable release from the open-iscsi website.
>
> Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> thor            63G 45924  98 109503 26 36807   8 46482  84 85767  11 353.9   0
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16  9597  57 +++++ +++  8192  38 10504  52 +++++ +++  7936  44
> thor,63G,45924,98,109503,26,36807,8,46482,84,85767,11,353.9,0,16,9597,57,+++++,+++,8192,38,10504,52,+++++,+++,7936,44
>
> Is there a change-log I can hunt through to figure out perhaps where the bug was and what fixed it?

You can use git log with the open-iscsi git tree:
http://git.kernel.org/?p=linux/kernel/git/mnc/open-iscsi.git;a=summary
You can also search through that.
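For comparing runs, the comma-separated line at the end of the bonnie++ output is easier to work with than the table. The sketch below is a rough illustration only: the field names are hypothetical labels chosen here, the field order is assumed to follow the table columns shown in the message, and the runs of plus signs are assumed to be `+++++,+++` (bonnie++ prints plus signs when a test finishes too quickly to give a meaningful figure).

```python
# Hypothetical field names, in the order of the table columns in the message.
FIELDS = [
    "machine", "size",
    "seq_out_chr_kps", "seq_out_chr_cpu",
    "seq_out_block_kps", "seq_out_block_cpu",
    "seq_out_rewrite_kps", "seq_out_rewrite_cpu",
    "seq_in_chr_kps", "seq_in_chr_cpu",
    "seq_in_block_kps", "seq_in_block_cpu",
    "random_seeks_ps", "random_seeks_cpu",
    "files",
    "seq_create_ps", "seq_create_cpu",
    "seq_read_ps", "seq_read_cpu",
    "seq_delete_ps", "seq_delete_cpu",
    "rand_create_ps", "rand_create_cpu",
    "rand_read_ps", "rand_read_cpu",
    "rand_delete_ps", "rand_delete_cpu",
]

SAMPLE = ("thor,63G,45924,98,109503,26,36807,8,46482,84,85767,11,353.9,0,"
          "16,9597,57,+++++,+++,8192,38,10504,52,+++++,+++,7936,44")


def parse_bonnie_csv(line):
    """Map one bonnie++ 1.03 CSV result line onto named fields."""
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    result = {}
    for name, value in zip(FIELDS, values):
        if name in ("machine", "size"):
            result[name] = value
        elif value and set(value) == {"+"}:
            result[name] = None  # no usable figure reported for this test
        else:
            result[name] = float(value)
    return result


if __name__ == "__main__":
    run = parse_bonnie_csv(SAMPLE)
    print(run["seq_out_block_kps"], run["seq_in_block_kps"])
```

Parsing two such lines (for example one per open-iscsi version) and comparing the block read and write figures gives a direct check on the suspected throughput difference.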
Re: iSCSI Errors - SLES10 + Infortrend
Mike Christie wrote:
> Sparqz wrote:
> > On Feb 29, 9:41 am, Mike Christie [EMAIL PROTECTED] wrote:
> > > Sparqz wrote:
> > > > On Feb 28, 12:09 pm, Sparqz [EMAIL PROTECTED] wrote:
> > > > > I can repeat the problem with bonnie on SuSE 10 SP1.
> > > > > I can also repeat the problem with HP's open-iscsi-2.0.707-0.25b.src.rpm (supplied from their website).
> > > > > I will try out bnx2i driver (HP supplied iSCSI offload driver for onboard ethernet).
> > > >
> > > > Turns out the hardware I've 'borrowed' for this testing doesn't have a bnx2 ethernet adapter, it's tg3 ...
> > > >
> > > > > Then I will try the latest and greatest from open-iscsi
> > > >
> > > > I'm trying the latest semi-stable from the open-iscsi website, and it _is_ stable: in my bonnie and bonnie++ tests there are no SCSI errors. HOWEVER - the throughput is maybe 20~50% less?
> > >
> > > There should not be any performance regressions. Are you using the current open-iscsi.org code with the same 2.6.16.54-0.2.5-bigsmp kernel?
> >
> > open-iscsi-2.0-865.15.tar.gz + 2.6.16.54-0.2.5-smp (SLES 10 SP1)
> >
> > Were there any default parameters changed from open-iscsi-2.0-707 and open-iscsi-2.0-865 ?
>
> None that should affect you in the way you are seeing.
>
> Could you try this tarball
> http://kernel.org/pub/linux/kernel/people/mnc/open-iscsi/releases/open-iscsi.perftest.tar.bz2
> and try it with different IO schedulers by doing
>
> cat /sys/block/sdb/queue/scheduler
> echo $ONE_OF_THE_VALUES_FROM_THE_CAT > /sys/block/sdb/queue/scheduler
>
> Please be careful with the tarball. It is not stable, and is just for a quick performance test.

Oh yeah, could you also try
http://www.open-iscsi.org/bits/open-iscsi-2.0-868-rc1.tar.gz
This one fixes a perf problem we saw with some other targets.
Re: iSCSI Errors - SLES10 + Infortrend
Hi Mike,

> Could you try this tarball
> http://kernel.org/pub/linux/kernel/people/mnc/open-iscsi/releases/ope...
> And try it with different IO schedulers by doing
>
> cat /sys/block/sdb/queue/scheduler
> echo $ONE_OF_THE_VALUES_FROM_THE_CAT > /sys/block/sdb/queue/scheduler
>
> Please be careful with the tarball. It is not stable, and just for a quick performance test.

This tarball doesn't compile ): It may be because I don't have a complete build environment on this development server - I have uploaded the error messages for you to scan through, to see if there is anything obvious there.

Thanks,
Stuart.
Re: iSCSI Errors - SLES10 + Infortrend
FYI: If anyone is keen to have a look at my bugzilla submission at SuSE
https://bugzilla.novell.com/show_bug.cgi?id=366492
Re: iSCSI Errors - SLES10 + Infortrend
Here's some output from tcpdump, this is a bunch of duplicate acks, I don't think these are normal... unless anyone can tell me different?

192.168.50.8 is the iSCSI target, 192.168.50.11 is the initiator

11:14:45.391670 IP 192.168.50.8.3260 > 192.168.50.11.58035: P 3143284:3144176(892) ack 193 win 61440
11:14:45.391674 IP 192.168.50.8.3260 > 192.168.50.11.58035: . 3144176:3145636(1460) ack 193 win 61440
11:14:45.391676 IP 192.168.50.11.58035 > 192.168.50.8.3260: . ack 3145636 win 32767
11:14:45.391681 IP 192.168.50.8.3260 > 192.168.50.11.58035: . 3145636:3147096(1460) ack 193 win 61440
11:14:45.391686 IP 192.168.50.8.3260 > 192.168.50.11.58035: P 3147096:3148320(1224) ack 193 win 61440
11:14:45.391689 IP 192.168.50.11.58035 > 192.168.50.8.3260: . ack 3148320 win 32767
11:14:45.391694 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391699 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391702 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391706 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391710 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391713 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391716 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391719 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391722 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391725 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391763 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391766 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391769 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.398114 IP 192.168.50.11.58035 > 192.168.50.8.3260: P 193:241(48) ack 3148320 win 32767
11:14:45.398211 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 241 win 61440
11:14:45.398636 IP 192.168.50.8.3260 > 192.168.50.11.58035: . 3148320:3149780(1460) ack 241 win 61440
11:14:45.398645 IP 192.168.50.8.3260 > 192.168.50.11.58035: . 3149780:3151240(1460) ack 241 win 61440
11:14:45.398649 IP 192.168.50.11.58035 > 192.168.50.8.3260: . ack 3151240 win 32767
11:14:45.398654 IP 192.168.50.8.3260 > 192.168.50.11.58035: . 3151240:3152464(1224) ack 241 win 61440
11:14:45.398658 IP 192.168.50.8.3260 > 192.168.50.11.58035: . 3152464:3153924(1460) ack 241 win 61440
11:14:45.398662 IP 192.168.50.11.58035 > 192.168.50.8.3260: . ack 3153924 win 32767
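Runs like the one in this trace can be picked out of a longer capture mechanically. The following is a minimal sketch, assuming the older tcpdump text format seen here with the `>` direction marker present between source and destination; the function name and the `min_run` threshold are hypothetical, and the sample is a shortened excerpt of the trace above. It only counts consecutive data-less segments from one sender that repeat the same ack number, which is a simplification of what Wireshark flags as duplicate ACKs.

```python
import re

# Older tcpdump TCP line: time, "IP", src > dst:, flags, optional
# first:last(len) data range, then "ack N win W".
LINE_RE = re.compile(
    r"^(?P<ts>\S+) IP (?P<src>\S+) > (?P<dst>\S+): (?P<flags>\S+) "
    r"(?:\d+:\d+\((?P<len>\d+)\) )?ack (?P<ack>\d+) win \d+"
)

SAMPLE = """\
11:14:45.391686 IP 192.168.50.8.3260 > 192.168.50.11.58035: P 3147096:3148320(1224) ack 193 win 61440
11:14:45.391689 IP 192.168.50.11.58035 > 192.168.50.8.3260: . ack 3148320 win 32767
11:14:45.391694 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391699 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391702 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391706 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.398114 IP 192.168.50.11.58035 > 192.168.50.8.3260: P 193:241(48) ack 3148320 win 32767
11:14:45.398211 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 241 win 61440
"""


def find_repeated_ack_runs(lines, min_run=3):
    """Return (sender, ack_number, count) for each run of repeated pure ACKs."""
    runs = []
    current = None  # [sender, ack_number, count] of the run being tracked

    def flush():
        if current and current[2] >= min_run:
            runs.append(tuple(current))

    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a TCP line in the expected format
        src, ack = m.group("src"), int(m.group("ack"))
        pure_ack = m.group("len") is None  # no data range means no payload
        if pure_ack and current and current[0] == src and current[1] == ack:
            current[2] += 1
        else:
            flush()
            current = [src, ack, 1] if pure_ack else None
    flush()
    return runs


if __name__ == "__main__":
    for sender, ack, count in find_repeated_ack_runs(SAMPLE.splitlines()):
        print(f"{sender} repeated ack {ack} {count} times")
```

Any packet that is not a matching pure ACK ends the current run, so interleaved traffic from other connections would split runs; filtering the capture to a single connection first keeps the counts meaningful.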
Re: iSCSI Errors - SLES10 + Infortrend
On Wed, Feb 20, 2008 at 06:09:50PM -0800, Sparqz wrote:
> Hi All,
>
> I have a nasty problem with open-iscsi on SLES10 + an Infortrend iSCSI array. Basically it looks like everything goes wrong as soon as the read/write load becomes heavy, although network dumps suggest the problem is always there, it just goes critical when the load is too heavy.
>
> My setup:
> 1x HP DL585 - SLES10 x86_64
> 1x HP DL585 - RHEL4 x86_64
> 1x HP DL380 - SLES10 i586

SLES10 or SLES10SP1 ?

Have you tried installing and using the latest open-iscsi from open-iscsi.org ?

> 2x Cisco 2960G (gigabit) switches
> 2x Infortrend A16E-G2130-4 with 16x 1TB disks each
>
> The two Infortrend arrays have all their gigabit ethernet ports plugged into one of the cisco switches, then we have 2 fibre connections leading to the other cisco switch which has the three servers plugged into it. The network is completely isolated from our other company networks.

So you have only 2 gbit/sec of bandwidth between the Cisco switches?

How many ethernet ports do your iSCSI arrays have (plugged in to the switches)?

How many ethernet ports is each server using / plugged in to the switch?

> At first I thought it was a network problem, so we replaced our dodgy Netgear switches with quality Cisco networking gear, but the problem is the same, if anything it's worse because the Cisco switches facilitate higher bandwidth (extra ~20mb/s) and the errors seem to be more reliably reproducible.

Do you see packet drops/errors in any of the ports? Check all ports in both switches.

> None of the linux ethernet statistics report any errors (ifconfig) and the cisco switches don't report any packet errors either. The Infortrend arrays don't provide ethernet statistics.

Check linux TCP statistics for tcp retransmits?

netstat -s

> Wireshark (ethereal) shows many errors - clusters of Duplicate ACKs, and a few previous segment lost.

Are you using ethernet flow control? Check the switch settings, and server NIC settings.. and possibly iSCSI array settings..

In a bigger IP-SAN setup with many servers and switches, flow control might be needed to get good performance and to prevent tcp retransmits from happening (= preventing the switch port buffers from becoming full and packets being dropped).

> Any help would be much appreciated !!!

Btw have you tried with ext3? XFS is known to have problems with some setups and versions..

I'm not familiar with Infortrend iSCSI arrays so can't comment much about them..

-- Pasi
Re: iSCSI Errors - SLES10 + Infortrend
Hi Pasi,

> > My setup:
> > 1x HP DL585 - SLES10 x86_64
> > 1x HP DL585 - RHEL4 x86_64
> > 1x HP DL380 - SLES10 i586
>
> SLES10 or SLES10SP1 ?

SLES10SP1

> Have you tried installing and using the latest open-iscsi from open-iscsi.org ?
>
> > 2x Cisco 2960G (gigabit) switches
> > 2x Infortrend A16E-G2130-4 with 16x 1TB disks each
> >
> > The two Infortrend arrays have all their gigabit ethernet ports plugged into one of the cisco switches, then we have 2 fibre connections leading to the other cisco switch which has the three servers plugged into it. The network is completely isolated from our other company networks.
>
> So you have only 2 gbit/sec of bandwidth between the Cisco switches?

That's correct. I've never seen the two links saturated together; the most I've seen is ~95% on the first link and ~50% on the second.

> How many ethernet ports do your iSCSI arrays have (plugged in to the switches)?

Each iSCSI array has four 1Gbit ethernet ports, and all four ports are connected on each array.

> How many ethernet ports is each server using / plugged in to the switch?

Each server has two 1Gbit ethernet ports - but only one port is used on each server for iSCSI traffic, the other is for usual LAN traffic.

> > At first I thought it was a network problem, so we replaced our dodgy Netgear switches with quality Cisco networking gear, but the problem is the same, if anything it's worse because the Cisco switches facilitate higher bandwidth (extra ~20mb/s) and the errors seem to be more reliably reproducible.
>
> Do you see packet drops/errors in any of the ports? Check all ports in both switches.

No drops and no errors on any of the ports on the servers or on the switches. There's no way to tell what is happening on the iSCSI arrays.

> > None of the linux ethernet statistics report any errors (ifconfig) and the cisco switches don't report any packet errors either. The Infortrend arrays don't provide ethernet statistics.
>
> Check linux TCP statistics for tcp retransmits?
>
> netstat -s

Tcp:
    9787 active connections openings
    4964 passive connection openings
    8 failed connection attempts
    885 connection resets received
    33 connections established
    1903902036 segments received
    3106760297 segments send out
    2108006 segments retransmited
    0 bad segments received.
    1298 resets sent

Looks like there are... any way to just pull the stats for eth1 ?

> > Wireshark (ethereal) shows many errors - clusters of Duplicate ACKs, and a few previous segment lost.
>
> Are you using ethernet flow control? Check the switch settings, and server NIC settings.. and possibly iSCSI array settings..

Someone replied outside of the forum, suggesting I turn on flow control. It's made things a lot faster, but I still see problems with packets, and eventually iscsi errors.

> In a bigger IP-SAN setup with many servers and switches, flow control might be needed to get good performance and to prevent tcp retransmits from happening (= preventing the switch port buffers from becoming full and packets being dropped).
>
> > Any help would be much appreciated !!!
>
> Btw have you tried with ext3? XFS is known to have problems with some setups and versions..

ext3 is worse in my experience. Because our partitions are 1, 2 and 5TB in size, XFS works better for us, especially in the case where the partition has to be scanned for errors. fsck takes hours on large multiple terabyte arrays! xfs_check takes only a few minutes. Although, it could just be the amount of IO that fsck.ext3 does that causes iscsi problems and delays etc.

> I'm not familiar with Infortrend iSCSI arrays so can't comment much about them..

I get that a lot )-;

> -- Pasi
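To put the retransmit count in proportion, the counters can be turned into a ratio. This is a small sketch, not something posted in the thread: the function name is hypothetical and the sample text is just the Tcp counters quoted in the message. Note that these counters are kept for the whole host, so this does not isolate eth1.

```python
import re

# The Tcp counters quoted in the message, as printed by `netstat -s`.
SAMPLE = """\
Tcp:
    9787 active connections openings
    4964 passive connection openings
    8 failed connection attempts
    885 connection resets received
    33 connections established
    1903902036 segments received
    3106760297 segments send out
    2108006 segments retransmited
    0 bad segments received.
    1298 resets sent
"""


def retransmit_ratio(netstat_text):
    """Return retransmitted segments divided by segments sent."""
    # netstat spells these counters "send out" and "retransmited";
    # the patterns also accept the corrected spellings.
    sent = re.search(r"(\d+) segments sen[dt] out", netstat_text)
    retrans = re.search(r"(\d+) segments retransmit+ed", netstat_text)
    if not sent or not retrans:
        raise ValueError("TCP segment counters not found")
    return int(retrans.group(1)) / int(sent.group(1))


if __name__ == "__main__":
    print(f"{retransmit_ratio(SAMPLE):.4%} of sent segments were retransmitted")
```

Because the counters are cumulative since boot, taking the ratio of the differences between two snapshots around a test run says more about that run than a single reading does.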