Re: iSCSI Errors - SLES10 + Infortrend

2008-03-06 Thread Sparqz

Hi Hannes,

I guess it was only a matter of time before you found this thread (-:

I've had a look at the source through git and I must admit I'm fairly
lost - I was a qualified (embedded) software engineer at one stage,
but I've dropped that in favor of systems / network administration.
It's the first time I've used git too.  I much prefer the software I
was using previously to manage code, as you could clearly see the
project milestones and versions.

If Novell / SuSE are prepared to support the latest open-iscsi code
(from git) then I'm happy to stick with what works.  I haven't been
able to crash / corrupt anything, and I've been thrashing it to the
limits of 1Gbit ethernet.  I'll come back when I have problems with
10gbe!

Thanks all!

Stuart.

On Mar 6, 11:10 pm, Hannes Reinecke [EMAIL PROTECTED] wrote:
 Sparqz wrote:
  FYI: If anyone is keen to have a look at my bugzilla submission at
  SuSE

 https://bugzilla.novell.com/show_bug.cgi?id=366492

 Careful what you say here; I'm listening in :-)

 No, seriously: If you could do a git bisect to find out which
 commit fixed this I'd be more than happy.
 The last commit that went into SLES10 SP1 is
 b4a62d156e793115d973d6841e060b5a5e77e57c
 or svn r768.

 But then I've updated the open-iscsi for SLES10 SP2 to
 git latest, so we should be fine there.
 Or that's the hope, at least.

 Cheers,

 Hannes
 --
 Dr. Hannes Reinecke   zSeries & Storage
 [EMAIL PROTECTED]  +49 911 74053 688
 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
 GF: Markus Rex, HRB 16746 (AG Nürnberg)
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: iSCSI Errors - SLES10 + Infortrend

2008-03-04 Thread Mike Christie

Sparqz wrote:
 
 
 On Feb 29, 9:41 am, Mike Christie [EMAIL PROTECTED] wrote:
 Sparqz wrote:

 On Feb 28, 12:09 pm, Sparqz [EMAIL PROTECTED] wrote:
 I can repeat the problem with bonnie on SuSE 10 SP1
 I can also repeat the problem with HP's open-
 iscsi-2.0.707-0.25b.src.rpm  (supplied from their website)
 I will try out bnx2i driver (HP supplied iSCSI offload driver for
 onboard ethernet).
 Turns out the hardware I've 'borrowed' for this testing doesn't have a
 bnx2 ethernet adapter, it's tg2 ...
 Then I will try the latest and greatest from open-iscsi
 I'm trying the latest semi-stable from the open-iscsi website, and
 it _is_ stable, from my bonnie and bonnie++ tests, there's no SCSI
 errors.
 HOWEVER - the throughput is maybe 20~50% less?
 There should not be any performance regressions. Are you using the
 current open-iscsi.org code with the same 2.6.16.54-0.2.5-bigsmp kernel?
 
 open-iscsi-2.0-865.15.tar.gz + 2.6.16.54-0.2.5-smp (SLES 10 SP1)
 
 Were there any default parameters changed from open-iscsi-2.0-707 and
 open-iscsi-2.0-865 ?
 

None that should affect you in the way you are seeing. Could you try this 
tarball
http://kernel.org/pub/linux/kernel/people/mnc/open-iscsi/releases/open-iscsi.perftest.tar.bz2

And try it with different IO schedulers by doing
cat /sys/block/sdb/queue/scheduler
echo $ONE_OF_THE_VALUES_FROM_THE_CAT > /sys/block/sdb/queue/scheduler

Please be careful with the tarball. It is not stable, and just for a 
quick performance test.
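
A minimal sketch of cycling through the schedulers, assuming bash and root access on the initiator. The helper names and the idea of passing the benchmark as a command are illustrative only and not part of open-iscsi; the sysfs path is the one from the commands above.

```shell
#!/bin/bash
# Hypothetical helpers for comparing IO schedulers on one block device.

# Print the schedulers offered by a queue/scheduler file, one per line.
# The kernel marks the active one with brackets, e.g. "noop deadline [cfq]".
list_schedulers() {
    sed 's/[][]//g' "$1" | tr -s ' ' '\n' | sed '/^$/d'
}

# Select each scheduler in turn and run the given command under it.
run_with_each_scheduler() {
    local sched_file=$1; shift
    local sched
    for sched in $(list_schedulers "$sched_file"); do
        echo "$sched" > "$sched_file"
        echo "== scheduler: $sched =="
        "$@"
    done
}

# Example (as root; the bonnie++ arguments are placeholders):
#   run_with_each_scheduler /sys/block/sdb/queue/scheduler \
#       bonnie++ -d /mnt/iscsi -u root
```

Note that this leaves the device on whichever scheduler was listed last, so the original choice has to be written back by hand afterwards.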




Re: iSCSI Errors - SLES10 + Infortrend

2008-03-04 Thread Mike Christie

Sparqz wrote:
 The default run for bonnie++ on a server with 32GB of memory takes
 forever...!  But it looks like the performance is up on the production
 server (Still SLES10 SP1) with the semi-stable release from the open-
 iscsi website.
 
 Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
 Machine        Size K/sec %CP  K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
 thor            63G 45924  98 109503  26 36807   8 46482  84 85767  11 353.9   0
                     ------Sequential Create------ --------Random Create--------
                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                  16  9597  57 +++++ +++  8192  38 10504  52 +++++ +++  7936  44
 thor,63G,45924,98,109503,26,36807,8,46482,84,85767,11,353.9,0,16,9597,57,+++++,+++,8192,38,10504,52,+++++,+++,7936,44
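
For comparing runs, the block write and read figures can be pulled out of bonnie++'s CSV summary line. This is a rough sketch, assuming the CSV record is on a single line, that fields 5 and 11 are the block write and block read rates (as in the output above), and that "K" means 1024 bytes. The function name is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: read bonnie++ 1.03 CSV lines on stdin and print
# the sequential block write/read rates converted from K/sec to MB/s.
bonnie_block_mbs() {
    awk -F, 'NF >= 11 {
        printf "%s write=%.1f read=%.1f\n", $1, $5 / 1024, $11 / 1024
    }'
}

# Example with the first fields of the CSV line from this run:
#   echo "thor,63G,45924,98,109503,26,36807,8,46482,84,85767,11,353.9,0" \
#       | bonnie_block_mbs
#   -> thor write=106.9 read=83.8
```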
 
 Is there a change-log I can hunt through to figure out perhaps where
 the bug was and what fixed it?

You can do git revert with the open-iscsi git tree:
http://git.kernel.org/?p=linux/kernel/git/mnc/open-iscsi.git;a=summary


You can also search through that.
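
One way to hunt for the fixing commit in that tree is `git bisect` with custom terms, since the search here is for the commit that fixed something rather than the one that broke it. The sketch below is only an outline under assumptions: it needs a git new enough to accept `--term-old`/`--term-new` (2.7 or later, as far as I know), and the function name and the reproducer script are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: find the first commit where a bug no longer
# reproduces. The test command must exit 0 while the bug is still
# present ("broken") and 1 once it is gone ("fixed").
find_fixing_commit() {
    local repo=$1 broken_rev=$2 fixed_rev=$3; shift 3
    (
        cd "$repo" || exit 1
        git bisect start --term-old=broken --term-new=fixed >/dev/null || exit 1
        git bisect fixed "$fixed_rev" >/dev/null || exit 1
        git bisect broken "$broken_rev" >/dev/null || exit 1
        git bisect run "$@" >/dev/null
        # refs/bisect/fixed points at the first "fixed" commit found.
        git rev-parse refs/bisect/fixed
        git bisect reset >/dev/null 2>&1
    )
}

# Example (reproduce.sh is a placeholder that would rebuild the
# initiator and rerun the failing load; the starting hash is the
# SLES10 SP1 commit quoted elsewhere in this thread):
#   find_fixing_commit ./open-iscsi \
#       b4a62d156e793115d973d6841e060b5a5e77e57c HEAD ./reproduce.sh
```

In practice each step means rebuilding and reloading the kernel modules, so marking commits by hand with `git bisect broken` / `git bisect fixed` may be more realistic than `git bisect run`.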




Re: iSCSI Errors - SLES10 + Infortrend

2008-03-04 Thread Mike Christie

Mike Christie wrote:
 Sparqz wrote:


 On Feb 29, 9:41 am, Mike Christie [EMAIL PROTECTED] wrote:
 Sparqz wrote:

 On Feb 28, 12:09 pm, Sparqz [EMAIL PROTECTED] wrote:
 I can repeat the problem with bonnie on SuSE 10 SP1
 I can also repeat the problem with HP's open-
 iscsi-2.0.707-0.25b.src.rpm  (supplied from their website)
 I will try out bnx2i driver (HP supplied iSCSI offload driver for
 onboard ethernet).
 Turns out the hardware I've 'borrowed' for this testing doesn't have a
 bnx2 ethernet adapter, it's tg2 ...
 Then I will try the latest and greatest from open-iscsi
 I'm trying the latest semi-stable from the open-iscsi website, and
 it _is_ stable, from my bonnie and bonnie++ tests, there's no SCSI
 errors.
 HOWEVER - the throughput is maybe 20~50% less?
 There should not be any performance regressions. Are you using the
 current open-iscsi.org code with the same 2.6.16.54-0.2.5-bigsmp kernel?

 open-iscsi-2.0-865.15.tar.gz + 2.6.16.54-0.2.5-smp (SLES 10 SP1)

 Were there any default parameters changed from open-iscsi-2.0-707 and
 open-iscsi-2.0-865 ?

 
 None, that should affect you like how you are seeing. Could you try this 
 tarball
 http://kernel.org/pub/linux/kernel/people/mnc/open-iscsi/releases/open-iscsi.perftest.tar.bz2
  
 
 
 And try it with different IO schedulers by doing
 cat /sys/block/sdb/queue/scheduler
 echo $ONE_OF_THE_VALUES_FROM_THE_CAT > /sys/block/sdb/queue/scheduler
 
 Please be careful with the tarball. It is not stable, and just for a 
 quick performance test.
 

Oh yeah, could you also try
http://www.open-iscsi.org/bits/open-iscsi-2.0-868-rc1.tar.gz
this one fixes a perf problem we saw with some other targets.





Re: iSCSI Errors - SLES10 + Infortrend

2008-03-04 Thread Sparqz

Hi Mike,

 Could you try this tarball
 http://kernel.org/pub/linux/kernel/people/mnc/open-iscsi/releases/ope...

 And try it with different IO schedulers by doing
 cat /sys/block/sdb/queue/scheduler
 echo $ONE_OF_THE_VALUES_FROM_THE_CAT > /sys/block/sdb/queue/scheduler

 Please be careful with the tarball. It is not stable, and just for a
 quick performance test.

This tarball doesn't compile ): It may be because I don't have a
complete build environment on this development server - I have
uploaded the error messages for you to scan through, see if there is
anything obvious there.

Thanks,

Stuart.



Re: iSCSI Errors - SLES10 + Infortrend

2008-03-04 Thread Sparqz

FYI: If anyone is keen to have a look at my bugzilla submission at
SuSE

https://bugzilla.novell.com/show_bug.cgi?id=366492




Re: iSCSI Errors - SLES10 + Infortrend

2008-02-25 Thread Sparqz

Here's some output from tcpdump showing a bunch of duplicate ACKs.  I
don't think these are normal... unless anyone can tell me differently?

192.168.50.8 is the iSCSI target, 192.168.50.11 is the initiator

11:14:45.391670 IP 192.168.50.8.3260 > 192.168.50.11.58035: P 3143284:3144176(892) ack 193 win 61440
11:14:45.391674 IP 192.168.50.8.3260 > 192.168.50.11.58035: . 3144176:3145636(1460) ack 193 win 61440
11:14:45.391676 IP 192.168.50.11.58035 > 192.168.50.8.3260: . ack 3145636 win 32767
11:14:45.391681 IP 192.168.50.8.3260 > 192.168.50.11.58035: . 3145636:3147096(1460) ack 193 win 61440
11:14:45.391686 IP 192.168.50.8.3260 > 192.168.50.11.58035: P 3147096:3148320(1224) ack 193 win 61440
11:14:45.391689 IP 192.168.50.11.58035 > 192.168.50.8.3260: . ack 3148320 win 32767
11:14:45.391694 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391699 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391702 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391706 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391710 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391713 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391716 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391719 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391722 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391725 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391763 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391766 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391769 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.398114 IP 192.168.50.11.58035 > 192.168.50.8.3260: P 193:241(48) ack 3148320 win 32767
11:14:45.398211 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 241 win 61440
11:14:45.398636 IP 192.168.50.8.3260 > 192.168.50.11.58035: . 3148320:3149780(1460) ack 241 win 61440
11:14:45.398645 IP 192.168.50.8.3260 > 192.168.50.11.58035: . 3149780:3151240(1460) ack 241 win 61440
11:14:45.398649 IP 192.168.50.11.58035 > 192.168.50.8.3260: . ack 3151240 win 32767
11:14:45.398654 IP 192.168.50.8.3260 > 192.168.50.11.58035: . 3151240:3152464(1224) ack 241 win 61440
11:14:45.398658 IP 192.168.50.8.3260 > 192.168.50.11.58035: . 3152464:3153924(1460) ack 241 win 61440
11:14:45.398662 IP 192.168.50.11.58035 > 192.168.50.8.3260: . ack 3153924 win 32767
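
To get a rough count of such runs over a longer capture, something like the following could be used. It is a sketch only, assuming tcpdump's classic text output with one packet per line (time, "IP", source, ">", destination, flags, ...); the function name is made up, and it simply counts back-to-back pure ACKs that repeat the same ack number in the same direction.

```shell
#!/bin/bash
# Hypothetical helper: read tcpdump text on stdin and report, per
# direction and ack number, how many times a pure ACK was repeated
# immediately after an identical one.
count_dup_acks() {
    awk '
        # Pure ACK: flags field is "." and the next field is "ack".
        $6 == "." && $7 == "ack" {
            dst = $5
            sub(/:$/, "", dst)
            key = $3 " -> " dst " ack " $8
            if (key == prev) dups[key]++
            prev = key
            next
        }
        # Any other packet ends the current run.
        { prev = "" }
        END { for (k in dups) print dups[k], k }
    '
}

# Example:
#   tcpdump -n -r capture.pcap tcp port 3260 | count_dup_acks
```

This ignores window updates and anything else that can legitimately produce identical ACK lines, so Wireshark's own duplicate ACK analysis remains the better reference.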





Re: iSCSI Errors - SLES10 + Infortrend

2008-02-21 Thread Pasi Kärkkäinen

On Wed, Feb 20, 2008 at 06:09:50PM -0800, Sparqz wrote:
 
 Hi All,
 
 I have a nasty problem with open-iscsi on SLES10 + an Infortrend iSCSI
 array.
 
 Basically it looks like everything goes wrong as soon as the read/
 write load becomes heavy, although network dumps suggest the problem
 is always there, it just goes critical when the load is too heavy.
 
 My setup:
 
 1x HP DL585 - SLES10 x86_64
 1x HP DL585 - RHEL4 x86_64
 1x HP DL380 - SLES10 i586


SLES10 or SLES10SP1 ? 

Have you tried installing and using the latest open-iscsi from open-iscsi.org ?
 
 2x Cisco 2960G (gigabit) switches
 
 2x Infortrend A16E-G2130-4 with 16x 1TB disks each
 
 The two Infortrend arrays have all their gigabit ethernet ports
 plugged into one of the cisco switches, then we have 2 fibre
 connections leading to the other cisco switch which has the three
 servers plugged into it.  The network is completely isolated from our
 other company networks.
 

So you have only 2 gbit/sec of bandwidth between the Cisco switches? 

How many ethernet ports do your iSCSI arrays have (plugged in to the
switches)? 

How many ethernet ports is each server using / plugged in to the switch? 

 At first I thought it was a network problem, so we replaced our dodgy
 Netgear switches with quality Cisco networking gear, but the problem
 is the same, if anything it's worse because the Cisco switches
 facilitate higher bandwidth (extra ~20mb/s) and the errors seem to be
 more reliably producible.


Do you see packet drops/errors in any of the ports? Check all ports in both
switches. 
 
 None of the linux ethernet statistics report any errors (ifconfig) and
 the cisco switches don't report any packet errors either.  The
 Infortrend arrays don't provide ethernet statistics.


Check linux TCP statistics for tcp retransmits? netstat -s
 
 Wireshark (ethereal) shows many errors - clusters of Duplicate ACKs,
 and a few previous segment lost.
 

Are you using ethernet flow control? Check the switch settings, and server
NIC settings.. and possible iSCSI array settings.. 

In a bigger IP-SAN setup with many servers and switches flow control might be
needed to get a good performance and to prevent tcp retransmits from
happening (=preventing the switch port buffers becoming full and packet drop
happening). 

 Any help would be much appreciated !!!
 

Btw have you tried with ext3? XFS is known to have problems with some setups
and versions.. 

I'm not familiar with Infortrend iSCSI arrays so can't comment much about
them..

-- Pasi




Re: iSCSI Errors - SLES10 + Infortrend

2008-02-21 Thread Sparqz

Hi Pasi,

  My setup:

  1x HP DL585 - SLES10 x86_64
  1x HP DL585 - RHEL4 x86_64
  1x HP DL380 - SLES10 i586

 SLES10 or SLES10SP1 ?

SLES10SP1


 Have you tried installing and using the latest open-iscsi from open-iscsi.org 
 ?

  2x Cisco 2960G (gigabit) switches

  2x Infortrend A16E-G2130-4 with 16x 1TB disks each

  The two Infortrend arrays have all their gigabit ethernet ports
  plugged into one of the cisco switches, then we have 2 fibre
  connections leading to the other cisco switch which has the three
  servers plugged into it.  The network is completely isolated from our
  other company networks.

 So you have only 2 gbit/sec of bandwidth between the Cisco switches?

That's correct.  I've never seen the two links saturated together; the
most I've seen is ~95% on the first link and ~50% on the second.


 How many ethernet ports do your iSCSI arrays have (plugged in to the
 switches)?

Each iSCSI array has four 1Gbit ethernet ports, so all four ports are
connected on each array.


 How many ethernet ports each server is using / plugged in to the switch?

Each server has two 1Gbit ethernet ports - but only one port is used
on each server for iSCSI traffic, the other is for usual LAN traffic.


  At first I thought it was a network problem, so we replaced our dodgy
  Netgear switches with quality Cisco networking gear, but the problem
  is the same, if anything it's worse because the Cisco switches
  facilitate higher bandwidth (extra ~20mb/s) and the errors seem to be
  more reliably producible.

 Do you see packet drops/errors in any of the ports? Check all ports in both
 switches.

No drops and no errors on any of the ports on the servers or on the
switches.  There's no way to tell what is happening on the iSCSI
arrays.


  None of the linux ethernet statistics report any errors (ifconfig) and
  the cisco switches don't report any packet errors either.  The
  Infortrend arrays don't provide ethernet statistics.

 Check linux TCP statistics for tcp retransmits? netstat -s

Tcp:
9787 active connections openings
4964 passive connection openings
8 failed connection attempts
885 connection resets received
33 connections established
1903902036 segments received
3106760297 segments send out
2108006 segments retransmited
0 bad segments received.
1298 resets sent

Looks like there are...  any way to just pull the stats for eth1 ?
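
To put those counters in proportion, here is a rough sketch that works out the retransmit percentage from `netstat -s` text. The function name is made up, and the patterns assume the net-tools wording shown above (newer versions may spell the lines slightly differently, which the patterns try to allow for). These TCP counters cover the whole host rather than a single interface such as eth1.

```shell
#!/bin/bash
# Hypothetical helper: read `netstat -s` output on stdin and print
# retransmitted segments as a percentage of segments sent.
retrans_pct() {
    awk '
        /segments sen[dt] out/  { sent = $1 }
        /segments retransmi/    { retrans = $1 }
        END { if (sent > 0) printf "%.4f\n", 100 * retrans / sent }
    '
}

# Example:
#   netstat -s | retrans_pct
# With the figures above (2108006 of 3106760297) this prints 0.0679,
# i.e. roughly 0.07% of sent segments were retransmitted.
```

Since the counters are cumulative since boot, sampling them before and after a test run and comparing the differences would say more about the iSCSI load itself.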


  Wireshark (ethereal) shows many errors - clusters of Duplicate ACKs,
  and a few previous segment lost.

 Are you using ethernet flow control? Check the switch settings, and server
 NIC settings.. and possible iSCSI array settings..

Someone replied outside of the forum, suggesting I turn on flow
control.  It's made things a lot faster, but I still see problems with
packets, and eventually iscsi errors.


 In a bigger IP-SAN setup with many servers and switches flow control might be
 needed to get a good performance and to prevent tcp retransmits from
 happening (=preventing the switch port buffers becoming full and packet drop
 happening).

  Any help would be much appreciated !!!

 Btw have you tried with ext3? XFS is known to have problems with some setups
 and versions..

ext3 is worse in my experience.  Because our partitions are 1, 2 and 5TB
in size, XFS works better for us, especially in the case where the
partition has to be scanned for errors.  fsck takes hours on large
multi-terabyte arrays!  xfs_check takes only a few minutes.
Although, it could just be the amount of IO that fsck.ext3 does that
causes iscsi problems and delays etc.


 I'm not familiar with Infortrend iSCSI arrays so can't comment much about
 them..

I get that a lot )-;


 -- Pasi