Re: [ewg] [Fwd: Re: [ofa-general] Problem with latest OFED 1.3 build... IPoIB and iPATH]

2008-02-07 Thread Shirley Ma
On Thu, 2008-02-07 at 18:16 -0800, Ralph Campbell wrote:
> # cat /etc/*release
> Red Hat Enterprise Linux Server release 5 (Tikanga)
> # uname -r
> 2.6.18-8.el5
> 
> 4K PAGE_SIZE
I don't have the ipath driver here; otherwise I could try these out myself.

A couple of suggestions; could you please try them out?

1. Try this with a 64K page size, as on RHEL5U1, to see whether you have
the same issue.

2. Can you put a debug message in ipath_create_ah() to see whether this
is a memory allocation failure?

3. How many IB cards are in your system? If you have several, leave just
one ipath card in place to see whether you can still hit this problem.

Thanks
Shirley

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
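
Regarding suggestion 2 above, here is a minimal userspace sketch of what
such a debug message could distinguish. It assumes ipath_create_ah()
reports failures through the kernel's ERR_PTR() convention. The sketch_*
helpers and classify_ah_result() are hypothetical stand-ins written for
illustration, and the two errno values are examples, not a list of what
the driver actually returns.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). */
#define SKETCH_MAX_ERRNO 4095

static inline void *sketch_err_ptr(long error)
{
        return (void *)error;
}

static inline long sketch_ptr_err(const void *ptr)
{
        return (long)ptr;
}

static inline int sketch_is_err(const void *ptr)
{
        /* Error values occupy the top SKETCH_MAX_ERRNO addresses. */
        return (unsigned long)ptr >= (unsigned long)-SKETCH_MAX_ERRNO;
}

/* Hypothetical: the text a debug message next to the call could print. */
static inline const char *classify_ah_result(const void *ah)
{
        /* Calls following this convention return ERR_PTR(), never NULL. */
        assert(ah != NULL);

        if (!sketch_is_err(ah))
                return "ok";

        switch (sketch_ptr_err(ah)) {
        case -ENOMEM:
                return "memory allocation failure";
        case -EINVAL:
                return "invalid attributes";
        default:
                return "other error";
        }
}
```

Printing the decoded value at the failing call is what would separate an
allocation failure from a rejected attribute.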


Re: [ewg] [Fwd: Re: [ofa-general] Problem with latest OFED 1.3 build... IPoIB and iPATH]

2008-02-07 Thread Ralph Campbell
On Thu, 2008-02-07 at 08:29 -0800, Shirley Ma wrote:
> On Thu, 2008-02-07 at 18:16 -0800, Ralph Campbell wrote:
> > # cat /etc/*release
> > Red Hat Enterprise Linux Server release 5 (Tikanga)
> > # uname -r
> > 2.6.18-8.el5
> > 
> > 4K PAGE_SIZE
> I don't have the ipath driver here; otherwise I could try these out myself.
> 
> A couple of suggestions; could you please try them out?
> 
> 1. Try this with a 64K page size, as on RHEL5U1, to see whether you have
> the same issue.

We don't have any systems with 64K page size at hand.

> 2. Can you put a debug message in ipath_create_ah() to see whether this
> is a memory allocation failure?

I'm working on it.

> 3. How many IB cards are in your system? If you have several, leave just
> one ipath card in place to see whether you can still hit this problem.

Only one card, with one IB port.



Re: [ewg] [Fwd: Re: [ofa-general] Problem with latest OFED 1.3 build... IPoIB and iPATH]

2008-02-07 Thread Ralph Campbell
On Thu, 2008-02-07 at 08:11 -0800, Shirley Ma wrote:
> Hello Ralph,
> 
> What's ifconfig ib0 output?

# ifconfig ib0 ib-iqa-77 mtu 65520
ib0: bringing up interface
ib0: failed to create own ah
SIOCSIFFLAGS: Invalid argument

> > > We can reproduce the problem here.
> > > We haven't made any ib_ipath driver changes between RC3 and RC4
> > > so some recent patch has broken us.
> > > I'm in the process of looking at it.
> > > 
> > > On Wed, 2008-02-06 at 17:17 -0800, Arlin Davis wrote:
> > > > I cannot ifconfig ib0 on ipath with using the latest build
> > > > (ofed20080206).
> > > >  
> > > > ifup ib0
> > > > SIOCSIFFLAGS: Invalid argument
> > > > Failed to bring up ib0.
> > > > 
> > > > >>>  ib0: failed to create own ah
> 
> int ipoib_ib_dev_open(struct net_device *dev)
> {
>         struct ipoib_dev_priv *priv = netdev_priv(dev);
>         int ret;
> 
>         if (ib_find_pkey(priv->ca, priv->port, priv->pkey,
>                          &priv->pkey_index)) {
>                 ipoib_warn(priv, "P_Key 0x%04x not found\n",
>                            priv->pkey);
>                 clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
>                 return -1;
>         }
>         set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
> 
>         ret = create_own_ah(priv);
>         if (ret) {
>                 priv->own_ah = NULL;
>                 ipoib_warn(priv, "failed to create own ah\n");
>                 return -1;
>         }
> 
> It looks like the ipath driver returns an error from the create_own_ah()
> call. Are you sure there are no ipath driver changes between RC3 and RC4?

Yes.

> On which kernel did you hit this problem? What's the kernel PAGE_SIZE?

# cat /etc/*release
Red Hat Enterprise Linux Server release 5 (Tikanga)
# uname -r
2.6.18-8.el5

4K PAGE_SIZE
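
The page size quoted above can also be read programmatically. A minimal
sketch, assuming a POSIX system; kernel_page_size() and is_power_of_two()
are illustrative helper names, not part of any OFED tool.

```c
#include <assert.h>
#include <unistd.h>

/* Page size of the running kernel in bytes, or -1 on error. */
static inline long kernel_page_size(void)
{
        return sysconf(_SC_PAGESIZE);
}

/* True when size is a power of two, as 4K and 64K page sizes are. */
static inline int is_power_of_two(long size)
{
        return size > 0 && (size & (size - 1)) == 0;
}
```

On the system described here kernel_page_size() would be expected to
return 4096; on a 64K-page kernel it would return 65536.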



[ewg] Re: [Stgt-devel] Update (Re: open iSCSI over iSER target RPM ...)

2008-02-07 Thread FUJITA Tomonori
On Thu, 07 Feb 2008 11:05:03 -0500
Joe Landman <[EMAIL PROTECTED]> wrote:

> Update:
> 
> [EMAIL PROTECTED] etc]# dd if=/dev/zero of=/big/local.file bs=256k count=100000
> 100000+0 records in
> 100000+0 records out
> 26214400000 bytes (26 GB) copied, 58.7484 seconds, 446 MB/s
> 
> Better. I rebuilt OFED 1.2.5.5.  Are there specific recommended tuning 
> guides for iSER?  Backing store in this case are real disks, and we can 
> sink/source >750 MB/s on them, so I am not worried about disk IO 
> bottlenecks, more worried about bad config of iSCSI/iSER.
> 
> BTW:  the 2TB LUN limit I asked about is still here in this code.  Same 
> machines (initiator and target) used for SRP reported correct LUN sizes. 
>   Here we are using the -868 open-iscsi initiator, and the tgt RPM 
> announced.  I would like to dig into this.

Thanks a lot,

I thought I had tested tgt with >2TB devices, but it seems that I
hadn't. I'll try to fix the problem shortly.


Re: [ewg] [Fwd: Re: [ofa-general] Problem with latest OFED 1.3 build... IPoIB and iPATH]

2008-02-07 Thread Shirley Ma
Hello Ralph,

What's ifconfig ib0 output?

> > We can reproduce the problem here.
> > We haven't made any ib_ipath driver changes between RC3 and RC4
> > so some recent patch has broken us.
> > I'm in the process of looking at it.
> > 
> > On Wed, 2008-02-06 at 17:17 -0800, Arlin Davis wrote:
> > > I cannot ifconfig ib0 on ipath with using the latest build
> > > (ofed20080206).
> > >  
> > > ifup ib0
> > > SIOCSIFFLAGS: Invalid argument
> > > Failed to bring up ib0.
> > > 
> > > >>>  ib0: failed to create own ah

int ipoib_ib_dev_open(struct net_device *dev)
{
        struct ipoib_dev_priv *priv = netdev_priv(dev);
        int ret;

        if (ib_find_pkey(priv->ca, priv->port, priv->pkey,
                         &priv->pkey_index)) {
                ipoib_warn(priv, "P_Key 0x%04x not found\n",
                           priv->pkey);
                clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
                return -1;
        }
        set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);

        ret = create_own_ah(priv);
        if (ret) {
                priv->own_ah = NULL;
                ipoib_warn(priv, "failed to create own ah\n");
                return -1;
        }

It looks like the ipath driver returns an error from the create_own_ah()
call. Are you sure there are no ipath driver changes between RC3 and RC4?

On which kernel did you hit this problem? What's the kernel PAGE_SIZE?

thanks
Shirley



Re: [ewg][PATCH][0/2] SRP multipath failover within 60 seconds,

2008-02-07 Thread Vu Pham

David Dillow wrote:
> On Thu, 2008-02-07 at 08:18 +0200, Vladimir Sokolovsky wrote:
>> Vu Pham wrote:
>>> The following patches assist SRP/dm-multipath to failover within 60
>>> seconds (bugzilla #577) without data corruption, read/write error
> [snip]
>> Applied,
>> kernel_patches/fixes/srp_2_disconnect_without_wait.patch
>> kernel_patches/fixes/srp_3_qp_err_timer_reconnect_target.patch
>
> Are there plans for these (and the ones they build on) to make their way
> to the upstream kernel?

Are there any objections to merging these patches upstream?

Let me rework these patches and send them to Roland and the general list
for review.


  --vu


[ewg] [Fwd: Re: [ofa-general] Problem with latest OFED 1.3 build... IPoIB and iPATH]

2008-02-07 Thread Ralph Campbell
I forgot to CC EWG on my reply to Arlin Davis.
--- Begin Message ---
We can reproduce the problem here.
We haven't made any ib_ipath driver changes between RC3 and RC4
so some recent patch has broken us.
I'm in the process of looking at it.

On Wed, 2008-02-06 at 17:17 -0800, Arlin Davis wrote:
> I cannot ifconfig ib0 on ipath with using the latest build
> (ofed20080206).
>  
> ifup ib0
> SIOCSIFFLAGS: Invalid argument
> Failed to bring up ib0.
> 
> >>>  ib0: failed to create own ah
>  
> CA 'ipath0'
> CA type: InfiniPath_QLE7140
> Number of ports: 1
> Firmware version:
> Hardware version: 2
> Node GUID: 0x001175ffd75b
> System image GUID: 0x001175ffd75b
> Port 1:
> State: Active
> Physical state: LinkUp
> Rate: 10
> Base lid: 14
> LMC: 0
> SM lid: 1
> Capability mask: 0x02010800
> Port GUID: 0x001175ffd75b
>  
> It works fine on mthca adapters. Anyone else see this problem?
> 
> 
> -arlin
> 
> 
>  
> ___
> general mailing list
> [EMAIL PROTECTED]
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
--- End Message ---

Re: [ofa-general] Re: [ewg] OFED 1.3 rc4 update

2008-02-07 Thread Pradeep Satyanarayana
Eli Cohen wrote:
>> Ehca supports fewer than 16 s/g entries- hence the srq patch addresses that 
>> issue.
>> The sequence of steps that I followed for the touch test:
>> 1. On a freshly booted system, configure ib0 and assign an IP address
>> 2. Switch to connected mode and change mtu
>> 3. ping remote ib interface (already in CM mode)
>> 4. modprobe -r ib_ehca
>>
>> I see a series of cascading failures in /var/log/messages, starting with
>> the issue of not being able to destroy the cq (specifically rcq)
>>
> I followed the procedure you describe with an Arbel device. I changed the
> code so that it publishes 12 scatter entries for the SRQ. I did
> not see this problem, however, so I don't know how to debug this. Could it be
> a problem in the ehca driver?
> 
Hello Eli, 

Thanks for the update. We are continuing to investigate this issue.

Pradeep



Re: [ofa-general] Re: [ewg] OFED 1.3 rc4 update

2008-02-07 Thread Eli Cohen
> Ehca supports fewer than 16 s/g entries- hence the srq patch addresses that 
> issue.
> The sequence of steps that I followed for the touch test:
> 1. On a freshly booted system, configure ib0 and assign an IP address
> 2. Switch to connected mode and change mtu
> 3. ping remote ib interface (already in CM mode)
> 4. modprobe -r ib_ehca
>
> I see a series of cascading failures in /var/log/messages, starting with
> the issue of not being able to destroy the cq (specifically rcq)
>
I followed the procedure you describe with an Arbel device. I changed the
code so that it publishes 12 scatter entries for the SRQ. I did
not see this problem, however, so I don't know how to debug this. Could it be
a problem in the ehca driver?


Re: [ewg][PATCH][0/2] SRP multipath failover within 60 seconds,

2008-02-07 Thread David Dillow

On Thu, 2008-02-07 at 08:18 +0200, Vladimir Sokolovsky wrote:
> Vu Pham wrote:
> > The following patches assist SRP/dm-multipath to failover within 60 
> > seconds (bugzilla #577) without data corruption, read/write error
[snip]
> Applied,
> kernel_patches/fixes/srp_2_disconnect_without_wait.patch
> kernel_patches/fixes/srp_3_qp_err_timer_reconnect_target.patch

Are there plans for these (and the ones they build on) to make their way
to the upstream kernel?
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office




Re: [ofa-general] Re: [ewg] OFED 1.3 rc4 update

2008-02-07 Thread Pradeep Satyanarayana
Eli Cohen wrote:
>> This problem was seen on an ehca that supports SRQ.
>>
> 
> Could you tell me how many scatter entries ehca supports when working
> in SRQ mode? Also, please send any information I might need to mimic ehca
> behaviour on Mellanox devices. I would appreciate it if you could repeat the
> exact sequence of actions you use to reproduce this.

Hello Eli,

ehca supports fewer than 16 s/g entries; the SRQ patch addresses that
issue.
The sequence of steps that I followed for the touch test:
1. On a freshly booted system, configure ib0 and assign an IP address
2. Switch to connected mode and change the MTU
3. Ping the remote ib interface (already in CM mode)
4. modprobe -r ib_ehca

I see a series of cascading failures in /var/log/messages, starting with
the failure to destroy the CQ (specifically the rcq)

Pradeep
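
One way a single failed destroy can cascade, as a minimal sketch. It
assumes each object keeps a use count of its dependents and refuses
destruction with -EBUSY while that count is non-zero, in the manner of
the verbs layer's usecnt. The sketch_* types and functions are
hypothetical and are not the ehca or ib_core code.

```c
#include <assert.h>
#include <errno.h>

/* Minimal stand-in for a verbs object that other objects depend on. */
struct sketch_obj {
        int usecnt;     /* number of dependent objects still alive */
};

/* A QP depends on a PD, a CQ and an SRQ. */
struct sketch_qp {
        struct sketch_obj *pd;
        struct sketch_obj *cq;
        struct sketch_obj *srq;
};

static inline void sketch_create_qp(struct sketch_qp *qp,
                                    struct sketch_obj *pd,
                                    struct sketch_obj *cq,
                                    struct sketch_obj *srq)
{
        qp->pd = pd;
        qp->cq = cq;
        qp->srq = srq;
        pd->usecnt++;
        cq->usecnt++;
        srq->usecnt++;
}

static inline void sketch_destroy_qp(struct sketch_qp *qp)
{
        qp->pd->usecnt--;
        qp->cq->usecnt--;
        qp->srq->usecnt--;
}

/* Destruction is refused while anything still depends on the object. */
static inline int sketch_destroy_obj(const struct sketch_obj *obj)
{
        assert(obj->usecnt >= 0);
        return obj->usecnt ? -EBUSY : 0;
}
```

In this model a QP that is never torn down keeps the CQ, SRQ and PD
busy, so each later destroy fails in turn.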



[ewg] Update (Re: open iSCSI over iSER target RPM ...)

2008-02-07 Thread Joe Landman

Update:

[EMAIL PROTECTED] etc]# dd if=/dev/zero of=/big/local.file bs=256k count=100000
100000+0 records in
100000+0 records out
26214400000 bytes (26 GB) copied, 58.7484 seconds, 446 MB/s

Better. I rebuilt OFED 1.2.5.5.  Are there specific recommended tuning 
guides for iSER?  Backing store in this case are real disks, and we can 
sink/source >750 MB/s on them, so I am not worried about disk IO 
bottlenecks, more worried about bad config of iSCSI/iSER.
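
As a sanity check on the dd figures, here is a minimal sketch of the
arithmetic. It assumes a block size of 256 KiB and a count of 100000
blocks, the value consistent with the reported 26 GB total and 446 MB/s.
The helper names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Total bytes written by dd for a given block size and block count. */
static inline uint64_t dd_total_bytes(uint64_t block_size, uint64_t count)
{
        return block_size * count;
}

/* Throughput in decimal MB/s (10^6 bytes per second), as dd reports it. */
static inline double dd_rate_mb_per_s(uint64_t bytes, double seconds)
{
        assert(seconds > 0.0);
        return (double)bytes / seconds / 1e6;
}
```

With those inputs the total is 26214400000 bytes, and dividing by
58.7484 seconds gives a rate just above 446 MB/s.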


BTW:  the 2TB LUN limit I asked about is still here in this code.  Same 
machines (initiator and target) used for SRP reported correct LUN sizes. 
 Here we are using the -868 open-iscsi initiator, and the tgt RPM 
announced.  I would like to dig into this.


This is what I am getting in dmesg for this iSER target:

iscsi: registered transport (tcp)
iscsi: registered transport (iser)
iser: iser_connect:connecting to: 10.2.1.2, port 0xbc0c
iser: iser_cma_handler:event 0 conn 81024b9f69c0 id 810209748c00
iser: iser_cma_handler:event 2 conn 81024b9f69c0 id 810209748c00
iser: iser_create_ib_conn_res:setting conn 81024b9f69c0 cma_id 810209748c00: fmr_pool 81024bfb32c0 qp 8101cb16d600

iser: iser_cma_handler:event 9 conn 81024b9f69c0 id 810209748c00
iser: iscsi_iser_ep_poll:ib conn 81024b9f69c0 rc = 1
scsi13 : iSCSI Initiator over iSER, v.0.1
iser: iscsi_iser_conn_bind:binding iscsi conn 81021b65fa90 to iser_conn 81024b9f69c0

  Vendor: IET       Model: Controller        Rev: 0001
  Type:   RAID                               ANSI SCSI revision: 05
scsi 13:0:0:0: Attached scsi generic sg2 type 12
  Vendor: IET       Model: VIRTUAL-DISK      Rev: 0001
  Type:   Direct-Access                      ANSI SCSI revision: 05
sdc : very big device. try to use READ CAPACITY(16).
sdc : READ CAPACITY(16) failed.
sdc : status=1, message=00, host=0, driver=08
sdc : use 0xffffffff as device size
SCSI device sdc: 4294967296 512-byte hdwr sectors (2199023 MB)
sdc: Write Protect is off
sdc: Mode Sense: 79 00 00 08
SCSI device sdc: drive cache: write back
sdc : very big device. try to use READ CAPACITY(16).
sdc : READ CAPACITY(16) failed.
sdc : status=1, message=00, host=0, driver=08
sdc : use 0xffffffff as device size
SCSI device sdc: 4294967296 512-byte hdwr sectors (2199023 MB)
sdc: Write Protect is off
sdc: Mode Sense: 79 00 00 08
SCSI device sdc: drive cache: write back
 sdc: unknown partition table
sd 13:0:0:1: Attached scsi disk sdc
sd 13:0:0:1: Attached scsi generic sg3 type 0


and this is what we get in SRP

scsi6 : SRP.T10:0008F104039862A4
  Vendor: SCST_BIO  Model: vdisk0            Rev:  096
  Type:   Direct-Access                      ANSI SCSI revision: 04
sdc : very big device. try to use READ CAPACITY(16).
SCSI device sdc: 12693355130 512-byte hdwr sectors (6498998 MB)
sdc: Write Protect is off
sdc: Mode Sense: 6b 00 10 08
SCSI device sdc: drive cache: write back w/ FUA


This looks suspiciously like a 2^32 limit somewhere.
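
The 2^32 suspicion can be checked against the numbers in the two logs.
A minimal sketch, assuming the standard READ CAPACITY response layouts:
a 4-byte last LBA for the 10-byte command and an 8-byte last LBA for the
16-byte command, each followed by a 4-byte block length. The function
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Read a big-endian unsigned integer of n bytes (1 <= n <= 8). */
static inline uint64_t get_be(const uint8_t *p, int n)
{
        uint64_t v = 0;

        assert(n > 0 && n <= 8);
        for (int i = 0; i < n; i++)
                v = (v << 8) | p[i];
        return v;
}

/* READ CAPACITY(10): the sector count is the 32-bit last LBA plus one. */
static inline uint64_t sectors_from_rc10(const uint8_t resp[8])
{
        return get_be(resp, 4) + 1;
}

/* READ CAPACITY(16): the sector count is the 64-bit last LBA plus one. */
static inline uint64_t sectors_from_rc16(const uint8_t resp[12])
{
        return get_be(resp, 8) + 1;
}

/* Device size in bytes for a given sector count and block length. */
static inline uint64_t capacity_bytes(uint64_t sectors, uint32_t block_len)
{
        return sectors * block_len;
}
```

A last LBA of 0xffffffff, the largest value the 10-byte form can carry,
gives exactly 4294967296 sectors, or 2199023255552 bytes with 512-byte
sectors. That matches the iSER log. The SRP log's 12693355130 sectors
only fit in the 16-byte form.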


Our exported device is

[EMAIL PROTECTED] ~]# parted /dev/sdb print

Model: Areca jrvs1 (scsi)
Disk /dev/sdb: 6500GB
Sector size (logical/physical): 512B/512B
Partition Table: loop

Number  Start   End     Size    File system  Flags
 1      0.00kB  6500GB  6500GB  xfs


and this is what tgtadm reports

[EMAIL PROTECTED] ~]# tgtadm --lld iscsi --op show --mode target
Target 1: iqn.2001-04.com.jr1-jackrabbit.small
    System information:
        Driver: iscsi
        Status: running
    I_T nexus information:
        I_T nexus: 4
            Initiator: iqn.1996-04.voltaire.com:01:dfaa3fd
            Connection: 0
                RDMA IP Address: 10.2.1.1
    LUN information:
        LUN: 0
            Type: controller
            SCSI ID: deadbeaf1:0
            SCSI SN: beaf10
            Size: 0
            Online: No
            Poweron/Reset: Yes
            Removable media: No
            Backing store: No backing store
        LUN: 1
            Type: disk
            SCSI ID: deadbeaf1:1
            SCSI SN: beaf11
            Size: 5T
            Online: Yes
            Poweron/Reset: No
            Removable media: No
            Backing store: /dev/sdb
    Account information:
    ACL information:
        10.2.1.1

So it looks like the LUN 1 is approximately correct (5T ???) on the 
target, and incorrect when the initiator asks for it.


Please note that I have successfully used the full 6+TB as an iSCSI 
target using the SCST-iscsi code, so I do know that the initiator works 
correctly.


Is there a source RPM/tree for this target?

Joe Landman wrote:
> Hi Erez
>
> Erez Zilber wrote:
>> stgt (SCSI target) is an open-source framework for storage target
>> drivers. It supports iSCSI over iSER among other storage target drivers.
>>
>> Voltaire added a git tree for stgt that will be added to OFED 1.4:
>> http://www2.openfabrics.org/git/?p=~dorons/tgt.git;a=summary
>>
>> Until OFED 1.4 gets released, it is possible to install the stgt RPM on
>> top of OFED 1.3. For more details about how to install and u


[ewg] OFED 1.3 RC4 release is available

2008-02-07 Thread Tziporet Koren
Hi, 
OFED 1.3 RC4 release is available on 
http://www.openfabrics.org/builds/ofed-1.3/release/OFED-1.3-rc4.tgz 


To get BUILD_ID run ofed_info 

Please report any issues in bugzilla https://bugs.openfabrics.org/ 
The RC5 (Gold) release is expected on February 18


Tziporet & Vlad 




Release information: 
 
Linux Operating Systems:

   - RedHat EL4 up4:   2.6.9-42.ELsmp
   - RedHat EL4 up5:   2.6.9-55.ELsmp
   - RedHat EL4 up6:   2.6.9-67.ELsmp       *
   - RedHat EL5:       2.6.18-8.el5
   - RedHat EL5 up1:   2.6.18-53.el5
   - Fedora C6:        2.6.18-8.fc6         *
   - SLES10:           2.6.16.21-0.8-smp
   - SLES10 SP1:       2.6.16.46-0.12-smp
   - SLES10 SP1 up1:   2.6.16.53-0.16-smp
   - OpenSuSE 10.3:    2.6.22-*-*           *
   - kernel.org:       2.6.23 and 2.6.24

 * OSes that are partially tested

Systems: 
	* x86_64 
	* x86 
	* ia64 
	* ppc64 


Main Changes from OFED 1.3-RC3
=== 
* Fixed 13 Bugs (see attachment)

* MPI packages update: mvapich-1.0.0-1981.src.rpm
* Updated libraries:
 * uDAPL 2.0.6
 * libibcm 1.0.2
 * librdmacm 1.0.6
* IPoIB enhancements: 
 * Non-SRQ for CM mode
 * 4K MTU support
 * Enhancements to improve small messages BW

Tasks that should be completed for RC5: 
===
1. Fix critical and major bugs 
2. Update all documents


bug_id,bug_severity,op_sys,assigned_to,resolution,short_short_desc
794,normal,Other,[EMAIL PROTECTED],FIXED,Kernel panic while unload driver
883,normal,RHEL 4,[EMAIL PROTECTED],FIXED,"mvapich gets killed during alltoall, 32nodes"
884,normal,RHEL 4,[EMAIL PROTECTED],FIXED,mvapich doesn't report non-active ports
893,blocker,Other,[EMAIL PROTECTED],FIXED,Dynamic library support is broken
892,blocker,SLES 10,[EMAIL PROTECTED],FIXED,openibd does not remove cxgb3 module
897,critical,SLES 10,[EMAIL PROTECTED],FIXED,"traffic is jittery, send queue full reports from mthca"
891,critical,RHEL 4,[EMAIL PROTECTED],FIXED,ib_sa panics system when enabled
878,critical,Other,[EMAIL PROTECTED],FIXED,slow failover with bonding and connected mode
887,critical,All,[EMAIL PROTECTED],FIXED,IMB benchmark stuck
577,critical,All,[EMAIL PROTECTED],FIXED,"SRP multipath failover too slow (minutes, not seconds)"
761,major,Other,[EMAIL PROTECTED],FIXED,Poor and jittery UDP performance at small messages
889,minor,Other,[EMAIL PROTECTED],FIXED,Intel test stuck fortran-datatype-functional-MPI_Type_contiguous_idispls
888,minor,Other,[EMAIL PROTECTED],FIXED,OSU latency benchmark (old version with iteration and message size parameter) stuck sometime

Re: [ofa-general] Re: [ewg] OFED 1.3 rc4 update

2008-02-07 Thread Eli Cohen
>
> This problem was seen on an ehca that supports SRQ.
>

Could you tell me how many scatter entries ehca supports when working
in SRQ mode? Also, please send any information I might need to mimic ehca
behaviour on Mellanox devices. I would appreciate it if you could repeat the
exact sequence of actions you use to reproduce this.

thanks.


Re: [ofa-general] Re: [ewg] OFED 1.3 rc4 update

2008-02-07 Thread Pradeep Satyanarayana
Tziporet Koren wrote:
> Eli Cohen wrote:
>> I have tried to reproduce this using ib_mthca and mlx4_ib but
>> could not see the problem. Could you dig further into this and
>> provide more details?
>>
>>
>>   
> Please reproduce the issue on our HCAs since we do not have any ehca
> Note that Eli tried the code when using the non-SRQ path

This problem was seen on an ehca that supports SRQ.

Pradeep



Re: [ewg] Re: [ofa-general] [ANNOUNCE] open iSCSI over iSER target RPMis available

2008-02-07 Thread Joe Landman

Erez Zilber wrote:
>>> * READ: 920 MB/sec
>>> * WRITE: 850 MB/sec
>>
>> Not getting anything even remotely close to this.  Are there more
>> details on configuration somewhere?  I followed the web page as indicated.
>
> Are you running iSCSI over TCP or iSCSI over iSER (over InfiniBand)? Our
> results are with iSER.


I followed the instructions on the web pages that were pointed to for 
iSER.  Are there updated pages?  Is there a way to tell whether or not 
the RDMA path is being used?


Thanks.

Joe



--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web  : http://www.scalableinformatics.com
   http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


Re: [ofa-general] Re: [ewg] OFED 1.3 rc4 update

2008-02-07 Thread Tziporet Koren

Eli Cohen wrote:
> I have tried to reproduce this using ib_mthca and mlx4_ib but
> could not see the problem. Could you dig further into this and
> provide more details?

Please reproduce the issue on our HCAs, since we do not have any ehca.
Note that Eli tried the code using the non-SRQ path.

Tziporet


[ewg] MVAPICH1 1.0.0 SRPM available

2008-02-07 Thread Pavel Shamis (Pasha)

New srpm for MVAPICH1 was uploaded.
Please check ~pasha/ofed_1_3/ (see latest.txt for the build number)
Bugfix for: 883, 884, 888, 887, 889, 893

--
Pavel Shamis (Pasha)
Mellanox Technologies



Re: [ofa-general] Re: [ewg] OFED 1.3 rc4 update

2008-02-07 Thread Eli Cohen
> > I have downloaded the todays build mentioned above. I am still seeing the 
> > issue
> > of failing ib_destroy_cq() for the rcq mentioned yesterday.
> > 
> > Here are the steps that I follow:
> > 
> > 1. On a freshly booted system configure ib0
> > 2. Switch to connected mode ( on HCA that supports SRQ)
> > 3. ping remote interface
> > 4. modprobe -r ib_ehca
> > 5. I see the failures about ib_destroy_cq() failing and the
> > cascading failures following that (srq and pd cannot be destroyed)
> 
> The ib_destroy_qp() fails because the refcnt is not zero. On my
> system it was set to 2.
> 
> Pradeep
> 
I have tried to reproduce this using ib_mthca and mlx4_ib but
could not see the problem. Could you dig further into this and
provide more details?



Re: [ewg] Re: [ofa-general] [ANNOUNCE] open iSCSI over iSER target RPMis available

2008-02-07 Thread Erez Zilber

> > * READ: 920 MB/sec
> > * WRITE: 850 MB/sec
>
> Not getting anything even remotely close to this.  Are there more
> details on configuration somewhere?  I followed the web page as indicated.
>

Are you running iSCSI over TCP or iSCSI over iSER (over InfiniBand)? Our
results are with iSER.

Erez


[ewg] Re: traffic jittery, send queue full reports from mthca driver

2008-02-07 Thread Or Gerlitz

Eli Cohen wrote:
> On Thu, 2008-02-07 at 09:42 +0200, Or Gerlitz wrote:
>>>> ib_mthca 0000:03:00.0: SQ 000404 full (756910656 head, 756910592 tail, 64 max, 0 nreq)
>>>> ib0: failed to post zlen send
>>
>> OK, Eli, taking the kernel bits from OFED-1.3-20080206-0751.tgz I don't
>> see these prints any more. When probing out the driver in order to replace
>> it with the drop, I have got the following:
>>
>>   ib0: timing out; will leak address handles
>>   ib0: ib_dealloc_pd failed
>>
>> so, is it another issue or related to the room-for-zlen-in-ring-accounting fix?
>
> does it happen on mthca or connectx? Does it happen when running iperf
> in the way you reported in bugzilla?

it happened on mthca after I stopped iperf and did modprobe -r ipoib

Or




[ewg] Re: traffic jittery, send queue full reports from mthca driver

2008-02-07 Thread Eli Cohen

On Thu, 2008-02-07 at 09:42 +0200, Or Gerlitz wrote:
> >> ib_mthca 0000:03:00.0: SQ 000404 full (756910656 head, 756910592 tail, 64 max, 0 nreq)
> >> ib0: failed to post zlen send
> 
> OK, Eli, taking the kernel bits from OFED-1.3-20080206-0751.tgz I don't
> see these prints any more. When probing out the driver in order to replace
> it with the drop, I have got the following:
> 
>   ib0: timing out; will leak address handles
>   ib0: ib_dealloc_pd failed
> 
> so, is it another issue or related to the room-for-zlen-in-ring-accounting 
> fix?
> 

Or,

does it happen on mthca or connectx? Does it happen when running iperf
in the way you reported in bugzilla?



[ewg] Re: traffic jittery, send queue full reports from mthca driver

2008-02-07 Thread Eli Cohen

On Thu, 2008-02-07 at 09:42 +0200, Or Gerlitz wrote:
> >> ib_mthca 0000:03:00.0: SQ 000404 full (756910656 head, 756910592 tail, 64 max, 0 nreq)
> >> ib0: failed to post zlen send
> 
> OK, Eli, taking the kernel bits from OFED-1.3-20080206-0751.tgz I don't
> see these prints any more. When probing out the driver in order to replace
> it with the drop, I have got the following:
> 
>   ib0: timing out; will leak address handles
>   ib0: ib_dealloc_pd failed
> 
> so, is it another issue or related to the room-for-zlen-in-ring-accounting 
> fix?
> 

I am not sure but I will look into it.
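
For reference, the "SQ full" line quoted above can be read as a ring
occupancy check. A minimal sketch, assuming head and tail are
free-running 32-bit counters and that the queue counts as full once
head - tail reaches max. The struct and function names are hypothetical
and this is not the mthca code itself.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_wq {
        uint32_t head;  /* advanced when a work request is posted */
        uint32_t tail;  /* advanced when a completion is reaped */
        uint32_t max;   /* ring capacity in work requests */
};

/* Non-zero when posting one more request after nreq would not fit. */
static inline int sketch_wq_overflow(const struct sketch_wq *wq,
                                     uint32_t nreq)
{
        /* Unsigned subtraction stays correct across counter wraparound. */
        uint32_t cur = wq->head - wq->tail;

        assert(cur <= wq->max);
        return cur + nreq >= wq->max;
}
```

With the logged values (756910656 head, 756910592 tail, 64 max, 0 nreq)
the occupancy is 64, so the very first request of the post is rejected.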



Re: [ewg] with the ipoib patches, debug prints spam the system log

2008-02-07 Thread Eli Cohen

On Thu, 2008-02-07 at 10:01 +0200, Eli Cohen wrote:
> On Thu, 2008-02-07 at 09:48 +0200, Or Gerlitz wrote:
> > Or Gerlitz wrote:
> > > You have left somehow too many... debug prints in the last patches,
> > > please clean this up. See for example how the system log after less
> > > than a minute when ipoib debug prints are opened, it has one original
> > > print ("ib0: Send unicast ARP to 0023") and all the rest are yours.
> > 
> > > Feb  6 14:39:23  kernel: ib0: posting zlen send, wrid = 4: head = 2756, 
> > > tail = 2752
> > > Feb  6 14:39:23  kernel: ib0: ipoib_ib_tx_timer_func-427: head = 2757
> > 
> > Hi Eli,
> > 
> > Just a reminder to remove this for RC4, using last night snapshot I 
> > still see it.
> > 
> > Or.
> > 
> 
> I'll have to look at last night's build; the fix should have been in there already.

Sorry - it will be in the next build.



Re: [ewg] with the ipoib patches, debug prints spam the system log

2008-02-07 Thread Eli Cohen

On Thu, 2008-02-07 at 09:48 +0200, Or Gerlitz wrote:
> Or Gerlitz wrote:
> > You have left somehow too many... debug prints in the last patches,
> > please clean this up. See for example how the system log after less
> > than a minute when ipoib debug prints are opened, it has one original
> > print ("ib0: Send unicast ARP to 0023") and all the rest are yours.
> 
> > Feb  6 14:39:23  kernel: ib0: posting zlen send, wrid = 4: head = 2756, 
> > tail = 2752
> > Feb  6 14:39:23  kernel: ib0: ipoib_ib_tx_timer_func-427: head = 2757
> 
> Hi Eli,
> 
> Just a reminder to remove this for RC4, using last night snapshot I 
> still see it.
> 
> Or.
> 

I'll have to look at last night's build; the fix should have been in there already.
