Re: [ewg] RDS problematic on RC2

2008-01-16 Thread Olaf Kirch
On Thursday 17 January 2008 04:15, Johann George wrote:
> RDS/IB: completion on 10.1.1.205 had status 9, disconnecting and reconnecting
> 
> Note that this is using RDS over IB.  Our minimal experience with the
> non-IB version of RDS was worse.  We only tried it with RC1 and it
> crashed one of the two machines almost instantly.

Oh, and if you're using RDMA - does this happen to be with QLogic HCAs?
If so, I just received a patch from Ralph Campbell with some fixes to the
way we set up our DMA mapping.

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] [PATCH 1/2] fmr_pool flush serials can get out of sync

2008-01-16 Thread Olaf Kirch
On Wednesday 16 January 2008 22:54, Roland Dreier wrote:
> Thanks, good catch, and I applied this (except I removed the BUG_ON,
> since I don't think killing the system with minimal info available on
> how the counts got out of sync is that useful...)

Can you turn it into a rate-limited printk instead? I'd prefer
some indication that things are askew over memory corruption from
dangling MRs that I had thought long dead and gone.
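
As a rough illustration of the rate-limited warning asked for here, the
following is a minimal userspace model in C. It is only a sketch: in the
kernel one would normally guard the printk with printk_ratelimit(), and
the struct, the function names, and the message text below are all
hypothetical.

```c
#include <stdio.h>

/* Hypothetical userspace model of a rate limiter: at most `burst`
 * messages per `interval` seconds; the rest are counted, not printed. */
struct ratelimit {
    long window_start;  /* start of the current window, in seconds */
    long interval;      /* window length, in seconds */
    int burst;          /* messages allowed per window */
    int printed;        /* messages printed in this window */
    int suppressed;     /* messages dropped so far */
};

/* Returns 1 if the caller may print now, 0 if the message is suppressed. */
int ratelimit_ok(struct ratelimit *rl, long now)
{
    if (now - rl->window_start >= rl->interval) {
        rl->window_start = now;   /* open a new window */
        rl->printed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return 1;
    }
    rl->suppressed++;
    return 0;
}

/* Warn (instead of BUG_ON) when a flush request observed no flush. */
void warn_flush_out_of_sync(struct ratelimit *rl, long now, int flush_count)
{
    if (flush_count == 0 && ratelimit_ok(rl, now))
        fprintf(stderr, "fmr_pool: flush serials out of sync\n");
}
```

With burst = 2 and interval = 5, the third warning inside one window is
dropped and counted, and printing resumes once the window rolls over.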

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax


Re: [ewg] RDS problematic on RC2

2008-01-16 Thread Olaf Kirch
On Thursday 17 January 2008 04:15, Johann George wrote:
> We've been testing the OFED 1.3 pre-releases on a 12 node cluster here
> at UNH-IOL.  RDS seemed largely functional (other than problems we
> were aware of) on OFED 1.3 RC1.  When we installed RC2, RDS stopped
> working.  A dmesg indicates the following message repeatedly on the

Huh, scary. It works reasonably well here, though.

> console:
> 
> RDS/IB: completion on 10.1.1.205 had status 9, disconnecting and reconnecting

That's a remote invalid request error. Were you testing with
RDMA or without? What user application were you using for testing?
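
For reference, the numeric status in that log line can be decoded with
a small lookup table. The sketch below lists the first fourteen work
completion status values in the order the verbs headers define them
(enum ib_wc_status in the kernel, enum ibv_wc_status in libibverbs);
the table and the helper name wc_status_name are illustrative, not
part of RDS.

```c
#include <stddef.h>

/* Descriptions of work completion status codes 0..13, indexed by value. */
static const char *const wc_status_names[] = {
    "success",                          /*  0 */
    "local length error",               /*  1 */
    "local QP operation error",         /*  2 */
    "local EE context operation error", /*  3 */
    "local protection error",           /*  4 */
    "work request flushed error",       /*  5 */
    "memory window bind error",         /*  6 */
    "bad response error",               /*  7 */
    "local access error",               /*  8 */
    "remote invalid request error",     /*  9 */
    "remote access error",              /* 10 */
    "remote operation error",           /* 11 */
    "transport retry counter exceeded", /* 12 */
    "RNR retry counter exceeded",       /* 13 */
};

/* Map a numeric completion status to a description; codes outside the
 * table are reported as unknown rather than guessed. */
const char *wc_status_name(int status)
{
    size_t n = sizeof(wc_status_names) / sizeof(wc_status_names[0]);

    if (status < 0 || (size_t)status >= n)
        return "unknown status";
    return wc_status_names[status];
}
```

Calling wc_status_name(9) yields the "remote invalid request error"
named above.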

> Note that this is using RDS over IB.  Our minimal experience with the
> non-IB version of RDS was worse.  We only tried it with RC1 and it
> crashed one of the two machines almost instantly.

Yes, the TCP part of RDS isn't being looked after very much, unfortunately.

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax


Re: [ewg] [PATCH 1/2] fmr_pool flush serials can get out of sync

2008-01-16 Thread Olaf Kirch
Hi Roland,

On Wednesday 16 January 2008 22:54, Roland Dreier wrote:
> However I'm a little puzzled about how this can lead to memory
> corruption in practice: the only thing that flushing FMRs should do is
> make memory keys that should no longer be in use anyway become
> invalid.  So the only effect of this fix should be to expose a bug in
> your ULP by having some RDMA operation complete with a protection
> error -- and you're not relying on that behavior in normal operation,
> are you?  What am I missing?

The corruption happened when the process that allocated the MRs went
away in the middle of the operation. We would free the MR and invalidate
- and expect the in flight RDMA to error out. RDS does not know who is
doing RDMA to or from a MR at any given time.

There is a second potential issue however.

When RDS performs an RDMA, the initiator will queue two work requests -
one for the actual RDMA, immediately followed by a normal SEND with
a RDS packet. When the consumer sees that RDS packet, it will
release the MR to which the RDMA was directed.

Is that a safe thing to do? I found the spec a little unclear on
the ordering rules. It *seems* that RDMA writes are always fencing
against subsequent operations, and RDMA reads will fence if we ask
for it. But I'm not perfectly sure whether the ordering applies
to the sending system only, or if IB also guarantees that the
RDMA will have completed when it puts the incoming message on
the completion queue at the consumer.

If there is no such guarantee, then we have a second potential issue
in RDS wrt RDMA and memory corruption.

Thanks,
Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax


Re: [ewg] RDS problematic on RC2

2008-01-16 Thread Vladimir Sokolovsky

Johann George wrote:
> We've been testing the OFED 1.3 pre-releases on a 12 node cluster here
> at UNH-IOL.  RDS seemed largely functional (other than problems we
> were aware of) on OFED 1.3 RC1.  When we installed RC2, RDS stopped
> working.  A dmesg indicates the following message repeatedly on the
> console:
> 
> RDS/IB: completion on 10.1.1.205 had status 9, disconnecting and reconnecting
> 
> Note that this is using RDS over IB.  Our minimal experience with the
> non-IB version of RDS was worse.  We only tried it with RC1 and it
> crashed one of the two machines almost instantly.
> 
> Johann


Hi Johann,
Please open a bug in Bugzilla and add some info that will help in debugging
(OS, kernel, arch, test).

Thanks,
Vladimir


Re: [ewg] RE: [ofa-general] OFED Jan 14 meeting summary on RC2readiness

2008-01-16 Thread Roland Dreier
 > Roland, you said that XRC API is ugly, are you going to push it upstream
 > in its present form?

That's a good question.  Since there is no 'present form' for XRC as
far as I can tell, it's hard to make a definitive answer.  Certainly I
haven't made up my mind in advance one way or another.  In addition to
seeing how the code ends up, I think the other big piece of the puzzle
is to hear from the Open MPI team and other consumers of the API and
find out how big the benefit is.

 - R.


Re: [ewg] RDS problematic on RC2

2008-01-16 Thread Richard Frank

copying rds-dev.

Johann George wrote:
> We've been testing the OFED 1.3 pre-releases on a 12 node cluster here
> at UNH-IOL.  RDS seemed largely functional (other than problems we
> were aware of) on OFED 1.3 RC1.  When we installed RC2, RDS stopped
> working.  A dmesg indicates the following message repeatedly on the
> console:
> 
> RDS/IB: completion on 10.1.1.205 had status 9, disconnecting and reconnecting
> 
> Note that this is using RDS over IB.  Our minimal experience with the
> non-IB version of RDS was worse.  We only tried it with RC1 and it
> crashed one of the two machines almost instantly.
> 
> Johann


[ewg] RDS problematic on RC2

2008-01-16 Thread Johann George
We've been testing the OFED 1.3 pre-releases on a 12 node cluster here
at UNH-IOL.  RDS seemed largely functional (other than problems we
were aware of) on OFED 1.3 RC1.  When we installed RC2, RDS stopped
working.  A dmesg indicates the following message repeatedly on the
console:

RDS/IB: completion on 10.1.1.205 had status 9, disconnecting and reconnecting

Note that this is using RDS over IB.  Our minimal experience with the
non-IB version of RDS was worse.  We only tried it with RC1 and it
crashed one of the two machines almost instantly.

Johann


Re: [ewg] [PATCH 1/2] fmr_pool flush serials can get out of sync

2008-01-16 Thread Roland Dreier
 > Normally, the serial numbers for flush requests and flushes
 > executed should be in sync.
 > 
 > When we decide to flush dirty MRs because there are too many of them, we
 > wake up the cleanup thread and let it do its stuff.  As a side effect, it
 > increments pool->flush_ser, which leaves it one higher than req_ser. The
 > next time the user calls ib_flush_fmr_pool, it will wake up the cleanup
 > thread, but won't wait for the flush to complete. This can cause memory
 > corruption, as the user expects the flush to have taken place.

Thanks, good catch, and I applied this (except I removed the BUG_ON,
since I don't think killing the system with minimal info available on
how the counts got out of sync is that useful...)

However I'm a little puzzled about how this can lead to memory
corruption in practice: the only thing that flushing FMRs should do is
make memory keys that should no longer be in use anyway become
invalid.  So the only effect of this fix should be to expose a bug in
your ULP by having some RDMA operation complete with a protection
error -- and you're not relying on that behavior in normal operation,
are you?  What am I missing?

 - R.


[ewg] dapltest segfault after calling inet_ntoa

2008-01-16 Thread Allen Hubbe


Here is a typical command I use to run dapltest:
 dapltest -T T -s 10.1.1.202 -D OpenIB-cma \
-i 100 -t 1 -w 1 -R BE client SR 256

Starting with some of the OFED-1.3 releases, this gives me a segmentation 
fault.  I do not get a segmentation fault on OFED 1.2.5.5.


The offending lines of code seem to be at cmd/dapl_netaddr.c:136

DT_Mdep_printf ("Server Net Address: %s\n",
inet_ntoa(((struct sockaddr_in *)target->ai_addr)->sin_addr));

That section of code in OFED 1.2.5.5 is

DT_Mdep_printf ("Server Net Address: %d.%d.%d.%d\n",
(rval >>  0) & 0xff,
(rval >>  8) & 0xff,
(rval >> 16) & 0xff,
(rval >> 24) & 0xff);

The newer code seems to be written correctly.  But, the address returned 
by inet_ntoa is out of bounds of the running program.  Since that address 
should point to a static buffer, is it possible that a library isn't 
loading properly at run time?
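
One possible cause, which nothing in this thread confirms, is that the
file calling inet_ntoa does not include <arpa/inet.h>. Without the
prototype a C compiler may assume the function returns int, and on a
64-bit machine the returned pointer is then truncated to 32 bits, which
would produce exactly this kind of out-of-bounds address. The sketch
below shows the same formatting done with the header included and with
inet_ntop, which writes into a caller-supplied buffer; the helper name
format_server_addr is hypothetical and is not dapltest code.

```c
#include <arpa/inet.h>   /* declares inet_ntoa() and inet_ntop() */
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Format the IPv4 address held in sa as dotted-quad text in buf.
 * Returns buf on success, or NULL if sa is not an AF_INET address
 * or buf is too small. */
const char *format_server_addr(const struct sockaddr *sa,
                               char *buf, size_t len)
{
    const struct sockaddr_in *sin;

    if (sa == NULL || sa->sa_family != AF_INET)
        return NULL;
    sin = (const struct sockaddr_in *)sa;
    return inet_ntop(AF_INET, &sin->sin_addr, buf, (socklen_t)len);
}
```

A buffer of INET_ADDRSTRLEN bytes is large enough for any IPv4 address.
Building with -Wall (or -Werror=implicit-function-declaration on gcc)
would show whether the missing-prototype theory applies.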


This issue is present on several machines I am using.  There are two or 
three versions of OFED 1.3 in use, including rc2.  Can anyone else confirm 
this?  Should I submit a bug report?



Allen Hubbe <[EMAIL PROTECTED]>
Technician - UNH-IOL iWARP Consortium


[ewg] Re: [PATCH 2/2] fmr_pool_flush didn't flush all MRs

2008-01-16 Thread Olaf Kirch
From: Olaf Kirch <[EMAIL PROTECTED]>
Subject: [fmr_pool] fmr_pool_flush didn't flush all MRs

When a FMR is released via ib_fmr_pool_unmap, the FMR usually ends up
on the free_list rather than the dirty_list (because we allow a certain
number of remappings before actually requiring a flush).

However, ib_fmr_batch_release only looks at dirty_list when flushing
out old mappings. This can lead to memory corruption as the user
expects *all* old mappings to go away.

Signed-off-by: Olaf Kirch <[EMAIL PROTECTED]>
---
 drivers/infiniband/core/fmr_pool.c |   15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

Index: ofa_kernel-1.3/drivers/infiniband/core/fmr_pool.c
===
--- ofa_kernel-1.3.orig/drivers/infiniband/core/fmr_pool.c
+++ ofa_kernel-1.3/drivers/infiniband/core/fmr_pool.c
@@ -139,7 +139,7 @@ static inline struct ib_pool_fmr *ib_fmr
 static void ib_fmr_batch_release(struct ib_fmr_pool *pool)
 {
int ret;
-   struct ib_pool_fmr *fmr;
+   struct ib_pool_fmr *fmr, *next;
LIST_HEAD(unmap_list);
LIST_HEAD(fmr_list);
 
@@ -158,6 +158,19 @@ static void ib_fmr_batch_release(struct 
 #endif
}
 
+   /* The free_list may hold FMRs that have been put there
+* because they haven't reached the max_remap count. We want
+* to invalidate their mapping as well!
+*/
+   list_for_each_entry_safe(fmr, next, &pool->free_list, list) {
+   if (fmr->remap_count == 0)
+   continue;
+   hlist_del_init(&fmr->cache_node);
+   fmr->remap_count = 0;
+   list_add_tail(&fmr->fmr->list, &fmr_list);
+   list_move(&fmr->list, &unmap_list);
+   }
+
list_splice(&pool->dirty_list, &unmap_list);
INIT_LIST_HEAD(&pool->dirty_list);
pool->dirty_len = 0;

-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax


[ewg] [PATCH 1/2] fmr_pool flush serials can get out of sync

2008-01-16 Thread Olaf Kirch
From: Olaf Kirch <[EMAIL PROTECTED]>
Subject: [fmr_pool] fmr_pool flush serials can get out of sync

Normally, the serial numbers for flush requests and flushes
executed should be in sync.

When we decide to flush dirty MRs because there are too many of them, we
wake up the cleanup thread and let it do its stuff.  As a side effect, it
increments pool->flush_ser, which leaves it one higher than req_ser. The
next time the user calls ib_flush_fmr_pool, it will wake up the cleanup
thread, but won't wait for the flush to complete. This can cause memory
corruption, as the user expects the flush to have taken place.

Signed-off-by: Olaf Kirch <[EMAIL PROTECTED]>
---
 drivers/infiniband/core/fmr_pool.c |   16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

Index: ofa_kernel-1.3/drivers/infiniband/core/fmr_pool.c
===
--- ofa_kernel-1.3.orig/drivers/infiniband/core/fmr_pool.c
+++ ofa_kernel-1.3/drivers/infiniband/core/fmr_pool.c
@@ -182,8 +182,7 @@ static int ib_fmr_cleanup_thread(void *p
struct ib_fmr_pool *pool = pool_ptr;
 
do {
-   if (pool->dirty_len >= pool->dirty_watermark ||
-   atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) < 0) {
+   if (atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) < 0) {
ib_fmr_batch_release(pool);
 
atomic_inc(&pool->flush_ser);
@@ -194,8 +193,7 @@ static int ib_fmr_cleanup_thread(void *p
}
 
set_current_state(TASK_INTERRUPTIBLE);
-   if (pool->dirty_len < pool->dirty_watermark &&
-   atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) >= 0 &&
+   if (atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) >= 0 &&
!kthread_should_stop())
schedule();
__set_current_state(TASK_RUNNING);
@@ -397,6 +395,7 @@ EXPORT_SYMBOL(ib_destroy_fmr_pool);
  */
 int ib_flush_fmr_pool(struct ib_fmr_pool *pool)
 {
+   int flush_count = atomic_read(&pool->flush_ser);
int serial = atomic_inc_return(&pool->req_ser);
 
wake_up_process(pool->thread);
@@ -405,6 +404,9 @@ int ib_flush_fmr_pool(struct ib_fmr_pool
 atomic_read(&pool->flush_ser) - serial >= 0))
return -EINTR;
 
+   flush_count = atomic_read(&pool->flush_ser) - flush_count;
+   BUG_ON(flush_count == 0);
+
return 0;
 }
 EXPORT_SYMBOL(ib_flush_fmr_pool);
@@ -511,8 +513,10 @@ int ib_fmr_pool_unmap(struct ib_pool_fmr
list_add_tail(&fmr->list, &pool->free_list);
} else {
list_add_tail(&fmr->list, &pool->dirty_list);
-   ++pool->dirty_len;
-   wake_up_process(pool->thread);
+   if (++pool->dirty_len >= pool->dirty_watermark) {
+   atomic_inc(&pool->req_ser);
+   wake_up_process(pool->thread);
+   }
}
}
 

-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax


[ewg] Issues with fmr_pool

2008-01-16 Thread Olaf Kirch
Hi all,

I've been debugging a memory corruption in the RDS zerocopy code for
the past several days - basically, when we tear down a socket and destroy
any existing MRs, RDMA writes that are in progress continue well after
we've freed the MR and flushed the fmr_pool.

After chasing several schools of red herrings I think I understand the
problem. I believe there are two bugs in the fmr_pool code.

The first bug is this:

The fmr_pool has a per pool cleanup thread, which gets woken in two cases.
One, when there are too many FMRs on the dirty_list, and two, when the
user explicitly asked for a flush.

Now, ib_flush_fmr_pool synchronizes with the cleanup thread using two
atomic counters - one is a request serial number, which gets bumped
by ib_flush_fmr_pool, and the other is the flush serial number, which
gets incremented whenever the cleanup thread actually flushes something.
When the two are equal, we've flushed everything, and the cleanup thread
can go back to sleep.

Now the bad thing is, the two can get out of sync. When there are
too many FMRs on the dirty list, the cleanup thread will perform a
flush as well, and bump the flush serial number. The next time around
someone calls ib_flush_fmr_pool, the request serial number is incremented
and *is now equal* to the flush serial number - and nothing is flushed
at all.
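
The sequence can be replayed with a small single-threaded model. This
is a sketch under simplifying assumptions, not the kernel code: the
cleanup thread is reduced to a function call, the watermark is set to
1, and every name below (pool_model, cleanup_pass, and so on) is made
up for the illustration. The old_behaviour flag selects between the
logic described above and the logic of the proposed fix, in which the
watermark path bumps the request serial itself.

```c
/* Simplified model of the two serial counters in the fmr_pool. */
struct pool_model {
    int req_ser;          /* bumped for every flush request */
    int flush_ser;        /* bumped for every flush performed */
    int dirty_len;
    int dirty_watermark;
    int flushes;          /* total flushes actually performed */
};

/* One pass of the cleanup thread. */
static void cleanup_pass(struct pool_model *p, int old_behaviour)
{
    int want = (p->flush_ser - p->req_ser) < 0;

    /* Old logic: the watermark alone also triggers a flush. */
    if (old_behaviour && p->dirty_len >= p->dirty_watermark)
        want = 1;
    if (!want)
        return;
    p->dirty_len = 0;
    p->flushes++;
    p->flush_ser++;
}

/* Unmap that puts one FMR on the dirty list. */
static void unmap_dirty(struct pool_model *p, int old_behaviour)
{
    if (old_behaviour) {
        ++p->dirty_len;
        cleanup_pass(p, old_behaviour);      /* unconditional wake-up */
    } else if (++p->dirty_len >= p->dirty_watermark) {
        p->req_ser++;                        /* fix: count it as a request */
        cleanup_pass(p, old_behaviour);
    }
}

/* Explicit flush; returns how many flushes ran on its behalf. */
static int flush_pool(struct pool_model *p, int old_behaviour)
{
    int before = p->flushes;

    ++p->req_ser;
    cleanup_pass(p, old_behaviour);          /* wake the cleanup thread */
    return p->flushes - before;
}

/* Watermark flush followed by an explicit flush request. */
int flushes_seen_by_explicit_flush(int old_behaviour)
{
    struct pool_model p = { 0, 0, 0, 1, 0 };

    unmap_dirty(&p, old_behaviour);
    return flush_pool(&p, old_behaviour);
}
```

In this model flushes_seen_by_explicit_flush(1) returns 0, because the
earlier watermark flush already advanced flush_ser to the value the
request expects, while flushes_seen_by_explicit_flush(0) returns 1.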

The second bug (or maybe it's just a misunderstanding on my part) has
far worse consequences.

When we release a FMR using ib_fmr_pool_unmap, it will do one of two
things. If the fmr's remap_count is less than max_remaps, it will
be added to the free_list right away. If it has reached max_remaps, it
will be added to the dirty_list.

Now when the user calls ib_flush_fmr_pool, it will only inspect the
dirty list, but leave the free_list alone. So all the while we *think*
we have invalidated all FMRs freed previously, most of them will stay
active because they're not inspected *at all*. So ib_flush_fmr_pool does
nothing 31 out of 32 times (32 is the default max_remaps value).
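
The 31-out-of-32 figure can be checked with a toy model of a single
FMR that is mapped, unmapped, and then explicitly flushed in a loop.
This is a sketch, not the pool code: MAX_REMAPS mirrors the default
max_remaps of 32 mentioned above, and the function name and the
flush_walks_free_list switch are invented for the illustration.

```c
enum { MAX_REMAPS = 32 };   /* default max_remaps, per the text above */

/* Run `cycles` rounds of map, unmap, explicit flush on one FMR and
 * count the rounds after which the old mapping is still live.
 * flush_walks_free_list = 0 models a flush that only looks at the
 * dirty list; 1 models a flush that also invalidates FMRs parked on
 * the free list with a nonzero remap_count. */
int stale_after_flush(int cycles, int flush_walks_free_list)
{
    int remap_count = 0;
    int stale = 0;

    for (int i = 0; i < cycles; i++) {
        int on_dirty_list;

        remap_count++;                               /* map */
        on_dirty_list = remap_count >= MAX_REMAPS;   /* unmap */

        if (on_dirty_list || flush_walks_free_list)
            remap_count = 0;    /* flush really invalidated the mapping */
        else
            stale++;            /* flush skipped it: mapping survives */
    }
    return stale;
}
```

stale_after_flush(32, 0) returns 31, matching the estimate above, and
stale_after_flush(32, 1) returns 0.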

I will post two patches for these issues in follow-up emails. In general
however I wonder if the fmr_pool interface is really optimal. The major
concern I have is that the whole page pinning, mapping and unmapping
business is the caller's responsibility, but we don't know when the
underlying MR really goes away. So in order to be on the safe side,
the caller has to keep any pages mapped and pinned until the next
call to flush_fmr_pool. IMHO it would be very useful if there was a
callback function that lets you know that a particular MR was
zapped. I guess something like this could be engineered using the
flush_function, but that's really a very spartan interface, and requires
you to keep your deceased MRs on yet another list for later disposal.
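
To make the callback idea concrete, here is a purely hypothetical
sketch of what such an interface could look like. None of these types
or functions exist in the fmr_pool API; mr_model, mr_zapped_fn and
pool_flush_model are names invented for the example, and a counter
stands in for the real work of unpinning pages.

```c
#include <stddef.h>

/* Hypothetical per-MR notification: called once the mapping is gone. */
typedef void (*mr_zapped_fn)(void *cookie);

struct mr_model {
    int mapped;              /* nonzero while the mapping is live */
    mr_zapped_fn on_zapped;  /* may be NULL */
    void *cookie;            /* owner data, e.g. the pinned page list */
};

/* Invalidate every live mapping and tell its owner about it.
 * Returns the number of mappings that were invalidated. */
size_t pool_flush_model(struct mr_model *mrs, size_t n)
{
    size_t zapped = 0;

    for (size_t i = 0; i < n; i++) {
        if (!mrs[i].mapped)
            continue;
        mrs[i].mapped = 0;
        if (mrs[i].on_zapped)
            mrs[i].on_zapped(mrs[i].cookie);
        zapped++;
    }
    return zapped;
}

/* Example owner callback: a real one would unpin and unmap pages;
 * this one only counts invocations. */
void count_unpin(void *cookie)
{
    ++*(int *)cookie;
}
```

With a callback like this the caller would no longer need a separate
list of released MRs; it could drop its page references from inside
the callback.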

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax


RE: [ewg] Re: Can you send explanation how to work with bonding and the standard bonding setting

2008-01-16 Thread Scott Weitzenkamp (sweitzen)
Or,

I don't see /sbin/call_ifenslave in my OFED-1.3-20080115-0600 ib-bonding
package.

[EMAIL PROTECTED] ~]# uname -a
Linux svbu-qa1850-1 2.6.9-55.ELsmp #1 SMP Fri Apr 20 16:36:54 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
[EMAIL PROTECTED] ~]# rpm -ql ib-bonding
/lib/modules/2.6.9-55.ELsmp/updates/kernel/drivers/net/bonding/bonding.ko
/usr/bin/ib-bond
/usr/share/doc/ib-bonding-0.9.0/ib-bonding.txt

Scott

 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Or Gerlitz
> Sent: Wednesday, January 16, 2008 2:32 AM
> To: [EMAIL PROTECTED]
> Cc: Moni Levy; EWG
> Subject: [ewg] Re: Can you send explanation how to work with 
> bonding and the standard bonding setting
> 
> Tziporet Koren wrote:
> > We wish to test in this way and also add this explanation to OFED docs
> 
> sure, it's all documented in the ib-bonding.txt file that comes with the
> ib-bonding package, so I suggest just to add a note in the docs pointing
> to it.
> 
> Or.
> 
> > # rpm -ql ib-bonding
> > /etc/sysconfig/network-scripts/ifup-pre-ibbond
> > /lib/modules/2.6.9-55.ELsmp/updates/kernel/drivers/net/bonding/bonding.ko
> > /sbin/call_ifenslave
> > /usr/bin/ib-bond
> > /usr/share/doc/ib-bonding-0.9.0/ib-bonding.txt
> > /usr/share/doc/ib-bonding-0.9.0/redhat4-initscripts.patch
> > /usr/share/doc/ib-bonding-0.9.0/series
> > 
> 
> 


[ewg] RE: [ofa-general] OFED 1.3 RC2 release is available

2008-01-16 Thread Scott Weitzenkamp (sweitzen)
Isn't RHEL4 up6 supported, too?

I have added Version 1.3rc2 to bugzilla.

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems


 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> Tziporet Koren
> Sent: Wednesday, January 16, 2008 8:23 AM
> To: ewg@lists.openfabrics.org
> Cc: [EMAIL PROTECTED]
> Subject: [ofa-general] OFED 1.3 RC2 release is available
> 
> 
> Hi, 
> OFED 1.3 RC2 release is available on 
> http://www.openfabrics.org/builds/ofed-1.3/release/OFED-1.3-rc2.tgz
> 
> To get BUILD_ID run ofed_info 
> 
> Please report any issues in bugzilla https://bugs.openfabrics.org/ 
> The RC3 release is expected on January 30
> 
> Tziporet & Vlad 
> 
> 
> 
> ==
> ==
> 
> Release information: 
>  
> OS support: 
>   Novell: 
>   - SLES10 
>   - SLES10 SP1 and up1 
>   Redhat: 
>   - Redhat EL4 up4 and up5
>   - Redhat EL5 and up1 
>   kernel.org: 
>   - 2.6.23 and 2.6.24-rc5
> 
> Compilation only checks:
>   - Fedora Core 6
>   - openSuSE 10.3
>   - Redhat EL4 up6
>  
> Systems: 
>   * x86_64 
>   * x86 
>   * ia64 
>   * ppc64 
> 
> Main Changes from OFED 1.3-RC1
> === 
> * Fixed 21 Bugs (see attachment)
> * Added support for RHEL4.6 and openSuSE10.3 
> * Install: Added vendor's pre/post install scripts support
> * MPI packages update: 
>   *   openmpi-1.2.5-1
>   *   mvapich-1.0.0-1844
>   *   mvapich2-1.0.1-2
>   *   Added support for Qlogic new HCA: 
> 
> Specific module changes:
> 
> ULPs:
> -
> SDP:
> * Executing netperf with TCP_CORK enabled never ends
> * poll() always returns POLLOUT on non-blocking socket
> * SDP connect() only allows AF_INET (2), not AF_INET_SDP (27)
> iSER: 
> * Separate open-iscsi and iSER patches for different distros
> IPoIB:
> * Fix IPOIB LSO support: turn on the QP_CREATE_LSO flag to let the
> hw layer know and take proper actions
> 
> Libraries:
> --
> libibverbs: 
> * Preserve backwards binary compatibility.
> librdmacm:
> * Release 1.0.5
> 
> Utilities:
> --
> Opensm: 
> * Fixing core dump in fat-tree routing
> * Use valid pkey index value for gsi mads
> * osm_sa_slvl_record: fix overflow crash
> * Fixing a seg. fault in processing mcast groups
> * mcast mgr improvements
> * QoS policy - increased stability
> mstflint: 
> * Convert project to autoconf tools
> Performance tests: 
> * Fix bug rdma_lat.c. Messages up to 400 bytes will be sent Inline
> * Added multicast support to ib_send_bw and ib_send_lat tests
> Diagnostic tools:
> * Enhanced saquery to support: 
>   VLArb and PKey Table Records
>   Ports with LinkRecord query
>   SL2VLTableRecord attribute
>   Attribute names support
> * checkerrors: fix port errors count and query only single ports
> in CAs
> ibutils:
> * vsGetGeneralInfo function now dumps the correct data
> * Fixed stack-smashing bug in ibis gid typemaps, which could cause
> crashes on ppc64
> 
> Low level drivers:
> --
> mlx4: 
> * max_recv_wr must be > 0 for non-SRQ QPs. 
> * Fix the value of the pkey_index in the completion to get a valid
> value for GSI QPs.
> * Do not use memcpy when copying to the BlueFlame buffer
> * Fix pkey_index processing in cq polling
> mthca: 
> * Ensure an Rx WQE is in memory before linking
> cxgb: 
> * library release 1.1.2
> ipath:
> * Added support for the new HCA iba7220
> Nes: 
> * fix virtual WQ mapping and size
> 
> 
> Tasks that should be completed for RC3: 
> == 
> 1. XRC enhanced API 
> 2. Fix bugs 
> 
> 


[ewg] OFED 1.3 RC2 release is available

2008-01-16 Thread Tziporet Koren

Hi, 
OFED 1.3 RC2 release is available on 
http://www.openfabrics.org/builds/ofed-1.3/release/OFED-1.3-rc2.tgz

To get BUILD_ID run ofed_info 

Please report any issues in bugzilla https://bugs.openfabrics.org/ 
The RC3 release is expected on January 30

Tziporet & Vlad 





Release information: 
 
OS support: 
Novell: 
- SLES10 
- SLES10 SP1 and up1 
Redhat: 
- Redhat EL4 up4 and up5
- Redhat EL5 and up1 
kernel.org: 
- 2.6.23 and 2.6.24-rc5

Compilation only checks:
- Fedora Core 6
- openSuSE 10.3
- Redhat EL4 up6
 
Systems: 
* x86_64 
* x86 
* ia64 
* ppc64 

Main Changes from OFED 1.3-RC1
=== 
*   Fixed 21 Bugs (see attachment)
*   Added support for RHEL4.6 and openSuSE10.3 
*   Install: Added vendor's pre/post install scripts support
*   MPI packages update: 
*   openmpi-1.2.5-1
*   mvapich-1.0.0-1844
*   mvapich2-1.0.1-2
*   Added support for Qlogic new HCA: 

Specific module changes:

ULPs:
-
SDP:
*   Executing netperf with TCP_CORK enabled never ends
*   poll() always returns POLLOUT on non-blocking socket
*   SDP connect() only allows AF_INET (2), not AF_INET_SDP (27)
iSER: 
*   Separate open-iscsi and iSER patches for different distros
IPoIB:
*   Fix IPOIB LSO support: turn on the QP_CREATE_LSO flag to let the
hw layer know and take proper actions

Libraries:
--
libibverbs: 
*   Preserve backwards binary compatibility.
librdmacm:
*   Release 1.0.5

Utilities:
--
Opensm: 
*   Fixing core dump in fat-tree routing
*   Use valid pkey index value for gsi mads
*   osm_sa_slvl_record: fix overflow crash
*   Fixing a seg. fault in processing mcast groups
*   mcast mgr improvements
*   QoS policy - increased stability
mstflint: 
*   Convert project to autoconf tools
Performance tests: 
*   Fix bug rdma_lat.c. Messages up to 400 bytes will be sent Inline
*   Added multicast support to ib_send_bw and ib_send_lat tests
Diagnostic tools:
*   Enhanced saquery to support: 
VLArb and PKey Table Records
Ports with LinkRecord query
SL2VLTableRecord attribute
Attribute names support
*   checkerrors: fix port errors count and query only single ports
in CAs
ibutils:
*   vsGetGeneralInfo function now dumps the correct data
*   Fixed stack-smashing bug in ibis gid typemaps, which could cause
crashes on ppc64

Low level drivers:
--
mlx4: 
*   max_recv_wr must be > 0 for non-SRQ QPs. 
*   Fix the value of the pkey_index in the completion to get a valid
value for GSI QPs.
*   Do not use memcpy when copying to the BlueFlame buffer
*   Fix pkey_index processing in cq polling
mthca: 
*   Ensure an Rx WQE is in memory before linking
cxgb: 
*   library release 1.1.2
ipath:
*   Added support for the new HCA iba7220
Nes: 
*   fix virtual WQ mapping and size


Tasks that should be completed for RC3: 
== 
1. XRC enhanced API 
2. Fix bugs 



rc2-fixed-bugs.csv
Description: rc2-fixed-bugs.csv

[ewg] Re: [GIT PULL] ~sashak/management.git

2008-01-16 Thread Vladimir Sokolovsky

Sasha Khapyorsky wrote:

Hi Vlad,

Please pull recent ofed_1_3 branch of ~sashak/management.git.

The changes are:


Sasha Khapyorsky (4):
  infiniband-diags/configure.in: complib doesn't have opensm dependencies anymore
  opensm/perfmgr: use pkey at index 0
  libibumad: increase the version of the library
  libibumad/man: umad_get_pkey man page

Yevgeny Kliteynik (2):
  opensm/osm_ucast_ftree.c: fixing coredump in fat-tree routing
  opensm/osm_ucast_ftree.c: cosmetics in log messages

Thanks,
Sasha



Done,

Vladimir


[ewg] Re: Can you send explanation how to work with bonding and the standard bonding setting

2008-01-16 Thread Or Gerlitz

Tziporet Koren wrote:

We wish to test in this way and also add this explanation  to OFED docs


sure, it's all documented in the ib-bonding.txt file that comes with the
ib-bonding package, so I suggest just to add a note in the docs pointing
to it.


Or.


# rpm -ql ib-bonding
/etc/sysconfig/network-scripts/ifup-pre-ibbond
/lib/modules/2.6.9-55.ELsmp/updates/kernel/drivers/net/bonding/bonding.ko
/sbin/call_ifenslave
/usr/bin/ib-bond
/usr/share/doc/ib-bonding-0.9.0/ib-bonding.txt
/usr/share/doc/ib-bonding-0.9.0/redhat4-initscripts.patch
/usr/share/doc/ib-bonding-0.9.0/series



