Re: [ofa-general] [PATCH] IB/ipoib: ignore membership bit when looking for a P_Key in the table

2007-07-24 Thread Or Gerlitz

Hal Rosenstock wrote:
> On 7/23/07, Moni Shoua [EMAIL PROTECTED] wrote:
>> Hal Rosenstock wrote:
>>>> -   if (pkey == tmp_pkey) {
>>>> +   if ((pkey & 0x7fff) == (tmp_pkey & 0x7fff)) {
>>>
>>> Wouldn't this allow 2 limited PKeys to match though ?
>>
>> Hi Hal,
>> Can you please explain what do you mean? Perhaps by example?
>
> Two Pkeys which have their full membership bit (0x8000) off. Two
> limited members are not allowed to talk with each other.


Hal,

ib_find_pkey() is the buddy of ib_find_cached_pkey() which is in the 
stack from day one. Now, ib_find_cached_pkey does some abstraction where 
it masks out the membership bit, so pkeys are matched in 15 bit fashion.
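
To make the two notions in this thread concrete, here is a minimal sketch of a 15-bit P_Key match next to the stricter "may these two members talk" rule Hal describes. The function and macro names are made up for illustration and are not the kernel's ib_find_pkey() or ib_find_cached_pkey(); only the 0x7fff / 0x8000 split comes from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER_BIT 0x8000u /* set: full member, clear: limited member */
#define PKEY_BASE_MASK       0x7fffu /* the 15-bit partition number */

/* True when both P_Keys name the same partition, membership bit ignored. */
static inline bool pkey_same_partition(uint16_t a, uint16_t b)
{
    return (a & PKEY_BASE_MASK) == (b & PKEY_BASE_MASK);
}

static inline bool pkey_is_full_member(uint16_t pkey)
{
    return (pkey & PKEY_FULL_MEMBER_BIT) != 0;
}

/* Same partition AND at least one side is a full member:
 * two limited members are not allowed to talk with each other. */
static inline bool pkey_can_communicate(uint16_t a, uint16_t b)
{
    return pkey_same_partition(a, b) &&
           (pkey_is_full_member(a) || pkey_is_full_member(b));
}

/* Hypothetical table lookup in "15 bit fashion": returns the first index
 * whose entry is in the same partition as pkey, or -1 if there is none. */
int find_pkey_index(const uint16_t *table, size_t len, uint16_t pkey)
{
    assert(table != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if (pkey_same_partition(table[i], pkey))
            return (int)i;
    return -1;
}
```

With a table holding 0xffff and 0x0001, looking up 0x8001 finds index 1 because the membership bit is ignored, while pkey_can_communicate(0x0001, 0x0001) is still false. The lookup and the communication rule are separate questions, which is the distinction the thread is circling around.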


Indeed, the overall design of the IB stack wrt partial membership in 
a partition is neither perfect nor final. I don't see why this masking off 
makes things worse than they would have been without it.


As you know, as some changes need to be done in the IB spec and the 
IPoIB RFC, I am personally holding off with suggesting changes/fixes 
till the spec is done, this is per the approach expressed by you and Sean.


Or.

___
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[ofa-general] ofa_1_2_kernel 20070724-0100 daily build status

2007-07-24 Thread Vladimir Sokolovsky
This email was generated automatically, please do not reply


git_url: git://git.openfabrics.org/ofed_1_2/linux-2.6.git
git_branch: ofed_1_2

Common build parameters:   --with-ipoib-mod --with-sdp-mod --with-srp-mod 
--with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod 
--with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.12
Passed on x86_64 with linux-2.6.20
Passed on powerpc with linux-2.6.19
Passed on powerpc with linux-2.6.18
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.15
Passed on ia64 with linux-2.6.18
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.15
Passed on powerpc with linux-2.6.12
Passed on x86_64 with linux-2.6.12
Passed on ppc64 with linux-2.6.19
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.14
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.18
Passed on x86_64 with linux-2.6.18
Passed on powerpc with linux-2.6.14
Passed on x86_64 with linux-2.6.13
Passed on ia64 with linux-2.6.19
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.14
Passed on powerpc with linux-2.6.13
Passed on ia64 with linux-2.6.21.1
Passed on ppc64 with linux-2.6.12
Passed on x86_64 with linux-2.6.17
Passed on ia64 with linux-2.6.13
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.13
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.17
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.14
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on ppc64 with linux-2.6.18-8.el5
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.18-8.el5

Failed:


Re: [ofa-general] Re: IPoIB path caching

2007-07-24 Thread Or Gerlitz

Sean Hefty wrote:

What I have in mind is that IPoIB must not use cached IB path info.


If the IB stack has path caching which is in the default flow of 
requesting a path record, it should provide an API (eg flag to the 
function through which one does path query) to request a non cached path.


Argh!  This was the original design.  I believe the current design is a 
better approach.  The ULP shouldn't care whether the PR is cached or not 
- only that it's usable.


Linux has a quite sophisticated mechanism to maintain / cache / probe / 
invalidate / update the network stack L2 neighbour info.


Stating that although the neighbour cache state machine decided to 
update/delete a neighbour it is just correct by design for IPoIB to use 
 cached IB L2 info is somehow moving too fast I think, some discussion 
is needed here.


My basic thought is that for IPoIB it's better to never use a cached path 
than to always use a cached path. But maybe there's a middle way 
here, let's think. This is what I was referring to when saying almost 
always.


For example, in the Voltaire gen1 stack we had an ib arp module which 
was used by both IPoIB and native IB ULPs (SDP, iSER, Lustre, etc). This 
module managed some sort of path cache, where IPoIB was always asking for 
a non-cached path and other ULPs were willing to get a cached path.


The design I was thinking to suggest for IPoIB is to almost always use 
this API since this policy makes the implementation consistent with 
the decisions made by the network stack neighbour cache


This defeats one of the benefit of caching, which is using a single 
GetTable query, versus literally hundreds or thousands of Get queries. 
Consider that constant all-to-all communication using IPoIB between 1024 
ports, with a 15 minute ARP table timeout would hit the SA with close to 
600 queries per second.
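
A back-of-the-envelope sketch of that load figure follows. It assumes (this is an assumption, not something stated in the thread) that every unordered pair of ports needs one path record query per ARP timeout period; the function name is made up for illustration.

```c
#include <assert.h>

/* Estimated SA path-record queries per second for all-to-all traffic,
 * ASSUMING one query per unordered pair of ports per refresh period. */
double sa_queries_per_second(unsigned ports, double refresh_seconds)
{
    assert(refresh_seconds > 0.0);
    if (ports < 2)
        return 0.0;
    double pairs = (double)ports * (double)(ports - 1) / 2.0;
    return pairs / refresh_seconds;
}
```

Under that assumption, sa_queries_per_second(1024, 15 * 60) gives 523776 / 900, about 582 queries per second, which is in line with the "close to 600" quoted above. Counting each direction separately would double it.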


If the cache is there to serve all-to-all MPI jobs: practically, with IB, 
to get MPI performance (specifically latency) people would --not-- be 
using IPoIB for their MPI jobs, since they want kernel AND net-stack 
bypass. So it does make sense to use non-cached paths in IPoIB if we agree 
that design-wise it's the correct approach.


While I agree that there's the potential for a problem, given that IPoIB 
has always cached PRs and no one has reported problems, I think we're 
overstating the likelihood of issues occurring in practice.  Even the SA 
caches the path data -- getting a PR from the SA doesn't provide any 
additional guarantees.


I am not with you... I would expect an SA implementation to invalidate / 
recompute the relevant data structures associated with each change in 
the fabric, and to get a trap for each change.


Or.




[ofa-general] Re: opensm: a bug in heavy sweep? - no LFT re-configuration

2007-07-24 Thread Hal Rosenstock

On 7/23/07, Sasha Khapyorsky [EMAIL PROTECTED] wrote:


Hi Eitan,

On 20:59 Mon 23 Jul , Eitan Zahavi wrote:
 Hi Sasha, Hal,

 I think I have an idea:

 Since this is a specific switch that reported ChangeBit or Trap why
 can't we just qualify that there was no change in the switch setup?

The ChangeBit seems to be a good starting point - then OpenSM will query all
switch ports' PortInfo anyway, and if for all ports PortState is <= INIT
(and at least for one port it is = INIT), it means that this switch was
rebooted/reinitialized.

And for a single port, a PortState drop to <= INIT should indicate
reinitialization.

Seems correct?
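
The check proposed here could be sketched roughly as below. This reads the comparisons as "every port is at or below INIT, and at least one port is in INIT", which is an interpretation of the text above. The enum and function names are hypothetical and not OpenSM's; only the numeric PortState encoding (1 = Down, 2 = Initialize, 3 = Armed, 4 = Active) follows the IB PortInfo definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* PortInfo:PortState encoding; the identifiers are illustrative only. */
enum port_state_sketch {
    PORT_DOWN   = 1,
    PORT_INIT   = 2,
    PORT_ARMED  = 3,
    PORT_ACTIVE = 4
};

/* Heuristic from the thread: after the ChangeBit is seen, treat the switch
 * as rebooted/reinitialized when no port is above INIT and at least one
 * port is in INIT. */
bool switch_looks_reinitialized(const enum port_state_sketch *states,
                                size_t nports)
{
    bool any_init = false;

    assert(states != NULL || nports == 0);
    for (size_t i = 0; i < nports; i++) {
        if (states[i] > PORT_INIT)
            return false;          /* an ARMED/ACTIVE port survived */
        if (states[i] == PORT_INIT)
            any_init = true;
    }
    return any_init;
}
```

A switch whose ports are all DOWN is deliberately not reported as reinitialized in this sketch, since nothing distinguishes it from a switch with no cables attached.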



Wouldn't this be all ports in INIT indicate reset of switch ?

-- Hal


We could send PortInfo, SwitchInfo,

SwitchInfo is queried at each light sweep, PortInfo's if ChangeBit is
set. Guess we are ok with it even now.

 LFT, MFT, SL2VL, VLArb, PKey queries
 and make sure no change from previous state. Or we could simply enforce
 last state by sending it over again ...

I think we could want to re-read PKey tables in order to preserve
existing PKey indices and just to flush (overwrite with new settings)
LFT, MFT, SL2VL, VLArb tables. Reasonable?

Sasha


[ofa-general] [PATCH] libibmad: Fixed a name of a field in SwitchInfo to the right name

2007-07-24 Thread Dotan Barak
Fixed a name of a field in SwitchInfo to the right name.

Signed-off-by: Dotan Barak [EMAIL PROTECTED]

---

Index: connectx_user/src/userspace/management/libibmad/src/fields.c
===================================================================
--- connectx_user.orig/src/userspace/management/libibmad/src/fields.c   2007-07-22 16:34:02.0 +0300
+++ connectx_user/src/userspace/management/libibmad/src/fields.c        2007-07-24 13:58:41.0 +0300
@@ -193,7 +193,7 @@ ib_field_t ib_mad_f [] = {
        [IB_SW_PARTITION_ENF_INB_F]     {BITSOFFS(128, 1), "InboundPartEnf", mad_dump_uint},
        [IB_SW_PARTITION_ENF_OUTB_F]    {BITSOFFS(129, 1), "OutboundPartEnf", mad_dump_uint},
        [IB_SW_FILTER_RAW_INB_F]        {BITSOFFS(130, 1), "FilterRawInbound", mad_dump_uint},
-       [IB_SW_FILTER_RAW_OUTB_F]       {BITSOFFS(131, 1), "FilterRawInbound", mad_dump_uint},
+       [IB_SW_FILTER_RAW_OUTB_F]       {BITSOFFS(131, 1), "FilterRawOutbound", mad_dump_uint},
        [IB_SW_ENHANCED_PORT0_F]        {BITSOFFS(132, 1), "EnhancedPort0", mad_dump_uint},
 
        /*


[ofa-general] Bug in inline sends with sge_num 0 in libmlx4

2007-07-24 Thread Gleb Natapov
Hi,

 There is a bug in mlx4_post_send(). Data that is sent inline and
consists of multiple small sges isn't copied properly into the wqe.
The following patch fixes it for me.

Signed-off-by: Gleb Natapov [EMAIL PROTECTED]

diff --git a/src/qp.c b/src/qp.c
index 66ee309..83a4fd4 100644
--- a/src/qp.c
+++ b/src/qp.c
@@ -288,6 +288,7 @@ int mlx4_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
                        memcpy(wqe, addr, len);
                        wqe += len;
                        seg_len += len;
+                       off += len;
                }
 
                if (seg_len) {
--
Gleb.
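
A simplified, self-contained model of such a copy loop may make the role of `off` easier to see. This is not the libmlx4 code: `CHUNK`, `HDR`, `struct sge_sketch` and `pack_inline` are made-up names and sizes, and a real WQE carries more state. The sketch only assumes that inline data is packed into fixed-size chunks, each starting with a small header, and that `off` tracks how much of the current chunk is already used.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 64u /* hypothetical chunk size in bytes */
#define HDR   4u  /* hypothetical per-chunk header size */

struct sge_sketch {
    const void *addr;
    size_t len;
};

/* Packs the gather list into out[], leaving HDR bytes free at the start of
 * every CHUNK. Writes the payload length of each chunk to seg_lens[] and
 * returns the number of chunks used. The caller must size both buffers. */
size_t pack_inline(const struct sge_sketch *sg, size_t nsge,
                   unsigned char *out, size_t *seg_lens)
{
    unsigned char *wqe = out + HDR;
    size_t off = HDR;     /* bytes already used in the current chunk */
    size_t seg_len = 0;   /* payload bytes in the current chunk */
    size_t nseg = 0;

    assert(sg != NULL && out != NULL && seg_lens != NULL);

    for (size_t i = 0; i < nsge; i++) {
        const unsigned char *addr = sg[i].addr;
        size_t len = sg[i].len;

        /* Fill and close chunks while this sge reaches the chunk end. */
        while (len >= CHUNK - off) {
            size_t to_copy = CHUNK - off;

            memcpy(wqe, addr, to_copy);
            len -= to_copy;
            wqe += to_copy;
            addr += to_copy;
            seg_len += to_copy;
            seg_lens[nseg++] = seg_len;
            seg_len = 0;
            wqe += HDR;   /* skip the next chunk's header */
            off = HDR;
        }

        memcpy(wqe, addr, len);
        wqe += len;
        seg_len += len;
        off += len;       /* the step the patch adds */
    }

    if (seg_len)
        seg_lens[nseg++] = seg_len;
    return nseg;
}
```

With three 30-byte sges, the second one exactly fills the first chunk (60 payload bytes) and the third starts a new chunk. If the final `off += len` is dropped, the loop keeps believing the chunk is empty after each small sge, so all 90 bytes would be written into one chunk and run past its boundary.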


RE: [ofa-general] RE: OFA website edits

2007-07-24 Thread Vladimir Sokolovsky
 
  I would like to propose adding project directories under
  http://www.openfabrics.org/downloads/  where appropriate and give
  maintainers access. For example:
 
  http://www.openfabrics.org/downloads/verbs (rdreier)
  http://www.openfabrics.org/downloads/rdmacm (shefty)
  http://www.openfabrics.org/downloads/dapl (ardavis)
  http://www.openfabrics.org/downloads/management (sashak)
  http://www.openfabrics.org/downloads/OFED (vlad)
  http://www.openfabrics.org/downloads/WinOF (ardavis)
  http://www.openfabrics.org/downloads/archives (vlad) ??
  etc...
 
  Each of these would contain a README that details the contents of
the
  directory along with WEB_README that provides a short description
for
  the webpage. Jeff could then automatically parse for directories
under
  downloads and if it contains WEB_README add a webpage link to the
  directory along with the short description.
 

Looks good for me.

Regards,
Vladimir


Re: [ofa-general] Bug in inline sends with sge_num 0 in libmlx4

2007-07-24 Thread Jack Morgenstein
On Tuesday 24 July 2007 15:14, Gleb Natapov wrote:
 Hi,
 
  There is a bug in mlx4_post_send(). A data that is sent inline and
 consists from multiple small sges isn't copied properly into wqe.
 The following patch fixes it for me.
 
 Signed-off-by: Gleb Natapov [EMAIL PROTECTED]
 
 diff --git a/src/qp.c b/src/qp.c
 index 66ee309..83a4fd4 100644
 --- a/src/qp.c
 +++ b/src/qp.c
 @@ -288,6 +288,7 @@ int mlx4_post_send(struct ibv_qp *ibqp, struct 
 ibv_send_wr *wr,
   memcpy(wqe, addr, len);
   wqe += len;
   seg_len += len;
 + off += len;
   }
  
   if (seg_len) {

Good catch! This patch is correct.
Roland?

- Jack


[ofa-general] Re: [PATCH] libibmad: Fixed a name of a field in SwitchInfo to the right name

2007-07-24 Thread Sasha Khapyorsky
On 14:32 Tue 24 Jul , Dotan Barak wrote:
 Fixed a name of a field in SwitchInfo to the right name.
 
 Signed-off-by: Dotan Barak [EMAIL PROTECTED]

Applied. Thanks.

Sasha


[ofa-general] OpenSM detection of duplicated GUIDs on loopback

2007-07-24 Thread Hal Rosenstock

Hi,

This starts off as a minor issue, and I know it has been discussed
somewhat in the past:

Putting a loopback connector on a (switch) link causes OpenSM to indicate
duplicated GUID error 0D18 as follows:

__osm_ni_rcv_set_links
{
...
  /*
    When there are only two nodes with exact same guids (connected back
    to back) - the previous check for duplicated guid will not catch
    them. But the link will be from the port to itself...
    Enhanced Port 0 is an exception to this
  */
  if ((osm_node_get_node_guid( p_node ) == p_ni_context->node_guid) &&
      (port_num == p_ni_context->port_num) &&
      (port_num != 0))
  {
    osm_log( p_rcv->p_log, OSM_LOG_ERROR,
             "__osm_ni_rcv_set_links: ERR 0D18: "
             "Duplicate GUID found by link from a port to itself:"
             "node 0x%" PRIx64 ", port number 0x%X\n",
             cl_ntoh64( osm_node_get_node_guid( p_node ) ),
             port_num );
...

So this occurs over and over and over and fills the log with the same spew.
This should be improved IMO.

Is this really a fatal condition ? Doesn't seem like it should be to me.

Also, OpenSM can ride this out with -y (stay on fatal) but is that safe
for this condition ?

Seems like something like an extra loopback bit should be added to some port
structure which should cause these links to be ignored. This bit would then
be reset when the peer is no longer itself.
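
One way such a bit could behave is sketched below. Everything here is hypothetical: `struct port_sketch`, its `loopback` field and `note_link()` are not OpenSM structures or functions. The sketch only shows the idea of reporting the condition once and clearing the bit when the peer changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-port record; OpenSM's real port objects differ. */
struct port_sketch {
    uint64_t node_guid;
    uint8_t  port_num;
    bool     loopback;  /* the proposed "extra loopback bit" */
};

/* Records the peer seen on a link. Returns true only when a link from the
 * port to itself is seen for the first time, i.e. when it is worth logging.
 * Repeated sweeps over the same loopback link return false, and the bit is
 * cleared as soon as the peer is no longer the port itself. */
bool note_link(struct port_sketch *p, uint64_t peer_guid, uint8_t peer_port)
{
    assert(p != NULL);

    bool to_self = p->port_num != 0 &&          /* port 0 is excluded, as in the check above */
                   peer_guid == p->node_guid &&
                   peer_port == p->port_num;

    if (!to_self) {
        p->loopback = false;
        return false;
    }
    if (p->loopback)
        return false;   /* already reported, stay quiet */
    p->loopback = true;
    return true;
}
```

Whether a link flagged this way can safely be ignored for routing is a separate question that the sketch does not answer. It only addresses the repeated log output.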

Also, is there a relationship of this with the 12x/duplicated GUID code ?

Thanks.

-- Hal

[ofa-general] RE: OpenSM detection of duplicated GUIDs on loopback

2007-07-24 Thread Eitan Zahavi
Hi Hal,
 
What is this loopback connector used for?
Does not seem to me like a very useful thing to do.
Anyway, if it is not a production environment we could add a debug
mode (-d flag option) to ignore this check.
 

Eitan Zahavi 
Senior Engineering Director, Software Architect 
Mellanox Technologies LTD 
Tel:+972-4-9097208
Fax:+972-4-9593245 
P.O. Box 586 Yokneam 20692 ISRAEL 

 





[ofa-general] Re: OpenSM detection of duplicated GUIDs on loopback

2007-07-24 Thread Hal Rosenstock

Hi Eitan,

On 7/24/07, Eitan Zahavi [EMAIL PROTECTED] wrote:

> Hi Hal,
>
> What is this loopback connector used for?
> Does not seem to me like a very useful thing to do.

Perhaps not but no reason OpenSM can't handle this more gracefully.

> Anyway, if it is not a production environment we could add a debug mode
> (-d flag option) to ignore this check.

Why would a separate flag be needed ?

-- Hal





Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Arthur Jones
hi michael, ...

On Tue, Jul 24, 2007 at 06:03:41AM +0300, Michael S. Tsirkin wrote:
 [...]
 But I also see a serious problem with addressing: basically
 git tracks content. It's not designed to track a bush
 of branches taken together.  For example, take tagging:
 tag namespace is global, so you can not have the same
 tag point at multiple branches at the same time.

agreed.  however, the way we use git, with the
location of the git DB as the tag, it's not
really a problem in practice.  but tagging each
branch separately is indeed a PITA...

 anyway, what do you think?  is there anyway i could
 convince you to dump the backport patches and put
 all the backports in branches?  i'm willing to do the
 legwork if you see value...
 
 Can you publish the scripts and/or the tree?
 I think we can start by just running the scripts nightly,
 making it possible for people to view backport history
 with gitview.

i've attached the script that i'm using to compare
the trees, but it's a total hack.  it doesn't keep
the patch history.  that would not be too hard to
do i guess -- if there's interest...

to run the script:

cp attached files here...
$ git clone git://git.openfabrics.org/~mst/ofed_kernel.git ofed_kernel
$ cd ofed_kernel
$ for b in `cat ../ofed-backports.txt`; do ../create-backport.sh $b; done

now you'll have a bunch of backport-2.6.xxx branches...

arthur
2.6.5_sles9_sp3
2.6.9_U2
2.6.9_U3
2.6.9_U4
2.6.9_U5
2.6.11_FC4
2.6.11
2.6.12
2.6.13_suse10_0_u
2.6.13
2.6.14
2.6.15_ubuntu606
2.6.15
2.6.16_sles10
2.6.16_sles10_sp1
2.6.16
2.6.17
2.6.18_FC6
2.6.18
2.6.19
2.6.20
2.6.21
2.6.22


create-backport.sh
Description: Bourne shell script

[ofa-general] RE: OpenSM detection of duplicated GUIDs on loopback

2007-07-24 Thread Eitan Zahavi
From: Hal Rosenstock [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 24, 2007 5:53 PM
To: Eitan Zahavi
Cc: OpenFabrics General; Sasha Khapyorsky; Yevgeny Kliteynik
Subject: Re: OpenSM detection of duplicated GUIDs on loopback



Hi Eitan,


On 7/24/07, Eitan Zahavi [EMAIL PROTECTED] wrote: 

Hi Hal,
 
What is this loopback connector used for?
Does not seem to me like a very useful thing to do.

 
Perhaps not but no reason OpenSM can't handle this more
gracefully.


Anyway, if it is not a production environment we could
add a debug mode (-d flag option) to ignore this check.

 
Why would a separate flag be needed ?
[EZ] Since I do not see any other solution for the SM to know
it is really a loopback plug rather than two devices with the same GUID
connected back to back ...
 
-- Hal


 


[ofa-general] Re: OpenSM detection of duplicated GUIDs on loopback

2007-07-24 Thread Hal Rosenstock

On 7/24/07, Eitan Zahavi [EMAIL PROTECTED] wrote:

> From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, July 24, 2007 5:53 PM
> To: Eitan Zahavi
> Cc: OpenFabrics General; Sasha Khapyorsky; Yevgeny Kliteynik
> Subject: Re: OpenSM detection of duplicated GUIDs on loopback
>
>> Hi Eitan,
>>
>> On 7/24/07, Eitan Zahavi [EMAIL PROTECTED] wrote:
>>
>>> Hi Hal,
>>>
>>> What is this loopback connector used for?
>>> Does not seem to me like a very useful thing to do.
>>
>> Perhaps not but no reason OpenSM can't handle this more gracefully.
>>
>>> Anyway, if it is not a production environment we could add a debug
>>> mode (-d flag option) to ignore this check.
>>
>> Why would a separate flag be needed ?
>
> [EZ] Since I do not see any other solution for the SM to know it is
> really a loopback plug rather than two devices with same GUID connected
> back to back ...

Technically, this should only occur when looped back and not with two devices
with the same GUID, as GUID == globally unique and a duplication indicates a
manufacturing issue.

Anyhow, can't these be treated the same (and handled more gracefully)
without an additional option/flag ?

-- Hal




Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Michael S. Tsirkin
 Quoting Arthur Jones [EMAIL PROTECTED]:
 Subject: Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
 
 hi michael, ...
 
 On Tue, Jul 24, 2007 at 06:03:41AM +0300, Michael S. Tsirkin wrote:
  [...]
  But I also see a serious problem with addressing: basically
  git tracks content. It's not designed to track a bush
  of branches taken together.  For example, take tagging:
  tag namespace is global, so you can not have the same
  tag point at multiple branches at the same time.
 
 agreed.  however, the way we use git, with the
 location of the git DB as the tag, it's not
 really a problem in practice.

who uses git this way?

 but tagging each
 branch separately is indeed a PITA...

This is just one problem.
For example, git pull can only merge one branch at a time.

  anyway, what do you think?  is there anyway i could
  convince you to dump the backport patches and put
  all the backports in branches?  i'm willing to do the
  legwork if you see value...
  
  can you publish the scripts and/or the tree?
  i think we can start by just running the scripts nightly,
  making it possible for people to view backport history
  with gitview.
 
 i've attached the script that i'm using to compare
 the trees, but it's a total hack.  it doesn't keep
 the patch history.  that would not be too hard to
 do i guess -- if there's interest...
 
 to run the script:
 
 cp attached files here...
 $ git clone git://git.openfabrics.org/~mst/ofed_kernel.git ofed_kernel
 $ cd ofed_kernel
 $ for b in `cat ../ofed-backports.txt`; do ../create-backport.sh $b; done
 
 now you'll have a bunch of backport-2.6.xxx branches...

So, would you like to have this script run nightly on ofed trees?

-- 
MST


Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Arthur Jones
hi michael, ...

On Tue, Jul 24, 2007 at 06:09:09PM +0300, Michael S. Tsirkin wrote:
  Quoting Arthur Jones [EMAIL PROTECTED]:
  Subject: Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
  
  hi michael, ...
  
  On Tue, Jul 24, 2007 at 06:03:41AM +0300, Michael S. Tsirkin wrote:
   [...]
   But I also see a serious problem with addressing: basically
   git tracks content. It's not designed to track a bush
   of branches taken together.  For example, take tagging:
   tag namespace is global, so you can not have the same
   tag point at multiple branches at the same time.
  
  agreed.  however, the way we use git, with the
  location of the git DB as the tag, it's not
  really a problem in practice.
 
 who uses git this way?

i do.

  but tagging each
  branch separately is indeed a PITA...
 
 This is just one problem.
 For example, git pull can only merge one branch at a time.

how is this a problem?  the way i use git,
i use a script to reflow the changes into
the dependent branches.  over the last few
months, anyway, it has worked fine...

   anyway, what do you think?  is there anyway i could
   convince you to dump the backport patches and put
   all the backports in branches?  i'm willing to do the
   legwork if you see value...
   
   can you publish the scripts and/or the tree?
   i think we can start by just running the scripts nightly,
   making it possible for people to view backport history
   with gitview.
  
  i've attached the script that i'm using to compare
  the trees, but it's a total hack.  it doesn't keep
  the patch history.  that would not be too hard to
  do i guess -- if there's interest...
  
  to run the script:
  
  cp attached files here...
  $ git clone git://git.openfabrics.org/~mst/ofed_kernel.git ofed_kernel
  $ cd ofed_kernel
  $ for b in `cat ../ofed-backports.txt`; do ../create-backport.sh $b; done
  
  now you'll have a bunch of backport-2.6.xxx branches...
 
 So, would you like to have this script run nightly on ofed trees?

if someone finds that useful.  my main motivation is
getting rid of all the patches in ofed, if running this
script nightly helps us to get there, then i'm all for
it.  if it's just for me, it's easy enough to run the
scripts by hand...

arthur
___
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Arthur Jones
hi michael, ...

On Tue, Jul 24, 2007 at 06:32:28PM +0300, Michael S. Tsirkin wrote:
 [...]
   For example, git pull can only merge one branch at a time.
  
  how is this a problem?  the way i use git,
  i use a script to reflow the changes into
  the dependent branches.  over the last few
  months, anyway, it has worked fine...
 
 Precisely because no one developed on these branches,
 so you are re-generating them from patches - not a problem,
 but as you point out not too useful either.

well, no, i _have_ been doing development on the
local branches in our internal repo.  i also
merge in changes that you make to the ofed repo
to our internal backport branches.  the script
i posted is just so that i can more easily compare
our internal branches to the ofed backport branches.

 If people start developing on these branches, then
 eventually you will need to merge them - and git only merges
 them one at a time.

yes, i have to merge them one at a time.  i
still don't see how this is a problem.  backport
changes can be pulled in and the changes from
upstream can be merged in as well.  i haven't
had a problem with this so far.  can you be more
specific about what you expect will fail?

arthur


Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Michael S. Tsirkin
 Quoting Arthur Jones [EMAIL PROTECTED]:
 Subject: Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
 
 hi michael, ...
 
 On Tue, Jul 24, 2007 at 06:09:09PM +0300, Michael S. Tsirkin wrote:
   Quoting Arthur Jones [EMAIL PROTECTED]:
   Subject: Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
   
   hi michael, ...
   
   On Tue, Jul 24, 2007 at 06:03:41AM +0300, Michael S. Tsirkin wrote:
[...]
But I also see a serious problem with addressing: basically
git tracks content. It's not designed to track a bush
of branches taken together.  For example, take tagging:
tag namespace is global, so you can not have the same
tag point at multiple branches at the same time.
   
   agreed.  however, the way we use git, with the
   location of the git DB as the tag, it's not
   really a problem in practice.
  
  who uses git this way?
 
 i do.
 
   but tagging each
   branch separately is indeed a PITA...
  
  This is just one problem.
  For example, git pull can only merge one branch at a time.
 
 how is this a problem?  the way i use git,
 i use a script to reflow the changes into
 the dependent branches.  over the last few
 months, anyway, it has worked fine...

Precisely because no one developed on these branches,
so you are re-generating them from patches - not a problem,
but as you point out not too useful either.

If people start developing on these branches, then
eventually you will need to merge them - and git only merges
them one at a time.

-- 
MST


Re: [ofa-general] Command specification of ca_name and ca_port

2007-07-24 Thread Ira Weiny
On Tue, 24 Jul 2007 04:33:06 +0300
Sasha Khapyorsky [EMAIL PROTECTED] wrote:

 Hi David,
 
 On 09:52 Mon 23 Jul , David McMillen wrote:
  
   There is a standard set of command line options that allow specification
   of the CA to use for sending the requests.  I'm adding these to programs
   that don't have them, since they are very useful when diagnosing a node
   connected to multiple subnets.  Even if you discount multiple subnets on
   purpose, sometimes this happens when the hardware connecting all of the
   CA ports to the same place gets broken, and that is when you need
   diagnostics that can help figure out what is where.
  
   The standard options are:
  
 -C ca_name     use the specified ca_name.
  
 -P ca_port     use the specified ca_port.
  
 -t timeout_ms  override the default timeout for the solicited mads.
  
   My problem is that saquery already uses -C and -P, although the -t exists
   for the expected purpose.  Also, ibcheckerrs already uses -t for
   specifying the threshold file.
 
 I think unified command line options over diags are good thing, so I
 guess reasonable renaming should be acceptable.

I agree, however right now saquery does not support specifying the ca_name or
ca_port, so you would have to add that support.

 
  
   Changing the timeout for ibcheckerrs isn't critical, but not being able to 
   do it doesn't seem right.  However, the saquery command could be really 
   handy for figuring out split fabrics, and is useful to those of us that 
   connect to multiple subnets.
  
   Does anybody have a useful suggestion?
 
 '-T' for the threshold file?

That sounds good.


 But that is the easy part - saquery renames are
 less intuitive :(. Probably just lower case? Or special query option
 (-q or -Q), so queries could be specified as -qP, -qC?
 

I disagree with this because ~50% of the options are queries, its primary
purpose is to query, and most of the other options change the format of the
output of the query.  Therefore, I don't think a -q should be required for a
query.  It seems redundant.

Perhaps just changing the current options to -c, -p and adding -C and -P would
be best.  I know this might break some scripts out there, particularly mine,
but I think it is the right thing to do if you really want consistency.

Thoughts?
Ira


RE: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Sean Hefty
at qlogic we now keep the backports as branches in
our git tree and this, i find, is much easier to
handle.  because:

* viewing and navigating backport source becomes
  _much_ easier.
* merges are easier -- patches are much more fragile
  than branches.
* comparisons are easier -- checking for differences
  between backports and between a backport and the
  canonical source is faster and more convenient...
* changesets are readable.  trying to decipher diffs
  to patches is medically proven to take months, if not
  years, off your life.

Let's add that you don't need patches to patches, and the order patches are
applied isn't determined alphabetically.

anyway, what do you think?  is there any way i could
convince you to dump the backport patches and put
all the backports in branches?  i'm willing to do the
legwork if you see value...

I would love OFED to dump the patch directory concept.

- Sean


Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Arthur Jones
hi michael, ...

On Tue, Jul 24, 2007 at 06:53:48PM +0300, Michael S. Tsirkin wrote:
 [...]
  well, no, i _have_ been doing development on the
  local branches in our internal repo.  i also
  merge in changes that you make to the ofed repo
  to our internal backport branches.  the script
  i posted is just so that i can more easily compare
  our internal branches to the ofed backport branches.
 
 How do you do the merging?

for just the backport branches, i merge different ways
from different sources:
   * from upstream, it's a pull into master and a git merge master
 into local backport branches -- i call this a reflow.
   * from local developers, it's a git pull straight into
 the backport branch, then reflow the repo.
   * from ofed, i apply the backport patch by hand and
 fixup the inevitable clashes -- either because part
 of the patch is already applied, or because context
 has changed enough for git apply to get confused.  when
 these are fixed up, reflow the repo...
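
The "reflow" step described above can be sketched roughly as follows. This is only an illustration, not the script used at qlogic: it assumes the backport branches are named backport-* (as the create-backport.sh run earlier in the thread produces) and that each one is fed directly from master. The function name reflow is made up for the example.

```shell
#!/bin/sh
# Sketch of a "reflow": merge master into every local backport branch.
# Assumptions (not from the thread): branches match backport-*, the
# working tree is clean, and master has already been updated by a pull.
reflow() {
    start=$(git symbolic-ref --short HEAD) || return 1
    for b in $(git for-each-ref --format='%(refname:short)' 'refs/heads/backport-*'); do
        git checkout -q "$b" || return 1
        if ! git merge -q -m "reflow: merge master into $b" master; then
            # Stop at the first conflict and leave it for manual fixup.
            echo "reflow: conflict in $b, fix up by hand" >&2
            return 1
        fi
    done
    # Return to the branch we started on.
    git checkout -q "$start"
}
```

Run from the top of the tree after pulling upstream into master, e.g. `git checkout master && git pull && reflow`.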
   
   If people start developing on these branches, then
   eventually you will need to merge them - and git only merges
   them one at a time.
  
  yes, i have to merge them one at a time.  i
  still don't see how this is a problem.  backport
  changes can be pulled in and the changes from
  upstream can be merged in as well.  i haven't
  had a problem with this so far.  can you be more
  specific about what you expect will fail?
 
 Well, as distro maintainers we need to merge a lot, from different
 people. We'll have to write all kinds of scripts to do it instead of
 a plain git pull.

i can't imagine what script you would need.  can
you be more specific?  it would seem to me that you
could just pull straight in to the backport branch...

 And, I expect almost all git operations will have to be wrapped
 in a script in some way, to operate on a bush of branches.

so far, this hasn't been an issue for me.  the only
operation that i've scripted is the reflow.  for 
most work, i can just ignore the backport branches and
do the work in the (copy of) master, then reflow the
changes into the backports...

arthur


Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Michael S. Tsirkin
 Quoting Sean Hefty [EMAIL PROTECTED]:
 Subject: RE: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
 
 at qlogic we now keep the backports as branches in
 our git tree and this, i find, is much easier to
 handle.  because:
 
 * viewing and navigating backport source becomes
   _much_ easier.
 * merges are easier -- patches are much more fragile
   than branches.
 * comparisons are easier -- checking for differences
   between backports and between a backport and the
   canonical source is faster and more convenient...
 * changesets are readable.  trying to decipher diffs
   to patches is medically proven to take months, if not
   years, off your life.
 
 Let's add that you don't need patches to patches, and the order patches are
 applied isn't determined alphabetically.
 
 anyway, what do you think?  is there any way i could
 convince you to dump the backport patches and put
 all the backports in branches?  i'm willing to do the
 legwork if you see value...
 
 I would love OFED to dump the patch directory concept.

I'd love to have a common source for all kernels,
and the kernel_addons mechanism does this for us whenever possible.

But, for these cases where the code actually needs to be modified,
applying a patch seems like the least evil way to do it.
Alternatives seem to be much worse.

-- 
MST


Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Michael S. Tsirkin
 Quoting Arthur Jones [EMAIL PROTECTED]:
 Subject: Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
 
 hi michael, ...
 
 On Tue, Jul 24, 2007 at 06:53:48PM +0300, Michael S. Tsirkin wrote:
  [...]
   well, no, i _have_ been doing development on the
   local branches in our internal repo.  i also
   merge in changes that you make to the ofed repo
   to our internal backport branches.  the script
   i posted is just so that i can more easily compare
   our internal branches to the ofed backport branches.
  
  How do you do the merging?
 
 for just the backport branches, i merge different ways
 from different sources:
* from upstream, it's a pull into master and a git merge master
  into local backport branches -- i call this a reflow.
* from local developers, it's a git pull straight into
  the backport branch, then reflow the repo.
* from ofed, i apply the backport patch by hand and
  fixup the inevitable clashes -- either because part
  of the patch is already applied, or because context
  has changed enough for git apply to get confused.  when
  these are fixed up, reflow the repo...

Hmm. Consider that you did all of the above, and then mail me
that there's an update. Now I need to merge updates to multiple branches
directly and git pull does not do this. It's a problem.

If people start developing on these branches, then
eventually you will need to merge them - and git only merges
them one at a time.
   
   yes, i have to merge them one at a time.  i
   still don't see how this is a problem.  backport
   changes can be pulled in and the changes from
   upstream can be merged in as well.  i haven't
   had a problem with this so far.  can you be more
   specific about what you expect will fail?
  
  Well, as distro maintainers we need to merge a lot, from different
  people. We'll have to write all kinds of scripts to do it instead of
  a plain git pull.
 
 i can't imagine what script you would need.  can
 you be more specific?  it would seem to me that you
 could just pull straight in to the backport branch...

You'll have to check out branches one by one, and do a pull.
What if there's a conflict? I currently just do git reset --hard ORIG_HEAD
and mail the maintainer to fix it up - but this won't work
with the bush of branches approach.

  And, I expect almost all git operations will have to be wrapped
  in a script in some way, to operate on a bush of branches.
 
 so far, this hasn't been an issue for me.  the only
 operation that i've scripted is the reflow.  for 
 most work, i can just ignore the backport branches and
 do the work in the (copy of) master, then reflow the
 changes into the backports...

Because you only have your driver to maintain.


-- 
MST


Re: [ofa-general] Re: IPoIB path caching

2007-07-24 Thread Sean Hefty
Linux has a quite sophisticated mechanism to maintain / cache / probe / 
invalidate / update the network stack L2 neighbour info.


Path records are not just L2 info.  They contain L4, L3, and L2 info 
together.


For example, in the Voltaire gen1 stack we had an ib arp module which 
was used by both IPoIB and native IB ULPs (SDP, iSER, Lustre, etc). This 
module managed some sort of path cache, where IPoIB was always asking for 
a non-cached path and other ULPs were willing to get a cached path.


IMO, using a cached AH is no different than using a cached path.  You're 
simply mapping the PR data into another structure.


We're ignoring the problem here, and that is that a centralized SA 
doesn't scale.  MPI stacks have largely ignored this problem by simply 
not doing path record queries.  Path information is often hard-coded, 
with QPN data exchanged out of band over sockets (often over Ethernet).


We've seen problems running large MPI jobs without PR caching.  I know 
that Silverstorm/QLogic did as well.  And apparently Voltaire hit the 
same type of problem, since you added a caching module.  (Did Mellanox 
and Topspin/Cisco create PR caches as well?)  At least three companies 
working on IB came up with the same solution.  What is the objection to 
the current patch set?


- Sean


Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Arthur Jones
hi michael, ...

On Tue, Jul 24, 2007 at 07:23:06PM +0300, Michael S. Tsirkin wrote:
 [...]
  for just the backport branches, i merge different ways
  from different sources:
 * from upstream, it's a pull into master and a git merge master
   into local backport branches -- i call this a reflow.
 * from local developers, it's a git pull straight into
   the backport branch, then reflow the repo.
 * from ofed, i apply the backport patch by hand and
   fixup the inevitable clashes -- either because part
   of the patch is already applied, or because context
   has changed enough for git apply to get confused.  when
   these are fixed up, reflow the repo...
 
 Hmm. Consider that you did all of the above, and then mail me
 that there's an update. Now I need to merge updates to multiple branches
 directly and git pull does not do this. It's a problem.

for changes made to the canonical source, it's
just git pull into ofed_kernel and a reflow.

for changes made to the backports, you would need
to git checkout and git pull into each of the
backport branches _in which i made a change_.
the case that i make changes to _all_ or even
a significant number of backport patches is
sufficiently rare that i doubt it is worth scripting.
but, if the script is necessary, it's pretty
straightforward:

set -e
# branches-which-have-changed and remote are placeholders
for b in branches-which-have-changed; do
   git checkout "$b"
   git pull remote "$b"
done

 [...]
  i can't imagine what script you would need.  can
  you be more specific?  it would seem to me that you
  could just pull straight in to the backport branch...
 
 You'll have to check out branches one by one, and do a pull.
 What if there's a conflict? I currently just do git reset --hard ORIG_HEAD
 and mail the maintainer to fix it up - but this won't work
 with the bush of branches approach.

it works for me.  what do you expect will break?

   And, I expect almost all git operations will have to be wrapped
   in a script in some way, to operate on a bush of branches.
  
  so far, this hasn't been an issue for me.  the only
  operation that i've scripted is the reflow.  for 
  most work, i can just ignore the backport branches and
  do the work in the (copy of) master, then reflow the
  changes into the backports...
 
 Because you only have your driver to maintain.

no, i have to maintain quite a few of the
ofed backport branches as well for our release.
if i started getting pull requests from people
with changes to 15 backport branches in one go,
i'd probably want to script it...

i have found that drawing a DAG with graphviz has
been a big help in making sure that i organize the
branches correctly...
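
As a rough illustration of the graphviz idea, the sketch below emits a DOT description of the branch layout. It is not the drawing used at qlogic: it assumes every backport-* branch is fed directly from master, and the function name branch_dag is made up for the example. A tree where one backport feeds another would need extra edges.

```shell
#!/bin/sh
# Sketch: print a Graphviz DOT graph with one edge from master to each
# local backport branch.  Assumption: branches match backport-* and all
# of them merge from master directly.
branch_dag() {
    echo 'digraph branches {'
    git for-each-ref --format='%(refname:short)' 'refs/heads/backport-*' |
    while read -r b; do
        printf '    "master" -> "%s";\n' "$b"
    done
    echo '}'
}
```

The output can be rendered with something like `branch_dag | dot -Tpng -o branches.png`.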

arthur


Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Arthur Jones
hi michael, ...

On Tue, Jul 24, 2007 at 07:16:46PM +0300, Michael S. Tsirkin wrote:
 [...]
 But, for these cases where the code actually needs to be modified,
 applying a patch seems like the least evil way to do it.
 Alternatives seem to be much worse.

what is it about patches that are less evil
than changesets?  can you list some of the
advantages?

arthur


Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Michael S. Tsirkin
  Because you only have your driver to maintain.
 
 no, i have to maintain quite a few of the
 ofed backport branches as well for our release.
 if i started getting pull requests from people
 with changes to 15 backport branches in one go,
 i'd probably want to script it...

Yea. Happens all the time here: when component maintainer
makes a change, it will typically affect all backports or none.

 i have found that drawing a DAG with graphviz has
 been a big help in making sure that i organize the
 branches correctly...

Ugh .. *that* sounds complicated.
Looks like it's much simpler with current setup.

-- 
MST


[ofa-general] Re: OpenSM detection of duplicated GUIDs on loopback

2007-07-24 Thread Sasha Khapyorsky
Hi,

On 11:03 Tue 24 Jul , Hal Rosenstock wrote:
  On 7/24/07, Eitan Zahavi [EMAIL PROTECTED] wrote:
 
   *From:* Hal Rosenstock [mailto:[EMAIL PROTECTED]
  *Sent:* Tuesday, July 24, 2007 5:53 PM
  *To:* Eitan Zahavi
  *Cc:* OpenFabrics General; Sasha Khapyorsky; Yevgeny Kliteynik
  *Subject:* Re: OpenSM detection of duplicated GUIDs on loopback
 
 
 
  Hi Eitan,
 
  On 7/24/07, Eitan Zahavi [EMAIL PROTECTED] wrote:
  
   Hi Hal,

   What is this loopback connector used for?
   Does not seem to me like a very useful thing to do.
  
  Perhaps not but no reason OpenSM can't handle this more gracefully.
  Perhaps not but no reason OpenSM can't handle this more gracefully.

I don't have a loopback plug, but I have used loopback connections for some
checks with the simulator. There is nothing illegal about them, so I think
it would be better to support them.

   Anyway, if it is not a production environment we could add a debug
   mode (-d flag option) to ignore this check.
  
  Why would a separate flag be needed ?
  [EZ] Since I do not see any other solution for the SM to know it is
  really a loopback plug rather than two devices with the same GUID connected
  back to back ...

Also we have seen cases where moving a port triggers the duplicated GUID
detector (originally reported on a real fabric, and trivially
reproducible in a simulated environment).

So we probably need to find a better way to handle duplicated GUID
detection (in general, not just for loopback). For example, node_info
content could be compared. More ideas?

Sasha


Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Michael S. Tsirkin
 Quoting Arthur Jones [EMAIL PROTECTED]:
 Subject: Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
 
 hi michael, ...
 
 On Tue, Jul 24, 2007 at 07:16:46PM +0300, Michael S. Tsirkin wrote:
  [...]
  But, for these cases where the code actually needs to be modified,
  applying a patch seems like the least evil way to do it.
  Alternatives seem to be much worse.
 
 what is it about patches that are less evil
 than changesets?  can you list some of the
 advantages?

changesets *do not exist* in git - git tracks content.

I compare multiple directories with patches with the bush of branches.
With bush of branches:
git pull broken, git archive broken, git tag broken, git reset broken.
It looks like the list can be continued.

Yes, we can start building our own tools on top of git to do this,
but I'd rather not.

-- 
MST


[ofa-general] Re: opensm: a bug in heavy sweep? - no LFT re-configuration

2007-07-24 Thread Sasha Khapyorsky
On 07:56 Tue 24 Jul , Eitan Zahavi wrote:
  On 20:59 Mon 23 Jul , Eitan Zahavi wrote:
   Hi Sasha, Hal,

   I think I have an idea:

   Since this is a specific switch that reported ChangeBit or Trap why 
   can't we just qualify that there was no change in the switch setup?
  
  The ChangeBit seems to be good start point - then OpenSM will 
  query all switch ports PortInfo anyway and if for all ports 
  PortState is = INIT (and at least for one port it is = 
  INIT), it means that this switch was rebooted/reinitialized.
  
  And for single port PortState drop to = INIT should indicate 
  reinitialization.
  
  Seems correct?
 Yes.
  
   We could send PortInfo, SwitchInfo,
  
  SwitchInfo is queried at each light sweep, PortInfo's if 
  ChangeBit is set. Guess we are ok with it even now.
 I will double check that...
 Well - even setting one port state to INIT did not cause the switch to
 be reconfigured.
 Seems the code does not enforce this condition yet.
  
   LFT, MFT, SL2VL, VLArb, PKey queries
   and make sure no change from previous state. Or we could simply 
   enforce last state by sending it over again ...
  
  I think we could want to re-read PKey tables in order to 
  preserve existing PKey indices and just to flush (overwrite 
  with new settings) LFT, MFT, SL2VL, VLArb tables. Reasonable?
 Correct.

Ok, I will prepare patches. I am thinking about separate patches for switches
and ports. MFT should also likely be handled separately, since we don't
do incremental updates there yet.

Sasha


Re: [ofa-general] Command specification of ca_name and ca_port

2007-07-24 Thread Sasha Khapyorsky
On 09:05 Tue 24 Jul , Ira Weiny wrote:
 
  But that is the easy part - saquery renames are
  less intuitive :(. Probably just lower case? Or special query option
  (-q or -Q), so queries could be specified as -qP, -qC?
  
 
 I disagree with this because ~50% of the options are queries, its primary
 purpose is to query, and most of the other options change the format of the
 output of the query.  Therefore, I don't think a -q should be required for a
 query.  It seems redundant.
 
 Perhaps just changing the current options to -c, -p and adding -C and -P would
 be best.  I know this might break some scripts out there, particularly mine,
 but I think it is the right thing to do if you really want consistency.
 
 Thoughts?

-c,-p are fine for me too.

Sasha


Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Arthur Jones
hi michael, ...

On Tue, Jul 24, 2007 at 07:52:03PM +0300, Michael S. Tsirkin wrote:
 [...]
  i have found that drawing a DAG with graphviz has
  been a big help in making sure that i organize the
  branches correctly...
 
 Ugh .. *that* sounds complicated.
 Looks like it's much simpler with current setup.

compared to the rather sophisticated linux-kernel
changesets that i see from you on this list -- it's
child's play...

compared to figuring out the list of options for
ofed_scripts/configure just so we can _see_ the
source we're running on our box -- it's a walk in
the park...

one of the goals of OFED 1.3 is to make access
to the source easier.  to do that, we will prob
need to rid ourselves of patches...

arthur


Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Sean Hefty

Hmm. Consider that you did all of the above, and then mail me
that there's an update. Now I need to merge updates to multiple branches
directly and git pull does not do this. It's a problem.


A simple script can do this.


You'll have to check out branches one by one, and do a pull.
What if there's a conflict? I currently just do git reset --hard ORIG_HEAD
and mail the maintainer to fix it up - but this won't work
with the bush of branches approach.


If there's a conflict, then you need a different patch.  A single patch 
may work for all backports, or a fix may require different patches 
depending on the kernel version.  As it stands now, there are patches 
that we apply that do not work and expect a subsequent patch to fix it up.


- Sean


Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Michael S. Tsirkin
 one of the goals of OFED 1.3 is to make access
 to the source easier.  to do that, we will prob
 need to rid ourselves of patches...

I'm working on a rather simpler solution to this problem.
Stay tuned.

-- 
MST


RE: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Sean Hefty
Examples: What if there's a conflict? I currently do git reset, we'll

If there's a conflict applying a patch, you reject it.  I fail to see any issue
here.

- Sean


[ofa-general] [ANNOUNCE] NFS-RDMA for OFED 1.2 G/A

2007-07-24 Thread Tom Tucker
For those interested in NFS-RDMA, OGC has created an install package
based on the OFA 1.2 GA release. The package supports both SLES 10 and
RHEL 5. You can download this package from
http://www.opengridcomputing.com/nfs-rdma.html.

Please let me know if you find any problems.

Thanks,
Tom



Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Michael S. Tsirkin
 i'd _really_ like to see a list of the advantages of
 patches over branches.  it's hard for me to know if
 i'm just missing something if the case is not laid out...

Here's a short list off the top of my head

- A single git pull merges any number of backport changes
- A single git reset ORIG_HEAD recovers from a conflicting merge
- A single tag tags all code for all kernels
- On update from upstream, if there is a conflict
  between upstream code and a patch
  it's easy to temporarily remove the patch, complete the merge,
  and go bugger the patch author
- For recent kernels there are almost no patches.
  So an update from upstream for these kernels is free,
  with branches I will still need to update all branches.
- Adding a fix which only affects common code
  is currently straight-forward: make a change, commit.
  With multiple branches every fix must be pulled into
  all branches.
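
For comparison with the single-tag point in the list above, tagging a bush of branches would need a loop along these lines. This is a sketch under assumptions that are not from the thread: the backport branches match backport-*, and each tag is made unique by appending the branch name, since the tag namespace is global. The function name tag_bush is made up for the example.

```shell
#!/bin/sh
# Sketch: tag master and every backport branch for one release.
# Each tag is named <tag>-<branch> because one tag name cannot point at
# several branches at once.  Assumption: branches match backport-*.
tag_bush() {
    tag=$1
    for b in master $(git for-each-ref --format='%(refname:short)' 'refs/heads/backport-*'); do
        git tag "$tag-$b" "$b" || return 1
    done
}
```

For example, `tag_bush ofed-test` would create ofed-test-master, ofed-test-backport-2.6.9, and so on (the tag prefix here is illustrative only).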

-- 
MST


Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Sean Hefty

But the proposal here was to have a bush of branches, all of which
need to be merged at the same time. It's possible that some
would merge and some would fail, leaving me in an inconsistent state,
and no easy way to get back to where I started.


A fix could be applied to some kernels, but not others.  In fact, if a 
patch works for kernels X and Y, but has a conflict with kernel Z, then 
different patches are needed anyway.  I don't see the requirement to 
merge everything or even apply a fix to all kernels at the same time.
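
If an all-or-nothing merge across several branches were wanted anyway, one possible shape for it is sketched below. Nothing here comes from an existing OFED script: the function name pull_bush, the remote name, and the branch list are all placeholders, and the working tree is assumed to be clean. The idea is to record every branch head first, then restore all of them if any single pull fails.

```shell
#!/bin/sh
# Sketch: pull a list of branches from one remote; if any pull fails,
# reset every listed branch to the head it had before we started.
# Usage: pull_bush <remote> <branch>...
pull_bush() {
    remote=$1; shift
    start=$(git symbolic-ref --short HEAD) || return 1
    saved=""
    for b in "$@"; do
        # Remember "branch:commit" for a possible rollback.
        saved="$saved $b:$(git rev-parse "refs/heads/$b")" || return 1
    done
    ok=1
    for b in "$@"; do
        git checkout -q "$b" &&
            git pull -q --no-rebase --no-edit "$remote" "$b" && continue
        ok=0
        git merge --abort 2>/dev/null || true   # drop a half-finished merge
        break
    done
    if [ "$ok" -eq 0 ]; then
        for entry in $saved; do
            git checkout -q "${entry%%:*}" && git reset -q --hard "${entry#*:}"
        done
    fi
    git checkout -q "$start"
    [ "$ok" -eq 1 ]
}
```

A caller could then do `pull_bush remote backport-2.6.9 backport-2.6.18 || echo "rejected, mail the maintainer"`, which mirrors the reset-and-mail workflow described for the single-branch case.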


- Sean
___
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Sean Hefty

Here's a short list off the top of my head

- A single git pull merges any number of backport changes
- A single git reset ORIG_HEAD recovers from a conflicting merge
- A single tag tags all code for all kernels
- On update from upstream, if there is a conflict
  between upstream code and a patch
  it's easy to temporarily remove the patch, complete the merge,
  and go bug the patch author
- For recent kernels there are almost no patches.
  So an update from upstream for these kernels is free,
  with branches I will still need to update all branches.
- Adding a fix which only affects common code
  is currently straight-forward: make a change, commit.
  With multiple branches every fix must be pulled into
  all branches.


You seem to be overlooking the fact that you already require a script to 
check that things work for all kernels.  Until you apply a series of 
patches to form a particular kernel, you don't know if a change that you 
pulled in caused a conflict.  You still have the requirement to verify 
the fix on all kernels, and it still requires running a script that 
pushes/pops patches to create each tree.


- Sean


[ofa-general] Re: OpenSM detection of duplicated GUIDs on loopback

2007-07-24 Thread Hal Rosenstock

On 7/24/07, Eitan Zahavi [EMAIL PROTECTED] wrote:


 Hi Hal,

 The code to find duplicated GUIDs stems from real user cases where a
 flawed burning procedure caused actual GUID duplications. Nothing is
 impossible.



No one said impossible; just a violation of what globally unique (GU from
GUID) really means. It's largely because vendors allowed users to program
non volatile RAM for GUIDs rather than a real manufacturing process for this
which guarantees uniqueness that we are even discussing this aspect of it.

 So it is really critical that the SM be able to recognize this case
 and abort.



I agree with the detect part but not the abort part. Why can't it report
these errors and continue on ? That seems better to me than aborting.

-- Hal



 It might be that for testing someone wants to use a loopback plug that
 causes the same port GUID to appear on both sides of a link - but it is
 better to require the user doing the test to set some flag than to miss
 such a situation in a real-life cluster.

 This requirement was written after many people wasted many hours trying
 to figure out what was going on.
 PLEASE DO NOT TAKE IT AWAY

 Eitan Zahavi
 Senior Engineering Director, Software Architect
 Mellanox Technologies LTD
 Tel:+972-4-9097208
 Fax:+972-4-9593245
 P.O. Box 586 Yokneam 20692 ISRAEL

 --
 From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, July 24, 2007 6:04 PM
 To: Eitan Zahavi
 Cc: OpenFabrics General; Sasha Khapyorsky; Yevgeny Kliteynik
 Subject: Re: OpenSM detection of duplicated GUIDs on loopback

On 7/24/07, Eitan Zahavi [EMAIL PROTECTED] wrote:

 From: Hal Rosenstock [mailto:[EMAIL PROTECTED] ]
 Sent: Tuesday, July 24, 2007 5:53 PM
 To: Eitan Zahavi
 Cc: OpenFabrics General; Sasha Khapyorsky; Yevgeny Kliteynik
 Subject: Re: OpenSM detection of duplicated GUIDs on loopback



 Hi Eitan,

 On 7/24/07, Eitan Zahavi [EMAIL PROTECTED]  wrote:
 
   Hi Hal,

   What is this loopback connector used for?
   It does not seem to me like a very useful thing to do.

 Perhaps not, but there is no reason OpenSM can't handle this more gracefully.

   Anyway, if it is not a production environment we could add a debug
   mode (-d flag option) to ignore this check.

 Why would a separate flag be needed ?
 [EZ] Since I do not see any other way for the SM to know it is
 really a loopback plug rather than two devices with the same GUID connected
 back to back ...


Technically, this should only occur when looped back and not two devices
with same GUID as GUID == globally unique and a duplication indicates a
manufacturing issue.

Anyhow, can't these be treated the same (and handled more gracefully)
without an additional option/flag ?

-- Hal


 -- Hal

  Eitan Zahavi
  Senior Engineering Director, Software Architect
  Mellanox Technologies LTD
  Tel:+972-4-9097208
  Fax:+972-4-9593245
  P.O. Box 586 Yokneam 20692 ISRAEL

  --
  From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
  Sent: Tuesday, July 24, 2007 5:31 PM
  To: OpenFabrics General
  Cc: Sasha Khapyorsky; Eitan Zahavi; Yevgeny Kliteynik
  Subject: OpenSM detection of duplicated GUIDs on loopback
 
 
   Hi,
 
  This starts off as a minor issue, and I know it has been
  discussed somewhat in the past:
 
  Putting a loopback connector on a (switch) link causes OpenSM to
  indicate duplicated GUID error 0D18 as follows:
 
  __osm_ni_rcv_set_links
  {
  ...
    /*
      When there are only two nodes with exact same guids (connected back
      to back) - the previous check for duplicated guid will not catch
      them. But the link will be from the port to itself...
      Enhanced Port 0 is an exception to this
    */
    if ((osm_node_get_node_guid( p_node ) == p_ni_context->node_guid) &&
        (port_num == p_ni_context->port_num) &&
        (port_num != 0))
    {
      osm_log( p_rcv->p_log, OSM_LOG_ERROR,
               "__osm_ni_rcv_set_links: ERR 0D18: "
               "Duplicate GUID found by link from a port to itself:"
               "node 0x%" PRIx64 ", port number 0x%X\n",
               cl_ntoh64( osm_node_get_node_guid( p_node ) ),
               port_num );
  ...
 
  So this occurs over and over and over and fills the log with the same
  spew. This should be improved IMO.
 
  Is this really a fatal condition ? Doesn't seem like it should be to
  me.
 
  Also, OpenSM can ride this out with -y (stay on fatal) but is that
  safe for this condition ?
 
  Seems like something like an extra loopback bit should be added to
  some port structure which should cause these links to be ignored. This bit
  would then be reset when the peer is no longer itself.
 
  Also, is there a relationship of this with the 12x/duplicated GUID
  code ?
 
  Thanks.
 
  -- Hal
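
A rough sketch of the loopback-bit idea from the message above, for
illustration only. This is not OpenSM code: struct port,
link_is_self_loop() and port_set_peer() are hypothetical names made up
for this example. Only the comparison itself (same node GUID, same
non-zero port number) follows the quoted ERR 0D18 check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-port record; not an OpenSM type. */
struct port {
	uint64_t node_guid;
	uint8_t port_num;
	bool loopback;	/* set while the port's peer is the port itself */
};

/*
 * Same test as the quoted ERR 0D18 check: the peer reports the same node
 * GUID and the same port number, and the port is not port 0.
 */
static bool link_is_self_loop(const struct port *p, uint64_t peer_guid,
			      uint8_t peer_port_num)
{
	return p->node_guid == peer_guid &&
	       p->port_num == peer_port_num &&
	       p->port_num != 0;
}

/*
 * Record a discovered peer. Returns true if the link should be used,
 * false if it is a self-loop and should be ignored. The loopback bit is
 * cleared again as soon as the peer is no longer the port itself.
 */
static bool port_set_peer(struct port *p, uint64_t peer_guid,
			  uint8_t peer_port_num)
{
	assert(p != NULL);
	p->loopback = link_is_self_loop(p, peer_guid, peer_port_num);
	return !p->loopback;
}
```

Note that a check like this cannot tell a loopback plug apart from two
back-to-back devices that were burned with the same GUID, which is the
objection raised earlier in the thread.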
 
 



[ofa-general] RE: OpenSM detection of duplicated GUIDs on loopback

2007-07-24 Thread Eitan Zahavi
Hi Hal,
 
For many users such a critical failure (one the SM cannot really do
anything with) is better aborted than forgotten in some log file.
Anyway, the -y flag lets you ignore it if you like.
 

Eitan Zahavi 
Senior Engineering Director, Software Architect 
Mellanox Technologies LTD 
Tel:+972-4-9097208
Fax:+972-4-9593245 
P.O. Box 586 Yokneam 20692 ISRAEL 

 






 


Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Michael S. Tsirkin
 Quoting Sean Hefty [EMAIL PROTECTED]:
 Subject: Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
 
 But the proposal here was to have a bush of branches, all of which
 need to be merged at the same time. It's possible that some
 would merge and some would fail, leaving me in an inconsistent state,
 and no easy way to get back to where I started.
 
 A fix could be applied to some kernels, but not others.  In fact, if a 
 patch works for kernel X & Y, but has a conflict with kernel Z, then 
 different patches are needed anyway.  I don't see the requirement to 
 merge everything or even apply a fix to all kernels at the same time.

This is typically a component maintainer's job, not an integrator's.
As an integrator, I want to pull and, if the merge fails,
reset everything back to the original state and let the maintainer know.

-- 
MST


[ofa-general] Re: [PATCH 00/10] Implement batching skb API

2007-07-24 Thread jamal
KK,

On Tue, 2007-24-07 at 09:14 +0530, Krishna Kumar2 wrote:

 
 J Hadi Salim [EMAIL PROTECTED] wrote on 07/23/2007 06:02:01 PM:


 Actually you have not sent netperf results with prep and without prep.

My results were based on pktgen (which i explained as testing the
driver). I think depending on netperf without further analysis is
simplistic. It was like me doing forwarding tests on these patches.

  So _which_ non-LLTX driver doesn't do that? ;->
 
 I have no idea since I haven't looked at all drivers. Can you tell me
 which non-LLTX drivers do that ? I stated this as the sole criterion.

The few i have peeked at all do it. I also think the e1000 should be
converted to be non-LLTX. The rest of netdev is screaming to kill LLTX. 

  tun driver doesn't use it either - but i doubt that makes it bloat
 
 Adding extra code that is currently not usable (esp from a submission
 point) is bloat.

So far i have converted 3 drivers, 1 of them doesn't use it. Two more
driver conversions are on the way, they will both use it. How is this
bloat again? 
A few emails back you said if only IPOIB can use batching then that's
good enough justification. 

  You waltz in, have the luxury of looking at my code, presentations, many
  discussions with me etc ...
 
 luxury ? 
 I had implemented the entire thing even before knowing that you
 are working on something similar! and I had sent the first proposal to
 netdev,

I saw your patch at the end of May (or at least 2 weeks after you said
it existed). That patch has very little resemblance to what you just
posted conceptwise or codewise. I could post it if you would give me
permission.

 *after* which you told that you have your own code and presentations (which
 I had never seen earlier - I joined netdev a few months back, earlier I was
 working on RDMA, Infiniband as you know).

I am gonna assume you didn't know of my work - which i have been making
public for about 3 years. In fact i talked about this topic when i
visited your office in 2006 on a day you were not present, so it is
plausible you didn't hear of it.

  And it didn't give me any great
 ideas either, remember I had posted results for E1000 at the time of
 sending the proposals. 

In mid-June you sent me a series of patches which included anything from
changing variable names to combining qdisc_restart and about everything
i referred to as being cosmetic differences in your posted patches. I
took two of those and incorporated them in. One was an XXX in my code
already to allocate the dev->blist 
(Commit: bb4464c5f67e2a69ffb233fcf07aede8657e4f63). 
The other one was a mechanical removal of the blist being passed
(Commit: 0e9959e5ee6f6d46747c97ca8edc91b3eefa0757). 
Some of the others i asked you to defer. For example, the reason i gave
you for not merging any qdisc_restart_combine changes is because i was
waiting for Dave to swallow the qdisc_restart changes i made; otherwise
maintenance becomes extremely painful for me. 
Sridhar actually provided a lot more valuable comments and fixes but has
not planted a flag on behalf of the queen of spain like you did. 

 However I do give credit in my proposal to you for what
 ideas that your provided (without actual code), and the same I did for other
 people who did the same, like Dave, Sridhar. BTW, you too had discussions 
 with me,
 and I sent some patches to improve your code too, 

I incorporated two of your patches and asked for deferral of others.
These patches have now shown up in what you claim as the difference. I
just call them cosmetic differences, not to downplay the importance of
having an ethtool interface but because they do not make batching
perform any better. The real differences are those two items. I am
surprised you haven't cannibalized those changes as well. I thought you
renamed them to something else; according to your posting:
"This patch will work with drivers updated by Jamal, Matt & Michael Chan
with minor modifications - rename xmit_win to xmit_slots & rename batch
handler". Or maybe that's a future plan you have in mind?

 so it looks like a two
 way street to me (and that is how open source works and should).

Open source is a lot more transparent than that.

You posted a question, which was part of your research. I responded and
told you i have patches; you asked me for them and i promptly ported
them from pre-2.6.18 to the latest kernel at the time. 

The nature of this batching work is one of performance. So numbers are
important. If you had some strong disagreements on something in the
architecture, then it would be of great value to explain it in
technical detail - and more importantly to provide some numbers to say
why it is a bad idea. You get numbers by running some tests. 
You did none of the above. Your effort has been to produce your patch
for whatever reasons. This would not have been problematic to me if it
actually was based within reasons of optimization because the end goal
would have been achieved.

I have deleted the rest of the email 

[ofa-general] [PATCH] amso1100: QP init bug in amso driver

2007-07-24 Thread Tom Tucker
Roland:

The guys at UNH found this and fixed it. I'm surprised no
one has hit this before. I guess it only breaks when the 
refcount on the QP is non-zero.

Initialize the wait_queue_head_t in the c2_qp structure.

Signed-off-by: Ethan Burns [EMAIL PROTECTED]
Acked-by: Tom Tucker [EMAIL PROTECTED]

---
 drivers/infiniband/hw/amso1100/c2_qp.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/amso1100/c2_qp.c b/drivers/infiniband/hw/amso1100/c2_qp.c
index 420c138..01d0786 100644
--- a/drivers/infiniband/hw/amso1100/c2_qp.c
+++ b/drivers/infiniband/hw/amso1100/c2_qp.c
@@ -506,6 +506,7 @@ int c2_alloc_qp(struct c2_dev *c2dev,
 	qp->send_sgl_depth = qp_attrs->cap.max_send_sge;
 	qp->rdma_write_sgl_depth = qp_attrs->cap.max_send_sge;
 	qp->recv_sgl_depth = qp_attrs->cap.max_recv_sge;
+	init_waitqueue_head(&qp->wait);
 
 	/* Initialize the SQ MQ */
 	q_size = be32_to_cpu(reply->sq_depth);



[ofa-general] RE: OpenSM detection of duplicated GUIDs on loopback

2007-07-24 Thread Eitan Zahavi
Maybe avoid the log if -y is provided?
 

Eitan Zahavi 
Senior Engineering Director, Software Architect 
Mellanox Technologies LTD 
Tel:+972-4-9097208
Fax:+972-4-9593245 
P.O. Box 586 Yokneam 20692 ISRAEL 

 




From: Hal Rosenstock [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 24, 2007 9:56 PM
To: Eitan Zahavi
Cc: OpenFabrics General; Sasha Khapyorsky; Yevgeny Kliteynik
Subject: Re: OpenSM detection of duplicated GUIDs on loopback




On 7/24/07, Eitan Zahavi [EMAIL PROTECTED] wrote:

Hi Hal,
 
For many users such a critical failure (one the SM cannot
really do anything with) is better aborted than forgotten in some
log file. 
Anyway, the -y flag lets you ignore it if you like.

 
So everything else continues to work fine with -y ? In which
case, I'm not sure which is the better default.
 
Users certainly won't like their logs filling up with continuous
duplicated GUID messages. The log spew should be cleaned up IMO.
 
-- Hal

 


 


[ofa-general] [PATCH] opensm: detect port external reset and flush cached tables

2007-07-24 Thread Sasha Khapyorsky

This detects port external reset by validating PortState == INIT, and
when detected flushes cached port related tables - re-reads pkey table
and drops (overwrites) SL2VL and VLArb tables.

Signed-off-by: Sasha Khapyorsky [EMAIL PROTECTED]
---
 opensm/include/opensm/osm_port.h  |    5 +++++
 opensm/opensm/osm_port.c          |    1 +
 opensm/opensm/osm_port_info_rcv.c |    9 ++++++++-
 opensm/opensm/osm_qos.c           |    9 +++++----
 4 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/opensm/include/opensm/osm_port.h b/opensm/include/opensm/osm_port.h
index f6c40c7..44323ab 100644
--- a/opensm/include/opensm/osm_port.h
+++ b/opensm/include/opensm/osm_port.h
@@ -118,6 +118,7 @@ typedef struct _osm_physp
 	struct _osm_physp	*p_remote_physp;
 	boolean_t		healthy;
 	uint8_t			vl_high_limit;
+	unsigned		need_update;
 	osm_dr_path_t		dr_path;
 	osm_pkey_tbl_t		pkeys;
 	ib_vl_arb_table_t	vl_arb[4];
@@ -157,6 +158,10 @@ typedef struct _osm_physp
 *  PortInfo:VLHighLimit value which installed by QoS manager
 *  and should be uploaded to port's PortInfo
 *
+*  need_update
+*  When set indicates that port was probably reset and port
+*  related tables (PKey, SL2VL, VLArb) require refreshing.
+*
 *  dr_path
 *  The directed route path to this port.
 *
diff --git a/opensm/opensm/osm_port.c b/opensm/opensm/osm_port.c
index e03e316..11cc5ca 100644
--- a/opensm/opensm/osm_port.c
+++ b/opensm/opensm/osm_port.c
@@ -118,6 +118,7 @@ osm_physp_init(
   p_physp->port_guid = port_guid;
   p_physp->port_num = port_num;
   p_physp->healthy = TRUE;
+  p_physp->need_update = 2;
   p_physp->p_node = (struct _osm_node*)p_node;
 
   osm_dr_path_init(
diff --git a/opensm/opensm/osm_port_info_rcv.c b/opensm/opensm/osm_port_info_rcv.c
index 6fe2d1d..0528e38 100644
--- a/opensm/opensm/osm_port_info_rcv.c
+++ b/opensm/opensm/osm_port_info_rcv.c
@@ -801,6 +801,12 @@ osm_pi_rcv_process(
   p_rcv->p_subn->master_sm_base_lid = p_pi->master_sm_base_lid;
 }
 
+/* if port just inited or reached INIT state (external reset)
+   request update for port related tables */
+p_physp->need_update =
+  (ib_port_info_get_port_state(p_pi) == IB_LINK_INIT ||
+   p_physp->need_update > 1 ) ? 1 : 0;
+
 switch( osm_node_get_type( p_node ) )
 {
 case IB_NODE_TYPE_CA:
@@ -824,7 +830,8 @@ osm_pi_rcv_process(
 /*
   Get the tables on the physp.
 */
-__osm_pi_rcv_get_pkey_slvl_vla_tables( p_rcv, p_node, p_physp );
+if (p_physp->need_update)
+  __osm_pi_rcv_get_pkey_slvl_vla_tables( p_rcv, p_node, p_physp );
 
   }
 
diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c
index 17b7e3a..596b6d4 100644
--- a/opensm/opensm/osm_qos.c
+++ b/opensm/opensm/osm_qos.c
@@ -87,8 +87,9 @@ static ib_api_status_t vlarb_update_table_block(osm_req_t * p_req,
 	for (i = 0; i < block_length; i++)
 		block.vl_entry[i].vl &= vl_mask;
 
-	if (!memcmp(&p->vl_arb[block_num], &block,
-		    block_length * sizeof(block.vl_entry[0])))
+	if (!p->need_update &&
+	    !memcmp(&p->vl_arb[block_num], &block,
+		    block_length * sizeof(block.vl_entry[0])))
 		return IB_SUCCESS;
 
 	context.vla_context.node_guid =
@@ -170,8 +171,8 @@ static ib_api_status_t sl2vl_update_table(osm_req_t * p_req,
 		tbl.raw_vl_by_sl[i] = (vl1 << 4 ) | vl2 ;
 	}
 
-	p_tbl = osm_physp_get_slvl_tbl(p, in_port);
-	if (p_tbl && !memcmp(p_tbl, &tbl, sizeof(tbl)))
+	if (!p->need_update && (p_tbl = osm_physp_get_slvl_tbl(p, in_port)) &&
+	    !memcmp(p_tbl, &tbl, sizeof(tbl)))
 		return IB_SUCCESS;
 
 	context.slvl_context.node_guid = osm_node_get_node_guid(p_node);
-- 
1.5.3.rc2.29.gc4640f
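
For readers following the need_update logic in the patch above, here is
a small stand-alone model of how the flag seems to evolve. It is only a
sketch: struct physp_model and both functions are hypothetical names,
not OpenSM code, and the "> 1" comparison is an assumption about the
operator that the archive dropped from the patch text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the patch's per-port flag; not an OpenSM type. */
struct physp_model {
	unsigned need_update;
};

/* Mirrors osm_physp_init() in the patch: a new port starts at 2. */
static void physp_model_init(struct physp_model *p)
{
	assert(p != NULL);
	p->need_update = 2;
}

/*
 * Mirrors the assignment added to osm_pi_rcv_process(): request a table
 * refresh if the port reports INIT, or if this is the first PortInfo
 * seen since init (need_update > 1, assumed operator). Returns true if
 * the port tables should be re-read.
 */
static bool physp_model_on_port_info(struct physp_model *p, bool state_is_init)
{
	assert(p != NULL);
	p->need_update = (state_is_init || p->need_update > 1) ? 1 : 0;
	return p->need_update != 0;
}
```

Under this reading, the initial value 2 forces one refresh on the first
PortInfo for a new port, and after that only a port seen in INIT state
triggers another one.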



Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Arthur Jones
hi michael, ...

On Tue, Jul 24, 2007 at 08:52:20PM +0300, Michael S. Tsirkin wrote:
  i'd _really_ like to see a list of the advantages of
  patches over branches.  it's hard for me to know if
  i'm just missing something if the case is not laid out...

thanks for the list...

 Here's a short list off the top of my head
 
 - A single git pull merges any number of backport changes

ok, you can run one command instead of a 4-line
script.   hmm, i guess you could say this is a
very slight advantage to using patches...
 
 - A single git reset ORIG_HEAD recovers from a conflicting merge

handling conflicts is a big part of a maintainer's
job!  the _vast_ majority of the time i bet you already
know how to do the merge.  if you don't, then only
the backport branches which haven't merged yet are
stuck and you can pick up where you left off (which
is how i do it now).  but if you're stuck in some
strange intermediate state with some patches pushed
and some yet to push in the configure script, i could
see how you'd want to punt.  but, someone is doing
this work, and that someone almost certainly has a
difficult time reproducing and developing a stack of
patches..

if, though, you must have a pristine environment,
this is easily solved by using an intermediate repo:

  git clone -s <canonical repo>
  run the pull
  any conflicts, dump this guy, otherwise, pull this in

i bet this is very similar time-wise to running the
merge, then the ofed_scripts/configure over all supported
branches.  merges in git are _fast_...

 - A single tag tags all code for all kernels

store commit ids in a file and tag that?

 - On update from upstream, if there is a conflict
   between upstream code and a patch
   it's easy to temporarily remove the patch, complete the merge,
   and go bug the patch author

i think this is easier with the backport branches,
see git clone -s above.  or, just fixup the error.
the reason you have to bugger the author may be that
you don't have the tools necessary to actually fix
up the patch -- but you can prob bet the author doesn't
like to fixup patches in quilt any more than you do...

 - For recent kernels there are almost no patches.
   So an update from upstream for these kernels is free,
   with branches I will still need to update all branches.

i can say from a couple months experience that
upstream merges are free using backport branches.
running the script to reflow the branches is
_far_ less complex than the configure script,
has fewer dependencies and is much simpler to
maintain and understand.  also, if the upstream
changes touch code that conflicts with a backport
patch, you get to fix the problem as it happens
in a much more comfortable environment (i.e. you
don't need quilt)...

 - Adding a fix which only affects common code
   is currently straight-forward: make a change, commit.
   With multiple branches every fix must be pulled into
   all branches.

this use case is actually a good reason to
use backport branches.  with the patches,
you still need to fan out the changes to all
the backport branches.  but, in general, you
don't.  so you end up making a change and
_not realizing_ that it broke some random
backport patch.  by reflowing after every
change, you get to see it break right there
in front of you and you're way more likely
to know how to fix it.  you could do this
with the build script too, but that would
require a 4 line script -- and you'd need
to switch over to using quilt or some other
patch queue based system (yuck!)...

all your points above you made from the POV
of the maintainer.  but, what about the _users_
of the repo.  as long as changes are kept as
patches, trying to figure out what has
changed with your latest round of backports
comes down to recreating a tree and pulling
from that.  it's extremely fragile and error
prone.  there is only one maintainer, but many
developers.  if we can make their lives significantly
easier then it should be a net gain...

the backport branches make merging upstream changes
easier.  they make merging developer changes easier.
they make finding and fixing backport conflicts easier.
they make viewing and navigating changes easier.  but,
you need to use very short scripts (which i'm happy to
create and maintain) to tag and pull -- doesn't seem like
much of a price to pay to me...

arthur


[ofa-general] Re: OpenSM detection of duplicated GUIDs on loopback

2007-07-24 Thread Sasha Khapyorsky
On 23:25 Tue 24 Jul , Eitan Zahavi wrote:
 
   On 7/24/07, Eitan Zahavi [EMAIL PROTECTED] wrote: 
 
   Maybe  avoid the log if -y is provided?
 

   That avoids the spew but the duplicated GUID is important to
 know so IMO something in the middle is needed where duplicated GUIDs
 are logged but not continually the same ones.
   [EZ]  
   OK so in -y mode only we track which ones were reported and do
 not repeat the log?

And how should the port moving problem be solved?

We cannot ask a user to run OpenSM with '-y' if her/his plans include
reconnecting some ports in the future, and just decrease logging.

Sasha
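
To make the "log each duplicated GUID once" idea from this thread more
concrete, here is a minimal sketch. Nothing in it is OpenSM code:
struct dup_guid_table, dup_guid_should_log() and dup_guid_forget() are
hypothetical names, and the fixed-size table is an arbitrary choice for
the example. The forget step is one possible answer to the port moving
question: drop the entry when the port's peer changes so that a new
duplication is reported again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REPORTED 64	/* arbitrary bound for this sketch */

/* Hypothetical record of (GUID, port) pairs that were already logged. */
struct dup_guid_table {
	struct { uint64_t guid; uint8_t port_num; } seen[MAX_REPORTED];
	size_t count;
};

/*
 * Returns true if this duplicate should be logged now, i.e. it has not
 * been reported before. When the table is full, fall back to logging
 * every time rather than silently dropping a report.
 */
static bool dup_guid_should_log(struct dup_guid_table *tbl, uint64_t guid,
				uint8_t port_num)
{
	size_t i;

	assert(tbl != NULL);
	for (i = 0; i < tbl->count; i++)
		if (tbl->seen[i].guid == guid &&
		    tbl->seen[i].port_num == port_num)
			return false;
	if (tbl->count < MAX_REPORTED) {
		tbl->seen[tbl->count].guid = guid;
		tbl->seen[tbl->count].port_num = port_num;
		tbl->count++;
	}
	return true;
}

/* Forget one entry, e.g. when the port's peer changes (port moved). */
static void dup_guid_forget(struct dup_guid_table *tbl, uint64_t guid,
			    uint8_t port_num)
{
	size_t i;

	assert(tbl != NULL);
	for (i = 0; i < tbl->count; i++)
		if (tbl->seen[i].guid == guid &&
		    tbl->seen[i].port_num == port_num) {
			/* Move the last entry into the freed slot. */
			tbl->seen[i] = tbl->seen[--tbl->count];
			return;
		}
}
```

Whether such tracking should depend on '-y' at all is exactly what is
being debated here; the sketch takes no position on that.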


[ofa-general] Re: [PATCH 00/10] Implement batching skb API

2007-07-24 Thread Krishna Kumar2
Jamal,

This is silly. I am not responding to this type of presumptuous and
insulting mail.

Regards,

- KK

J Hadi Salim [EMAIL PROTECTED] wrote on 07/25/2007 12:58:20 AM:

 KK,

 On Tue, 2007-24-07 at 09:14 +0530, Krishna Kumar2 wrote:

 
  J Hadi Salim [EMAIL PROTECTED] wrote on 07/23/2007 06:02:01 PM:


  Actually you have not sent netperf results with prep and without prep.

 My results were based on pktgen (which i explained as testing the
 driver). I think depending on netperf without further analysis is
 simplistic. It was like me doing forwarding tests on these patches.

   So _which_ non-LLTX driver doesnt do that? ;-
 
  I have no idea since I haven't looked at all drivers. Can you tell
which
  all non-LLTX drivers does that ? I stated this as the sole criterea.

 The few i have peeked at all do it. I also think the e1000 should be
 converted to be non-LLTX. The rest of netdev is screaming to kill LLTX.

   tun driver doesnt use it either - but i doubt that makes it bloat
 
  Adding extra code that is currently not usable (esp from a submission
  point) is bloat.

 So far i have converted 3 drivers, 1 of them doesnt use it. Two more
 driver conversions are on the way, they will both use it. How is this
 bloat again?
 A few emails back you said if only IPOIB can use batching then thats
 good enough justification.

   You waltz in, have the luxury of looking at my code, presentations,
many
   discussions with me etc ...
 
  luxury ?
  I had implemented the entire thing even before knowing that you
  are working on something similar! and I had sent the first proposal to
  netdev,

 I saw your patch at the end of may (or at least 2 weeks after you said
 it existed). That patch has very little resemblance to what you just
 posted conceptwise or codewise. I could post it if you would give me
 permission.

  *after* which you told that you have your own code and presentations
(which
  I had never seen earlier - I joined netdev a few months back, earlier I
was
  working on RDMA, Infiniband as you know).

 I am gonna assume you didn't know of my work - which i have been making
 public for about 3 years. In fact i talked about this topic when i
 visited your office in 2006 on a day you were not present, so it is
 plausible you didn't hear of it.

   And it didn't give me any great
  ideas either, remember I had posted results for E1000 at the time of
  sending the proposals.

 In mid-June you sent me a series of patches which included anything from
 changing variable names to combining qdisc_restart and about everything
 i referred to as being cosmetic differences in your posted patches. I
 took two of those and incorporated them in. One was an XXX in my code
 already to allocate the dev->blist
 (Commit: bb4464c5f67e2a69ffb233fcf07aede8657e4f63).
 The other one was a mechanical removal of the blist being passed
 (Commit: 0e9959e5ee6f6d46747c97ca8edc91b3eefa0757).
 Some of the others i asked you to defer. For example, the reason i gave
 you for not merging any qdisc_restart_combine changes is because i was
 waiting for Dave to swallow the qdisc_restart changes i made; otherwise
 maintenance becomes extremely painful for me.
 Sridhar actually provided a lot more valuable comments and fixes but has
 not planted a flag on behalf of the queen of spain like you did.

  However I do give credit in my proposal to you for the
  ideas that you provided (without actual code), and I did the same for
  other people who did the same, like Dave, Sridhar. BTW, you too had
  discussions with me, and I sent some patches to improve your code too,

 I incorporated two of your patches and asked for deferral of others.
 These patches have now shown up in what you claim as the difference. I
 just call them cosmetic difference not to downplay the importance of
 having an ethtool interface but because they do not make batching
 perform any better. The real differences are those two items. I am
 surprised you haven't cannibalized those changes as well. I thought you
 renamed them to something else; according to your posting:
 "This patch will work with drivers updated by Jamal, Matt & Michael Chan
 with minor modifications - rename xmit_win to xmit_slots & rename batch
 handler". Or maybe that's a future plan you have in mind?

  so it looks like a two
  way street to me (and that is how open source works and should).

 Open source is a lot more transparent than that.

 You posted a question, which was part of your research. I responded and
 told you i have patches; you asked me for them and i promptly ported
 them from pre-2.6.18 to the latest kernel at the time.

 The nature of this batching work is one of performance. So numbers are
 important. If you had some strong disagreements on something in the
 architecture, then it would be of great value to explain it in a
 technical detail - and more importantly to provide some numbers to say
 why it is a bad idea. You get numbers by running some tests.
 You did none of the above. Your 

[ofa-general] nightly osm_sim report 2007-07-25:normal completion

2007-07-24 Thread kliteyn
OSM Simulation Regression Summary
 
[Generated mail - please do NOT reply]
 
 
OpenSM rev = Thu_Jul_12_11:56:08_2007 [de69204d60071532833b0cdd3baa5e2386dc2c73]
ibutils rev = Tue_Mar_13_14:36:32_2007 [80aaff94f0eb65117db39b9db7d609ffdcc055de]
 
 
Total=520  Pass=520  Fail=0
 
 
Pass:
39 Stability IS1-16.topo
39 Pkey IS1-16.topo
39 OsmTest IS1-16.topo
39 OsmStress IS1-16.topo
39 Multicast IS1-16.topo
39 LidMgr IS1-16.topo
13 Stability IS3-loop.topo
13 Stability IS3-128.topo
13 Pkey IS3-128.topo
13 OsmTest IS3-loop.topo
13 OsmTest IS3-128.topo
13 OsmStress IS3-128.topo
13 Multicast IS3-loop.topo
13 Multicast IS3-128.topo
13 LidMgr IS3-128.topo
13 FatTree merge-roots-4-ary-2-tree.topo
13 FatTree merge-root-4-ary-3-tree.topo
13 FatTree gnu-stallion-64.topo
13 FatTree blend-4-ary-2-tree.topo
13 FatTree RhinoDDR.topo
13 FatTree FullGnu.topo
13 FatTree 4-ary-2-tree.topo
13 FatTree 2-ary-4-tree.topo
13 FatTree 12-node-spaced.topo
13 FTreeFail 4-ary-2-tree-missing-sw-link.topo
13 FTreeFail 4-ary-2-tree-links-at-same-rank-2.topo
13 FTreeFail 4-ary-2-tree-links-at-same-rank-1.topo
13 FTreeFail 4-ary-2-tree-diff-num-pgroups.topo

Failures:


[ofa-general] Re: [PATCH 02/12 -Rev2] Changes to netdevice.h

2007-07-24 Thread Krishna Kumar2
Hi Patrick,

Krishna Kumar2/India/IBM wrote on 07/23/2007 08:27:53 AM:

 Hi Patrick,

 Patrick McHardy [EMAIL PROTECTED] wrote on 07/22/2007 10:36:51 PM:

  Krishna Kumar wrote:
   @@ -472,6 +474,9 @@ struct net_device
   void *priv;   /* pointer to private data   */
   int (*hard_start_xmit) (struct sk_buff *skb,
  struct net_device *dev);
   +   int (*hard_start_xmit_batch) (struct net_device
   +   *dev);
   +
 
 
  Is this function really needed? Can't you just call hard_start_xmit with
  a NULL skb and have the driver use dev->blist?

 Probably not. I will see how to do it this way and get back to you.

I think this is a good idea and makes code everywhere simpler. I
will try this change and test to make sure it doesn't have any
negative impact. Will most likely send out rev3 tomorrow.

Thanks,

- KK
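
For readers following the thread, Patrick's suggestion (reuse
hard_start_xmit with a NULL skb instead of adding hard_start_xmit_batch)
can be sketched roughly as below. This is a user-space model only, not
code from any posted patch: the struct layouts, the blist helpers,
hw_send_one(), DEMO_TX_OK and demo_run() are all hypothetical stand-ins
invented for illustration, and locking, ring-full handling and error
paths are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types (assumptions, not the real
 * definitions from netdevice.h / skbuff.h). */
struct sk_buff {
    struct sk_buff *next;
};

struct skb_list {                 /* hypothetical model of dev->blist */
    struct sk_buff *head;
    struct sk_buff *tail;
};

struct net_device {
    struct skb_list blist;        /* packets queued for a batched send */
    int tx_packets;               /* counts packets "sent" in this model */
    int (*hard_start_xmit)(struct sk_buff *skb, struct net_device *dev);
};

#define DEMO_TX_OK 0              /* hypothetical success code */

/* Append one skb to the tail of the device batch list. */
static void blist_enqueue(struct net_device *dev, struct sk_buff *skb)
{
    skb->next = NULL;
    if (dev->blist.tail)
        dev->blist.tail->next = skb;
    else
        dev->blist.head = skb;
    dev->blist.tail = skb;
}

/* Remove and return the skb at the head of the batch list, or NULL. */
static struct sk_buff *blist_dequeue(struct net_device *dev)
{
    struct sk_buff *skb = dev->blist.head;

    if (skb) {
        dev->blist.head = skb->next;
        if (!dev->blist.head)
            dev->blist.tail = NULL;
        skb->next = NULL;
    }
    return skb;
}

/* Stand-in for posting one packet to the hardware. */
static void hw_send_one(struct net_device *dev, struct sk_buff *skb)
{
    (void)skb;
    dev->tx_packets++;
}

/* One entry point for both paths: a non-NULL skb is sent on its own,
 * a NULL skb means "drain whatever is queued on dev->blist". */
static int demo_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
    if (skb) {
        hw_send_one(dev, skb);
        return DEMO_TX_OK;
    }
    while ((skb = blist_dequeue(dev)) != NULL)
        hw_send_one(dev, skb);
    return DEMO_TX_OK;
}

/* Send one packet directly, then three through the batch list;
 * returns the total number of packets sent. */
static int demo_run(void)
{
    struct net_device dev = { .hard_start_xmit = demo_hard_start_xmit };
    struct sk_buff a = {0}, b = {0}, c = {0}, d = {0};

    dev.hard_start_xmit(&a, &dev);      /* legacy single-skb path */

    blist_enqueue(&dev, &b);
    blist_enqueue(&dev, &c);
    blist_enqueue(&dev, &d);
    dev.hard_start_xmit(NULL, &dev);    /* batch path: NULL skb */

    assert(dev.blist.head == NULL);     /* list fully drained */
    return dev.tx_packets;
}
```

The point of the single entry point is that struct net_device needs no
extra function pointer, and a driver that does not support batching
never sees a NULL skb as long as the core only queues to dev->blist for
batching-capable devices (an assumption about how the core would be
written).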
