Re: [ewg] EWG/OFED meeting minutes for July 24, 2012

2012-07-29 Thread Alex Netes

 
 OFED 3.5:
 =========
 1. Kernel base: the move to kernel 3.5 GA will be done this week.
 
 2. Backports:
 RHEL 6.2, 6.3 and SLES 11 SP2 - available today
 Low-level drivers: mlx4 (core & ib), nes
 Missing: mlx4_en - Mellanox, cxgb - Chelsio, qib - Intel
 
 
 3. RC1:
 If everyone provides backports by Tue, July 31, we will be able to release
 RC1 on Aug 2.
 - Mellanox is committed.
 - Need answers from Intel (Tom) and Chelsio (Steve)
 
 4. User space:
 A new uDAPL package is available and is included in the latest OFED-3.5 build.
 Need to include the new librdmacm-1.0.16-1.src.rpm and ibacm-1.0.7-1.src.rpm
 packages.
 Management (Alex): is what we have OK?

There will be another OpenSM release on Wed, Aug 1, which will include the latest
bug fixes and some newly contributed features, such as:
- Per Module Logging support
- Congestion Control support
- Perf_mgr extensions

 Diagnostic tools (Ira): is what we have OK?
 
 5. Release schedule:
 Will be decided in the next meeting, assuming RC1 is released at the end of
 next week and testing starts.
 
 Tziporet
 
 ___
 ewg mailing list
 ewg@lists.openfabrics.org
 http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] OpenSM 1.5.4 Boot Problem

2011-12-16 Thread Alex Netes
Hi Hector,

A few more questions.
Does this happen to you only when you try to shut down OpenSM on reboot?
What is the host cpu architecture? x86/x86_64/ppc?


 -----Original Message-----
 From: ewg-boun...@lists.openfabrics.org [mailto:ewg-
 boun...@lists.openfabrics.org] On Behalf Of Hal Rosenstock
 Sent: Thursday, December 15, 2011 9:06 PM
 To: Hector Abrach
 Cc: ewg@lists.openfabrics.org
 Subject: Re: [ewg] OpenSM 1.5.4 Boot Problem
 
 Hector,
 
 On 12/15/2011 12:49 PM, Hector Abrach wrote:
  Hal,
 
  Thank you for the response. To address your questions:
 
  So the switch stays up and the servers (including the one OpenSM is
  on) are rebooted, right?
 
  Right.
 
  Do the servers run QNX rather than Linux? Are you saying all OpenSM
  code is the same as stock OpenSM 3.3.12 (OFED 1.5.4-rc3)?
 
  Yes, all 7 servers run QNX. The OpenSM code is 99% the same; the only
  changes I had to make were to some #defines.
  The big changes were made for the driver, not so much OpenSM.
 
 I would think there are also changes for porting complib to QNX. Do you
 use osm_vendor_ibumad.c as the OpenSM vendor layer?
 
  I'm using IBNet 1.3.
 
 What's IBNet 1.3 ? I'm not familiar with that.
 
  OpenSM always runs on the same one server, the others don't run it.
 
 Understood.
 
  Is the topology 7 servers and 1 switch, and you don't see this issue
  if you use other switches?
 
  That's correct, the topology is 7 servers and 1 switch. We typically
  use fewer servers (4) for our application but the problem is more
  easily reproducible with more servers so we have a 7 server setup with
  1 switch. We don't have a great selection of switches but I know our
  previous switch did not cause this problem. Our intention is to go to
  production with this new switch but we can't release until we find an
  acceptable solution.
 
 I can see the responses but not the requests. What verbosity level did
 you use?
 
  I ran OpenSM with level -D 0x06 (error, info, verbose). I don't want
  to use -D 0xFF because I know for sure that it fixes the problem.
 
 I think -D 0x23 (error, info, frames) would do the trick...
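The value passed to -D is a bitmask. A minimal C sketch that decodes it, assuming the bit values documented for opensm's -D option (ERROR 0x01, INFO 0x02, VERBOSE 0x04, DEBUG 0x08, FUNCS 0x10, FRAMES 0x20, ROUTING 0x40, SYS 0x80); describe_log_level() is a hypothetical helper, not part of OpenSM:

```c
#include <stddef.h>
#include <string.h>

/* Level bits as documented for opensm -D (assumed from osm_log.h). */
struct log_bit {
    unsigned char mask;
    const char *name;
};

static const struct log_bit log_bits[] = {
    {0x01, "ERROR"},   {0x02, "INFO"},  {0x04, "VERBOSE"},
    {0x08, "DEBUG"},   {0x10, "FUNCS"}, {0x20, "FRAMES"},
    {0x40, "ROUTING"}, {0x80, "SYS"},
};

/* Hypothetical helper: writes a comma-separated list of the levels
 * enabled in 'level' into buf (len bytes, always NUL-terminated,
 * truncated if too small). Returns buf. */
static char *describe_log_level(unsigned char level, char *buf, size_t len)
{
    size_t i;

    if (buf == NULL || len == 0)
        return buf;
    buf[0] = '\0';
    for (i = 0; i < sizeof(log_bits) / sizeof(log_bits[0]); i++) {
        if (!(level & log_bits[i].mask))
            continue;
        if (buf[0] != '\0')
            strncat(buf, ",", len - strlen(buf) - 1);
        strncat(buf, log_bits[i].name, len - strlen(buf) - 1);
    }
    return buf;
}
```

With these bit values, 0x23 decodes to "ERROR,INFO,FRAMES", matching the levels named above, while 0x06 decodes to "INFO,VERBOSE" only (ERROR is bit 0x01).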
 
 
  In summary:
  1. The system gets stuck in osm_vendor_ibumad.c, umad_receiver(), in
  the for(;;) loop, but keeps running properly in main.c,
  osm_manager_loop().
  2. If I use -D 0xFF, the problem is completely fixed.
  3. If I set OSM_DEFAULT_SMP_MAX_ON_WIRE to 1 instead of any other
  value, the problem is completely fixed.
  4. The failure always occurs with qp0_mads_outstanding remaining at 1.
  What do you think could be wrong?
  Do you think the driver could be the problem?
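Items 3 and 4 concern SMP flow control. The following is a simplified, hypothetical model (not OpenSM's actual VL15 code; all names are illustrative) of a sender that allows at most max_on_wire SMPs outstanding. A slot is released only by a response or by a send-error callback, so a MAD that gets neither leaves the counter stuck above zero:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flow-control window; names are illustrative only. */
struct smp_window {
    unsigned max_on_wire;  /* analogous to OSM_DEFAULT_SMP_MAX_ON_WIRE */
    unsigned outstanding;  /* analogous to qp0_mads_outstanding */
};

/* Try to post one SMP; refuses when the window is full, in which case
 * the poller has to wait for a completion before sending more. */
static bool smp_try_send(struct smp_window *w)
{
    if (w->outstanding >= w->max_on_wire)
        return false;
    w->outstanding++;
    return true;
}

/* Called for a matched response OR a send-error (timeout) callback.
 * If neither ever happens for a MAD, its slot is never released. */
static void smp_complete(struct smp_window *w)
{
    assert(w->outstanding > 0);
    w->outstanding--;
}
```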
 
 Yes. The likely suspect, which may be missing and causing this issue, is
 the MAD transaction timeout/retry code (built into the kernel MAD layer in
 Linux), which triggers a send error (callback) when the timeouts/retries
 are exhausted. Is that implemented?
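The timeout/retry behavior described above can be sketched as follows. This is a hypothetical model, not the Linux kernel MAD code or its API: each send carries a per-attempt timeout and a retry budget, a periodic check resends on expiry, and once the retries are exhausted the send-error callback fires exactly once. count_error() is an illustrative callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of per-MAD timeout/retry handling; the names and
 * structure are illustrative, not the kernel MAD layer's API. */
struct mad_send {
    unsigned long deadline_ms; /* when the current attempt times out */
    unsigned long timeout_ms;  /* per-attempt timeout */
    unsigned retries_left;     /* resends still allowed */
    int done;                  /* set once a send error has been reported */
};

typedef void (*send_err_cb)(struct mad_send *mad, void *ctx);

/* Call periodically with the current time. On expiry, resend while
 * retries remain; when they are exhausted, report a send error once. */
static void mad_check_timeout(struct mad_send *mad, unsigned long now_ms,
                              send_err_cb on_error, void *ctx)
{
    assert(mad != NULL && on_error != NULL);
    if (mad->done || now_ms < mad->deadline_ms)
        return;
    if (mad->retries_left > 0) {
        mad->retries_left--;
        /* A real implementation would re-post the MAD here. */
        mad->deadline_ms = now_ms + mad->timeout_ms;
        return;
    }
    mad->done = 1;
    on_error(mad, ctx); /* lets the caller release its outstanding slot */
}

/* Illustrative callback: counts send errors in the int pointed to by ctx. */
static void count_error(struct mad_send *mad, void *ctx)
{
    (void)mad;
    ++*(int *)ctx;
}
```

Without the final callback, a request whose response never arrives would simply stay outstanding.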
 
 However, I don't have a good explanation for why you see this now and not
 before with your other switches, but maybe that's not important.
 
  What debug command should I use to see the sent requests?
 
 See above.
 
 -- Hal
 
  Thank you
 
  Hector Abrach
 
 
 
 
  From:   Hal Rosenstock h...@dev.mellanox.co.il
  To: Hector Abrach habr...@tmriusa.com
  Cc: ewg@lists.openfabrics.org
  Date:   12/14/2011 08:23 PM
  Subject: Re: [ewg] OpenSM 1.5.4 Boot Problem
 
 
 
 
 
  Hector,
 
  On 12/14/2011 1:41 PM, Hector Abrach wrote:
  Hal,
 
  Sorry for the multiple emails, but I was thinking it may be a
  freeze/stall rather than a timeout. One reason is that it doesn't
  produce an error message; it is as if the log completely dies.
 
  So nothing interesting in the log...
 
  However, in osm_vendor_ibumad.c, in the function umad_receiver(), there
  is an infinite for(;;) loop which seems to die when I get to the
  previously discussed vl15_poller. I checked whether it breaks out of
  the loop, but it doesn't seem to.
 
  It never breaks out of that loop except when OpenSM is shutting down.
  That's the basic receive loop.
 
  -- Hal
 
  I'm not sure whether this is an additional hint.
  Thank you
 
  Hector Abrach
 
 
  From:  Hector Abrach habr...@tmriusa.com
  To:  Hal Rosenstock h...@dev.mellanox.co.il
  Cc:  ewg@lists.openfabrics.org
  Date:  12/14/2011 11:15 AM
  Subject:  Re: [ewg] OpenSM 1.5.4 Boot Problem
  Sent by:  ewg-boun...@lists.openfabrics.org
 
 
 
 
 
  Hal,
 
  Thank you very much for the support. I am the same person from the
  Gmail account, so I will respond through here.
 
  Attached is a picture of the switch serial number:
 
 
 
  I am indeed using OFED 1.5.4-rc3. My experiment 

Re: [ewg] EWG/Meeting agenda for today - 28-Mar, 2011

2011-03-28 Thread Alex Netes
On 16:14 Mon 28 Mar , Tziporet Koren wrote:
 
 EWG/OFED Agenda for today:
 
 1. OFED 1.6 schedule:
 ---------------------
 - Move to kernel 2.6.38 (since it's GA already, and we have not started
 backports)
 - Ongoing work on backports - during Q2
 - First RC - end of June
   RCs every 2 weeks
 - GA - End of Aug
 
 2. OFED 1.6 main features:
 
 - Mellanox: CX3 support
 - SRIOV support for mlx4 with CX2 & CX3
 - FDR support
 - New OSes support: As usual the latest OSes will be supported
 - Remove MPI packages from OFED
 - New management package: Alex - please send details
 
 
OpenSM main improvements:
1. Torus-2QoS routing engine
2. Performance Manager improvements: improved redirection and extended counters
support
3. Additional port balancing options for routing
4. More bug fixes

Alex.


[ewg] [ANNOUNCE] opensm tarballs release

2011-03-07 Thread Alex Netes
Hi,

There is a new release of the OpenSM tarball available in:

http://www.openfabrics.org/downloads/management/

(listed in http://www.openfabrics.org/downloads/management/latest.txt)

5e9b461073f7cfbafe0207e014796f9f  opensm-3.3.9.tar.gz

All component versions are from recent master branch. Full list of changes is
below.

OpenSM:
Alex Netes (1):
opensm: fixed memory leak in multicast spanning tree calculation


Re: [ewg] OFA management tree seperation

2011-02-23 Thread Alex Netes

On 15:59 Sun 13 Feb , Jason Gunthorpe wrote:
 On Sun, Feb 13, 2011 at 09:55:50AM +0200, Alex Netes wrote:
  
  We finished the management tree separation.
  From now on, Ira Weiny wei...@llnl.gov takes responsibility for
  maintaining libibmad and infiniband-diags. His trees are:
  
  git://git.openfabrics.org/~iraweiny/libibmad
  git://git.openfabrics.org/~iraweiny/infiniband-diags
  
  libibumad, opensm and ibsim trees stay under my responsibility:
  
  git://git.openfabrics.org/~alexnetes/libibumad
  git://git.openfabrics.org/~alexnetes/opensm
  git://git.openfabrics.org/~alexnetes/ibsim
 
 Can you please include the OpenSM 3.2.6 and related branch and tags in
 your repository?
 

Done. OpenSM 3.2.6 resides on opensm-3.2 branch as before.


[ewg] [ANNOUNCE] management tarballs release

2011-02-16 Thread Alex Netes
Hi,

There is a new release of the management (OpenSM and infiniband
diagnostics) tarballs available in:

http://www.openfabrics.org/downloads/management/

(listed in http://www.openfabrics.org/downloads/management/latest.txt)

c0b24a1053ae8b0b3caf5950b3ede6dc  infiniband-diags-1.5.8.tar.gz
c2755aa360d3f29d04865ba4e2454a98  libibmad-1.3.7.tar.gz
c7575b7620615d7dfa1c7fdbbd310ec7  libibumad-1.3.7.tar.gz
df051f5f0192d369b0b904147cb045a8  opensm-3.3.8.tar.gz

All component versions are from recent master branch. Full list of
changes is below.

OpenSM:
=======

Alex Netes (1):
  opensm: fixed getline pointer allocation free in osm_console_io

Eli Dorfman (Voltaire) (1):
  Wrong handling of MC create and delete traps

Hal Rosenstock (6):
  opensm/osm_state_mgr.c: Don't signal DISCOVER to SM state machine when already DISCOVERING
  opensm: Fix some typos
  osmtest/osmt_service.c: In osmt_run_service_records_flow, add missing status
  opensm/osm_ucast_ftree: When roots are not connected, update hop count but not lft
  opensm/osm_trap_rcv.c: No need to check for sweep for trap 145
  opensm: Add support for SwitchInfo:MulticastFDBTop

Ira Weiny (1):
  Add node/port/qos information to some error messages

Jason Gunthorpe (1):
  Fix autotools to include the necessary M4 files

Sasha Khapyorsky (3):
  opensm/sa: simplify osm_mcmr_rcv_find_or_create_new_mgrp() function call
  opensm/osm_node_info_rcv.c: move p_physp declaration under code block
  opensm/osm_db_files.c: malloc() return value run-time check

Stan C. Smith (2):
  replace (long*)(long) casting with transportable data type (uintptr_t)
  replace (long*)(long) casting with transportable data type (uintptr_t)

Yevgeny Kliteynik (28):
  opensm/osm_qos_policy.c: change a log message
  opensm/osm_prtn.c: removing TopSpin hack
  libvendor/osm_vendor_ibumad_sa.c: remove useless if statement
  libvendor/osm_vendor_mlx_sa.c: remove useless if statement
  opensm/osm_mtree.c: removing useless 'if' statement
  opensm/osm_sminfo_rcv.c: removing unused variable
  opensm/osm_pkey.c: removing unused function
  opensm/osm_sa_pkey_record.c: removing unused variable
  opensm/osm_sa_vlarb_record.c: removed unused variable
  opensm/osm_node_info_rcv.c: remove useless code line
  osmtest/osmtest.c: handle timeouts in PR stress test
  opensm/osm_helper.c: fix potential overrun of the array
  opensm/osm_helper.c: cosmetics - move define closer to the relevant code
  opensm/osm_mesh.c: fixing a bug in compare_switches()
  opensm/osm_subnet.c: fixing small bug in error path
  opensm/osm_db_files.c: fix small memory leak
  osmtest/osmt_slvl_vl_arb.c: handling fopen() failure
  opensm/osm_helper.c: use ARR_SIZE macro instead of hardcoded values
  osm_vl15intf.c: fixing use-after-free coredump
  opensm/osm_trap_rcv.c: fix possible core dump
  opensm/osm_ucast_ftree.c: fix small memory leak in error path
  opensm/osm_ucast_ftree.c: fixing another memory leak at error path
  opensm/osm_ucast_lash.c: small bug in calculating allocated size
  opensm/osm_pkey_mgr.c: fixing small memory leak
  opensm/osm_ucast_file.c: closing file descriptor in error path
  opensm/osm_qos_parser_y.y: fixing bunch of memory leaks on invalid values
  opensm/osm_console.c: fix memory and file descriptor leaks
  opensm/st.c: fix potential core dumps
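The two Stan C. Smith entries replace (long*)(long) casts with uintptr_t. A minimal sketch of that idiom, assuming the common case of carrying a small integer through a void * context argument; int_to_ctx() and ctx_to_int() are hypothetical helpers, not taken from the OpenSM sources. The integer-to-pointer conversion itself is still implementation-defined, but it round-trips on mainstream platforms.

```c
#include <assert.h>
#include <stdint.h>

/* uintptr_t can hold a converted void * without loss, whereas long is
 * only 32 bits on LLP64 platforms such as 64-bit Windows. */
static_assert(sizeof(uintptr_t) >= sizeof(void *),
              "uintptr_t must be able to hold a pointer");

/* Hypothetical helper: store a small integer in a void * context. */
static void *int_to_ctx(uint32_t value)
{
    return (void *)(uintptr_t)value;
}

/* Hypothetical helper: recover the integer stored by int_to_ctx(). */
static uint32_t ctx_to_int(const void *ctx)
{
    return (uint32_t)(uintptr_t)ctx;
}
```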

libibumad:
==========

Jason Gunthorpe (1):
  Fix autotools to include the necessary M4 files

Mike Heinz (1):
  FW: [PATCH] umad_send.3 (man page)

Yevgeny Kliteynik (1):
  umad.{c,h}: moving stdlib.h include from C to H file

libibmad:
=========

Ira Weiny (1):
  libibmad/fields.c: Change all PortCounter names to match the Specification

Jason Gunthorpe (1):
  Fix autotools to include the necessary M4 files

infiniband-diags:
=================

Albert Chu (4):
  add --diff support to iblinkinfo
  support --diffcheck in iblinkinfo
  Add lid and node description diff options for --diffcheck in iblinkinfo
  support --filterdownports in iblinkinfo

Alex Netes (3):
  Makefile: ChangeLog and version generation script path fix
  infiniband-diags: update shared library versions
  infiniband-diags: package versions update

Eli Dorfman (Voltaire) (2):
  infiniband-diags: Do not exit when unexpected node found
  inifiband-diags: Support Voltaire switch ISR4200

Hal Rosenstock (3):
  infiniband-diags/ibtracert: Eliminate direct route (-D) option
  infiniband-diags/saquery.c: In dump_one_mcmember_record, fix flow label endian
  infiniband-diags/iblinkinfo.c: Limit some queries to switches

Ira Weiny (4):
  libibmad/fields.c: Change all PortCounter names to match the Specification
  infiniband-diags: Verify timeout value specified to diagnostics
  Further timeout paramater verification (Was: [PATCH] infiniband-diags: Verify

[ewg] OFA management tree seperation

2011-02-12 Thread Alex Netes

We finished the management tree separation.
From now on, Ira Weiny wei...@llnl.gov takes responsibility for maintaining
libibmad and infiniband-diags. His trees are:

git://git.openfabrics.org/~iraweiny/libibmad
git://git.openfabrics.org/~iraweiny/infiniband-diags

libibumad, opensm and ibsim trees stay under my responsibility:

git://git.openfabrics.org/~alexnetes/libibumad
git://git.openfabrics.org/~alexnetes/opensm
git://git.openfabrics.org/~alexnetes/ibsim


Alex.


Re: [ewg] Current administrator for git accounts on openfabrics.org

2011-02-09 Thread Alex Netes
Hi Ira,

Who would I contact for a git account on git.openfabrics.org/git?

 Ken Strandberg k...@kenstrandberg.com is sysadmin in openfabrics.org and he 
 would be happy to assist you.

Thanks, Alex. 