Re: [ewg] libibnetdisc: Add grouping for Voltaire's ISR4700 switch

2010-05-06 Thread Sasha Khapyorsky
On 14:35 Tue 27 Apr , sebastien dugue wrote:
> 
>   The ISR4700 features 3 kinds of boards:
> 
>   - sLB-4018 line board with a single 36 port asic
>   - sFB-4700 fabric board with a single 36 port asic
>   - sFB-4700X2 double density fabric board with 2 36 port asics
> 
>   The double density fabric board (sFB-4700X2) features external 12X
> connectors that are only an aggregation of 3 4X ports, therefore
> ext_portnum is set to match the number printed on the faceplate.
> 
> Signed-off-by: Sebastien Dugue 

Applied. Thanks.

Sasha
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] [PATCH] management: adding mad_dump_fields to libibmad

2010-05-06 Thread Mike Heinz
Sasha, thanks for sending me that. 

Despite asking several times over the past couple of years, you're the first 
person to actually point me to a document on how to submit patches to the group.

I will be sure to adhere to that format in the future.

-----Original Message-----
From: Sasha Khapyorsky [mailto:sashakv...@gmail.com] On Behalf Of Sasha 
Khapyorsky
Sent: Thursday, May 06, 2010 5:03 PM
To: Mike Heinz
Cc: linux-r...@vger.kernel.org; e...@openfabrics.org
Subject: Re: [PATCH] management: adding mad_dump_fields to libibmad

On 13:27 Thu 06 May , Mike Heinz wrote:
> Sasha asked that I re-submit the patches for perfquery in a slightly 
> different format. This is the first of 3 patches.

I just asked to try to follow the normal patch submission format
described in detail here:

http://git.kernel.org/?p=linux/kernel/git/roland/infiniband.git;a=blob;f=Documentation/SubmittingPatches

So each patch will have its own subject line, commit message, etc.

> This patch adds a function to libibmad that allows the caller to dump a 
> configurable range of MAD attributes. Basically, this provides an external 
> interface to the internal function _dump_fields.
> 
> Signed Off: Michael Heinz

All three applied. Thanks.

Sasha


Re: [ewg] ibcheckerrors "Port All FAILED" reported

2010-05-06 Thread Ira Weiny
On Thu, 6 May 2010 14:11:24 -0700
Sasha Khapyorsky  wrote:

> On 18:09 Wed 05 May , Ira Weiny wrote:
> > 
> > 14:29:03 > ./perfquery 40 255
> > ./perfquery: iberror: failed: AllPortSelect not supported
> > 
> > It seems there is an issue with the CapabilityMask value...
> > 
> > 14:43:32 > ./perfquery 40 255
> > cap_mask 0x400  <=== my debug output
> > ./perfquery: iberror: failed: AllPortSelect not supported
> > 
> > 14:43:38 > ./saquery CPI 40
> > SA ClassPortInfo:
> > ...
> > Capability mask..0x2602
> > ...
> > 
> > Those don't match because...  perfquery has a bug...
> > 
> > perfquery is issuing a PMA query when it should be issuing a SA query.
> 
> I'm not following. How are the SA and PM ClassPortInfo(s) supposed to
> be related to each other?

It's not, I was confused...  :-D
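For readers following the thread: each management class answers with its own ClassPortInfo, so the PMA mask (0x400 above) and the SA mask (0x2602) are independent values and are not expected to match. Below is a minimal C sketch of the PMA-side check, modeled on the perfquery.c lines quoted elsewhere in this thread (CapabilityMask read at byte offset 2 and converted with ntohs(), bit 8 = AllPortSelect). The helper and macro names are hypothetical and are not part of libibmad.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; the value is taken from the perfquery.c comment
 * quoted in this thread ("bit 8 is AllPortSelect"). */
#define PM_CAP_ALL_PORT_SELECT 0x100

/* Extract the 16-bit CapabilityMask from a raw ClassPortInfo payload.
 * Assumes, as perfquery.c does, that the mask sits at byte offset 2
 * in network byte order. */
uint16_t pma_cap_mask(const uint8_t *cpi)
{
    uint16_t cap_mask;

    memcpy(&cap_mask, cpi + 2, sizeof(cap_mask));
    return ntohs(cap_mask);
}

/* Non-zero when the agent advertises AllPortSelect (port 255 queries). */
int supports_all_port_select(uint16_t cap_mask)
{
    return (cap_mask & PM_CAP_ALL_PORT_SELECT) != 0;
}
```

With the debug value shown above (0x400), bit 8 is clear, so a query for port 255 is rejected with "AllPortSelect not supported" regardless of what the SA reports.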

Ira

> 
> Sasha


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov


Re: [ewg] [PATCH] infiniband-diags/scripts/ibcheckerrs.in: emulate all ports if necessary.

2010-05-06 Thread Sasha Khapyorsky
On 09:18 Thu 06 May , Ira Weiny wrote:
> Upon thinking about this a bit more, and seeing Mike's patch, I think that the
> patch which Mike sent some time ago is a better fix.  This will work fine for
> ibcheckerrs.  However ibcheckerrors will run AllPortSelect and then go on to
> query all the ports individually.  The patch below will cause a double read
> for each port which will kill ibcheckerrors performance on a large cluster.
> 
> Sasha, what is the status of Mike's patch?

Applied now.

Sasha


Re: [ewg] ibcheckerrors "Port All FAILED" reported

2010-05-06 Thread Sasha Khapyorsky
On 18:09 Wed 05 May , Ira Weiny wrote:
> 
> 14:29:03 > ./perfquery 40 255
> ./perfquery: iberror: failed: AllPortSelect not supported
> 
> It seems there is an issue with the CapabilityMask value...
> 
> 14:43:32 > ./perfquery 40 255
> cap_mask 0x400  <=== my debug output
> ./perfquery: iberror: failed: AllPortSelect not supported
> 
> 14:43:38 > ./saquery CPI 40
> SA ClassPortInfo:
> ...
> Capability mask..0x2602
> ...
> 
> Those don't match because...  perfquery has a bug...
> 
> perfquery is issuing a PMA query when it should be issuing a SA query.

I'm not following. How are the SA and PM ClassPortInfo(s) supposed to
be related to each other?

Sasha


Re: [ewg] [PATCH] management: adding mad_dump_fields to libibmad

2010-05-06 Thread Sasha Khapyorsky
On 13:27 Thu 06 May , Mike Heinz wrote:
> Sasha asked that I re-submit the patches for perfquery in a slightly 
> different format. This is the first of 3 patches.

I just asked to try to follow the normal patch submission format
described in detail here:

http://git.kernel.org/?p=linux/kernel/git/roland/infiniband.git;a=blob;f=Documentation/SubmittingPatches

So each patch will have its own subject line, commit message, etc.

> This patch adds a function to libibmad that allows the caller to dump a 
> configurable range of MAD attributes. Basically, this provides an external 
> interface to the internal function _dump_fields.
> 
> Signed Off: Michael Heinz

All three applied. Thanks.

Sasha


Re: [ewg] [RFC] libibverbs: ibv_fork_init() and libhugetlbfs

2010-05-06 Thread Roland Dreier
 > When fork support is enabled in libibverbs, madvise() is called for every
 > memory page that is registered as a memory region. Memory ranges that
 > are passed to madvise() must be page aligned and the size must be a
 > multiple of the page size. libibverbs uses sysconf(_SC_PAGESIZE) to find
 > out the system page size and rounds all ranges passed to reg_mr() according
 > to this page size. When memory from libhugetlbfs is passed to reg_mr(), this
 > does not work as the page size for this memory range might be different
 > (e.g. 16 MB). So libibverbs would have to use the huge page size to
 > calculate a page aligned range for madvise.

Yes, Alex Vainman raised this same issue a while ago.
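
To make the alignment problem concrete, here is a minimal C sketch of rounding a registered range outward to page boundaries before it is handed to madvise(). It is an illustration only, not the libibverbs code; the function name is hypothetical, and the page size is passed in as a parameter precisely because the thread is about it not always being sysconf(_SC_PAGESIZE).

```c
#include <stddef.h>
#include <stdint.h>

/* Round [addr, addr + len) outward to multiples of page_size.
 * page_size is assumed to be a power of two, which holds for both
 * base pages and huge pages. */
void align_range(uintptr_t addr, size_t len, size_t page_size,
                 uintptr_t *start, size_t *aligned_len)
{
    uintptr_t mask = (uintptr_t)page_size - 1;
    uintptr_t end = (addr + len + mask) & ~mask;  /* round end up */

    *start = addr & ~mask;                        /* round start down */
    *aligned_len = (size_t)(end - *start);
}
```

A range that is already aligned for 4 KB pages can grow considerably under a huge page size: with the 16 MB example from the quoted text, a two-page region starting at 0x1001000 expands to the whole 16 MB huge page that contains it.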

 > The patch below demonstrates a possible solution for this. It parses the
 > /proc/PID/maps file when registering a memory region and decides if the
 > memory that is to be registered is part of a libhugetlbfs range or not.
 > If so, a page size of 16 MB is used to align the memory range passed to
 > madvise().
 > 
 > We see two problems with this: it is not a very elegant solution to parse the
 > procfs file and the 16 MB is currently hardcoded. The latter point could be
 > solved by calling gethugepagesize() from libhugetlbfs, which would add a new
 > dependency to libibverbs.

I think that we cannot assume huge pages only come from libhugetlbfs --
we should support an application directly enabling huge pages (possibly
via another library too, so we can't assume that an application knows
the page size for a memory range it is about to register).

And also the 16 MB page size constant is of course not feasible -- with
all due respect, the x86 page size of 2 MB is much more likely in
practice :)  (Although perhaps the much slower PowerPC TLB refill makes
users more likely to try and use hugetlb pages ;)

Alex suggested parsing files in the same way as libhugetlbfs does to get
the page size, and that seems to be the best solution, since I don't
think the libhugetlbfs license is compatible with the BSD license for
libibverbs.

But your trick of using /proc/*/maps looks nice.  Does that only work
for libhugetlbfs or can we recognize direct mmap of hugetlb pages?
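
As an illustration of the /proc/*/maps approach under discussion, the following C sketch checks whether an address falls inside the mapping described by one maps line and, if so, returns that mapping's pathname. It is not the code from the proposed patch. The function name is hypothetical, and deciding whether a given pathname means "hugetlb-backed" is deliberately left to the caller, since that policy is exactly what the question above leaves open.

```c
#include <stdio.h>
#include <string.h>

/* Parse one line of /proc/<pid>/maps, laid out as
 *   start-end perms offset dev inode pathname
 * Returns 1 and copies the pathname (possibly empty) into path when
 * addr lies in [start, end); returns 0 otherwise or on a parse error. */
int mapping_contains(const char *line, unsigned long addr,
                     char *path, size_t pathsz)
{
    unsigned long start, end;
    int off = 0;
    size_t n;

    if (pathsz == 0)
        return 0;
    /* %n records where the pathname begins; it stays 0 if parsing
     * stops before reaching it. */
    if (sscanf(line, "%lx-%lx %*4s %*lx %*x:%*x %*lu %n",
               &start, &end, &off) != 2 || off == 0)
        return 0;
    if (addr < start || addr >= end)
        return 0;

    n = strcspn(line + off, "\n");   /* drop the trailing newline */
    if (n >= pathsz)
        n = pathsz - 1;
    memcpy(path, line + off, n);
    path[n] = '\0';
    return 1;
}
```

A caller would typically read /proc/self/maps line by line with fgets() and stop at the first line for which this returns 1.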

 - R.
-- 
Roland Dreier  || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


Re: [ewg] [Fwd: a forum?]

2010-05-06 Thread Brian J. Murrell
On Thu, 2010-05-06 at 14:46 -0400, David Dillow wrote: 
> 
> Maybe, but I think even that is a bad way to go. From my experience,
> users of forums are less likely to quote the context for their messages
> -- they're used to the message they are replying to being right above
> them on the web page. Gating the messages to the mailing list would just
> give us words in a vacuum.

Yeah.  Such a gating implementation would probably need to
quote the message that the forum posting is in reply to.

Note, that I am not proposing that a gatewayed forum would be good (I'd
just as soon see no forums implemented -- I have a real dislike for
them), but just that if enough pressure did mount to provide them,
having them gated to the list is better than stand-alone forums to
prevent fragmenting the community into two different groups
communicating amongst themselves.

b.




Re: [ewg] [PATCH] management: adding mad_dump_fields to libibmad

2010-05-06 Thread Mike Heinz
Sasha asked that I re-submit the patches for perfquery in a slightly different 
format. This is the third of 3 patches.

This patch corrects the AllPortSelect error message that is generated by 
ibcheckerrors when used against switches that do not support that attribute.

Signed-off-by: Michael Heinz

- snip ---
diff --git a/infiniband-diags/scripts/ibcheckerrs.in b/infiniband-diags/scripts/ibcheckerrs.in
index 305379a..15bfd4a 100644
--- a/infiniband-diags/scripts/ibcheckerrs.in
+++ b/infiniband-diags/scripts/ibcheckerrs.in
@@ -155,6 +155,14 @@ nodename=`$IBPATH/smpquery $ca_info nodedesc $lid | sed -e "s/^Node Description:
 
 text="`eval $IBPATH/perfquery $ca_info $lid $portnum`"
 rv=$?
+if echo $text | grep -q 'AllPortSelect not supported'; then
+       if [ "$verbose" = "yes" ]; then
+               echo -n "Error check on lid $lid ($nodename) port $portname: "
+               green "AllPortSelect not supported"
+       fi
+       exit 0
+fi
+
 if echo "$text" | awk -v mono=$bw -v brief=$brief -F '[.:]*' '
 function blue(s)
 {



Re: [ewg] [Fwd: a forum?]

2010-05-06 Thread Justin Clift
On 05/07/2010 02:13 AM, Brian J. Murrell wrote:
> On Thu, 2010-05-06 at 09:05 -0700, Jeff Becker wrote:
>> FYI.
>
> Unless it was only a gateway to/from a/the mailing list(s), I would
> vehemently vote against forums.  Creating un-gatewayed forums in
> addition to keeping the list(s) would, IMHO, just serve to fragment the
> community.

A gateway'd version is an interesting idea.

As a thought, the software chosen would need to be evaluated 
carefully.  Forum spammers are getting *really* good at creating spam 
accounts even with every CAPTCHA image known to man enabled.  :(

We've recently closed our forum to all new registrations because we're 
getting in excess of 200 new spam users per day.  (that's not peak, 
that's consistent.  ~1500 per week, every week. ugh)

For an open source project like Salasaga, where we haven't even launched 
yet and still don't have our first beta available, it really hasn't 
helped. :/

On the plus side though, for a long time before the recent spam user 
deluge, we've gotten a lot more "end user" types using our development 
snapshots and communicating experiences about them with us through our 
forums.  Including picking up volunteers that contribute directly.

Regards and best wishes,

Justin Clift


> b.

-- 
Salasaga  -  Open Source eLearning IDE
   http://www.salasaga.org


Re: [ewg] [PATCH] management: adding mad_dump_fields to libibmad

2010-05-06 Thread Mike Heinz
Sasha asked that I re-submit the patches for perfquery in a slightly different 
format. This is the second of 3 patches.

This patch uses the new mad_dump_fields function to suppress the display of 
extended attributes when querying switches that do not support them.

Signed off: Michael Heinz

-- snip --
diff --git a/infiniband-diags/src/perfquery.c b/infiniband-diags/src/perfquery.c
index 00ebfff..07a9226 100644
--- a/infiniband-diags/src/perfquery.c
+++ b/infiniband-diags/src/perfquery.c
@@ -302,7 +302,10 @@ static void dump_perfcounters(int extended, int timeout, uint16_t cap_mask,
                if (aggregate)
                        aggregate_perfcounters();
                else
-                       mad_dump_perfcounters(buf, sizeof buf, pc, sizeof pc);
+                       mad_dump_fields(buf, sizeof buf, pc, sizeof pc,
+                                       IB_PC_FIRST_F,
+                                       (cap_mask & 0x1000)?IB_PC_LAST_F:(IB_PC_RCV_PKTS_F+1));
+
        } else {
                if (!(cap_mask & 0x200))        /* 1.2 errata: bit 9 is extended counter support */
                        IBWARN


[ewg] [PATCH] management: adding mad_dump_fields to libibmad

2010-05-06 Thread Mike Heinz
Sasha asked that I re-submit the patches for perfquery in a slightly different 
format. This is the first of 3 patches.

This patch adds a function to libibmad that allows the caller to dump a 
configurable range of MAD attributes. Basically, this provides an external 
interface to the internal function _dump_fields.

Signed Off: Michael Heinz

 snip --
diff --git a/libibmad/include/infiniband/mad.h b/libibmad/include/infiniband/mad.h
index 02ef551..0478c2b 100644
--- a/libibmad/include/infiniband/mad.h
+++ b/libibmad/include/infiniband/mad.h
@@ -1031,6 +1031,9 @@ MAD_EXPORT ib_mad_dump_fn
 mad_dump_perfcounters_xmt_disc, mad_dump_perfcounters_rcv_err,
 mad_dump_portsamples_control;
 
+MAD_EXPORT void mad_dump_fields(char *buf, int bufsz, void *val, int valsz,
+                               int start, int end);
+
 MAD_EXPORT int ibdebug;
 
 #if __BYTE_ORDER == __LITTLE_ENDIAN
diff --git a/libibmad/src/dump.c b/libibmad/src/dump.c
index 335e190..cc9c10f 100644
--- a/libibmad/src/dump.c
+++ b/libibmad/src/dump.c
@@ -671,6 +671,11 @@ static int _dump_fields(char *buf, int bufsz, void *data, int start, int end)
        return (int)(s - buf);
 }
 
+void mad_dump_fields(char *buf, int bufsz, void *val, int valsz, int start, int end)
+{
+       _dump_fields(buf, bufsz, val, start, end);
+}
+
+
 void mad_dump_nodedesc(char *buf, int bufsz, void *val, int valsz)
 {
strncpy(buf, val, bufsz);
diff --git a/libibmad/src/libibmad.map b/libibmad/src/libibmad.map
index e2d0b05..5778e3e 100644
--- a/libibmad/src/libibmad.map
+++ b/libibmad/src/libibmad.map
@@ -20,6 +20,7 @@ IBMAD_1.3 {
                mad_dump_nodedesc;
                mad_dump_nodeinfo;
                mad_dump_opervls;
+               mad_dump_fields;
                mad_dump_perfcounters;
                mad_dump_perfcounters_ext;
                mad_dump_perfcounters_xmt_sl;



Re: [ewg] [PATCH] infiniband-diags/scripts/ibcheckerrs.in: emulate all ports if necessary.

2010-05-06 Thread Ira Weiny
Upon thinking about this a bit more, and seeing Mike's patch, I think that the
patch which Mike sent some time ago is a better fix.  This will work fine for
ibcheckerrs.  However ibcheckerrors will run AllPortSelect and then go on to
query all the ports individually.  The patch below will cause a double read
for each port which will kill ibcheckerrors performance on a large cluster.

Sasha, what is the status of Mike's patch?

Ira

On Wed, 5 May 2010 19:47:20 -0700
Ira Weiny  wrote:

> 
> From: Ira Weiny 
> Date: Wed, 5 May 2010 19:49:37 -0700
> Subject: [PATCH] infiniband-diags/scripts/ibcheckerrs.in: emulate all ports if necessary.
> 
> 
> Signed-off-by: Ira Weiny 
> ---
>  infiniband-diags/scripts/ibcheckerrs.in |2 +-
>  infiniband-diags/src/perfquery.c|6 +++---
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/infiniband-diags/scripts/ibcheckerrs.in b/infiniband-diags/scripts/ibcheckerrs.in
> index 305379a..f4eb451 100644
> --- a/infiniband-diags/scripts/ibcheckerrs.in
> +++ b/infiniband-diags/scripts/ibcheckerrs.in
> @@ -153,7 +153,7 @@ fi
>  
>  nodename=`$IBPATH/smpquery $ca_info nodedesc $lid | sed -e "s/^Node Description:\.*\(.*\)/\1/"`
>  
> -text="`eval $IBPATH/perfquery $ca_info $lid $portnum`"
> +text="`eval $IBPATH/perfquery -a $ca_info $lid $portnum`"
>  rv=$?
>  if echo "$text" | awk -v mono=$bw -v brief=$brief -F '[.:]*' '
>  function blue(s)
> diff --git a/infiniband-diags/src/perfquery.c b/infiniband-diags/src/perfquery.c
> index 00ebfff..5d3b606 100644
> --- a/infiniband-diags/src/perfquery.c
> +++ b/infiniband-diags/src/perfquery.c
> @@ -525,11 +525,11 @@ int main(int argc, char **argv)
>   /* ClassPortInfo should be supported as part of libibmad */
>   memcpy(&cap_mask, pc + 2, sizeof(cap_mask));/* CapabilityMask */
>   cap_mask = ntohs(cap_mask);
> - if (!(cap_mask & 0x100)) {  /* bit 8 is AllPortSelect */
> - if (!all_ports && port == ALL_PORTS)
> - IBERROR("AllPortSelect not supported");
> + if (port == ALL_PORTS && !(cap_mask & 0x100)) { /* bit 8 is AllPortSelect */
>   if (all_ports)
>   all_ports_loop = 1;
> + else
> + IBERROR("AllPortSelect not supported");
>   }
>  
>   if (xmt_sl) {
> -- 
> 1.5.4.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


-- 
Ira Weiny 


Re: [ewg] [Fwd: a forum?]

2010-05-06 Thread Brian J. Murrell
On Thu, 2010-05-06 at 09:05 -0700, Jeff Becker wrote: 
> FYI.

Unless it was only a gateway to/from a/the mailing list(s), I would
vehemently vote against forums.  Creating un-gatewayed forums in
addition to keeping the list(s) would, IMHO, just serve to fragment the
community.

b.

> -------- Original Message --------
> Subject:  a forum?
> Date: Thu, 6 May 2010 09:41:05 -0500
> From: Tushar Kapila 
> To:   webmas...@openfabrics.org 
> 
> 
> 
> It would be nice to have a forum on your site
> any out of the box forum software ... or a few more READMEs ... how a
> regular windows/ java user can use your SDP sockets 
> Regards,
> Tushar Kapila
> 
> ___
> ewg mailing list
> ewg@lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg




Re: [ewg] [PATCHv8 06/11] ipoib: avoid ipoib over IBoE

2010-05-06 Thread Roland Dreier
OK, I applied this with just the first chunk.
-- 
Roland Dreier  || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


[ewg] [Fwd: a forum?]

2010-05-06 Thread Jeff Becker
FYI.

-jeff

-------- Original Message --------
Subject:a forum?
Date:   Thu, 6 May 2010 09:41:05 -0500
From:   Tushar Kapila 
To: webmas...@openfabrics.org 



It would be nice to have a forum on your site
any out of the box forum software ... or a few more READMEs ... how a
regular windows/ java user can use your SDP sockets 
Regards,
Tushar Kapila



Re: [ewg] ibcheckerrors "Port All FAILED" reported

2010-05-06 Thread Mike Heinz
Yup - I've also sent a note to Sasha asking what happened to the patch.

-----Original Message-----
From: Ira Weiny [mailto:wei...@llnl.gov] 
Sent: Thursday, May 06, 2010 11:35 AM
To: Mike Heinz; Sasha Khapyorsky
Cc: Woodruff, Robert J; linux-r...@vger.kernel.org; EWG; tzipo...@mellanox.co.il
Subject: Re: [ewg] ibcheckerrors "Port All FAILED" reported

On Thu, 6 May 2010 06:26:55 -0700
Mike Heinz  wrote:

> Ira, 
> 
> I'm pretty sure I already fixed this problem. I submitted a patch to Sasha
> back in April.

The tests below are with the current master.

git://git.openfabrics.org/~sashak/management


Ira

> 
> 
> -----Original Message-----
> From: linux-rdma-ow...@vger.kernel.org 
> [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Ira Weiny
> Sent: Wednesday, May 05, 2010 9:10 PM
> To: Woodruff, Robert J; linux-r...@vger.kernel.org
> Cc: EWG; tzipo...@mellanox.co.il
> Subject: Re: [ewg] ibcheckerrors "Port All FAILED" reported
> 
> Interesting...
> 
> I have a switch which does this as well.  Tracing through the scripts shows
> that the perfquery command is failing like this.
> 
> 14:29:03 > ./perfquery 40 255
> ./perfquery: iberror: failed: AllPortSelect not supported
> 
> It seems there is an issue with the CapabilityMask value...
> 
> 14:43:32 > ./perfquery 40 255
> cap_mask 0x400  <=== my debug output
> ./perfquery: iberror: failed: AllPortSelect not supported
> 
> 14:43:38 > ./saquery CPI 40
> SA ClassPortInfo:
> ...
> Capability mask..0x2602
> ...
> 
> Those don't match because...  perfquery has a bug...
> 
> perfquery is issuing a PMA query when it should be issuing a SA query.  It
> just so happens that on some switches the result of that PMA query indicates
> AllPortSelect is available.  Patch to follow.
> 
> Ira
> 
> 
> On Wed, 5 May 2010 13:47:54 -0700
> "Woodruff, Robert J"  wrote:
> 
> > 
> > Hi guys,
> > 
> > When I run ibcheckerrors on my Mellanox switch,
> > it is reporting that Port all FAILED. 
> > 
> > From what I can tell, the switch is working fine and
> > I think that this is a bogus error from the program.
> > 
> > If this is indeed not a real problem, can the diagnostic
> > be fixed to not report this as an error ?
> > 
> > 
> > ibcheckerrors -nocolor -v -t 100
> > 
> > # Checking Switch: nodeguid 0x0002c902004046a0
> > Node check lid 7: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port all: FAILED   <
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 2: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 3: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 7: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 8: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 9: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 10: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 17: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 18: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 20: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 25: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 26: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 27: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 28: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 34: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 35: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 36: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030002628a
> > Node check lid 14: OK
> > Error check on lid 14 (cstnh-2 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c90300025e0a
> > Node check lid 12: OK
> > Error check on lid 12 (cstnh-3 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030002615e
> > Node check lid 15: OK
> > Error check on lid 15 (cstnh-4 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030008e442
> > Node check lid 11: OK
> > Error check on lid 11 (cstnh-8 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030008e44e
> > Node check lid 8: OK
> > Error check on lid 8 (cstnh-11 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030008e3e6
> > Node check lid 2: OK
> > Error check on lid 2 (cstnh-13 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030008e44a
> > Node check lid 18: OK
> > Error check on lid 18 (cstnh-9 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c90300044fb4
> > Node check lid 13: OK
> > Error check on lid 13 (cstnh-7 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c90300044fbc
> > Node check lid 10: OK
> > Error check on lid 10 (cstnh-1 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030008e3ee
> > Node check lid 9: OK
> > Error check on lid 9 (cstnh-10 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030008e446

Re: [ewg] ibcheckerrors "Port All FAILED" reported

2010-05-06 Thread Ira Weiny
On Thu, 6 May 2010 06:26:55 -0700
Mike Heinz  wrote:

> Ira, 
> 
> I'm pretty sure I already fixed this problem. I submitted a patch to Sasha
> back in April.

The tests below are with the current master.

git://git.openfabrics.org/~sashak/management


Ira

> 
> 
> -----Original Message-----
> From: linux-rdma-ow...@vger.kernel.org 
> [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Ira Weiny
> Sent: Wednesday, May 05, 2010 9:10 PM
> To: Woodruff, Robert J; linux-r...@vger.kernel.org
> Cc: EWG; tzipo...@mellanox.co.il
> Subject: Re: [ewg] ibcheckerrors "Port All FAILED" reported
> 
> Interesting...
> 
> I have a switch which does this as well.  Tracing through the scripts shows
> that the perfquery command is failing like this.
> 
> 14:29:03 > ./perfquery 40 255
> ./perfquery: iberror: failed: AllPortSelect not supported
> 
> It seems there is an issue with the CapabilityMask value...
> 
> 14:43:32 > ./perfquery 40 255
> cap_mask 0x400  <=== my debug output
> ./perfquery: iberror: failed: AllPortSelect not supported
> 
> 14:43:38 > ./saquery CPI 40
> SA ClassPortInfo:
> ...
> Capability mask..0x2602
> ...
> 
> Those don't match because...  perfquery has a bug...
> 
> perfquery is issuing a PMA query when it should be issuing a SA query.  It
> just so happens that on some switches the result of that PMA query indicates
> AllPortSelect is available.  Patch to follow.
> 
> Ira
> 
> 
> On Wed, 5 May 2010 13:47:54 -0700
> "Woodruff, Robert J"  wrote:
> 
> > 
> > Hi guys,
> > 
> > When I run ibcheckerrors on my Mellanox switch,
> > it is reporting that Port all FAILED. 
> > 
> > From what I can tell, the switch is working fine and
> > I think that this is a bogus error from the program.
> > 
> > If this is indeed not a real problem, can the diagnostic
> > be fixed to not report this as an error ?
> > 
> > 
> > ibcheckerrors -nocolor -v -t 100
> > 
> > # Checking Switch: nodeguid 0x0002c902004046a0
> > Node check lid 7: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port all: FAILED   <
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 2: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 3: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 7: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 8: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 9: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 10: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 17: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 18: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 20: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 25: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 26: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 27: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 28: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 34: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 35: OK
> > Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 36: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030002628a
> > Node check lid 14: OK
> > Error check on lid 14 (cstnh-2 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c90300025e0a
> > Node check lid 12: OK
> > Error check on lid 12 (cstnh-3 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030002615e
> > Node check lid 15: OK
> > Error check on lid 15 (cstnh-4 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030008e442
> > Node check lid 11: OK
> > Error check on lid 11 (cstnh-8 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030008e44e
> > Node check lid 8: OK
> > Error check on lid 8 (cstnh-11 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030008e3e6
> > Node check lid 2: OK
> > Error check on lid 2 (cstnh-13 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030008e44a
> > Node check lid 18: OK
> > Error check on lid 18 (cstnh-9 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c90300044fb4
> > Node check lid 13: OK
> > Error check on lid 13 (cstnh-7 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c90300044fbc
> > Node check lid 10: OK
> > Error check on lid 10 (cstnh-1 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030008e3ee
> > Node check lid 9: OK
> > Error check on lid 9 (cstnh-10 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030008e446
> > Node check lid 4: OK
> > Error check on lid 4 (cstnh-12 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030008e22e
> > Node check lid 1: OK
> > Error check on lid 1 (cstnh-14 HCA-1) port 1: OK
> > 
> > # Checking Ca: nodeguid 0x0002c9030008e43e
> > Node check lid 19: OK
> > Error check on lid 19 (cstnh-15 HCA-1) port 1: OK
> > 
> >

Re: [ewg] [PATCHv8 02/11] ib_core: IBoE support only QP1

2010-05-06 Thread Eli Cohen
On Wed, May 05, 2010 at 04:12:15PM -0700, Roland Dreier wrote:
>  > @@ -795,11 +799,12 @@ static void mcast_add_one(struct ib_device *device)
>  >struct mcast_device *dev;
>  >struct mcast_port *port;
>  >int i;
>  > +  int count = 0;
>  >  
>  >if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
>  >return;
>  >  
>  > -  dev = kmalloc(sizeof *dev + device->phys_port_cnt * sizeof *port,
>  > +  dev = kzalloc(sizeof *dev + device->phys_port_cnt * sizeof *port,
> 
>  > @@ -1007,7 +1010,7 @@ static void ib_sa_add_one(struct ib_device *device)
>  >e = device->phys_port_cnt;
>  >}
>  >  
>  > -  sa_dev = kmalloc(sizeof *sa_dev +
>  > +  sa_dev = kzalloc(sizeof *sa_dev +
> 
> Do you happen to remember why you needed these kmalloc -> kzalloc conversions?

I can't remember why. I do have this habit of preferring kzalloc
over kmalloc because it sometimes saves trouble.
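
The kind of trouble meant here can be shown with a userspace analogue. In the C sketch below, calloc() stands in for kzalloc() (zero-filled allocation) where malloc() would stand in for kmalloc() (uninitialized memory). The struct and function names are hypothetical and only mirror the "device header plus per-port array" shape of the hunks quoted above; this is not the ib_core code.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the per-port and per-device structures. */
struct port {
    int started;
};

struct dev {
    int port_count;
    struct port port[];   /* flexible array, one entry per physical port */
};

/* Allocate a device with nports trailing port slots, all zero-filled,
 * so ports that are never initialized still read as "not started". */
struct dev *dev_alloc_zeroed(size_t nports)
{
    return calloc(1, sizeof(struct dev) + nports * sizeof(struct port));
}
```

With an uninitialized allocation, any port slot skipped during setup would hold indeterminate values, and later cleanup code could not tell such a slot apart from one that had been set up.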


Re: [ewg] ibcheckerrors "Port All FAILED" reported

2010-05-06 Thread Mike Heinz
Ira, 

I'm pretty sure I already fixed this problem. I submitted a patch to Sasha back 
in April.


-----Original Message-----
From: linux-rdma-ow...@vger.kernel.org 
[mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Ira Weiny
Sent: Wednesday, May 05, 2010 9:10 PM
To: Woodruff, Robert J; linux-r...@vger.kernel.org
Cc: EWG; tzipo...@mellanox.co.il
Subject: Re: [ewg] ibcheckerrors "Port All FAILED" reported

Interesting...

I have a switch which does this as well.  Tracing through the scripts shows
that the perfquery command is failing like this.

14:29:03 > ./perfquery 40 255
./perfquery: iberror: failed: AllPortSelect not supported

It seems there is an issue with the CapabilityMask value...

14:43:32 > ./perfquery 40 255
cap_mask 0x400  <=== my debug output
./perfquery: iberror: failed: AllPortSelect not supported

14:43:38 > ./saquery CPI 40
SA ClassPortInfo:
...
Capability mask..0x2602
...

Those don't match because...  perfquery has a bug...

perfquery is issuing a PMA query when it should be issuing a SA query.  It
just so happens that on some switches the result of that PMA query indicates
AllPortSelect is available.  Patch to follow.
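
The fallback implied by this discussion can be sketched as follows. This is a minimal illustration and not perfquery's actual code: the bit used for AllPortSelect (bit 8 of the PerfMgt ClassPortInfo CapabilityMask), the host byte order of the mask, and both helper names are assumptions made for the example. Port 255 is the "all ports" selector used in the commands above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: AllPortSelect is bit 8 of the PMA CapabilityMask (host order). */
#define PM_CAP_ALL_PORT_SELECT (1u << 8)
/* Port number meaning "all ports", as in "perfquery 40 255". */
#define ALL_PORTS 255

/* Hypothetical helper: 1 if the mask advertises AllPortSelect, else 0. */
int supports_all_port_select(uint16_t cap_mask)
{
	return (cap_mask & PM_CAP_ALL_PORT_SELECT) != 0;
}

/*
 * Hypothetical helper: number of PMA queries needed for a request.
 * A single port always needs one query. "All ports" needs one query when
 * AllPortSelect is supported, otherwise one query per port.
 */
int queries_needed(uint16_t cap_mask, int port, int num_ports)
{
	assert(num_ports > 0);
	if (port != ALL_PORTS || supports_all_port_select(cap_mask))
		return 1;
	return num_ports;
}
```

Under these assumptions, a 36-port switch reporting cap_mask 0x400 would need 36 per-port queries instead of failing the "all" check outright.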

Ira


On Wed, 5 May 2010 13:47:54 -0700
"Woodruff, Robert J"  wrote:

> 
> Hi guys,
> 
> When I run ibcheckerrors on my Mellanox switch,
> it is reporting that Port all FAILED. 
> 
> From what I can tell, the switch is working fine and
> I think that this is a bogus error from the program.
> 
> If this is indeed not a real problem, can the diagnostic
> be fixed to not report this as an error ?
> 
> 
> ibcheckerrors -nocolor -v -t 100
> 
> # Checking Switch: nodeguid 0x0002c902004046a0
> Node check lid 7: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port all: FAILED  <
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 2: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 3: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 7: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 8: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 9: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 10: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 17: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 18: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 20: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 25: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 26: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 27: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 28: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 34: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 35: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 36: OK
> 
>  Checking Ca: nodeguid 0x0002c9030002628a
> Node check lid 14: OK
> Error check on lid 14 (cstnh-2 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0002c90300025e0a
> Node check lid 12: OK
> Error check on lid 12 (cstnh-3 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0002c9030002615e
> Node check lid 15: OK
> Error check on lid 15 (cstnh-4 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0002c9030008e442
> Node check lid 11: OK
> Error check on lid 11 (cstnh-8 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0002c9030008e44e
> Node check lid 8: OK
> Error check on lid 8 (cstnh-11 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0002c9030008e3e6
> Node check lid 2: OK
> Error check on lid 2 (cstnh-13 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0002c9030008e44a
> Node check lid 18: OK
> Error check on lid 18 (cstnh-9 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0002c90300044fb4
> Node check lid 13: OK
> Error check on lid 13 (cstnh-7 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0002c90300044fbc
> Node check lid 10: OK
> Error check on lid 10 (cstnh-1 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0002c9030008e3ee
> Node check lid 9: OK
> Error check on lid 9 (cstnh-10 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0002c9030008e446
> Node check lid 4: OK
> Error check on lid 4 (cstnh-12 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0002c9030008e22e
> Node check lid 1: OK
> Error check on lid 1 (cstnh-14 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0002c9030008e43e
> Node check lid 19: OK
> Error check on lid 19 (cstnh-15 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0090270002000345
> Node check lid 6: OK
> Error check on lid 6 (cstnh-5 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0090270002000335
> Node check lid 5: OK
> Error check on lid 5 (cstnh-6 HCA-1) port 1: OK
> 
> # Checking Ca: nodeguid 0x0002c90300028238
> Node check lid 3: OK
> Error check on lid 3 (cst-linux HCA-1) port 1: OK
> 
> ## Summary: 17 nodes checked, 0 bad nodes found
> ##

Re: [ewg] [PATCHv8 06/11] ipoib: avoid ipoib over IBoE

2010-05-06 Thread Eli Cohen
On Wed, May 05, 2010 at 03:27:51PM -0700, Roland Dreier wrote:
>  > @@ -1383,6 +1385,9 @@ static void ipoib_remove_one(struct ib_device *device)
>  >dev_list = ib_get_client_data(device, &ipoib_client);
>  >  
>  >list_for_each_entry_safe(priv, tmp, dev_list, list) {
>  > +  if (rdma_port_link_layer(device, priv->port) != IB_LINK_LAYER_INFINIBAND)
>  > +  continue;
> 
> Why do we need this chunk here?  How could a netdev get on our list if
> we never create IPoIB interfaces for IBoE ports?
> 

Right, this is not necessary and can be removed.


Re: [ewg] [PATCHv8 03/11] IB/umad: Enable support only for IB ports

2010-05-06 Thread Eli Cohen
On Wed, May 05, 2010 at 03:11:09PM -0700, Roland Dreier wrote:
> Why do we not allow umad for IBoE ports?  I understand there's no QP0
> but why can't userspace use QP1 just like for IB link layer ports?

Currently QP1 is only used by the CM protocol, which is implemented in
the kernel.
Since we handle the IBoE-specific flow in the cma rather than the SA,
there is no need to expose QP1 to userspace.



[ewg] ofa_1_5_kernel 20100506-0200 daily build status

2010-05-06 Thread Vladimir Sokolovsky (Mellanox)
This email was generated automatically, please do not reply


git_url: git://git.openfabrics.org/ofed_1_5/linux-2.6.git
git_branch: ofed_kernel_1_5

Common build parameters: 

Passed:
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16.60-0.54.5-smp
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.18-128.el5
Passed on x86_64 with linux-2.6.18-186.el5
Passed on x86_64 with linux-2.6.18-164.el5
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.27.19-5-smp
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on x86_64 with linux-2.6.9-89.ELsmp
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.25
Passed on ia64 with linux-2.6.26
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19

Failed:


Re: [ewg] [PATCHv8 01/11] ib core: Add link layer property to ports

2010-05-06 Thread Eli Cohen
On Wed, May 05, 2010 at 03:19:50PM -0700, Roland Dreier wrote:
> Hi Eli,
> 
> I'm hoping to get this IBoE stuff in for 2.6.35.  I started an "iboe"
> branch in my tree (similar to the xrc branch I've been carrying for a
> while), and I added this patch in, except I renamed
> rdma_port_link_layer() to rdma_port_get_link_layer().  This seems to
> match rdma_node_get_transport() better.
> 
> In any case as I add patches to my branch, you can stop worrying about
> them, which should make keeping this series updated easier.
> 

Hi Roland,

I am glad to hear this and will be happy to help.
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] [PATCH] mlx4_core: request MSIX vectors as much as there CPU cores

2010-05-06 Thread Eli Cohen
On Wed, May 05, 2010 at 12:55:54PM -0700, Roland Dreier wrote:
>  > We found it in performance work of our EN (10G) driver
> 
> By the way, it would certainly make sense for the ethernet driver to use
> a number of queues that matches num_online_cpus() at the time the
> interface is brought up.  Since we can't change the # of MSI-X vectors
> very easily I think we need to allow for the possible CPUs, but bouncing
> a net interface seems lighter weight to me.
> 
> Although perhaps reloading a driver on CPU hotplug is OK too?
> 

Yes, we have a system where num_possible_cpus is 32 and
num_online_cpus is 16. It's a RH5.4 and the kernel has no problem
allocating 33 MSI-X vectors. The point is that using more than one EQ
per CPU core does not buy us anything; in fact it can contribute to a
higher rate of interrupts since the same EQ serves fewer CQs and the
chances for coalescing EQEs are lower.

So what do you think about the following patch to mlx4_en:


diff --git a/drivers/net/mlx4/en_cq.c b/drivers/net/mlx4/en_cq.c
index 21786ad..07c0779 100644
--- a/drivers/net/mlx4/en_cq.c
+++ b/drivers/net/mlx4/en_cq.c
@@ -49,11 +49,12 @@ int mlx4_en_create_cq(struct mlx4_en_priv *priv,
 {
struct mlx4_en_dev *mdev = priv->mdev;
int err;
+   int num_active_vectors = min_t(int, num_online_cpus(), mdev->dev->caps.num_comp_vectors);
 
cq->size = entries;
if (mode == RX) {
cq->buf_size = cq->size * sizeof(struct mlx4_cqe);
-   cq->vector   = ring % mdev->dev->caps.num_comp_vectors;
+   cq->vector   = ring % num_active_vectors;
} else {
cq->buf_size = sizeof(struct mlx4_cqe);
cq->vector   = 0;
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] [RFC] libibverbs: ibv_fork_init() and libhugetlbfs

2010-05-06 Thread Alexander Schmidt
Hi all,

We are trying to make use of libhugetlbfs in an application that relies on
ibv_fork_init() to enable fork() support. The problem we are running into is
that calls to the madvise system call fail when registering a memory region
for memory that is provided by libhugetlbfs. We have written a preliminary
fix (see below) for this and are looking for comments / feedback to get an
acceptable solution.

When fork support is enabled in libibverbs, madvise() is called for every
memory page that is registered as a memory region. Memory ranges that
are passed to madvise() must be page aligned and the size must be a
multiple of the page size. libibverbs uses sysconf(_SC_PAGESIZE) to find
out the system page size and rounds all ranges passed to reg_mr() according
to this page size. When memory from libhugetlbfs is passed to reg_mr(), this
does not work as the page size for this memory range might be different
(e.g. 16 MB). So libibverbs would have to use the huge page size to
calculate a page aligned range for madvise.
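
The rounding described above can be sketched as two small functions. This is an illustration only: range_start() and range_end() are hypothetical names, and the page size is passed in explicitly so the same arithmetic can be tried with a regular and a huge page size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round `base` down to the start of its page. page_size must be a power of two. */
uintptr_t range_start(uintptr_t base, uintptr_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return base & ~(page_size - 1);
}

/* Last byte of the last page touched by [base, base + size). */
uintptr_t range_end(uintptr_t base, size_t size, uintptr_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return ((base + size + page_size - 1) & ~(page_size - 1)) - 1;
}
```

For a 0x100-byte buffer at 0x1001234, a 4 KB page size yields the range 0x1001000 to 0x1001fff, while a 16 MB page size yields 0x1000000 to 0x1ffffff. Only the latter is aligned to the huge page that actually backs the buffer.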

As huge pages are provided to the application "under the hood" when
preloading libhugetlbfs, the application does not know whether it is
registering a huge page or a regular page.

The patch below demonstrates a possible solution for this. It parses the
/proc/PID/maps file when registering a memory region and decides if the
memory that is to be registered is part of a libhugetlbfs range or not. If so,
a page size of 16 MB is used to align the memory range passed to madvise().
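
The per-line check can be sketched in isolation as below. This is a minimal illustration under stated assumptions: maps_line_is_hugetlbfs() is a hypothetical name, the line is assumed to follow the usual "start-end perms offset dev inode path" layout of the maps file, and a libhugetlbfs mapping is recognized only by the substring "libhugetlbfs" in its path.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if `line` (one line of a /proc/<pid>/maps file) describes a
 * mapping whose path contains "libhugetlbfs" and whose address range
 * covers `addr`; return 0 otherwise.
 */
int maps_line_is_hugetlbfs(const char *line, uintptr_t addr)
{
	unsigned long start, end;
	char path[128];
	int n;

	assert(line != NULL);
	/* Skip perms, offset, dev and inode; keep the range and the path. */
	n = sscanf(line, "%lx-%lx %*s %*x %*s %*u %127s", &start, &end, path);
	if (n < 3)
		return 0;	/* anonymous mapping or malformed line */
	if (!strstr(path, "libhugetlbfs"))
		return 0;
	return addr >= start && addr < end;
}
```

A caller would feed each line read with fgets() into this function and switch to the huge page size as soon as one line matches.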

We see two problems with this: parsing the procfs file is not a very elegant
solution, and the 16 MB page size is currently hardcoded. The latter point could be
solved by calling gethugepagesize() from libhugetlbfs, which would add a new
dependency to libibverbs.

We are highly interested in reviews, comments, suggestions to get this solved
soon. Thanks!

Signed-off-by: Alexander Schmidt 
---
 src/memory.c |   50 +++---
 1 file changed, 47 insertions(+), 3 deletions(-)

--- libibverbs-1.1.2.orig/src/memory.c
+++ libibverbs-1.1.2/src/memory.c
@@ -40,6 +40,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "ibverbs.h"
 
@@ -54,6 +56,8 @@
 #define MADV_DOFORK11
 #endif
 
+#define HUGE_PAGE_SIZE (16 * 1024 * 1024)
+
 struct ibv_mem_node {
enum {
IBV_RED,
@@ -446,6 +450,48 @@ static struct ibv_mem_node *__mm_find_st
return node;
 }
 
+static void get_range_address(uintptr_t *start, uintptr_t *end, void *base, size_t size)
+{
+   pid_t pid;
+   FILE *file;
+   char buf[1024], lib[128];
+   int range_page_size = page_size;
+
+   pid = getpid();
+   snprintf(buf, sizeof(buf), "/proc/%d/maps", pid);
+
+   file = fopen(buf, "r");
+   if (!file)
+   goto out;
+
+   while (fgets(buf, sizeof(buf), file) != NULL) {
+   int n;
+   char *substr;
+   uintptr_t range_start, range_end;
+
+   n = sscanf(buf, "%lx-%lx %*s %*x %*s %*u %127s",
+   &range_start, &range_end, lib);
+
+   if (n < 3)
+   continue;
+
+   substr = strstr(lib, "libhugetlbfs");
+   if (substr) {
+   if ((uintptr_t) base >= range_start &&
+   (uintptr_t) base < range_end) {
+   range_page_size = HUGE_PAGE_SIZE;
+   break;
+   }
+   }
+   }
+   fclose(file);
+
+out:
+   *start = (uintptr_t) base & ~(range_page_size - 1);
+   *end   = ((uintptr_t) (base + size + range_page_size - 1) &
+~(range_page_size - 1)) - 1;
+}
+
 static int ibv_madvise_range(void *base, size_t size, int advice)
 {
uintptr_t start, end;
@@ -458,9 +504,7 @@ static int ibv_madvise_range(void *base,
 
inc = advice == MADV_DONTFORK ? 1 : -1;
 
-   start = (uintptr_t) base & ~(page_size - 1);
-   end   = ((uintptr_t) (base + size + page_size - 1) &
-~(page_size - 1)) - 1;
+   get_range_address(&start, &end, base, size);
 
pthread_mutex_lock(&mm_mutex);
 
___