from:"Or Gerlitz"

Re: linux-next: manual merge of the net-next tree with the net tree

2018-12-19 Thread Or Gerlitz

On Thu, Dec 20, 2018 at 4:47 AM Stephen Rothwell  wrote:
>
> Hi all,
>
> Today's linux-next merge of the net-next tree got a conflict in:
>
>   drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
>
> between commit:
>
>   8956f0014ea5 ("net/mlx5e: Fix default amount of channels for VF 
> representors")
>
> from the net tree and commit:
>
>   d9ee0491c2ff ("net/mlx5e: Use dedicated uplink vport netdev representor")
>
> from the net-next tree.
>
> I fixed it up (I just used the net-next tree version) and can carry the
> fix as necessary.  [..]

Yes, this is correct, thanks!

Re: linux-next: manual merge of the net-next tree with the net tree

2018-12-17 Thread Or Gerlitz

On Mon, Dec 17, 2018 at 11:29 PM Saeed Mahameed
 wrote:
> On Sun, Dec 16, 2018 at 4:25 PM Stephen Rothwell  
> wrote:

> > I fixed it up (see below) and can carry the fix as necessary. This

> Looks good to me.

here too


> > Today's linux-next merge of the net-next tree got a conflict in:
> >   drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> > between commit:

> >   154e62abe9cd ("net/mlx5e: Properly initialize flow attributes for slow 
> > path eswitch rule deletion")
> > from the net tree and commit:

> >   e88afe759a49 ("net/mlx5e: Err if asked to mirror a goto chain tc eswitch 
> > rule")
> >   e85e02bad29e ("net/mlx5: E-Switch, Rename esw attr mirror count field")
> > from the net-next tree.

Just a note,

e88afe759a49  ("net/mlx5e: Err if asked to mirror a goto chain tc
eswitch rule")i

s from net  and not net-next

see it here [1] among the top 10 patches

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/log/?qt=grep&q=mlx5

Re: linux-next: manual merge of the net-next tree with the net tree

2018-12-10 Thread Or Gerlitz

On Mon, Dec 10, 2018 at 3:38 AM Stephen Rothwell  wrote:
>
> Hi all,
>
> Today's linux-next merge of the net-next tree got a conflict in:
>
>   net/sched/cls_flower.c
>
> between commit:
>
>   35cc3cefc4de ("net/sched: cls_flower: Reject duplicated rules also under 
> skip_sw")
>
> from the net tree and commit:
>
>   5c72299fba9d ("net: sched: cls_flower: Classify packets using port ranges")
>
> from the net-next tree.
>
> I fixed it up (see below) and can carry the fix as necessary. This

[..]

The fix LGTM, thanks for addressing this.

Or.

Re: [PATCH] IB/mlx5: avoid binding a new mpi unit to the same devices repeatedly

2018-07-14 Thread Or Gerlitz

On Sat, Jul 14, 2018 at 2:50 AM, Qing Huang  wrote:
> When a CX5 device is configured in dual-port RoCE mode, after creating
> many VFs against port 1, creating the same number of VFs against port 2
> will flood kernel/syslog with something like
> "mlx5_*:mlx5_ib_bind_slave_port:4266:(pid 5269): port 2 already
> affiliated."
>
> So basically, when traversing mlx5_ib_dev_list, mlx5_ib_add_slave_port()
> shouldn't repeatedly attempt to bind the new mpi data unit to every device
> on the list until it finds an unbound device.

Daniel,

What is mpi data unit?

Or.

>
> Reported-by: Gerald Gibson 
> Signed-off-by: Qing Huang 
> ---
>  drivers/infiniband/hw/mlx5/main.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/hw/mlx5/main.c 
> b/drivers/infiniband/hw/mlx5/main.c
> index b3ba9a2..1ddd1d3 100644
> --- a/drivers/infiniband/hw/mlx5/main.c
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -6068,7 +6068,8 @@ static void *mlx5_ib_add_slave_port(struct 
> mlx5_core_dev *mdev, u8 port_num)
>
> mutex_lock(&mlx5_ib_multiport_mutex);
> list_for_each_entry(dev, &mlx5_ib_dev_list, ib_dev_list) {
> -   if (dev->sys_image_guid == mpi->sys_image_guid)
> +   if (dev->sys_image_guid == mpi->sys_image_guid &&
> +   !dev->port[mlx5_core_native_port_num(mdev) - 1].mp.mpi)
> bound = mlx5_ib_bind_slave_port(dev, mpi);
>
> if (bound) {
> --
> 2.9.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 13/14] net: sched: use unique idr insert function in unlocked actions

2018-05-20 Thread Or Gerlitz

On Mon, May 21, 2018 at 12:33 AM, Marcelo Ricardo Leitner
 wrote:
> On Mon, May 21, 2018 at 12:13:06AM +0300, Or Gerlitz wrote:
>> On Sun, May 20, 2018 at 1:20 AM, Marcelo Ricardo Leitner
>>  wrote:
>> > On Mon, May 14, 2018 at 05:27:14PM +0300, Vlad Buslov wrote:
>> >> Substitute calls to action insert function with calls to action insert
>> >> unique function that warns if insertion overwrites index in idr.
>> >
>> > I know this patch may be gone on V2, but a general comment, please use
>> > the function names themselves instead of a textualized version. I.e.,
>> > s/action insert unique/tcf_idr_insert_unique/
>>
>> disagree. While doing reviews I found out that if I ask the developer
>> to use higher
>> level / descriptive language and specifically to avoid putting
>> variable / function names in
>> patch titles and change logs, the quality gets ++ big time, vs if the
>> developer is allowed to say
>>
>> net/mlx5: Changed add_vovo_bobo()
>>
>> Added variable do_it_right to add_vovo_bobo(), now we are terribly good.
>
> In your example I agree that it is not helping and it is even allowing
> such empty changelog, just as in the section I highlighted, the
> descriptive language is also not helping IMHO.
>
> I had to read it 3 times to make sure I wasn't missing a modifier word
> when comparing the two functions and well, it's just saying
> "Substitute calls to foo function to bar function". I don't see how
> the textualized version helps in this case while, at least in this
> one, I would have visually recognized the function names way faster.
>
> Sounds like 2 bad examples for either approach.

Properly written descriptive language is maybe harder to come up with
(specifically for those of us who are not native english speakers, like me)
but is more expressive and helpful for reviews and maintenance. Lets try
to add Vlad to come up with the right higher language and not ask him to
quote function and variable named.

Re: [PATCH 13/14] net: sched: use unique idr insert function in unlocked actions

2018-05-20 Thread Or Gerlitz

On Sun, May 20, 2018 at 1:20 AM, Marcelo Ricardo Leitner
 wrote:
> On Mon, May 14, 2018 at 05:27:14PM +0300, Vlad Buslov wrote:
>> Substitute calls to action insert function with calls to action insert
>> unique function that warns if insertion overwrites index in idr.
>
> I know this patch may be gone on V2, but a general comment, please use
> the function names themselves instead of a textualized version. I.e.,
> s/action insert unique/tcf_idr_insert_unique/

disagree. While doing reviews I found out that if I ask the developer
to use higher
level / descriptive language and specifically to avoid putting
variable / function names in
patch titles and change logs, the quality gets ++ big time, vs if the
developer is allowed to say

net/mlx5: Changed add_vovo_bobo()

Added variable do_it_right to add_vovo_bobo(), now we are terribly good.

Re: Setting large MTU size on slave interfaces may stall the whole system

2017-12-13 Thread Or Gerlitz

On Tue, Dec 12, 2017 at 5:21 AM, Qing Huang  wrote:
> Hi,
>
> We found an issue with the bonding driver when testing Mellanox devices.
> The following test commands will stall the whole system sometimes, with
> serial console
> flooded with log messages from the bond_miimon_inspect() function. Setting
> mtu size
> to be 1500 seems okay but very rarely it may hit the same problem too.
>
> ip address flush dev ens3f0
> ip link set dev ens3f0 down
> ip address flush dev ens3f1
> ip link set dev ens3f1 down
> [root@ca-hcl629 etc]# modprobe bonding mode=0 miimon=250 use_carrier=1
> updelay=500 downdelay=500
> [root@ca-hcl629 etc]# ifconfig bond0 up
> [root@ca-hcl629 etc]# ifenslave bond0 ens3f0 ens3f1
> [root@ca-hcl629 etc]# ip link set bond0 mtu 4500 up

> Seiral console output:
>
> ** 4 printk messages dropped ** [ 3717.743761] bond0: link status down for
> interface ens3f0, disabling it in 500 ms
[..]


> It seems that when setting a large mtu size on an RoCE interface, the RTNL
> mutex may be held too long by the slave
> interface, causing bond_mii_monitor() to be called repeatedly at an interval
> of 1 tick (1K HZ kernel configuration) and kernel to become unresponsive.

Did you try/managed to reproduce that also with other NIC drivers?

Or.

Re: [PATCH] [net-next] net/mlx5e: select CONFIG_MLXFW

2017-06-28 Thread Or Gerlitz

On Wed, Jun 28, 2017 at 11:10 PM, Arnd Bergmann  wrote:
> With the introduction of mlx5 firmware flash support, we get a link
> error with CONFIG_MLXFW=m and CONFIG_MLX5_CORE=y:
>
> drivers/net/ethernet/mellanox/mlx5/core/fw.o: In function 
> `mlx5_firmware_flash':
> fw.c:(.text+0x9d4): undefined reference to `mlxfw_firmware_flash'

Thanks Arnd, I got a report on that from Jakub but you were before me here..

> We could have a more elaborate method to force MLX5 to be a loadable
> module in this case, but the easiest fix seems to be to always enable
> MLXFW as well, like we do for CONFIG_MLXSW_SPECTRUM, which is the other
> user of mlxfw_firmware_flash.

We would not want to force mlx5 users to build mlxfw.

So lets either use the more elaborate method or maybe instead of using
IS_ENABLED in mlxfw.h use IS_REACHABLE (this was suggested by Jakub)

Or.

Re: [PATCH] IB/IPoIB: Check the headroom size

2017-04-25 Thread Or Gerlitz

On Tue, Apr 25, 2017 at 2:11 PM, Erez Shitrit  wrote:
> On Tue, Apr 25, 2017 at 1:32 PM, Or Gerlitz  wrote:
>> On Tue, Apr 25, 2017 at 12:55 PM, Honggang LI  wrote:
>>> From: Honggang Li 
>>>
>>> Minimal hard_header_len set by bond_compute_features is ETH_HLEN, which
>>> is smaller than IPOIB_HARD_LEN. ipoib_hard_header should check the
>>> size of headroom to avoid skb_under_panic.
>>
>> sounds terrible, ipoib bonding is supported since ~2007, thanks for
>> reporting on that.
>>
>>> [  122.871493] ipoib_hard_header: skb->head= 8808179d9400, skb->data= 
>>> 8808179d9420, skb_headroom= 0x20
>>> [  123.055400] bond0: Releasing backup interface mthca_ib1
>>> [  123.560529] bond_compute_features:1112 bond0 bond_dev->hard_header_len = 
>>> 14
>>> [  123.568822] CPU: 0 PID: 12336 Comm: ifdown-ib Not tainted 4.9.0-debug #1
>>
>> did you generate this trace by calling dump_stack or this is existing
>> kernel code.
>>
>>> Fixes: fc791b633515 ('IB/ipoib: move back IB LL address into the hard 
>>> header')
>>
>> this is more of WA to avoid some crash or failure but not fixing the
>> actual problem
>>
>> Erez, can you comment?
>
> We saw that after commit fc791b633515, it happened while removing bond
> interface after its slaves (ipoib interface) removed.
> At that point the bond interface sets its dev_harheader_len to be as
> eth interfaces (14 instead of 24), and if a process which doesn't
> aware of the slaves removal or was at the middle of the sending tries
> to send (igmp) packet it goes to ipoib with no space in the skb for
> it, and here comes the panic.

thanks for the info. Is this bug there since ipoib/bonding day one
(and hence my bug...)
or was indeed introduced later? if later, can you explain how
fc791b633515 introduced
that or you only know it by bisection?

> I agree with you that this fix is w/a, and it is a fix in the data
> path for all the packets while the panic is in a control flow. It
> probably should be fixed in the bonding driver.

so what's your suggestion? fc791b633515 is 6m old, and it means the bug
is in stable kernels and probably also in inbox drivers

Or.

Re: [PATCH] IB/IPoIB: Check the headroom size

2017-04-25 Thread Or Gerlitz

On Tue, Apr 25, 2017 at 12:55 PM, Honggang LI  wrote:
> From: Honggang Li 
>
> Minimal hard_header_len set by bond_compute_features is ETH_HLEN, which
> is smaller than IPOIB_HARD_LEN. ipoib_hard_header should check the
> size of headroom to avoid skb_under_panic.

sounds terrible, ipoib bonding is supported since ~2007, thanks for
reporting on that.

> [  122.871493] ipoib_hard_header: skb->head= 8808179d9400, skb->data= 
> 8808179d9420, skb_headroom= 0x20
> [  123.055400] bond0: Releasing backup interface mthca_ib1
> [  123.560529] bond_compute_features:1112 bond0 bond_dev->hard_header_len = 14
> [  123.568822] CPU: 0 PID: 12336 Comm: ifdown-ib Not tainted 4.9.0-debug #1

did you generate this trace by calling dump_stack or this is existing
kernel code.

> Fixes: fc791b633515 ('IB/ipoib: move back IB LL address into the hard header')

this is more of WA to avoid some crash or failure but not fixing the
actual problem

Erez, can you comment?

Or.

Re: [PATCH] [net-next] net/mlx5e: fix another maybe-uninitialized false-positive

2017-02-05 Thread Or Gerlitz

On Fri, Feb 3, 2017 at 6:37 PM, Arnd Bergmann  wrote:
> In commit abeffce ("net/mlx5e: Fix a -Wmaybe-uninitialized warning"), I fixed 
> a
> gcc warning for the ipv4 offload handling. Now we get the same warning for the
> added ipv6 support:
>
> drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:815:40: warning: 'out_dev' 
> may be used uninitialized in this function [-Wmaybe-uninitialized]
>
> We can apply the same workaround here as well.
>
> Fixes: ce99f6b97fcd ("net/mlx5e: Support SRIOV TC encapsulation offloads for 
> IPv6 tunnels")
> Signed-off-by: Arnd Bergmann 

frustrating... I don't see the warning with gcc 5.3.1 [1], but still,
the patch is OKay

Acked-by: Or Gerlitz 

[1] I used #make KCFLAGS='-Wmaybe-uninitialized'
M=drivers/net/ethernet/mellanox/mlx5/core   -j something

Re: [PATCH] [net] net/mlx5e: fix another -Wmaybe-uninitialized warning

2017-01-12 Thread Or Gerlitz

On Thu, Jan 12, 2017 at 6:04 PM, Arnd Bergmann  wrote:
> On Thursday, January 12, 2017 5:21:49 PM CET Or Gerlitz wrote:


>> When I build here without CONFIG_INET in my system, the build goes fine
>> with this approach. However, we're pretty sure that in the past we got
>> 0-day report from the kbuild test robot where he was unhappy that we
>> make the ip_route_output_key call without being wrapped with that #if
>> IS_ENABLED(CONFIG_INET) -- so, we don't want to go there again... thoughts?

> I went back and forth between the two versions, either leaving the #if
> in place, or using the if(IS_ENABLED()) check to be really sure that
> we can't get compile error here.

> I did check that ip_route_output_key() is always declared, but now
> I see that net/route.h might not always be included from en_tc.c
> if CONFIG_INET is disabled (I don't see how it gets included, but
> it obviously is when CONFIG_INET is turned on).

> Adding an explicit include of that file should probably avoid the
> case you ran into earlier, but for I agree it's safer to not rely
> on that here for a bugfix, and just leave the #ifdef. Do you want to
> modify it yourself, or should I spin a new version with that?

I can do that next week, thanks

Re: [PATCH] [net] net/mlx5e: fix another -Wmaybe-uninitialized warning

2017-01-12 Thread Or Gerlitz


On 1/11/2017 11:14 PM, Arnd Bergmann wrote:

As found by Olof's build bot, today's mainline kernel gained a harmless
warning about a potential uninitalied variable reference:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c: In function 
'parse_tc_fdb_actions':
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:769:13: warning: 'out_dev' may 
be used uninitialized in this function [-Wmaybe-uninitialized]
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:811:21: note: 'out_dev' was 
declared here

This was introduced through the addition of an 'IS_ERR/PTR_ERR' pair that
gcc is unfortunately unable to completely figure out. Replacing it with
PTR_ERR_OR_ZERO makes the code more understandable to gcc so it no longer
warns.


can you elaborate on this a little further?


Hadar Hen Zion already attempted to fix the warning earlier by adding
fake initializations, but that ended up just making the code worse without
fully addressing all warnings, so I'm reverting it now that it is no longer 
needed.


ok, so if your approach eliminates the warning on out_dev and also on 
the variables for which Hadar added the faked initializers, I guess we 
should be fine with this change (saw your reply on my other comment), 
just another question:



In order to avoid pulling a variable declaration into the #ifdef, I'm
removing it in favor of a more readable 'if()' statement here that has the same 
effect.


When I build here without CONFIG_INET in my system, the build goes fine 
with this approach. However, we're pretty sure that in the past we got 
0-day report from the kbuild test robot where he was unhappy that we 
make the ip_route_output_key call without being wrapped with that #if 
IS_ENABLED(CONFIG_INET) -- so, we don't want to go there again... thoughts?


Or.

Re: [PATCH] [net] net/mlx5e: fix another -Wmaybe-uninitialized warning

2017-01-12 Thread Or Gerlitz


On 1/11/2017 11:14 PM, Arnd Bergmann wrote:

@@ -666,14 +666,15 @@ static int mlx5e_route_lookup_ipv4(struct mlx5e_priv 
*priv,
struct rtable *rt;
struct neighbour *n = NULL;
int ttl;
+   int ret;
+
+   if (!IS_ENABLED(CONFIG_INET))
+   return -EOPNOTSUPP;
  
-#if IS_ENABLED(CONFIG_INET)

rt = ip_route_output_key(dev_net(mirred_dev), fl4);
-   if (IS_ERR(rt))
-   return PTR_ERR(rt);
-#else
-   return -EOPNOTSUPP;
-#endif
+   ret = PTR_ERR_OR_ZERO(rt);
+   if (ret)
+   return ret;


but this means that if we got NULL from ip_route_output_key, we will 
return success (0) here which is wrong.

Re: [PULL REQUEST] Please pull rdma.git

2016-11-19 Thread Or Gerlitz

On Fri, Nov 18, 2016 at 4:01 AM, Doug Ledford  wrote:
> On 11/17/2016 5:24 PM, Or Gerlitz wrote:

[...]

> I agree with you.  It doesn't fix your patch.  The commit message can
> still be fixed up.

>> Please do not send it to Linus and wait for them to respond. I
>> disagree that it fixes my commit b/c my commit was prior to when
>> route-able RoCE  was introduced and on that time TOS had no relation.

> I agree.  A better fix tag would be the commit that added RoCEv2 support.

But this is the smaller part of the problem. The bigger part is that I
have asked for clarifications on the patch and they didn't provide
anything. So if you are picking patches where a reviewer comments are
ignored, what lesson are you teaching the submitter, that he can just
continue with this practice? why you letting this go that way?

>> does a tiny enhancement for a 10y old commit of Roland, why you think
>> we need it in 4.9-rc6 or 7??

> I don't, it's in the mlx-next branch which means I'll queue it up for
> the 4.10 merge window.  I have no plan on sending that branch for 4.9-rc.

Are you going to comment on that to the submitter? if not, they are
going to continue with this practice.

How are we supposed to realize from patchworks + your github branches
that patches that were submitted for 4.9-rc are picked for 4.10? this
is very confusing and error prone too.

Please comment also on the bunch of patches I pointed you where the
copy you have picked into your tree (pulled it from somewhere?) isn't
what was submitted.

Or.

Re: [PULL REQUEST] Please pull rdma.git

2016-11-17 Thread Or Gerlitz

On Thu, Nov 17, 2016 at 9:44 PM, Doug Ledford  wrote:
> On 11/17/16 1:49 PM, Leon Romanovsky wrote:
>> On Thu, Nov 17, 2016 at 07:13:54AM -0500, Doug Ledford wrote:
>>> Hi Linus,
>>>
>>> Due to various issues, I've been away and couldn't send a pull request
>>> for about three weeks.  There were a number of -rc patches that built up
>>> in the meantime (some where there already from the early -rc stages).
>>> Obviously, there were way too many to send now, so I tried to pare the
>>> list down to the more important patches for the -rc cycle.

Hi Doug,

Except for the hfi1 patches, all the rest (core, mlx5, mlx4 and rxe)
are marked now as only 21 hours old in your 4.9-rc branch and they
seems be made from you picking partial subsets of multiple series,
with none of them acked by you on the list.

If you agree that I am describing things correctly - how are we
expected to follow on your patch picking? I find it sort of impossible
and error prone.

>> Are you adding the rest to your for-next branch? We would like to have
>> enough time to check that nothing is lost.

> Yes, it's already there in the mlx-next branch on github.

Re the patches there, this one

IB/mlx4: Set traffic class in AH

"Set traffic class within sl_tclass_flowlabel when create iboe AH.
Without this the TOS value will be empty when running VLAN tagged
traffic, because the TOS value is taken from the traffic class in the
address handle attributes.

Fixes: 9106c41 ('IB/mlx4: Fix SL to 802.1Q priority-bits mapping for IBoE')"

claims to fix my commit, I have approached Leon and Co for
clarifications/questions over the list on the patch and nothing was
answered.

Please do not send it to Linus and wait for them to respond. I
disagree that it fixes my commit b/c my commit was prior to when
route-able RoCE  was introduced and on that time TOS had no relation.

and this one

"IB/mlx4: Put non zero value in max_ah device attribute

Use INT_MAX since this is the max value the attribute can hold, though
hardware capability is unlimited.

Fixes: 225c7b1 ('IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters')"

does a tiny enhancement for a 10y old commit of Roland, why you think
we need it in 4.9-rc6 or 7??

Or.

Re: linux-next: manual merge of the net-next tree with the net tree

2016-11-13 Thread Or Gerlitz

On Thu, Nov 10, 2016 at 1:50 AM, Stephen Rothwell  wrote:
> Hi all,
>
> Today's linux-next merge of the net-next tree got a conflict in:
>
>   drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
>
> between commit:
>   ee39fbc4447d ("net/mlx5: E-Switch, Set the actions for offloaded rules 
> properly")
> from the net tree and commit:
>   66958ed906b8 ("net/mlx5: Support encap id when setting new steering entry")
> from the net-next tree.

> I fixed it up (see below) and can carry the fix as necessary.

Thanks Stephen, the fix is correct. Dave will hit the conflict the
next time he rebases
net-next on net and will solve it there. Hence the conflict will not
show up in linux-next
once you re-peek net-next.

Or.

> diff --cc drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> index d239f5d0ea36,50fe8e8861bb..
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> @@@ -57,14 -58,14 +58,15 @@@ mlx5_eswitch_add_offloaded_rule(struct
> if (esw->mode != SRIOV_OFFLOADS)
> return ERR_PTR(-EOPNOTSUPP);
>
>  -  flow_act.action = attr->action;
>  +  /* per flow vlan pop/push is emulated, don't set that into the 
> firmware */
> -   action = attr->action & ~(MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH | 
> MLX5_FLOW_CONTEXT_ACTION_VLAN_POP);
> ++  flow_act.action = attr->action & ~(MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH 
> | MLX5_FLOW_CONTEXT_ACTION_VLAN_POP);
>
> -   if (action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) {
> -   dest.type = MLX5_FLOW_DESTINATION_TYPE_VPORT;
> -   dest.vport_num = attr->out_rep->vport;
> -   action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
> -   } else if (action & MLX5_FLOW_CONTEXT_ACTION_COUNT) {
> +   if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) {
> +   dest[i].type = MLX5_FLOW_DESTINATION_TYPE_VPORT;
> +   dest[i].vport_num = attr->out_rep->vport;
> +   i++;
> +   }
> +   if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_COUNT) {
> counter = mlx5_fc_create(esw->dev, true);
> if (IS_ERR(counter))
> return ERR_CAST(counter);

Re: [PATCH 1/2] net/mlx5e: shut up maybe-uninitialized warning

2016-10-01 Thread Or Gerlitz

On Fri, Sep 30, 2016 at 7:17 PM, Arnd Bergmann  wrote:
> Build-testing this driver with -Wmaybe-uninitialized gives a new 
> false-positive
> warning that I can't really explain:
>
> drivers/net/ethernet/mellanox/mlx5/core/en_tc.c: In function 
> 'mlx5e_configure_flower':
> drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:509:3: error: 'old_attr' may 
> be used uninitialized in this function [-Werror=maybe-uninitialized]
>
> It's obvious from the code that 'old_attr' is initialized whenever 'old'
> is non-NULL here. The warning appears with all versions I tested from gcc-4.7
> through gcc-6.1, and I could not come up with a way to rewrite the function
> in a more readable way that avoids the warning, so I'm adding another
> initialization to shut it up.
>
> Fixes: 8b32580df1cb ("net/mlx5e: Add TC vlan action for SRIOV offloads")
> Signed-off-by: Arnd Bergmann 

Yeah, this is clearly false positive and I was sure that newish GCCs
don't show that, but you are the master here.. so FWIW

Acked-by: Or Gerlitz

Re: [PATCH] net: rtnl: avoid uninitialized data in IFLA_VF_VLAN_LIST handling

2016-09-30 Thread Or Gerlitz

On Fri, Sep 30, 2016 at 7:13 PM, Arnd Bergmann  wrote:
> With the newly added support for IFLA_VF_VLAN_LIST netlink messages,
> we get a warning about potential uninitialized variable use in
> the parsing of the user input when enabling the -Wmaybe-uninitialized
> warning:
>
> net/core/rtnetlink.c: In function 'do_setvfinfo':
> net/core/rtnetlink.c:1756:9: error: 'ivvl$' may be used uninitialized in this 
> function [-Werror=maybe-uninitialized]
>
> I have not been able to prove whether it is possible to arrive in
> this code with an empty IFLA_VF_VLAN_LIST block, but if we do,
> then ndo_set_vf_vlan gets called with uninitialized arguments.
>
> This adds an explicit check for an empty list, making it obvious
> to the reader and the compiler that this cannot happen.
>
> Fixes: 79aab093a0b5 ("net: Update API for VF vlan protocol 802.1ad support")
Added the authors of the above patch

> Signed-off-by: Arnd Bergmann

Re: [patch] devlink: clean up a condition

2016-06-16 Thread Or Gerlitz


On 6/16/2016 9:50 AM, Dan Carpenter wrote:

Presumably having a _get() function implies that we also have a _set()
function but lets make it match when we're calling.

Signed-off-by: Dan Carpenter 

diff --git a/net/core/devlink.c b/net/core/devlink.c
index a4f88cb..b2e592a 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -1461,7 +1461,7 @@ static int devlink_nl_cmd_eswitch_mode_set_doit(struct 
sk_buff *skb,
  
  	mode = nla_get_u16(info->attrs[DEVLINK_ATTR_ESWITCH_MODE]);
  
-	if (ops && ops->eswitch_mode_get)

+   if (ops && ops->eswitch_mode_set)
return ops->eswitch_mode_set(devlink, mode);
return -EOPNOTSUPP;
  }



good catch Dan, we will incorporate that into the patch set

Re: [PATCH net-next v3 6/7] vmxnet3: introduce command to register memory region

2016-06-13 Thread Or Gerlitz

On Tue, Jun 14, 2016 at 4:50 AM, Shrikrishna Khare  wrote:

> +++ b/drivers/net/vmxnet3/vmxnet3_defs.h
> @@ -81,6 +81,7 @@ enum {
> VMXNET3_CMD_RESERVED2,
> VMXNET3_CMD_RESERVED3,
> VMXNET3_CMD_SET_COALESCE,
> +   VMXNET3_CMD_REGISTER_MEMREGS,
>
> VMXNET3_CMD_FIRST_GET = 0xF00D,
> VMXNET3_CMD_GET_QUEUE_STATUS = VMXNET3_CMD_FIRST_GET,
> @@ -668,6 +669,22 @@ struct Vmxnet3_CoalesceScheme {
> } coalPara;
>  };
>
> +struct Vmxnet3_MemoryRegion {
> +   __le64  startPA;
> +   __le32  length;
> +   __le16  txQueueBits;
> +   __le16  rxQueueBits;
> +};


What is the use case for this command?

What's the role of the tx/rx queue bits fields?

Re: [PATCH net-next v3 0/7] vmxnet3: upgrade to version 3

2016-06-13 Thread Or Gerlitz

On Tue, Jun 14, 2016 at 4:50 AM, Shrikrishna Khare  wrote:
> This patchset upgrades vmxnet3 to version 3.

Except for one patch, all the rest lack change-log, which makes it
somehow needlessly harder to review and maintain

> Shrikrishna Khare (7):
>   vmxnet3: prepare for version 3 changes
>   vmxnet3: introduce generic command interface to configure the device
>   vmxnet3: allow variable length transmit data ring buffer
>   vmxnet3: add receive data ring support
>   vmxnet3: add support for get_coalesce, set_coalesce ethtool operations
>   vmxnet3: introduce command to register memory region
>   vmxnet3: update to version 3

Re: [PATCH v5 09/21] IB/hns: Add hca support

2016-05-03 Thread Or Gerlitz

On Tue, May 3, 2016 at 10:57 AM, Wei Hu (Xavier)
 wrote:
> On 2016/4/30 12:33, Or Gerlitz wrote:


>> Can you elaborate what design aspects in the driver or anywhere else
>> should impose that limitation?

> 1.  Oulijun resolved the problem, and sent PATCH V6 on 2016-4-28. Thanks for
> more comments.

V6 still conditions things on ARM


> 2.  This driver for Hisilicon RoCE Engine run in ARM SoCs.
> The Hisilicon Network Subsystem is a long term evolution IP which is
> supposed to be used in Hisilicon ICT SoCs. HNS(Hisilicon Network Subsystem)
> has a hardware support of performing RDMA with RoCE engine.

I understand that the HW is targeted just for ARM environments. Is
this the reason
why you want to impose this build limitation or there's something in
the driver design
or code that is not going to work in other environments?

> This Kconfig related with this driver as below:
>
> config INFINIBAND_HISILICON_HNS
> tristate "Hisilicon Hns ROCE Driver"
> depends on NET_VENDOR_HISILICON
> depends on ARM64 && HNS && HNS_DSAF && HNS_ENET

this is understood, my question came to better understand the limitations

Re: [PATCH v5 09/21] IB/hns: Add hca support

2016-04-29 Thread Or Gerlitz

On Wed, Apr 27, 2016 at 6:34 AM, oulijun  wrote:
> On 2016/4/26 22:25, Jiri Pirko wrote:
>> Tue, Apr 26, 2016 at 04:18:21PM CEST, l...@kernel.org wrote:

 I appreciate your keen eye. this code is meant for ARM64bit therefore 
 should run corretly for 64-bit AARCH64.

>> The driver should run correctly on any arch.

>  Hi Jiri Pirko,
> Our driver run in ARM64 platform by depending on Kconfig. It will be 
> configure in Kconfig file.

Can you elaborate what design aspects in the driver or anywhere else
should impose that limitation?

Or.

Re: linux-next: manual merge of the rdma tree with the net-next tree

2016-03-23 Thread Or Gerlitz

On Wed, Mar 16, 2016 at 7:44 PM, Linus Torvalds
 wrote:
> On Wed, Mar 16, 2016 at 10:35 AM, Doug Ledford  wrote:
>> On 3/16/2016 1:18 PM, Linus Torvalds wrote:
>>> On Tue, Mar 15, 2016 at 5:58 PM, Stephen Rothwell  
>>> wrote:

>>>> I fixed it up (see below) and can carry the fix as necessary (no action
>>>> is required).

>>> Side note: can you change this wording for your manual merge script?
>>> Last merge window (or was it the one before it?) we had confusion with
>>> people who thought that "no action is required" means "you can just
>>> ignore this entirely".

>> I certainly didn't take it that way regardless of the wording.

> It was Or Gerlitz. You were cc'd, since it was the whole rdma Mellanox
> mess. I quote from that thread:
>
>  "> However, the fact that it got resolved in linux-next is purely
>   > informational. It doesn't "fix" the conflict - it just means that both
>   > sides should have gotten informed about it. That doesn't mean that the
>   > conflict goes away or becomes better.
>
>   That's news to me. When such things happen and caught by Stephen, we
>   are getting an email saying something like
>
>   "Today's linux-next merge of the infiniband tree got a conflict
>   between commit X from net-next tree and commit Y from the infiniband
>   tree. I fixed it up (see below) and can carry the fix as necessary (no
>   action is required)."
>
>   Also asked around a bit and got to learn on Stephen using git rerere,
>   so all (no action needed note + seeing git rerere in action...) that
>   leaded me to think that indeed no action is required from our side,
>   but after reading your email (twice, so far), I realized that this was
>   wrong conclusion."

> So that whole "no action is required" wording very much has caused
> confusion before in the rdma camp.

> Let's fix the wording. I'm indeed hopeful that the rdma camp is now
> keenly aware of the issues, but that doesn't change the fact that the
> wording has been problematic.

>> "[...] The Mellanox people are on my xxit-list until they show that they can
>> actually act like responsible people [...]"

Linus,

As I wrote you in that other thread you were quoting from there,
following to the happenings mentioned there, we took responsibility
and made bunch of corrective actions within Mellanox which got us to a
point where there was only one rdma/netdev-next conflict for the 4.6
merge window.

I know there's history here, and in the 4.5 cycle things were much
worse, but I still wanted to put things in their more precise place,
if you don't mind.

Or.

Re: [RFC PATCH kernel] Revert "net/mlx4_core: Set UAR page size to 4KB regardless of system page size"

2016-03-19 Thread Or Gerlitz

On Thu, Mar 17, 2016 at 3:40 AM, Alexey Kardashevskiy  wrote:
> On 03/16/2016 08:45 PM, Or Gerlitz wrote:
>> On Wed, Mar 16, 2016 at 10:34 AM, Alexey Kardashevskiy 
>> wrote:
>>
>>> Oh. ok. It also looks like even with the reverted patch, mlx4 VF does not
>>> work in a guest:
>>
>>
>> So where is the breakage point for you? does 4.4 works? if not, what?

> Ah, my bad. It is unrelated to the kernel version.
> I tried passing a PF to a guest while its VFs are already passed to another
> guest and see how exactly it blows up (AER/EEH were thrown but the host
> recovered => good) but this left the device in a weird state when I could
> not use VF in a guest anymore but it seemed to keep working on the host.

> It seems like the actual adapter does not reset completely when the machine
> is rebooted, I had unplug/replug power cables to fix this.

So to make sure, now things works fine with the patch reverted?

Re: [RFC PATCH kernel] Revert "net/mlx4_core: Set UAR page size to 4KB regardless of system page size"

2016-03-16 Thread Or Gerlitz

On Wed, Mar 16, 2016 at 10:34 AM, Alexey Kardashevskiy  wrote:

> Oh. ok. It also looks like even with the reverted patch, mlx4 VF does not
> work in a guest:

So where is the breakage point for you? does 4.4 works? if not, what?

Re: [RFC PATCH kernel] Revert "net/mlx4_core: Set UAR page size to 4KB regardless of system page size"

2016-03-15 Thread Or Gerlitz

On Tue, Mar 15, 2016 at 2:18 PM, Christoph Hellwig  wrote:
> On Tue, Mar 15, 2016 at 12:40:06PM +0200, Or Gerlitz wrote:
>> "[..] Regarding backward compatibility in SR-IOV, if hypervisor has
>> this new code, the virtual OS must be updated. [...]"

> Which is broken, we can't break user or guest VM ABIs ever.

Let us check. I was under (the maybe wrong) impression, that before this
patch both PF/VF drivers were not operative on some systems, so on those
systems it's fair to require the VF driver to be patched too.

Or.

Re: [RFC PATCH kernel] Revert "net/mlx4_core: Set UAR page size to 4KB regardless of system page size"

2016-03-15 Thread Or Gerlitz

On Tue, Mar 15, 2016 at 12:19 PM, Alexey Kardashevskiy  wrote:
> This reverts commit 85743f1eb34548ba4b056d2f184a3d107a3b8917.
>
> Without this revert, POWER "pseries" KVM guests with a VF passed to a guest
> using VFIO fail to bring the driver up:
>
> mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
> mlx4_core: Initializing :00:00.0
> mlx4_core :00:00.0: enabling device ( -> 0002)
> mlx4_core :00:00.0: Detected virtual function - running in slave mode
> mlx4_core :00:00.0: Sending reset
> mlx4_core :00:00.0: Sending vhcr0
> mlx4_core :00:00.0: HCA minimum page size:512
> mlx4_core :00:00.0: UAR size:4096 != kernel PAGE_SIZE of 65536
> mlx4_core :00:00.0: Failed to obtain slave caps

> Both host and guest use 64K system pages.
>
> How to fix this properly? Thanks.

The commit message says:

"[..] Regarding backward compatibility in SR-IOV, if hypervisor has
this new code, the virtual OS must be updated. [...]"


Or.

Re: [PATCH RESEND] infiniband:core:Fix error handling in the function cm_lap_handler

2016-02-23 Thread Or Gerlitz


On 2/22/2016 8:59 PM, Nicholas Krause wrote:

This fixes error handling in the function cm_lap_handler to properly
check if the internal call to the function cm_init_av_for_response
has failed by returning a error code and if so exit immediately from
this particular function by freeing all previously allocated
resources before returning the error code to this function's caller
in order to allow the caller to handle the error properly in it's

Signed-off-by: Nicholas Krause


I think this is the longest sentence I have ever read in English, and it 
even doesn't have a end... please break it and tell us the (hopefully) 
happy end.


Use IB/core: prefix and point to the commit you are fixing

Fixes: xxx ('yyy')

where xxx is the 12 digit abbrev-iated git short log  and yyy the commit 
title

Re: linux-next: manual merge of the rdma tree with the nfsd tree

2016-01-06 Thread Or Gerlitz

On 1/6/2016 5:20 PM, Chuck Lever wrote:

Chuck,
>
>Lets be concrete... anything wrong with patch [1]?

Yes. It is missing Acked-by: lines from the maintainers of
those files.

All changes to files under net/sunrpc need an Ack from one
of the maintainers listed in MAINTAINERS for that directory,
if the changes are going through another maintainer's tree.

I have been personally asked to remind folks that the
nfs-sunrpc maintainers do not read linux-rdma, so they
must be contacted directly (and cc: linux-nfs) as part of
proposing finished patches in that area.

I did that!!

I copied you and Anna on the patch [1].

Again, lets be concrete, this very small cleanup was picked and merged, 
anything there

need to be fixed?

Or.

[1] http://marc.info/?l=linux-rdma&m=145042924110411&w=2

Unfortunately I have not been able to review every patch
that has come by on linux-rdma in the past 9 months to
ensure the eyes are dotted and tees crossed. More than
a few commits in the tree are missing the proper tags.

>[1]  commit e3e45b1 "xprtrdma: Avoid calling ib_query_device"
>
>in git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git k.o/for-4.5
>
>http://git.kernel.org/cgit/linux/kernel/git/dledford/rdma.git/commit/?h=k.o/for-4.5&id=e3e45b1b43988b99007a9908ca0ba738b3fbd0ff
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: manual merge of the rdma tree with the nfsd tree

2016-01-06 Thread Or Gerlitz


On 1/6/2016 4:24 PM, Chuck Lever wrote:

Actually, one of Or's for-4.5 devattr patches doesn't appear to have the proper 
Ack's for the changes under net/sunrpc/xprtrdma either.


Chuck,

Lets be concrete... anything wrong with patch [1]?

Or.

[1]  commit e3e45b1 "xprtrdma: Avoid calling ib_query_device"

in git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git 
k.o/for-4.5


http://git.kernel.org/cgit/linux/kernel/git/dledford/rdma.git/commit/?h=k.o/for-4.5&id=e3e45b1b43988b99007a9908ca0ba738b3fbd0ff

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: manual merge of the rdma tree with the nfsd tree

2016-01-06 Thread Or Gerlitz


On 1/6/2016 2:15 PM, Christoph Hellwig wrote:

I'm not Doug, but all the recent for 4.5 work is in Dougs tree
at

https://github.com/dledford/linux  rdma/k.o/for-4.5


Christoph,

As I wrote here, the bits are already @ kernel.org

git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git k.o/for-4.5

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: manual merge of the rdma tree with the nfsd tree

2016-01-06 Thread Or Gerlitz


On 1/6/2016 2:01 PM, Chuck Lever wrote:

what is the URL for the branch I should rebase on?


k.o/for-4.5 on Doug's kernel.org tree
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: manual merge of the rdma tree with the net-next tree

2016-01-05 Thread Or Gerlitz


On 1/5/2016 3:51 AM, Stephen Rothwell wrote:

Hi Doug,

Today's linux-next merge of the rdma tree got conflicts in:

   drivers/net/ethernet/mellanox/mlx5/core/vport.c
   include/linux/mlx5/mlx5_ifc.h
   include/linux/mlx5/vport.h

between commits:

   e1d7d349c69d ("net/mlx5: Update access functions to Query/Modify vport MAC 
address")
   e75465148b7d ("net/mlx5: Introduce access functions to modify/query vport 
state")

from the net-next tree and commit:

   e5f6175c5b66 ("net/mlx5_core: Break down the vport mac address query 
function")

from the rdma tree and maybe some others.

I have no hope of fixing this stuff up, so I have dropped the rdma tree
again for today.  There is similar functionality being introduced in
both trees ... please sort this mess out ...


Hi Stephen,

Saeed/Matan and Co are working here on a fix which would solve the 
conflict.


We have the basic patch and people will be testing it later/tomorrow.

Could you please advice how would it be possible to provide you the
git rerere output -- the pre-image and post-images files from .git/rr-cache?

Or.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] net/mlx4_core: fix handling return value of mlx4_slave_convert_port

2015-12-15 Thread Or Gerlitz


On 12/14/2015 12:05 PM, Andrzej Hajda wrote:

The function can return negative values, so its result should
be assigned to signed variable.

The problem has been detected using proposed semantic patch
scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1].

[1]: http://permalink.gmane.org/gmane.linux.kernel/2046107



Please add here

Fixes: fc48866f7 ('net/mlx4: Adapt code for N-Port VF')



Signed-off-by: Andrzej Hajda 


otherwise, Looks good

Acked-by: Or Gerlitz 

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 7/9] IB/iser: Use a dedicated descriptor for login

2015-11-15 Thread Or Gerlitz


On 11/13/2015 3:46 PM, Christoph Hellwig wrote:

From: Sagi Grimberg

Makes better sense and we'll need it later with CQ abstraction.
iser switch login bufs to void


Sagi, few quick comments on this patch, please address for next version..

The 2nd sentence of the change-log needs better phrasing.

also multiple checkpatch hits on the patch, please fix

CHECK: Please don't use multiple blank lines
#26: FILE: drivers/infiniband/ulp/iser/iscsi_iser.h:329:

+

WARNING: __packed is preferred over __attribute__((packed))
#42: FILE: drivers/infiniband/ulp/iser/iscsi_iser.h:345:
+} __attribute__((packed));

CHECK: Please don't use multiple blank lines
#44: FILE: drivers/infiniband/ulp/iser/iscsi_iser.h:347:
+
+

CHECK: Alignment should match open parenthesis
#161: FILE: drivers/infiniband/ulp/iser/iser_initiator.c:209:
+   if (ib_dma_mapping_error(device->ib_device,
+   desc->req_dma))

CHECK: Alignment should match open parenthesis
#172: FILE: drivers/infiniband/ulp/iser/iser_initiator.c:220:
+   if (ib_dma_mapping_error(device->ib_device,
+   desc->rsp_dma))






--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 9/9] IB/iser: Convert to CQ abstraction

2015-11-15 Thread Or Gerlitz


On 11/13/2015 3:46 PM, Christoph Hellwig wrote:

From: Sagi Grimberg


Care to sparse some text here to assist a reviewer and future bisections?!

I have asked multiple times to avoid empty change-logs for patches in 
this driver.




Signed-off-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
---
  drivers/infiniband/ulp/iser/iscsi_iser.h |  68 ---
  drivers/infiniband/ulp/iser/iser_initiator.c | 142 ++-
  drivers/infiniband/ulp/iser/iser_memory.c|  21 ++-
  drivers/infiniband/ulp/iser/iser_verbs.c | 258 ++-
  4 files changed, 209 insertions(+), 280 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/9] move blk_iopoll to limit and make it generally available

2015-11-15 Thread Or Gerlitz

On Sun, Nov 15, 2015 at 10:48 AM, Sagi Grimberg
 wrote:
> Or is correct,
>
> I have attempted to convert iser to use blk_iopoll in the past, however
> I've seen inconsistent performance and latency skews (comparing to
> tasklets iser is using today). This was manifested in IOPs test cases
> where I ran multiple threads with higher queue-depth and not in
> sanitized pure latency (QD=1) test cases. Unfortunately I didn't have
> the time to pick it up since.
>
> I do have every intention of testing it again with this. If it still
> exist we will need to find the root-cause of it before converting
> drivers to use it.

Good, this way (inconsistent performance and latency skews) or another
(all shines up) -- please
let us know your findings, best through commenting within V > 0 the
cover letter posts of this series
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/9] move blk_iopoll to limit and make it generally available

2015-11-13 Thread Or Gerlitz

On Fri, Nov 13, 2015 at 3:46 PM, Christoph Hellwig  wrote:
> The new name is irq_poll as iopoll is already taken.  Better suggestions
> welcome.

Sagi (or Christoph if you can address that),

@ some pointer over the last 18 months there was a port done at
mellanox for iser to use blk-iopoll and AFAIR it didn't work well or
didn't work at all. Can you tell now what was the problem and how did
you address it at your generalization?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC Patch 00/12] IXGBE: Add live migration support for SRIOV NIC

2015-10-22 Thread Or Gerlitz

On Wed, Oct 21, 2015 at 10:20 PM, Alex Williamson
 wrote:

> This is why the typical VF agnostic approach here is to using bonding
> and fail over to a emulated device during migration, so performance
> suffers, but downtime is something acceptable.

bonding in the VM isn't a zero touch solution, right? is it really acceptable?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC Patch 00/12] IXGBE: Add live migration support for SRIOV NIC

2015-10-21 Thread Or Gerlitz

On Wed, Oct 21, 2015 at 7:37 PM, Lan Tianyu  wrote:
> This patchset is to propose a new solution to add live migration support
> for 82599 SRIOV network card.

> In our solution, we prefer to put all device specific operation into VF and
> PF driver and make code in the Qemu more general.

[...]

> Service down time test
> So far, we tested migration between two laptops with 82599 nic which
> are connected to a gigabit switch. Ping VF in the 0.001s interval
> during migration on the host of source side. It service down
> time is about 180ms.

So... what would you expect service down wise for the following
solution which is zero touch and I think should work for any VF
driver:

on host A: unplug the VM and conduct live migration to host B ala the
no-SRIOV case.

on host B:

when the VM "gets back to live", probe a VF there with the same assigned mac

next, udev on the VM will call the VF driver to create netdev instance

DHCP client would run to get the same IP address

+ under config directive (or from Qemu) send Gratuitous ARP to notify
the switch/es on the new location for that mac.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] fix return value error

2015-10-14 Thread Or Gerlitz

On 10/14/2015 2:59 PM, Leon Romanovsky wrote:

On Wed, Oct 14, 2015 at 11:17 AM, Heloise NH  wrote:

>Signed-off-by: Heloise NH

The patch is a correct one, however can you update the subject and
description to be more informative?
Please add that new_inode() function can fail for allocation only.

>---
>  drivers/infiniband/hw/ipath/ipath_fs.c | 2 +-
>  drivers/infiniband/hw/qib/qib_fs.c | 2 +-

NO... the ipath driver has moved to staging as part of its EOL process, 
it's now under drivers/staging/rdma/ipath

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RFC] IB/mad: remove obsolete snoop interface

2015-10-04 Thread Or Gerlitz

On Wed, Sep 30, 2015 at 9:01 AM,   wrote:

> This interface has no current users and is obsolete.

mmm, how does Sean's madeye util is working?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] staging: rdma: Fix braces around if/else

2015-10-04 Thread Or Gerlitz


On 10/3/2015 11:55 PM, Martin Kletzander wrote:

Get rid of all ELSE_AFTER_BRACE type errors reported by checkpatch.pl.


Hi Greg,

Is there a way to signal people/tools that a certain driver parks in 
staging on their way **out** of the kernel
and not the other way around? I guess you (nor Doug) don't want to spend 
time on fixing such drivers, right?


Or.



Signed-off-by: Martin Kletzander 
---
There is one warning reported in this patch though.  That's because of
the multiline string and it's pre-existing.  Feel free to let me know
if that should be fixed too, I'd also remove the pointless '#' then.
On the other hand, it'll create more-than-80-columns long line.

  drivers/staging/rdma/ipath/ipath_driver.c| 19 ---
  drivers/staging/rdma/ipath/ipath_file_ops.c  | 12 ++--
  drivers/staging/rdma/ipath/ipath_iba6110.c   |  7 +++
  drivers/staging/rdma/ipath/ipath_init_chip.c | 10 +-
  drivers/staging/rdma/ipath/ipath_intr.c  |  7 +++
  drivers/staging/rdma/ipath/ipath_sysfs.c |  7 +++
  drivers/staging/rdma/ipath/ipath_verbs.c |  4 ++--
  7 files changed, 30 insertions(+), 36 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] infiniband:mlx4:Fix assumation that ib_get_cached_pkey runs successfully in build_mlx_header

2015-09-09 Thread Or Gerlitz


On 9/10/2015 2:04 AM, Nicholas Krause wrote:

This fixes a incorrect assumation that ib_get_cached_pkey always runs
successfully in the function build_mlx_header by checking if the calls
to this particular function return the error code, -EINVAL in order to
signal they failed to grap the public key for the device structure pointer
passed to this function and if so return immediately to the caller of
mlx_header with the error code -EINVAL to signal a error has occurred
when calling this particular function that the caller must handle in
its own intended error paths.


I am totally breathless, put 2-3 periods somewhere along this crazingly 
long sentence.


Does this fixes some specific commit? if yes, please add Fixes: tag @ 
before your S.O.B linewith the offending commit short ID (--abbrev=12)


run speller (e.g assumation  --> assumption), use IB/mlx4: prefix, add 
space between the ":" to the 1st word of the title, e.g:


IB/mlx4: Fix assumption that ib_get_cached_pkey runs successfully in 
build_mlx_header


Or.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC/RFT PATCH v3 0/4] KVM: x86: full virtualization of guest MTRR

2015-07-29 Thread Or Gerlitz


On 7/8/2015 6:18 PM, Paolo Bonzini wrote:

This part of the MTRR patches was dropped by Xiao.  Bring SVM on feature
parity with VMX, and then do guest MTRR virtualization for both VMX and SVM.

The IPAT bit of VMX extended page tables is emulated by mangling the guest
PAT value.

I do not have any AMD machines that support an IOMMU, so I would like
some help testing these patches.  Thanks,




Hi Paolo,

We (finally) have results showing that the patches work well and provide 
benefit.


For getting better latency with ConnectX RDMA devices, we write send 
descriptors
to a write-combining (WC) mapped buffer instead of ringing a doorbell 
and having
the HW fetch the descriptor from system memory. In the mlx4 jargon, this 
optimization

is called Blue-Flame (BF).

To test the patches, we booted two hosts with 4.2-rc, patched with
this series, over which a legacy (RHEL 6.x) guests arerunning.

Under SRIOV, the mlx4 VF driver queries the mlx4 PF driver on the
host if BF is availablefor them to use.

We did two runs:

In [1] the guest VF driver was told by the host PF driver
that BF isn't supportedand hence they didn't use WC.

In [2] the VFs were told by the host PF driver that BF is supported
and hence used WC.

The results are provided in micro-seconds and account for half RTT of
nativeRDMA latency test, the WC advantage is notable, so +1for this series!

Or.


[1] guests not-using Blue-Flame / Write-Combining

root@host-194-168-80-68 ~]# ib_send_lat -a
 #bytes #iterationst_min[usec]t_max[usec] t_typical[usec]
 2   1000  1.13   11.531.16
 4   1000  1.13   6.33 1.16
 8   1000  1.13   5.17 1.17
 16  1000  1.14   4.37 1.17
 32  1000  1.15   5.01 1.18
 64  1000  1.19   7.96 1.22
 128 1000  1.28   5.44 1.31
 256 1000  1.62   6.90 1.65
 512 1000  1.78   5.65 1.82


[2] guests using Blue-Flame when the host allows Write-Combining mapping 
by VMs


root@host-194-168-80-68 ~]# ib_send_lat -a
#bytes #iterationst_min[usec]t_max[usec] t_typical[usec]
 2   1000  0.86   16.970.89
 4   1000  0.87   4.81 0.90
 8   1000  0.87   4.89 0.90
 16  1000  0.87   6.46 0.90
 32  1000  0.88   4.22 0.91
 64  1000  0.94   4.03 0.97
 128 1000  1.03   6.50 1.06
 256 1000  1.36   7.71 1.39
 512 1000  1.49   5.92 1.52



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] mellanox IB driver fails to load on large config

2015-07-15 Thread Or Gerlitz


On 7/14/2015 11:28 PM, Alex Thorlton wrote:


We see the same exact messages on 4.1-rc8.




does this solves the problem?


diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index ad31e47..c8ae3b9 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -45,7 +45,7 @@
 #include 

 #define MAX_MSIX_P_PORT17
-#define MAX_MSIX   64
+#define MAX_MSIX   1024
 #define MIN_MSIX_P_PORT5
 #define MLX4_IS_LEGACY_EQ_MODE(dev_cap) ((dev_cap).num_comp_vectors < \
(dev_cap).num_ports * MIN_MSIX_P_PORT)
--

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] mellanox IB driver fails to load on large config

2015-07-14 Thread Or Gerlitz

On Tue, Jul 14, 2015 at 9:48 PM, Alex Thorlton  wrote:
> On Tue, Jul 14, 2015 at 01:22:34PM -0500, andrew banman wrote:
>> On Sat, Jul 11, 2015 at 11:20:19PM +0300, Or Gerlitz wrote:
>> > On Fri, Jul 10, 2015 at 10:15 PM, andrew banman  wrote:
>> > > I'm seeing a large number of allocation errors originating from the 
>> > > Mellanox IB
>> > > driver when booting the 4.2-rc1 kernel on a 4096cpu 32TB memory system:
>> >
>> > Just to make sure, mlx4 works fine on this small (...) system with 4.1
>> > and 4.2-rc1 breaks, or 4.2-rc1 is the 1st time you're trying that
>> > config?
>>
>> I'll let Alex comment on that, he did some testing on that.
>
> I started seeing this on a 4.1-rc8 kernel, so it's been around for a
> little while.  It may have been around before 4.1-rc8, but I haven't run
> any kernels older than that on the big machine for some time.

To make sure I am correctly following, on 4.1-rc8  you also see
something, right? are these the same messages or different ones? if
the latter send to us.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] mellanox IB driver fails to load on large config

2015-07-11 Thread Or Gerlitz

On Fri, Jul 10, 2015 at 10:15 PM, andrew banman  wrote:
> I'm seeing a large number of allocation errors originating from the Mellanox 
> IB
> driver when booting the 4.2-rc1 kernel on a 4096cpu 32TB memory system:

Just to make sure, mlx4 works fine on this small (...) system with 4.1
and 4.2-rc1 breaks, or 4.2-rc1 is the 1st time you're trying that
config?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: mlx4: "failed to allocate default counter port 1"

2015-06-30 Thread Or Gerlitz

On Tue, Jun 30, 2015 at 1:45 PM, Sebastian Ott
 wrote:
> after the latest mellanox update the mlx4 driver fails to probe a VF:
> [   88.909562] mlx4_core :00:00.0: mlx4_allocate_default_counters: failed 
> to allocate default counter port 1 err -22
> [   88.909564] mlx4_core :00:00.0: Failed to allocate default counters, 
> aborting
> [   88.961735] mlx4_core: probe of :00:00.0 failed with error -22
>
> PFs still work. See below for more dmesg output - I also added a line of
> debug output...maybe this helps.

Can you please send your "lspci | grep nox" listing? also what
Firmware version you have there? e.g when you probe the PF with
mlx4_core debug_level=1 can you sens us the lines that follow the PF
probe, e.g as here + dump of all caps as you did for the VF

  952.367911] mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
[  952.374606] mlx4_core: Initializing :06:00.0
[  953.384332] mlx4_core :06:00.0: FW version 2.34.5000 (cmd intf
rev 3), max commands 16
[...]

Also send us the output of "dmesg | grep -i counter" after such verbose load.

thanks,

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [GIT] Networking

2015-06-25 Thread Or Gerlitz

On Thu, Jun 25, 2015 at 2:38 AM, Linus Torvalds
 wrote:
>
> On Wed, Jun 24, 2015 at 6:39 AM, David Miller  wrote:
> >
> >   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
>
> Just going through the conflicts, I see commit 7193a141eb74 ("IB/mlx4:
> Set VF to read from QP counters"), and wonder...
>
> Is that code really supposed to fall through to the
> infiniband-over-ethernet case when the link layer is
> IB_LINK_LAYER_INFINIBAND but it's a slave?
>
> The commit message is not in the least helpful.
>

And this is a bug indeed.

Under IB links, we should use the the infiniband-over-ethernet flow only for one
specific case (reading link performance counters by SRIOV VFs) and nothing else.

I sent a fix, https://patchwork.kernel.org/patch/6675921/

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RFC v2] Documentation/infiniband: Add docs for rdma-helpers

2015-05-18 Thread Or Gerlitz

On Mon, May 18, 2015 at 11:41 AM, Michael Wang
 wrote:
> Since v1:
>   * Merge the descriptions from Doug:
> http://www.spinics.net/lists/linux-rdma/msg25172.html
>
> This is the following patch for:
>   https://lkml.org/lkml/2015/5/5/417
> which try to document the settled rdma_cap_XX().
>
> Highlights:
>   There could be many missing/mistakes/misunderstanding, please don't
>   be hesitate to point out the issues, any suggestions to improve or
>   complete the description are very welcomed ;-)

Michael, none of what you wrote above belongs to the change-log which
is going to stay for-ever in the upstream git repo. You should put it
all belong the --- line after your S.O.B and add proper change log
telling what this patch is about

>
> Signed-off-by: Michael Wang 
> ---
>  Documentation/infiniband/rdma_helpers.txt | 79 
> +++
>  1 file changed, 79 insertions(+)
>  create mode 100644 Documentation/infiniband/rdma_helpers.txt
>
> diff --git a/Documentation/infiniband/rdma_helpers.txt 
> b/Documentation/infiniband/rdma_helpers.txt
> new file mode 100644
> index 000..be9416d
> --- /dev/null
> +++ b/Documentation/infiniband/rdma_helpers.txt
> @@ -0,0 +1,79 @@
> +RDMA HELPERS
> +
> +  The following helpers are used to check the specific capabilities of a
> +  particular port before utilizing those capabilities.
> +
> +rdma_cap_ib_mad- Infiniband Management Datagrams.
> +rdma_cap_ib_smi- Infiniband Subnet Management Interface.
> +rdma_cap_ib_cm - Infiniband Communication Manager.
> +rdma_cap_iw_cm - IWARP Communication Manager.
> +rdma_cap_ib_sa - Infiniband Subnet Administration.
> +rdma_cap_ib_mcast  - Infiniband Multicast Join/Leave Protocol.
> +rdma_cap_read_multi_sge- RDMA Read Work Request Support Multiple SGE.
> +rdma_cap_af_ib - Native Infiniband Address.
> +rdma_cap_eth_ah- InfiniBand Transport With Ethernet Address.
> +
> +USAGE
> +
> +  if (rdma_cap_XX(device, i)) {
> +   /* The port i of device support XX */
> +   ...
> +  } else {
> +   /* The port i of device don't support XX */
> +   ...
> +  }
> +
> +  rdma_cap_ib_mad
> +  ---
> +Management Datagrams (MAD) are a required part of the InfiniBand
> +specification and are supported on all InfiniBand devices.  A slightly
> +extended version are also supported on OPA interfaces.
> +
> +  rdma_cap_ib_smi
> +  ---
> +Subnet Management Interface (SMI) will handle SMP packet from SM
> +in an infiniband fabric.
> +
> +  rdma_cap_ib_cm
> +  ---
> +Communication Manager (CM) service, used to ease the process of 
> connecting
> +to a remote host.  The IB-CM can be used to connect to remote hosts using
> +either InfiniBand or RoCE connections, iWARP has its own CM.
> +
> +  rdma_cap_iw_cm
> +  ---
> +iWARP Communication Manager (CM), Similar to the IB-CM, but only used on
> +iWARP devices.
> +
> +  rdma_cap_ib_sa
> +  ---
> +Subnet Administration (SA) is the database built by SM in an
> +infiniband fabric.
> +
> +  rdma_cap_ib_mcast
> +  ---
> +InfiniBand (and OPA) use a different multicast mechanism rather than
> +traditional IP multicast found on Ethernet devices.  If this is true, 
> then
> +traditional IPv4/IPv6 multicast is handled by the IPoIB layer and direct
> +multicast joins and leaves are handled per the InfiniBand specifications.
> +
> +  rdma_cap_read_multi_sge
> +  ---
> +Certain devices (iWARP in particular) have restrictions on the number of
> +scatter gather elements that can be present in an RDMA READ work request,
> +this is true if the device does not have that restriction.
> +
> +  rdma_cap_af_ib
> +  ---
> +Many code paths for traditional InfiniBand and RoCE links are the same,
> +but need minor differences to accommodate the different addresses on the
> +two types of connections.  This helper is true when the address of the
> +specific connection is of the InfiniBand native variety.
> +
> +  rdma_cap_eth_ah
> +  ---
> +Queue Pair is InfiniBand transport, but uses Ethernet address instead
> +of native InfiniBand address (aka, this is a RoCE QP, and that means
> +ethertype 0x8915 + GRH for RoCEv1 and IP/UDP to well known UDP port for
> +RoCEv2), this is true when the address family of the specific queue pair
> +is of the Ethernet (RoCE) variety.
> --
> 2.1.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info

Re: [PATCH v8 00/23] IB/Verbs: IB Management Helpers

2015-05-05 Thread Or Gerlitz


On 5/5/2015 3:50 PM, Michael Wang wrote:

Since v7:
   * Thanks to Doug, Ira, Devesh for the testing:-)
   * Thanks for the comments from or, Doug, Ira, Jason:-)
 Please remind me if anything missed:-P
   * Use rdma_cap_XX() instead of cap_XX() for readability
   * Remove CC list in git log for maintainability
   * Use bool as return value
   * Updated github repository to v8


Didn't you see that patches 15~23 will be squashed into one?


Also, when you post version N you need not only to list the changes 
since version N-1 but rather also to keep the full changes since Vx for 
x=1...N-2, reviewers needs not chase your previous cover letters and see 
if/whatwent wrong, specifically with a sensitive series like this one.


So if/when there's V9, use that practice, and for the time being, reply 
on the V8 cover letter with the full listing of changes


Or.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v7 04/23] IB/Verbs: Reform IB-core cm

2015-04-29 Thread Or Gerlitz

On 4/29/2015 10:40 AM, Michael Wang wrote:

On 04/28/2015 09:02 PM, Or Gerlitz wrote:

>On Tue, Apr 28, 2015 at 6:10 PM, Michael Wang  
wrote:

>>Use raw management helpers to reform IB-core cm.
>>
>>Cc: Hal Rosenstock
>>Cc: Steve Wise
>>Cc: Tom Talpey
>>Cc: Jason Gunthorpe
>>Cc: Doug Ledford
>>Cc: Ira Weiny
>>Cc: Sean Hefty
>>Signed-off-by: Michael Wang
>>---
>>  drivers/infiniband/core/cm.c | 20 +---
>>  1 file changed, 17 insertions(+), 3 deletions(-)

>
>Hi Michael,
>
>I don't really see the benefit (e.g for someone doing bisection
>1/2/5/10 years from now and landing here) of listing all the group of
>reviewers for each of the ~30 patches that make this series, any
>special reason that caused you doing so?

Those on the CC list are used to help correct some problems or
contributed to the definition/plan:-)  They are familiar with
the whole story of this patch set.

As you mentioned, few years later when someone bisect out the patches
and want to learn why it's like that, he could have enough address to
send his question, although few of them may not work on the same aspect
anymore, but the chance to find someone have the story is higher.

The kernel development model works well for both driver and core 
patches, w.o adding 7-10

people as CC to patches.

I think the CC list is not that big for a patch set covered such a wide
range, isn't it:-P

Maybe it's a matter of taste, but for me it look way way too big. If you 
really want to have such
a huge listing, do it in the early patches of the series where you 
introduce the new concepts, and later,on downstream patches, when you 
use it, put one person if they happen to be the author or maintainthat 
area (e.g Sean <-- CM/CMA, Doug/Erez <-- IPoIB Ira, Hal <-- MAD, Steve 
<-- IW_CM, etc)

Or.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v7 04/23] IB/Verbs: Reform IB-core cm

2015-04-29 Thread Or Gerlitz

On Wed, Apr 29, 2015 at 10:40 AM, Michael Wang
 wrote:
> Hi, Or
>
> On 04/28/2015 09:02 PM, Or Gerlitz wrote:
>> On Tue, Apr 28, 2015 at 6:10 PM, Michael Wang  
>> wrote:
>>> Use raw management helpers to reform IB-core cm.
>>>
>>> Cc: Hal Rosenstock 
>>> Cc: Steve Wise 
>>> Cc: Tom Talpey 
>>> Cc: Jason Gunthorpe 
>>> Cc: Doug Ledford 
>>> Cc: Ira Weiny 
>>> Cc: Sean Hefty 
>>> Signed-off-by: Michael Wang 
>>> ---
>>>  drivers/infiniband/core/cm.c | 20 +---
>>>  1 file changed, 17 insertions(+), 3 deletions(-)
>>
>> Hi Michael,
>>
>> I don't really see the benefit (e.g for someone doing bisection
>> 1/2/5/10 years from now and landing here) of listing all the group of
>> reviewers for each of the ~30 patches that make this series, any
>> special reason that caused you doing so?
>
> Those on the CC list are used to help correct some problems or
> contributed to the definition/plan :-) They are familiar with
> the whole story of this patch set.
>
> As you mentioned, few years later when someone bisect out the patches
> and want to learn why it's like that, he could have enough address to
> send his question, although few of them may not work on the same aspect
> anymore, but the chance to find someone have the story is higher.

The kernel development model works well for both driver and core
patches, w.o adding 7-10
people as CC to patches.

>
> I think the CC list is not that big for a patch set covered such a wide
> range, isn't it :-P

Maybe it's a matter of taste, but for me it look way way too big. If
you really want to have such
a huge listing, do it in the early patches of the series where you
introduce the new concepts, and later,on downstream patches, when you
use it, put one person if they happen to be the author or maintainthat
area (e.g Sean <-- CM/CMA, Doug/Erez <-- IPoIB Ira, Hal <-- MAD, Steve
<-- IW_CM, etc)

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v6 01/26] IB/Verbs: Implement new callback query_transport()

2015-04-28 Thread Or Gerlitz

On Tue, Apr 28, 2015 at 9:56 PM, Jason Gunthorpe
 wrote:
> On Mon, Apr 27, 2015 at 09:24:35PM -0400, Doug Ledford wrote:
>> On Mon, 2015-04-27 at 17:53 -0700, Tom Talpey wrote:
>
>> Having some of it refer to things as IBOE and some as ROCE would be
>> similarly confusing, and switching existing IBOE usage to ROCE would
>> cause pain to people with out of tree drivers (Lustre is the main one I
>> know of).  There's not a good answer here.  There's only less sucky
>> ones.
>
> The tide has already turned, we should ditch iboe:
>
> $git grep -i roce_ drivers/infiniband/ | wc -l
> 91
> $git grep -i iboe_ drivers/infiniband/ | wc -l
> 37
>
> It isn't really mainline's role to be too concerned about out of tree
> things like Lustre.

FWIW, note that Lustre is under staging for a while, not sure how
close they are for actual acceptance.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v7 04/23] IB/Verbs: Reform IB-core cm

2015-04-28 Thread Or Gerlitz

On Tue, Apr 28, 2015 at 6:10 PM, Michael Wang  wrote:
> Use raw management helpers to reform IB-core cm.
>
> Cc: Hal Rosenstock 
> Cc: Steve Wise 
> Cc: Tom Talpey 
> Cc: Jason Gunthorpe 
> Cc: Doug Ledford 
> Cc: Ira Weiny 
> Cc: Sean Hefty 
> Signed-off-by: Michael Wang 
> ---
>  drivers/infiniband/core/cm.c | 20 +---
>  1 file changed, 17 insertions(+), 3 deletions(-)

Hi Michael,

I don't really see the benefit (e.g for someone doing bisection
1/2/5/10 years from now and landing here) of listing all the group of
reviewers for each of the ~30 patches that make this series, any
special reason that caused you doing so?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mlx4: Fix tx ring affinity_mask creation

2015-04-28 Thread Or Gerlitz

On Fri, Apr 10, 2015 at 7:27 PM, Benjamin Poirier  wrote:
> By default, the number of tx queues is limited by the number of online cpus in
> mlx4_en_get_profile(). However, this limit no longer holds after the ethtool
> .set_channels method has been called. In that situation, the driver may access
> invalid bits of certain cpumask variables when queue_index > nr_cpu_ids.
>

Hi Benjamin,

Can this fix be related to a specific commit? if yes, would be good if
you can add here a Fixes: line so it would be easier to spot down to
which stable kernels the fix should go.

Or.

> Signed-off-by: Benjamin Poirier 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 25/28] IB/Verbs: Use management helper cap_af_ib()

2015-04-16 Thread Or Gerlitz

On Mon, Apr 13, 2015 at 3:35 PM, Michael Wang  wrote:
>
> Introduce helper cap_af_ib() to help us check if the port of an
> IB device support Native Infiniband Address.
>
> Cc: Steve Wise 
> Cc: Tom Talpey 
> Cc: Jason Gunthorpe 
> Cc: Doug Ledford 
> Cc: Ira Weiny 
> Cc: Sean Hefty 
> Signed-off-by: Michael Wang 
> ---
>  drivers/infiniband/core/cma.c |  2 +-
>  include/rdma/ib_verbs.h   | 15 +++
>  2 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index 65e41f4..7f5815d 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -470,7 +470,7 @@ static int cma_resolve_ib_dev(struct rdma_id_private 
> *id_priv)
>
> list_for_each_entry(cur_dev, &dev_list, list) {
> for (p = 1; p <= cur_dev->device->phys_port_cnt; ++p) {
> -   if (!rdma_ib_or_iboe(cur_dev->device, p))
> +   if (!cap_af_ib(cur_dev->device, p))
> continue;
>
> if (ib_find_cached_pkey(cur_dev->device, p, pkey, 
> &index))
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 29ddd14..dfe33f3 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -1879,6 +1879,21 @@ static inline int cap_ipoib(struct ib_device *device, 
> u8 port_num)
>  }
>
>  /**
> + * cap_af_ib - Check if the port of device has the capability
> + * Native Infiniband Address.
> + *
> + * @device: Device to be checked
> + * @port_num: Port number of the device
> + *
> + * Return 0 when port of the device don't support
> + * Native Infiniband Address.
> + */
> +static inline int cap_af_ib(struct ib_device *device, u8 port_num)
> +{
> +   return rdma_ib_or_iboe(device, port_num);
> +}

Sean, can you please put a precise writeup what does it take to
support AF_IB... I am a bit
confused here and wasn't sure if this can be supported with RoCE.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH linux-next v3] mlx5: wrong page mask if CONFIG_ARCH_DMA_ADDR_T_64BIT enabled for 32Bit architectures

2015-04-14 Thread Or Gerlitz


On 4/15/2015 6:19 AM, Honggang Li wrote:

Fix(bf0bf77 mlx5: Support communicating arbitrary host page size to firmware)


This isn't the way to write the fix note, do it that way

Fixes: bf0bf77f6519 ('mlx5: Support communicating arbitrary host page 
size to firmware')




Signed-off-by: Honggang Li


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH linux-next 1/4] infiniband/ipoib: fix possible NULL pointer dereference in ipoib_get_iflink

2015-04-14 Thread Or Gerlitz


On 4/14/2015 11:41 PM, Jason Gunthorpe wrote:


Erez, you basically rewrote this, please make a proper patch with the Fixes and 
Reported-By credit for Honggang. Lets merge this through Dave M's tree right 
away.


Agree, Erez, add proper Fixes: XXX note and send a patch to netdev 
against net-next. No need for the lengthy crash dump there. Nicolas, 
next time you patch IPoIB, please cc linux-rdma.

Or.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 2/3] if_link: Add VF multicast promiscuous control

2015-03-08 Thread Or Gerlitz

On Mon, Feb 23, 2015 at 11:14 PM, Jeff Kirsher
 wrote:
[...]
> We discussed this during NetConf last week, and Don is correct that a
> custom sysfs interface is not the way we want to handle this.  We agreed
> upon a generic interface so that any NIC is able to turn on or off VF
> multicast promiscuous mode.

Jeff, please make sure to either respond to my comments on the V2
thread (or better) address them for the V3 post.


http://marc.info/?l=linux-netdev&m=142441852518152&w=2
http://marc.info/?l=linux-netdev&m=142441867218183&w=2

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 3/3] ixgbe: Add new ndo to allow VF multicast promiscuous mode

2015-02-19 Thread Or Gerlitz

On Fri, Feb 20, 2015 at 3:01 AM, Hiroshi Shimamoto
 wrote:

> The administrator can allow to VF multicast promiscuous mode for only
> trusted VM.
>  # ./ip link set dev eth0 vf 1 mc_promisc on
> When disallowing multicast promiscuous mode, we can only use 30 IPv6 
> addresses.
>  # ./ip link set dev eth0 vf 1 mc_promisc off

all the above text belongs to the change-log of the patch that
introduces the new ndo also use just "ip" not "./ip" and specify that
the default is off.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: build failure after merge of the infiniband tree

2014-12-16 Thread Or Gerlitz


On 12/16/2014 3:56 AM, Roland Dreier wrote:

On Mon, Dec 15, 2014 at 5:47 PM, Stephen Rothwell  wrote:

Hi all,

After merging the infiniband tree, today's linux-next build (x86_64
allmodconfig) failed like this:

drivers/infiniband/hw/mlx5/main.c: In function 'mlx5_ib_query_device':
drivers/infiniband/hw/mlx5/main.c:248:34: error: 'MLX5_DEV_CAP_FLAG_ON_DMND_PG' 
undeclared (first use in this function)
   if (dev->mdev->caps.gen.flags & MLX5_DEV_CAP_FLAG_ON_DMND_PG)
   ^
[...]
Really?  Code added half way though the merge window not even build
tested?

It's not quite as bad as it seems.  The infiniband tree itself builds,
the problem is the merged tree.

The Mellanox guys merged the "cleanup"


Hi Roland,

So shit happens... Eli is away this week, but it's clear that this 
portion of the cleanup
was terribly wrongand done by mistake, sorry for that and thanks for 
addressing quickly.


Or.



commit 0c7aac854f52
Author: Eli Cohen 
Date:   Tue Dec 2 02:26:14 2014

 net/mlx5_core: Remove unused dev cap enum fields

 These enumerations are not used so remove them.

 Signed-off-by: Eli Cohen 
 Signed-off-by: David S. Miller 

through davem's tree, and then went ahead and used at least
MLX5_DEV_CAP_FLAG_ON_DMND_PG (which that patch removes) in patches
they merged through my tree.

I'll add a partial revert of that patch to my tree to get back the
now-used enum values.

  - R.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: mlx5: don't duplicate kvfree()

2014-11-20 Thread Or Gerlitz


On 11/20/2014 10:17 AM, Al Viro wrote:

On Thu, Nov 20, 2014 at 08:13:57AM +, Al Viro wrote:

>  9 files changed, 21 insertions(+), 35 deletions(-)

grr... 8 files changed, actually - that was from the diff that included mlx4
bits.  Patch split correctly and sent in two pieces, summary left as is ;-/
Sorry about the confusion it might cause...

Al,

I think the best way to proceed here is to come up with mlx4 and mlx5 
patches and get both of them in through netdev / net-nexttree, don't 
worry about each of these drivers have a part which is also under 
drivers/infiniband/hw -- it doesn't justify splitting in this case


Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCHv2 net 3/4] net/mlx4_en: Implement ndo_gso_check()

2014-11-16 Thread Or Gerlitz

On Fri, Nov 14, 2014 at 2:38 AM, Joe Stringer  wrote:
> Use vxlan_gso_check() to advertise offload support for this NIC.
>
> Signed-off-by: Joe Stringer 

FWIW (since the patches is applied already...)

Acked-by: Or Gerlitz 

thanks for addressing this piece.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH net 0/5] Implement ndo_gso_check() for vxlan nics

2014-11-06 Thread Or Gerlitz

On Thu, Nov 6, 2014 at 4:44 AM, Tom Herbert  wrote:
> On Wed, Nov 5, 2014 at 6:15 PM, David Miller  wrote:
>> From: Joe Stringer 
>> Date: Wed, 5 Nov 2014 17:06:46 -0800
>>
>>> My impression was that the changes are more likely to be
>>> hardware-specific (like the i40e changes) rather than software-specific,
>>> like changes that might be integrated into the helper.
>>
>> I think there is more commonality amongst hardware capabilities,
>> and this is why I want the helper to play itself out.
>>
>>> That said, I can rework for one helper. The way I see it would be the
>>> same code as these patches, as "vxlan_gso_check(struct sk_buff *)" in
>>> drivers/net/vxlan.c which would be called from each driver. Is that what
>>> you had in mind?
>>
>> Yes.
>
> Note that this code is not VXLAN specific, it will also accept NVGRE
> and GRE/UDP with keyid and TEB. I imagine all these cases should be
> indistinguishable to the hardware so they probably just work (which
> would be cool!). It might be better to name and locate the helper
> function to reflect that.

Unlike the VXLAN case, currently there's no indication from the
network stack towards the driver that an NVGRE tunnel was set, so in
our case we're not arming the HW offloads for NVGRE. I'll look into
that, maybe we can make them work always. Also re the math there to be
the same for VXLAN/NVGRE  -- skb_inner_mac_header(skb) -
skb_transport_header(skb) is exactly 8 (sizeof struct gre_base_hdr),
isn't that?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH net 0/5] Implement ndo_gso_check() for vxlan nics

2014-11-05 Thread Or Gerlitz

On Wed, Nov 5, 2014 at 8:00 PM, Tom Herbert  wrote:
> On Wed, Nov 5, 2014 at 9:50 AM, Joe Stringer  wrote:
>>
>> On 5 November 2014 04:38, Or Gerlitz  wrote:
>>>
>>> On Tue, Nov 4, 2014 at 11:56 PM, Joe Stringer  
>>> wrote:
>>> > Most NICs that report NETIF_F_GSO_UDP_TUNNEL support VXLAN, and not other
>>> > UDP-based encapsulation protocols where the format and size of the header 
>>> > may
>>> > differ. This patch series implements ndo_gso_check() for these NICs,
>>> > restricting the GSO handling to something that looks and smells like 
>>> > VXLAN.
>>> >
>>> > Implementation shamelessly stolen from Tom Herbert (with minor fixups):
>>> > http://thread.gmane.org/gmane.linux.network/332428/focus=333111
>>>
>>>
>>> Hi Joe,
>>>
>>> 1st, thanks for picking this task...2nd, for drivers that currently
>>> support only pure VXLAN, I don't see the point
>>> to replicate the helper suggested by Tom (good catch on the size check
>>> to be 16 and not 12) four times and who know how more in the future.
>>> Let's just have one generic helper and make the mlx4/be/fm10k/benet
>>> drivers to have it as their ndo, OK?
>>
>>
>> Thanks for taking a look.
>>
>> I had debated whether to do this or not as the actual support on each NIC 
>> may differ, and each implementation may morph over time to match these 
>> capabilities better. Obviously the vendors will know better than me on this, 
>> so I'm posing this series to prod them for more information. At this point 
>> I've had just one maintainer come back and confirm that this helper is a 
>> good fit for their hardware, so I'd like to confirm that multiple drivers 
>> will use a ndo_gso_check_vxlan_helper() function before I go and create it.
>
>
> Thanks for implementing this fix!
>
> Personally, I would rather not have the helper. This is already a
> small number of drivers, and each driver owner should consider what
> limitations are of their device and try to enable to allow the maximum
> number of use cases possible. I'm also hoping that new devices will
> implement the more generic mechanism so that VXLAN is just one
> supported protocol.

but fact is that the proposed patch series has the --same-- helper for
four drivers, so why not start with a that limited helper which would
be picked up by these drivers and we'll take it from there.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [linux-nics] [PATCH net 3/5] fm10k: Implement ndo_gso_check()

2014-11-05 Thread Or Gerlitz

On Wed, Nov 5, 2014 at 2:34 PM, Jeff Kirsher
 wrote:
> On Tue, 2014-11-04 at 13:56 -0800, Joe Stringer wrote:
>> ndo_gso_check() was recently introduced to allow NICs to report the
>> offloading support that they have on a per-skb basis. Add an
>> implementation for this driver which checks for something that looks
>> like VXLAN.
>>
>> Implementation shamelessly stolen from Tom Herbert:
>> http://thread.gmane.org/gmane.linux.network/332428/focus=333111
>>
>> Signed-off-by: Joe Stringer 
>> ---
>> Should this driver report support for GSO on packets with tunnel
>> headers
>> up to 64B like the i40e driver does?
>> ---
>>  drivers/net/ethernet/intel/fm10k/fm10k_netdev.c |   12 
>>  1 file changed, 12 insertions(+)
>
> Thanks Joe, I will add your patch to my queue.

Hi Jeff, please see my comment on patch 0/5, we're essentially
replicating the same helper four different times (fm10k, mlx4, benet,
qlgc) - I don't see the point in doing so. I asked Joe to come up with
one generic helper and then to pick it up by the four drivers, makes
sense?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH net 4/5] net/mlx4_en: Implement ndo_gso_check()

2014-11-05 Thread Or Gerlitz

On Tue, Nov 4, 2014 at 11:56 PM, Joe Stringer  wrote:
> ndo_gso_check() was recently introduced to allow NICs to report the
> offloading support that they have on a per-skb basis. Add an
> implementation for this driver which checks for something that looks
> like VXLAN.
>
> Implementation shamelessly stolen from Tom Herbert:
> http://thread.gmane.org/gmane.linux.network/332428/focus=333111
>
> Signed-off-by: Joe Stringer 
> ---
>  drivers/net/ethernet/mellanox/mlx4/en_netdev.c |   12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c 
> b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index f3032fe..aca9908 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -2344,6 +2344,17 @@ static void mlx4_en_del_vxlan_port(struct  net_device 
> *dev,
>  }
>  #endif
>
> +static bool mlx4_en_gso_check(struct sk_buff *skb, struct net_device *dev)
> +{
> +   if ((skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL) &&
> +   (skb->inner_protocol_type != ENCAP_TYPE_ETHER ||
> +skb->inner_protocol != htons(ETH_P_TEB) ||
> +skb_inner_mac_header(skb) - skb_transport_header(skb) != 16))
> +   return false;

Let's have this 16 constant to be more clear... e.g make it the sum of
sizeof struct udphdr and struct vxlanhdr - you would need to move the
latter from vxlan.c into a public header. All for the general patch I
suggested

Or.

> +
> +   return true;
> +}
> +
>  static const struct net_device_ops mlx4_netdev_ops = {
> .ndo_open   = mlx4_en_open,
> .ndo_stop   = mlx4_en_close,
> @@ -2374,6 +2385,7 @@ static const struct net_device_ops mlx4_netdev_ops = {
> .ndo_add_vxlan_port = mlx4_en_add_vxlan_port,
> .ndo_del_vxlan_port = mlx4_en_del_vxlan_port,
>  #endif
> +   .ndo_gso_check  = mlx4_en_gso_check,
>  };
>
>  static const struct net_device_ops mlx4_netdev_ops_master = {
> --
> 1.7.10.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH net 0/5] Implement ndo_gso_check() for vxlan nics

2014-11-05 Thread Or Gerlitz

On Tue, Nov 4, 2014 at 11:56 PM, Joe Stringer  wrote:
> Most NICs that report NETIF_F_GSO_UDP_TUNNEL support VXLAN, and not other
> UDP-based encapsulation protocols where the format and size of the header may
> differ. This patch series implements ndo_gso_check() for these NICs,
> restricting the GSO handling to something that looks and smells like VXLAN.
>
> Implementation shamelessly stolen from Tom Herbert (with minor fixups):
> http://thread.gmane.org/gmane.linux.network/332428/focus=333111


Hi Joe,

1st, thanks for picking this task...2nd, for drivers that currently
support only pure VXLAN, I don't see the point
to replicate the helper suggested by Tom (good catch on the size check
to be 16 and not 12) four times and who know how more in the future.
Let's just have one generic helper and make the mlx4/be/fm10k/benet
drivers to have it as their ndo, OK?

Or.

>
> If there are particular differences for your driver on actual support, I'd 
> like
> to hear about it. I adjusted the i40e driver to report support with tunnel
> headers of up to 64 octets, perhaps there are other specifics that I've 
> missed.
>
> Joe Stringer (5):
>   be2net: Implement ndo_gso_check()
>   i40e: Implement ndo_gso_check()
>   fm10k: Implement ndo_gso_check()
>   net/mlx4_en: Implement ndo_gso_check()
>   qlcnic: Implement ndo_gso_check()
>
>  drivers/net/ethernet/emulex/benet/be_main.c  |   12 
>  drivers/net/ethernet/intel/fm10k/fm10k_netdev.c  |   12 
>  drivers/net/ethernet/intel/i40e/i40e_main.c  |   14 +-
>  drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |   12 
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c |   12 
>  5 files changed, 61 insertions(+), 1 deletion(-)
>
> --
> 1.7.10.4
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: vxlan gro problem ?

2014-10-13 Thread Or Gerlitz

On Mon, Oct 13, 2014 at 11:14 AM, yinpeijun  wrote:
> On 2014/10/13 3:50, Or Gerlitz wrote:
> my test environment use mellanox ConnectX-3 Pro nic ,  as I know the nic 
> support Rx  checksum offload.  but I am not confirm if should  I  do some 
> special configure?
> or the nic driver or firmware need update  ?  also , I have used redhat7.0 
> ovs vxlan to test with the similar configure as before, but there is also no 
> improvement .

The NIC (HW model and firmware) look just fine. As it seems now, this
boils down to get the RHEL7 inbox mlx4 driver to work properly on your
setup, something which goes a bit beyond the interest of the upstream
mailing lists...

Or.

>
> the nic infomation:
>
> 04:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 
> Pro]
>
> root@localhost:~# ethtool -i eth4
> driver: mlx4_en
> version: 2.0(Dec 2011)
> firmware-version:  2.31.5050
> bus-info: :04:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: no
> supports-register-dump: no
> supports-priv-flags: yes
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: vxlan gro problem ?

2014-10-12 Thread Or Gerlitz


On 10/8/2014 10:46 AM, yinpeijun wrote:

Hi all,
 recently Linux 3.14 has been released and I find the networking has 
added udp gro and vxlan gro funtion, then I use the redhat 7.0(there is also 
add this funtion)
to test, I use kernel vxlan module and  create a vxlan device then attach the 
device to  ovs  bridge , the configure as follow:
root@25:~$ ip link
 15: vxlan0:  mtu 1450 qdisc noqueue 
master ovs-system state UNKNOWN mode DEFAULT
 link/ether be:e1:ae:3d:8b:f2 brd ff:ff:ff:ff:ff:ff
 16: vnet0:  mtu 1400 qdisc mq master 
ovs-system state UNKNOWN mode DEFAULT qlen 5000

root@25:~$ ovs-vsctl show

 aa1294f3-9952-4393-b2b5-54e9a6eb76ee
 Bridge ovs-vx
 Port ovs-vx
 Interface ovs-vx
 type: internal
 Port "vnet0"
 Interface "vnet0"
 Port "vxlan0"
 Interface "vxlan0"
 ovs_version: "2.0.2"

vnet0 is a vm backend device,  and the end is the same configuration. then I 
use netperf to test throughput  in vm (netperf -H  -t TCP_STREAM -l 10 -- 
-m 1460),
the result is 3-4 Gbit/sec, the  improvement  is not obvious,   and I also 
confused there is no aggregation  packets (length > mtu) in the end vm.   so I 
want to know what
wrong ?   or how to test the function ?



As things are set in 3.14 and AFAIK also in RHEL 7.0, for GRO/VXLAN to 
come into play you need to run over a NIC which supports RX checksum 
offload too, is this the case?


Also, the configuration you run with isn't the typical play of VXLAN 
with OVS... I didn't try it out and this week being out to LPC.


Did you try the usual track of running OVS VXLAN port?e.g as explained 
in the Example section of [1]


Or.

[1] http://community.mellanox.com/docs/DOC-1446

Or.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [GIT PULL] please pull infiniband.git

2014-09-27 Thread Or Gerlitz

On Wed, Sep 24, 2014 at 12:58 AM, Roland Dreier  wrote:
> Hi Linus,
[..]
> This is later and bigger than I would like, and the blame is all on
> me: I got very busy with other stuff for a few weeks during the 3.17
> cycle, and didn't prepare this tree as soon as I should have.  However
> I don't think there's anything risky here, and no one really cares if
> we break InfiniBand in 3.17 anyway...

Roland, sorry, but I don't understand the joke @ the last sentence,
can you elaborate?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v1 for-next 00/16] On demand paging

2014-09-17 Thread Or Gerlitz


On 9/13/2014 12:16 AM, Or Gerlitz wrote:

Per your request we provided the information on tests conducted with
the patches.

Note that the patches can't really disrupt existing applications that
don't set the new IB_ACCESS_ON_DEMAND MR flag when they register
memory. Also the whole set of changes to the umem area is dependent on building 
with
CONFIG_INFINIBAND_ON_DEMAND_PAGING -- all in all, everything is in
place for protecting against potential regression that this series
could introduce.

As you didn't provide any feedback for > six months, and we have all
the above in place (report on stability tests, performance data and
mechanics to avoid regressions) I think it would be fair to get this
picked for the coming merge window, thoughts?


Roland,

Can you please comment here, not only that you didn't provide any 
feedback on the patches, you
are even not willing to respond if the the data we gave addresses your 
questions on testing and performance. Are you planning to pick this to 
the next merge window?


Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v1 for-next 00/16] On demand paging

2014-09-12 Thread Or Gerlitz

On Tue, Sep 9, 2014, Haggai Eran  wrote:
> On 04/09/2014, Roland Dreier wrote:

>> Have you done any review or testing of these changes?  If so can you
>> share the results?

> We have tested this feature thoroughly inside Mellanox. We ran random
> tests that performed MR registrations, memory mappings and unmappings,
> calls to madvise with MADV_DONTNEED for invalidations, sending and
> receiving of data, and RDMA operations. The test validated the integrity
> of the data, and we verified the integrity of kernel memory by running
> the tests under a debugging kernel.

Hi Roland,

Per your request we provided the information on tests conducted with
the patches.

Note that the patches can't really disrupt existing applications that
don't set the new IB_ACCESS_ON_DEMAND MR flag when they register
memory. Also
the whole set of changes to the umem area is dependent on building with
CONFIG_INFINIBAND_ON_DEMAND_PAGING -- all in all, everything is in
place for protecting against potential regression that this series
could introduce.

As you didn't provide any feedback for > six months, and we have all
the above in place (report on stability tests, performance data and
mechanics to avoid regressions) I think it would be fair to get this
picked for the coming merge window, thoughts?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 3.17-rc1 oops during network interface configuration

2014-09-10 Thread Or Gerlitz



On 9/9/2014 10:30 PM, Chuck Lever wrote:

This crash happens when booting v3.17-rcN on any of my IB-enabled
systems. I have both ConnectX-2 and mthca systems, all are affected.

I bisected this to:

commit e0f31d8498676fda36289603a054d0d490aa2679
Author: Govindarajulu Varadarajan <_gov...@gmx.com>
AuthorDate: Mon Jun 23 16:07:58 2014 +0530
Commit: David S. Miller 
CommitDate: Mon Jun 23 14:32:19 2014 -0700

 flow_keys: Record IP layer protocol in skb_flow_dissect()

 skb_flow_dissect() dissects only transport header type in ip_proto. It 
dose not
 give any information about IPv4 or IPv6.
 This patch adds new member, n_proto, to struct flow_keys. Which records the
 IP layer type. i.e IPv4 or IPv6.
 This can be used in netdev->ndo_rx_flow_steer driver function to dissect 
flow.
 Adding new member to flow_keys increases the struct size by around 4 bytes.
 This causes BUILD_BUG_ON(sizeof(qcb->data) < sz); to fail in
 qdisc_cb_private_validate()
 So increase data size by 4

 Signed-off-by: Govindarajulu Varadarajan <_gov...@gmx.com>
 Signed-off-by: David S. Miller 


This commit includes a hunk that increases the size of struct qdisc_skb_cb
by at least 4 bytes:


diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 624f985..a3cfb8e 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -231,7 +231,7 @@ struct qdisc_skb_cb {
 unsigned intpkt_len;
 u16 slave_dev_queue_mapping;
 u16 _pad;
-   unsigned char   data[20];
+   unsigned char   data[24];
  };
  
  static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)



IPoIB defines the following structure in drivers/infiniband/ulp/ipoib/ipoib.h:


struct ipoib_cb {
 struct qdisc_skb_cb qdisc_cb;
 u8  hwaddr[INFINIBAND_ALEN];
};

IPoIB keeps this in the sk_buff:cb field, which is exactly 48 bytes.
After commit e0f31d84, the size of struct ipoib_cb on x86_64 becomes
52 bytes.

Thus IPoIB overruns sk_buff:cb, and trashes the sk_buff::_skb_refdst
field, which contains a pointer. By the time we get into
__dev_queue_xmit() and try to use the result of skb_dst(), that pointer
is garbage, and we oops.

Obviously, cb[] could be increased to 56 bytes to accommodate struct
ipoib_cb. I tried this, and it is effective in preventing the oops on
one of my systems.

But I suspect there is an historical reason I’m not aware of that it
has remained 48 bytes for years.


Hi Chuck, thanks for bisecting this out. Indeed, as of this kernel 3.2 
commit 936d7de "IPoIB: Stop lying about hard_header_len and use skb->cb 
to stash LL addresses" we are using the skb->cb field to enable proper 
work under GRO and avoid another historical quirk we had there... so I 
think we can definetly consider commit e0f31d849 to introduce a severe 
regression... Govindarajulu, Dave - what's your thinking here? any quick 
idea on how to fix?


Also, I was thinking we have the mechanics in the kernel, e.g commit 
a0417fa3a18a ("net: Make qdisc_skb_cb upper size bound explicit.") to 
catch such over-flows?


Or.


BUG: unable to handle kernel paging request at 8809007e
IP: __dev_queue_xmit+0x519
Call Trace:
? __dev_queue_xmit+0x49
dev_queue_xmit+0x10
neigh_connected_output
? ip_finish_output
ip_finish_output
? ip_finish_output
? netif_rx_ni
ip_mc_output
ip_local_out_sk
ip_send_skb
udp_send_skb
udp_sendmsg
? ip_reply_glue_bits
? __lock_is_held
inet_sendmsg
? inet_sendmsg
sock_sendmsg
? might_fault
? might_fault
? move_addr_to_kernel.part.38
SYSC_sendto
? sysret_check
? trace_hardirqs_on_caller
? trace_hardirqs_on_thunk
SyS_sendto
system_call_fastpath

Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x0 from 0x8100 (relocation range: 
0x8000-0x9fff)
drm_kms_helper: panic occurred, switching back to text console

A screenshot of this kernel oops can be found here:
https://drive.google.com/file/d/0B1YQOreL3_FxVDB5UTNwekF6LVU/

gdb translates the crash address into the following (not sure this makes sense
since offset 0x519 is past the end of __dev_queue_xmit()):

(gdb) list *(__dev_queue_xmit+0x519)
0x8136bc89 is in netdev_adjacent_rename_links (net/core/dev.c:5167).
5162void netdev_adjacent_rename_links(struct net_device *dev, char *oldname)
5163{
5164struct netdev_adjacent *iter;
5165
5166list_for_each_entry(iter, &dev->adj_list.upper, list) {
5167netdev_adjacent_sysfs_del(iter->dev, oldname,
5168  &iter->dev->adj_list.lower);
5169netdev_adjacent_sysfs_add(iter->dev, dev,
5170  &iter->dev->adj_list.lower);
5171}

And the address __dev_queue_xmit+0x49 is translated by gdb into:

(gdb) list *(__dev_queue_xmit+0x49)
0

Re: [PATCH v1 for-next 00/16] On demand paging

2014-09-03 Thread Or Gerlitz

On Tue, Sep 2, 2014, Or Gerlitz  wrote:
> On 7/3/2014 11:44 AM, Haggai Eran wrote:
>>
>> Hi Roland,
>>
>> I understand that you were reluctant to review these patches as long as
>> there was an ongoing debate on whether or not the i_mmap_mutex should be
>> changed into a spinlock.
>>
>> It seems that the debate concluded with the decision to change it into a
>> rwsem [1], as apparently this provides the optimal performance with the new
>> optimistic spinning patch [2].
>>
>> I believe this means that there will be no problem adding paging support
>> to the RDMA stack that depends on sleepable MMU notifiers.
>
>
> Hi Roland,
>
> The ODP patch set was initially posted whole six months ago (March 2nd,
> 2014). We did it prior to LSF so you can discuss that with Sagi while he's
> there. Well no comment from your side so far. It's really (really) hard to
> do proper kernel development when the sub-system maintainer doesn't provide
> you almost no concrete feedback over half a year.
>
> Can you please go ahead and tell us your position re this features/patches?

Hi Roland,

Bump. Can you comment here? these patches were worked out here for
long time by a dedicated group and implement a strategic feature for
the RDMA industry.
I don't see why the RDMA kernel maintainer can leave the development
team in the air without any comment on their work for half a year.

Or.


>> Changes from V0: http://marc.info/?l=linux-rdma&m=139375790322547&w=2
>>
>> - Rebased against latest upstream / for-next branch.
>> - Removed dependency on patches that were accepted upstream.
>> - Removed pre-patches that were accepted upstream [3].
>> - Add extended uverb call for querying device (patch 1) and use kernel
>> device
>>attributes to report ODP capabilities through the new uverb entry
>> instead of
>>having a special verb.
>> - Allow upgrading page access permissions during page faults.
>> - Minor fixes to issues that came up during regression testing of the
>> patches.
>>
>> The following set of patches implements on-demand paging (ODP) support
>> in the RDMA stack and in the mlx5_ib Infiniband driver.
>>
>> What is on-demand paging?
>>
>> Applications register memory with an RDMA adapter using system calls,
>> and subsequently post IO operations that refer to the corresponding
>> virtual addresses directly to HW. Until now, this was achieved by
>> pinning the memory during the registration calls. The goal of on demand
>> paging is to avoid pinning the pages of registered memory regions (MRs).
>> This will allow users the same flexibility they get when swapping any
>> other part of their processes address spaces. Instead of requiring the
>> entire MR to fit in physical memory, we can allow the MR to be larger,
>> and only fit the current working set in physical memory.
>>
>> This can make programming with RDMA much simpler. Today, developers that
>> are working with more data than their RAM can hold need either to
>> deregister and reregister memory regions throughout their process's
>> life, or keep a single memory region and copy the data to it. On demand
>> paging will allow these developers to register a single MR at the
>> beginning of their process's life, and let the operating system manage
>> which pages needs to be fetched at a given time. In the future, we might
>> be able to provide a single memory access key for each process that
>> would provide the entire process's address as one large memory region,
>> and the developers wouldn't need to register memory regions at all.
>>
>> How does page faults generally work?
>>
>> With pinned memory regions, the driver would map the virtual addresses
>> to bus addresses, and pass these addresses to the HCA to associate them
>> with the new MR. With ODP, the driver is now allowed to mark some of the
>> pages in the MR as not-present. When the HCA attempts to perform memory
>> access for a communication operation, it notices the page is not
>> present, and raises a page fault event to the driver. In addition, the
>> HCA performs whatever operation is required by the transport protocol to
>> suspend communication until the page fault is resolved.
>>
>> Upon receiving the page fault interrupt, the driver first needs to know
>> on which virtual address the page fault occurred, and on what memory
>> key. When handling send/receive operations, this information is inside
>> the work queue. The driver reads the needed work queue elements, and
>> parses them to gather the address and memory

Re: [PATCH for 3.16] IB/mlx5: Enable block multicast loopback for kernel consumers too

2014-07-15 Thread Or Gerlitz


On 25/06/2014 16:44, Or Gerlitz wrote:

Under commit f360d88 we advertize blocking multicast loopback to both
kernel and user-space consumers, but disallow the kernel ones (e.g IPoIB)
to use it with their UD QPs, fix that.

Fixes: f360d88 ('IB/mlx5: Add block multicast loopback support')
Reported-by: Haggai Eran 
Signed-off-by: Eli Cohen 
Signed-off-by: Or Gerlitz 
---


Roland, this needs to go into 3.16 and 3.15-stable too as the bug was 
introduced in 3.15-rc2, please make sure to add it to your 3.16 rc pull 
request, OK?


Or.


  drivers/infiniband/hw/mlx5/qp.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index d13ddf1..bbbcf38 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -675,7 +675,7 @@ static int create_kernel_qp(struct mlx5_ib_dev *dev,
int err;

uuari = &dev->mdev.priv.uuari;
-   if (init_attr->create_flags & ~IB_QP_CREATE_SIGNATURE_EN)
+   if (init_attr->create_flags & ~(IB_QP_CREATE_SIGNATURE_EN | 
IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK))
return -EINVAL;

if (init_attr->qp_type == MLX5_IB_QPT_REG_UMR)



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 3.10 26/52] net/mlx4_core: Preserve pci_dev_data after __mlx4_remove_one()

2014-06-24 Thread Or Gerlitz


On 25/06/2014 09:02, Or Gerlitz wrote:

On 24/06/2014 18:50, Greg Kroah-Hartman wrote:
3.10-stable review patch.  If anyone has any objections, please let 
me know.


--

From: Wei Yang 

[ Upstream commit befdf8978accecac2e0739e6b5075afc62db37fe ]


Hi Wei, Dave,

In the same manner you acted for 3.14.y -- for this commit to go into 
3.10.y it must be accompanied by commit 
da1de8dfff09d33d4a5345762c21b487028e25f5  "net/mlx4_core: Keep only 
one driver entry release mlx4_priv" which fixes a bug introduced by 
the former, agree?




NAK, you're right... we only need the 2nd fix for kernels >= 3.14 which 
contain commit 367d56f "net/mlx4: Support shutdown() interface", before 
that -- it's irrelevant, sorry for the spam.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 3.10 26/52] net/mlx4_core: Preserve pci_dev_data after __mlx4_remove_one()

2014-06-24 Thread Or Gerlitz


On 24/06/2014 18:50, Greg Kroah-Hartman wrote:

3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Wei Yang 

[ Upstream commit befdf8978accecac2e0739e6b5075afc62db37fe ]


Hi Wei, Dave,

In the same manner you acted for 3.14.y -- for this commit to go into 
3.10.y it must be accompanied by commit 
da1de8dfff09d33d4a5345762c21b487028e25f5  "net/mlx4_core: Keep only one 
driver entry release mlx4_priv" which fixes a bug introduced by the 
former, agree?


Or.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 3.10 26/52] net/mlx4_core: Preserve pci_dev_data after __mlx4_remove_one()

2014-06-24 Thread Or Gerlitz


On 24/06/2014 18:50, Greg Kroah-Hartman wrote:

3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Wei Yang 

[ Upstream commit befdf8978accecac2e0739e6b5075afc62db37fe ]


Hi Wei, Dave,

In the same manner you acted for 3.14.y -- for this commit to go into 
3.10.y it must be accompanied by commit 
da1de8dfff09d33d4a5345762c21b487028e25f5  "net/mlx4_core: Keep only one 
driver entry release mlx4_priv" which fixes a bug introduced by the 
former, agree?


Or.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH for-next 00/15] Bug fixes for ocrdma driver

2014-06-09 Thread Or Gerlitz

On Wed, Jun 4, 2014 at 8:06 PM, Roland Dreier  wrote:
> On Wed, Jun 4, 2014 at 9:37 AM, Steve Wise  
> wrote:
>> Seems like the subject lines in each patch are getting truncated?
> Yes, truncated in patchwork too -- for example:
> https://patchwork.kernel.org/patch/4292461/
>
> Please fix and resend.

Roland, from quick looking on this series of fifteen patches, only 1/3
of them (4/5/12/14) have non-empty change-log. I would expect kernel
work which goes upstream through linux-rdma to use a bit of higher
standards, agree?

Selvin, can you sit down and come up with change-logs to your driver updates?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v1 for-next 0/3] IB: Use GFP_NOIO calls in IPoIB connected mode TX path

2014-05-28 Thread Or Gerlitz

On Wed, May 28, 2014 at 5:05 PM, Roland Dreier  wrote:
>
>
> > Roland, so M2 starts worrying, you have picked into your for-next
> > branch yesterday the other pending series, but not this one, so?
>
> Who is M2?

me too
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v1 for-next 0/3] IB: Use GFP_NOIO calls in IPoIB connected mode TX path

2014-05-28 Thread Or Gerlitz

On Mon, May 26, 2014 at 1:52 PM, Jiri Kosina  wrote:
> On Mon, 19 May 2014, Roland Dreier wrote:
>
>> > Roland, we're soon on -rc6 and there's no reason for this to miss
>>> 3.16, could you please comment whether you want it to go through your
>>> tree or net-next?

>> I will pick it up.

> Thanks Roland.
> Sorry for being a bit persistent, but another week has passed, -rc7 is out
> and I still don't see it in your tree. Could you please clarify what is
> the destiny of this patchset?

Roland, so M2 starts worrying, you have picked into your for-next
branch yesterday the other pending series, but not this one, so?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH net-next V6 1/2] cpumask: Utility function to set n'th cpu - local cpu first

2014-05-27 Thread Or Gerlitz

On Tue, May 27, 2014 at 10:24 PM, David Miller  wrote:

> I would like someone who cares about these cpumask interfaces to provide
> a review.

understood, still, looking in the git log of that file didn't yield
much only 1-2 commits per years for 2011/12/13, so, any concrete
suggestion?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v1 for-next 0/3] IB: Use GFP_NOIO calls in IPoIB connected mode TX path

2014-05-17 Thread Or Gerlitz

On Tue, May 13, 2014 at 2:38 PM, Jiri Kosina  wrote:
> On Sun, 11 May 2014, Or Gerlitz wrote:

>> This series is a refactored form of the one posted by Jiri Kosina
>> to LKML and netdev according to the discussion that followed
>> and the guidelines you provided here https://lkml.org/lkml/2014/3/5/250

>> Basically, the functionality changes introduced by this series fully
>> reside on the IB side of things, so I am only posting the actual patches
>> to linux-rdma with CC on the cover-letter to the lists that were on V0.

> Thanks, I am fine with my Signoff on that.
> Roland, is this going to be merged by you, or should this go to DaveM 
> directly?

Roland, we're soon on -rc6 and there's no reason for this to miss
3.16, could you please comment whether you want it to go through your
tree or net-next?

> Jiri Kosina
> SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] drivers: net: ethernet: mellanox: mlx4: let mlx4 depend on SMP

2014-05-17 Thread Or Gerlitz

On Sat, May 17, 2014 at 8:36 AM, David Miller  wrote:
>
> From: Chen Gang 
> Date: Sat, 17 May 2014 13:26:16 +0800
>
> > 'struct irq_affinity_notify' and the related functions are only defined
> > when SMP enabled, so at present, mlx4 has to only run under SMP.
> >
> > The related error (allmodconfig under unicore32):
>
> Making the entire driver depend upon SMP is not the answer,


Indeed, we would do that just for the relevant portion,


>
> other Mellanox developers said that a proper fix is pending
> so please be patient.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: randconfig build error with next-20140515, in drivers/net/ethernet/mellanox/mlx4/eq.c

2014-05-15 Thread Or Gerlitz


On 15/05/2014 16:48, Jim Davis wrote:

Building with the attached random configuration file,

drivers/net/ethernet/mellanox/mlx4/eq.c:58:29: error: field ‘notify’
has incomplete type
   struct irq_affinity_notify notify;
  ^
In file included from include/linux/interrupt.h:5:0,
  from drivers/net/ethernet/mellanox/mlx4/eq.c:34:
drivers/net/ethernet/mellanox/mlx4/eq.c: In function ‘mlx4_irq_notifier_notify’:
include/linux/kernel.h:834:48: warning: initialization from
incompatible pointer type [enabled by default]
   const typeof( ((type *)0)->member ) *__mptr = (ptr); \
 ^
drivers/net/ethernet/mellanox/mlx4/eq.c:1094:30: note: in expansion of
macro ‘container_of’
   struct mlx4_irq_notify *n = container_of(notify,
   ^
include/linux/kernel.h:834:48: warning: (near initialization for ‘n’)
[enabled by default]
   const typeof( ((type *)0)->member ) *__mptr = (ptr); \
 ^
drivers/net/ethernet/mellanox/mlx4/eq.c:1094:30: note: in expansion of
macro ‘container_of’
   struct mlx4_irq_notify *n = container_of(notify,
   ^
drivers/net/ethernet/mellanox/mlx4/eq.c:1104:24: error: dereferencing
pointer to incomplete type
if (cq->irq == notify->irq)
 ^
In file included from include/linux/interrupt.h:5:0,
  from drivers/net/ethernet/mellanox/mlx4/eq.c:34:
drivers/net/ethernet/mellanox/mlx4/eq.c: In function
‘mlx4_release_irq_notifier’:
include/linux/kernel.h:834:48: warning: initialization from
incompatible pointer type [enabled by default]
   const typeof( ((type *)0)->member ) *__mptr = (ptr); \
 ^
drivers/net/ethernet/mellanox/mlx4/eq.c::30: note: in expansion of
macro ‘container_of’
   struct mlx4_irq_notify *n = container_of(ref, struct mlx4_irq_notify,
   ^
include/linux/kernel.h:834:48: warning: (near initialization for ‘n’)
[enabled by default]
   const typeof( ((type *)0)->member ) *__mptr = (ptr); \
 ^
drivers/net/ethernet/mellanox/mlx4/eq.c::30: note: in expansion of
macro ‘container_of’
   struct mlx4_irq_notify *n = container_of(ref, struct mlx4_irq_notify,
   ^
drivers/net/ethernet/mellanox/mlx4/eq.c: In function ‘mlx4_assign_irq_notifier’:
drivers/net/ethernet/mellanox/mlx4/eq.c:1133:2: error: implicit
declaration of function ‘irq_set_affinity_notifier’
[-Werror=implicit-function-declaration]
   err = irq_set_affinity_notifier(irq, &irq_notifier->notify);
   ^
cc1: some warnings being treated as errors


I can see the problem, irq_set_affinity_notifier isn't defined when CONFIG_SMP 
isn't set, I am copying here the developer and he would work on a fix.

Or.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v1 for-next 0/3] IB: Use GFP_NOIO calls in IPoIB connected mode TX path

2014-05-11 Thread Or Gerlitz

Hi Roland,

This series is a refactored form of the one posted by Jiri Kosina 
to LKML and netdev according to the discussion that followed
and the guidelines you provided here https://lkml.org/lkml/2014/3/5/250

Basically, the functionality changes introduced by this series fully 
reside on the IB side of things, so I am only posting the actual patches 
to linux-rdma with CC on the cover-letter to the lists that were on V0.

Or.

As described in the change log of patch #2 this series comes to address 
a problem whereby NFS client writes would enter uninterruptible sleep 
forever. The issue happened when using NFS over IPoIB connected mode.

The problem encountered was described as follows: it's not memory
reclamation that is the problem as such. There is an indirect dependency
between network filesystems writing back pages and ipoib_cm_tx_init()
due to how a kworker is used. Page reclaim cannot make forward progress
until ipoib_cm_tx_init() succeeds and it is stuck in page reclaim itself
waiting for network transmission. Ordinarily this situation may be
avoided by having the caller use GFP_NOFS but ipoib_cm_tx_init()
does not have that information.

To address this, we take a more general approach vs. V0 and generalize 
the solution such that when the new QP creation flag is provided, the
HW driver should use a GFP_NOIO for the memory allocations related
to the new QP.

changes from V0:
 - removed the module param for IPoIB, the connected mode code would
   attempt to use GFP_NOIO for the QP creation and fallback to GFP_KERNEL
   (as before) if the HW driver doesn't support that. This approach will let
   

Jiri Kosina (1):
  mlx4: Enhance the QP creation path to use a given GFP directive

Or Gerlitz (2):
  IB: Return error when QP creation are provided for driver not supporting that
  IB: Add a QP creation flag to allow specifying a NOIO allocation directive

 drivers/infiniband/hw/mlx4/cq.c|6 ++--
 drivers/infiniband/hw/mlx4/mlx4_ib.h   |1 +
 drivers/infiniband/hw/mlx4/qp.c|   30 +++
 drivers/infiniband/hw/mlx4/srq.c   |7 ++--
 drivers/infiniband/hw/qib/qib_qp.c |3 +-
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c   |3 ++
 drivers/infiniband/ulp/ipoib/ipoib_cm.c|   19 ++--
 drivers/net/ethernet/mellanox/mlx4/alloc.c |   27 +
 drivers/net/ethernet/mellanox/mlx4/cq.c|4 +-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |6 ++--
 drivers/net/ethernet/mellanox/mlx4/en_tx.c |2 +-
 drivers/net/ethernet/mellanox/mlx4/icm.c   |7 ++--
 drivers/net/ethernet/mellanox/mlx4/icm.h   |3 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h  |4 +-
 drivers/net/ethernet/mellanox/mlx4/mr.c|   17 ++-
 drivers/net/ethernet/mellanox/mlx4/qp.c|   20 ++--
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |4 +-
 drivers/net/ethernet/mellanox/mlx4/srq.c   |4 +-
 include/linux/mlx4/device.h|   10 --
 include/rdma/ib_verbs.h|1 +
 20 files changed, 104 insertions(+), 74 deletions(-)

Cc: Jiri Kosina 
Cc: Mel Gorman 
Cc: net...@vger.kernel.org
Cc: linux-kernel 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mlx4: Use GFP_NOFS calls during the ipoib TX path when creating the QP

2014-04-24 Thread Or Gerlitz

On Thu, Apr 24, 2014 at 8:03 PM, Jiri Kosina  wrote:
>
> On Tue, 11 Mar 2014, Or Gerlitz wrote:

[...]
>
> > So sounds like a plan that makes sense?
>
> Hi everybody, seems like this fell through cracks,

Hi Jiri,

I sent you private note on Mar 19th saying "are you on track with this
for 3.15? the merge window is coming soon and you have 99% of what you
need -- lets get there" which seems to be the piece that fell between
the cracks.

>
> probably because it wasn't clearly stated *who* will be preparing patch(es) 
> that'd be implementing the ideas above. My understanding was that it'd be 
> Mellanox, given that they basically own the driver,

to be precise partially own the mlx4 HW driver (Roland is the
mlx4_core/ib author and maintainer, we are asking for few years to get
co-maintainer hat there, no success so far), the problem you have
described can happen with any HW driver, but as stated earlier, the
suggested plan will fix ipoib and mlx4 and following that more hw
drivers can be enhanced to support that too.

>
> have the best testing coverage compared to very very limited
> testing coverage I can do, and will be pushing it upstream anyway.
>
> I believe all the above can be easily created on top of the original patch I 
> sent.

Indeed, so will take a look next week and let you know if it works for me

>
>
> So ... was there a misunderstanding on who is going to do it? :)
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mlx4: Use GFP_NOFS calls during the ipoib TX path when creating the QP

2014-03-11 Thread Or Gerlitz


On 05/03/2014 21:25, Roland Dreier wrote:

It's quite clear that this is a general problem with IPoIB connected
mode on any IB device.  In connected mode, a packet send can trigger
establishing a new connection, which will allocate a new QP, which in
particular will allocate memory for the QP in the low-level IB device
driver.  Currently I'm positive that every driver will do GFP_KERNEL
allocations when allocating a QP (ehca does both a GFP_KERNEL
kmem_cache allocation and vmalloc in internal_create_qp(), mlx5 and
mthca are similar to mlx4 and qib does vmalloc() in qib_create_qp()).
So this patch needs to be extended to the other 4 IB device drivers in
the tree.

Also, I don't think GFP_NOFS is enough -- it seems we need GFP_NOIO,
since we could be swapping to a block device over iSCSI over IPoIB-CM,
so even non-FS stuff could deadlock.

I don't think it makes any sense to have a "do_not_deadlock" module
parameter, especially one that defaults to "false."  If this is the
right thing to do, then we should just unconditionally do it.

It does seem that only using GFP_NOIO when we really need to would be
a very difficult problem--how can we carry information about whether a
particular packet is involved in freeing memory through all the layers
of, say, NFS, TCP, IPSEC, bonding, &c?


Agree with all the above... next,

If we don't have away to nicely overcome the layer violations here, 
let's change IPoIB so they always ask the
IB driver to allocate QPs used for Connected Mode in a GFP_NOIO manner, 
to be practical I suggest the following:


1. Add new QP creation flag IB_QP_CREATE_USE_GFP to the existing 
creation flags of struct ib_qp_init_attr

and a new "gfp_t gfp" field to that structure too

2. in the IPoIB CM code, do the vzalloc allocation for new connection in 
GFP_NOIO manner and issue
the call to create QP with setting the IB_QP_CREATE_USE_GFP flag and 
GFO_NOIO to the gfp field


3. If the QP creation fails, with -EINVAL, issue a warning and retry the 
QP creation attempt without the GFP setting


4. implement in the mlx4 driver the support for GFP directives on QP 
creation


5. for the rest of the IB drivers, return -EINVAL if 
IB_QP_CREATE_USE_GFP is set


This will allow to provide working solution for mlx4 users and gradually 
add support for the rest of the IB drivers.


as for proper patch planning

patch #1 / items 1 and 5
patch #2 / item 4
patch #3 / item 3

Re item 5 -- I made a check and the ehca, ipath and mthca driver already 
return -EINVAL if provided with any creation flag, so you only need to 
patch the qib driver in qib_create_qp() to do that as well which is trivial.


As for the rest of the code, you practically have it all by now, just 
need to port the mlx4 changes you did to the suggested framework, remove 
the module param (which you don't like either) and add the new gfp_t 
field to ib_qp_init_attr


So sounds like a plan that makes sense?

Or.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mlx4: Use GFP_NOFS calls during the ipoib TX path when creating the QP

2014-03-06 Thread Or Gerlitz


On 21/02/2014 23:53, Jiri Kosina wrote:

This was originally a patch from Matthew Finlay  that
addressed a problem whereby NFS writes would enter uninterruptible sleep
forever.  The issue happened when using NFS over IPoIB. This is not a
recommended configuration as RDMA is preferred but it is still a valid
configuration and is important to have in situations where the NFS server
does not support RDMA. The problem encountered was described as follows:

It's not memory reclamation that is the problem as such. There is
an indirect dependency between network filesystems writing back
pages and ipoib_cm_tx_init() due to how a kworker is used. Page
reclaim cannot make forward progress until ipoib_cm_tx_init()
succeeds and it is stuck in page reclaim itself waiting for network
transmission. Ordinarily this sitaution may be avoided by having
the caller use GFP_NOFS but ipoib_cm_tx_init() does not have that 
information.



Hi Jiri,

Reading again (*) the problem description, the team here would be happy 
to clarify with you some details (possibly

few MM newbie questions, but it will help us):

1. just to make sure, the problem happen on the NFS client, not the NFS 
server, right? so writing-back means client

writing over the NFS mount --> network

2. you wrote "due to how a kworker is used", can you clarify if/why 
things go wrong b/c of the kworker usage, or this is matter of phrasing?


in earlier post over this thread you wrote "There was a problem with 
swapping over NFS, as writeback was deadlocked with memory reclaim 
(memory needs to be allocated so that > swap could be accessed to 
reclaim memory). That's fixed by allocating the buffers from PF_MEMALLOC 
reserve, introduced by Mel's and Peter's patchset back in 3.9 or so. Oh, 
and the same has been done for swapping over NBD, btw", in that respect:


3. you mentioned that the memory allocations in ipoib_cm_tx_init() and 
ib_create_qp() --> mlx4 driver requires
page reclaim and waits for network transmission, so this client node put 
their swap over that NFS partition?


4. Can you shed more light, why the problem hits also for kmalloc based 
allocations and not only for vmalloc
based allocation e.g not only b/c of the vzalloc call in 
ipoib_cm_tx_init but rather also b/c of misc kmalloc calls within

the HW (here mlx4) driver?

thanks,

Or.

(*) and sorry for my stupid question from yesterday, sometimes it's bad 
idea to ask questions on mailing lists when you are very tired

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mlx4: Use GFP_NOFS calls during the ipoib TX path when creating the QP

2014-03-05 Thread Or Gerlitz

On Wed, Feb 26, 2014 at 12:11 AM, Jiri Kosina  wrote:
> The problem encountered was described as follows:
> It's not memory reclamation that is the problem as such. There is
> an indirect dependency between network filesystems writing back
> pages and ipoib_cm_tx_init() due to how a kworker is used. Page
> reclaim cannot make forward progress until ipoib_cm_tx_init()
> succeeds and it is stuck in page reclaim itself waiting for network
> transmission. Ordinarily this sitaution may be avoided by having
> the caller use GFP_NOFS but ipoib_cm_tx_init() does not have
> that information.

So to hit the bug, one just needs to attempt doing NFS client mount
over an IP subnet served by IPoIB NIC that uses connected-mode and
runs over mlx4 device?  Or this happens when  the connection is going
through teardown/re-establishment or something else?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mlx4: Use GFP_NOFS calls during the ipoib TX path when creating the QP

2014-03-05 Thread Or Gerlitz


On 05/03/2014 00:48, Jiri Kosina wrote:

On Thu, 27 Feb 2014, Jiri Kosina wrote:


On Thu, 27 Feb 2014, Or Gerlitz wrote:


ipoib is coded over the verbs API (include/rdma/ib_verbs.h)  --- so tracking
the path from ipoib through the verbs api into mlx4 should be similar exercise
as doing so for mlx5, but let's 1st treat the higher level elements involved
with this patch.

Can you shed some light why the problem happens only for NFS, and not for
example with other IP/TCP storage protocols?

For example, do you expect it to happen with iSCSI/TCP too? the Linux
iSCSI initiator 1st open a TCP socket from user space to the target,
next they do login exchange over this socket and later provide the
socket to the kernel iscsi code to use as the back-end of a SCSI block
device registered with the SCSI midlayer

Frankly, no idea. There was a problem with swapping over NFS, as writeback
was deadlocked with memory reclaim (memory needs to be allocated so that
swap could be accessed to reclaim memory). That's fixed by allocating the
buffers from PF_MEMALLOC reserve, introduced by Mel's and Peter's patchset
back in 3.9 or so. Oh, and the same has been done for swapping over NBD,
btw. Maybe iSCSI needs similar treatment, maybe it has it already, I
haven't checked. We haven't seen a bugreport for that though.


I don't think we have, and it indeed should be rather easy to add. The
more challenging part of the problem is where (and based on which
data) the flag would actually be set up on the netdevice so that it's
not horrible layering violation.

I assume that in the same manner netdevices advertize features to the
networking core, the core can provide them operating directives after
they register themselves.

Whatever suits you best. To sum it up:

- mlx4 is confirmed to have this problem, and we know how that problem
   happens -- see the paragraph in the changelog explaining the dependency
   between memory reclaim and allocation of TX ring

- we have a work around which requires human interaction in order
   to provide the information whether GFP_NOFS should be used or not

- I can very well understand why Mellanox would see that as a hack, but if
   more comprehensive fix is necessary, I'd expect those who understand
   the code the best to come up with a solution/proposal. I'd assume that
   you don't  want to keep the code with known and easily triggerable
   deadlock out there unfixed.

- where I see the potential for layering violation in any 'general'
   solution is that it's the filesystem that has to be "talking" to the
   underlying netdevice, i.e. you'll have to make filesystem
   netdevice-aware, right?

Mellanox folks, do you have any plan how to proceed here please?



Hi Jiri,

Yep, we will look on that. I think we still have few directions to 
resolve here


1. (our task) deeper understanding of the problem

2. if the solution goes in the way you took it, look for

2.1 a more generic verbs interface, e.g QP creation flag  that dictates 
the GFP_YYY to use when allocating memory

for that QP, e.g NOIO, NOFS, ATOMIC, etc

2.2 a more programmable interface for the file-system to let the NIC 
know they are under constraint YYY for their memory

allocations, maybe per nieghbour? maybe use netdevice private flags?

Or.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux rdma 3.14 merge plans

2014-03-05 Thread Or Gerlitz

On 05/03/2014 17:18, Roland Dreier wrote:

On Wed, Mar 5, 2014 at 1:54 AM, Nicholas A. Bellinger
  wrote:

>That all said, do you have an objection wrt taking this bits through
>target-pending..?  Given the dependencies involved, that would seem the
>most logical path to take.

Perhaps not surprisingly, I would prefer to get a chance to review a
major change to the core RDMA midlayer rather than having you merge it
through your tree.  So yes I do object.  Please give me a chance to
review and merge it.  I am working on that this week.

Hi Roland, we're very happy to hear that!!

As you know, we are soon marking whole five months!! (patches posted Oct 
15th/2013) of sitting and waiting to your feedback, lost few upstream 
releases on the way. Let's get this into the works.

Or.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mlx4: Use GFP_NOFS calls during the ipoib TX path when creating the QP

2014-02-27 Thread Or Gerlitz


On 27/02/2014 11:48, Jiri Kosina wrote:

On Wed, 26 Feb 2014, Or Gerlitz wrote:


But let's make sure that we don't diverge from the original problem too
much. Simple fact is that the deadlock is there when using connected mode,
and there is nothing preventing users from using it this way, therefore I
believe it should be fixed one way or another.

the patch is titled with "mlx4:" -- do you expect the problem to come
into play only when ipoib connected mode runs over the mlx4 driver?
what's about mlx5 or other upstream IB drivers?

Honestly, I have no idea. I am pretty sure that Mellanox folks have much
better understanding of the mlx* driver internals than I do. I tried to
figure out where mlx5 is standing in this respect, but I don't even see
where ipoib_cm_tx->tx_ring is being allocated there.


ipoib is coded over the verbs API (include/rdma/ib_verbs.h)  --- so 
tracking the path from ipoib through the verbs api into mlx4 should be 
similar exercise as doing so for mlx5, but let's 1st treat the higher 
level elements involved with this patch.


Can you shed some light why the problem happens only for NFS, and not 
for example with other IP/TCP storage protocols?


For example, do you expect it to happen with iSCSI/TCP too? the Linux 
iSCSI initiator 1st open a TCP socket from user space to the target, 
next they do login exchange over this socket and later provide the 
socket to the kernel iscsi code to use as the back-end of  a SCSI block 
device registered with the SCSI midlayer






I'll be looking on the details of the problem/solution,

Awesome, thanks a lot, that's highly appreciated.


Do we have a way to tell a net-device instance they should do their
memory allocations in a NOFS manner? if not, shouldn't we come up with
more general injection method?

I don't think we have, and it indeed should be rather easy to add. The
more challenging part of the problem is where (and based on which data)
the flag would actually be set up on the netdevice so that it's not
horrible layering violation.



I assume that in the same manner netdevices advertize features to the 
networking core, the core can provide them

operating directives after they register themselves.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mlx4: Use GFP_NOFS calls during the ipoib TX path when creating the QP

2014-02-26 Thread Or Gerlitz

On Wed, Feb 26, 2014 at 12:55 AM, Jiri Kosina  wrote:
> On Wed, 26 Feb 2014, Or Gerlitz wrote:
[...]
> That definitely can be verified, and I am putting it on my TODO list.

OK, thanks


> But let's make sure that we don't diverge from the original problem too
> much. Simple fact is that the deadlock is there when using connected mode,
> and there is nothing preventing users from using it this way, therefore I
> believe it should be fixed one way or another.

the patch is titled with "mlx4:" -- do you expect the problem to come
into play only when ipoib connected mode runs over the mlx4 driver?
what's about mlx5 or other upstream IB drivers?

I'll be looking on the details of the problem/solution, but this way
or another the API being module param sounds more like a hack

Do we have a way to tell a net-device instance they should do their
memory allocations in a NOFS manner? if not, shouldn't we come up with
more general injection method?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mlx4: Use GFP_NOFS calls during the ipoib TX path when creating the QP

2014-02-25 Thread Or Gerlitz

On Wed, Feb 26, 2014 at 12:40 AM, Jiri Kosina  wrote:
> On Wed, 26 Feb 2014, Or Gerlitz wrote:
>
>>>> And what happens if you use IPoIB datagram mode, is/why the patch is
>>>> needed there?

>>> I admittedly am no infiniband expert, but my understanding is that in
>>> principle Connected/Datagram mode is about MTU and checksum
>>> offloading


>> yes, the differences between the mode relate to these aspects, however

> Thanks for confirming

Still, even if different, I still don't see why not use datagram mode
if the problem hits you only for connected mode. E.g datagram mode
supports LSO/GRO and TX/RX checksum offloads which should cover on the
smaller MTU vs. connected mode


>> > but the TX path is the same. Please correct me if I am wrong.

>> no, note that your patch only touched drivers/infiniband/ulp/ipoib/ipoib_cm.c
>> which is basically compiled out if you set CONFIG_INFINIBAND_IPOIB_CM,
>> so surely the TX path for the datagram vs. connected modes are
>> different.

> Yes, but for datagram mode, the tx_ring is allocated in a completely
> different way (not from kworker), so this might be a non-issue, right? I
> will have to look into it more deeply to be really sure; if you can
> provide your insight, that'd be helpful.


Note that even when operating in connected mode, the ipoib net-device
instance can speak in datagram mode with remote nodes who don't
support connected mor and/or when sending multicast -- specifically
ipoib_dev_init() does the setup of the TX ring. Maybe you can just try
this out and see if it works?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

1 2 >

1 - 100 of 141 matches

Mail list logo