Re: [Gluster-users] RDMA Client Hang Problem

2018-04-25 Thread Raghavendra Gowdappa
+Amar, +Rafi - other maintainers and peers of transport/rdma

* Can you attach logs from the client and the brick? Please set
diagnostics.client-log-level and diagnostics.brick-log-level to TRACE
before starting your tests (example commands below).
* Does the fuse client recover from the hang?
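
A minimal sketch of the volume-set commands for that, assuming the volume is
named "db" as in your create command:

# gluster volume set db diagnostics.client-log-level TRACE
# gluster volume set db diagnostics.brick-log-level TRACE

If I remember correctly, the client log ends up under /var/log/glusterfs/
(named after the mount point, e.g. db.log for a mount on /db) and the brick
logs under /var/log/glusterfs/bricks/ on each server.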

I think we might not be handling the poll_err path correctly. The fact that
we see issues only after brick reboots makes me suspect the error path.

regards,
Raghavendra


Re: [Gluster-users] RDMA Client Hang Problem

2018-04-25 Thread Necati E. SISECI

Thank you for your mail.

ibv_rc_pingpong seems to be working between the servers and the client.
udaddy, ucmatose, rping, etc. are also working.


root@gluster1:~# ibv_rc_pingpong -d mlx5_0 -g 0
  local address:  LID 0x, QPN 0x0001e4, PSN 0x10090e, GID fe80::ee0d:9aff:fec0:1dc8
  remote address: LID 0x, QPN 0x00014c, PSN 0x09402b, GID fe80::ee0d:9aff:fec0:1b14
8192000 bytes in 0.01 seconds = 7964.03 Mbit/sec
1000 iters in 0.01 seconds = 8.23 usec/iter

root@cinder:~# ibv_rc_pingpong -g 0 -d mlx5_0 gluster1
  local address:  LID 0x, QPN 0x00014c, PSN 0x09402b, GID fe80::ee0d:9aff:fec0:1b14
  remote address: LID 0x, QPN 0x0001e4, PSN 0x10090e, GID fe80::ee0d:9aff:fec0:1dc8
8192000 bytes in 0.01 seconds = 8424.73 Mbit/sec
1000 iters in 0.01 seconds = 7.78 usec/iter


Thank you.

Necati.


Re: [Gluster-users] RDMA Client Hang Problem

2018-04-25 Thread Raghavendra Gowdappa

Is InfiniBand itself working fine? You can run tools like ibv_rc_pingpong
to find out.


[Gluster-users] RDMA Client Hang Problem

2018-04-25 Thread Necati E. SISECI

Dear Gluster-Users,

I am experiencing RDMA problems.

I have installed Ubuntu 16.04.4 running the 4.15.0-13-generic kernel and
MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64 on 4 different servers.
All of them have Mellanox ConnectX-4 LX dual-port NICs. These four
servers are connected via a Mellanox SN2100 switch.


I have installed GlusterFS Server v3.10 (from the Ubuntu PPA) on 3 of the
servers. These 3 boxes are running as a gluster cluster. Additionally, I
have installed the GlusterFS client on the last one.


I have created a Gluster volume with this command:

# gluster volume create db transport rdma replica 3 arbiter 1 
gluster1:/storage/db/ gluster2:/storage/db/ cinder:/storage/db force
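
(A quick way to confirm the volume really is RDMA-only is
"gluster volume info db", which should list the transport type as rdma.)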


(network.ping-timeout is 3)
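
For reference, a ping-timeout of 3 seconds can be set with the usual
volume-set command:

# gluster volume set db network.ping-timeout 3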

Then I mounted this volume using the mount command below.

mount -t glusterfs -o transport=rdma gluster1:/db /db
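
In case it is useful, an fstab entry for the same mount should look roughly
like this (an untested sketch, not something I have confirmed):

gluster1:/db  /db  glusterfs  transport=rdma,_netdev  0  0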

After mounting "/db", I can access the files.

The problem is that when I reboot one of the cluster nodes, the fuse
client logs the error below and hangs.


[2018-04-17 07:42:55.506422] W [MSGID: 103070]
[rdma.c:4284:gf_rdma_handle_failed_send_completion]
0-rpc-transport/rdma: send work request on `mlx5_0' returned error
wc.status = 5, wc.vendor_err = 245, post->buf = 0x7f8b92016000,
wc.byte_len = 0, post->reused = 135


When I change the transport mode from rdma to tcp, the fuse client works
well. No hangs.


I also tried Gluster 3.8, 3.10, 4.0.0 and 4.0.1 (from Ubuntu PPAs) on
Ubuntu 16.04.4 and CentOS 7.4, but the results were the same.


Thank you.

Necati.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users