Re: [gpfsug-discuss] Problems with remote mount via routed IB

2018-02-26 Thread Aaron Knister

Hi Jan Erik,

It was my understanding that the IB hardware router requires RDMA CM to 
work. By default GPFS doesn't use the RDMA Connection Manager, but it can 
be enabled (e.g. via mmchconfig verbsRdmaCm=enable). I think this requires 
a restart of GPFS on the clients/servers (in both clusters) to take effect. 
Maybe someone else on the list can comment in more detail -- I've been told 
folks have successfully deployed IB routers with GPFS.
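
A minimal sketch of that change (assuming the installed GPFS/OFED stack 
supports RDMA CM and IPoIB is configured on the verbs interfaces; the -a 
flags act on all nodes, so schedule the restart accordingly):

  # enable the RDMA Connection Manager (run in both clusters)
  mmchconfig verbsRdmaCm=enable
  # restart GPFS so the change takes effect
  mmshutdown -a
  mmstartup -a
  # confirm the setting afterwards
  mmlsconfig verbsRdmaCm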


-Aaron

On 2/26/18 11:38 AM, Sundermann, Jan Erik (SCC) wrote:


[quoted message trimmed; the original post appears in full below]

[gpfsug-discuss] Problems with remote mount via routed IB

2018-02-26 Thread Sundermann, Jan Erik (SCC)

Dear all

we are currently trying to remote mount a file system in a routed InfiniBand 
test setup and face problems with dropped RDMA connections. The setup is the 
following:

- Spectrum Scale Cluster 1 is set up on four servers which are connected to 
the same InfiniBand network. Additionally, they are connected to a fast 
Ethernet network providing IP communication in the network 192.168.11.0/24.

- Spectrum Scale Cluster 2 is set up on four additional servers which are 
connected to a second InfiniBand network. These servers have IPs on their IB 
interfaces in the network 192.168.12.0/24.

- IP is routed between 192.168.11.0/24 and 192.168.12.0/24 on a dedicated 
machine (see the sketch after this list).

- We have a dedicated IB hardware router connected to both IB subnets.
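
For illustration, the IP leg of such a setup could look roughly like this (a 
sketch only; the interface configuration is omitted and the router addresses 
192.168.11.254 and 192.168.12.254 are hypothetical):

  # on the dedicated routing machine: forward between the two subnets
  sysctl -w net.ipv4.ip_forward=1
  # on cluster 1 nodes: reach cluster 2's IPoIB subnet via the router
  ip route add 192.168.12.0/24 via 192.168.11.254
  # on cluster 2 nodes: the mirror-image route
  ip route add 192.168.11.0/24 via 192.168.12.254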


We tested that the routing, both IP and IB, is working between the two 
clusters without problems, and that RDMA is working fine for internal 
communication inside cluster 1 and inside cluster 2.
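
Checks of this kind can be done with standard tools, for example (a sketch; 
ib_read_bw is from the OFED perftest package, and its -R option connects the 
queue pairs via the RDMA Connection Manager):

  # IP reachability across the router, from a cluster 1 node
  ping -c 3 192.168.12.5
  # RDMA bandwidth check across the IB router
  ib_read_bw -R                # on a cluster 1 node (server side)
  ib_read_bw -R 192.168.11.1   # on a cluster 2 node (client side)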

When trying to remote mount a file system from cluster 1 in cluster 2, RDMA 
communication is not working as expected. Instead we see error messages on 
the remote host (cluster 2):


2018-02-23_13:48:47.037+0100: [I] VERBS RDMA connecting to 192.168.11.4 
(iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2
2018-02-23_13:48:49.890+0100: [I] VERBS RDMA connected to 192.168.11.4 
(iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 2
2018-02-23_13:48:53.138+0100: [E] VERBS RDMA closed connection to 192.168.11.1 
(iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 
index 3
2018-02-23_13:48:53.854+0100: [I] VERBS RDMA connecting to 192.168.11.1 
(iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3
2018-02-23_13:48:54.954+0100: [E] VERBS RDMA closed connection to 192.168.11.3 
(iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 
index 1
2018-02-23_13:48:55.601+0100: [I] VERBS RDMA connected to 192.168.11.1 
(iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3
2018-02-23_13:48:57.775+0100: [I] VERBS RDMA connecting to 192.168.11.3 
(iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 1
2018-02-23_13:48:59.557+0100: [I] VERBS RDMA connected to 192.168.11.3 
(iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 1
2018-02-23_13:48:59.876+0100: [E] VERBS RDMA closed connection to 192.168.11.2 
(iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 
index 0
2018-02-23_13:49:02.020+0100: [I] VERBS RDMA connecting to 192.168.11.2 
(iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 0
2018-02-23_13:49:03.477+0100: [I] VERBS RDMA connected to 192.168.11.2 
(iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 0
2018-02-23_13:49:05.119+0100: [E] VERBS RDMA closed connection to 192.168.11.4 
(iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 
index 2
2018-02-23_13:49:06.191+0100: [I] VERBS RDMA connecting to 192.168.11.4 
(iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2
2018-02-23_13:49:06.548+0100: [I] VERBS RDMA connected to 192.168.11.4 
(iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 2
2018-02-23_13:49:11.578+0100: [E] VERBS RDMA closed connection to 192.168.11.1 
(iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 
index 3
2018-02-23_13:49:11.937+0100: [I] VERBS RDMA connecting to 192.168.11.1 
(iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3
2018-02-23_13:49:11.939+0100: [I] VERBS RDMA connected to 192.168.11.1 
(iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3


and in the cluster with the file system (cluster 1):

2018-02-23_13:47:36.112+0100: [E] VERBS RDMA rdma read error 
IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in 
gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 
2018-02-23_13:47:36.112+0100: [E] VERBS RDMA closed connection to 192.168.12.5 
(iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to 
RDMA read error IBV_WC_RETRY_EXC_ERR index 3
2018-02-23_13:47:47.161+0100: [I] VERBS RDMA accepted and connected to 
192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 
fabnum 0 sl 0 index 3
2018-02-23_13:48:04.317+0100: [E] VERBS RDMA rdma read error 
IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in 
gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 
2018-02-23_13:48:04.317+0100: [E] VERBS RDMA closed connection to 192.168.12.5 
(iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to 
RDMA read error IBV_WC_RETRY_EXC_ERR index 3
2018-02-23_13:48:11.560+0100: [I] VERBS RDMA accepted and connected to 
192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 
fabnum 0 sl 0 index 3
2018-02-23_13:48:32.523+0100: [E] VERBS 

Re: [gpfsug-discuss] Finding all bulletins and APARs

2018-02-26 Thread IBM Spectrum Scale
Hi John,

For all Flashes, alerts and bulletins for IBM Spectrum Scale, please check
this link:
https://www.ibm.com/support/home/search-results/1060/system_storage/storage_software/software_defined_storage/ibm_spectrum_scale?filter=DC.Type_avl:CT792,CT555,CT755=-dcdate_sortrange=fab

For any other content that you received in the notification, please check 
this link:
https://www.ibm.com/support/home/search-results/1060/IBM_Spectrum_Scale?docOnly=true=-dcdate_sortrange=rc

Regards, The Spectrum Scale (GPFS) team

--

If you feel that your question can benefit other users of Spectrum Scale 
(GPFS), then please post it to the public IBM developerWorks forum at 
https://www.ibm.com/developerworks/community/forums/html/forum?id=----0479.


If your query concerns a potential software error in Spectrum Scale (GPFS)
and you have an IBM software maintenance contract please contact
1-800-237-5511 in the United States or your local IBM Service Center in
other countries.

The forum is informally monitored as time permits and should not be used
for priority messages to the Spectrum Scale (GPFS) team.



From:   John Hearns 
To: gpfsug main discussion list 
Date:   02/21/2018 05:28 PM
Subject:[gpfsug-discuss] Finding all bulletins and APARs
Sent by:gpfsug-discuss-boun...@spectrumscale.org



Firstly, let me apologise for not thanking people who have replied to me on 
this list with help. I have indeed replied and thanked you; however, the list 
software has taken a dislike to my email address.

I am currently on the myibm support site, looking for a specific APAR for 
Spectrum Scale. However, I want to be able to get an up-to-date list of all 
APARs and bulletins for Spectrum Scale. I do get email alerts, but I suspect 
I am not getting them all, and it is a pain to search back through old email.

Thanks
John H



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss