Re: nfstimeout on server ISILON storage
ISP is not always the best when it comes to parallelization, and it needs a helping hand:

- Put the DB on the best flash you can afford. It has increasingly become our bottleneck, particularly when backing up LOTS of small files.
- Break up the backup jobs into LOTS of sessions using proxy nodes, the number of dsmc clients, and resource utilisation.
- If you are writing the backups to a file storage pool on the Isilon, use multiple NFS mounts and multiple Isilon nodes.
- Increase the maximum number of mount points for ISP and the storage pool so that it is larger than the total number of sessions, i.e. #proxy_nodes * #dsmc_processes * resource_utilisation.
- Decrease the size of the file pool volumes so that there are at LEAST as many volumes as mount points.
- Check your client networking options and kernel settings.

That's all I can think of at the moment.

HTH
Grant

From: ADSM: Dist Stor Manager on behalf of Zoltan Forray
Sent: Friday, 7 September 2018 4:22 AM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] nfstimeout on server ISILON storage

>>> Are the timeouts repeatable enough that you can get a packet capture in there before and while they're happening?

They happen often/sometimes all-the-time if there is any kind of storagepool activity. Looking through /var/log/messages, it happened almost every 5 minutes starting from before 8pm yesterday and stopped around 3am. Looking through the ISP server logs, I see reclaims ending around the time the messages stopped. Before that there were Identify/dedupe processes and a DB backup (upstream to one of the ISP servers at my physical location; the Earth server is offsite, used solely for DB backups and as a replication target).

As my SAN person said, maybe we are expecting too much from the ISILON/NFS. Unfortunately, it was/is the cheapest solution since I need the 500TB (almost always at 90% used even with dedup).
We have been working with networking since we are also addressing the issue of seeing lots of completely unrelated TCP traffic/broadcasts on the same VLAN as the NFS storage. However, a few days ago they moved it to a new VLAN and the "extraneous noise" has stopped.
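Grant's sizing rule for mount points and FILE volumes is simple arithmetic, sketched below. This is a minimal illustration that assumes, as his formula does, that each unit of resource utilisation adds one session; in practice the number of sessions a given RESOURCEUTILIZATION value opens is not a strict 1:1 multiple, so treat the result as an estimate. The function names, the node/process counts, and the 500,000 GB pool size (roughly Zoltan's 500TB) are hypothetical example values.

```python
def mount_limit(proxy_nodes: int, dsmc_processes: int, resource_utilisation: int) -> int:
    """Smallest mount-point limit strictly larger than the estimated session count."""
    # Rule of thumb from the thread: sessions = proxy_nodes * dsmc_processes * resource_utilisation
    sessions = proxy_nodes * dsmc_processes * resource_utilisation
    return sessions + 1


def max_volume_gb(pool_capacity_gb: int, limit: int) -> int:
    """Largest whole-GB volume size that still yields at least `limit` volumes."""
    if limit <= 0 or limit > pool_capacity_gb:
        raise ValueError("need 0 < limit <= pool capacity")
    return pool_capacity_gb // limit


if __name__ == "__main__":
    # Hypothetical example: 4 proxy nodes, 4 dsmc processes each, resource utilisation 10,
    # writing to a pool of roughly 500 TB (500,000 GB).
    limit = mount_limit(proxy_nodes=4, dsmc_processes=4, resource_utilisation=10)
    size_gb = max_volume_gb(pool_capacity_gb=500_000, limit=limit)
    volumes = 500_000 // size_gb
    print(f"mount limit >= {limit}, volume size <= {size_gb} GB, giving {volumes} volumes")
```

In ISP terms these numbers presumably map to settings such as the FILE device class MOUNTLIMIT and MAXCAPACITY and the node MAXNUMMP; check the ISP documentation before changing any of them.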
Re: nfstimeout on server ISILON storage
If anyone is getting slow backup performance, either:

- you are not using it right, or
- a NAS is not right for your workflow.

We are able to archive 50+TB of source data per day, written to both onsite and offsite tape. We have a method of scaling that further, but that is sufficient for us at the moment. Our biggest bottleneck is the TSM database, not how fast we can get the data off the storage.

Slight corrections:

- "each client can only talk to one Isilon node PER MOUNT"
- NFS and Isilon are not slow, nor are they sequential.
- It is the lack of scalable multithreading in the TSM agent that makes it slow and cumbersome, not Isilon or NFS.
- It is the lack of snapshot/snapdiff-aware backups in the TSM agent that makes complete backups happen in an "inefficient way".
- Isilon is a scalable NAS that can be very fast. Being a NAS, it is restricted by the latencies of TCP networking. If you're after storage that is faster than what network speeds/throughputs can provide, you should be looking at other storage solutions.

If anyone would like further clarification on these points, I'm only too happy to give more information or share our experience.

Grant

From: ADSM: Dist Stor Manager on behalf of Frank Kraemer
Sent: Friday, 7 September 2018 7:50 AM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] nfstimeout on server ISILON storage

Isilon = Slow Performance

- Although it has a parallel filesystem inside (OneFS), each client node can only talk to a single Isilon node using standard NAS protocols; that node then performs parallel I/O across the internal high-speed IB network to the other storage nodes.
- NFS client nodes (= TSM server) have to use slow, non-parallel data access over Ethernet to the Isilon. NFS v3 is technology from 1986, designed with the networks of that time in mind.
- There is no direct client IB or high-speed network I/O with RDMA support, so single-client performance is poor in comparison to other real filesystems that scale.
- Multiple NFS mounts from the same client (TSM server) to the Isilon box can help a little, but the setup is clumsy and this is not real parallel I/O - it's a hack! Still slow.
- "Magic tools" like dsmisi from (?) can NOT fix this problem; they just hide the multiple-NFS-mount mess a little bit and cost way too much money.
- For backups, where large I/Os are the norm, NFS is the most inefficient way of using your resources.
- Get a real scalable filesystem, use a single mountpoint, and drive your networks with optimal I/O speed.

-frank-

Frank Kraemer
IBM Consulting IT Specialist / Client Technical Architect
Am Weiher 24, 65451 Kelsterbach
mailto:kraem...@de.ibm.com
voice: +49-(0)171-3043699 / +4970342741078
IBM Germany

--
Grant Street
Senior Systems Engineer
T: +61 2 9383 4800 (main)
D: +61 2 8310 3582 (direct)
E: grant.str...@al.com.au
Building 54 / FSA #19, Fox Studios Australia, 38 Driver Avenue
Moore Park, NSW 2021
AUSTRALIA
www.animallogic.com
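Grant's "one Isilon node PER MOUNT" point is what makes the multiple-mount approach work: each mount targets a different node. The following is a minimal sketch that generates /etc/fstab entries round-robin across Isilon node addresses, plus a comma-separated directory list of the kind an ISP FILE device class DIRECTORY parameter takes. The node IPs, the per-mount `volNN` subdirectories (assumed to exist and be mountable), and the mount-point names are all hypothetical. The mount options are copied from the mount output posted in this thread, minus the values the kernel fills in itself; they are not a tested recommendation.

```python
from typing import List, Tuple

# Options from the mount output posted in this thread; kernel-derived values
# (addr, mountaddr, mountport, namlen) are omitted.
NFS_OPTS = "rw,relatime,sync,vers=3,rsize=131072,wsize=524288,hard,proto=tcp,timeo=600,retrans=2,sec=sys"


def plan_mounts(node_ips: List[str], export: str, base: str, count: int) -> Tuple[List[str], str]:
    """Spread `count` NFS mounts round-robin over Isilon node IPs.

    Returns fstab lines and a comma-separated directory list for a FILE device class.
    """
    if not node_ips:
        raise ValueError("need at least one node address")
    fstab, dirs = [], []
    for i in range(count):
        ip = node_ips[i % len(node_ips)]  # each mount talks to exactly one node
        mountpoint = f"{base}{i:02d}"
        # Separate subdirectory per mount so ISP does not see one directory under several paths.
        fstab.append(f"{ip}:{export}/vol{i:02d} {mountpoint} nfs {NFS_OPTS} 0 0")
        dirs.append(mountpoint)
    return fstab, ",".join(dirs)


if __name__ == "__main__":
    # Hypothetical addresses and paths, for illustration only.
    lines, directory = plan_mounts(["192.0.2.11", "192.0.2.12", "192.0.2.13"],
                                   "/ifs/NFS/TSM", "/tsmnfs", 6)
    print("\n".join(lines))
    print("DIRECTORY=" + directory)
```

Round-robin is the simplest spread. If the Isilon nodes differ in load or link speed, a weighted assignment may suit better.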
Re: nfstimeout on server ISILON storage
Isilon = Slow Performance

- Although it has a parallel filesystem inside (OneFS), each client node can only talk to a single Isilon node using standard NAS protocols; that node then performs parallel I/O across the internal high-speed IB network to the other storage nodes.
- NFS client nodes (= TSM server) have to use slow, non-parallel data access over Ethernet to the Isilon. NFS v3 is technology from 1986, designed with the networks of that time in mind.
- There is no direct client IB or high-speed network I/O with RDMA support, so single-client performance is poor in comparison to other real filesystems that scale.
- Multiple NFS mounts from the same client (TSM server) to the Isilon box can help a little, but the setup is clumsy and this is not real parallel I/O - it's a hack! Still slow.
- "Magic tools" like dsmisi from (?) can NOT fix this problem; they just hide the multiple-NFS-mount mess a little bit and cost way too much money.
- For backups, where large I/Os are the norm, NFS is the most inefficient way of using your resources.
- Get a real scalable filesystem, use a single mountpoint, and drive your networks with optimal I/O speed.

-frank-

Frank Kraemer
IBM Consulting IT Specialist / Client Technical Architect
Am Weiher 24, 65451 Kelsterbach
mailto:kraem...@de.ibm.com
voice: +49-(0)171-3043699 / +4970342741078
IBM Germany
Re: nfstimeout on server ISILON storage
>>> Are the timeouts repeatable enough that you can get a packet capture in there before and while they're happening?

They happen often/sometimes all-the-time if there is any kind of storagepool activity. Looking through /var/log/messages, it happened almost every 5 minutes starting from before 8pm yesterday and stopped around 3am. Looking through the ISP server logs, I see reclaims ending around the time the messages stopped. Before that there were Identify/dedupe processes and a DB backup (upstream to one of the ISP servers at my physical location; the Earth server is offsite, used solely for DB backups and as a replication target).

As my SAN person said, maybe we are expecting too much from the ISILON/NFS. Unfortunately, it was/is the cheapest solution since I need the 500TB (almost always at 90% used even with dedup).

We have been working with networking since we are also addressing the issue of seeing lots of completely unrelated TCP traffic/broadcasts on the same VLAN as the NFS storage. However, a few days ago they moved it to a new VLAN and the "extraneous noise" has stopped.

On Wed, Sep 5, 2018 at 7:29 PM Skylar Thompson wrote:
> Yep, you're right, I misread that (shouldn't send email pre-coffee).
>
> Are the timeouts repeatable enough that you can get a packet capture in
> there before and while they're happening?
Re: [EXTERNAL] Re: nfstimeout on server ISILON storage
Zoltan,

Here is the info on the LACP issue. I recently ran into it during a firmware/OneFS upgrade. Be advised that there is a fix for it; it may be helpful to look up the topic in the OneFS release notes. This example is 8.0.0.7: https://emcservice.force.com/CustomersPartners/kA5f100L0KCCA0

In OneFS 8.0.0.6, 8.0.1.2, 8.1.0.2, and 8.1.1.1, when some of the ports of a lagg interface in LACP mode go down with other ports still active, OneFS treats the lagg interface as down with No Carrier status. If this lagg is the only interface for the front-end network, connection to the node from outside the cluster will be lost.

These are the possible options for this upgrade:

1. Postpone the upgrade until the next Maintenance Release is available, which will resolve this issue.
2. Remove LACP as an interim step and proceed with the scheduled upgrade. A patch will be made available in the near future, at which time you can re-enable LACP.
3. Proceed with the upgrade inclusive of the LACP configuration, taking into consideration the potential connectivity issues and being aware of the risk of DU... not recommended.

I initially put the upgrade on hold. They recently notified me that code addressing the issue had been released; we completed the upgrade and so far there have been no issues. If your Isilon is vulnerable to the issue, it should surface in the pre-upgrade assessment.

Rick Adamson
Information Technology
Southeastern Grocers LLC

-----Original Message-----
From: ADSM: Dist Stor Manager On Behalf Of Zoltan Forray
Sent: Wednesday, September 5, 2018 3:07 PM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] [EXTERNAL] Re: nfstimeout on server ISILON storage

Rick,

Thanks for the reply.
I passed your comments to my SAN guy and he said:

*Earth is connected to HHIsilon and it is running 8.1.0.4*
*All the others are connected to ISPIsilon and it is running 8.0.0.4*
*I'm pretty sure all the Cisco switchports are LACP and Isilon networking is configured for LACP. Can that person elaborate on that bug?*
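The bug Rick describes is on the OneFS side of the link, so it has to be checked on the cluster (or through the pre-upgrade assessment he mentions). On the Linux TSM server side of Zoltan's bonded 10G pair, a quick sanity check is whether the bond is actually in 802.3ad (LACP) mode with every member link up. The sketch below is minimal and assumes the Linux bonding driver's /proc/net/bonding/<bond> status file. The bond name (bond0), the interface names, and the sample text are hypothetical and were not captured from Earth.

```python
from typing import Dict, Tuple


def parse_bonding(text: str) -> Tuple[str, Dict[str, str]]:
    """Return the bonding mode and a {slave interface: MII status} map."""
    mode, slaves, current = "", {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Bonding Mode:"):
            mode = line.split(":", 1)[1].strip()
        elif line.startswith("Slave Interface:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("MII Status:") and current is not None and current not in slaves:
            # The first MII Status after a 'Slave Interface' line belongs to that slave;
            # the one before any slave describes the bond itself and is skipped.
            slaves[current] = line.split(":", 1)[1].strip()
    return mode, slaves


if __name__ == "__main__":
    # Illustrative excerpt; on a live host use open("/proc/net/bonding/bond0").read().
    sample = """Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

Slave Interface: ens1f0
MII Status: up
Speed: 10000 Mbps

Slave Interface: ens1f1
MII Status: down
Speed: Unknown
"""
    mode, slaves = parse_bonding(sample)
    print(mode)
    for name, status in slaves.items():
        if status != "up":
            print(f"{name} is {status}: the bond is running degraded")
```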
Re: [EXTERNAL] Re: nfstimeout on server ISILON storage
I'm not sure of the specifics, and our Isilon person is out on vacation this week. Sorry.

-----Original Message-----
From: ADSM: Dist Stor Manager On Behalf Of Zoltan Forray
Sent: Wednesday, September 5, 2018 3:07 PM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [EXTERNAL] Re: nfstimeout on server ISILON storage

Rick,

Thanks for the reply. I passed your comments to my SAN guy and he said:

*Earth is connected to HHIsilon and it is running 8.1.0.4*
*All the others are connected to ISPIsilon and it is running 8.0.0.4*
*I'm pretty sure all the Cisco switchports are LACP and Isilon networking is configured for LACP. Can that person elaborate on that bug?*
Re: nfstimeout on server ISILON storage
Yep, you're right, I misread that (shouldn't send email pre-coffee).

Are the timeouts repeatable enough that you can get a packet capture in there before and while they're happening?

On Wed, Sep 05, 2018 at 07:09:09PM -0400, Zoltan Forray wrote:
> Skylar,
>
> I sent your comment about UDP vs TCP to my OS tech (beyond my ken) - got this feedback:
>
> I assume what they are talking about is this:
>
> hhisilonnfs23.rams.adp.vcu.edu:/ifs/NFS/TSM on /tsmnfs type nfs (rw,relatime,sync,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.19.12,mountvers=3,mountport=300,*mountproto=udp*,local_lock=none,addr=192.168.19.12)
>
> Looks like this is the default setting (also on all the other servers) to initiate a conversation with the NFS server. However, if you read the documentation on this option, it goes into detail about how this option differs from proto (which is also defined):
>
> https://access.redhat.com/solutions/183583
>
> "mountproto differs from proto as it defines what protocol (TCP or UDP) the client will use to initiate the connection and conduct the mount and umount operations. This differs from the proto option which sets the protocol that the initial connection *and* the actual transportation will use."
>
> The proto option (set to TCP in the mount) appears to be determining how the actual connection and transport of data is conducted.
>
> When running a tcpdump on Earth I see NFS TCP traffic running over the 23 VLAN (and the 22 VLAN on other TSM servers) and no UDP packets to speak of.
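Skylar's point about fragmentation with large NFS payloads over UDP can be made concrete with IPv4 fragmentation arithmetic. A minimal sketch, assuming a 1500-byte MTU, a 20-byte IPv4 header without options, and an 8-byte UDP header, and ignoring the RPC/NFS headers; it counts the fragments needed to carry one UDP datagram of a given payload size. As the thread establishes, proto=tcp means the data path here was TCP, so this only illustrates why UDP transport is a poor fit for large rsize/wsize values.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes
MAX_UDP_PAYLOAD = 65_535 - IPV4_HEADER - UDP_HEADER  # 65,507 bytes over IPv4


def ipv4_fragments(udp_payload: int, mtu: int = 1500) -> int:
    """Number of IPv4 fragments needed to carry one UDP datagram."""
    if udp_payload > MAX_UDP_PAYLOAD:
        raise ValueError("payload does not fit in a single IPv4 UDP datagram")
    ip_payload = udp_payload + UDP_HEADER
    # Every fragment except the last must carry a multiple of 8 bytes of IP payload.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


if __name__ == "__main__":
    for size in (1_024, 8_192, 32_768):
        print(size, ipv4_fragments(size))  # 1, 6 and 23 fragments respectively
```

Losing any single fragment discards the whole datagram, so the NFS client has to retransmit the entire RPC; that is the practical cost of heavy fragmentation.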
Re: nfstimeout on server ISILON storage
Skylar,

I sent your comment about UDP vs TCP to my OS tech (beyond my ken) and got this feedback:

I assume what they are talking about is this:

hhisilonnfs23.rams.adp.vcu.edu:/ifs/NFS/TSM on /tsmnfs type nfs
(rw,relatime,sync,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.19.12,mountvers=3,mountport=300,*mountproto=udp*,local_lock=none,addr=192.168.19.12)

This looks like the default setting for initiating a conversation with the NFS server (it is set the same way on all the other servers). However, the documentation for this option explains in detail how it differs from proto (which is also defined): https://access.redhat.com/solutions/183583

"mountproto differs from proto as it defines what protocol (TCP or UDP) the client will use to initiate the connection and conduct the mount and umount operations. This differs from the proto option which sets the protocol that the initial connection *and* the actual transportation will use."

The proto option (set to TCP in the mount) appears to determine how the actual connection and data transport are conducted. When running a tcpdump on Earth, I see NFS TCP traffic running over the 23 VLAN (and the 22 VLAN on the other TSM servers) and no UDP packets to speak of.
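The proto/mountproto distinction can be checked mechanically. The following Python sketch is an illustration added here, not something from the thread: the helper names are hypothetical, and it assumes the input is the parenthesized option list printed by `mount`. It reports which transport carries the NFS data (proto) versus the MOUNT/UMOUNT requests (mountproto), and converts timeo, which the nfs(5) man page documents in tenths of a second, to seconds.

```python
"""Summarize the transport-related options of a Linux NFS mount.

Illustrative sketch only: helper names are hypothetical and the input is
assumed to be the parenthesized option list printed by `mount`.
"""


def parse_nfs_options(opts: str) -> dict:
    """Split 'rw,hard,proto=tcp,...' into a dict; bare flags map to True."""
    parsed = {}
    for item in opts.strip().strip("()").split(","):
        item = item.strip().strip("*")  # tolerate emphasis copied from email
        if not item:
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def summarize(opts: str) -> dict:
    o = parse_nfs_options(opts)
    return {
        # proto: transport used for the NFS traffic itself (reads/writes)
        "data_transport": o.get("proto"),
        # mountproto: transport used only for MOUNT/UMOUNT requests to mountd
        "mount_transport": o.get("mountproto"),
        # timeo is expressed in deciseconds (tenths of a second)
        "timeo_seconds": int(o["timeo"]) / 10 if "timeo" in o else None,
        "retrans": int(o["retrans"]) if "retrans" in o else None,
        "hard": o.get("hard", False) is True,
    }


if __name__ == "__main__":
    earth = ("(rw,relatime,sync,vers=3,rsize=131072,wsize=524288,namlen=255,"
             "hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.19.12,"
             "mountvers=3,mountport=300,mountproto=udp,local_lock=none,"
             "addr=192.168.19.12)")
    print(summarize(earth))
```

Run against the Earth mount options quoted above, it shows tcp as the data transport and udp only for mount/unmount, with a 60-second timeo, which is consistent with the tcpdump showing no UDP data traffic.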
Re: [EXTERNAL] Re: nfstimeout on server ISILON storage
Rick,

Thanks for the reply. I passed your comments to my SAN guy and he said:

*Earth is connected to HHIsilon, which is running 8.1.0.4.*
*All the others are connected to ISPIsilon, which is running 8.0.0.4.*
*I'm pretty sure all the Cisco switchports are LACP and Isilon networking is configured for LACP. Can that person elaborate on that bug?*
Re: [EXTERNAL] Re: nfstimeout on server ISILON storage
We run our Isilon systems (OneFS 8.0.0.6) with active/passive failover. When we put them in, OneFS had a bug with LACP and it wouldn't work, forcing us to go active/passive. Due to other problems, we just discussed with them converting from active/passive to LACP, but EMC said there is still a bug in LACP support.

Rick
Re: nfstimeout on server ISILON storage
It looks like you're using UDP as a transport - have you tried switching to TCP? Especially with large NFS payload sizes, you're going to get lots of fragmentation with UDP's 512-byte packet limit.

--
-- Skylar Thompson (skyl...@u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine
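To put rough numbers on the fragmentation concern, here is a Python sketch (added for illustration, not from the thread) that counts the IPv4 fragments needed to carry one UDP datagram. It assumes a 20-byte IPv4 header with no options and takes only the UDP payload size, ignoring RPC/NFS headers; the per-fragment size is set by the link MTU. It only matters if NFS data actually travels over UDP, which the proto=tcp mount option quoted in the thread suggests is not the case here.

```python
import math

IPV4_HEADER = 20         # bytes, assuming no IP options
UDP_HEADER = 8           # bytes
MAX_UDP_PAYLOAD = 65507  # 65535 - 20 (IPv4 header) - 8 (UDP header)
MIN_IPV4_MTU = 68        # smallest MTU every IPv4 link must support


def ipv4_fragments(udp_payload: int, mtu: int = 1500) -> int:
    """Count the IPv4 fragments needed to carry one UDP datagram.

    The UDP header travels in the first fragment, and every fragment except
    the last must carry a multiple of 8 bytes of IP payload, so the usable
    space per fragment is (mtu - 20) rounded down to a multiple of 8.
    """
    if not 0 <= udp_payload <= MAX_UDP_PAYLOAD:
        raise ValueError("UDP payload must be between 0 and 65507 bytes")
    if mtu < MIN_IPV4_MTU:
        raise ValueError("MTU below the IPv4 minimum of 68 bytes")
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil((udp_payload + UDP_HEADER) / per_fragment)


if __name__ == "__main__":
    for size in (1472, 8192, 32768):
        print(size, "bytes ->", ipv4_fragments(size), "fragments at MTU 1500,",
              ipv4_fragments(size, mtu=9000), "at MTU 9000")
```

A 32 KB payload needs 23 fragments on a standard 1500-byte Ethernet MTU, and losing any one of them means the whole datagram has to be resent, which is why large payloads over UDP tend to behave badly on a busy network.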
Re: nfstimeout on server ISILON storage
A pair of 10G links bonded - CISCO switches.

--
*Zoltan Forray*
Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator
Xymon Monitor Administrator
VMware Administrator
Virginia Commonwealth University
UCC/Office of Technology Services
www.ucc.vcu.edu
zfor...@vcu.edu - 804-828-4807
Don't be a phishing victim - VCU and other reputable organizations will never use email to request that you reply with your password, social security number or confidential personal information. For more details visit http://phishing.vcu.edu/
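Since the server links are bonded, one way to confirm that the Linux side of the bond is really negotiating LACP is to read the bonding driver's status file. This Python sketch is illustrative only: SAMPLE is a trimmed, made-up dump in the format of /proc/net/bonding/<bond>, the interface names are hypothetical, and it says nothing about the Isilon side, where the LACP bug Rick mentions would live.

```python
# Trimmed, made-up example of a /proc/net/bonding/<bond> dump (names hypothetical).
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

Slave Interface: eth0
MII Status: up
Speed: 10000 Mbps
Aggregator ID: 1

Slave Interface: eth1
MII Status: up
Speed: 10000 Mbps
Aggregator ID: 1
"""


def parse_bonding(text: str) -> dict:
    """Pull the bonding mode and per-member status out of a bonding dump."""
    summary = {"mode": None, "slaves": []}
    current = None  # the member section currently being read
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            summary["mode"] = value
        elif key == "Slave Interface":
            current = {"name": value, "mii": None, "speed": None, "aggregator": None}
            summary["slaves"].append(current)
        elif current is not None and key == "MII Status":
            current["mii"] = value
        elif current is not None and key == "Speed":
            current["speed"] = value
        elif current is not None and key == "Aggregator ID":
            current["aggregator"] = value
    return summary


def looks_like_healthy_lacp(summary: dict) -> bool:
    """True if the bond runs 802.3ad, every member is up, and all share one aggregator."""
    slaves = summary["slaves"]
    if not slaves or "802.3ad" not in (summary["mode"] or ""):
        return False
    if any(s["mii"] != "up" for s in slaves):
        return False
    # Members split across aggregator IDs usually means not every switch port
    # is negotiating LACP, so only one aggregator ends up carrying traffic.
    return len({s["aggregator"] for s in slaves}) == 1


if __name__ == "__main__":
    info = parse_bonding(SAMPLE)
    print(info["mode"], [s["name"] for s in info["slaves"]])
    print("healthy LACP:", looks_like_healthy_lacp(info))
```

On a real host you would pass in the contents of the status file, for example `open("/proc/net/bonding/bond0").read()`, where bond0 is a placeholder for the actual bond name.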
Re: nfstimeout on server ISILON storage
Quick question - what's the data link protocol (Ethernet, IB, etc.) and link rate that you're using?

--
-- Skylar Thompson (skyl...@u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine
nfstimeout on server ISILON storage
We are still fighting issues with ISILON storage. Our current issue is with NFS timeouts for the storage a server is using. We see messages like these in the server /var/log:

Sep 4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
Sep 4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
Sep 4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
Sep 4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
Sep 4 13:22:14 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
Sep 4 13:22:15 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
Sep 4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu OK
Sep 4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu OK
Sep 4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu OK

OS folks say the NFS mount is set up as IBM recommends in various documents. So they asked us to implement the nfstimeout option from this document (https://www.ibm.com/support/knowledgecenter/en/SSGSG7_7.1.0/com.ibm.itsm.client.doc/r_opt_nfstimeout.html). Yes, I realize it is primarily for client backups of an NFS mount, but the statement:

Supported Clients: This option is for all UNIX and Linux clients. *The server can also define this option*.

throws us, since it kind of implies I can use it from the server side. But I can't find any documentation to support using it on the server.

For you Linux gurus, this is what the mount says:

hhisilonnfs23.rams.adp.vcu.edu:/ifs/NFS/TSM on /tsmnfs type nfs
(rw,relatime,sync,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.19.12,mountvers=3,mountport=300,mountproto=udp,local_lock=none,addr=192.168.19.12)

Any thoughts? Suggestions? Are we simply expecting too much from NFS?

My OS person also asks why ISP is so slow to write to NFS. When they did a test copy of a large file to the NFS mount, they were getting upwards of 8G/s vs 1.5-3G/s when TSM/ISP writes to it (measured with EMC monitoring tools).

--
*Zoltan Forray*
Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator
Xymon Monitor Administrator
VMware Administrator
Virginia Commonwealth University
UCC/Office of Technology Services
www.ucc.vcu.edu
zfor...@vcu.edu - 804-828-4807
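The kernel messages in the log excerpt above can be turned into stall durations, which makes it easier to line them up with storage pool activity on the ISP server. This Python sketch is an illustration, not part of the original post: it assumes the standard syslog prefix shown in the excerpt, pairs each server's first "not responding" message with the next "OK", and needs the year supplied because syslog omits it.

```python
import re
from datetime import datetime

# Matches e.g. "Sep 4 13:21:49 earth kernel: nfs: server HOST not responding, still trying"
NFS_LINE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"nfs: server (?P<server>\S+) (?P<state>not responding, still trying|OK)\s*$"
)


def outage_windows(lines, year=2018):
    """Return (server, start, end, seconds) for each not-responding -> OK span."""
    open_since = {}  # server -> time of first "not responding" in the current stall
    windows = []
    for line in lines:
        m = NFS_LINE.match(line.strip())
        if not m:
            continue  # ignore unrelated syslog traffic
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        server = m["server"]
        if m["state"] == "OK":
            start = open_since.pop(server, None)
            if start is not None:  # extra "OK" lines for a closed stall are skipped
                windows.append((server, start, ts, (ts - start).total_seconds()))
        else:
            open_since.setdefault(server, ts)  # keep the earliest timestamp
    return windows


if __name__ == "__main__":
    host = "hhisilonnfs23.rams.adp.vcu.edu"
    stamps = ["13:21:49"] * 4 + ["13:22:14", "13:22:15"]
    sample = [f"Sep 4 {t} earth kernel: nfs: server {host} not responding, still trying"
              for t in stamps]
    sample += [f"Sep 4 13:22:16 earth kernel: nfs: server {host} OK"] * 3
    for server, start, end, secs in outage_windows(sample):
        print(f"{server}: {start:%H:%M:%S} -> {end:%H:%M:%S} ({secs:.0f} s)")
```

Fed the nine lines quoted above, it reports a single stall of 27 seconds (13:21:49 to 13:22:16). For a full day, pass an open handle to /var/log/messages instead of the sample list.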