Re: nfstimeout on server ISILON storage

2018-09-12 Thread Grant Street
ISP is not always the best when it comes to parallelization, and it needs a
helping hand:
- Put the DB on the best flash you can afford. It has increasingly become our
bottleneck, particularly when backing up LOTS of small files.
- Break up the backup jobs into LOTS of sessions using proxy nodes, the number of
dsmc clients and resource utilisation.
- If you are writing the backups to a file storage pool on the Isilon, use
multiple NFS mounts and multiple Isilon nodes.
- Increase the maximum number of mount points for ISP and the storage pool so
that it is larger than the total number of sessions, i.e. #proxy_nodes *
#dsmc_processes * resource_utilisation.
- Decrease the size of the file pool volumes so that there can be at LEAST the
same number of volumes as mount points.
- Check your client networking options and kernel settings.
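
The sizing arithmetic in the list above can be sketched roughly as follows. This is only an illustration: the function names, the example counts (4 proxy nodes, 4 dsmc processes, resource utilisation 10), the 500,000 GB pool size and the headroom of one extra mount point are all assumptions made for the example, not recommendations or ISP settings.

```python
def total_sessions(proxy_nodes, dsmc_processes, resource_utilisation):
    """Total sessions = #proxy_nodes * #dsmc_processes * resource_utilisation."""
    return proxy_nodes * dsmc_processes * resource_utilisation


def sizing(proxy_nodes, dsmc_processes, resource_utilisation,
           pool_capacity_gb, headroom=1):
    """Return a rough sizing plan (all inputs are hypothetical example values)."""
    sessions = total_sessions(proxy_nodes, dsmc_processes, resource_utilisation)
    # The mount point limit should be larger than the total number of sessions.
    mount_limit = sessions + headroom
    # Floor division keeps volumes small enough that at least
    # mount_limit of them fit into the pool.
    max_volume_gb = pool_capacity_gb // mount_limit
    return {
        "sessions": sessions,
        "mount_limit": mount_limit,
        "max_volume_gb": max_volume_gb,
    }


plan = sizing(proxy_nodes=4, dsmc_processes=4, resource_utilisation=10,
              pool_capacity_gb=500_000)
print(plan)
```

The point of the sketch is only that the mount point limit and the volume size are both derived from the session count, so they need revisiting whenever you add proxy nodes or dsmc processes.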

That's all I can think of at the moment

HTH

Grant

From: ADSM: Dist Stor Manager  on behalf of Zoltan Forray 

Sent: Friday, 7 September 2018 4:22 AM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] nfstimeout on server ISILON storage

>>> Are the timeouts repeatable enough that you can get a packet capture
in there before and while they're happening?

They happen often/sometimes all-the-time if there is any kind of
storagepool activity.  Looking through /var/log/messages - it happened
almost every 5 minutes, starting from before 8pm yesterday and
stopping around 3am.  Looking through the ISP server logs I see reclaims
ending around the time the messages stopped.  Before that there were
Identify/dedupe processes and a DB backup (upstream to one of the ISP servers
at my physical location; the Earth server is offsite and used solely for DB
backups and as a replication target).

As my SAN person said, maybe we are expecting too much from the ISILON/NFS.
Unfortunately, it was/is the cheapest solution since I need the 500TB
(almost always at 90% used even with dedup).

We have been working with networking since we are also addressing the issue
of seeing lots of completely unrelated TCP traffic/broadcasts on the same
VLAN as the NFS storage.  However, a few days ago they moved it to a new
VLAN and the extraneous "noise" has stopped.

On Wed, Sep 5, 2018 at 7:29 PM Skylar Thompson  wrote:

> Yep, you're right, I misread that (shouldn't send email pre-coffee).
>
> Are the timeouts repeatable enough that you can get a packet capture in
> there before and while they're happening?
>
> On Wed, Sep 05, 2018 at 07:09:09PM -0400, Zoltan Forray wrote:
> > Skylar,
> >
> > I sent your comment about UDP vs TCP to my OS tech (beyond my ken) - got
> > this feedback:
> >
> > I assume what they are talking about is this:
> >
> > hhisilonnfs23.rams.adp.vcu.edu:/ifs/NFS/TSM on /tsmnfs type nfs
> > (rw,relatime,sync,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.19.12,mountvers=3,mountport=300,*mountproto=udp*,local_lock=none,addr=192.168.19.12)
> >
> > Looks like this is the default setting (also on all the other servers to
> > initiate a conversation with the NFS server). However, if you read the
> > documentation on this option it goes into detail about how this option
> > differs from proto (which is also defined):
> >
> > https://access.redhat.com/solutions/183583
> >
> > "mountproto differs from proto as it defines what protocol (TCP or UDP) the
> > client will use to initiate the connection and conduct the mount and
> > umount operations.
> > This differs from the proto option which sets the protocol that the initial
> > connection *and* the actual transportation will use."
> >
> > The proto option (set to TCP in the mount) appears to be determining how
> > the actual connection and transport of data is conducted.
> >
> > When running a tcpdump on Earth I see NFS TCP traffic running over the 23
> > VLAN (and the 22 VLAN on other TSM servers) and no UDP packets to speak of.
> >
> > On Wed, Sep 5, 2018 at 10:25 AM Skylar Thompson  wrote:
> >
> > > It looks like you're using UDP as a transport - have you tried
> switching to
> > > TCP? Especially with large NFS payload sizes, you're going to get lots
> of
> >

Re: nfstimeout on server ISILON storage

2018-09-12 Thread Grant Street
If anyone is getting slow backup performance, either
 - you are not using it right
 - a NAS is not right for your workflow

We are able to archive 50+TB of source data per day, written to both onsite and
offsite tape.  We have a method of scaling that further, but that is
sufficient for us at the moment.  Our biggest bottleneck is the TSM database,
not how fast we can get the data off the storage.

Slight corrections
- "each client can only talk to one Isilon node PER MOUNT"
- NFS and Isilon are not slow, nor are they sequential
- It is the lack of scalable multithreading in the TSM agent that makes it slow
and cumbersome, not Isilon nor NFS
- It is the lack of snapshot/snapdiff-aware backups in the TSM agent that makes
complete backups happen in an "inefficient way"
- Isilon is a scalable NAS that can be very fast. Being a NAS, it is
restricted by the latencies of TCP networking. If you're after storage that is
faster than what network speeds/throughputs can provide, you should be looking
at other storage solutions.
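
To illustrate the "one Isilon node PER MOUNT" point, here is a minimal sketch that spreads several NFS mounts round-robin across Isilon nodes and prints fstab-style lines. The node hostnames and the numbered mount point names are hypothetical; the export path and mount options are taken from the mount output quoted elsewhere in this thread, and the number of mounts is just an example.

```python
from itertools import cycle

# Options copied from the mount output quoted in this thread.
MOUNT_OPTS = ("rw,hard,vers=3,proto=tcp,rsize=131072,wsize=524288,"
              "timeo=600,retrans=2")


def fstab_lines(isilon_nodes, export, mount_base, mounts):
    """Build one fstab-style line per mount, cycling through the nodes."""
    lines = []
    nodes = cycle(isilon_nodes)
    for i in range(mounts):
        node = next(nodes)
        # Each mount point talks to exactly one node, so several mounts
        # are needed to reach several nodes from one client.
        lines.append(f"{node}:{export} {mount_base}{i:02d} nfs {MOUNT_OPTS} 0 0")
    return lines


# Hypothetical node names, for illustration only.
nodes = ["isilon-node1.example.com",
         "isilon-node2.example.com",
         "isilon-node3.example.com"]

for line in fstab_lines(nodes, "/ifs/NFS/TSM", "/tsmnfs", 6):
    print(line)
```

Each resulting mount point would then be used as a separate directory for the file storage pool, so that the sessions spread across the nodes.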

If anyone would like further clarification on these points, I'm only too happy to help
with more information or experience.

Grant


From: ADSM: Dist Stor Manager  on behalf of Frank Kraemer 

Sent: Friday, 7 September 2018 7:50 AM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] nfstimeout on server ISILON storage

Isilon = Slow Performance

- Although a parallel filesystem inside (OneFS), each client node can only
talk to a single Isilon node using standard NAS protocols, which then
performs parallel I/O across the internal high speed IB network to other
Storage Nodes.

- NFS Client nodes (=TSM Server) have to use slow non-parallel data access
over Ethernet to the Isilon. NFS v3 is technology from 1986 - designed with
the networks of that time in mind.

- No direct client IB or RDMA-enabled high-speed network I/O is
supported, so single client performance is poor in comparison to other real
filesystems that scale.

- Multiple NFS mounts from the same client (TSM Server) to the Isilon box
can help a little but the setup is clumsy and this is not real parallel I/O
- it's a hack! Still slow.

- "Magic tools" like dsmisi from (?) can NOT fix this problem, they just
hide the multiple NFS mount mess a little bit and cost way too much money.

- For backups where large I/Os are the norm, NFS is the most inefficient way
of using your resources.

- Get a real scalable filesystem, use a single mountpoint and drive your
networks with optimal I/O speed.

-frank-

Frank Kraemer
IBM Consulting IT Specialist  / Client Technical Architect
Am Weiher 24, 65451 Kelsterbach
mailto:kraem...@de.ibm.com
voice: +49-(0)171-3043699 / +4970342741078
IBM Germany
--
Grant Street
Senior Systems Engineer

T: +61 2 9383 4800 (main)
D: +61 2 8310 3582 (direct)
E: grant.str...@al.com.au

Building 54 / FSA #19, Fox Studios Australia, 38 Driver Avenue
Moore Park, NSW 2021
AUSTRALIA



Re: nfstimeout on server ISILON storage

2018-09-06 Thread Frank Kraemer
Isilon = Slow Performance

- Although a parallel filesystem inside (OneFS), each client node can only
talk to a single Isilon node using standard NAS protocols, which then
performs parallel I/O across the internal high speed IB network to other
Storage Nodes.

- NFS Client nodes (=TSM Server) have to use slow non-parallel data access
over Ethernet to the Isilon. NFS v3 is technology from 1986 - designed with
the networks of that time in mind.

- No direct client IB or RDMA-enabled high-speed network I/O is
supported, so single client performance is poor in comparison to other real
filesystems that scale.

- Multiple NFS mounts from the same client (TSM Server) to the Isilon box
can help a little but the setup is clumsy and this is not real parallel I/O
- it's a hack! Still slow.

- "Magic tools" like dsmisi from (?) can NOT fix this problem, they just
hide the multiple NFS mount mess a little bit and cost way too much money.

- For backups where large I/Os are the norm, NFS is the most inefficient way
of using your resources.

- Get a real scalable filesystem, use a single mountpoint and drive your
networks with optimal I/O speed.

-frank-

Frank Kraemer
IBM Consulting IT Specialist  / Client Technical Architect
Am Weiher 24, 65451 Kelsterbach
mailto:kraem...@de.ibm.com
voice: +49-(0)171-3043699 / +4970342741078
IBM Germany


Re: nfstimeout on server ISILON storage

2018-09-06 Thread Zoltan Forray
>>> Are the timeouts repeatable enough that you can get a packet capture
in there before and while they're happening?

They happen often/sometimes all-the-time if there is any kind of
storagepool activity.  Looking through /var/log/messages - it happened
almost every 5 minutes, starting from before 8pm yesterday and
stopping around 3am.  Looking through the ISP server logs I see reclaims
ending around the time the messages stopped.  Before that there were
Identify/dedupe processes and a DB backup (upstream to one of the ISP servers
at my physical location; the Earth server is offsite and used solely for DB
backups and as a replication target).
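
For anyone wanting to line the NFS messages up against ISP process times, a rough sketch like the one below could pull the stall windows out of /var/log/messages. The sample lines are the kernel messages from earlier in this thread; the function name is made up for the example, and the year is an assumption because syslog timestamps of this form do not carry one.

```python
from datetime import datetime

SAMPLE_LOG = """\
Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
Sep  4 13:22:14 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
Sep  4 13:22:15 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
Sep  4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu OK
Sep  4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu OK
"""


def stall_windows(lines, year=2018):
    """Return (start, end, seconds) for each 'not responding' -> 'OK' span."""
    windows = []
    stall_start = None
    for line in lines:
        if "kernel: nfs: server" not in line:
            continue
        month, day, clock = line.split()[:3]
        # The year is assumed; it is not present in the log line itself.
        stamp = datetime.strptime(f"{year} {month} {day} {clock}",
                                  "%Y %b %d %H:%M:%S")
        text = line.rstrip()
        if text.endswith("not responding, still trying"):
            if stall_start is None:
                stall_start = stamp
        elif text.endswith(" OK") and stall_start is not None:
            windows.append((stall_start, stamp,
                            (stamp - stall_start).total_seconds()))
            stall_start = None
    return windows


windows = stall_windows(SAMPLE_LOG.splitlines())
for start, end, seconds in windows:
    print(f"{start} -> {end}: {seconds:.0f}s")
```

The start and end times of each window can then be compared with reclaim, identify and DB backup start/stop times in the ISP activity log.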

As my SAN person said, maybe we are expecting too much from the ISILON/NFS.
Unfortunately, it was/is the cheapest solution since I need the 500TB
(almost always at 90% used even with dedup).

We have been working with networking since we are also addressing the issue
of seeing lots of completely unrelated TCP traffic/broadcasts on the same
VLAN as the NFS storage.  However, a few days ago they moved it to a new
VLAN and the extraneous "noise" has stopped.

On Wed, Sep 5, 2018 at 7:29 PM Skylar Thompson  wrote:

> Yep, you're right, I misread that (shouldn't send email pre-coffee).
>
> Are the timeouts repeatable enough that you can get a packet capture in
> there before and while they're happening?
>
> On Wed, Sep 05, 2018 at 07:09:09PM -0400, Zoltan Forray wrote:
> > Skylar,
> >
> > I sent your comment about UDP vs TCP to my OS tech (beyond my ken) - got
> > this feedback:
> >
> > I assume what they are talking about is this:
> >
> > hhisilonnfs23.rams.adp.vcu.edu:/ifs/NFS/TSM on /tsmnfs type nfs
> > (rw,relatime,sync,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.19.12,mountvers=3,mountport=300,*mountproto=udp*,local_lock=none,addr=192.168.19.12)
> >
> > Looks like this is the default setting (also on all the other servers to
> > initiate a conversation with the NFS server). However, if you read the
> > documentation on this option it goes into detail about how this option
> > differs from proto (which is also defined):
> >
> > https://access.redhat.com/solutions/183583
> >
> > "mountproto differs from proto as it defines what protocol (TCP or UDP) the
> > client will use to initiate the connection and conduct the mount and
> > umount operations.
> > This differs from the proto option which sets the protocol that the initial
> > connection *and* the actual transportation will use."
> >
> > The proto option (set to TCP in the mount) appears to be determining how
> > the actual connection and transport of data is conducted.
> >
> > When running a tcpdump on Earth I see NFS TCP traffic running over the 23
> > VLAN (and the 22 VLAN on other TSM servers) and no UDP packets to speak of.
> >
> > On Wed, Sep 5, 2018 at 10:25 AM Skylar Thompson  wrote:
> >
> > > It looks like you're using UDP as a transport - have you tried
> switching to
> > > TCP? Especially with large NFS payload sizes, you're going to get lots
> of
> > > fragmentation with UDP's 512-byte packet limit.
> > >
> > > On Wed, Sep 05, 2018 at 09:03:25AM -0400, Zoltan Forray wrote:
> > > > A pair of 10G links bonded - CISCO switches.
> > > >
> > > > On Tue, Sep 4, 2018 at 7:54 PM Skylar Thompson 
> wrote:
> > > >
> > > > > Quick question - what's the data link protocol (Ethernet, IB,
> etc.) and
> > > > > link rate
> > > > > that you're using?
> > > > >
> > > > > On Tue, Sep 04, 2018 at 02:05:33PM -0400, Zoltan Forray wrote:
> > > > > > We are still fighting issues with ISILON storage. Our current
> issue
> > > is
> > > > > with
> > > > > > NFS timeouts for the storage a server is using.  We see message
> like
> > > > > these
> > > > > > in the server /var/log
> > > > > >
> > > > > > Sep  4 13:21:49 earth kernel: nfs: server
> > > hhisilonnfs23.rams.adp.vcu.edu
> > > > > > not responding, still trying
> > > > > > Sep  4 13:21:49 earth kernel: nfs: server
> > > hhisilonnfs23.rams.adp.vcu.edu
> > > > > > not responding, still trying
> > > > > > Sep  4 13:21:49 earth kernel: nfs: server
> > > hhisilonnfs23.rams.adp.vcu.edu
> > > > > > not responding, still trying
> > > > > > Sep  4 13:21:49 earth kernel: nfs: server
> > > hhisilonnfs23.rams.adp.vcu.edu
> > > > > > not responding, still trying
> > > > > > Sep  4 13:22:14 earth kernel: nfs: server
> > > hhisilonnfs23.rams.adp.vcu.edu
> > > > > > not responding, still trying
> > > > > > Sep  4 13:22:15 earth kernel: nfs: server
> > > hhisilonnfs23.rams.adp.vcu.edu
> > > > > > not responding, still trying
> > > > > > Sep  4 13:22:16 earth kernel: nfs: server
> > > hhisilonnfs23.rams.adp.vcu.edu
> > > > > OK
> > > > > > Sep  4 13:22:16 earth kernel: nfs: server
> > > hhisilonnfs23.rams.adp.vcu.edu
> > > > > OK
> > > > > > Sep  4 13:22:16 earth kernel: nfs: server
> > > hhisilonnfs23.rams.adp.vcu.edu
> > > > > OK
> > > > > >
> > > > > > OS folks say the NFS mount is setup as IBM recommends in various
> > 

Re: [EXTERNAL] Re: nfstimeout on server ISILON storage

2018-09-06 Thread Rick Adamson
Zoltan,
Here is the info on the LACP issue. I recently ran into it during a 
firmware/OneFS upgrade.
Be advised that there is a fix for it; it may be helpful to reference the topic
in the OneFS release notes. This example is for 8.0.0.7:
https://emcservice.force.com/CustomersPartners/kA5f100L0KCCA0

In OneFS 8.0.0.6, 8.0.1.2, 8.1.0.2, and 8.1.1.1, when some of the ports of a 
lagg interface in LACP mode go down with other ports still active, OneFS treats 
the lagg interface as down with No Carrier status.  If this lagg is the only 
interface for front end network, connection to the node from outside the 
cluster will be lost.

These are the possible options for this upgrade:

1.  Postpone the upgrade until the next Maintenance Release is available, which 
will resolve this issue.

2.  Remove LACP as an interim step and proceed with the scheduled upgrade. A 
patch will be made available in the near future at which time you can re-enable 
LACP.

3.   Proceed with the upgrade inclusive of LACP configuration, taking into 
consideration the potential connectivity issues, being aware of the risk of 
DU... not recommended.

I initially put the upgrade on hold, and they recently notified me that code
addressing the issue had been released. We completed the upgrade and so far
there have been no issues.
If your Isilon is vulnerable to the issue, it should surface in the pre-upgrade
assessment.

Rick Adamson

Information Technology
Southeastern Grocers LLC


-Original Message-
From: ADSM: Dist Stor Manager  On Behalf Of Zoltan Forray
Sent: Wednesday, September 5, 2018 3:07 PM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] [EXTERNAL] Re: nfstimeout on server ISILON storage


Rick,

Thanks for the reply.  I passed your comments to my SAN guy and he said:

*Earth is connected to HHIsilon and it is running 8.1.0.4.*
*All the others are connected to ISPIsilon and it is running 8.0.0.4.*

*I'm pretty sure all the Cisco switchports are LACP and Isilon networking is 
configured for LACP. Can that person elaborate on that bug?*

On Wed, Sep 5, 2018 at 12:00 PM Rhodes, Richard L. < 
rrho...@firstenergycorp.com> wrote:

> We run our Isilon systems (OneFS 8.0.0.6) with active/passive failover.
> When we put them in, OneFS had a bug with LACP and it wouldn't work,
> forcing us to go active/passive.  Due to other problems, we just
> discussed with them converting from active/passive to LACP, but EMC
> said there is still a bug in LACP support.
>
> Rick
>
>
>
> -Original Message-
> From: ADSM: Dist Stor Manager  On Behalf Of
> Zoltan Forray
> Sent: Wednesday, September 5, 2018 9:03 AM
> To: ADSM-L@VM.MARIST.EDU
> Subject: [EXTERNAL] Re: nfstimeout on server ISILON storage
>
> A pair of 10G links bonded - CISCO switches.
>
> On Tue, Sep 4, 2018 at 7:54 PM Skylar Thompson  wrote:
>
> > Quick question - what's the data link protocol (Ethernet, IB, etc.)
> > and link rate that you're using?
> >
> > On Tue, Sep 04, 2018 at 02:05:33PM -0400, Zoltan Forray wrote:
> > > We are still fighting issues with ISILON storage. Our current
> > > issue is
> > with
> > > NFS timeouts for the storage a server is using.  We see message
> > > like
> > these
> > > in the server /var/log
> > >
> > > Sep  4 13:21:49 earth kernel: nfs: server
> hhisilonnfs23.rams.adp.vcu.edu
> > > not responding, still trying
> > > Sep  4 13:21:49 earth kernel: nfs: server
> hhisilonnfs23.rams.adp.vcu.edu
> > > not responding, still trying
> > > Sep  4 13:21:49 earth kernel: nfs: server
> hhisilonnfs23.rams.adp.vcu.edu
> > > not responding, still trying
> > > Sep  4 13:21:49 earth kernel: nfs: server
> hhisilonnfs23.rams.adp.vcu.edu
> > > not responding, still trying
> > > Sep  4 13:22:14 earth kernel: nfs: server
> hhisilonnfs23.rams.adp.vcu.edu
> > > not responding, still trying
> > > Sep  4 13:22:15 earth kernel: nfs: server
> hhisilonnfs23.rams.adp.vcu.edu
> > > not responding, still trying
> > > Sep  4 13:22:16 earth kernel: nfs: server
> hhisilonnfs23.rams.adp.vcu.edu
> > OK
> > > Sep  4 13:22:16 earth kernel: nfs: server
> hhisilonnfs23.rams.adp.vcu.edu
> > OK
> > > Sep  4 13:22:16 earth kernel: nfs: server
> hhisilonnfs23.rams.adp.vcu.edu
> > OK
> > >
> > > OS folks say the NFS mount is setup as IBM recommends in various
> > documents.
> > > So they asked us to implement the nfstimeout option from this
> > > document
> (
> > >
> >
> https://urldefense.proofpoint.com/v2/url?u=

Re: [EXTERNAL] Re: nfstimeout on server ISILON storage

2018-09-06 Thread Rhodes, Richard L.
I'm not sure of the specifics, and our isilon person is out on vacation this 
week.  Sorry.



-Original Message-
From: ADSM: Dist Stor Manager  On Behalf Of Zoltan Forray
Sent: Wednesday, September 5, 2018 3:07 PM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [EXTERNAL] Re: nfstimeout on server ISILON storage

Rick,

Thanks for the reply.  I passed your comments to my SAN guy and he said:

*Earth is connected to HHIsilon and it is running 8.1.0.4.*
*All the others are connected to ISPIsilon and it is running 8.0.0.4.*

*I'm pretty sure all the Cisco switchports are LACP and Isilon networking
is configured for LACP. Can that person elaborate on that bug?*

On Wed, Sep 5, 2018 at 12:00 PM Rhodes, Richard L. <
rrho...@firstenergycorp.com> wrote:

> We run our Isilon systems (OneFS 8.0.0.6) with active/passive failover.
> When we put them in, OneFS had a bug with LACP and it wouldn't work, forcing us
> to go active/passive.  Due to other problems, we just discussed with them
> converting from active/passive to LACP, but EMC said there is still a bug
> in LACP support.
>
> Rick
>
>
>
> -Original Message-
> From: ADSM: Dist Stor Manager  On Behalf Of Zoltan
> Forray
> Sent: Wednesday, September 5, 2018 9:03 AM
> To: ADSM-L@VM.MARIST.EDU
> Subject: [EXTERNAL] Re: nfstimeout on server ISILON storage
>
> A pair of 10G links bonded - CISCO switches.
>
> On Tue, Sep 4, 2018 at 7:54 PM Skylar Thompson  wrote:
>
> > Quick question - what's the data link protocol (Ethernet, IB, etc.) and
> > link rate
> > that you're using?
> >
> > On Tue, Sep 04, 2018 at 02:05:33PM -0400, Zoltan Forray wrote:
> > > We are still fighting issues with ISILON storage. Our current issue is
> > with
> > > NFS timeouts for the storage a server is using.  We see message like
> > these
> > > in the server /var/log
> > >
> > > Sep  4 13:21:49 earth kernel: nfs: server
> hhisilonnfs23.rams.adp.vcu.edu
> > > not responding, still trying
> > > Sep  4 13:21:49 earth kernel: nfs: server
> hhisilonnfs23.rams.adp.vcu.edu
> > > not responding, still trying
> > > Sep  4 13:21:49 earth kernel: nfs: server
> hhisilonnfs23.rams.adp.vcu.edu
> > > not responding, still trying
> > > Sep  4 13:21:49 earth kernel: nfs: server
> hhisilonnfs23.rams.adp.vcu.edu
> > > not responding, still trying
> > > Sep  4 13:22:14 earth kernel: nfs: server
> hhisilonnfs23.rams.adp.vcu.edu
> > > not responding, still trying
> > > Sep  4 13:22:15 earth kernel: nfs: server
> hhisilonnfs23.rams.adp.vcu.edu
> > > not responding, still trying
> > > Sep  4 13:22:16 earth kernel: nfs: server
> hhisilonnfs23.rams.adp.vcu.edu
> > OK
> > > Sep  4 13:22:16 earth kernel: nfs: server
> hhisilonnfs23.rams.adp.vcu.edu
> > OK
> > > Sep  4 13:22:16 earth kernel: nfs: server
> hhisilonnfs23.rams.adp.vcu.edu
> > OK
> > >
> > > OS folks say the NFS mount is setup as IBM recommends in various
> > documents.
> > > So they asked us to implement the nfstimeout option from this document
> (
> > >
> >
> https://www.ibm.com/support/knowledgecenter/en/SSGSG7_7.1.0/com.ibm.itsm.client.doc/r_opt_nfstimeout.html
> > ).
> > > Yes I realize it is primarily for a client backup of an NFS mount, but
> > the
> > > statement:
> > >
> > > Supported Clients This option is for all UNIX and Linux clients. *The
> > > server can also define this option*.
> > >
> > > throws us - kind-of implying I can use this from the server
> perspective?
> > > But I can't find any documentation to support using it from the server.
> > >
> > > For you Linux guru's - this is what the mount says:
> > >
> > > hhisilonnfs23.rams.adp.vcu.edu:/ifs/NFS/TSM on /tsmnfs type nfs
> > >
> >
> (rw,relatime,sync,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.19.12,mountvers=3,mountport=300,mountproto=udp,local_lock=none,addr=192.168.19.12)
> > >
> > > Any thoughts?  Suggestion?   Are we simply expecting too much from NFS?
> > >
> > > My OS person also asks why ISP is so slow to write to NFS?  When they
> > did a
> > > test copy of a large file to the NFS mount, they were getting upwards
> of
> > 8G/s
> > > vs 1.5-3G/s when TSM/ISP writes to it (via EMC monitoring tools).
> > >
> > > --
> > > *Zoltan Forray*
> > > Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator
> > > Xymon Monitor Administrator
> > > VMware Administrator
> > > V

Re: nfstimeout on server ISILON storage

2018-09-05 Thread Skylar Thompson
Yep, you're right, I misread that (shouldn't send email pre-coffee).

Are the timeouts repeatable enough that you can get a packet capture in
there before and while they're happening?

On Wed, Sep 05, 2018 at 07:09:09PM -0400, Zoltan Forray wrote:
> Skylar,
>
> I sent your comment about UDP vs TCP to my OS tech (beyond my ken) - got
> this feedback:
>
> I assume what they are talking about is this:
>
> hhisilonnfs23.rams.adp.vcu.edu:/ifs/NFS/TSM on /tsmnfs type nfs
> (rw,relatime,sync,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.19.12,mountvers=3,mountport=300,
> *mountproto=udp*,local_lock=none,addr=192.168.19.12)
>
> Looks like this is the default setting (also on all the other servers to
> initiate a conversation with the NFS server). However, if you read the
> documentation on this option it goes into detail about how this option
> differs from proto (which is also defined):
>
> https://access.redhat.com/solutions/183583
>
> "mountproto differs from proto as it defines what protocol (TCP or UDP) the
> client will use to initiate the connection and conduct the mount and
> umount operations.
> This differs from the proto option which sets the protocol that the initial
> connection *and* the actual transportation will use."
>
> The proto option (set to TCP in the mount) appears to be determining how
> the actual connection and transport of data is conducted.
>
> When running a tcpdump on Earth I see NFS TCP traffic running over the 23
> VLAN (and the 22 VLAN on other TSM servers) and no UDP packets to speak of.
>
> On Wed, Sep 5, 2018 at 10:25 AM Skylar Thompson  wrote:
>
> > It looks like you're using UDP as a transport - have you tried switching to
> > TCP? Especially with large NFS payload sizes, you're going to get lots of
> > fragmentation with UDP's 512-byte packet limit.
> >
> > On Wed, Sep 05, 2018 at 09:03:25AM -0400, Zoltan Forray wrote:
> > > A pair of 10G links bonded - CISCO switches.
> > >
> > > On Tue, Sep 4, 2018 at 7:54 PM Skylar Thompson  wrote:
> > >
> > > > Quick question - what's the data link protocol (Ethernet, IB, etc.) and
> > > > link rate
> > > > that you're using?
> > > >
> > > > On Tue, Sep 04, 2018 at 02:05:33PM -0400, Zoltan Forray wrote:
> > > > > We are still fighting issues with ISILON storage. Our current issue
> > is
> > > > with
> > > > > NFS timeouts for the storage a server is using.  We see message like
> > > > these
> > > > > in the server /var/log
> > > > >
> > > > > Sep  4 13:21:49 earth kernel: nfs: server
> > hhisilonnfs23.rams.adp.vcu.edu
> > > > > not responding, still trying
> > > > > Sep  4 13:21:49 earth kernel: nfs: server
> > hhisilonnfs23.rams.adp.vcu.edu
> > > > > not responding, still trying
> > > > > Sep  4 13:21:49 earth kernel: nfs: server
> > hhisilonnfs23.rams.adp.vcu.edu
> > > > > not responding, still trying
> > > > > Sep  4 13:21:49 earth kernel: nfs: server
> > hhisilonnfs23.rams.adp.vcu.edu
> > > > > not responding, still trying
> > > > > Sep  4 13:22:14 earth kernel: nfs: server
> > hhisilonnfs23.rams.adp.vcu.edu
> > > > > not responding, still trying
> > > > > Sep  4 13:22:15 earth kernel: nfs: server
> > hhisilonnfs23.rams.adp.vcu.edu
> > > > > not responding, still trying
> > > > > Sep  4 13:22:16 earth kernel: nfs: server
> > hhisilonnfs23.rams.adp.vcu.edu
> > > > OK
> > > > > Sep  4 13:22:16 earth kernel: nfs: server
> > hhisilonnfs23.rams.adp.vcu.edu
> > > > OK
> > > > > Sep  4 13:22:16 earth kernel: nfs: server
> > hhisilonnfs23.rams.adp.vcu.edu
> > > > OK
> > > > >
> > > > > OS folks say the NFS mount is setup as IBM recommends in various
> > > > documents.
> > > > > So they asked us to implement the nfstimeout option from this
> > document (
> > > > >
> > > >
> > https://www.ibm.com/support/knowledgecenter/en/SSGSG7_7.1.0/com.ibm.itsm.client.doc/r_opt_nfstimeout.html
> > > > ).
> > > > > Yes I realize it is primarily for a client backup of an NFS mount,
> > but
> > > > the
> > > > > statement:
> > > > >
> > > > > Supported Clients This option is for all UNIX and Linux clients. *The
> > > > > server can also define this option*.
> > > > >
> > > > > throws us - kind-of implying I can use this from the server
> > perspective?
> > > > > But I can't find any documentation to support using it from the
> > server.
> > > > >
> > > > > For you Linux guru's - this is what the mount says:
> > > > >
> > > > > hhisilonnfs23.rams.adp.vcu.edu:/ifs/NFS/TSM on /tsmnfs type nfs
> > > > >
> > > >
> > (rw,relatime,sync,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.19.12,mountvers=3,mountport=300,mountproto=udp,local_lock=none,addr=192.168.19.12)
> > > > >
> > > > > Any thoughts?  Suggestion?   Are we simply expecting too much from
> > NFS?
> > > > >
> > > > > My OS person also asks why ISP is so slow to write to NFS?  When they
> > > > did a
> > > > > test copy of a large file to the NFS mount, they were getting
> > upwards of
> > > > 

Re: nfstimeout on server ISILON storage

2018-09-05 Thread Zoltan Forray
Skylar,

I sent your comment about UDP vs TCP to my OS tech (beyond my ken) - got
this feedback:

I assume what they are talking about is this:

hhisilonnfs23.rams.adp.vcu.edu:/ifs/NFS/TSM on /tsmnfs type nfs
(rw,relatime,sync,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.19.12,mountvers=3,mountport=300,
*mountproto=udp*,local_lock=none,addr=192.168.19.12)

Looks like this is the default setting (also on all the other servers to
initiate a conversation with the NFS server). However, if you read the
documentation on this option it goes into detail about how this option
differs from proto (which is also defined):

https://access.redhat.com/solutions/183583

"mountproto differs from proto as it defines what protocol (TCP or UDP) the
client will use to initiate the connection and conduct the mount and
umount operations.
This differs from the proto option which sets the protocol that the initial
connection *and* the actual transportation will use."

The proto option (set to TCP in the mount) appears to be determining how
the actual connection and transport of data is conducted.

When running a tcpdump on Earth I see NFS TCP traffic running over the 23
VLAN (and the 22 VLAN on other TSM servers) and no UDP packets to speak of.
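
As a quick way to see the difference between the two options without reading the whole option string by eye, a small sketch along these lines splits the mount options quoted above into key/value pairs. The function name is made up for the example; the option string is the one from the mount output in this thread, and the conversion of timeo from tenths of a second follows the nfs(5) man page.

```python
# Option string copied from the mount output quoted in this thread.
MOUNT_OPTS = ("rw,relatime,sync,vers=3,rsize=131072,wsize=524288,namlen=255,"
              "hard,proto=tcp,timeo=600,retrans=2,sec=sys,"
              "mountaddr=192.168.19.12,mountvers=3,mountport=300,"
              "mountproto=udp,local_lock=none,addr=192.168.19.12")


def parse_mount_options(opts):
    """Split a comma-separated mount option string into a dict.

    Flags without a value (rw, hard, ...) are stored as True.
    """
    parsed = {}
    for item in opts.split(","):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


opts = parse_mount_options(MOUNT_OPTS)
print("NFS data transport (proto):       ", opts["proto"])
print("MOUNT protocol transport (mountproto):", opts["mountproto"])
# timeo is expressed in tenths of a second.
print("timeo in seconds:", int(opts["timeo"]) / 10)
print("retrans:", opts["retrans"])
```

Run against this mount, it shows proto=tcp for the actual NFS traffic and mountproto=udp only for the mount/umount requests, which matches what the tcpdump shows.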

On Wed, Sep 5, 2018 at 10:25 AM Skylar Thompson  wrote:

> It looks like you're using UDP as a transport - have you tried switching to
> TCP? Especially with large NFS payload sizes, you're going to get lots of
> fragmentation with UDP's 512-byte packet limit.
>
> On Wed, Sep 05, 2018 at 09:03:25AM -0400, Zoltan Forray wrote:
> > A pair of 10G links bonded - CISCO switches.
> >
> > On Tue, Sep 4, 2018 at 7:54 PM Skylar Thompson  wrote:
> >
> > > Quick question - what's the data link protocol (Ethernet, IB, etc.) and
> > > link rate
> > > that you're using?
> > >
> > > On Tue, Sep 04, 2018 at 02:05:33PM -0400, Zoltan Forray wrote:
> > > > We are still fighting issues with ISILON storage. Our current issue is
> > > > with NFS timeouts for the storage a server is using.  We see messages
> > > > like these in the server /var/log
> > > >
> > > > Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
> > > > Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
> > > > Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
> > > > Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
> > > > Sep  4 13:22:14 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
> > > > Sep  4 13:22:15 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
> > > > Sep  4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu OK
> > > > Sep  4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu OK
> > > > Sep  4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu OK
> > > >
> > > > OS folks say the NFS mount is set up as IBM recommends in various
> > > > documents. So they asked us to implement the nfstimeout option from this
> > > > document
> > > > (https://www.ibm.com/support/knowledgecenter/en/SSGSG7_7.1.0/com.ibm.itsm.client.doc/r_opt_nfstimeout.html).
> > > > Yes I realize it is primarily for a client backup of an NFS mount, but the
> > > > statement:
> > > >
> > > > Supported Clients This option is for all UNIX and Linux clients. *The
> > > > server can also define this option*.
> > > >
> > > > throws us - kind-of implying I can use this from the server perspective?
> > > > But I can't find any documentation to support using it from the server.
> > > >
> > > > For you Linux gurus - this is what the mount says:
> > > >
> > > > hhisilonnfs23.rams.adp.vcu.edu:/ifs/NFS/TSM on /tsmnfs type nfs
> > > > (rw,relatime,sync,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.19.12,mountvers=3,mountport=300,mountproto=udp,local_lock=none,addr=192.168.19.12)
> > > >
> > > > Any thoughts?  Suggestions?   Are we simply expecting too much from NFS?
> > > >
> > > > My OS person also asks why ISP is so slow to write to NFS?  When they did a
> > > > test copy of a large file to the NFS mount, they were getting upwards of 8G/s
> > > > vs 1.5-3G/s when TSM/ISP writes to it (via EMC monitoring tools).
> > > >
> > > > --
> > > > *Zoltan Forray*
> > > > Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator
> > > > Xymon Monitor Administrator
> > > > VMware Administrator
> > > > Virginia Commonwealth University
> > > > UCC/Office of Technology Services
> > > > www.ucc.vcu.edu
> > > > zfor...@vcu.edu - 804-828-4807
> > > > Don't be a phishing victim - VCU and other reputable 

Re: [EXTERNAL] Re: nfstimeout on server ISILON storage

2018-09-05 Thread Zoltan Forray
Rick,

Thanks for the reply.  I passed your comments to my SAN guy and he said:

*Earth is connected to HHIsilon and it is running 8.1.0.4.*
*All the others are connected to ISPIsilon and it is running 8.0.0.4.*

*I'm pretty sure all the Cisco switchports are LACP and Isilon networking
is configured for LACP. Can that person elaborate on that bug?*

On Wed, Sep 5, 2018 at 12:00 PM Rhodes, Richard L. <
rrho...@firstenergycorp.com> wrote:

> We run our Isilon systems (OneFS 8.0.0.6) with active/passive failover.
> When we put them in OneFS had a bug with LACP and wouldn't work, forcing us
> to go active/passive.  Due to other problems, we just discussed with them
> converting from active/passive to LACP, but EMC said there is still a bug
> in LACP support.
>
> Rick
>
>
>
> -Original Message-
> From: ADSM: Dist Stor Manager  On Behalf Of Zoltan
> Forray
> Sent: Wednesday, September 5, 2018 9:03 AM
> To: ADSM-L@VM.MARIST.EDU
> Subject: [EXTERNAL] Re: nfstimeout on server ISILON storage
>
> A pair of 10G links bonded - CISCO switches.
>
> On Tue, Sep 4, 2018 at 7:54 PM Skylar Thompson  wrote:
>
> > Quick question - what's the data link protocol (Ethernet, IB, etc.) and
> > link rate
> > that you're using?
> >

Re: [EXTERNAL] Re: nfstimeout on server ISILON storage

2018-09-05 Thread Rhodes, Richard L.
We run our Isilon systems (OneFS 8.0.0.6) with active/passive failover.  When
we put them in, OneFS had a bug with LACP that kept it from working, forcing us
to go active/passive.  Due to other problems, we recently discussed converting
from active/passive to LACP with EMC, but they said there is still a bug in
LACP support.

Rick



-----Original Message-----
From: ADSM: Dist Stor Manager  On Behalf Of Zoltan Forray
Sent: Wednesday, September 5, 2018 9:03 AM
To: ADSM-L@VM.MARIST.EDU
Subject: [EXTERNAL] Re: nfstimeout on server ISILON storage

A pair of 10G links bonded - CISCO switches.

On Tue, Sep 4, 2018 at 7:54 PM Skylar Thompson  wrote:

> Quick question - what's the data link protocol (Ethernet, IB, etc.) and
> link rate
> that you're using?
>
> On Tue, Sep 04, 2018 at 02:05:33PM -0400, Zoltan Forray wrote:
> > We are still fighting issues with ISILON storage. Our current issue is
> with
> > NFS timeouts for the storage a server is using.  We see message like
> these
> > in the server /var/log
> >
> > Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> > not responding, still trying
> > Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> > not responding, still trying
> > Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> > not responding, still trying
> > Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> > not responding, still trying
> > Sep  4 13:22:14 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> > not responding, still trying
> > Sep  4 13:22:15 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> > not responding, still trying
> > Sep  4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> OK
> > Sep  4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> OK
> > Sep  4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> OK
> >
> > OS folks say the NFS mount is setup as IBM recommends in various
> documents.
> > So they asked us to implement the nfstimeout option from this document (
> >
> https://www.ibm.com/support/knowledgecenter/en/SSGSG7_7.1.0/com.ibm.itsm.client.doc/r_opt_nfstimeout.html
> ).
> > Yes I realize it is primarily for a client backup of an NFS mount, but
> the
> > statement:
> >
> > Supported Clients This option is for all UNIX and Linux clients. *The
> > server can also define this option*.
> >
> > throws us - kind-of implying I can use this from the server perspective?
> > But I can't find any documentation to support using it from the server.
> >
> > For you Linux guru's - this is what the mount says:
> >
> > hhisilonnfs23.rams.adp.vcu.edu:/ifs/NFS/TSM on /tsmnfs type nfs
> >
> (rw,relatime,sync,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.19.12,mountvers=3,mountport=300,mountproto=udp,local_lock=none,addr=192.168.19.12)
> >
> > Any thoughts?  Suggestion?   Are we simply expecting too much from NFS?
> >
> > My OS person also asks why ISP is so slow to write to NFS?  When they
> did a
> > test copy of a large file to the NFS mount, they were getting upwards of
> 8G/s
> > vs 1.5-3G/s when TSM/ISP writes to it (via EMC monitoring tools).
> >
> > --
> > *Zoltan Forray*
> > Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator
> > Xymon Monitor Administrator
> > VMware Administrator
> > Virginia Commonwealth University
> > UCC/Office of Technology Services
> > www.ucc.vcu.edu
> > zfor...@vcu.edu - 804-828-4807
> > Don't be a phishing victim - VCU and other reputable organizations will
> > never use email to request that you reply with your password, social
> > security number or confidential personal information. For more details
> > visit http://phishing.vcu.edu/
>
> --
> -- Skylar Thompson (skyl...@u.washington.edu)
> -- Genome Sciences Department, System Administrator
> -- Foege Building S046, (206)-685-7354
> -- University of Washington School of Medicine
>


--
*Zoltan Forray*
Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator
Xymon Monitor Administrator
VMware Administrator
Virginia Commonwealth University
UCC/Office of Technology Services
www.ucc.vcu.edu
zfor...@vcu.edu - 804-828-4807

Re: nfstimeout on server ISILON storage

2018-09-05 Thread Skylar Thompson
It looks like you're using UDP as a transport - have you tried switching to
TCP? Especially with large NFS payload sizes, you're going to get lots of
IP fragmentation with UDP, since every datagram larger than the link MTU
(typically 1500 bytes on Ethernet) is split into fragments.

On Wed, Sep 05, 2018 at 09:03:25AM -0400, Zoltan Forray wrote:
> A pair of 10G links bonded - CISCO switches.
>
> On Tue, Sep 4, 2018 at 7:54 PM Skylar Thompson  wrote:
>
> > Quick question - what's the data link protocol (Ethernet, IB, etc.) and
> > link rate
> > that you're using?
> >
> > On Tue, Sep 04, 2018 at 02:05:33PM -0400, Zoltan Forray wrote:
> > > We are still fighting issues with ISILON storage. Our current issue is
> > with
> > > NFS timeouts for the storage a server is using.  We see message like
> > these
> > > in the server /var/log
> > >
> > > Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> > > not responding, still trying
> > > Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> > > not responding, still trying
> > > Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> > > not responding, still trying
> > > Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> > > not responding, still trying
> > > Sep  4 13:22:14 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> > > not responding, still trying
> > > Sep  4 13:22:15 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> > > not responding, still trying
> > > Sep  4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> > OK
> > > Sep  4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> > OK
> > > Sep  4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> > OK
> > >
> > > OS folks say the NFS mount is setup as IBM recommends in various
> > documents.
> > > So they asked us to implement the nfstimeout option from this document (
> > >
> > https://www.ibm.com/support/knowledgecenter/en/SSGSG7_7.1.0/com.ibm.itsm.client.doc/r_opt_nfstimeout.html
> > ).
> > > Yes I realize it is primarily for a client backup of an NFS mount, but
> > the
> > > statement:
> > >
> > > Supported Clients This option is for all UNIX and Linux clients. *The
> > > server can also define this option*.
> > >
> > > throws us - kind-of implying I can use this from the server perspective?
> > > But I can't find any documentation to support using it from the server.
> > >
> > > For you Linux guru's - this is what the mount says:
> > >
> > > hhisilonnfs23.rams.adp.vcu.edu:/ifs/NFS/TSM on /tsmnfs type nfs
> > >
> > (rw,relatime,sync,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.19.12,mountvers=3,mountport=300,mountproto=udp,local_lock=none,addr=192.168.19.12)
> > >
> > > Any thoughts?  Suggestion?   Are we simply expecting too much from NFS?
> > >
> > > My OS person also asks why ISP is so slow to write to NFS?  When they
> > did a
> > > test copy of a large file to the NFS mount, they were getting upwards of
> > 8G/s
> > > vs 1.5-3G/s when TSM/ISP writes to it (via EMC monitoring tools).
> > >
> > > --
> > > *Zoltan Forray*
> > > Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator
> > > Xymon Monitor Administrator
> > > VMware Administrator
> > > Virginia Commonwealth University
> > > UCC/Office of Technology Services
> > > www.ucc.vcu.edu
> > > zfor...@vcu.edu - 804-828-4807
> >
> > --
> > -- Skylar Thompson (skyl...@u.washington.edu)
> > -- Genome Sciences Department, System Administrator
> > -- Foege Building S046, (206)-685-7354
> > -- University of Washington School of Medicine
> >
>
>
> --
> *Zoltan Forray*
> Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator
> Xymon Monitor Administrator
> VMware Administrator
> Virginia Commonwealth University
> UCC/Office of Technology Services
> www.ucc.vcu.edu
> zfor...@vcu.edu - 804-828-4807

--
-- Skylar Thompson (skyl...@u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine


Re: nfstimeout on server ISILON storage

2018-09-05 Thread Zoltan Forray
A pair of 10G links bonded - CISCO switches.

On Tue, Sep 4, 2018 at 7:54 PM Skylar Thompson  wrote:

> Quick question - what's the data link protocol (Ethernet, IB, etc.) and
> link rate
> that you're using?
>
> On Tue, Sep 04, 2018 at 02:05:33PM -0400, Zoltan Forray wrote:
> > We are still fighting issues with ISILON storage. Our current issue is
> with
> > NFS timeouts for the storage a server is using.  We see message like
> these
> > in the server /var/log
> >
> > Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> > not responding, still trying
> > Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> > not responding, still trying
> > Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> > not responding, still trying
> > Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> > not responding, still trying
> > Sep  4 13:22:14 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> > not responding, still trying
> > Sep  4 13:22:15 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> > not responding, still trying
> > Sep  4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> OK
> > Sep  4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> OK
> > Sep  4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> OK
> >
> > OS folks say the NFS mount is setup as IBM recommends in various
> documents.
> > So they asked us to implement the nfstimeout option from this document (
> >
> https://www.ibm.com/support/knowledgecenter/en/SSGSG7_7.1.0/com.ibm.itsm.client.doc/r_opt_nfstimeout.html
> ).
> > Yes I realize it is primarily for a client backup of an NFS mount, but
> the
> > statement:
> >
> > Supported Clients This option is for all UNIX and Linux clients. *The
> > server can also define this option*.
> >
> > throws us - kind-of implying I can use this from the server perspective?
> > But I can't find any documentation to support using it from the server.
> >
> > For you Linux guru's - this is what the mount says:
> >
> > hhisilonnfs23.rams.adp.vcu.edu:/ifs/NFS/TSM on /tsmnfs type nfs
> >
> (rw,relatime,sync,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.19.12,mountvers=3,mountport=300,mountproto=udp,local_lock=none,addr=192.168.19.12)
> >
> > Any thoughts?  Suggestion?   Are we simply expecting too much from NFS?
> >
> > My OS person also asks why ISP is so slow to write to NFS?  When they
> did a
> > test copy of a large file to the NFS mount, they were getting upwards of
> 8G/s
> > vs 1.5-3G/s when TSM/ISP writes to it (via EMC monitoring tools).
> >
> > --
> > *Zoltan Forray*
> > Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator
> > Xymon Monitor Administrator
> > VMware Administrator
> > Virginia Commonwealth University
> > UCC/Office of Technology Services
> > www.ucc.vcu.edu
> > zfor...@vcu.edu - 804-828-4807
>
> --
> -- Skylar Thompson (skyl...@u.washington.edu)
> -- Genome Sciences Department, System Administrator
> -- Foege Building S046, (206)-685-7354
> -- University of Washington School of Medicine
>


--
*Zoltan Forray*
Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator
Xymon Monitor Administrator
VMware Administrator
Virginia Commonwealth University
UCC/Office of Technology Services
www.ucc.vcu.edu
zfor...@vcu.edu - 804-828-4807


Re: nfstimeout on server ISILON storage

2018-09-04 Thread Skylar Thompson
Quick question - what's the data link protocol (Ethernet, IB, etc.) and link
rate that you're using?

On Tue, Sep 04, 2018 at 02:05:33PM -0400, Zoltan Forray wrote:
> We are still fighting issues with ISILON storage. Our current issue is with
> NFS timeouts for the storage a server is using.  We see message like these
> in the server /var/log
>
> Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> not responding, still trying
> Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> not responding, still trying
> Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> not responding, still trying
> Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> not responding, still trying
> Sep  4 13:22:14 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> not responding, still trying
> Sep  4 13:22:15 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu
> not responding, still trying
> Sep  4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu OK
> Sep  4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu OK
> Sep  4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu OK
>
> OS folks say the NFS mount is setup as IBM recommends in various documents.
> So they asked us to implement the nfstimeout option from this document (
> https://www.ibm.com/support/knowledgecenter/en/SSGSG7_7.1.0/com.ibm.itsm.client.doc/r_opt_nfstimeout.html).
> Yes I realize it is primarily for a client backup of an NFS mount, but the
> statement:
>
> Supported Clients This option is for all UNIX and Linux clients. *The
> server can also define this option*.
>
> throws us - kind-of implying I can use this from the server perspective?
> But I can't find any documentation to support using it from the server.
>
> For you Linux guru's - this is what the mount says:
>
> hhisilonnfs23.rams.adp.vcu.edu:/ifs/NFS/TSM on /tsmnfs type nfs
> (rw,relatime,sync,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.19.12,mountvers=3,mountport=300,mountproto=udp,local_lock=none,addr=192.168.19.12)
>
> Any thoughts?  Suggestion?   Are we simply expecting too much from NFS?
>
> My OS person also asks why ISP is so slow to write to NFS?  When they did a
> test copy of a large file to the NFS mount, they were getting upwards of 8G/s
> vs 1.5-3G/s when TSM/ISP writes to it (via EMC monitoring tools).
>
> --
> *Zoltan Forray*
> Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator
> Xymon Monitor Administrator
> VMware Administrator
> Virginia Commonwealth University
> UCC/Office of Technology Services
> www.ucc.vcu.edu
> zfor...@vcu.edu - 804-828-4807

--
-- Skylar Thompson (skyl...@u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine


nfstimeout on server ISILON storage

2018-09-04 Thread Zoltan Forray
We are still fighting issues with ISILON storage. Our current issue is with
NFS timeouts on the storage one of our servers is using.  We see messages like
these in /var/log/messages on the server:

Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
Sep  4 13:22:14 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
Sep  4 13:22:15 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
Sep  4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu OK
Sep  4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu OK
Sep  4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu OK
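
To get a feel for how long each stall lasts, the kernel messages above can be
paired up: the first "not responding" opens a window and the next "OK" closes
it. The following Python sketch does that for three of the lines above. The
function names are made up for illustration, the year 2018 is assumed from the
date of this thread (syslog timestamps carry no year), and the month parsing
assumes an English locale.

```python
from datetime import datetime

# A subset of the kernel messages quoted above, one event per line.
LOG_LINES = [
    "Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying",
    "Sep  4 13:22:15 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying",
    "Sep  4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu OK",
]


def parse_time(line, year=2018):
    """Parse the leading syslog timestamp; the year is an assumption."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def stall_windows(lines):
    """Return (start, end) pairs: first 'not responding' up to the next 'OK'."""
    windows = []
    start = None
    for line in lines:
        when = parse_time(line)
        if "not responding" in line:
            if start is None:
                start = when
        elif line.rstrip().endswith(" OK") and start is not None:
            windows.append((start, when))
            start = None
    return windows


for begin, end in stall_windows(LOG_LINES):
    print(f"stall from {begin:%H:%M:%S} to {end:%H:%M:%S}: "
          f"{(end - begin).total_seconds():.0f} s")
```

For the excerpt above this reports a single 27-second window, from 13:21:49 to
13:22:16. Run against a full /var/log/messages it would show whether the stalls
line up with reclaims, dedupe or DB backups.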

OS folks say the NFS mount is set up as IBM recommends in various documents.
So they asked us to implement the nfstimeout option from this document (
https://www.ibm.com/support/knowledgecenter/en/SSGSG7_7.1.0/com.ibm.itsm.client.doc/r_opt_nfstimeout.html).
Yes, I realize it is primarily for a client backup of an NFS mount, but the
statement:

Supported Clients This option is for all UNIX and Linux clients. *The
server can also define this option*.

throws us - it kind of implies I can use this from the server perspective.
But I can't find any documentation to support using it from the server.

For you Linux gurus - this is what the mount says:

hhisilonnfs23.rams.adp.vcu.edu:/ifs/NFS/TSM on /tsmnfs type nfs
(rw,relatime,sync,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.19.12,mountvers=3,mountport=300,mountproto=udp,local_lock=none,addr=192.168.19.12)
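
For readers less familiar with these options, here is a small Python sketch
that converts the tuning values in the mount line above into friendlier units.
It relies on the Linux nfs(5) man page stating that timeo is given in tenths of
a second; the dictionary and function names are made up for illustration.

```python
# Values copied from the mount output above.
MOUNT_VALUES = {"timeo": 600, "retrans": 2, "rsize": 131072, "wsize": 524288}


def describe_tuning(values):
    """Convert raw NFS mount values into human-friendly units."""
    return {
        # nfs(5): timeo is expressed in deciseconds (tenths of a second).
        "timeo_seconds": values["timeo"] / 10,
        # Number of retries before the client takes further recovery action.
        "retrans": values["retrans"],
        # rsize and wsize are byte counts; show them in KiB.
        "rsize_kib": values["rsize"] // 1024,
        "wsize_kib": values["wsize"] // 1024,
    }


for name, value in describe_tuning(MOUNT_VALUES).items():
    print(f"{name}: {value}")
```

So timeo=600 is a 60-second wait before the first retry, and the mount uses
128 KiB reads and 512 KiB writes.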

Any thoughts?  Suggestions?  Are we simply expecting too much from NFS?

My OS person also asks why ISP is so slow to write to NFS.  When they did a
test copy of a large file to the NFS mount, they were getting upwards of 8G/s
vs 1.5-3G/s when TSM/ISP writes to it (as measured by the EMC monitoring tools).

--
*Zoltan Forray*
Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator
Xymon Monitor Administrator
VMware Administrator
Virginia Commonwealth University
UCC/Office of Technology Services
www.ucc.vcu.edu
zfor...@vcu.edu - 804-828-4807