Re: NFS 4.1 RECLAIM_COMPLETE FS failed error in combination with ESXi client

2018-03-10 Thread NAGY Andreas
Thanks! Please keep me updated if you find out more or when an updated version
is available.

Now that I know it is working, I will start building a test system tomorrow
with 3 NFS servers (two of them in an HA setup with CARP and HAST) and several
ESXi hosts, which will all access their NFS datastores over 4 uplinks with
NICs on different subnets.
It should then always be possible to do some testing there.

andi



From: Rick Macklem
Sent: 10.03.2018 11:20 PM
To: NAGY Andreas; 'freebsd-stable@freebsd.org'
Subject: Re: NFS 4.1 RECLAIM_COMPLETE FS failed error in combination with ESXi client

NAGY Andreas wrote:
>Thanks, the "not issuing delegation" warnings disappeared with this patch.
>
>But now there are some new warnings I haven't seen so far:
>2018-03-10T13:01:39.441Z cpu8:68046)WARNING: NFS41: NFS41FSOpGetObject:2148: Failed to get object 0x43910e71b386 [36 c6b10167 9b157f95 5aa100fb 8ffcf2c1 c 2 9f22ad6d 0 0 0 0 0]: Stale file handle
I doubt these would be related to the patch. A stale FH means that the client
tried to access a file via its FH after it was removed. (Normally this is a
client bug, but hopefully not one that will cause grief.)
>These only appear several times after the NFS share is mounted or remounted
>after a connection loss.
>Everything works fine, but I hadn't seen them until I applied the last patch.
>
>andi
Ok. Thanks for testing all of these patches. I will probably get cleaned-up
versions of them committed in April.

The main outstanding issue is the Readdir one about the directory changing too much.
Hopefully I can find out something about it via email.

Have fun with it, rick
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: zfs problems after rebuilding system [SOLVED]

2018-03-10 Thread Ian Lepore
On Sat, 2018-03-10 at 23:42 +, Pete French wrote:
> > 
> > It looks like r330745 applies fine to stable-11 without any changes,
> > and there's plenty of value in testing that as well, if you're already
> > set up for that world.
> > 
> 
> I've been running the patch from the PR in production since the original
> bug report and it works fine. I haven't looked at r330745 yet, but I can
> replace the PR patch with that and give it a whirl. I'll take a look on
> Monday at what's possible.
> 
> -pete.
> 

I based my fix heavily on that patch from the PR, but I rewrote it
enough that I might've made any number of mistakes, so it needs fresh
testing.  The main change I made was to make it a lot less noisy while
waiting (it only mentions the wait once, unless bootverbose is set, in
which case it's once per second).  I also removed the logic that
limited the retries to nfs and zfs, because I think we can remove all
the old code related to waiting that only worked for ufs and let this
new retry be the way it waits for all filesystems.  But that's a bigger
change we can do separately; I didn't want to hold up this fix any
longer.

-- Ian


Re: zfs problems after rebuilding system [SOLVED]

2018-03-10 Thread Pete French
> It looks like r330745 applies fine to stable-11 without any changes,
> and there's plenty of value in testing that as well, if you're already
> set up for that world.

I've been running the patch from the PR in production since the original
bug report and it works fine. I haven't looked at r330745 yet, but I can
replace the PR patch with that and give it a whirl. I'll take a look on
Monday at what's possible.

-pete.



Re: zfs problems after rebuilding system [SOLVED]

2018-03-10 Thread Ian Lepore
On Sat, 2018-03-10 at 23:08 +, Pete French wrote:
> Ah, thank you! I haven't run current before, but as this is such an issue
> for us I'll set up an Azure machine running it and have it reboot every
> five minutes or so to check it works OK. Unfortunately the error doesn't
> show up consistently, as it's a race condition. Will let you know if it
> fails for any reason.
> 
> -pete. [time to take a dive into the exciting world of current]

It looks like r330745 applies fine to stable-11 without any changes,
and there's plenty of value in testing that as well, if you're already
set up for that world.

-- Ian


Re: zfs problems after rebuilding system [SOLVED]

2018-03-10 Thread Pete French
Ah, thank you! I haven't run current before, but as this is such an issue
for us I'll set up an Azure machine running it and have it reboot every
five minutes or so to check it works OK. Unfortunately the error doesn't
show up consistently, as it's a race condition. Will let you know if it
fails for any reason.

-pete. [time to take a dive into the exciting world of current]



Re: zfs problems after rebuilding system [SOLVED]

2018-03-10 Thread Ian Lepore
On Sat, 2018-03-03 at 16:19 +, Pete French wrote:
> 
> > 
> > That won't work for the boot drive.
> > 
> > When no boot drive is detected early enough, the kernel goes to the
> > mountroot prompt.  That seems to hold a Giant lock which inhibits
> > further progress being made.  Sometimes progress can be made by
> > trying to mount unmountable partitions on other drives, but this
> > usually goes too fast, especially if the USB drive often times out.
> 
> 
> We have this problem in Azure with a ZFS root; it was fixed by the patch
> in this bug report, which actually starts off being about USB.
> 
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208882
> 
> You can then set the mountroot timeout as normal and it works.
> 
> I would really like this patch to be applied, but it seems to have
> languished since last summer. We use this as standard on all our
> cloud machines now, and it works very nicely.
> 
> -pete.

I've committed a fix to -current (r330745) based on that patch.  It
would be good if people running -current who've had this problem could
give it some testing.  I'd like to get it merged back to 11 before the
11.2 release (and back to 10-stable as well).

With r330745 in place, the only setting that should be needed if your
rootfs is on a device that is slow to arrive is vfs.mountroot.timeout=
in loader.conf; the value is the number of seconds to wait before
giving up and going to the mountroot prompt.

-- Ian


RE: NFS 4.1 RECLAIM_COMPLETE FS failed error in combination with ESXi client

2018-03-10 Thread NAGY Andreas
Thanks, the "not issuing delegation" warnings disappeared with this patch.

But now there are some new warnings I haven't seen so far:
2018-03-10T13:01:39.441Z cpu8:68046)WARNING: NFS41: NFS41FSOpGetObject:2148: Failed to get object 0x43910e71b386 [36 c6b10167 9b157f95 5aa100fb 8ffcf2c1 c 2 9f22ad6d 0 0 0 0 0]: Stale file handle

These only appear several times after the NFS share is mounted or remounted
after a connection loss.
Everything works fine, but I hadn't seen them until I applied the last patch.

andi
