Or maybe somebody can point me to the right place for submitting this?
Thanks. :)

---



 Monday, January 22, 2018, 14:10:53:

> This is a test environment running CentOS 7.4, oVirt 4.2.0, kernel
> 3.10.0-693.11.6.el7.x86_64 (3.10.0-693.11.1 and 3.10.0-693 have the same bugs).
> 
> 
> 1. Can't force NFS to 4.0.
> Some time ago I set the NFS version for all storage domains to V4, because
> of a bug between NetApp Data ONTAP 8.x and RHEL when using NFS 4.1 (NFS
> mounts started to hang after a while, with STATEID problems). On CentOS 7.2
> and 7.3 the V4 option mounted NFS as 4.0, so there were no NFS problems.
> After CentOS 7.4 was released, I noticed that the mount points started to
> hang again: NFS was mounted with vers=4.1, and it is not possible to change
> it to 4.0, because both the "V4" and "V4.1" options mount as 4.1. It looks
> like the V4 option uses the system default version for 4.x, and as far as I
> know that default changed from 4.0 to 4.1 in CentOS 7.4. Maybe a 4.0 option
> should be added to force version 4.0? Adding vers=/nfsvers= in "Additional
> mount options" is rejected by oVirt.
> I know I can turn 4.1 off on the NetApp side, but there may be situations
> where the storage is outside my control, and version 4.0 can't be set on
> the oVirt side.
> 
> 2. This bug isn't directly related to oVirt, but it affects it.
> I'm not really sure this is the right place to report it.
> As I said before, there was a bug with NFS 4.1, NetApp Data ONTAP 8 and
> RHEL 7.x, but it was fixed in ONTAP 9.x.
> Now we have ONTAP 9.x on the NetApp, and it brought new bugs with RHEL 7.4 :D
> After updating to CentOS 7.4, the NFS domains in oVirt started to hang/lock
> again. This happens randomly, on random hosts. After a few days of uptime
> the entire datacenter goes offline: hosts down, storage domains down, some
> VMs in UP and some in unknown state. The VMs are actually still working,
> and HostedEngine is working too, but I can't control the environment.
> There are many hanging ioprocess (>1300) and vdsm (>1300) processes on some
> hosts, and some dd commands that check the storage are hanging as well:
>         ├─vdsmd─┬─2*[dd]
>         │       ├─1304*[ioprocess───{ioprocess}]
>         │       ├─12*[ioprocess───4*[{ioprocess}]]
>         │       └─1365*[{vdsmd}]
> vdsm     19470  0.0  0.0   4360   348 ?        D<   Jan21   0:00 /usr/bin/dd if=/rhev/data-center/mnt/10.xx.xx.xx:_test__nfs__sas_iso/6cd147b4-8039-4f8a-8aa7-5fd444454d81/dom_md/metadata of=/dev/null bs=4096 count=1 iflag=direct
> vdsm     40707  0.0  0.0   4360   348 ?        D<   00:44   0:00 /usr/bin/dd if=/rhev/data-center/mnt/10.xx.xx.xx:_test__nfs__sas_export/58d9e2c2-8fef-4abc-be13-a273d6af320f/dom_md/metadata of=/dev/null bs=4096 count=1 iflag=direct
> 
> vdsm is hanging at 100% CPU load.
> If I try to ls these files, ls hangs as well.
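
A minimal Python sketch for picking out stuck processes such as the dd checks listed above. It assumes the `ps aux` column order (STAT is the eighth column) and treats a STAT value beginning with D as uninterruptible sleep, which is how the hung dd commands show up in the listing. The function name `uninterruptible` and the second sample line are made up for illustration; nothing here is part of vdsm or oVirt.

```python
def uninterruptible(ps_aux_text):
    """Return (pid, command) pairs for processes whose STAT starts with 'D'."""
    hits = []
    for line in ps_aux_text.splitlines():
        # ps aux: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or something that is not a process row
        if fields[7].startswith("D"):
            hits.append((int(fields[1]), fields[10]))
    return hits


# First process line is taken from the message; the second is a made-up
# healthy process so the filter has something to skip.
SAMPLE = (
    "USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND\n"
    "vdsm     19470  0.0  0.0   4360   348 ?        D<   Jan21   0:00 "
    "/usr/bin/dd if=/rhev/data-center/mnt/10.xx.xx.xx:_test__nfs__sas_iso/"
    "6cd147b4-8039-4f8a-8aa7-5fd444454d81/dom_md/metadata "
    "of=/dev/null bs=4096 count=1 iflag=direct\n"
    "root         1  0.0  0.0   1000   100 ?        Ss   Jan21   0:01 /sbin/init\n"
)

for pid, command in uninterruptible(SAMPLE):
    print(pid, command)
```

On a host, the same function could be fed the captured output of `ps aux` instead of the sample text.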
> 
> I've captured some traffic dumps, and it looks like a stateid problem. I
> found 2 solutions on the Red Hat website, but they aren't publicly
> available, so I can't read them:
> https://access.redhat.com/solutions/3214331   (in my case I have STATEID test)
> https://access.redhat.com/solutions/3164451   (in my case there is no manager 
> thread)
> But it looks like I have a different stateid issue.
> According to the dumps, my hosts send: TEST_STATEID
> The NetApp reply is: Status: NFS4ERR_BAD_STATEID (10025)
> After this the host sends: Network File System, Ops(5): SEQUENCE, PUTFH,
> OPEN, ACCESS, GETATTR
> Reply: V4 Reply (Call In 17) OPEN StateID: 0xa205
> Request: V4 Call (Reply In 22) READ StateID: 0xca5f Offset: 0 Len: 4096
> Reply: V4 Reply (Call In 19) READ Status: NFS4ERR_BAD_STATEID
> 
> 
> The entire conversation looks like this:
> No.     Time           Source             Destination       Protocol  Length  Info
>       1 0.000000       10._host_          10._netapp_        NFS      238     V4 Call (Reply In 2) TEST_STATEID
>       2 0.000251       10._netapp_        10._host_          NFS      170     V4 Reply (Call In 1) TEST_STATEID (here is Status: NFS4ERR_BAD_STATEID (10025))
>       3 0.000352       10._host_          10._netapp_        NFS      338     V4 Call (Reply In 4) OPEN DH: 0xa2c3ad28/
>       4 0.000857       10._netapp_        10._host_          NFS      394     V4 Reply (Call In 3) OPEN StateID: 0xa205
>       5 0.000934       10._host_          10._netapp_        NFS      302     V4 Call (Reply In 8) READ StateID: 0xca5f Offset: 0 Len: 4096
>       6 0.000964       10._host_          10._netapp_        NFS      302     V4 Call (Reply In 9) READ StateID: 0xca5f Offset: 0 Len: 4096
>       7 0.001133       10._netapp_        10._host_          TCP      70      2049 → 683 [ACK] Seq=425 Ack=901 Win=10240 Len=0 TSval=225608100 TSecr=302215289
>       8 0.001258       10._netapp_        10._host_          NFS      170     V4 Reply (Call In 5) READ Status: NFS4ERR_BAD_STATEID
>       9 0.001320       10._netapp_        10._host_          NFS      170     V4 Reply (Call In 6) READ Status: NFS4ERR_BAD_STATEID
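
A rough Python sketch for pulling the failing calls out of a summary listing like the one above. It assumes one packet per line, with the Info text in the "V4 Call (Reply In N) OP ..." / "V4 Reply (Call In N) OP ..." shape shown in the capture. The function name `bad_stateid_calls` is made up for illustration. The sketch only pairs each NFS4ERR_BAD_STATEID reply with its call and reports the StateID value printed on that call; it does not try to interpret those values.

```python
import re

# Frame number, then the "V4 Call/Reply (... In N) OP rest" part of the Info column.
LINE = re.compile(
    r"^\s*(?P<no>\d+)\s.*?V4 (?P<kind>Call|Reply) "
    r"\((?:Reply|Call) In (?P<peer>\d+)\) (?P<op>\w+)(?P<rest>.*)$"
)
STATEID = re.compile(r"StateID: (0x[0-9a-f]+)")


def bad_stateid_calls(summary_text):
    """Return (call_frame, operation, stateid_on_call) for each BAD_STATEID reply."""
    calls = {}
    failures = []
    for line in summary_text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # non-NFS packets such as bare TCP ACKs
        if m["kind"] == "Call":
            sid = STATEID.search(m["rest"])
            calls[int(m["no"])] = (m["op"], sid.group(1) if sid else None)
        elif "NFS4ERR_BAD_STATEID" in m["rest"]:
            peer = int(m["peer"])
            op, sid = calls.get(peer, (m["op"], None))
            failures.append((peer, op, sid))
    return failures


# Summary lines copied from the capture in the message, whitespace condensed.
SAMPLE = """\
1 0.000000 10._host_ 10._netapp_ NFS 238 V4 Call (Reply In 2) TEST_STATEID
2 0.000251 10._netapp_ 10._host_ NFS 170 V4 Reply (Call In 1) TEST_STATEID (here is Status: NFS4ERR_BAD_STATEID (10025))
3 0.000352 10._host_ 10._netapp_ NFS 338 V4 Call (Reply In 4) OPEN DH: 0xa2c3ad28/
4 0.000857 10._netapp_ 10._host_ NFS 394 V4 Reply (Call In 3) OPEN StateID: 0xa205
5 0.000934 10._host_ 10._netapp_ NFS 302 V4 Call (Reply In 8) READ StateID: 0xca5f Offset: 0 Len: 4096
6 0.000964 10._host_ 10._netapp_ NFS 302 V4 Call (Reply In 9) READ StateID: 0xca5f Offset: 0 Len: 4096
7 0.001133 10._netapp_ 10._host_ TCP 70 2049 → 683 [ACK] Seq=425 Ack=901 Win=10240 Len=0
8 0.001258 10._netapp_ 10._host_ NFS 170 V4 Reply (Call In 5) READ Status: NFS4ERR_BAD_STATEID
9 0.001320 10._netapp_ 10._host_ NFS 170 V4 Reply (Call In 6) READ Status: NFS4ERR_BAD_STATEID
"""

for frame, op, stateid in bad_stateid_calls(SAMPLE):
    print(frame, op, stateid)
```

For this capture the sketch reports the TEST_STATEID call in frame 1 and the two READ calls in frames 5 and 6.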
> 
> Sometimes clearing the locks on the NetApp (vserver locks break) and killing
> dd/ioprocess helps for a while.
> Right now my test setup is in this state. It looks like the lock problem is
> always with the metadata/disk check, not with the domain itself:
> I can read and write other files in this mount point from the same host.
> 
> The hosts have kernel 3.10.0-693.11.6.el7.x86_64 and oVirt 4.2.0.
> I can't tell whether it's a NetApp or a CentOS bug.
> If somebody wants to look more closely at the dumps, I can mail them directly.
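
For item 1 in the message above, a small Python sketch that reports which NFS version each mount actually ended up with. It assumes the usual `/proc/mounts` layout (device, mount point, filesystem type, comma-separated options) and that the version appears as a `vers=` or `nfsvers=` option, as in the vers=4.1 mentioned in the message. The function name `nfs_versions` and the sample line are made up for illustration.

```python
def nfs_versions(mounts_text):
    """Map each NFS mount point to the version found in its mount options."""
    versions = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts: device mountpoint fstype options dump pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        # Turn "rw,vers=4.1" into {"rw": "", "vers": "4.1"}.
        options = dict(opt.partition("=")[::2] for opt in fields[3].split(","))
        versions[fields[1]] = options.get("vers") or options.get("nfsvers")
    return versions


# Hypothetical /proc/mounts content; server, export, and mount point are made up.
SAMPLE = (
    "10.0.0.1:/export /mnt/export nfs4 rw,relatime,vers=4.1,rsize=65536 0 0\n"
    "/dev/sda1 / xfs rw,relatime 0 0\n"
)

print(nfs_versions(SAMPLE))
```

On a host, the text would come from reading `/proc/mounts` rather than from the sample string.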

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
