On Mon, Jul 25, 2016 at 1:01 AM, Krutika Dhananjay <[email protected]> wrote:

> Hi,
>
> Thanks for the logs. So I have identified one issue from the logs, for
> which the fix is this: http://review.gluster.org/#/c/14669/. Because of a
> bug in the code, ENOENT was getting converted to EPERM and propagated up
> the stack, causing the reads to bail out early with 'Operation not
> permitted' errors.
>
> I still need to find out two things:
> i) why there was a readv() sent on a non-existent (ENOENT) file (this is
> important since some of the other users have not faced or reported this
> issue on gluster-users with 3.7.13)
> ii) whether there's a way to work around this issue.
>
> Do you mind sharing the steps needed to run into this issue? This is so
> that we can apply our patches, test, and ensure they fix the problem.
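(For anyone who wants to try such a fix before it lands in a release, a Gerrit change can usually be cherry-picked onto a source checkout. A minimal sketch; the patchset number "1" and the anonymous fetch URL are assumptions, so check the review page for the actual values:

    $ git clone https://github.com/gluster/glusterfs.git
    $ cd glusterfs
    # refs/changes/<last two digits>/<change number>/<patchset> is the standard Gerrit ref layout
    $ git fetch http://review.gluster.org/glusterfs refs/changes/69/14669/1
    $ git cherry-pick FETCH_HEAD

)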
Well, after the upgrade of gluster all I did was start the ovirt hosts up,
which launched and started their ha-agent and broker processes. I don't
believe I started getting any errors until it mounted GLUSTER1. I had
enabled sharding but had no sharded disk images yet; not sure if the check
for shards would have caused that. Unfortunately I can't just update this
cluster and try to see what caused it, as it has some VMs users expect to
be available in a few hours. I can see if I can get my test setup to
recreate it. I think I'll need to de-activate the data center so I can
detach the storage that's on xfs and attach the one that's over zfs with
sharding enabled. My test is 3 bricks on the same local machine, with 3
different volumes, but I think I'm running into a sanlock issue or
something, as it won't mount more than one volume that was created locally.

> -Krutika
>
> On Fri, Jul 22, 2016 at 7:17 PM, David Gossage <
> [email protected]> wrote:
>
>> Trimmed out the logs to just about when I was shutting down the ovirt
>> servers for updates, which was 14:30 UTC 2016-07-09.
>>
>> Pre-update settings were:
>>
>> Volume Name: GLUSTER1
>> Type: Replicate
>> Volume ID: 167b8e57-28c3-447a-95cc-8410cbdf3f7f
>> Status: Started
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
>> Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
>> Brick3: ccgl3.gl.local:/gluster1/BRICK1/1
>> Options Reconfigured:
>> performance.readdir-ahead: on
>> storage.owner-uid: 36
>> storage.owner-gid: 36
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> cluster.eager-lock: enable
>> network.remote-dio: enable
>> cluster.quorum-type: auto
>> cluster.server-quorum-type: server
>> server.allow-insecure: on
>> cluster.self-heal-window-size: 1024
>> cluster.background-self-heal-count: 16
>> performance.strict-write-ordering: off
>> nfs.disable: on
>> nfs.addr-namelookup: off
>> nfs.enable-ino32: off
>>
>> At the time of the updates ccgl3 was offline from a bad nic on the
>> server, but had been so for about a week with no issues in the volume.
>>
>> Shortly after the update I added these settings to enable sharding, but
>> did not as of yet have any VM images sharded:
>> features.shard-block-size: 64MB
>> features.shard: on
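(For reference, volume options like those above are applied with the standard gluster CLI; a minimal sketch, reusing the volume name GLUSTER1 from the output above:

    $ gluster volume set GLUSTER1 features.shard on
    $ gluster volume set GLUSTER1 features.shard-block-size 64MB
    # Confirm what actually took effect:
    $ gluster volume info GLUSTER1

)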
>>
>> *David Gossage*
>> *Carousel Checks Inc. | System Administrator*
>> *Office* 708.613.2284
>>
>> On Fri, Jul 22, 2016 at 5:00 AM, Krutika Dhananjay <[email protected]>
>> wrote:
>>
>>> Hi David,
>>>
>>> Could you also share the brick logs from the affected volume? They're
>>> located at
>>> /var/log/glusterfs/bricks/<hyphenated-path-to-the-brick-directory>.log.
>>>
>>> Also, could you share the volume configuration (output of `gluster
>>> volume info <VOL>`) for the affected volume(s), as of the time you
>>> actually saw this issue?
>>>
>>> -Krutika
>>>
>>> On Thu, Jul 21, 2016 at 11:23 PM, David Gossage <
>>> [email protected]> wrote:
>>>
>>>> On Thu, Jul 21, 2016 at 11:47 AM, Scott <[email protected]> wrote:
>>>>
>>>>> Hi David,
>>>>>
>>>>> My backend storage is ZFS.
>>>>>
>>>>> I thought about moving from FUSE to NFS mounts for my Gluster volumes
>>>>> to help test, but since I use hosted engine this would be a real pain.
>>>>> It's difficult to modify the storage domain type/path in
>>>>> hosted-engine.conf, and I don't want to go through the process of
>>>>> re-deploying hosted engine.
>>>>>
>>>> I found this:
>>>>
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1347553
>>>>
>>>> Not sure if it's related.
>>>>
>>>> I also have a zfs backend. Another user on the gluster mailing list had
>>>> issues with a zfs backend as well; she used proxmox and got it working
>>>> by changing the disk to writeback cache, I think it was.
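(For context, at the QEMU level that workaround amounts to running the guest disk with writeback cache and thread-based AIO instead of direct I/O. A minimal sketch of a bare qemu invocation; the image path is hypothetical, and oVirt/Proxmox normally set these flags themselves:

    $ qemu-system-x86_64 -m 2048 \
        -drive file=/mnt/gluster/images/vm01.qcow2,format=qcow2,cache=writeback,aio=threads

With cache=writeback the guest's reads and writes go through the host page cache rather than O_DIRECT, which is presumably why it sidesteps the failing direct-I/O path.)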
>>>> I also use hosted engine, but I run my gluster volume for HE on an LVM
>>>> separate from zfs, on xfs, and if I recall it did not have the issues
>>>> my gluster on zfs did. I'm wondering now if the issue was zfs settings.
>>>>
>>>> Hopefully I should have a test machine up soon that I can play around
>>>> with more.
>>>>
>>>>> Scott
>>>>>
>>>>> On Thu, Jul 21, 2016 at 11:36 AM David Gossage <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> What back end storage do you run gluster on? xfs/zfs/ext4 etc?
>>>>>>
>>>>>> *David Gossage*
>>>>>> *Carousel Checks Inc. | System Administrator*
>>>>>> *Office* 708.613.2284
>>>>>>
>>>>>> On Thu, Jul 21, 2016 at 8:18 AM, Scott <[email protected]> wrote:
>>>>>>
>>>>>>> I get similar problems with oVirt 4.0.1 and hosted engine. After
>>>>>>> upgrading all my hosts to Gluster 3.7.13 (client and server), I get
>>>>>>> the following:
>>>>>>>
>>>>>>> $ sudo hosted-engine --set-maintenance --mode=none
>>>>>>> Traceback (most recent call last):
>>>>>>>   File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
>>>>>>>     "__main__", fname, loader, pkg_name)
>>>>>>>   File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
>>>>>>>     exec code in run_globals
>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 73, in <module>
>>>>>>>     if not maintenance.set_mode(sys.argv[1]):
>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 61, in set_mode
>>>>>>>     value=m_global,
>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 259, in set_maintenance_mode
>>>>>>>     str(value))
>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 204, in set_global_md_flag
>>>>>>>     all_stats = broker.get_stats_from_storage(service)
>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 232, in get_stats_from_storage
>>>>>>>     result = self._checked_communicate(request)
>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 260, in _checked_communicate
>>>>>>>     .format(message or response))
>>>>>>> ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed:
>>>>>>> failed to read metadata: [Errno 1] Operation not permitted
>>>>>>>
>>>>>>> If I only upgrade one host, then things will continue to work, but
>>>>>>> my nodes are constantly healing shards.
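(The ongoing heals can be watched with gluster's standard heal-info commands; a minimal sketch, using the volume name "data" taken from the 0-data-client-N prefixes in the log excerpt below:

    $ gluster volume heal data info
    # Or just the per-brick counts of entries pending heal:
    $ gluster volume heal data statistics heal-count

)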
>>>>>>> My logs are also flooded with:
>>>>>>>
>>>>>>> [2016-07-21 13:15:14.137734] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274714: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted)
>>>>>>> The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: remote operation failed [Operation not permitted]" repeated 6 times between [2016-07-21 13:13:24.134985] and [2016-07-21 13:15:04.132226]
>>>>>>> The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: remote operation failed [Operation not permitted]" repeated 8 times between [2016-07-21 13:13:34.133116] and [2016-07-21 13:15:14.137178]
>>>>>>> The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: remote operation failed [Operation not permitted]" repeated 7 times between [2016-07-21 13:13:24.135071] and [2016-07-21 13:15:14.137666]
>>>>>>> [2016-07-21 13:15:24.134647] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: remote operation failed [Operation not permitted]
>>>>>>> [2016-07-21 13:15:24.134764] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: remote operation failed [Operation not permitted]
>>>>>>> [2016-07-21 13:15:24.134793] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274741: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0038f4 (Operation not permitted)
>>>>>>> [2016-07-21 13:15:34.135413] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274756: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted)
>>>>>>> [2016-07-21 13:15:44.141062] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274818: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0038f4 (Operation not permitted)
>>>>>>> [2016-07-21 13:15:54.133582] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: remote operation failed [Operation not permitted]
>>>>>>> [2016-07-21 13:15:54.133629] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274853: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0036d8 (Operation not permitted)
>>>>>>> [2016-07-21 13:16:04.133666] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274879: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted)
>>>>>>> [2016-07-21 13:16:14.134954] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274894: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0036d8 (Operation not permitted)
>>>>>>>
>>>>>>> Scott
>>>>>>>
>>>>>>> On Thu, Jul 21, 2016 at 6:57 AM Frank Rothenstein <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hey David,
>>>>>>>>
>>>>>>>> I have the very same problem on my test-cluster, though I am
>>>>>>>> running ovirt 4.0. If you access your volumes via NFS all is fine;
>>>>>>>> the problem is FUSE. I stayed on 3.7.13 but have no solution yet,
>>>>>>>> so for now I use NFS.
>>>>>>>>
>>>>>>>> Frank
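(For anyone wanting to try the same workaround: the two access paths differ only in how the volume is mounted. A minimal sketch, reusing the host and volume names from David's setup above; note his volume has nfs.disable: on, so gluster's built-in NFS server, which speaks NFSv3, would first have to be re-enabled:

    # FUSE mount, the path producing the errors in this thread:
    $ mount -t glusterfs ccgl1.gl.local:/GLUSTER1 /mnt/gluster
    # Gluster NFS mount instead:
    $ gluster volume set GLUSTER1 nfs.disable off
    $ mount -t nfs -o vers=3 ccgl1.gl.local:/GLUSTER1 /mnt/gluster

)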
>>>>>>>>
>>>>>>>> On Thursday, 21.07.2016, at 04:28 -0500, David Gossage wrote:
>>>>>>>>
>>>>>>>> Anyone running one of the recent 3.6.x lines and gluster using
>>>>>>>> 3.7.13? I am looking to upgrade gluster from 3.7.11 to 3.7.13 for
>>>>>>>> some bug fixes, but have been told by users on the gluster mailing
>>>>>>>> list that due to some gluster changes I'd need to change the disk
>>>>>>>> parameters to use writeback cache. Something to do with aio support
>>>>>>>> being removed.
>>>>>>>>
>>>>>>>> I believe this could be done with custom parameters? But I believe
>>>>>>>> the storage tests are done using dd (a sketch of that check follows
>>>>>>>> the log excerpt below), so would they fail with the current settings
>>>>>>>> then? On my last upgrade to 3.7.13 I had to roll back to 3.7.11 due
>>>>>>>> to stability issues where gluster storage would go into the down
>>>>>>>> state and always show N/A as space available/used, even though the
>>>>>>>> hosts still saw the storage and VMs were running on it on all 3
>>>>>>>> hosts.
>>>>>>>>
>>>>>>>> Saw a lot of messages like these that went away once the gluster
>>>>>>>> rollback finished:
>>>>>>>>
>>>>>>>> [2016-07-09 15:27:46.935694] I [fuse-bridge.c:4083:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel 7.22
>>>>>>>> [2016-07-09 15:27:49.555466] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-1: remote operation failed [Operation not permitted]
>>>>>>>> [2016-07-09 15:27:49.556574] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-0: remote operation failed [Operation not permitted]
>>>>>>>> [2016-07-09 15:27:49.556659] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 80: READ => -1 gfid=deb61291-5176-4b81-8315-3f1cf8e3534d fd=0x7f5224002f68 (Operation not permitted)
>>>>>>>> [2016-07-09 15:27:59.612477] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-1: remote operation failed [Operation not permitted]
>>>>>>>> [2016-07-09 15:27:59.613700] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-0: remote operation failed [Operation not permitted]
>>>>>>>> [2016-07-09 15:27:59.613781] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 168: READ => -1 gfid=deb61291-5176-4b81-8315-3f1cf8e3534d fd=0x7f5224002f68 (Operation not permitted)
>>>>>>>>
>>>>>>>> *David Gossage*
>>>>>>>> *Carousel Checks Inc. | System Administrator*
>>>>>>>> *Office* 708.613.2284
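(The dd check referenced above: the thread suggests oVirt's storage tests read and write with direct I/O, which is what fails when the volume cannot satisfy O_DIRECT. A minimal manual probe of a mounted gluster volume along those lines; the mount point and file name are hypothetical:

    # Write a small file bypassing the page cache (O_DIRECT):
    $ dd if=/dev/zero of=/mnt/gluster/__direct_io_test__ bs=4096 count=1 oflag=direct
    # Read it back the same way:
    $ dd if=/mnt/gluster/__direct_io_test__ of=/dev/null bs=4096 count=1 iflag=direct

ZFS on Linux at the time reportedly did not support O_DIRECT opens, which may be part of why the zfs-backed bricks in this thread were the ones hit.)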
_______________________________________________
Users mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/users

