[ovirt-users] Re: [Gluster-devel] oVirt Survey 2019 results

2019-04-02 Thread Atin Mukherjee
Thanks Sahina for including Gluster community mailing lists.

As Sahina already mentioned, we had a strong focus on the upgrade testing
path before releasing glusterfs-6. We conducted a test day and, along with
the functional pieces, tested upgrade paths from 3.12, 4, and 5 to
release-6. We encountered problems, but fixed them before releasing
glusterfs-6, so overall this experience should definitely improve with
glusterfs-6.

On Tue, 2 Apr 2019 at 15:16, Sahina Bose  wrote:

>
>
> On Tue, Apr 2, 2019 at 12:07 PM Sandro Bonazzola 
> wrote:
>
>> Thanks to the 143 participants in oVirt Survey 2019!
>> The survey is now closed and the results are publicly available at
>> https://bit.ly/2JYlI7U
>> We'll analyze the collected data in order to improve oVirt based on your
>> feedback.
>>
>> As a first step after reading the results, I'd like to invite the 30
>> people who replied that they're willing to contribute code to send an
>> email to de...@ovirt.org introducing themselves: we'll be more than happy
>> to welcome them and help them get started.
>>
>> I would also like to invite the 17 people who replied they'd like to help
>> organize oVirt events in their area to either get in touch with me or
>> introduce themselves to users@ovirt.org so we can discuss event
>> organization.
>>
>> Last but not least I'd like to invite the 38 people willing to contribute
>> documentation and the one willing to contribute localization to introduce
>> themselves to de...@ovirt.org.
>>
>
> Thank you all for the feedback.
> I was looking at the feedback specific to Gluster. While it's
> disheartening to see "Gluster weakest link in oVirt", I can understand
> where the feedback and frustration are coming from.
>
> Over the past month and in this survey, these are the common themes that
> have come up:
> - Ensure smoother upgrades for hyperconverged deployments with
> GlusterFS. The oVirt 4.3 release, with its upgrade to gluster 5.3, caused
> disruption for many users, and we want to ensure this does not happen again.
> To this end, we are working on adding upgrade tests to OST-based CI.
> Contributions are welcome.
>
> - Improve performance on the gluster storage domain. While we have seen
> promising results with the gluster 6 release, this is an ongoing effort.
> Please help by providing input on the specific workloads and use cases
> that you run, gathering data, and running tests.
>
> - Deployment issues. We have worked to improve the deployment flow in 4.3
> by adding pre-checks and moving to gluster-ansible role-based deployment.
> We would love to hear specific issues that you're facing around this -
> please raise bugs if you haven't already (
> https://bugzilla.redhat.com/enter_bug.cgi?product=cockpit-ovirt)
>
>
>
>> Thanks!
>> --
>>
>> SANDRO BONAZZOLA
>>
>> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
>>
>> Red Hat EMEA 
>>
>> sbona...@redhat.com
>> 
>> ___
>> Users mailing list -- users@ovirt.org
>> To unsubscribe send an email to users-le...@ovirt.org
>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>> oVirt Code of Conduct:
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives:
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/4N5DYCXY2S6ZAUI7BWD4DEKZ6JL6MSGN/
>>
> ___
> Gluster-devel mailing list
> gluster-de...@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel

-- 
--Atin
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/P5QM4H6IWFK2ISWU4DEJV7KPVRXWLAJR/


[ovirt-users] Re: [Gluster-users] Re: Announcing Gluster release 5.5

2019-03-29 Thread Atin Mukherjee
On Fri, Mar 29, 2019 at 12:47 PM Krutika Dhananjay 
wrote:

> Questions/comments inline ...
>
> On Thu, Mar 28, 2019 at 10:18 PM  wrote:
>
>> Dear All,
>>
>> I wanted to share my experience upgrading from 4.2.8 to 4.3.1. While
>> previous upgrades from 4.1 to 4.2 etc. went rather smoothly, this one was
>> a different experience. After first trying a test upgrade on a 3-node
>> setup, which went fine, I headed to upgrade the 9-node production
>> platform, unaware of the backward compatibility issues between gluster
>> 3.12.15 and 5.3. After upgrading 2 nodes, the HA engine stopped and
>> wouldn't start. Vdsm wasn't able to mount the engine storage domain,
>> since /dom_md/metadata was missing or couldn't be accessed. I restored
>> this file by getting a good copy from the underlying bricks, removing the
>> file (and the corresponding gfids) from the underlying bricks where it
>> was 0 bytes and marked with the sticky bit, removing the file from the
>> mount point, and copying the good copy back onto the mount point. After
>> manually mounting the engine domain, manually creating the corresponding
>> symbolic links in /rhev/data-center and /var/run/vdsm/storage, and fixing
>> the ownership back to vdsm.kvm (it had become root.root), I was able to
>> start the HA engine again. The engine was up again, but things seemed
>> rather unstable; suspecting an incompatibility between gluster versions,
>> I decided to continue the upgrade on the other nodes, thinking it best to
>> have them all on the same version rather soon. However, things went from
>> bad to worse: the engine stopped again, and all VMs stopped working as
>> well. So on a machine outside the setup I restored a backup of the engine
>> taken from version 4.2.8 just before the upgrade. With this engine I was
>> at least able to start some VMs again and finalize the upgrade. Once
>> upgraded, things didn't stabilize, and we also lost 2 VMs during the
>> process due to image corruption. After figuring out that gluster 5.3 had
>> quite some issues, I was lucky to see that gluster 5.5 was about to be
>> released; the moment the RPMs were available I installed them. This
>> helped a lot in terms of stability, for which I'm very grateful! However,
>> the performance is unfortunately terrible: it's about 15% of what it was
>> running gluster 3.12.15. That's strange, since a simple dd shows OK
>> performance but our actual workload doesn't, and I would expect the
>> performance to be better given all the improvements made since gluster
>> 3.12. Does anybody share the same experience?
>> I really hope gluster 6 will soon be tested with oVirt and released, and
>> things will start to perform and stabilize again... like the good old
>> days. Of course, if I can do anything, I'm happy to help.
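One step of the recovery described above — finding copies of a file on the
bricks that are 0 bytes and carry the sticky bit (the way gluster marks its
internal linkto files) — can be sketched as below. The brick path is a
placeholder assumption, and the `|| true` keeps the sketch harmless on a
machine without that path:

```shell
#!/bin/sh
# Hedged sketch: locate candidate stale copies of a file on a brick --
# regular files that are 0 bytes with the sticky bit set, which is how
# gluster marks internal linkto files. BRICK is a placeholder.
BRICK=${BRICK:-/gluster/brick1/engine}
find "$BRICK" -type f -size 0 -perm -1000 2>/dev/null || true
```

Any path this prints is only a candidate; verify against a known-good brick
copy (as the author did) before removing anything.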
>>
>> I think the following is a short list of the issues we have after the
>> migration to Gluster 5.5:
>> -   Poor performance for our workload (mostly write-dependent)
>>
>
> For this, could you share the volume-profile output specifically for the
> affected volume(s)? Here's what you need to do -
>
> 1. # gluster volume profile $VOLNAME stop
> 2. # gluster volume profile $VOLNAME start
> 3. Run the test inside the vm wherein you see bad performance
> 4. # gluster volume profile $VOLNAME info # save the output of this
> command into a file
> 5. # gluster volume profile $VOLNAME stop
> 6. Attach the output file saved in step 4
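Krutika's steps can be strung together as a small dry-run helper. VOLNAME
and the output filename are placeholders; `RUN=echo` keeps it from touching
a real cluster until you clear it on an actual gluster node:

```shell
#!/bin/sh
# Dry-run sketch of the profiling steps above. VOLNAME is a placeholder;
# set RUN= (empty) on a real gluster node to actually execute.
VOLNAME=${VOLNAME:-myvol}
RUN=${RUN:-echo}

$RUN gluster volume profile "$VOLNAME" stop    # clear any previous run
$RUN gluster volume profile "$VOLNAME" start   # begin collecting I/O stats
# ... reproduce the slow workload inside the VM here ...
$RUN gluster volume profile "$VOLNAME" info > "profile-$VOLNAME.txt"
$RUN gluster volume profile "$VOLNAME" stop
```

The file written in the third step is what gets attached to the thread.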
>
>> -   VMs randomly pause on unknown storage errors, which are “stale file”
>> errors. Corresponding log: Lookup on shard 797 failed. Base file gfid =
>> 8a27b91a-ff02-42dc-bd4c-caa019424de8 [Stale file handle]
>>
>
> Could you share the complete gluster client log file (its name would match
> the pattern rhev-data-center-mnt-glusterSD-*)?
> Also the output of `gluster volume info $VOLNAME`.
>
>
>
>> -   Some files are listed twice in a directory (probably related to the
>> stale file issue?)
>> Example;
>> ls -la
>> /rhev/data-center/59cd53a9-0003-02d7-00eb-01e3/313f5d25-76af-4ecd-9a20-82a2fe815a3c/images/4add6751-3731-4bbd-ae94-aaeed12ea450/
>> total 3081
>> drwxr-x---.  2 vdsm kvm    4096 Mar 18 11:34 .
>> drwxr-xr-x. 13 vdsm kvm    4096 Mar 19 09:42 ..
>> -rw-rw.  1 vdsm kvm 1048576 Mar 28 12:55
>> 1a7cf259-6b29-421d-9688-b25dfaafb13c
>> -rw-rw.  1 vdsm kvm 1048576 Mar 28 12:55
>> 1a7cf259-6b29-421d-9688-b25dfaafb13c
>> -rw-rw.  1 vdsm kvm 1048576 Jan 27  2018
>> 1a7cf259-6b29-421d-9688-b25dfaafb13c.lease
>> -rw-r--r--.  1 vdsm kvm 290 Jan 27  2018
>> 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta
>> -rw-r--r--.  1 vdsm kvm 290 Jan 27  2018
>> 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta
>>
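A quick way to enumerate which names appear more than once in such a
listing is `sort | uniq -d`. Since a healthy local filesystem cannot
reproduce the bug, the sketch below runs the pipeline on captured output
resembling the listing above:

```shell
#!/bin/sh
# Print names that occur more than once in a directory listing.
# On an affected mount you would run:  ls -a "$DIR" | sort | uniq -d
# Here the same pipeline runs on captured sample output.
printf '%s\n' \
  '1a7cf259-6b29-421d-9688-b25dfaafb13c' \
  '1a7cf259-6b29-421d-9688-b25dfaafb13c' \
  '1a7cf259-6b29-421d-9688-b25dfaafb13c.lease' \
  '1a7cf259-6b29-421d-9688-b25dfaafb13c.meta' \
  '1a7cf259-6b29-421d-9688-b25dfaafb13c.meta' \
  | sort | uniq -d
```

This prints only the duplicated names, which is handy when a directory has
hundreds of entries.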
>
> Adding DHT and readdir-ahead maintainers regarding entries getting listed
> twice.
> @Nithya Balachandran  ^^
> @Gowdappa, Raghavendra  ^^
> @Poornima Gurusiddaiah  ^^
>
>
>>
>> - Brick processes sometimes start multiple times. Sometimes I have 5 brick
>> processes for a single volume. Killing all glusterfsd's fo
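One way to spot duplicate brick processes is to count glusterfsd processes
per volume. The sketch below assumes the usual `--volfile-id
volname.host.brickpath` layout of the glusterfsd command line and, to stay
runnable anywhere, works on captured sample process lines; on a real node
you would feed `pgrep -af glusterfsd` into the same awk:

```shell
#!/bin/sh
# Count glusterfsd processes per volume from captured `pgrep -af` output.
# The volfile-id format (volname.host.path) is an assumption about the
# command-line layout; verify it on your own nodes first.
sample='1234 /usr/sbin/glusterfsd -s ovirt01 --volfile-id export.ovirt01.gluster-brick3-export
1235 /usr/sbin/glusterfsd -s ovirt01 --volfile-id export.ovirt01.gluster-brick3-export
1236 /usr/sbin/glusterfsd -s ovirt01 --volfile-id data.ovirt01.gluster-brick2-data'
printf '%s\n' "$sample" |
  awk '{for(i=1;i<=NF;i++) if($i=="--volfile-id"){split($(i+1),a,"."); print a[1]}}' |
  sort | uniq -c
```

Any volume with a count above 1 has duplicate brick processes.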

Re: [ovirt-users] [Gluster-users] op-version for reset-brick (Was: Re: Upgrading HC from 4.0 to 4.1)

2017-07-10 Thread Atin Mukherjee
On Fri, Jul 7, 2017 at 2:23 PM, Gianluca Cecchi 
wrote:

> On Thu, Jul 6, 2017 at 3:22 PM, Gianluca Cecchi  > wrote:
>
>> On Thu, Jul 6, 2017 at 2:16 PM, Atin Mukherjee 
>> wrote:
>>
>>>
>>>
>>> On Thu, Jul 6, 2017 at 5:26 PM, Gianluca Cecchi <
>>> gianluca.cec...@gmail.com> wrote:
>>>
>>>> On Thu, Jul 6, 2017 at 8:38 AM, Gianluca Cecchi <
>>>> gianluca.cec...@gmail.com> wrote:
>>>>
>>>>>
>>>>> Eventually I can destroy and recreate this "export" volume again with
>>>>> the old names (ovirt0N.localdomain.local) if you give me the sequence of
>>>>> commands, then enable debug and retry the reset-brick command
>>>>>
>>>>> Gianluca
>>>>>
>>>>
>>>>
>>>> So it seems I was able to destroy and re-create.
>>>> Now I see that the volume creation uses the new IP by default, so I
>>>> reversed the hostname roles in the commands after putting glusterd in
>>>> debug mode on the host where I execute the reset-brick command (do I have
>>>> to set debug for the other nodes too?)
>>>>
>>>
>>> You have to set the log level to debug for the glusterd instance where the
>>> commit fails and share the glusterd log of that particular node.
>>>
>>>
>>
>> Ok, done.
>>
>> Command executed on ovirt01 with timestamp "2017-07-06 13:04:12" in
>> glusterd log files
>>
>> [root@ovirt01 export]# gluster volume reset-brick export
>> gl01.localdomain.local:/gluster/brick3/export start
>> volume reset-brick: success: reset-brick start operation successful
>>
>> [root@ovirt01 export]# gluster volume reset-brick export
>> gl01.localdomain.local:/gluster/brick3/export
>> ovirt01.localdomain.local:/gluster/brick3/export commit force
>> volume reset-brick: failed: Commit failed on ovirt02.localdomain.local.
>> Please check log file for details.
>> Commit failed on ovirt03.localdomain.local. Please check log file for
>> details.
>> [root@ovirt01 export]#
>>
>> See glusterd log files for the 3 nodes in debug mode here:
>> ovirt01: https://drive.google.com/file/d/0BwoPbcrMv8mvY1RTTG
>> p3RUhScm8/view?usp=sharing
>> ovirt02: https://drive.google.com/file/d/0BwoPbcrMv8mvSVpJUH
>> NhMzhMSU0/view?usp=sharing
>> ovirt03: https://drive.google.com/file/d/0BwoPbcrMv8mvT2xiWE
>> dQVmJNb0U/view?usp=sharing
>>
>> HIH debugging
>> Gianluca
>>
>>
> Hi Atin,
> did you have time to see the logs?
> Comparing debug enabled messages with previous ones, I see these added
> lines on nodes where commit failed after running the commands
>
> gluster volume reset-brick export 
> gl01.localdomain.local:/gluster/brick3/export
> start
> gluster volume reset-brick export 
> gl01.localdomain.local:/gluster/brick3/export
> ovirt01.localdomain.local:/gluster/brick3/export commit force
>
>
> [2017-07-06 13:04:30.221872] D [MSGID: 0] 
> [glusterd-peer-utils.c:674:gd_peerinfo_find_from_hostname]
> 0-management: Friend ovirt01.localdomain.local found.. state: 3
> [2017-07-06 13:04:30.221882] D [MSGID: 0] 
> [glusterd-peer-utils.c:167:glusterd_hostname_to_uuid]
> 0-management: returning 0
> [2017-07-06 13:04:30.221888] D [MSGID: 0] 
> [glusterd-utils.c:1039:glusterd_resolve_brick]
> 0-management: Returning 0
> [2017-07-06 13:04:30.221908] D [MSGID: 0] 
> [glusterd-utils.c:998:glusterd_brickinfo_new]
> 0-management: Returning 0
> [2017-07-06 13:04:30.221915] D [MSGID: 0] 
> [glusterd-utils.c:1195:glusterd_brickinfo_new_from_brick]
> 0-management: Returning 0
> [2017-07-06 13:04:30.222187] D [MSGID: 0] 
> [glusterd-peer-utils.c:167:glusterd_hostname_to_uuid]
> 0-management: returning 0
> [2017-07-06 13:04:30.01] D [MSGID: 0] 
> [glusterd-utils.c:1486:glusterd_volume_brickinfo_get]
> 0-management: Returning -1
>

The above log entry is the reason for the failure. GlusterD is unable to
find the old brick (src_brick) in its volinfo structure. FWIW, would you
be able to share the 'gluster get-state' output and the 'gluster volume
info' output after running reset-brick start? I'd need to check why glusterd
is unable to find the old brick's details in its volinfo after
reset-brick start.
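Collecting the state Atin asks for can be sketched as a dry-run helper that
saves one transcript to attach to the thread. VOLNAME is a placeholder and
`RUN=echo` keeps it harmless; clear RUN on a real node (note that `gluster
get-state` additionally writes its full dump under /var/run/gluster/ and
prints the path):

```shell
#!/bin/sh
# Dry-run sketch of gathering glusterd state after `reset-brick start`.
# RUN=echo prints the commands instead of running them; VOLNAME is a
# placeholder for the affected volume.
RUN=${RUN:-echo}
VOLNAME=${VOLNAME:-export}
{
  $RUN gluster get-state                        # glusterd state dump
  $RUN gluster volume info "$VOLNAME"           # volinfo as glusterd sees it
  $RUN gluster volume status "$VOLNAME" detail  # brick paths and pids
} > "reset-brick-debug-$VOLNAME.txt" 2>&1
```

Run it on the node where the commit fails, since each glusterd keeps its
own copy of the volinfo.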



> [2017-07-06 13:04:30.07] D [MSGID: 0] 
> [store.c:459:gf_store_handle_destroy]
> 0-: Returning 0
> [2017-07-06 13:04:30.42] D [MSGID: 0] [glusterd-utils.c:1512:gluster
> d_volume_brickinfo_get_by_brick] 0-glusterd: Returning -1
> [2017-07-06 13:04:30.50] D [MSGID: 0] [glusterd-replace-brick.c:416:
> glusterd_op_perform_replace_brick] 0-glusterd: Returning -1
> [2017-07-06 13:04:30.57] C [MSGID: 106074]
> [glusterd-reset-brick.c:372:glusterd_op_reset_brick] 0-management: Unable
> to add dst-brick: ovirt01.localdomain.local:/gluster/brick3/export to
> volume: export
>
>
> Does it shed more light?
>
> Thanks,
> Gianluca
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] [Gluster-users] op-version for reset-brick (Was: Re: Upgrading HC from 4.0 to 4.1)

2017-07-07 Thread Atin Mukherjee
I'll need some more time to dig into the logs. I'll try to get
back on this by Monday.

On Fri, Jul 7, 2017 at 2:23 PM, Gianluca Cecchi 
wrote:

> [quoted thread trimmed]
>


Re: [ovirt-users] [Gluster-users] op-version for reset-brick (Was: Re: Upgrading HC from 4.0 to 4.1)

2017-07-06 Thread Atin Mukherjee
On Thu, Jul 6, 2017 at 5:26 PM, Gianluca Cecchi 
wrote:

> On Thu, Jul 6, 2017 at 8:38 AM, Gianluca Cecchi  > wrote:
>
>>
>> Eventually I can destroy and recreate this "export" volume again with the
>> old names (ovirt0N.localdomain.local) if you give me the sequence of
>> commands, then enable debug and retry the reset-brick command
>>
>> Gianluca
>>
>
>
> So it seems I was able to destroy and re-create.
> Now I see that the volume creation uses the new IP by default, so I
> reversed the hostname roles in the commands after putting glusterd in
> debug mode on the host where I execute the reset-brick command (do I have
> to set debug for the other nodes too?)
>

You have to set the log level to debug for the glusterd instance where the
commit fails and share the glusterd log of that particular node.


>
>
> [root@ovirt01 ~]# gluster volume reset-brick export
> gl01.localdomain.local:/gluster/brick3/export start
> volume reset-brick: success: reset-brick start operation successful
>
> [root@ovirt01 ~]# gluster volume reset-brick export
> gl01.localdomain.local:/gluster/brick3/export 
> ovirt01.localdomain.local:/gluster/brick3/export
> commit force
> volume reset-brick: failed: Commit failed on ovirt02.localdomain.local.
> Please check log file for details.
> Commit failed on ovirt03.localdomain.local. Please check log file for
> details.
> [root@ovirt01 ~]#
>
> See here the glusterd.log in zip format:
> https://drive.google.com/file/d/0BwoPbcrMv8mvYmlRLUgyV0pFN0k/
> view?usp=sharing
>
> Time of the reset-brick operation in logfile is 2017-07-06 11:42
> (BTW: can I have the log timestamps in local time rather than UTC, as I'm
> using CEST on my system?)
>
> I see a difference, because the brick doesn't seem isolated as before...
>
> [root@ovirt01 glusterfs]# gluster volume info export
>
> Volume Name: export
> Type: Replicate
> Volume ID: e278a830-beed-4255-b9ca-587a630cbdbf
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: ovirt01.localdomain.local:/gluster/brick3/export
> Brick2: 10.10.2.103:/gluster/brick3/export
> Brick3: 10.10.2.104:/gluster/brick3/export (arbiter)
>
> [root@ovirt02 ~]# gluster volume info export
>
> Volume Name: export
> Type: Replicate
> Volume ID: e278a830-beed-4255-b9ca-587a630cbdbf
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: ovirt01.localdomain.local:/gluster/brick3/export
> Brick2: 10.10.2.103:/gluster/brick3/export
> Brick3: 10.10.2.104:/gluster/brick3/export (arbiter)
>
> And also in oVirt I see all 3 bricks online
>
> Gianluca
>
>


Re: [ovirt-users] [Gluster-users] op-version for reset-brick (Was: Re: Upgrading HC from 4.0 to 4.1)

2017-07-05 Thread Atin Mukherjee
On Thu, Jul 6, 2017 at 3:47 AM, Gianluca Cecchi 
wrote:

> On Wed, Jul 5, 2017 at 6:39 PM, Atin Mukherjee 
> wrote:
>
>> OK, so the log just hints at the following:
>>
>> [2017-07-05 15:04:07.178204] E [MSGID: 106123]
>> [glusterd-mgmt.c:1532:glusterd_mgmt_v3_commit] 0-management: Commit
>> failed for operation Reset Brick on local node
>> [2017-07-05 15:04:07.178214] E [MSGID: 106123]
>> [glusterd-replace-brick.c:649:glusterd_mgmt_v3_initiate_replace_brick_cmd_phases]
>> 0-management: Commit Op Failed
>>
>> While going through the code, glusterd_op_reset_brick () failed, resulting
>> in these logs. Now, I don't see any error logs generated from
>> glusterd_op_reset_brick (), which makes me think that we failed from a
>> place where we log the failure only in debug mode. Would you be able to
>> restart the glusterd service in debug log mode, rerun this test, and share
>> the log?
>>
>>
> Do you mean to run the reset-brick command for another volume or for the
> same? Can I run it against this "now broken" volume?
>
> Or perhaps can I modify /usr/lib/systemd/system/glusterd.service and
> change in [service] section
>
> from
> Environment="LOG_LEVEL=INFO"
>
> to
> Environment="LOG_LEVEL=DEBUG"
>
> and then
> systemctl daemon-reload
> systemctl restart glusterd
>

Yes, that's how you can run glusterd in debug log mode.
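The LOG_LEVEL toggle above can be sketched as below. To stay safe to run
anywhere, the sketch edits a throwaway copy of the unit snippet; on a real
node you would edit /usr/lib/systemd/system/glusterd.service (or, less
invasively, use a systemd drop-in via `systemctl edit glusterd`) and then
run `systemctl daemon-reload && systemctl restart glusterd`:

```shell
#!/bin/sh
# Sketch of switching glusterd's unit file from INFO to DEBUG logging,
# applied to a temporary copy so nothing real is modified here.
unit_copy=$(mktemp)
cat > "$unit_copy" <<'EOF'
[Service]
Environment="LOG_LEVEL=INFO"
EOF
sed -i 's/LOG_LEVEL=INFO/LOG_LEVEL=DEBUG/' "$unit_copy"
grep LOG_LEVEL "$unit_copy"
rm -f "$unit_copy"
```

Remember to switch back to INFO once the failure has been reproduced, as
discussed below in the thread.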

>
> I think it would be better to keep gluster in debug mode for as little
> time as possible, as there are other volumes active right now, and I want
> to avoid filling up the log file system.
> It would be best to put only some components in debug mode, if possible,
> as in the example commands above.
>

You can switch back to info mode the moment this is hit one more time with
the debug log enabled. What I'd need here is the glusterd log (with debug
mode) to figure out the exact cause of the failure.


>
> Let me know,
> thanks
>
>


Re: [ovirt-users] [Gluster-users] op-version for reset-brick (Was: Re: Upgrading HC from 4.0 to 4.1)

2017-07-05 Thread Atin Mukherjee
OK, so the log just hints at the following:

[2017-07-05 15:04:07.178204] E [MSGID: 106123]
[glusterd-mgmt.c:1532:glusterd_mgmt_v3_commit] 0-management: Commit failed
for operation Reset Brick on local node
[2017-07-05 15:04:07.178214] E [MSGID: 106123]
[glusterd-replace-brick.c:649:glusterd_mgmt_v3_initiate_replace_brick_cmd_phases]
0-management: Commit Op Failed

While going through the code, glusterd_op_reset_brick () failed, resulting
in these logs. Now, I don't see any error logs generated from
glusterd_op_reset_brick (), which makes me think that we failed from a
place where we log the failure only in debug mode. Would you be able to
restart the glusterd service in debug log mode, rerun this test, and share
the log?


On Wed, Jul 5, 2017 at 9:12 PM, Gianluca Cecchi 
wrote:

>
>
> On Wed, Jul 5, 2017 at 5:22 PM, Atin Mukherjee 
> wrote:
>
>> And what does glusterd log indicate for these failures?
>>
>
>
> See here in gzip format
>
> https://drive.google.com/file/d/0BwoPbcrMv8mvYmlRLUgyV0pFN0k/
> view?usp=sharing
>
> It seems that on each host the peer files have been updated with a new
> entry "hostname2":
>
> [root@ovirt01 ~]# cat /var/lib/glusterd/peers/*
> uuid=b89311fe-257f-4e44-8e15-9bff6245d689
> state=3
> hostname1=ovirt02.localdomain.local
> hostname2=10.10.2.103
> uuid=ec81a04c-a19c-4d31-9d82-7543cefe79f3
> state=3
> hostname1=ovirt03.localdomain.local
> hostname2=10.10.2.104
> [root@ovirt01 ~]#
>
> [root@ovirt02 ~]# cat /var/lib/glusterd/peers/*
> uuid=e9717281-a356-42aa-a579-a4647a29a0bc
> state=3
> hostname1=ovirt01.localdomain.local
> hostname2=10.10.2.102
> uuid=ec81a04c-a19c-4d31-9d82-7543cefe79f3
> state=3
> hostname1=ovirt03.localdomain.local
> hostname2=10.10.2.104
> [root@ovirt02 ~]#
>
> [root@ovirt03 ~]# cat /var/lib/glusterd/peers/*
> uuid=b89311fe-257f-4e44-8e15-9bff6245d689
> state=3
> hostname1=ovirt02.localdomain.local
> hostname2=10.10.2.103
> uuid=e9717281-a356-42aa-a579-a4647a29a0bc
> state=3
> hostname1=ovirt01.localdomain.local
> hostname2=10.10.2.102
> [root@ovirt03 ~]#
>
>
> But not the gluster volume info on the second and third nodes, which have
> lost the ovirt01/gl01 host brick information...
>
> Eg on ovirt02
>
>
> [root@ovirt02 peers]# gluster volume info export
>
> Volume Name: export
> Type: Replicate
> Volume ID: b00e5839-becb-47e7-844f-6ce6ce1b7153
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 0 x (2 + 1) = 2
> Transport-type: tcp
> Bricks:
> Brick1: ovirt02.localdomain.local:/gluster/brick3/export
> Brick2: ovirt03.localdomain.local:/gluster/brick3/export
> Options Reconfigured:
> transport.address-family: inet
> performance.readdir-ahead: on
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> cluster.eager-lock: enable
> network.remote-dio: off
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> storage.owner-uid: 36
> storage.owner-gid: 36
> features.shard: on
> features.shard-block-size: 512MB
> performance.low-prio-threads: 32
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-wait-qlength: 1
> cluster.shd-max-threads: 6
> network.ping-timeout: 30
> user.cifs: off
> nfs.disable: on
> performance.strict-o-direct: on
> [root@ovirt02 peers]#
>
> And on ovirt03
>
> [root@ovirt03 ~]# gluster volume info export
>
> Volume Name: export
> Type: Replicate
> Volume ID: b00e5839-becb-47e7-844f-6ce6ce1b7153
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 0 x (2 + 1) = 2
> Transport-type: tcp
> Bricks:
> Brick1: ovirt02.localdomain.local:/gluster/brick3/export
> Brick2: ovirt03.localdomain.local:/gluster/brick3/export
> Options Reconfigured:
> transport.address-family: inet
> performance.readdir-ahead: on
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> cluster.eager-lock: enable
> network.remote-dio: off
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> storage.owner-uid: 36
> storage.owner-gid: 36
> features.shard: on
> features.shard-block-size: 512MB
> performance.low-prio-threads: 32
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-wait-qlength: 1
> cluster.shd-max-threads: 6
> network.ping-timeout: 30
> user.cifs: off
> nfs.disable: on
> performance.strict-o-direct: on
> [root@ovirt03 ~]#
>
> While on ovirt01 it seems isolated...
>
> [root@ovirt01 ~]# gluster volume info export
>
> Volume Name: export
> Type: Replicate
> Volume ID: b00e5839-becb-47

Re: [ovirt-users] [Gluster-users] op-version for reset-brick (Was: Re: Upgrading HC from 4.0 to 4.1)

2017-07-05 Thread Atin Mukherjee
And what does glusterd log indicate for these failures?

On Wed, Jul 5, 2017 at 8:43 PM, Gianluca Cecchi 
wrote:

>
>
> On Wed, Jul 5, 2017 at 5:02 PM, Sahina Bose  wrote:
>
>>
>>
>> On Wed, Jul 5, 2017 at 8:16 PM, Gianluca Cecchi <
>> gianluca.cec...@gmail.com> wrote:
>>
>>>
>>>
>>> On Wed, Jul 5, 2017 at 7:42 AM, Sahina Bose  wrote:
>>>


> ...
>
> then the commands I need to run would be:
>
> gluster volume reset-brick export 
> ovirt01.localdomain.local:/gluster/brick3/export
> start
> gluster volume reset-brick export 
> ovirt01.localdomain.local:/gluster/brick3/export
> gl01.localdomain.local:/gluster/brick3/export commit force
>
> Correct?
>

 Yes, correct. gl01.localdomain.local should resolve correctly on all 3
 nodes.

>>>
>>>
>>> It fails at first step:
>>>
>>>  [root@ovirt01 ~]# gluster volume reset-brick export
>>> ovirt01.localdomain.local:/gluster/brick3/export start
>>> volume reset-brick: failed: Cannot execute command. The cluster is
>>> operating at version 30712. reset-brick command reset-brick start is
>>> unavailable in this version.
>>> [root@ovirt01 ~]#
>>>
>>> It seems somehow related to this upgrade note for the commercial Red Hat
>>> Gluster Storage solution:
>>> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/Installation_Guide/chap-Upgrading_Red_Hat_Storage.html
>>>
>>> So it seems I have to run a command of the form:
>>>
>>> gluster volume set all cluster.op-version X
>>>
>>> with X > 30712
>>>
>>> It seems that the latest version of the commercial Red Hat Gluster
>>> Storage is 3.1, and its op-version is indeed 30712.
>>>
>>> So the question is which particular op-version I have to set, and whether
>>> the command can be run online without causing disruption.
>>>
>>
>> It should have worked with the glusterfs 3.10 version from the CentOS
>> repo. Adding gluster-users for help on the op-version.
>>
>>
>>>
>>> Thanks,
>>> Gianluca
>>>
>>
>>
>
> It seems the op-version is not updated automatically by default, so that
> it can handle mixed versions while you update nodes one by one...
>
> I followed what described here:
> https://gluster.readthedocs.io/en/latest/Upgrade-Guide/op_version/
>
>
> - Get current version:
>
> [root@ovirt01 ~]# gluster volume get all cluster.op-version
> Option                                  Value
> ------                                  -----
> cluster.op-version                      30712
>
> [root@ovirt01 ~]#
>
>
> - Get maximum version I can set for current setup:
>
> [root@ovirt01 ~]# gluster volume get all cluster.max-op-version
> Option                                  Value
> ------                                  -----
> cluster.max-op-version                  31000
>
> [root@ovirt01 ~]#
>
>
> - Get op version information for all the connected clients:
>
> [root@ovirt01 ~]# gluster volume status all clients | grep ":49" | awk
> '{print $4}' | sort | uniq -c
>  72 31000
> [root@ovirt01 ~]#
>
> --> ok
>
>
> - Update op-version
>
> [root@ovirt01 ~]# gluster volume set all cluster.op-version 31000
> volume set: success
> [root@ovirt01 ~]#
>
>
> - Verify:
> [root@ovirt01 ~]# gluster volume get all cluster.op-version
> Option                                  Value
> ------                                  -----
> cluster.op-version                      31000
>
> [root@ovirt01 ~]#
>
> --> ok
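The sequence above (check current and max op-version, check all connected
clients, then bump) can be wrapped in a small guard that refuses to bump
while any client is still below the target. To stay runnable anywhere, the
sketch works on captured `gluster volume status all clients` lines; the
assumption, mirrored from the pipeline used above, is that the client
op-version is field 4 of the ":49xxx" brick lines:

```shell
#!/bin/sh
# Guarded op-version bump, sketched on captured output. On a real node,
# replace `sample` with:  gluster volume status all clients
# The column layout (op-version in field 4) is an assumption taken from
# the grep/awk pipeline used earlier in this thread.
TARGET=31000
sample='10.10.2.102:49152  10.10.2.103:1017  912  31000
10.10.2.102:49152  10.10.2.104:1018  812  31000'
low=$(printf '%s\n' "$sample" | grep ':49' |
      awk -v t="$TARGET" '$4 < t { n++ } END { print n+0 }')
if [ "$low" -eq 0 ]; then
  echo "all clients at op-version >= $TARGET; safe to run:"
  echo "  gluster volume set all cluster.op-version $TARGET"
else
  echo "$low client(s) below $TARGET; hold off on the bump"
fi
```

Bumping while an old client is still connected is what the guard prevents;
the bump itself is one-way, so it is worth the extra check.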
>
> [root@ovirt01 ~]# gluster volume reset-brick export
> ovirt01.localdomain.local:/gluster/brick3/export start
> volume reset-brick: success: reset-brick start operation successful
>
> [root@ovirt01 ~]# gluster volume reset-brick export
> ovirt01.localdomain.local:/gluster/brick3/export 
> gl01.localdomain.local:/gluster/brick3/export
> commit force
> volume reset-brick: failed: Commit failed on ovirt02.localdomain.local.
> Please check log file for details.
> Commit failed on ovirt03.localdomain.local. Please check log file for
> details.
> [root@ovirt01 ~]#
>
> [root@ovirt01 bricks]# gluster volume info export
>
> Volume Name: export
> Type: Replicate
> Volume ID: b00e5839-becb-47e7-844f-6ce6ce1b7153
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: gl01.localdomain.local:/gluster/brick3/export
> Brick2: ovirt02.localdomain.local:/gluster/brick3/export
> Brick3: ovirt03.localdomain.local:/gluster/brick3/export (arbiter)
> Options Reconfigured:
> transport.address-family: inet
> performance.readdir-ahead: on
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> cluster.eager-lock: enable
> network.remote-dio: off
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> storage.owner-uid: 36
> storage.owner-gid: 36
> features.shard: on
> features.shard-block-size: 512MB
> performance.low-prio-threads: 32
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-wait-qlength: 1
> cluster
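The per-client check earlier in the quoted procedure (`gluster volume status all clients | grep ":49" | awk '{print $4}' | sort | uniq -c`) can be extended a step: the safe target for cluster.op-version is bounded by the lowest op-version among connected clients. A minimal sketch (not from the thread; the sample input imitates the `uniq -c` output for a cluster with mixed clients):

```shell
# Sketch: pick the minimum op-version from "count version" pairs as
# produced by the sort | uniq -c step above.
min_client_opversion() {
    awk '{ if (min == "" || $2 + 0 < min + 0) min = $2 } END { print min }'
}

printf '  72 31000\n   3 30712\n' | min_client_opversion   # -> 30712
```

In the worked example above all 72 clients report 31000, so 31000 is a safe value to set.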

Re: [ovirt-users] [Gluster-users] op-version for reset-brick (Was: Re: Upgrading HC from 4.0 to 4.1)

2017-07-05 Thread Atin Mukherjee
On Wed, Jul 5, 2017 at 8:32 PM, Sahina Bose  wrote:

>
>
> On Wed, Jul 5, 2017 at 8:16 PM, Gianluca Cecchi  > wrote:
>
>>
>>
>> On Wed, Jul 5, 2017 at 7:42 AM, Sahina Bose  wrote:
>>
>>>
>>>
 ...

 then the commands I need to run would be:

 gluster volume reset-brick export 
 ovirt01.localdomain.local:/gluster/brick3/export
 start
 gluster volume reset-brick export 
 ovirt01.localdomain.local:/gluster/brick3/export
 gl01.localdomain.local:/gluster/brick3/export commit force

 Correct?

>>>
>>> Yes, correct. gl01.localdomain.local should resolve correctly on all 3
>>> nodes.
>>>
>>
>>
>> It fails at first step:
>>
>>  [root@ovirt01 ~]# gluster volume reset-brick export
>> ovirt01.localdomain.local:/gluster/brick3/export start
>> volume reset-brick: failed: Cannot execute command. The cluster is
>> operating at version 30712. reset-brick command reset-brick start is
>> unavailable in this version.
>> [root@ovirt01 ~]#
>>
>> It seems somehow related to the upgrade procedure documented for the
>> commercial product Red Hat Gluster Storage:
>> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/Installation_Guide/chap-Upgrading_Red_Hat_Storage.html
>>
>> So it seems I have to run a command of the form:
>>
>> gluster volume set all cluster.op-version X
>>
>> with X > 30712
>>
>> It seems that the latest version of commercial Red Hat Gluster Storage is 3.1
>> and its op-version is indeed 30712.
>>
>> So the question is which particular op-version I should set, and whether
>> the command can be run online without causing disruption
>>
>
> It should have worked with the glusterfs 3.10 version from the CentOS repo.
> Adding gluster-users for help on the op-version
>

This definitely means your cluster is running an op-version < 3.9.0:

if (conf->op_version < GD_OP_VERSION_3_9_0 &&
    strcmp (cli_op, "GF_REPLACE_OP_COMMIT_FORCE")) {
        snprintf (msg, sizeof (msg),
                  "Cannot execute command. The "
                  "cluster is operating at version %d. reset-brick "
                  "command %s is unavailable in this version.",
                  conf->op_version, gd_rb_op_to_str (cli_op));
        ret = -1;
        goto out;
}

What version of the gluster bits are you running across the gluster
cluster? Please note that cluster.op-version is not the same as the rpm
version, and with every upgrade it's recommended to bump up the op-version.


>
>>
>> Thanks,
>> Gianluca
>>
>
>
> ___
> Gluster-users mailing list
> gluster-us...@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] [Gluster-users] Gluster issue with /var/lib/glusterd/peers/ file

2017-07-03 Thread Atin Mukherjee
Please attach glusterd & cmd_history log files from all the nodes.
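A minimal way to gather what was asked for, to be run on each node (log paths assumed from standard glusterfs packaging; both the old `.cmd_log_history` and newer `cmd_history.log` names are tried, and missing files are simply skipped):

```shell
# Sketch: bundle this node's glusterd and command-history logs into a
# tarball that can be attached to the thread.
outdir="gluster-logs-$(hostname -s)"
mkdir -p "$outdir"
for f in /var/log/glusterfs/glusterd.log \
         /var/log/glusterfs/cmd_history.log \
         /var/log/glusterfs/.cmd_log_history; do
    [ -r "$f" ] && cp "$f" "$outdir/"
done
tar czf "$outdir.tar.gz" "$outdir"
```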

On Mon, Jul 3, 2017 at 2:55 PM, Sahina Bose  wrote:

>
>
> On Sun, Jul 2, 2017 at 5:38 AM, Mike DePaulo  wrote:
>
>> Hi everyone,
>>
>> I have ovirt 4.1.1/4.1.2 running on 3 hosts with a gluster hosted engine.
>>
>> I was working on setting up a network for gluster storage and
>> migration. The addresses for it will be 10.0.20.x, rather than
>> 192.168.1.x for the management network.  However, I switched gluster
>> storage and migration back over to the management network.
>>
>> I updated and rebooted one of my hosts (death-star, 10.0.20.52) and on
>> reboot, the glusterd service would start, but wouldn't seem to work.
>> The engine webgui reported that its bricks were down, and commands
>> like this would fail:
>>
>> [root@death-star glusterfs]# gluster pool list
>> pool list: failed
>> [root@death-star glusterfs]# gluster peer status
>> peer status: failed
>>
>> Upon further investigation, I had under /var/lib/glusterd/peers/ the 2
>> existing UUID files, plus a new 3rd one:
>> [root@death-star peers]# cat 10.0.20.53
>> uuid=----
>> state=0
>> hostname1=10.0.20.53
>>
>
> [Adding gluster-users]
>
> How did you add this peer "10.0.20.53"? Is this another interface for an
> existing peer?
>
>
>> I moved that file out of there, restarted glusterd, and now gluster is
>> working again.
>>
>> I am guessing that this is a bug. Let me know if I should attach other
>> log files; I am not sure which ones.
>>
>> And yes, 10.0.20.53 is the IP of one of the other hosts.
>>
>> -Mike
>>
>
>
>


Re: [ovirt-users] [Gluster-users] timeouts

2015-11-26 Thread Atin Mukherjee


On 11/27/2015 10:52 AM, Sahina Bose wrote:
> [+ gluster-users]
> 
> On 11/26/2015 08:37 PM, p...@email.cz wrote:
>> Hello,
>> can anybody help me with these timeouts?
>> Volumes are not active yet (bricks down).
>>
>> Description of the gluster setup below ...
>>
>> */var/log/glusterfs/**etc-glusterfs-glusterd.vol.log*
>> [2015-11-26 14:44:47.174221] I [MSGID: 106004]
>> [glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management:
>> Peer <1hp1-SAN> (<87fc7db8-aba8-41f2-a1cd-b77e83b17436>), in state
>> , has disconnected from glusterd.
>> [2015-11-26 14:44:47.174354] W
>> [glusterd-locks.c:681:glusterd_mgmt_v3_unlock]
>> (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c)
>> [0x7fb7039d44dc]
>> -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162)
>> [0x7fb7039de542]
>> -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a)
>> [0x7fb703a79b4a] ) 0-management: Lock for vol 1HP12-P1 not held
>> [2015-11-26 14:44:47.17] W
>> [glusterd-locks.c:681:glusterd_mgmt_v3_unlock]
>> (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c)
>> [0x7fb7039d44dc]
>> -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162)
>> [0x7fb7039de542]
>> -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a)
>> [0x7fb703a79b4a] ) 0-management: Lock for vol 1HP12-P3 not held
>> [2015-11-26 14:44:47.174521] W
>> [glusterd-locks.c:681:glusterd_mgmt_v3_unlock]
>> (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c)
>> [0x7fb7039d44dc]
>> -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162)
>> [0x7fb7039de542]
>> -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a)
>> [0x7fb703a79b4a] ) 0-management: Lock for vol 2HP12-P1 not held
>> [2015-11-26 14:44:47.174662] W
>> [glusterd-locks.c:681:glusterd_mgmt_v3_unlock]
>> (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c)
>> [0x7fb7039d44dc]
>> -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162)
>> [0x7fb7039de542]
>> -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a)
>> [0x7fb703a79b4a] ) 0-management: Lock for vol 2HP12-P3 not held
>> [2015-11-26 14:44:47.174532] W [MSGID: 106118]
>> [glusterd-handler.c:5087:__glusterd_peer_rpc_notify] 0-management:
>> Lock not released for 2HP12-P1
>> [2015-11-26 14:44:47.174675] W [MSGID: 106118]
>> [glusterd-handler.c:5087:__glusterd_peer_rpc_notify] 0-management:
>> Lock not released for 2HP12-P3
>> [2015-11-26 14:44:49.423334] I [MSGID: 106488]
>> [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd:
>> Received get vol req
>> The message "I [MSGID: 106488]
>> [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd:
>> Received get vol req" repeated 4 times between [2015-11-26
>> 14:44:49.423334] and [2015-11-26 14:44:49.429781]
>> [2015-11-26 14:44:51.148711] I [MSGID: 106163]
>> [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack]
>> 0-management: using the op-version 30702
>> [2015-11-26 14:44:52.177266] W [socket.c:869:__socket_keepalive]
>> 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 12, Invalid
>> argument
>> [2015-11-26 14:44:52.177291] E [socket.c:2965:socket_connect]
>> 0-management: Failed to set keep-alive: Invalid argument
>> [2015-11-26 14:44:53.180426] W [socket.c:869:__socket_keepalive]
>> 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 17, Invalid
>> argument
>> [2015-11-26 14:44:53.180447] E [socket.c:2965:socket_connect]
>> 0-management: Failed to set keep-alive: Invalid argument
>> [2015-11-26 14:44:52.395468] I [MSGID: 106163]
>> [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack]
>> 0-management: using the op-version 30702
>> [2015-11-26 14:44:54.851958] I [MSGID: 106488]
>> [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd:
>> Received get vol req
>> [2015-11-26 14:44:57.183969] W [socket.c:869:__socket_keepalive]
>> 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 19, Invalid
>> argument
>> [2015-11-26 14:44:57.183990] E [socket.c:2965:socket_connect]
>> 0-management: Failed to set keep-alive: Invalid argument
>>
>> After volume creation all works fine (volumes up), but then, after
>> several reboots (yum updates), volumes failed due to timeouts.
>>
>> Gluster description:
>>
>> 4 nodes with 4 volumes replica 2
>> oVirt 3.6 - the latest
>> gluster 3.7.6 - the latest
>> vdsm 4.17.999 - from git repo
>> oVirt - mgmt.nodes 172.16.0.0
>> oVirt - bricks 16.0.0.0 ( "SAN" - defined as "gluster" net)
>> Network works fine, no lost packets
>>
>> # gluster volume status
>> Staging failed on 2hp1-SAN. Please check log file for details.
>> Staging failed on 1hp2-SAN. Please check log file for details.
>> Staging failed on 2hp2-SAN. Please check log fi

Re: [ovirt-users] [Gluster-users] Centos 7.1 failed to start glusterd after upgrading to ovirt 3.6

2015-11-05 Thread Atin Mukherjee
>> [glusterd-store.c:4243:glusterd_resolve_all_bricks] 0-glusterd:
>> resolve brick failed in restore
The above log is the culprit here. Generally this function fails when
GlusterD fails to resolve the associated host of a brick. Have any of the
nodes undergone an IP change during the upgrade process?
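One way to check that diagnosis is to walk the brick entries recorded in glusterd's on-disk store and test whether each recorded host still resolves. A sketch, assuming the default store path and the usual `host:-brick-path` naming of the brick files:

```shell
# Sketch: flag any brick host recorded under the glusterd store that no
# longer resolves -- the usual cause of "resolve brick failed in restore"
# after an IP or hostname change.
check_brick_hosts() {
    local store=${1:-/var/lib/glusterd} info host
    for info in "$store"/vols/*/bricks/*; do
        [ -e "$info" ] || continue
        host=${info##*/}        # brick file name, e.g. host1:-data-brick1
        host=${host%%:*}        # keep only the host part
        getent hosts "$host" >/dev/null 2>&1 \
            || echo "unresolvable brick host: $host"
    done
}

check_brick_hosts   # no output means every recorded brick host resolves
```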

~Atin

On 11/06/2015 09:59 AM, Sahina Bose wrote:
> Did you upgrade all the nodes too?
> Are some of your nodes not reachable?
> 
> Adding gluster-users for glusterd error.
> 
> On 11/06/2015 12:00 AM, Stefano Danzi wrote:
>>
>> After upgrading oVirt from 3.5 to 3.6, glusterd fail to start when the
>> host boot.
>> Manual start of service after boot works fine.
>>
>> gluster log:
>>
>> [2015-11-04 13:37:55.360876] I [MSGID: 100030]
>> [glusterfsd.c:2318:main] 0-/usr/sbin/glusterd: Started running
>> /usr/sbin/glusterd version 3.7.5 (args: /usr/sbin/glusterd -p
>> /var/run/glusterd.pid)
>> [2015-11-04 13:37:55.447413] I [MSGID: 106478] [glusterd.c:1350:init]
>> 0-management: Maximum allowed open file descriptors set to 65536
>> [2015-11-04 13:37:55.447477] I [MSGID: 106479] [glusterd.c:1399:init]
>> 0-management: Using /var/lib/glusterd as working directory
>> [2015-11-04 13:37:55.464540] W [MSGID: 103071]
>> [rdma.c:4592:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event
>> channel creation failed [Nessun device corrisponde]
>> [2015-11-04 13:37:55.464559] W [MSGID: 103055] [rdma.c:4899:init]
>> 0-rdma.management: Failed to initialize IB Device
>> [2015-11-04 13:37:55.464566] W
>> [rpc-transport.c:359:rpc_transport_load] 0-rpc-transport: 'rdma'
>> initialization failed
>> [2015-11-04 13:37:55.464616] W [rpcsvc.c:1597:rpcsvc_transport_create]
>> 0-rpc-service: cannot create listener, initing the transport failed
>> [2015-11-04 13:37:55.464624] E [MSGID: 106243] [glusterd.c:1623:init]
>> 0-management: creation of 1 listeners failed, continuing with
>> succeeded transport
>> [2015-11-04 13:37:57.663862] I [MSGID: 106513]
>> [glusterd-store.c:2036:glusterd_restore_op_version] 0-glusterd:
>> retrieved op-version: 30600
>> [2015-11-04 13:37:58.284522] I [MSGID: 106194]
>> [glusterd-store.c:3465:glusterd_store_retrieve_missed_snaps_list]
>> 0-management: No missed snaps list.
>> [2015-11-04 13:37:58.287477] E [MSGID: 106187]
>> [glusterd-store.c:4243:glusterd_resolve_all_bricks] 0-glusterd:
>> resolve brick failed in restore
>> [2015-11-04 13:37:58.287505] E [MSGID: 101019]
>> [xlator.c:428:xlator_init] 0-management: Initialization of volume
>> 'management' failed, review your volfile again
>> [2015-11-04 13:37:58.287513] E [graph.c:322:glusterfs_graph_init]
>> 0-management: initializing translator failed
>> [2015-11-04 13:37:58.287518] E [graph.c:661:glusterfs_graph_activate]
>> 0-graph: init failed
>> [2015-11-04 13:37:58.287799] W [glusterfsd.c:1236:cleanup_and_exit]
>> (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xfd) [0x7f29b876524d]
>> -->/usr/sbin/glusterd(glusterfs_process_volfp+0x126) [0x7f29b87650f6]
>> -->/usr/sbin/glusterd(cleanup_and_exit+0x69) [0x7f29b87646d9] ) 0-:
>> received signum (0), shutting down
>>
>>
> 
> 


Re: [ovirt-users] Cannot mount gluster storage data

2015-09-27 Thread Atin Mukherjee


On 09/25/2015 01:25 PM, Ravishankar N wrote:
> 
> 
> On 09/25/2015 12:32 PM, Jean-Michel FRANCOIS wrote:
>> Hi Ovirt users,
>>
>> I'm running ovirt hosted 3.4 with gluster data storage.
>> When I add a new host (CentOS 6.6) the data storage (as a glusterfs
>> volume) cannot be mounted.
>> I have the following errors in gluster client log file :
>> [2015-09-24 12:27:22.636221] I [MSGID: 101190]
>> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started
>> thread with index 1
>> [2015-09-24 12:27:22.636588] W [socket.c:588:__socket_rwv]
>> 0-glusterfs: readv on 172.16.0.5:24007 failed (No data available)
>> [2015-09-24 12:27:22.637307] E [rpc-clnt.c:362:saved_frames_unwind]
>> (-->
>> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7f427fb3063b]
>> (-->
>> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7f427f8fc1d7]
>> (-->
>> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f427f8fc2ee]
>> (-->
>> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7f427f8fc3bb]
>> (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1c2)[0x7f427f8fc9f2]
>> ) 0-glusterfs: forced unwinding frame type(GlusterFS Handshake)
>> op(GETSPEC(2)) called at 2015-09-24 12:27:22.636344 (xid=0x1)
>> [2015-09-24 12:27:22.637333] E
>> [glusterfsd-mgmt.c:1604:mgmt_getspec_cbk] 0-mgmt: failed to fetch
>> volume file (key:/data)
>> [2015-09-24 12:27:22.637360] W [glusterfsd.c:1219:cleanup_and_exit]
>> (-->/usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x20e)
>> [0x7f427f8fc1fe] -->/usr/sbin/glusterfs(mgmt_getspec_cbk+0x3f2)
>> [0x40d5d2] -->/usr/sbin/glusterfs(cleanup_and_exit+0x65) [0x4059b5] )
>> 0-: received signum (0), shutting down
>> [2015-09-24 12:27:22.637375] I [fuse-bridge.c:5595:fini] 0-fuse:
>> Unmounting '/rhev/data-center/mnt/glusterSD/172.16.0.5:_data'.
>> [2015-09-24 12:27:22.646246] W [glusterfsd.c:1219:cleanup_and_exit]
>> (-->/lib64/libpthread.so.0(+0x7a51) [0x7f427ec18a51]
>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x405e4d]
>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x65) [0x4059b5] ) 0-:
>> received signum (15), shutting down
>> [2015-09-24 12:27:22.646246] W [glusterfsd.c:1219:cleanup_and_exit]
>> (-->/lib64/libpthread.so.0(+0x7a51) [0x7f427ec18a51]
>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x405e4d]
>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x65) [0x4059b5] ) 0-:
>> received signum (15), shutting down
>> And nothing server side.
>>
> 
> This does look like an op-version issue. Adding Atin for any possible help.
Yes, this does look like an op-version issue. The current version of the
client is not supported. What client and server versions of gluster are
you using?

~Atin
> -Ravi
> 
>> I suppose it is a version issue since on server side I have
>> glusterfs-api-3.6.3-1.el6.x86_64
>> glusterfs-fuse-3.6.3-1.el6.x86_64
>> glusterfs-libs-3.6.3-1.el6.x86_64
>> glusterfs-3.6.3-1.el6.x86_64
>> glusterfs-cli-3.6.3-1.el6.x86_64
>> glusterfs-rdma-3.6.3-1.el6.x86_64
>> glusterfs-server-3.6.3-1.el6.x86_64
>>
>> and on the new host :
>> glusterfs-3.7.4-2.el6.x86_64
>> glusterfs-api-3.7.4-2.el6.x86_64
>> glusterfs-libs-3.7.4-2.el6.x86_64
>> glusterfs-fuse-3.7.4-2.el6.x86_64
>> glusterfs-cli-3.7.4-2.el6.x86_64
>> glusterfs-server-3.7.4-2.el6.x86_64
>> glusterfs-client-xlators-3.7.4-2.el6.x86_64
>> glusterfs-rdma-3.7.4-2.el6.x86_64
>>
>> But since it is a production system, I'm not confident about
>> performing a gluster server upgrade.
>> Mounting a gluster volume as NFS is possible (the engine data storage
>> has been mounted successfully).
>>
>> I'm asking here because glusterfs comes from the ovirt3.4 rpm repository.
>>
>> If anyone has a hint about this problem,
>>
>> thanks
>> Jean-Michel
>>
>>
>>
> 


Re: [ovirt-users] [Gluster-users] Failed to create volume in OVirt with gluster

2015-01-14 Thread Atin Mukherjee
Punit,

cli log wouldn't help much here. To debug this issue further can you
please let us know the following:

1. gluster peer status output
2. gluster volume status output
3. gluster --version output.
4. Which command got failed
5. glusterd log file of all the nodes
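A one-shot collector for those five items, to be run on each node, might look like this (command names are the standard gluster CLI; failures such as a down glusterd are captured into the output files rather than aborting):

```shell
# Sketch: gather peer status, volume status, version and the glusterd log
# into a per-node directory for attaching to the thread.
collect_gluster_debug() {
    local out="gluster-debug-$(hostname -s)-$(date +%Y%m%d%H%M%S)"
    mkdir -p "$out"
    gluster peer status    > "$out/peer-status.txt"   2>&1 || true
    gluster volume status  > "$out/volume-status.txt" 2>&1 || true
    gluster --version      > "$out/version.txt"       2>&1 || true
    cp /var/log/glusterfs/glusterd.log "$out"/ 2>/dev/null || true
    echo "$out"
}

dir=$(collect_gluster_debug)
```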

~Atin


On 01/13/2015 07:48 AM, Punit Dambiwal wrote:
> Hi,
> 
> Please find more details on this below; can anybody from gluster help
> me here? :-
> 
> 
> Gluster CLI Logs :- /var/log/glusterfs/cli.log
> 
> [2015-01-13 02:06:23.071969] T [cli.c:264:cli_rpc_notify] 0-glusterfs: got
> RPC_CLNT_CONNECT
> [2015-01-13 02:06:23.072012] T [cli-quotad-client.c:94:cli_quotad_notify]
> 0-glusterfs: got RPC_CLNT_CONNECT
> [2015-01-13 02:06:23.072024] I [socket.c:2344:socket_event_handler]
> 0-transport: disconnecting now
> [2015-01-13 02:06:23.072055] T [cli-quotad-client.c:100:cli_quotad_notify]
> 0-glusterfs: got RPC_CLNT_DISCONNECT
> [2015-01-13 02:06:23.072131] T [rpc-clnt.c:1381:rpc_clnt_record]
> 0-glusterfs: Auth Info: pid: 0, uid: 0, gid: 0, owner:
> [2015-01-13 02:06:23.072176] T
> [rpc-clnt.c:1238:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen
> 128, payload: 64, rpc hdr: 64
> [2015-01-13 02:06:23.072572] T [socket.c:2863:socket_connect] (-->
> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7fed02f15420] (-->
> /usr/lib64/glusterfs/3.6.1/rpc-transport/socket.so(+0x7293)[0x7fed001a4293]
> (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_submit+0x468)[0x7fed0266df98] (-->
> /usr/sbin/gluster(cli_submit_request+0xdb)[0x40a9bb] (-->
> /usr/sbin/gluster(cli_cmd_submit+0x8e)[0x40b7be] ) 0-glusterfs: connect
> () called on transport already connected
> [2015-01-13 02:06:23.072616] T [rpc-clnt.c:1573:rpc_clnt_submit]
> 0-rpc-clnt: submitted request (XID: 0x1 Program: Gluster CLI, ProgVers: 2,
> Proc: 27) to rpc-transport (glusterfs)
> [2015-01-13 02:06:23.072633] D [rpc-clnt-ping.c:231:rpc_clnt_start_ping]
> 0-glusterfs: ping timeout is 0, returning
> [2015-01-13 02:06:23.075930] T [rpc-clnt.c:660:rpc_clnt_reply_init]
> 0-glusterfs: received rpc message (RPC XID: 0x1 Program: Gluster CLI,
> ProgVers: 2, Proc: 27) from rpc-transport (glusterfs)
> [2015-01-13 02:06:23.075976] D [cli-rpc-ops.c:6548:gf_cli_status_cbk]
> 0-cli: Received response to status cmd
> [2015-01-13 02:06:23.076025] D [cli-cmd.c:384:cli_cmd_submit] 0-cli:
> Returning 0
> [2015-01-13 02:06:23.076049] D [cli-rpc-ops.c:6811:gf_cli_status_volume]
> 0-cli: Returning: 0
> [2015-01-13 02:06:23.076192] D [cli-xml-output.c:84:cli_begin_xml_output]
> 0-cli: Returning 0
> [2015-01-13 02:06:23.076244] D [cli-xml-output.c:131:cli_xml_output_common]
> 0-cli: Returning 0
> [2015-01-13 02:06:23.076256] D
> [cli-xml-output.c:1375:cli_xml_output_vol_status_begin] 0-cli: Returning 0
> [2015-01-13 02:06:23.076437] D [cli-xml-output.c:108:cli_end_xml_output]
> 0-cli: Returning 0
> [2015-01-13 02:06:23.076459] D
> [cli-xml-output.c:1398:cli_xml_output_vol_status_end] 0-cli: Returning 0
> [2015-01-13 02:06:23.076490] I [input.c:36:cli_batch] 0-: Exiting with: 0
> 
> Command log :- /var/log/glusterfs/.cmd_log_history
> 
> Staging failed on ----. Please check log
> file for details.
> Staging failed on ----. Please check log
> file for details.
> [2015-01-13 01:10:35.836676]  : volume status all tasks : FAILED : Staging
> failed on ----. Please check log file for
> details.
> Staging failed on ----. Please check log
> file for details.
> Staging failed on ----. Please check log
> file for details.
> [2015-01-13 01:16:25.956514]  : volume status all tasks : FAILED : Staging
> failed on ----. Please check log file for
> details.
> Staging failed on ----. Please check log
> file for details.
> Staging failed on ----. Please check log
> file for details.
> [2015-01-13 01:17:36.977833]  : volume status all tasks : FAILED : Staging
> failed on ----. Please check log file for
> details.
> Staging failed on ----. Please check log
> file for details.
> Staging failed on ----. Please check log
> file for details.
> [2015-01-13 01:21:07.048053]  : volume status all tasks : FAILED : Staging
> failed on ----. Please check log file for
> details.
> Staging failed on ----. Please check log
> file for details.
> Staging failed on ----. Please check log
> file for details.
> [2015-01-13 01:26:57.168661]  : volume status all tasks : FAILED : Staging
> failed on ----. Please check log file for
> details.
> Staging failed on ----. Please check log
> file for details.
> Staging failed on --00

Re: [ovirt-users] [Gluster-users] Failed to create volume in OVirt with gluster

2015-01-14 Thread Atin Mukherjee


On 01/13/2015 12:12 PM, Punit Dambiwal wrote:
> Hi Atin,
> 
> Please find the output from here :- http://ur1.ca/jf4bs
> 
Looks like http://review.gluster.org/#/c/9269/ should solve this issue.
Please note this patch has not been taken into the 3.6 release. Would you
be able to apply it on the source and re-test?

~Atin
> On Tue, Jan 13, 2015 at 12:37 PM, Atin Mukherjee 
> wrote:
> 
>> Punit,
>>
>> cli log wouldn't help much here. To debug this issue further can you
>> please let us know the following:
>>
>> 1. gluster peer status output
>> 2. gluster volume status output
>> 3. gluster --version output.
>> 4. Which command got failed
>> 5. glusterd log file of all the nodes
>>
>> ~Atin
>>
>>
>> On 01/13/2015 07:48 AM, Punit Dambiwal wrote:
>>> Hi,
>>>
>>> Please find more details on this below; can anybody from gluster help
>>> me here? :-
>>>
>>>
>>> Gluster CLI Logs :- /var/log/glusterfs/cli.log
>>>
>>> [2015-01-13 02:06:23.071969] T [cli.c:264:cli_rpc_notify] 0-glusterfs:
>> got
>>> RPC_CLNT_CONNECT
>>> [2015-01-13 02:06:23.072012] T [cli-quotad-client.c:94:cli_quotad_notify]
>>> 0-glusterfs: got RPC_CLNT_CONNECT
>>> [2015-01-13 02:06:23.072024] I [socket.c:2344:socket_event_handler]
>>> 0-transport: disconnecting now
>>> [2015-01-13 02:06:23.072055] T
>> [cli-quotad-client.c:100:cli_quotad_notify]
>>> 0-glusterfs: got RPC_CLNT_DISCONNECT
>>> [2015-01-13 02:06:23.072131] T [rpc-clnt.c:1381:rpc_clnt_record]
>>> 0-glusterfs: Auth Info: pid: 0, uid: 0, gid: 0, owner:
>>> [2015-01-13 02:06:23.072176] T
>>> [rpc-clnt.c:1238:rpc_clnt_record_build_header] 0-rpc-clnt: Request
>> fraglen
>>> 128, payload: 64, rpc hdr: 64
>>> [2015-01-13 02:06:23.072572] T [socket.c:2863:socket_connect] (-->
>>> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7fed02f15420]
>> (-->
>>>
>> /usr/lib64/glusterfs/3.6.1/rpc-transport/socket.so(+0x7293)[0x7fed001a4293]
>>> (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_submit+0x468)[0x7fed0266df98] (-->
>>> /usr/sbin/gluster(cli_submit_request+0xdb)[0x40a9bb] (-->
>>> /usr/sbin/gluster(cli_cmd_submit+0x8e)[0x40b7be] ) 0-glusterfs:
>> connect
>>> () called on transport already connected
>>> [2015-01-13 02:06:23.072616] T [rpc-clnt.c:1573:rpc_clnt_submit]
>>> 0-rpc-clnt: submitted request (XID: 0x1 Program: Gluster CLI, ProgVers:
>> 2,
>>> Proc: 27) to rpc-transport (glusterfs)
>>> [2015-01-13 02:06:23.072633] D [rpc-clnt-ping.c:231:rpc_clnt_start_ping]
>>> 0-glusterfs: ping timeout is 0, returning
>>> [2015-01-13 02:06:23.075930] T [rpc-clnt.c:660:rpc_clnt_reply_init]
>>> 0-glusterfs: received rpc message (RPC XID: 0x1 Program: Gluster CLI,
>>> ProgVers: 2, Proc: 27) from rpc-transport (glusterfs)
>>> [2015-01-13 02:06:23.075976] D [cli-rpc-ops.c:6548:gf_cli_status_cbk]
>>> 0-cli: Received response to status cmd
>>> [2015-01-13 02:06:23.076025] D [cli-cmd.c:384:cli_cmd_submit] 0-cli:
>>> Returning 0
>>> [2015-01-13 02:06:23.076049] D [cli-rpc-ops.c:6811:gf_cli_status_volume]
>>> 0-cli: Returning: 0
>>> [2015-01-13 02:06:23.076192] D [cli-xml-output.c:84:cli_begin_xml_output]
>>> 0-cli: Returning 0
>>> [2015-01-13 02:06:23.076244] D
>> [cli-xml-output.c:131:cli_xml_output_common]
>>> 0-cli: Returning 0
>>> [2015-01-13 02:06:23.076256] D
>>> [cli-xml-output.c:1375:cli_xml_output_vol_status_begin] 0-cli: Returning
>> 0
>>> [2015-01-13 02:06:23.076437] D [cli-xml-output.c:108:cli_end_xml_output]
>>> 0-cli: Returning 0
>>> [2015-01-13 02:06:23.076459] D
>>> [cli-xml-output.c:1398:cli_xml_output_vol_status_end] 0-cli: Returning 0
>>> [2015-01-13 02:06:23.076490] I [input.c:36:cli_batch] 0-: Exiting with: 0
>>>
>>> Command log :- /var/log/glusterfs/.cmd_log_history
>>>
>>> Staging failed on ----. Please check log
>>> file for details.
>>> Staging failed on ----. Please check log
>>> file for details.
>>> [2015-01-13 01:10:35.836676]  : volume status all tasks : FAILED :
>> Staging
>>> failed on ----. Please check log file for
>>> details.
>>> Staging failed on ----. Please check log
>>> file for details.
>>> Staging failed on ----. Please chec