[ovirt-users] Re: Scheduling a Snapshot of a Gluster volume not working within Ovirt

Sahina Bose Thu, 10 May 2018 23:35:38 -0700

On Thu, May 3, 2018 at 4:37 PM, Mark Betham <[email protected]>
wrote:


> Hi Ovirt community,
>
> I am hoping you will be able to help with a problem I am experiencing when
> trying to schedule a snapshot of my Gluster volumes using the Ovirt portal.
>
> Below is an overview of the environment;
>
> I have an Ovirt instance running which is managing our Gluster storage.
> We are running Ovirt version "4.2.2.6-1.el7.centos", Gluster version
> "glusterfs-3.13.2-2.el7" on a base OS of "CentOS Linux release 7.4.1708
> (Core)", kernel "3.10.0-693.21.1.el7.x86_64" and VDSM version
> "vdsm-4.20.23-1.el7.centos".  All software is at the latest release and
> has been fully patched where necessary.
>
> Ovirt has been installed and configured in "Gluster" mode only, no
> virtualisation.  The Ovirt platform runs from one of the Gluster storage
> nodes.
>
> Gluster runs as 2 clusters, each located at a different physical site
> (UK and DE).  Each storage cluster contains 3 storage nodes and a single
> Gluster volume.  The Gluster volume is 3-way replicated.  It runs on top
> of an LVM thin volume provisioned with an XFS filesystem.  Geo-replication
> runs between the 2 geo-diverse clusters.
>
> The host servers at the primary site each have 1 * Intel(R) Xeon(R) CPU
> E3-1270 v5 @ 3.60GHz (8 core with HT), 64GB RAM, an LSI MegaRAID SAS 9271
> with BBU and cache, and 8 * SAS 10K 2.5" 1.8TB enterprise drives
> configured in a RAID 10 array to give 6.52TB of usable space.  The host
> servers at the secondary site each have 1 * Intel(R) Xeon(R) CPU E3-1271 v3
> @ 3.60GHz (8 core with HT), 32GB RAM, an LSI MegaRAID SAS 9260 with BBU and
> cache, and 8 * SAS 10K 2.5" 1.8TB enterprise drives configured in a RAID 10
> array to give 6.52TB of usable space.  The secondary site is for DR use
> only.
>
> When I first started experiencing the issue and was unable to resolve it,
> I carried out a full rebuild from scratch across the two storage clusters.
> I had spent some time troubleshooting the issue but felt it worthwhile to
> ensure I had a clean platform, free of any potential issues left over from
> the previous work carried out.  The platform was rebuilt and the data
> re-ingested.  It is probably worth mentioning that this environment will
> become our new production platform; we will be migrating data and services
> to it from our existing Gluster storage cluster.  The migration date is
> getting closer, so available time has become an issue and will not permit
> another full rebuild of the platform without impacting the delivery date.
>
> After the rebuild, with both storage clusters online, available and managed
> within the Ovirt platform, I conducted some basic commissioning checks and
> found no issues.  The next step was to set up geo-replication.  This was
> brought online with no issues and data was seen to synchronise without any
> problems.  At this point the data re-ingestion was started and the new data
> was synchronised by geo-replication.
>
> The first step in bringing the snapshot schedule online was to validate
> that snapshots could be taken outside of the scheduler.  Taking a manual
> snapshot via the Ovirt portal worked without issue, and several were taken
> on both the primary and secondary clusters.  A schedule was then created
> on the primary site cluster via the Ovirt portal to snapshot the storage
> at hourly intervals.  The schedule was created successfully; however, no
> snapshots were ever created.  Examining the logs did not show anything
> that I believed was directly caused by the faulty schedule, but it is
> quite possible I missed something.
>

How was the schedule created? Is this using the Remote Data Sync Setup
under the Storage domain?


> I reviewed many online articles, bug reports and application manuals
> relating to snapshots.  There were several loosely related support
> articles, but none of their recommendations worked, and the manuals
> offered nothing that worked either.
> What I did find were several references to running snapshots alongside
> geo-replication, stating that geo-replication should be paused while
> snapshots are created.  So I removed all existing snapshot schedules,
> paused geo-replication and recreated the snapshot schedule.  The schedule
> was never actioned and no snapshots were created.  I then removed
> geo-replication entirely, removed all schedules and rebooted the entire
> platform.  When the system was fully back online with no pending heal
> operations, the schedule was re-added for the primary site only.  There
> was no difference in the results and no snapshots were created from the
> schedule.
>
> I have now reached the point where I feel I require assistance and hence
> this email request.
>
> If you require any further data then please let me know and I will do my
> best to get it for you.
>

Could you please provide the engine.log from the time the schedule was
set up, including the time the schedule was supposed to run?



> Any help you can give would be greatly appreciated.
>
> Many thanks,
>
> Mark Betham
>
> _______________________________________________
> Users mailing list
> [email protected]
> http://lists.ovirt.org/mailman/listinfo/users
>
>

_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
