On Thu, May 3, 2018 at 4:37 PM, Mark Betham <mark.bet...@googlemail.com> wrote:
> Hi Ovirt community,
>
> I am hoping you will be able to help with a problem I am experiencing
> when trying to schedule a snapshot of my Gluster volumes using the Ovirt
> portal.
>
> Below is an overview of the environment:
>
> I have an Ovirt instance running which is managing our Gluster storage.
> We are running Ovirt version "4.2.2.6-1.el7.centos", Gluster version
> "glusterfs-3.13.2-2.el7" on a base OS of "CentOS Linux release 7.4.1708
> (Core)", Kernel "3.10.0-693.21.1.el7.x86_64", VDSM version
> "vdsm-4.20.23-1.el7.centos". All of the versions of software are the
> latest release and have been fully patched where necessary.
>
> Ovirt has been installed and configured in "Gluster" mode only, no
> virtualisation. The Ovirt platform runs from one of the Gluster storage
> nodes.
>
> Gluster runs with 2 clusters, each located at a different physical site
> (UK and DE). Each of the storage clusters contains 3 storage nodes. Each
> storage cluster contains a single Gluster volume. The Gluster volume is
> 3 * Replicated. The Gluster volume runs on top of an LVM thin vol which
> has been provisioned with an XFS filesystem. The system is running a
> Geo-rep between the 2 geo-diverse clusters.
>
> The host servers running at the primary site are of specification 1 *
> Intel(R) Xeon(R) CPU E3-1270 v5 @ 3.60GHz (8 core with HT), 64GB Ram,
> LSI MegaRAID SAS 9271 with bbu and cache, 8 * SAS 10K 2.5" 1.8TB
> enterprise drives configured in a RAID 10 array to give 6.52TB of
> useable space. The host servers running at the secondary site are of
> specification 1 * Intel(R) Xeon(R) CPU E3-1271 v3 @ 3.60GHz (8 core with
> HT), 32GB Ram, LSI MegaRAID SAS 9260 with bbu and cache, 8 * SAS 10K
> 2.5" 1.8TB enterprise drives configured in a RAID 10 array to give
> 6.52TB of useable space. The secondary site is for DR use only.
>
> When I first started experiencing the issue and was unable to resolve
> it, I carried out a full rebuild from scratch across the two storage
> clusters. I had spent some time troubleshooting the issue but felt it
> worthwhile to ensure I had a clean platform, void of any potential
> issues which may be there due to some of the previous work carried out.
> The platform was rebuilt and data re-ingested. It is probably worth
> mentioning that this environment will become our new production
> platform; we will be migrating data and services to this new platform
> from our existing Gluster storage cluster. The date for the migration
> activity is getting closer, so available time has become an issue and
> will not permit another full rebuild of the platform without impacting
> the delivery date.
>
> After the rebuild, with both storage clusters online, available and
> managed within the Ovirt platform, I conducted some basic commissioning
> checks and found no issues. The next step I took at this point was to
> set up the Geo-replication. This was brought online with no issues and
> data was seen to be synchronised without any problems. At this point the
> data re-ingestion was started and the new data was synchronised by the
> Geo-replication.
>
> The first step in bringing the snapshot schedule online was to validate
> that snapshots could be taken outside of the scheduler. Taking a manual
> snapshot via the OVirt portal worked without issue. Several were taken
> on both primary and secondary clusters. At this point a schedule was
> created on the primary site cluster via the Ovirt portal to create a
> snapshot of the storage at hourly intervals. The schedule was created
> successfully, however no snapshots were ever created. Examining the logs
> did not show anything which I believed was a direct result of the faulty
> schedule, but it is quite possible I missed something.
>

How was the schedule created - is this using the Remote Data Sync Setup
under Storage domain?

> I reviewed many online articles, bug reports and application manuals in
> relation to snapshotting. There were several loosely related support
> articles around snapshotting but none of the recommendations seemed to
> work. I did the same with manuals and again found nothing that seemed to
> work. What I did find were several references to running snapshots along
> with geo-replication, and that the geo-replication should be paused when
> creating. So I removed all existing references to any snapshot schedule,
> paused the Geo-repl and recreated the snapshot schedule. The schedule
> was never actioned and no snapshots were created. I then removed
> Geo-repl entirely, removed all schedules and carried out a reboot of the
> entire platform. When the system was fully back online, with no pending
> heal operations, the schedule was re-added for the primary site only.
> There was no difference in the results and no snapshots were created
> from the schedule.
>
> I have now reached the point where I feel I require assistance, hence
> this email request.
>
> If you require any further data then please let me know and I will do my
> best to get it for you.
>

Could you please provide the engine.log from the time the schedule was
set up and including the time the schedule was supposed to run?

> Any help you can give would be greatly appreciated.
>
> Many thanks,
>
> Mark Betham
>
> _______________________________________________
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
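
For the engine.log request above, one way to cut the log down to the relevant period before attaching it is sketched below. This is only an illustration, not an oVirt tool: the function names are made up for this example, and it assumes each log entry starts with a timestamp of the form "2018-05-03 16:37:12,345+01" and that the window is given in the same local time as the log. Please check both against your own engine.log.

```python
from datetime import datetime
from typing import Iterable, Iterator, List

# Assumed layout of the timestamp that starts each engine.log entry,
# e.g. "2018-05-03 16:37:12,345+01 INFO ...". Verify against your log.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"
TS_LENGTH = 19  # length of "2018-05-03 16:37:12"


def lines_in_window(lines: Iterable[str], start: datetime, end: datetime) -> Iterator[str]:
    """Yield log lines whose timestamp lies within [start, end].

    Lines with no leading timestamp (stack traces, wrapped messages)
    inherit the decision made for the entry they follow.
    """
    keep = False
    for line in lines:
        try:
            stamp = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
        except ValueError:
            pass  # continuation line: keep the previous decision
        else:
            keep = start <= stamp <= end
        if keep:
            yield line


def extract_window(path: str, start: datetime, end: datetime) -> List[str]:
    """Read the log file at `path` and return only the lines in the window."""
    with open(path, errors="replace") as handle:
        return list(lines_in_window(handle, start, end))
```

A call such as extract_window("/var/log/ovirt-engine/engine.log", datetime(2018, 5, 3, 15, 0), datetime(2018, 5, 3, 18, 0)) would then cover a hypothetical window from when the schedule was set up until after the first hourly run was due; the path and times here are placeholders to adjust.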