Re: [ceph-users] Cephfs snapshot work
On Tue, Nov 7, 2017 at 3:28 PM, Dan van der Ster wrote:
> On Tue, Nov 7, 2017 at 4:15 PM, John Spray wrote:
>> On Tue, Nov 7, 2017 at 3:01 PM, Dan van der Ster wrote:
>>> On Tue, Nov 7, 2017 at 12:57 PM, John Spray wrote:
>>>> On Sun, Nov 5, 2017 at 4:19 PM, Brady Deetz wrote:
>>>>> My organization has a production cluster primarily used for cephfs
>>>>> upgraded from jewel to luminous. We would very much like to have
>>>>> snapshots on that filesystem, but understand that there are risks.
>>>>>
>>>>> What kind of work could cephfs admins do to help the devs stabilize
>>>>> this feature?
>>>>
>>>> If you have a disposable test system, then you could install the
>>>> latest master branch of Ceph (which has a stream of snapshot fixes in
>>>> it) and run a replica of your intended workload. If you can find
>>>> snapshot bugs (especially crashes) on master then they will certainly
>>>> attract interest.
>>>
>>> Hi John. Are those fixes in master expected to make snapshots stable
>>> with rados namespaces (a la Manila)?
>>
>> Was there a specific bug with snapshots+namespaces? I may be having a
>> failure of memory or have not been paying enough attention :-)
>>
> I might be misremembering, but I recall a mail from Greg? a few months
> ago saying that snapshots would surely not work with multiple
> namespaces.
> Can't find the mail, though

I think that was about not being able to safely use snapshots when
multiple filesystems share the same data pool (and are separated by
only namespaces). If you have a single filesystem using multiple
namespaces within the same data pool then I don't think there's an
issue.

John
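P.S. For anyone mapping this onto a Manila-style layout (one filesystem,
each share pinned to its own RADOS namespace within the shared data pool),
that pinning is done with the CephFS layout vxattrs. A rough sketch only,
with made-up share and namespace names:

  # Assumes a kernel or FUSE mount at /mnt/cephfs and a data pool named
  # "cephfs_data"; the share directory and namespace below are examples.
  mkdir -p /mnt/cephfs/volumes/share-0001

  # Pin new files under this directory to their own namespace within the
  # existing data pool.
  setfattr -n ceph.dir.layout.pool -v cephfs_data \
      /mnt/cephfs/volumes/share-0001
  setfattr -n ceph.dir.layout.pool_namespace -v fsvolumens_share-0001 \
      /mnt/cephfs/volumes/share-0001

  # Verify the layout that new files under the directory will inherit.
  getfattr -n ceph.dir.layout /mnt/cephfs/volumes/share-0001

New files created under that directory then land in the per-share
namespace, while the filesystem (and any snapshots of it) still lives in
the single data pool.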
Re: [ceph-users] Cephfs snapshot work
On Tue, Nov 7, 2017 at 4:15 PM, John Spray wrote:
> On Tue, Nov 7, 2017 at 3:01 PM, Dan van der Ster wrote:
>> On Tue, Nov 7, 2017 at 12:57 PM, John Spray wrote:
>>> On Sun, Nov 5, 2017 at 4:19 PM, Brady Deetz wrote:
>>>> My organization has a production cluster primarily used for cephfs
>>>> upgraded from jewel to luminous. We would very much like to have
>>>> snapshots on that filesystem, but understand that there are risks.
>>>>
>>>> What kind of work could cephfs admins do to help the devs stabilize
>>>> this feature?
>>>
>>> If you have a disposable test system, then you could install the
>>> latest master branch of Ceph (which has a stream of snapshot fixes in
>>> it) and run a replica of your intended workload. If you can find
>>> snapshot bugs (especially crashes) on master then they will certainly
>>> attract interest.
>>
>> Hi John. Are those fixes in master expected to make snapshots stable
>> with rados namespaces (a la Manila)?
>
> Was there a specific bug with snapshots+namespaces? I may be having a
> failure of memory or have not been paying enough attention :-)
>

I might be misremembering, but I recall a mail from Greg? a few months
ago saying that snapshots would surely not work with multiple
namespaces.
Can't find the mail, though

--
dan
Re: [ceph-users] Cephfs snapshot work
On Tue, Nov 7, 2017 at 3:01 PM, Dan van der Ster wrote:
> On Tue, Nov 7, 2017 at 12:57 PM, John Spray wrote:
>> On Sun, Nov 5, 2017 at 4:19 PM, Brady Deetz wrote:
>>> My organization has a production cluster primarily used for cephfs
>>> upgraded from jewel to luminous. We would very much like to have
>>> snapshots on that filesystem, but understand that there are risks.
>>>
>>> What kind of work could cephfs admins do to help the devs stabilize
>>> this feature?
>>
>> If you have a disposable test system, then you could install the
>> latest master branch of Ceph (which has a stream of snapshot fixes in
>> it) and run a replica of your intended workload. If you can find
>> snapshot bugs (especially crashes) on master then they will certainly
>> attract interest.
>
> Hi John. Are those fixes in master expected to make snapshots stable
> with rados namespaces (a la Manila)?

Was there a specific bug with snapshots+namespaces? I may be having a
failure of memory or have not been paying enough attention :-)

John
Re: [ceph-users] Cephfs snapshot work
On Tue, Nov 7, 2017 at 12:57 PM, John Spray wrote:
> On Sun, Nov 5, 2017 at 4:19 PM, Brady Deetz wrote:
>> My organization has a production cluster primarily used for cephfs
>> upgraded from jewel to luminous. We would very much like to have
>> snapshots on that filesystem, but understand that there are risks.
>>
>> What kind of work could cephfs admins do to help the devs stabilize
>> this feature?
>
> If you have a disposable test system, then you could install the
> latest master branch of Ceph (which has a stream of snapshot fixes in
> it) and run a replica of your intended workload. If you can find
> snapshot bugs (especially crashes) on master then they will certainly
> attract interest.

Hi John. Are those fixes in master expected to make snapshots stable
with rados namespaces (a la Manila)?

--
dan
Re: [ceph-users] Cephfs snapshot work
On Tue, Nov 7, 2017 at 2:40 PM, Brady Deetz wrote:
> Are there any existing fuzzing tools you'd recommend? I know about ceph
> osd thrash, which could be tested against, but what about on the client
> side? I could just use something pre-built for posix, but that wouldn't
> coordinate simulated failures on the storage side with actions against
> the fs. If there is not any current tooling for coordinating server and
> client simulation, maybe that's where I start.

We do have "thrasher" classes for randomly failing MDS daemons in the
main test suite, but those will only be useful for you if you're working
on automated tests that run inside that framework
(https://github.com/ceph/ceph/blob/master/qa/tasks/mds_thrash.py)

If you're working on a local cluster, then it's pretty simple (and
useful) to write a small shell script that e.g. sleeps a few minutes,
then does a "ceph mds fail" on a random rank.

Something more sophisticated that would include client failures is a
long-standing wishlist item for the automated testing.

John

>
> On Nov 7, 2017 5:57 AM, "John Spray" wrote:
>>
>> On Sun, Nov 5, 2017 at 4:19 PM, Brady Deetz wrote:
>> > My organization has a production cluster primarily used for cephfs
>> > upgraded from jewel to luminous. We would very much like to have
>> > snapshots on that filesystem, but understand that there are risks.
>> >
>> > What kind of work could cephfs admins do to help the devs stabilize
>> > this feature?
>>
>> If you have a disposable test system, then you could install the
>> latest master branch of Ceph (which has a stream of snapshot fixes in
>> it) and run a replica of your intended workload. If you can find
>> snapshot bugs (especially crashes) on master then they will certainly
>> attract interest.
>>
>> John
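P.S. A rough, untested sketch of the kind of loop I mean, for anyone who
wants to try it. The filesystem name, interval, and the max_mds parsing
are placeholders/assumptions; it just fails a random active rank and lets
a standby take over:

  #!/bin/bash
  # Periodically fail a random active MDS rank while a client workload
  # runs elsewhere. Assumes one filesystem and at least one standby MDS.

  FS_NAME=cephfs      # example filesystem name
  INTERVAL=300        # seconds between induced failures

  while true; do
      sleep "$INTERVAL"
      # Number of active ranks for the filesystem (the parsing may need
      # adjusting depending on your version's "ceph fs get" output).
      RANKS=$(ceph fs get "$FS_NAME" | awk '/^max_mds/ {print $2}')
      RANK=$((RANDOM % RANKS))
      echo "$(date): failing mds rank ${RANK}"
      ceph mds fail "$RANK"
  done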
Re: [ceph-users] Cephfs snapshot work
Are there any existing fuzzing tools you'd recommend? I know about ceph
osd thrash, which could be tested against, but what about on the client
side? I could just use something pre-built for posix, but that wouldn't
coordinate simulated failures on the storage side with actions against
the fs. If there is not any current tooling for coordinating server and
client simulation, maybe that's where I start.

On Nov 7, 2017 5:57 AM, "John Spray" wrote:
> On Sun, Nov 5, 2017 at 4:19 PM, Brady Deetz wrote:
> > My organization has a production cluster primarily used for cephfs
> > upgraded from jewel to luminous. We would very much like to have
> > snapshots on that filesystem, but understand that there are risks.
> >
> > What kind of work could cephfs admins do to help the devs stabilize
> > this feature?
>
> If you have a disposable test system, then you could install the
> latest master branch of Ceph (which has a stream of snapshot fixes in
> it) and run a replica of your intended workload. If you can find
> snapshot bugs (especially crashes) on master then they will certainly
> attract interest.
>
> John
Re: [ceph-users] Cephfs snapshot work
On Sun, Nov 5, 2017 at 4:19 PM, Brady Deetz wrote:
> My organization has a production cluster primarily used for cephfs
> upgraded from jewel to luminous. We would very much like to have
> snapshots on that filesystem, but understand that there are risks.
>
> What kind of work could cephfs admins do to help the devs stabilize
> this feature?

If you have a disposable test system, then you could install the
latest master branch of Ceph (which has a stream of snapshot fixes in
it) and run a replica of your intended workload. If you can find
snapshot bugs (especially crashes) on master then they will certainly
attract interest.

John
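P.S. If you do set up such a test system, exercising snapshots is just a
matter of enabling them and creating/removing them under the hidden .snap
directories. A minimal sketch (filesystem name and paths are examples,
and the exact enable command has varied between releases):

  # Snapshots are off by default. On recent releases this is roughly:
  ceph fs set cephfs allow_new_snaps true
  # (older releases used "ceph mds set allow_new_snaps true
  #  --yes-i-really-mean-it" instead)

  # With the filesystem mounted at /mnt/cephfs, a snapshot is a mkdir in
  # the .snap directory of any directory in the tree...
  mkdir /mnt/cephfs/mydata/.snap/before-test

  # ...and deleting the snapshot again is an rmdir.
  rmdir /mnt/cephfs/mydata/.snap/before-test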
Re: [ceph-users] Cephfs snapshot work
On Sun, Nov 5, 2017 at 8:19 AM, Brady Deetz wrote:
> My organization has a production cluster primarily used for cephfs
> upgraded from jewel to luminous. We would very much like to have
> snapshots on that filesystem, but understand that there are risks.
>
> What kind of work could cephfs admins do to help the devs stabilize
> this feature?

From an admin's point of view, I would say: keep an equivalent, smaller,
non-production cluster with some decent workloads (or copy a subset of
your data to it). You can then try the features you want on that
non-production cluster. It will have a smaller data set, but it would
still help to characterize how things perform, and if you encounter
issues you can file them at http://tracker.ceph.com/ to get some
attention around them and learn more.