Re: [ceph-users] v10.1.2 Jewel release candidate release
On Thu, Apr 14, 2016 at 6:32 AM, John Spray <jsp...@redhat.com> wrote:
> On Thu, Apr 14, 2016 at 8:31 AM, Vincenzo Pii <vincenzo@teralytics.ch> wrote:
>> On 14 Apr 2016, at 00:09, Gregory Farnum <gfar...@redhat.com> wrote:
>> On Wed, Apr 13, 2016 at 3:02 PM, Sage Weil <s...@redhat.com> wrote:
>>
>> Hi everyone,
>>
>> The third (and likely final) Jewel release candidate is out. We have a
>> very small number of remaining blocker issues and a bit of final polish
>> before we publish Jewel 10.2.0, probably next week.
>>
>> There are no known issues with this release that are serious enough to
>> warn about here. Greg is adding some CephFS checks so that admins don't
>> accidentally start using less-stable features,
>>
>> s/is adding/has added/
>>
>> http://docs.ceph.com/docs/master/release-notes/
>>
>> As noted in another thread, there's still a big CephFS warning in the
>> online docs. We'll be cleaning those up, since we now have the
>> recovery tools we desire! Some things are known to still be slow or
>> sub-optimal, but we consider CephFS stable and safe at this time when
>> run in the default single-MDS configuration. (It won't let you do
>> anything bad without very explicitly setting flags and acknowledging
>> they're dangerous.)
>> :)
>> -Greg
>>
>> Hi Greg,
>>
>> A clarification: when you say that things will be safe in a "single-MDS"
>> configuration, do you also exclude the HA setup with one active MDS and
>> some passive (standby) ones? Or would that be safe as well?
>
> Yes, having standbys is fine (including "standby replay" daemons). We
> should really say "single active MDS configuration", but it's a bit of
> a mouthful!

Master with hot standby(s) is okay; multi-master is not supported. I feel
like that terminology is more familiar (to me) from other systems (e.g.
databases).
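For concreteness, the "one active, hot standby" arrangement John describes might look something like this in ceph.conf — a hypothetical sketch for the Jewel era (the daemon names `a`/`b` are made up; verify the option names against your release's documentation):

```ini
# Two MDS daemons; whichever does not hold rank 0 follows it in
# standby-replay mode (a "hot standby") and can take over quickly.
[mds.a]
mds standby replay = true
mds standby for rank = 0

[mds.b]
mds standby replay = true
mds standby for rank = 0
```

Only one MDS is ever active; the other tails the active daemon's journal so failover does not have to start from a cold cache.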
> John
>
>> Vincenzo Pii | TERALYTICS
>> DevOps Engineer

--
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: mil...@adfin.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] O_DIRECT on deep-scrub read
On Thu, Oct 8, 2015 at 4:11 AM, Paweł Sadowski <c...@sadziu.pl> wrote:
> On 10/07/2015 10:52 PM, Sage Weil wrote:
>> On Wed, 7 Oct 2015, David Zafman wrote:
>>> There would be a benefit to doing fadvise POSIX_FADV_DONTNEED after
>>> deep-scrub reads for objects not recently accessed by clients.
>>
>> Yeah, it's the 'except for stuff already in cache' part that we don't do
>> (and the kernel doesn't give us a good interface for). IIRC there was a
>> patch that guessed based on whether the obc was already in cache, which
>> seems like a pretty decent heuristic, but I forget if that was in the
>> final version.
>
> I've run some tests and it looks like on XFS the cache is discarded on
> both O_DIRECT write and read, but on EXT4 it is discarded only on
> O_DIRECT write. I've found some patches to add support for "read only if
> in page cache" (preadv2/RWF_NONBLOCK) but can't find them in the kernel
> source. Maybe Milosz Tanski can tell more about that. I think it could
> help a bit during deep scrub.

After a fair amount of bike shedding on the API (and removing pwritev2) it
looked like we (me and Christoph) had enough consensus to get it upstream.
But sadly it died: akpm preferred a different approach (fincore), and with
enough roadblocks it died :/

>>> I see the NewStore objectstore sometimes using the O_DIRECT flag for
>>> writes. This concerns me because the open(2) man page says:
>>>
>>> "Applications should avoid mixing O_DIRECT and normal I/O to the same
>>> file, and especially to overlapping byte regions in the same file. Even
>>> when the filesystem correctly handles the coherency issues in this
>>> situation, overall I/O throughput is likely to be slower than using
>>> either mode alone."
>>
>> Yeah: an O_DIRECT write will do a cache flush on the write range, so if
>> there was already dirty data in cache you'll write twice. There's
>> similarly an invalidate on read. I need to go back through the newstore
>> code and see how the modes are being mixed and how it can be avoided...
>>
>> sage
>>
>>> On 10/7/15 7:50 AM, Sage Weil wrote:
>>>> It's not, but it would not be hard to do this. There are fadvise-style
>>>> hints being passed down that could trigger O_DIRECT reads in this
>>>> case. That may not be the best choice, though--it won't use data that
>>>> happens to be in cache and it'll also throw it out...
>>>>
>>>> On Wed, 7 Oct 2015, Paweł Sadowski wrote:
>>>>> Hi,
>>>>>
>>>>> Can anyone tell if deep scrub is done using the O_DIRECT flag or not?
>>>>> I'm not able to verify that in the source code.
>>>>>
>>>>> If not, would it be possible to add such a feature (maybe as a config
>>>>> option) to help keep the Linux page cache in better shape?
>>>>>
>>>>> Thanks,
> --
> PS
Re: [ceph-users] O_DIRECT on deep-scrub read
On Wed, Oct 7, 2015 at 10:50 AM, Sage Weil <s...@newdream.net> wrote:
> It's not, but it would not be hard to do this. There are fadvise-style
> hints being passed down that could trigger O_DIRECT reads in this case.
> That may not be the best choice, though--it won't use data that happens
> to be in cache and it'll also throw it out...
>
> On Wed, 7 Oct 2015, Paweł Sadowski wrote:
>> Hi,
>>
>> Can anyone tell if deep scrub is done using the O_DIRECT flag or not?
>> I'm not able to verify that in the source code.
>>
>> If not, would it be possible to add such a feature (maybe as a config
>> option) to help keep the Linux page cache in better shape?
>>
>> Thanks,
>>
>> --
>> PS

When I was working on preadv2, somebody brought up a per-operation
O_DIRECT flag. There wasn't a clear use case at the time (outside of
saying Linus would "love that").
Re: [ceph-users] Discuss: New default recovery config settings
On Fri, May 29, 2015 at 5:47 PM, Samuel Just <sj...@redhat.com> wrote:
> Many people have reported that they need to lower the osd recovery
> config options to minimize the impact of recovery on client io. We are
> talking about changing the defaults as follows:
>
> osd_max_backfills to 1 (from 10)
> osd_recovery_max_active to 3 (from 15)
> osd_recovery_op_priority to 1 (from 10)
> osd_recovery_max_single_start to 1 (from 5)
>
> We'd like a bit of feedback first though. Is anyone happy with the
> current configs? Is anyone using something between these values and the
> current defaults? What kind of workload? I'd guess that lowering
> osd_max_backfills to 1 is probably a good idea, but I wonder whether
> lowering osd_recovery_max_active and osd_recovery_max_single_start will
> cause small objects to recover unacceptably slowly. Thoughts?
> -Sam

Sam, I was thinking about this recently. We recently ended up hitting a
recovery storm and a scrub storm; both happened at a time of high client
activity. While changing the defaults down will make these kinds of
disruptions less likely to occur, it also makes recovery (rebalancing)
very slow.

What I would be happy to see is more of a QoS-style tunable, along the
lines of network traffic shaping, where we can guarantee a minimum amount
of "recovery" load (and I put it in quotes since there's more than one
resource involved) when the cluster is busy with client IO. Or, vice
versa, a minimum amount of client IO that's guaranteed. Then, during
periods of lower client activity, recovery (and other background work)
can proceed at full speed. Many workloads are cyclical or seasonal (in
the statistics sense of the word, e.g. intra-day seasonality). QoS-style
management should lead to a more dynamic system where we can maximize
available utilization, minimize disruptions, and not play whack-a-mole
with many conf knobs.

I'm aware that this is much harder to implement, but thankfully there's a
lot of literature, implementation, and practical experience out there to
draw upon.

- Milosz
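Until something QoS-like exists, the knobs in Sam's list are what operators have. As a sketch, the proposed (lower) values could be injected into a running cluster or pinned in ceph.conf — option names as in Sam's email, but double-check spelling against your release before relying on this:

```shell
# Apply at runtime to all OSDs (hammer-era syntax):
ceph tell osd.* injectargs '--osd-max-backfills 1 \
    --osd-recovery-max-active 3 \
    --osd-recovery-op-priority 1 \
    --osd-recovery-max-single-start 1'

# Or persist in ceph.conf under [osd]:
#   osd max backfills = 1
#   osd recovery max active = 3
#   osd recovery op priority = 1
#   osd recovery max single start = 1
```

The trade-off discussed above applies: these settings throttle recovery all the time, not just while clients are busy.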
Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
On Fri, Apr 24, 2015 at 12:38 PM, Alexandre DERUMIER <aderum...@odiso.com> wrote:
> Hi,
>
> I have finished rebuilding ceph with jemalloc and all seems to be
> working fine. I get a constant 300k iops for the moment, so no speed
> regression. I'll do a longer benchmark next week.
>
> Regards,
> Alexandre

In my experience jemalloc is much more proactive at returning memory to
the OS, whereas tcmalloc in its default setting is much greedier about
keeping/reusing memory. jemalloc tends to do better if your application
benefits from a large page cache. Also, jemalloc's aggressive behavior is
better if you're running a lot of applications per host, because you're
less likely to trigger a kernel dirty write-out when allocating space
(since you're not keeping a large free cache around per application).

Howard of Symas and LMDB fame did some benchmarking and comparison here:
http://symas.com/mdb/inmem/malloc/ He came to somewhat similar
conclusions.

It would be helpful if you can reproduce the issue with tcmalloc. Turn on
tcmalloc stats logging (every 1GB allocated or so), then compare the size
claimed by tcmalloc to the process RSS size. If you can account for a
large difference, especially multiplied across a number of OSDs, that may
be the culprit.

I know things have gotten better in tcmalloc, as in they fixed a few bugs
where really large allocations were never returned to the OS, and they
turned down the default greediness. Sadly, distros have been slow at
picking these up in the past. If this is a problem it might be worth
having an option to build tcmalloc (using a version known to be good)
into Ceph at build time.
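One way to do the tcmalloc-vs-RSS comparison suggested above, as a rough sketch (the OSD id `0` is hypothetical; `heap stats` requires a Ceph built against tcmalloc, and the jemalloc library path varies by distro):

```shell
# Ask tcmalloc, via the OSD admin interface, how much heap it thinks
# it is holding (including freed-but-not-returned pages):
ceph tell osd.0 heap stats

# Compare against the RSS the kernel reports for the same process:
grep VmRSS /proc/<osd-pid>/status

# To try jemalloc without rebuilding Ceph, preload it when starting
# the daemon (example path; adjust for your distro):
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 ceph-osd -i 0 -f
```

A large, persistent gap between tcmalloc's reported heap and RSS, multiplied by the number of OSDs per host, is the symptom described in the thread.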
Re: [ceph-users] Establishing the Ceph Board
Patrick,

I'm not sure where you are at with forming this, but I would love to be
considered. Previously I've contributed commits (mostly on the CephFS
kernel side) and I'd love to contribute more there.

Usually the hardest part about these kinds of things is finding the time
to participate. Since Ceph is a project I believe in / care about its
success, rest assured I can make the time for this to happen. I am
currently the CTO of Adfin, but for the purposes of this I would rather
be unaffiliated / an independent community member.

Thanks,
- Milosz

On Fri, Mar 6, 2015 at 12:00 PM, Patrick McGarry <pmcga...@redhat.com> wrote:
> Hey Cephers,
>
> It looks like the governance work that has been going on for so long is
> (finally!) getting ready to come to fruition. If you would like more
> information feel free to check out the docs from CDS available at:
> http://wiki.ceph.com/Planning/Blueprints/Infernalis/Ceph_Governance
>
> What I need now is to start gathering the people/orgs that would be
> interested in participating in a Ceph Board. My goal is to involve the
> founding members in finalizing the documents and procedures so that it
> isn't just arbitrary stuff made up by one monkey.
>
> If you are interested in being a part of the Ceph Board please email me
> the following information:
>
> Name
> Organization/Affiliation
> Resources you might contribute (money, time, developers, your own
> commits, etc)
>
> That should be enough to start a dialogue. My goal here is to get a
> small group of people that can help refine the governance docs, build
> the board, and ensure the long-term health of Ceph.
>
> As always if you have any questions, please don't hesitate to contact
> me. Thanks!
>
> --
> Best Regards,
> Patrick McGarry
> Director Ceph Community || Red Hat
> http://ceph.com || http://community.redhat.com
> @scuttlemonkey || @ceph
Re: [ceph-users] Ceph User Committee monthly meeting #1 : executive summary
Loic,

The writeup has been helpful. What I'm curious about (and what hasn't
been mentioned) is: can we use erasure coding with CephFS? What steps
have to be taken to set up erasure coding for CephFS?

In our case we'd like to take advantage of the savings, since a large
chunk of our data is written once, read many times, and after a while
seldom accessed. It looks like by default the MDS already sets default
pools for data and metadata, so I'm guessing this requires some
preparation in advance.

Best,
- Milosz

On Fri, Apr 4, 2014 at 12:34 PM, Loic Dachary <l...@dachary.org> wrote:
> Hi Ceph,
>
> This month's Ceph User Committee meeting was about:
>
> - Tiering, erasure code
> - Using http://tracker.ceph.com/
> - CephFS
> - Miscellaneous
>
> You will find an executive summary at:
> https://wiki.ceph.com/Community/Meetings/Ceph_User_Committee_meeting_2014-04-03
>
> The full log of the IRC conversation is also included to provide more
> context when needed. Feel free to edit if you see a mistake.
>
> Cheers
>
> --
> Loïc Dachary, Artisan Logiciel Libre

--
Milosz Tanski
CTO
10 East 53rd Street, 37th floor
New York, NY 10022

p: 646-253-9055
e: mil...@adfin.com
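In the firefly era the usual answer to this question was that CephFS could not write to an erasure-coded pool directly; you put a replicated cache tier in front of the EC pool and pointed the filesystem at it. A hypothetical sketch (pool names, PG counts, and profile values are made up; verify every command against that release's docs before use):

```shell
# Define an EC profile and create the cold (base) data pool:
ceph osd erasure-code-profile set ecprofile k=4 m=2
ceph osd pool create fs_data_ec 64 64 erasure ecprofile

# Create a replicated cache pool and layer it over the EC pool:
ceph osd pool create fs_data_cache 64 64
ceph osd tier add fs_data_ec fs_data_cache
ceph osd tier cache-mode fs_data_cache writeback
ceph osd tier set-overlay fs_data_ec fs_data_cache

# Make the tiered base pool usable as a CephFS data pool:
ceph mds add_data_pool fs_data_ec
```

Writes land in the replicated cache and are flushed to the EC pool as they go cold — which matches the write-once, read-many, then rarely accessed pattern described above.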
Re: [ceph-users] firefly timing
I think it's good now, explicit (and detailed).

On Tue, Mar 18, 2014 at 4:12 PM, Sage Weil <s...@inktank.com> wrote:
> On Tue, 18 Mar 2014, Sage Weil wrote:
>> On Tue, 18 Mar 2014, Milosz Tanski wrote:
>>> Is this statement in the documentation still valid: "Stale data is
>>> expired from the cache pools based on some as-yet undetermined
>>> policy." As that sounds a bit scary.
>>
>> I'll update the docs :). The policy is pretty simple but not described
>> anywhere yet.
>
> I've updated the doc; please let me know what is/isn't clear so we can
> make sure the final doc is useful. John, we need to figure out where
> this is going to fit in the overall IA...
>
> sage
>
>>> - Milosz
>>>
>>> On Tue, Mar 18, 2014 at 12:06 PM, Sage Weil <s...@inktank.com> wrote:
>>>> On Tue, 18 Mar 2014, Stefan Priebe - Profihost AG wrote:
>>>>> Hi Sage,
>>>>>
>>>>> i really would like to test the tiering. Is there any detailed
>>>>> documentation about it and how it works?
>>>>
>>>> Great! Here is a quick synopsis on how to set it up:
>>>> http://ceph.com/docs/master/dev/cache-pool/
>>>>
>>>> sage
>>>>
>>>>> Greets,
>>>>> Stefan
>>>>>
>>>>> Am 18.03.2014 05:45, schrieb Sage Weil:
>>>>>> Hi everyone,
>>>>>>
>>>>>> It's taken longer than expected, but the tests for v0.78 are
>>>>>> calming down and it looks like we'll be able to get the release
>>>>>> out this week. However, we've decided NOT to make this release
>>>>>> firefly. It will be a normal development release.
>>>>>>
>>>>>> This will be the first release that includes some key new
>>>>>> functionality (erasure coding and cache tiering) and although it
>>>>>> is passing our tests we'd like to have some operational experience
>>>>>> with it in more users' hands before we commit to supporting it
>>>>>> long term.
>>>>>>
>>>>>> The tentative plan is to freeze and then release v0.79 after a
>>>>>> normal two week cycle. This will serve as a 'release candidate'
>>>>>> that shaves off a few rough edges from the pending release
>>>>>> (including some improvements with the API for setting up erasure
>>>>>> coded pools). It is possible that 0.79 will turn into firefly, but
>>>>>> more likely that we will opt for another two weeks of hardening
>>>>>> and make 0.80 the release we name firefly and maintain for the
>>>>>> long term.
>>>>>>
>>>>>> Long story short: 0.78 will be out soon, and you should test it!
>>>>>> It will vary from the final firefly in a few subtle ways, but any
>>>>>> feedback or usability and bug reports at this point will be very
>>>>>> helpful in shaping things.
>>>>>>
>>>>>> Thanks!
>>>>>> sage
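The expiry policy Sage documented in place of the "as-yet undetermined policy" wording is driven by a handful of per-pool settings on the cache pool. A hypothetical sketch using a cache pool named `cache` (firefly-era option names; check the cache-pool doc linked above for the authoritative list and defaults):

```shell
# Track object hotness with bloom-filter HitSets:
ceph osd pool set cache hit_set_type bloom
ceph osd pool set cache hit_set_count 8
ceph osd pool set cache hit_set_period 3600

# Bound the cache and tell the tiering agent when to act:
ceph osd pool set cache target_max_bytes 1099511627776   # ~1 TiB cap
ceph osd pool set cache cache_target_dirty_ratio 0.4     # flush dirty above 40%
ceph osd pool set cache cache_target_full_ratio 0.8      # evict clean above 80%
```

In short: dirty objects are flushed to the base pool past one threshold, and clean objects are evicted past another, rather than data expiring on any undefined schedule.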