Re: infernalis build package on debian jessie : dh_install: ceph missing files (usr/lib/libos_tp.so.*), aborting
Sorry, my fault, I had an old --without-lttng flag in my build packages. - Original message - From: "aderumier" To: "ceph-devel" Sent: Tuesday 10 November 2015 15:06:19 Subject: infernalis build package on debian jessie : dh_install: ceph missing files (usr/lib/libos_tp.so.*), aborting Hi, I'm trying to build infernalis packages on debian jessie, and I get this error during the package build: dh_install: ceph missing files (usr/lib/libos_tp.so.*), aborting I think it's related to the lttng change from here: https://github.com/ceph/ceph/pull/6135 Maybe an option is missing in the debian rules to generate libos_tp.so? -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
data-at-rest compression
Hi All, a while ago we had some conversations here about adding compression support for EC pools. Here is the corresponding pull request implementing this feature: https://github.com/ceph/ceph/pull/6524/commits The corresponding blueprint is at: http://tracker.ceph.com/projects/ceph/wiki/Rados_-_at-rest_compression All comments and reviews are highly appreciated. Thanks, Igor.
Backlog for the Ceph tracker
Hi Sam, I crafted a custom query that could be used as a replacement for the backlog plugin http://tracker.ceph.com/projects/ceph/issues?query_id=86 It displays issues that are features or tasks, grouped by target version and ordered by priority. I also created a v10.0.0 version so we can assign features we want for this next version to it. If you feel that's not good enough, we can just throw it away, it's merely a proposal ;-) Cheers -- Loïc Dachary, Artisan Logiciel Libre
why ShardedWQ in osd using smart pointer for PG?
hi, all: op_wq is declared as ShardedThreadPool::ShardedWQ<pair<PGRef, OpRequestRef>> _wq. I do not know why we should use PGRef in this? Because the overhead of the smart pointer is not small. Maybe the raw pointer PG* is also OK? If op_wq is changed to ShardedThreadPool::ShardedWQ<pair<PG*, OpRequestRef>> _wq (using a raw pointer), the latency for PrioritizedQueue::enqueue decreases from 3.38us -> 1.89us and the latency for PrioritizedQueue::dequeue decreases from 3.44us -> 1.65us. Does this make sense to you? -- Regards, xinze
Re: Backlog for the Ceph tracker
But http://tracker.ceph.com/projects/ceph/agile_versions looks better :-) On 10/11/2015 16:28, Loic Dachary wrote: > Hi Sam, > > I crafted a custom query that could be used as a replacement for the backlog > plugin > >http://tracker.ceph.com/projects/ceph/issues?query_id=86 > > It displays issues that are features or tasks, grouped by target version and > ordered by priority. > > I also created a v10.0.0 version so we can assign features we want for this > next version to it. > > If you feel that's not good enough, we can just throw it away, it's merely a > proposal ;-) > > Cheers > -- Loïc Dachary, Artisan Logiciel Libre
Re: Backlog for the Ceph tracker
On 10/11/2015 16:34, Loic Dachary wrote: > But http://tracker.ceph.com/projects/ceph/agile_versions looks better :-) It appears to be a crippled version of a proprietary product http://www.redminecrm.com/projects/agile/pages/last My vote would be to de-install it since it is even less flexible to use than the custom query below. It is disappointing to lose a plugin because it is no longer maintained, but that's not something we can always foresee. IMHO, relying on a proprietary redmine plugin is not a safe bet and it would be wise to not become dependent on it. Cheers > On 10/11/2015 16:28, Loic Dachary wrote: >> Hi Sam, >> >> I crafted a custom query that could be used as a replacement for the backlog >> plugin >> >>http://tracker.ceph.com/projects/ceph/issues?query_id=86 >> >> It displays issues that are features or tasks, grouped by target version and >> ordered by priority. >> >> I also created a v10.0.0 version so we can assign features we want for this >> next version to it. >> >> If you feel that's not good enough, we can just throw it away, it's merely a >> proposal ;-) >> >> Cheers >> > -- Loïc Dachary, Artisan Logiciel Libre
Re: why ShardedWQ in osd using smart pointer for PG?
On Tue, Nov 10, 2015 at 7:19 AM, 池信泽 wrote: > hi, all: > > op_wq is declared as ShardedThreadPool::ShardedWQ<pair<PGRef, OpRequestRef>> _wq. I do not know why we should use PGRef in this? > > Because the overhead of the smart pointer is not small. Maybe the > raw pointer PG* is also OK? > > If op_wq is changed to ShardedThreadPool::ShardedWQ<pair<PG*, OpRequestRef>> _wq (using a raw pointer) > > the latency for PrioritizedQueue::enqueue decreases from 3.38us -> 1.89us > > the latency for PrioritizedQueue::dequeue decreases from 3.44us -> 1.65us > > Does this make sense to you? In general we use PGRefs rather than PG pointers. I think we actually rely on the references here to keep the PG from going out of scope at an inopportune time, but if it halves the cost of queuing actions it might be worth the effort of avoiding that. -Greg
Preparing infernalis v9.2.1
Hi Abhishek, I created the issue to track the progress of infernalis v9.2.1 at http://tracker.ceph.com/issues/13750 and assigned it to you. There are a dozen issues waiting to be backported and another dozen waiting to be tested in an integration branch. Good luck with driving your first point release :-) Enjoy Diwali ! -- Loïc Dachary, Artisan Logiciel Libre
Re: [ceph-users] Permanent MDS restarting under load
On Tue, Nov 10, 2015 at 6:32 AM, Oleksandr Natalenko wrote: > Hello. > > We have CephFS deployed over a Ceph cluster (0.94.5). > > We experience constant MDS restarting under high IOPS workload (e.g. > rsyncing lots of small mailboxes from another storage to CephFS using > ceph-fuse client). First, cluster health goes to HEALTH_WARN state with the > following disclaimer: > > === > mds0: Behind on trimming (321/30) > === > > Also, slow requests start to appear: > > === > 2 requests are blocked > 32 sec > === Which requests are they? Are these MDS operations or OSD ones? > > Then, after a while, one of MDSes fails with the following log: > > === > лис 10 16:07:41 baikal bash[10122]: 2015-11-10 16:07:41.915540 7f2484f13700 > -1 MDSIOContextBase: blacklisted! Restarting... > лис 10 16:07:41 baikal bash[10122]: starting mds.baikal at :/0 > лис 10 16:07:42 baikal bash[10122]: 2015-11-10 16:07:42.003189 7f82b477e7c0 > -1 mds.-1.0 log_to_monitors {default=true} > === So that "blacklisted" means that the monitors decided the MDS was nonresponsive, failed over to another daemon, and blocked this one off from the cluster. > I guess writing lots of small files bloats the MDS log, and the MDS doesn't catch > up on trimming in time. That's why it is marked as failed and is replaced by > a standby MDS. We tried to limit mds_log_max_events to 30 events, but that > caused the MDS to fail very quickly with the following stacktrace: > > === > Stacktrace: https://gist.github.com/4c8a89682e81b0049f3e > === > > Is that a normal situation, or could one rate-limit client requests? Maybe > there should be additional knobs to tune CephFS for handling such a > workload? Yeah, the MDS doesn't really do a good job back-pressuring clients right now when it or the OSDs aren't keeping up with the workload. That's something we need to work on once fsck stuff is behaving. rsync is also (sadly) a workload that frequently exposes these problems, but I'm not used to seeing the MDS daemon get stuck quite that quickly.
How frequently is it actually getting swapped? -Greg
non-fast-forward merges prevented for some branches in GitHub
GitHub.com now has an option in its UI for users to "protect" certain branches. I've enabled the "Disable force-pushes to this branch and prevent it from being deleted" setting for the following repos and branches: ceph.git and ceph-qa-suite.git: - "master" - "jewel" - "infernalis" - "hammer" - "firefly" ceph-deploy.git and teuthology.git: - "master" If we ever have to force-push in an emergency we can disable this in GitHub's UI, e.g. https://github.com/ceph/ceph/settings/branches . Otherwise, in normal operation this will prevent certain branches from going backwards in time by accident. - Ken
Re: why ShardedWQ in osd using smart pointer for PG?
I wonder, if we want to keep the PG from going out of scope at an inopportune time, why are snap_trim_queue and scrub_queue declared as xlist<PG*> instead of xlist<PGRef>? 2015-11-11 2:28 GMT+08:00 Gregory Farnum : > On Tue, Nov 10, 2015 at 7:19 AM, 池信泽 wrote: >> hi, all: >> >> op_wq is declared as ShardedThreadPool::ShardedWQ<pair<PGRef, OpRequestRef>> _wq. I do not know why we should use PGRef in this? >> >> Because the overhead of the smart pointer is not small. Maybe the >> raw pointer PG* is also OK? >> >> If op_wq is changed to ShardedThreadPool::ShardedWQ<pair<PG*, OpRequestRef>> _wq (using a raw pointer) >> >> the latency for PrioritizedQueue::enqueue decreases from 3.38us -> 1.89us >> >> the latency for PrioritizedQueue::dequeue decreases from 3.44us -> 1.65us >> >> Does this make sense to you? > > In general we use PGRefs rather than PG pointers. I think we actually > rely on the references here to keep the PG from going out of scope at > an inopportune time, but if it halves the cost of queuing actions it > might be worth the effort of avoiding that. > -Greg -- Regards, xinze
Re: why ShardedWQ in osd using smart pointer for PG?
The xlist has means of efficiently removing entries from a list. I think you'll find those in the path where we start tearing down a PG, and membership on this list is a bit different from membership in the ShardedThreadPool. It's all about the particulars of each design, and I don't have that in my head — you'd need to examine it. -Greg On Tue, Nov 10, 2015 at 4:20 PM, 池信泽 wrote: > I wonder, if we want to keep the PG from going out of scope at an > inopportune time, why are snap_trim_queue and scrub_queue declared as > xlist<PG*> instead of xlist<PGRef>? > > 2015-11-11 2:28 GMT+08:00 Gregory Farnum : >> On Tue, Nov 10, 2015 at 7:19 AM, 池信泽 wrote: >>> hi, all: >>> >>> op_wq is declared as ShardedThreadPool::ShardedWQ<pair<PGRef, OpRequestRef>> _wq. I do not know why we should use PGRef in this? >>> >>> Because the overhead of the smart pointer is not small. Maybe the >>> raw pointer PG* is also OK? >>> >>> If op_wq is changed to ShardedThreadPool::ShardedWQ<pair<PG*, OpRequestRef>> _wq (using a raw pointer) >>> >>> the latency for PrioritizedQueue::enqueue decreases from 3.38us -> >>> 1.89us >>> >>> the latency for PrioritizedQueue::dequeue decreases from 3.44us -> >>> 1.65us >>> >>> Does this make sense to you? >> >> In general we use PGRefs rather than PG pointers. I think we actually >> rely on the references here to keep the PG from going out of scope at >> an inopportune time, but if it halves the cost of queuing actions it >> might be worth the effort of avoiding that. >> -Greg > > > > -- > Regards, > xinze
Re: [ceph-users] v9.2.0 Infernalis released
On Sun, Nov 8, 2015 at 10:41 PM, Alexandre DERUMIER wrote: > Hi, > > debian repository seem to miss librbd1 package for debian jessie > > http://download.ceph.com/debian-infernalis/pool/main/c/ceph/ > > (ubuntu trusty librbd1 is present) This is now fixed and should now be available. > > > - Original message - > From: "Sage Weil" > To: ceph-annou...@ceph.com, "ceph-devel" , > "ceph-users" , ceph-maintain...@ceph.com > Sent: Friday 6 November 2015 23:05:54 > Subject: [ceph-users] v9.2.0 Infernalis released > > [I'm going to break my own rule and do this on a Friday only because this > has been built and in the repos for a couple of days now; I've just been > traveling and haven't had time to announce it.] > > This major release will be the foundation for the next stable series. > There have been some major changes since v0.94.x Hammer, and the > upgrade process is non-trivial. Please read these release notes carefully. > > Major Changes from Hammer > - > > - General: > > * Ceph daemons are now managed via systemd (with the exception of > Ubuntu Trusty, which still uses upstart). > * Ceph daemons run as the 'ceph' user instead of root. > * On Red Hat distros, there is also an SELinux policy. > > - RADOS: > > * The RADOS cache tier can now proxy write operations to the base > tier, allowing writes to be handled without forcing migration of > an object into the cache. > * The SHEC erasure coding support is no longer flagged as > experimental. SHEC trades some additional storage space for faster > repair. > * There is now a unified queue (and thus prioritization) of client > IO, recovery, scrubbing, and snapshot trimming. > * There have been many improvements to low-level repair tooling > (ceph-objectstore-tool). > * The internal ObjectStore API has been significantly cleaned up in order > to facilitate new storage backends like NewStore. > > - RGW: > > * The Swift API now supports object expiration. > * There are many Swift API compatibility improvements.
> > - RBD: > > * The ``rbd du`` command shows actual usage (quickly, when > object-map is enabled). > * The object-map feature has seen many stability improvements. > * Object-map and exclusive-lock features can be enabled or disabled > dynamically. > * You can now store user metadata and set persistent librbd options > associated with individual images. > * The new deep-flatten feature allows flattening of a clone and all > of its snapshots. (Previously snapshots could not be flattened.) > * The export-diff command is now faster (it uses aio). There is also > a new fast-diff feature. > * The --size argument can be specified with a suffix for units > (e.g., ``--size 64G``). > * There is a new ``rbd status`` command that, for now, shows who has > the image open/mapped. > > - CephFS: > > * You can now rename snapshots. > * There have been ongoing improvements around administration, diagnostics, > and the check and repair tools. > * The caching and revocation of client cache state due to unused > inodes has been dramatically improved. > * The ceph-fuse client behaves better on 32-bit hosts. > > Distro compatibility > > > We have decided to drop support for many older distributions so that we can > move to a newer compiler toolchain (e.g., C++11). Although it is still > possible > to build Ceph on older distributions by installing backported development > tools, > we are not building and publishing release packages for ceph.com. > > We now build packages for: > > * CentOS 7 or later. We have dropped support for CentOS 6 (and other > RHEL 6 derivatives, like Scientific Linux 6). > * Debian Jessie 8.x or later. Debian Wheezy 7.x's g++ has incomplete > support for C++11 (and no systemd). > * Ubuntu Trusty 14.04 or later. Ubuntu Precise 12.04 is no longer > supported. > * Fedora 22 or later. > > Upgrading from Firefly > -- > > Upgrading directly from Firefly v0.80.z is not recommended. It is > possible to do a direct upgrade, but not without downtime.
We > recommend that clusters are first upgraded to Hammer v0.94.4 or a > later v0.94.z release; only then is it possible to upgrade to > Infernalis 9.2.z for an online upgrade (see below). > > To do an offline upgrade directly from Firefly, all Firefly OSDs must > be stopped and marked down before any Infernalis OSDs will be allowed > to start up. This fencing is enforced by the Infernalis monitor, so > use an upgrade procedure like: > > 1. Upgrade Ceph on monitor hosts > 2. Restart all ceph-mon daemons > 3. Upgrade Ceph on all OSD hosts > 4. Stop all ceph-osd daemons > 5. Mark all OSDs down with something like:: > ceph osd down `seq 0 1000` > 6. Start all ceph-osd daemons > 7. Upgrade and restart remaining daemons (ceph-mds, radosgw) > > Upgrading from Hammer > - > > * For all distributions that support systemd (CentOS 7, Fedora, Debian
[CEPH][Crush][Tunables] issue when updating tunables
Hi all, Context: Firefly 0.80.9 Ubuntu 14.04.1 Almost a production platform in an openstack environment 176 OSD (SAS and SSD), 2 crushmap-oriented storage classes, 8 servers in 2 rooms, 3 monitors on openstack controllers Usage: Rados Gateway for object service and RBD as back-end for Cinder and Glance The Ceph cluster was installed by Mirantis procedures (puppet/fuel/ceph-deploy): I noticed that tunables were curiously set. ceph osd crush show-tunables ==> { "choose_local_tries": 0, "choose_local_fallback_tries": 0, "choose_total_tries": 50, "chooseleaf_descend_once": 1, "chooseleaf_vary_r": 1, "straw_calc_version": 1, "profile": "unknown", "optimal_tunables": 0, "legacy_tunables": 0, "require_feature_tunables": 1, "require_feature_tunables2": 1, "require_feature_tunables3": 1, "has_v2_rules": 0, "has_v3_rules": 0} I tried to update them ceph osd crush tunables optimal ==> adjusted tunables profile to optimal But when checking ceph osd crush show-tunables ==> { "choose_local_tries": 0, "choose_local_fallback_tries": 0, "choose_total_tries": 50, "chooseleaf_descend_once": 1, "chooseleaf_vary_r": 1, "straw_calc_version": 1, "profile": "unknown", "optimal_tunables": 0, "legacy_tunables": 0, "require_feature_tunables": 1, "require_feature_tunables2": 1, "require_feature_tunables3": 1, "has_v2_rules": 0, "has_v3_rules": 0} Nothing has changed. I finally did ceph osd crush set-tunable straw_calc_version 0 and ceph osd crush show-tunables ==> { "choose_local_tries": 0, "choose_local_fallback_tries": 0, "choose_total_tries": 50, "chooseleaf_descend_once": 1, "chooseleaf_vary_r": 1, "straw_calc_version": 0, "profile": "firefly", "optimal_tunables": 1, "legacy_tunables": 0, "require_feature_tunables": 1, "require_feature_tunables2": 1, "require_feature_tunables3": 1, "has_v2_rules": 0, "has_v3_rules": 0} It's OK My question: Does the "ceph osd crush tunables <profile>" command change all the requested parameters in order to set the tunables to the right profile?
Brgds _ Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration, France Telecom - Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorization. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, France Telecom - Orange shall not be liable if this message was modified, changed or falsified. Thank you.
Re: How to modify affiliation?
Hi, You can submit a patch to https://github.com/ceph/ceph/blob/master/.organizationmap Cheers On 10/11/2015 09:21, chen kael wrote: > Hi, ceph-dev > who can tell me how to modify my affiliation? > Thanks! > -- Loïc Dachary, Artisan Logiciel Libre
Re: [CEPH][Crush][Tunables] issue when updating tunables
On Tue, 10 Nov 2015, ghislain.cheval...@orange.com wrote: > Hi all, > > Context: > Firefly 0.80.9 > Ubuntu 14.04.1 > Almost a production platform in an openstack environment > 176 OSD (SAS and SSD), 2 crushmap-oriented storage classes , 8 servers in 2 > rooms, 3 monitors on openstack controllers > Usage: Rados Gateway for object service and RBD as back-end for Cinder and > Glance > > The Ceph cluster was installed by Mirantis procedures > (puppet/fuel/ceph-deploy): > > I noticed that tunables were curiously set. > ceph osd crush show-tunables ==> > { "choose_local_tries": 0, > "choose_local_fallback_tries": 0, > "choose_total_tries": 50, > "chooseleaf_descend_once": 1, > "chooseleaf_vary_r": 1, > "straw_calc_version": 1, > "profile": "unknown", > "optimal_tunables": 0, > "legacy_tunables": 0, > "require_feature_tunables": 1, > "require_feature_tunables2": 1, > "require_feature_tunables3": 1, > "has_v2_rules": 0, > "has_v3_rules": 0} > > I tried to update them > ceph osd crush tunables optimal ==> > adjusted tunables profile to optimal > > But when checking > ceph osd crush show-tunables ==> > { "choose_local_tries": 0, > "choose_local_fallback_tries": 0, > "choose_total_tries": 50, > "chooseleaf_descend_once": 1, > "chooseleaf_vary_r": 1, > "straw_calc_version": 1, > "profile": "unknown", > "optimal_tunables": 0, > "legacy_tunables": 0, > "require_feature_tunables": 1, > "require_feature_tunables2": 1, > "require_feature_tunables3": 1, > "has_v2_rules": 0, > "has_v3_rules": 0} > > Nothing has changed. > > I finally did > ceph osd crush set-tunable straw_calc_version 0 You actually want straw_calc_version 1. This is just confusing output from the 'firefly' tunable detection... the straw_calc_version does not have any client dependencies. 
sage > > and > ceph osd crush show-tunables ==> > { "choose_local_tries": 0, > "choose_local_fallback_tries": 0, > "choose_total_tries": 50, > "chooseleaf_descend_once": 1, > "chooseleaf_vary_r": 1, > "straw_calc_version": 0, > "profile": "firefly", > "optimal_tunables": 1, > "legacy_tunables": 0, > "require_feature_tunables": 1, > "require_feature_tunables2": 1, > "require_feature_tunables3": 1, > "has_v2_rules": 0, > "has_v3_rules": 0} > > It's OK > > My question: > Does the "ceph osd crush tunables <profile>" command change all the requested > parameters in order to set the tunables to the right profile? > > Brgds
How to modify affiliation?
Hi, ceph-dev, who can tell me how to modify my affiliation? Thanks!
Re: NIC with Erasure offload feature support and Ceph
03-Nov-15 18:07, Gregory Farnum wrote: On Tue, Nov 3, 2015 at 3:15 AM, Mike wrote: Hello! In our project we are planning to build a petabyte cluster with an erasure pool. We are also looking at Mellanox ConnectX-4 Lx EN / ConnectX-4 EN cards to use their erasure code offload feature. Is anyone using this feature in a test lab or in production? Nope. Ceph's erasure coding is very configurable (in terms of what kind of EC it's doing) but the offload features in NICs that we've seen aren't quite flexible enough for what Ceph is doing — it's an unusual use case and set of requirements where these offload cards are concerned. (We need to take an incoming stream, look at the raw stream, then erasure code it into an unknown set of pieces, and then send those pieces back out over the network to different addresses.) -Greg Thanks for the reply. Mellanox said that erasure code offload (Reed-Solomon algorithm) feature support on NICs will be released around April 2016.
why keep and update rollback info for ReplicatedPG?
Hi, all As I know, rollback is designed for the EC backend to roll back partially committed transactions like append, stash and attrs. So why do we need to keep and update (can_rollback_to, rollback_info_trimmed_to) every time in _write_log() for ReplicatedBackend? Or is it related to other issues? Could we avoid frequently updating this information based on the pool type? Regards Ning Yao
Re: a home for backport snippets
Hi, The new snippets home is at https://pypi.python.org/pypi/ceph-workbench and http://ceph-workbench.dachary.org/root/ceph-workbench. The first snippet was merged by Nathan yesterday[1], the backport documentation updated accordingly[2], and I used it after merging half a dozen hammer backports that were approved a few days ago. Integration tests should provide the best help against regressions we can hope for (they spawn a redmine instance every time they run and use a dedicated github user to create and destroy projects, pull requests etc.) and they are run on every merge request[3]. When integrated in ceph-workbench, the snippet is documented[4] and the implementation[5] is tested in full[6]. The merits of 100% coverage are often disputed as overkill. IMHO it's better to remove an untested line of code rather than taking the chance that it grows into something that does not work (or possibly never worked). In the case of this snippet, there are a dozen safeguards and four lines of code to modify the issue. It would be bad to discover, after modifying hundreds of issues in the Ceph tracker, that it never worked as expected. I'm sure we'll find ways to *not* do the right thing even with integration tests. But we'll hopefully do the right thing more often ;-) I'm not sure how much time it will take us to convert all the snippets we have, but it does not matter much as we can keep doing things manually in the meantime. Cheers P.S. We are using a GitLab instance, with an integrated CI, instead of github with a CI on jenkins.ceph.com roughly for the same reasons puppet-ceph is in https://github.com/openstack/puppet-ceph and uses the OpenStack gates. We have no expertise on jenkins-job-builder[7] and the learning curve is perceived as significantly higher than a GitLab with an integrated CI[8]. We also want to share administrative permissions on the CI with all members of the stable release team to share the maintenance workload.
[1] backport-set-release http://ceph-workbench.dachary.org/root/ceph-workbench/merge_requests/8 [2] Resolving an issue http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_merge_commits_from_the_integration_branch#Resolving-the-matching-issue [3] Continuous integration http://ceph-workbench.dachary.org/dachary/ceph-workbench/builds/53 [4] Documentation http://ceph-workbench.dachary.org/root/ceph-workbench/merge_requests/8/diffs#9f3ebf1fc38506b66593397f3baac514d515c496_73_75 [5] Implementation http://ceph-workbench.dachary.org/root/ceph-workbench/merge_requests/8/diffs#070f4537c6cef8a2dacef1911a7d39acd0ce1387_0_75 [6] Testing http://ceph-workbench.dachary.org/root/ceph-workbench/merge_requests/8/diffs#66bd83c5111f0ccc884ad791c4acaa926ab52c2a_0_64 [7] Jenkins Job Builder http://docs.openstack.org/infra/jenkins-job-builder/ [8] Configuration of your builds with .gitlab-ci.yml http://doc.gitlab.com/ci/yaml/README.html On 05/11/2015 14:20, Loic Dachary wrote: > Hi, > > Today, Nathan and I briefly discussed the idea of collecting the backport > snippets that are archived in the wiki at > http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO. We all have copies > on our local disks and although they don't diverge much, this is not very > sustainable. It was really good as we established the backport workflows. And > it would have been immensely painful to maintain a proper software while we > were changing the workflow on a regular basis. But it looks like we now have > something stable. > > Early this year ceph-workbench[1] was started with the idea of helping with > backports. It is a mostly empty shell we can now use to collect all the > snippets we have. Instead of adding set-release[2] to the script directory of > Ceph, it would be a subcommand of ceph-workbench, like so: > > ceph-workbench set-release --token $github_token --key $redmine_key > > What do you think ? 
> > Cheers > > [1] https://pypi.python.org/pypi/ceph-workbench > [2] https://github.com/ceph/ceph/pull/6466 > -- Loïc Dachary, Artisan Logiciel Libre
Re: [ceph-users] Permanent MDS restarting under load
10.11.2015 22:38, Gregory Farnum wrote: Which requests are they? Are these MDS operations or OSD ones? Those requests appeared in ceph -w output and are as follows: https://gist.github.com/5045336f6fb7d532138f Is it correct that there are blocked OSD operations? osd.3 is one of the data pool HDDs, and other OSDs besides osd.3 also appear in slow request warnings. I guess that may be related to the replica-4 setup of our cluster and only 5 OSDs for each host. But we plan to add 6 more OSDs to each host after data migration is finished. Could that help in spreading the load? So that "blacklisted" means that the monitors decided the MDS was nonresponsive, failed over to another daemon, and blocked this one off from the cluster. So, one could adjust the blacklist timeout, but there is no way to rate-limit requests? Am I correct? Yeah, the MDS doesn't really do a good job back-pressuring clients right now when it or the OSDs aren't keeping up with the workload. That's something we need to work on once fsck stuff is behaving. rsync is also (sadly) a workload that frequently exposes these problems, but I'm not used to seeing the MDS daemon get stuck quite that quickly. How frequently is it actually getting swapped? Quite often. MDSes are swapped once per minute or so under heavy load: === лис 10 10:40:47 data.la.net.ua bash[18112]: 2015-11-10 10:40:47.357633 7f76c42e2700 -1 MDSIOContextBase: blacklisted! Restarting... лис 10 10:41:49 data.la.net.ua bash[18112]: 2015-11-10 10:41:49.237962 7f1a939af700 -1 MDSIOContextBase: blacklisted! Restarting... лис 10 10:43:14 data.la.net.ua bash[18112]: 2015-11-10 10:43:14.899375 7f17f6eaa700 -1 MDSIOContextBase: blacklisted! Restarting... лис 10 10:44:11 data.la.net.ua bash[18112]: 2015-11-10 10:44:11.810116 7f693b64c700 -1 MDSIOContextBase: blacklisted! Restarting... лис 10 10:45:14 data.la.net.ua bash[18112]: 2015-11-10 10:45:14.761684 7f7616097700 -1 MDSIOContextBase: blacklisted! Restarting...
лис 10 10:46:35 data.la.net.ua bash[18112]: 2015-11-10 10:46:35.927190 7fdfb7f62700 -1 MDSIOContextBase: blacklisted! Restarting... лис 10 10:47:41 data.la.net.ua bash[18112]: 2015-11-10 10:47:41.888064 7fb88139b700 -1 MDSIOContextBase: blacklisted! Restarting... лис 10 10:49:57 data.la.net.ua bash[18112]: 2015-11-10 10:49:57.542545 7fbb360eb700 -1 MDSIOContextBase: blacklisted! Restarting... лис 10 10:51:02 data.la.net.ua bash[18112]: 2015-11-10 10:51:02.486907 7fb488fa1700 -1 MDSIOContextBase: blacklisted! Restarting... лис 10 10:52:03 data.la.net.ua bash[18112]: 2015-11-10 10:52:03.871463 7f4cc0236700 -1 MDSIOContextBase: blacklisted! Restarting... лис 10 10:53:20 data.la.net.ua bash[18112]: 2015-11-10 10:53:20.290494 7f9dc48d3700 -1 MDSIOContextBase: blacklisted! Restarting... лис 10 10:54:17 data.la.net.ua bash[18112]: 2015-11-10 10:54:17.086940 7f45a9105700 -1 MDSIOContextBase: blacklisted! Restarting... лис 10 10:55:17 data.la.net.ua bash[18112]: 2015-11-10 10:55:17.547123 7f6c48f50700 -1 MDSIOContextBase: blacklisted! Restarting... лис 10 10:56:32 data.la.net.ua bash[18112]: 2015-11-10 10:56:32.558378 7f2bf0a70700 -1 MDSIOContextBase: blacklisted! Restarting... лис 10 10:57:34 data.la.net.ua bash[18112]: 2015-11-10 10:57:34.534306 7fc69b42c700 -1 MDSIOContextBase: blacklisted! Restarting... лис 10 10:58:37 data.la.net.ua bash[18112]: 2015-11-10 10:58:37.061903 7fea3de23700 -1 MDSIOContextBase: blacklisted! Restarting... лис 10 10:59:52 data.la.net.ua bash[18112]: 2015-11-10 10:59:52.579594 7fe23b468700 -1 MDSIOContextBase: blacklisted! Restarting... === Any idea? -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
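For reference, the monitor-side failover and blacklisting behaviour discussed above is governed by a few MDS options (names valid around the hammer release; the values below are purely illustrative, not recommendations for this cluster):

===
[mds]
# how often the MDS sends a beacon to the monitors
mds beacon interval = 4
# how long the monitors wait without a beacon before
# considering the MDS laggy and failing it over
mds beacon grace = 60
# how long a failed MDS stays blacklisted in the OSD map
mds blacklist interval = 1440
===

Raising mds beacon grace can reduce spurious failovers when the MDS is merely overloaded, though, as noted above, it does not add any client back-pressure.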
disabling buffer::raw crc cache
Hello, guys!

While running a CPU-bound 4k block workload, I found that disabling the crc cache in buffer::raw gives around a 7% performance improvement. If there is no strong use case that benefits from that cache, we would like to remove it entirely; otherwise we could enable it conditionally based on object size.

-- Evgeniy
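For context, the idea of such a cache is to memoize crc values for (offset, length) ranges of a buffer so that repeated checksum requests over the same bytes are not recomputed. A minimal Python sketch of the trade-off follows; this is an illustrative toy model, not Ceph's actual C++ buffer::raw code:

```python
import zlib

class CrcCachedBuffer:
    """Toy model of a buffer that memoizes CRC32 over byte ranges,
    similar in spirit to the crc cache being discussed (illustrative
    sketch only, not Ceph's implementation)."""

    def __init__(self, data, cache_enabled=True):
        self.data = data
        self.cache_enabled = cache_enabled
        self._crc_cache = {}   # (offset, length) -> crc32 value
        self.computations = 0  # counts actual crc32 computations

    def crc32(self, offset, length):
        key = (offset, length)
        if self.cache_enabled and key in self._crc_cache:
            return self._crc_cache[key]
        self.computations += 1
        crc = zlib.crc32(self.data[offset:offset + length])
        if self.cache_enabled:
            # For small (e.g. 4k) objects, this bookkeeping can cost
            # more than recomputing the crc, which is what motivates
            # making the caching conditional on object size.
            self._crc_cache[key] = crc
        return crc

buf = CrcCachedBuffer(b"x" * 4096)
a = buf.crc32(0, 4096)
b = buf.crc32(0, 4096)  # served from cache; no recomputation
```

The cache only pays off when the same range is checksummed repeatedly; for workloads dominated by small, unique 4k writes it is mostly lookup and insertion overhead, which is consistent with the ~7% figure above.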
Permanent MDS restarting under load
Hello.

We have CephFS deployed over a Ceph cluster (0.94.5). We experience constant MDS restarts under high-IOPS workloads (e.g. rsyncing lots of small mailboxes from another storage system to CephFS using the ceph-fuse client).

First, cluster health goes to HEALTH_WARN with the following disclaimer:

===
mds0: Behind on trimming (321/30)
===

Also, slow requests start to appear:

===
2 requests are blocked > 32 sec
===

Then, after a while, one of the MDSes fails with the following log:

===
Nov 10 16:07:41 baikal bash[10122]: 2015-11-10 16:07:41.915540 7f2484f13700 -1 MDSIOContextBase: blacklisted! Restarting...
Nov 10 16:07:41 baikal bash[10122]: starting mds.baikal at :/0
Nov 10 16:07:42 baikal bash[10122]: 2015-11-10 16:07:42.003189 7f82b477e7c0 -1 mds.-1.0 log_to_monitors {default=true}
===

I guess writing lots of small files bloats the MDS log, and the MDS can't keep up with trimming. That's why it is marked as failed and replaced by the standby MDS. We tried to limit mds_log_max_events to 30 events, but that caused the MDS to fail very quickly with the following stacktrace:

===
Stacktrace: https://gist.github.com/4c8a89682e81b0049f3e
===

Is that a normal situation, or could one rate-limit client requests? Maybe there should be additional knobs to tune CephFS for handling such a workload?

Cluster info goes below. CentOS 7.1, Ceph 0.94.5.

Cluster maps:

===
osdmap e5894: 20 osds: 20 up, 20 in
pgmap v8959901: 1024 pgs, 12 pools, 5156 GB data, 23074 kobjects
20101 GB used, 30468 GB / 50570 GB avail
1024 active+clean
===

CephFS list:

===
name: myfs, metadata pool: mds_meta_storage, data pools: [mds_xattrs_storage fs_samba fs_pbx fs_misc fs_web fs_mail fs_ott ]
===

Both MDS data and metadata pools are located on PCI-E SSDs:

===
-9 0.44800 root pcie-ssd
-7 0.22400     host data-pcie-ssd
 7 0.22400         osd.7 up 1.0 1.0
-8 0.22400     host baikal-pcie-ssd
 6 0.22400         osd.6 up 1.0 1.0

pool 20 'mds_meta_storage' replicated size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 64 pgp_num 64 last_change 4333 flags hashpspool stripe_width 0
pool 21 'mds_xattrs_storage' replicated size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 64 pgp_num 64 last_change 4337 flags hashpspool crash_replay_interval 45 stripe_width 0

mds_meta_storage   20 37422k 0 169G   234714
mds_xattrs_storage 21      0 0 169G 11271588

rule pcie-ssd {
    ruleset 2
    type replicated
    min_size 1
    max_size 2
    step take pcie-ssd
    step chooseleaf firstn 0 type host
    step emit
}
===

There is 1 active MDS as well as 1 standby MDS:

===
mdsmap e9035: 1/1/1 up {0=data=up:active}, 1 up:standby
===

Also we have 10 OSDs on HDDs for additional data pools:

===
-6 37.0 root sata-hdd-misc
-4 18.5     host data-sata-hdd-misc
 1  3.7         osd.1  up 1.0 1.0
 3  3.7         osd.3  up 1.0 1.0
 4  3.7         osd.4  up 1.0 1.0
 5  3.7         osd.5  up 1.0 1.0
10  3.7         osd.10 up 1.0 1.0
-5 18.5     host baikal-sata-hdd-misc
 0  3.7         osd.0  up 1.0 1.0
11  3.7         osd.11 up 1.0 1.0
12  3.7         osd.12 up 1.0 1.0
13  3.7         osd.13 up 1.0 1.0
14  3.7         osd.14 up 1.0 1.0

fs_samba 22  2162G 4.28 3814G 1168619
fs_pbx   23  1551G 3.07 3814G 3908813
fs_misc  24   436G 0.86 3814G  112114
fs_web   25 58642M 0.11 3814G  378946
fs_mail  26   442G 0.88 3814G 6414073
fs_ott   27      0 0    3814G       0

rule sata-hdd-misc {
    ruleset 4
    type replicated
    min_size 1
    max_size 4
    step take sata-hdd-misc
    step choose firstn 2 type host
    step chooseleaf firstn 2 type osd
    step emit
}
===

CephFS folder pool affinity is done via setfattr. For example:

===
# file: mail
ceph.dir.layout="stripe_unit=4194304 stripe_count=1 object_size=4194304 pool=fs_mail"
===
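On the "Behind on trimming (321/30)" warning: the second number is the mds_log_max_segments limit, and the first is the current journal segment count, so rather than shrinking mds_log_max_events, the journal limits are usually raised to let trimming catch up. A sketch of a ceph.conf fragment (option names valid for 0.94; values are illustrative assumptions, not tested recommendations):

===
[mds]
# journal segments allowed before "Behind on trimming" (the /30 above)
mds log max segments = 60
# how many segments may be expiring (trimming) in parallel
mds log max expiring = 40
===

This only widens the buffer between the client write rate and trim rate; it does not rate-limit clients.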
infernalis build package on debian jessie : dh_install: ceph missing files (usr/lib/libos_tp.so.*), aborting
Hi,

I'm trying to build infernalis packages on debian jessie, and I get this error during the package build:

dh_install: ceph missing files (usr/lib/libos_tp.so.*), aborting

I think it's related to the lttng change from here:

https://github.com/ceph/ceph/pull/6135

Maybe an option is missing in debian rules to generate libos_tp.so?