Re: infernalis build package on debian jessie : dh_install: ceph missing files (usr/lib/libos_tp.so.*), aborting

2015-11-10 Thread Alexandre DERUMIER
Sorry, my fault: I had an old --without-lttng flag left over in my build packaging.


- Mail original -
De: "aderumier" 
À: "ceph-devel" 
Envoyé: Mardi 10 Novembre 2015 15:06:19
Objet: infernalis build package on debian jessie : dh_install: ceph missing 
files (usr/lib/libos_tp.so.*), aborting

Hi, 

I'm trying to build infernalis packages on debian jessie, 
and I have this error on package build 


dh_install: ceph missing files (usr/lib/libos_tp.so.*), aborting 


I think it's related to the lttng change from here 

https://github.com/ceph/ceph/pull/6135 


Maybe an option is missing in debian/rules to generate libos_tp.so? 



data-at-rest compression

2015-11-10 Thread Igor Fedotov

Hi All,

A while ago we had some conversations here about adding compression 
support for EC pools.

Here is the corresponding pull request implementing this feature:

https://github.com/ceph/ceph/pull/6524/commits

The corresponding blueprint is at:
http://tracker.ceph.com/projects/ceph/wiki/Rados_-_at-rest_compression

All comments and reviews are highly appreciated.

Thanks,
Igor.



Backlog for the Ceph tracker

2015-11-10 Thread Loic Dachary
Hi Sam,

I crafted a custom query that could be used as a replacement for the backlog 
plugin

   http://tracker.ceph.com/projects/ceph/issues?query_id=86

It displays issues that are features or tasks, grouped by target version and 
ordered by priority.

I also created a v10.0.0 version so we can assign features we want for this 
next version to it.

If you feel that's not good enough, we can just throw it away, it's merely a 
proposal ;-)

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre






why ShardedWQ in osd using smart pointer for PG?

2015-11-10 Thread 池信泽
hi, all:

 op_wq is declared as ShardedThreadPool::ShardedWQ< pair<PGRef, OpRequestRef> > _wq. I do not know why we should use PGRef here.

 The overhead of the smart pointer is not small. Maybe the raw pointer
PG* would also be OK?

 If op_wq is changed to ShardedThreadPool::ShardedWQ< pair<PG*, OpRequestRef> > _wq (using a raw pointer),

 the latency of PrioritizedQueue::enqueue decreases from 3.38us to 1.89us

 the latency of PrioritizedQueue::dequeue decreases from 3.44us to 1.65us

 Does this make sense to you?

-- 
Regards,
xinze


Re: Backlog for the Ceph tracker

2015-11-10 Thread Loic Dachary
But http://tracker.ceph.com/projects/ceph/agile_versions looks better :-)

On 10/11/2015 16:28, Loic Dachary wrote:
> Hi Sam,
> 
> I crafted a custom query that could be used as a replacement for the backlog 
> plugin
> 
>http://tracker.ceph.com/projects/ceph/issues?query_id=86
> 
> It displays issues that are features or tasks, grouped by target version and 
> ordered by priority.
> 
> I also created a v10.0.0 version so we can assign features we want for this 
> next version to it.
> 
> If you feel that's not good enough, we can just throw it away, it's merely a 
> proposal ;-)
> 
> Cheers
> 

-- 
Loïc Dachary, Artisan Logiciel Libre





Re: Backlog for the Ceph tracker

2015-11-10 Thread Loic Dachary


On 10/11/2015 16:34, Loic Dachary wrote:
> But http://tracker.ceph.com/projects/ceph/agile_versions looks better :-)

It appears to be a crippled version of a proprietary product 
http://www.redminecrm.com/projects/agile/pages/last

My vote would be to uninstall it, since it is even less flexible to use than 
the custom query below. It is disappointing to lose a plugin because it is no 
longer maintained, but that's not something we can always foresee. IMHO, relying 
on a proprietary redmine plugin is not a safe bet and it would be wise not to 
become dependent on it.

Cheers

> On 10/11/2015 16:28, Loic Dachary wrote:
>> Hi Sam,
>>
>> I crafted a custom query that could be used as a replacement for the backlog 
>> plugin
>>
>>http://tracker.ceph.com/projects/ceph/issues?query_id=86
>>
>> It displays issues that are features or tasks, grouped by target version and 
>> ordered by priority.
>>
>> I also created a v10.0.0 version so we can assign features we want for this 
>> next version to it.
>>
>> If you feel that's not good enough, we can just throw it away, it's merely a 
>> proposal ;-)
>>
>> Cheers
>>
> 

-- 
Loïc Dachary, Artisan Logiciel Libre





Re: why ShardedWQ in osd using smart pointer for PG?

2015-11-10 Thread Gregory Farnum
On Tue, Nov 10, 2015 at 7:19 AM, 池信泽  wrote:
> hi, all:
>
>  op_wq is declared as ShardedThreadPool::ShardedWQ< pair<PGRef, OpRequestRef> > _wq. I do not know why we should use PGRef here.
>
>  The overhead of the smart pointer is not small. Maybe the raw pointer
> PG* would also be OK?
>
>  If op_wq is changed to ShardedThreadPool::ShardedWQ< pair<PG*, OpRequestRef> > _wq (using a raw pointer),
>
>  the latency of PrioritizedQueue::enqueue decreases from 3.38us to 1.89us
>
>  the latency of PrioritizedQueue::dequeue decreases from 3.44us to 1.65us
>
>  Does this make sense to you?

In general we use PGRefs rather than PG pointers. I think we actually
rely on the references here to keep the PG from going out of scope at
an inopportune time, but if it halves the cost of queuing actions it
might be worth the effort of avoiding that.
-Greg
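
To make the trade-off concrete, here is a minimal self-contained C++ sketch (the types below are stand-ins, not the actual Ceph PG, PGRef or OpRequestRef definitions): every copy of a refcounted PGRef into and out of the work queue is an atomic increment/decrement, which a raw PG* avoids at the price of nothing keeping the PG alive while the op sits in the queue.

#include <atomic>
#include <iostream>
#include <memory>
#include <queue>
#include <utility>

// Stand-in types; the real PGRef is an intrusive refcounted pointer to PG.
struct PG {
  std::atomic<int> nref{0};
};
struct OpRequest {};
using OpRequestRef = std::shared_ptr<OpRequest>;

// A tiny intrusive smart pointer, roughly the shape of PGRef.
struct PGRef {
  PG* pg;
  explicit PGRef(PG* p) : pg(p) { if (pg) pg->nref.fetch_add(1); }
  PGRef(const PGRef& o) : pg(o.pg) { if (pg) pg->nref.fetch_add(1); }  // atomic inc per copy
  ~PGRef() { if (pg) pg->nref.fetch_sub(1); }                          // atomic dec per destroy
};

int main() {
  PG pg;
  OpRequestRef op = std::make_shared<OpRequest>();

  // Refcounted entry: every enqueue/dequeue copies and destroys a PGRef,
  // paying atomic refcount traffic -- the overhead measured in this thread.
  std::queue<std::pair<PGRef, OpRequestRef>> wq_ref;
  wq_ref.push({PGRef(&pg), op});
  wq_ref.pop();

  // Raw-pointer entry: the round trip is just a pointer copy, but nothing
  // pins the PG, so it could be torn down while the op is still queued.
  std::queue<std::pair<PG*, OpRequestRef>> wq_raw;
  wq_raw.push({&pg, op});
  wq_raw.pop();

  std::cout << "refcount after round trips: " << pg.nref.load() << "\n";  // 0
  return 0;
}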


Preparing infernalis v9.2.1

2015-11-10 Thread Loic Dachary
Hi Abhishek,

I created the issue to track the progress of infernalis v9.2.1 at 
http://tracker.ceph.com/issues/13750 and assigned it to you. There are a dozen 
issues waiting to be backported and another dozen waiting to be tested in an 
integration branch. 

Good luck with driving your first point release :-)

Enjoy Diwali !

-- 
Loïc Dachary, Artisan Logiciel Libre





Re: [ceph-users] Permanent MDS restarting under load

2015-11-10 Thread Gregory Farnum
On Tue, Nov 10, 2015 at 6:32 AM, Oleksandr Natalenko
 wrote:
> Hello.
>
> We have CephFS deployed over Ceph cluster (0.94.5).
>
> We experience constant MDS restarting under high IOPS workload (e.g.
> rsyncing lots of small mailboxes from another storage to CephFS using
> ceph-fuse client). First, cluster health goes to HEALTH_WARN state with the
> following disclaimer:
>
> ===
> mds0: Behind on trimming (321/30)
> ===
>
> Also, slow requests start to appear:
>
> ===
> 2 requests are blocked > 32 sec
> ===

Which requests are they? Are these MDS operations or OSD ones?

>
> Then, after a while, one of MDSes fails with the following log:
>
> ===
> лис 10 16:07:41 baikal bash[10122]: 2015-11-10 16:07:41.915540 7f2484f13700
> -1 MDSIOContextBase: blacklisted!  Restarting...
> лис 10 16:07:41 baikal bash[10122]: starting mds.baikal at :/0
> лис 10 16:07:42 baikal bash[10122]: 2015-11-10 16:07:42.003189 7f82b477e7c0
> -1 mds.-1.0 log_to_monitors {default=true}
> ===

So that "blacklisted" means that the monitors decided the MDS was
nonresponsive, failed over to another daemon, and blocked this one off
from the cluster.

> I guess writing lots of small files bloats the MDS log, and the MDS doesn't
> catch up on trimming in time. That's why it is marked as failed and is
> replaced by the standby MDS. We tried to limit mds_log_max_events to 30
> events, but that caused the MDS to fail very quickly with the following
> stacktrace:
>
> ===
> Stacktrace: https://gist.github.com/4c8a89682e81b0049f3e
> ===
>
> Is that a normal situation, or could one rate-limit client requests? Maybe
> there should be additional knobs to tune CephFS for handling such a
> workload?

Yeah, the MDS doesn't really do a good job back-pressuring clients
right now when it or the OSDs aren't keeping up with the workload.
That's something we need to work on once fsck stuff is behaving. rsync
is also (sadly) a workload that frequently exposes these problems, but
I'm not used to seeing the MDS daemon get stuck quite that quickly.
How frequently is it actually getting swapped?
-Greg


non-fast-forward merges prevented for some branches in GitHub

2015-11-10 Thread Ken Dreyer
GitHub.com now has an option in its UI for users to "protect" certain branches.

I've enabled the "Disable force-pushes to this branch and prevent it
from being deleted" setting for the following repos and branches:

ceph.git and ceph-qa-suite.git:
- "master"
- "jewel"
- "infernalis"
- "hammer"
- "firefly"

ceph-deploy.git and teuthology.git:
- "master"

If we ever have to force-push in an emergency we can disable this in
GitHub's UI, e.g. https://github.com/ceph/ceph/settings/branches .
Otherwise, in normal operation this will prevent certain branches from
going backwards in time by accident.

- Ken


Re: why ShardedWQ in osd using smart pointer for PG?

2015-11-10 Thread 池信泽
I wonder: if we want to keep the PG from going out of scope at an
inopportune time, why are snap_trim_queue and scrub_queue declared as
xlist<PG*> instead of xlist<PGRef>?

2015-11-11 2:28 GMT+08:00 Gregory Farnum :
> On Tue, Nov 10, 2015 at 7:19 AM, 池信泽  wrote:
>> hi, all:
>>
>>  op_wq is declared as ShardedThreadPool::ShardedWQ< pair<PGRef, OpRequestRef> > _wq. I do not know why we should use PGRef here.
>>
>>  The overhead of the smart pointer is not small. Maybe the raw pointer
>> PG* would also be OK?
>>
>>  If op_wq is changed to ShardedThreadPool::ShardedWQ< pair<PG*, OpRequestRef> > _wq (using a raw pointer),
>>
>>  the latency of PrioritizedQueue::enqueue decreases from 3.38us to 1.89us
>>
>>  the latency of PrioritizedQueue::dequeue decreases from 3.44us to 1.65us
>>
>>  Does this make sense to you?
>
> In general we use PGRefs rather than PG pointers. I think we actually
> rely on the references here to keep the PG from going out of scope at
> an inopportune time, but if it halves the cost of queuing actions it
> might be worth the effort of avoiding that.
> -Greg



-- 
Regards,
xinze


Re: why ShardedWQ in osd using smart pointer for PG?

2015-11-10 Thread Gregory Farnum
The xlist has means of efficiently removing entries from a list. I
think you'll find those in the path where we start tearing down a PG,
and membership on this list is a bit different from membership in the
ShardedThreadPool. It's all about the particulars of each design, and
I don't have that in my head — you'd need to examine it.
-Greg
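
For readers unfamiliar with the pattern Greg describes, here is a hedged, self-contained sketch of an intrusive list in the spirit of xlist (illustrative only, not Ceph's actual xlist): the list node is embedded in the object, so a raw-pointer queue entry can be unlinked in O(1) from the object's own teardown path, which is why such a list does not need refcounted members the way the sharded work queue does.

#include <cassert>
#include <iostream>

// Minimal intrusive doubly linked list, loosely in the spirit of xlist<T*>.
// Illustrative only; not Ceph's xlist implementation.
template <typename T>
struct ilist {
  struct item {
    T owner;
    item* prev = nullptr;
    item* next = nullptr;
    ilist* list = nullptr;
    explicit item(T o) : owner(o) {}
    void remove_myself() {                 // O(1), callable from the owner's teardown
      if (!list) return;
      if (prev) prev->next = next; else list->head = next;
      if (next) next->prev = prev; else list->tail = prev;
      prev = next = nullptr;
      list = nullptr;
    }
  };
  item* head = nullptr;
  item* tail = nullptr;
  void push_back(item* i) {
    assert(!i->list);
    i->prev = tail; i->next = nullptr; i->list = this;
    if (tail) tail->next = i; else head = i;
    tail = i;
  }
};

struct PG {
  ilist<PG*>::item snap_trim_item{this};     // membership lives inside the PG itself
  ~PG() { snap_trim_item.remove_myself(); }  // teardown unlinks it, so the raw
                                             // pointer on the list cannot dangle
};

int main() {
  ilist<PG*> snap_trim_queue;
  {
    PG pg;
    snap_trim_queue.push_back(&pg.snap_trim_item);
    std::cout << "queued PG at " << snap_trim_queue.head->owner << "\n";
  } // pg destroyed here; its destructor removed it from the queue
  std::cout << "queue empty: " << (snap_trim_queue.head == nullptr) << "\n";
  return 0;
}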

On Tue, Nov 10, 2015 at 4:20 PM, 池信泽  wrote:
> I wonder: if we want to keep the PG from going out of scope at an
> inopportune time, why are snap_trim_queue and scrub_queue declared as
> xlist<PG*> instead of xlist<PGRef>?
>
> 2015-11-11 2:28 GMT+08:00 Gregory Farnum :
>> On Tue, Nov 10, 2015 at 7:19 AM, 池信泽  wrote:
>>> hi, all:
>>>
>>>  op_wq is declared as ShardedThreadPool::ShardedWQ< pair<PGRef, OpRequestRef> > _wq. I do not know why we should use PGRef here.
>>>
>>>  The overhead of the smart pointer is not small. Maybe the raw pointer
>>> PG* would also be OK?
>>>
>>>  If op_wq is changed to ShardedThreadPool::ShardedWQ< pair<PG*, OpRequestRef> > _wq (using a raw pointer),
>>>
>>>  the latency of PrioritizedQueue::enqueue decreases from 3.38us to 1.89us
>>>
>>>  the latency of PrioritizedQueue::dequeue decreases from 3.44us to 1.65us
>>>
>>>  Does this make sense to you?
>>
>> In general we use PGRefs rather than PG pointers. I think we actually
>> rely on the references here to keep the PG from going out of scope at
>> an inopportune time, but if it halves the cost of queuing actions it
>> might be worth the effort of avoiding that.
>> -Greg
>
>
>
> --
> Regards,
> xinze


Re: [ceph-users] v9.2.0 Infernalis released

2015-11-10 Thread Alfredo Deza
On Sun, Nov 8, 2015 at 10:41 PM, Alexandre DERUMIER  wrote:
> Hi,
>
> debian repository seem to miss librbd1 package for debian jessie
>
> http://download.ceph.com/debian-infernalis/pool/main/c/ceph/
>
> (ubuntu trusty librbd1 is present)

This is now fixed and the packages should now be available.

>
>
> - Mail original -
> De: "Sage Weil" 
> À: ceph-annou...@ceph.com, "ceph-devel" , 
> "ceph-users" , ceph-maintain...@ceph.com
> Envoyé: Vendredi 6 Novembre 2015 23:05:54
> Objet: [ceph-users] v9.2.0 Infernalis released
>
> [I'm going to break my own rule and do this on a Friday only because this
> has been built and in the repos for a couple of days now; I've just been
> traveling and haven't had time to announce it.]
>
> This major release will be the foundation for the next stable series.
> There have been some major changes since v0.94.x Hammer, and the
> upgrade process is non-trivial. Please read these release notes carefully.
>
> Major Changes from Hammer
> -
>
> - General:
>
> * Ceph daemons are now managed via systemd (with the exception of
> Ubuntu Trusty, which still uses upstart).
> * Ceph daemons run as the 'ceph' user instead of root.
> * On Red Hat distros, there is also an SELinux policy.
>
> - RADOS:
>
> * The RADOS cache tier can now proxy write operations to the base
> tier, allowing writes to be handled without forcing migration of
> an object into the cache.
> * The SHEC erasure coding support is no longer flagged as
> experimental. SHEC trades some additional storage space for faster
> repair.
> * There is now a unified queue (and thus prioritization) of client
> IO, recovery, scrubbing, and snapshot trimming.
> * There have been many improvements to low-level repair tooling
> (ceph-objectstore-tool).
> * The internal ObjectStore API has been significantly cleaned up in order
> to facilitate new storage backends like NewStore.
>
> - RGW:
>
> * The Swift API now supports object expiration.
> * There are many Swift API compatibility improvements.
>
> - RBD:
>
> * The ``rbd du`` command shows actual usage (quickly, when
> object-map is enabled).
> * The object-map feature has seen many stability improvements.
> * Object-map and exclusive-lock features can be enabled or disabled
> dynamically.
> * You can now store user metadata and set persistent librbd options
> associated with individual images.
> * The new deep-flatten feature allows flattening of a clone and all
> of its snapshots. (Previously snapshots could not be flattened.)
> * The export-diff command is now faster (it uses aio). There is also
> a new fast-diff feature.
> * The --size argument can be specified with a suffix for units
> (e.g., ``--size 64G``).
> * There is a new ``rbd status`` command that, for now, shows who has
> the image open/mapped.
>
> - CephFS:
>
> * You can now rename snapshots.
> * There have been ongoing improvements around administration, diagnostics,
> and the check and repair tools.
> * The caching and revocation of client cache state due to unused
> inodes has been dramatically improved.
> * The ceph-fuse client behaves better on 32-bit hosts.
>
> Distro compatibility
> 
>
> We have decided to drop support for many older distributions so that we can
> move to a newer compiler toolchain (e.g., C++11). Although it is still 
> possible
> to build Ceph on older distributions by installing backported development 
> tools,
> we are not building and publishing release packages for ceph.com.
>
> We now build packages for:
>
> * CentOS 7 or later. We have dropped support for CentOS 6 (and other
> RHEL 6 derivatives, like Scientific Linux 6).
> * Debian Jessie 8.x or later. Debian Wheezy 7.x's g++ has incomplete
> support for C++11 (and no systemd).
> * Ubuntu Trusty 14.04 or later. Ubuntu Precise 12.04 is no longer
> supported.
> * Fedora 22 or later.
>
> Upgrading from Firefly
> --
>
> Upgrading directly from Firefly v0.80.z is not recommended. It is
> possible to do a direct upgrade, but not without downtime. We
> recommend that clusters are first upgraded to Hammer v0.94.4 or a
> later v0.94.z release; only then is it possible to upgrade to
> Infernalis 9.2.z for an online upgrade (see below).
>
> To do an offline upgrade directly from Firefly, all Firefly OSDs must
> be stopped and marked down before any Infernalis OSDs will be allowed
> to start up. This fencing is enforced by the Infernalis monitor, so
> use an upgrade procedure like:
>
> 1. Upgrade Ceph on monitor hosts
> 2. Restart all ceph-mon daemons
> 3. Upgrade Ceph on all OSD hosts
> 4. Stop all ceph-osd daemons
> 5. Mark all OSDs down with something like::
> ceph osd down `seq 0 1000`
> 6. Start all ceph-osd daemons
> 7. Upgrade and restart remaining daemons (ceph-mds, radosgw)
>
> Upgrading from Hammer
> -
>
> * For all distributions that support systemd (CentOS 7, Fedora, Debian

[CEPH][Crush][Tunables] issue when updating tunables

2015-11-10 Thread ghislain.chevalier
Hi all,

Context:
Firefly 0.80.9
Ubuntu 14.04.1
Almost a production platform in an OpenStack environment
176 OSDs (SAS and SSD), 2 crushmap-oriented storage classes, 8 servers in 2 
rooms, 3 monitors on openstack controllers
Usage: Rados Gateway for object service and RBD as back-end for Cinder and 
Glance

The Ceph cluster was installed by Mirantis procedures (puppet/fuel/ceph-deploy):

I noticed that tunables were curiously set.
ceph  osd crush show-tunables ==>
{ "choose_local_tries": 0,
  "choose_local_fallback_tries": 0,
  "choose_total_tries": 50,
  "chooseleaf_descend_once": 1,
  "chooseleaf_vary_r": 1,
  "straw_calc_version": 1,
  "profile": "unknown",
  "optimal_tunables": 0,
  "legacy_tunables": 0,
  "require_feature_tunables": 1,
  "require_feature_tunables2": 1,
  "require_feature_tunables3": 1,
  "has_v2_rules": 0,
  "has_v3_rules": 0}

I tried to update them
ceph  osd crush tunables optimal ==>
adjusted tunables profile to optimal

But when checking
ceph  osd crush show-tunables ==>
{ "choose_local_tries": 0,
  "choose_local_fallback_tries": 0,
  "choose_total_tries": 50,
  "chooseleaf_descend_once": 1,
  "chooseleaf_vary_r": 1,
  "straw_calc_version": 1,
  "profile": "unknown",
  "optimal_tunables": 0,
  "legacy_tunables": 0,
  "require_feature_tunables": 1,
  "require_feature_tunables2": 1,
  "require_feature_tunables3": 1,
  "has_v2_rules": 0,
  "has_v3_rules": 0}

Nothing has changed.

I finally did
ceph osd crush set-tunable straw_calc_version 0

and
ceph  osd crush show-tunables ==>
{ "choose_local_tries": 0,
  "choose_local_fallback_tries": 0,
  "choose_total_tries": 50,
  "chooseleaf_descend_once": 1,
  "chooseleaf_vary_r": 1,
  "straw_calc_version": 0,
  "profile": "firefly",
  "optimal_tunables": 1,
  "legacy_tunables": 0,
  "require_feature_tunables": 1,
  "require_feature_tunables2": 1,
  "require_feature_tunables3": 1,
  "has_v2_rules": 0,
  "has_v3_rules": 0}

It's OK

My question:
Does the "ceph osd crush tunables <profile>" command change all the required 
parameters in order to set the tunables to the right profile?

Brgds

_

Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,
France Telecom - Orange decline toute responsabilite si ce message a ete 
altere, deforme ou falsifie. Merci

This message and its attachments may contain confidential or privileged 
information that may be protected by law;
they should not be distributed, used or copied without authorization.
If you have received this email in error, please notify the sender and delete 
this message and its attachments.
As emails may be altered, France Telecom - Orange shall not be liable if this 
message was modified, changed or falsified.
Thank you.



Re: How to modify affiliation?

2015-11-10 Thread Loic Dachary
Hi,

You can submit a patch to 
https://github.com/ceph/ceph/blob/master/.organizationmap

Cheers

On 10/11/2015 09:21, chen kael wrote:
> Hi, ceph-dev,
>  Who can tell me how to modify my affiliation?
>  Thanks!
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: [CEPH][Crush][Tunables] issue when updating tunables

2015-11-10 Thread Sage Weil
On Tue, 10 Nov 2015, ghislain.cheval...@orange.com wrote:
> Hi all,
> 
> Context:
> Firefly 0.80.9
> Ubuntu 14.04.1
> Almost a production platform  in an openstack environment
> 176 OSD (SAS and SSD), 2 crushmap-oriented storage classes , 8 servers in 2 
> rooms, 3 monitors on openstack controllers
> Usage: Rados Gateway for object service and RBD as back-end for Cinder and 
> Glance
> 
> The Ceph cluster was installed by Mirantis procedures 
> (puppet/fuel/ceph-deploy):
> 
> I noticed that tunables were curiously set.
> ceph  osd crush show-tunables ==>
> { "choose_local_tries": 0,
>   "choose_local_fallback_tries": 0,
>   "choose_total_tries": 50,
>   "chooseleaf_descend_once": 1,
>   "chooseleaf_vary_r": 1,
>   "straw_calc_version": 1,
>   "profile": "unknown",
>   "optimal_tunables": 0,
>   "legacy_tunables": 0,
>   "require_feature_tunables": 1,
>   "require_feature_tunables2": 1,
>   "require_feature_tunables3": 1,
>   "has_v2_rules": 0,
>   "has_v3_rules": 0}
> 
> I tried to update them
> ceph  osd crush tunables optimal ==>
> adjusted tunables profile to optimal
> 
> But when checking
> ceph  osd crush show-tunables ==>
> { "choose_local_tries": 0,
>   "choose_local_fallback_tries": 0,
>   "choose_total_tries": 50,
>   "chooseleaf_descend_once": 1,
>   "chooseleaf_vary_r": 1,
>   "straw_calc_version": 1,
>   "profile": "unknown",
>   "optimal_tunables": 0,
>   "legacy_tunables": 0,
>   "require_feature_tunables": 1,
>   "require_feature_tunables2": 1,
>   "require_feature_tunables3": 1,
>   "has_v2_rules": 0,
>   "has_v3_rules": 0}
> 
> Nothing has changed.
> 
> I finally did
> ceph osd crush set-tunable straw_calc_version 0

You actually want straw_calc_version 1.  This is just confusing output 
from the 'firefly' tunable detection... the straw_calc_version does not 
have any client dependencies.

sage
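
For readers following along, here is a hedged, self-contained sketch of the two directions involved (illustrative only; the real logic lives in CRUSH/CrushWrapper and the per-profile values below are placeholders, not Ceph's actual table): applying a profile overwrites the whole group of tunables, while show-tunables re-derives the profile name by exact comparison against each known profile, which is why a single differing field such as straw_calc_version can flip the reported name even though it has no client compatibility impact.

#include <iostream>
#include <map>
#include <string>
#include <tuple>

// Illustrative stand-in for the crush tunables; placeholder values only.
struct Tunables {
  int choose_total_tries;
  int chooseleaf_descend_once;
  int chooseleaf_vary_r;
  int straw_calc_version;   // no client compatibility impact
  bool operator==(const Tunables& o) const {
    return std::tie(choose_total_tries, chooseleaf_descend_once,
                    chooseleaf_vary_r, straw_calc_version) ==
           std::tie(o.choose_total_tries, o.chooseleaf_descend_once,
                    o.chooseleaf_vary_r, o.straw_calc_version);
  }
};

// "ceph osd crush tunables <profile>" conceptually does this: overwrite the
// whole group of tunables with the profile's values.
void apply_profile(Tunables& t, const Tunables& profile) { t = profile; }

// "show-tunables" re-derives the profile name by exact comparison, so any
// differing field -- even one without client dependencies -- yields "unknown".
std::string detect_profile(const Tunables& t,
                           const std::map<std::string, Tunables>& known) {
  for (const auto& kv : known)
    if (t == kv.second) return kv.first;
  return "unknown";
}

int main() {
  // Placeholder profile table for illustration.
  const std::map<std::string, Tunables> known = {
    {"firefly", Tunables{50, 1, 1, 0}},
  };

  Tunables current{50, 1, 1, 1};                          // straw_calc_version differs
  std::cout << detect_profile(current, known) << "\n";    // "unknown"

  current.straw_calc_version = 0;                         // align the one differing field
  std::cout << detect_profile(current, known) << "\n";    // "firefly"
  return 0;
}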


> 
> and
> ceph  osd crush show-tunables ==>
> { "choose_local_tries": 0,
>   "choose_local_fallback_tries": 0,
>   "choose_total_tries": 50,
>   "chooseleaf_descend_once": 1,
>   "chooseleaf_vary_r": 1,
>   "straw_calc_version": 0,
>   "profile": "firefly",
>   "optimal_tunables": 1,
>   "legacy_tunables": 0,
>   "require_feature_tunables": 1,
>   "require_feature_tunables2": 1,
>   "require_feature_tunables3": 1,
>   "has_v2_rules": 0,
>   "has_v3_rules": 0}
> 
> It's OK
> 
> My question:
> Does the "ceph osd crush tunables <profile>" command change all the required 
> parameters in order to set the tunables to the right profile?
> 
> Brgds
> 
> _
> 
> Ce message et ses pieces jointes peuvent contenir des informations 
> confidentielles ou privilegiees et ne doivent donc
> pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu 
> ce message par erreur, veuillez le signaler
> a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
> electroniques etant susceptibles d'alteration,
> France Telecom - Orange decline toute responsabilite si ce message a ete 
> altere, deforme ou falsifie. Merci
> 
> This message and its attachments may contain confidential or privileged 
> information that may be protected by law;
> they should not be distributed, used or copied without authorization.
> If you have received this email in error, please notify the sender and delete 
> this message and its attachments.
> As emails may be altered, France Telecom - Orange shall not be liable if this 
> message was modified, changed or falsified.
> Thank you.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 


How to modify affiliation?

2015-11-10 Thread chen kael
Hi, ceph-dev,
 Who can tell me how to modify my affiliation?
 Thanks!


Re: NIC with Erasure offload feature support and Ceph

2015-11-10 Thread Mike Almateia

03-Nov-15 18:07, Gregory Farnum wrote:

On Tue, Nov 3, 2015 at 3:15 AM, Mike  wrote:

Hello!

In our project we are planning to build a petabyte cluster with an erasure pool.
We are also looking at Mellanox ConnectX-4 Lx EN / ConnectX-4 EN cards
to use their erasure code offload feature.

Is anyone using this feature in a test lab or in production?


Nope. Ceph's erasure coding is very configurable (in terms of what
kind of EC it's doing) but the offload features in NICs that we've
seen aren't quite flexible enough for what Ceph is doing — it's an
unusual use case and set of requirements where these offload cards are
concerned. (We need to take an incoming stream, look at the raw
stream, then erasure code it into an unknown set of pieces, and then
send those pieces back out over the network to different addresses.)
-Greg
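
As a rough, self-contained illustration of the step Greg describes (this is not Ceph's ErasureCodeInterface or any of its plugins; it is a toy code with k data chunks plus a single XOR parity chunk), the encode sits in the middle of the I/O path and produces chunks that are then sent to different OSDs, which is the part a fixed-function NIC offload would have to understand:

#include <algorithm>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>

// Toy erasure code: split into k data chunks and one XOR parity chunk (m = 1).
// Ceph's real EC plugins (jerasure, isa, shec) are far more general, which is
// exactly why a fixed NIC offload is hard to apply here.
std::vector<std::string> encode(const std::string& in, size_t k) {
  size_t chunk = (in.size() + k - 1) / k;                 // last chunk zero-padded
  std::vector<std::string> out(k + 1, std::string(chunk, '\0'));
  for (size_t i = 0; i < k; ++i) {
    size_t off = i * chunk;
    size_t len = off < in.size() ? std::min(chunk, in.size() - off) : 0;
    if (len) std::memcpy(&out[i][0], in.data() + off, len);
  }
  for (size_t i = 0; i < k; ++i)                          // parity = XOR of data chunks
    for (size_t j = 0; j < chunk; ++j)
      out[k][j] ^= out[i][j];
  return out;
}

int main() {
  // In the OSD each chunk would be queued for a different shard/OSD over the
  // network, so the encode step is not a simple on-the-wire transformation.
  auto chunks = encode("incoming client write, raw bytes from the messenger", 3);
  for (size_t i = 0; i < chunks.size(); ++i)
    std::cout << (i < 3 ? "data   chunk " : "parity chunk ") << i << ": "
              << chunks[i].size() << " bytes\n";
  return 0;
}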



Thanks for the reply.
Mellanox said that Erasure Code offload (Reed-Solomon algorithm) support
on their NICs will be released around April 2016.


why keep and update rollback info for ReplicatedPG?

2015-11-10 Thread Ning Yao
Hi, all

As far as I know, rollback is designed for the EC backend to roll back
partially committed transactions such as append, stash and attrs.
So why do we need to keep and update (can_rollback_to,
rollback_info_trimmed_to) every time in _write_log() for
ReplicatedBackend? Or is it related to other issues?
Could we avoid frequently updating this information based on the pool type?

Regards
Ning Yao
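
Purely as a hypothetical sketch of the optimization being asked about (this is not PGLog's actual _write_log code; the types and field names below are placeholders, and whether ReplicatedBackend can truly skip this is exactly the open question), the idea would be to gate the rollback bookkeeping on whether the backend needs it:

#include <cstdint>
#include <map>
#include <string>

// Placeholder types standing in for the PG log / transaction machinery.
struct Transaction { std::map<std::string, uint64_t> keys; };
struct LogState {
  uint64_t can_rollback_to = 0;
  uint64_t rollback_info_trimmed_to = 0;
};

// Hypothetical: if, as the question suggests, only the EC backend needs the
// partial-write rollback markers, a replicated pool could skip persisting
// them on every write and save the extra key updates per op.
void write_log(Transaction& t, LogState& log, uint64_t version,
               bool backend_needs_rollback_info) {
  // ... the log entries themselves would be appended here ...
  if (backend_needs_rollback_info) {
    log.can_rollback_to = version;
    t.keys["can_rollback_to"] = log.can_rollback_to;
    t.keys["rollback_info_trimmed_to"] = log.rollback_info_trimmed_to;
  }
}

int main() {
  Transaction t;
  LogState log;
  write_log(t, log, 42, /*backend_needs_rollback_info=*/false);
  return static_cast<int>(t.keys.size());   // 0 extra keys for the replicated case
}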


Re: a home for backport snippets

2015-11-10 Thread Loic Dachary
Hi,

The new snippets home is at https://pypi.python.org/pypi/ceph-workbench and 
http://ceph-workbench.dachary.org/root/ceph-workbench.

The first snippet was merged by Nathan yesterday[1], the backport documentation 
was updated accordingly[2], and I used it after merging half a dozen hammer 
backports that were approved a few days ago.

Integration tests should provide the best help against regression we can hope 
for (they spawn a redmine instance every time they run and use a dedicated 
github user to create and destroy projects, pull requests etc.) and they are 
run on every merge request[3]. When integrated in ceph-workbench, the snippet 
is documented[4] and the implementation[5] is tested in full[6]. The merits of 
100% coverage are often disputed as overkill. IMHO it's better to remove an 
untested line of code rather than taking the chance that it grows into 
something that does not work (or possibly never worked). In the case of this 
snippet, there are a dozen safeguards and four lines of code to modify the 
issue. It would be bad to discover, after modifying hundreds of issues in the 
Ceph tracker, that it never worked as expected. I'm sure we'll find ways to 
*not* do the right thing even with integration tests. But we'll hopefully do 
the right thing more often ;-)

I'm not sure how much time it will take us to convert all the snippets we have, 
but it does not matter much as we can keep doing things manually in the 
meantime.

Cheers

P.S. We are using a GitLab instance, with an integrated CI, instead of github 
with a CI on jenkins.ceph.com roughly for the same reasons puppet-ceph is in 
https://github.com/openstack/puppet-ceph and uses the OpenStack gates. We have 
no expertise on jenkins-job-builder[7] and the learning curve is perceived as 
significantly higher than a GitLab with an integrated CI[8]. We also want to 
share administrative permissions on the CI with all members of the stable 
release team to share the maintenance workload.

[1] backport-set-release 
http://ceph-workbench.dachary.org/root/ceph-workbench/merge_requests/8
[2] Resolving an issue 
http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_merge_commits_from_the_integration_branch#Resolving-the-matching-issue
[3] Continuous integration 
http://ceph-workbench.dachary.org/dachary/ceph-workbench/builds/53
[4] Documentation 
http://ceph-workbench.dachary.org/root/ceph-workbench/merge_requests/8/diffs#9f3ebf1fc38506b66593397f3baac514d515c496_73_75
[5] Implementation 
http://ceph-workbench.dachary.org/root/ceph-workbench/merge_requests/8/diffs#070f4537c6cef8a2dacef1911a7d39acd0ce1387_0_75
[6] Testing 
http://ceph-workbench.dachary.org/root/ceph-workbench/merge_requests/8/diffs#66bd83c5111f0ccc884ad791c4acaa926ab52c2a_0_64
[7] Jenkins Job Builder http://docs.openstack.org/infra/jenkins-job-builder/ 
[8] Configuration of your builds with .gitlab-ci.yml 
http://doc.gitlab.com/ci/yaml/README.html

On 05/11/2015 14:20, Loic Dachary wrote:
> Hi,
> 
> Today, Nathan and I briefly discussed the idea of collecting the backport 
> snippets that are archived in the wiki at 
> http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO. We all have copies 
> on our local disks and although they don't diverge much, this is not very 
> sustainable. It was really good as we established the backport workflows. And 
> it would have been immensely painful to maintain a proper software while we 
> were changing the workflow on a regular basis. But it looks like we now have 
> something stable.
> 
> Early this year ceph-workbench[1] was started with the idea of helping with 
> backports. It is a mostly empty shell we can now use to collect all the 
> snippets we have. Instead of adding set-release[2] to the script directory of 
> Ceph, it would be a subcommand of ceph-workbench, like so:
> 
>   ceph-workbench set-release --token $github_token --key $redmine_key
> 
> What do you think ?
> 
> Cheers
> 
> [1] https://pypi.python.org/pypi/ceph-workbench
> [2] https://github.com/ceph/ceph/pull/6466
> 

-- 
Loïc Dachary, Artisan Logiciel Libre





Re: [ceph-users] Permanent MDS restarting under load

2015-11-10 Thread Oleksandr Natalenko


10.11.2015 22:38, Gregory Farnum wrote:


Which requests are they? Are these MDS operations or OSD ones?


Those requests appeared in the ceph -w output and are as follows:

https://gist.github.com/5045336f6fb7d532138f

Is it correct that these are blocked OSD operations? osd.3 is one of the 
data pool HDDs, and other OSDs besides osd.3 also appear in the slow 
requests warning.


I guess that may be related to the replica 4 setup of our cluster and only 5 
OSDs per host. We plan to add 6 more OSDs to each host after data migration 
is finished. Could that help spread the load?



So that "blacklisted" means that the monitors decided the MDS was
nonresponsive, failed over to another daemon, and blocked this one off
from the cluster.


So one could adjust the blacklist timeout, but there is no way to 
rate-limit requests? Am I correct?



Yeah, the MDS doesn't really do a good job back-pressuring clients
right now when it or the OSDs aren't keeping up with the workload.
That's something we need to work on once fsck stuff is behaving. rsync
is also (sadly) a workload that frequently exposes these problems, but
I'm not used to seeing the MDS daemon get stuck quite that quickly.
How frequently is it actually getting swapped?


Quite often. MDSes are swapped about once per minute under heavy load:

===
лис 10 10:40:47 data.la.net.ua bash[18112]: 2015-11-10 10:40:47.357633 
7f76c42e2700 -1 MDSIOContextBase: blacklisted!  Restarting...
лис 10 10:41:49 data.la.net.ua bash[18112]: 2015-11-10 10:41:49.237962 
7f1a939af700 -1 MDSIOContextBase: blacklisted!  Restarting...
лис 10 10:43:14 data.la.net.ua bash[18112]: 2015-11-10 10:43:14.899375 
7f17f6eaa700 -1 MDSIOContextBase: blacklisted!  Restarting...
лис 10 10:44:11 data.la.net.ua bash[18112]: 2015-11-10 10:44:11.810116 
7f693b64c700 -1 MDSIOContextBase: blacklisted!  Restarting...
лис 10 10:45:14 data.la.net.ua bash[18112]: 2015-11-10 10:45:14.761684 
7f7616097700 -1 MDSIOContextBase: blacklisted!  Restarting...
лис 10 10:46:35 data.la.net.ua bash[18112]: 2015-11-10 10:46:35.927190 
7fdfb7f62700 -1 MDSIOContextBase: blacklisted!  Restarting...
лис 10 10:47:41 data.la.net.ua bash[18112]: 2015-11-10 10:47:41.888064 
7fb88139b700 -1 MDSIOContextBase: blacklisted!  Restarting...
лис 10 10:49:57 data.la.net.ua bash[18112]: 2015-11-10 10:49:57.542545 
7fbb360eb700 -1 MDSIOContextBase: blacklisted!  Restarting...
лис 10 10:51:02 data.la.net.ua bash[18112]: 2015-11-10 10:51:02.486907 
7fb488fa1700 -1 MDSIOContextBase: blacklisted!  Restarting...
лис 10 10:52:03 data.la.net.ua bash[18112]: 2015-11-10 10:52:03.871463 
7f4cc0236700 -1 MDSIOContextBase: blacklisted!  Restarting...
лис 10 10:53:20 data.la.net.ua bash[18112]: 2015-11-10 10:53:20.290494 
7f9dc48d3700 -1 MDSIOContextBase: blacklisted!  Restarting...
лис 10 10:54:17 data.la.net.ua bash[18112]: 2015-11-10 10:54:17.086940 
7f45a9105700 -1 MDSIOContextBase: blacklisted!  Restarting...
лис 10 10:55:17 data.la.net.ua bash[18112]: 2015-11-10 10:55:17.547123 
7f6c48f50700 -1 MDSIOContextBase: blacklisted!  Restarting...
лис 10 10:56:32 data.la.net.ua bash[18112]: 2015-11-10 10:56:32.558378 
7f2bf0a70700 -1 MDSIOContextBase: blacklisted!  Restarting...
лис 10 10:57:34 data.la.net.ua bash[18112]: 2015-11-10 10:57:34.534306 
7fc69b42c700 -1 MDSIOContextBase: blacklisted!  Restarting...
лис 10 10:58:37 data.la.net.ua bash[18112]: 2015-11-10 10:58:37.061903 
7fea3de23700 -1 MDSIOContextBase: blacklisted!  Restarting...
лис 10 10:59:52 data.la.net.ua bash[18112]: 2015-11-10 10:59:52.579594 
7fe23b468700 -1 MDSIOContextBase: blacklisted!  Restarting...

===

Any idea?


disabling buffer::raw crc cache

2015-11-10 Thread Evgeniy Firsov
Hello, Guys!

While running a CPU-bound 4k block workload, I found that disabling the crc
cache in buffer::raw gives around a 7% performance improvement.

If there is no strong use case which benefits from that cache, we would like
to remove it entirely; otherwise we could conditionally enable it based on the
object size.

--
Evgeniy
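
For context, here is a hedged sketch of the kind of cache being discussed (illustrative only, not Ceph's actual buffer::raw, and the crc32c below is a dummy placeholder): a per-buffer map from (offset, length) to a previously computed checksum, protected by a lock. Large buffers that are checksummed repeatedly can hit the cache, but in a CPU-bound 4k workload each buffer is typically checksummed once, so every call pays the lock and map maintenance for no reuse, which is consistent with the gain seen when disabling it.

#include <cstdint>
#include <iostream>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Dummy checksum; stands in for the real crc32c implementation.
static uint32_t crc32c(const char* data, size_t len, uint32_t seed) {
  uint32_t h = seed;
  for (size_t i = 0; i < len; ++i)
    h = h * 31 + static_cast<unsigned char>(data[i]);
  return h;
}

// Illustrative stand-in for a raw buffer with a crc cache.
struct raw_buffer {
  std::string data;
  std::mutex crc_lock;
  std::map<std::pair<size_t, size_t>, uint32_t> crc_cache;  // (off, len) -> crc
  bool cache_enabled = true;

  uint32_t crc(size_t off, size_t len) {
    if (cache_enabled) {
      std::lock_guard<std::mutex> l(crc_lock);          // lock + map lookup on every call
      auto it = crc_cache.find({off, len});
      if (it != crc_cache.end()) return it->second;     // hit: a win only for reused large buffers
    }
    uint32_t c = crc32c(data.data() + off, len, 0);
    if (cache_enabled) {
      std::lock_guard<std::mutex> l(crc_lock);
      crc_cache[{off, len}] = c;                        // miss: extra insert on top of the compute
    }
    return c;
  }
};

int main() {
  raw_buffer b;
  b.data.assign(4096, 'x');
  // A 4k-object workload checksums each buffer once, so the cache never hits
  // and only adds locking and map churn -- the overhead being measured.
  std::cout << std::hex << b.crc(0, b.data.size()) << "\n";
  return 0;
}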



Permanent MDS restarting under load

2015-11-10 Thread Oleksandr Natalenko

Hello.

We have CephFS deployed over Ceph cluster (0.94.5).

We experience constant MDS restarting under high IOPS workload (e.g. 
rsyncing lots of small mailboxes from another storage to CephFS using 
ceph-fuse client). First, cluster health goes to HEALTH_WARN state with 
the following warning:


===
mds0: Behind on trimming (321/30)
===

Also, slow requests start to appear:

===
2 requests are blocked > 32 sec
===

Then, after a while, one of MDSes fails with the following log:

===
лис 10 16:07:41 baikal bash[10122]: 2015-11-10 16:07:41.915540 
7f2484f13700 -1 MDSIOContextBase: blacklisted!  Restarting...

лис 10 16:07:41 baikal bash[10122]: starting mds.baikal at :/0
лис 10 16:07:42 baikal bash[10122]: 2015-11-10 16:07:42.003189 
7f82b477e7c0 -1 mds.-1.0 log_to_monitors {default=true}

===

I guess writing lots of small files bloats the MDS log, and the MDS doesn't 
catch up on trimming in time. That's why it is marked as failed and is 
replaced by the standby MDS. We tried to limit mds_log_max_events to 30 
events, but that caused the MDS to fail very quickly with the following 
stacktrace:


===
Stacktrace: https://gist.github.com/4c8a89682e81b0049f3e
===

Is that a normal situation, or could one rate-limit client requests? Maybe 
there should be additional knobs to tune CephFS for handling such a 
workload?


Cluster info goes below.

CentOS 7.1, Ceph 0.94.5.

Cluster maps:

===
 osdmap e5894: 20 osds: 20 up, 20 in
  pgmap v8959901: 1024 pgs, 12 pools, 5156 GB data, 23074 kobjects
20101 GB used, 30468 GB / 50570 GB avail
1024 active+clean
===

CephFS list:

===
name: myfs, metadata pool: mds_meta_storage, data pools: 
[mds_xattrs_storage fs_samba fs_pbx fs_misc fs_web fs_mail fs_ott ]

===

Both MDS data and metadata pools are located on PCI-E SSDs:

===
 -9  0.44800 root pcie-ssd
 -7  0.22400 host data-pcie-ssd
  7  0.22400 osd.7 up  1.0   
   1.0

 -8  0.22400 host baikal-pcie-ssd
  6  0.22400 osd.6 up  1.0   
   1.0


pool 20 'mds_meta_storage' replicated size 2 min_size 1 crush_ruleset 2 
object_hash rjenkins pg_num 64 pgp_num 64 last_change 4333 flags 
hashpspool stripe_width 0
pool 21 'mds_xattrs_storage' replicated size 2 min_size 1 crush_ruleset 
2 object_hash rjenkins pg_num 64 pgp_num 64 last_change 4337 flags 
hashpspool crash_replay_interval 45 stripe_width 0


mds_meta_storage   20 37422k 0  169G   
234714
mds_xattrs_storage 21  0 0  169G 
11271588


rule pcie-ssd {
ruleset 2
type replicated
min_size 1
max_size 2
step take pcie-ssd
step chooseleaf firstn 0 type host
step emit
}
===

There is 1 active MDS as well as 1 stand-by MDS:

===
mdsmap e9035: 1/1/1 up {0=data=up:active}, 1 up:standby
===

Also we have 10 OSDs on HDDs for additional data pools:

===
 -6 37.0 root sata-hdd-misc
 -4 18.5 host data-sata-hdd-misc
  1  3.7 osd.1 up  1.0   
   1.0
  3  3.7 osd.3 up  1.0   
   1.0
  4  3.7 osd.4 up  1.0   
   1.0
  5  3.7 osd.5 up  1.0   
   1.0
 10  3.7 osd.10up  1.0   
   1.0

 -5 18.5 host baikal-sata-hdd-misc
  0  3.7 osd.0 up  1.0   
   1.0
 11  3.7 osd.11up  1.0   
   1.0
 12  3.7 osd.12up  1.0   
   1.0
 13  3.7 osd.13up  1.0   
   1.0
 14  3.7 osd.14up  1.0   
   1.0


fs_samba   22  2162G  4.28 3814G  
1168619
fs_pbx 23  1551G  3.07 3814G  
3908813
fs_misc24   436G  0.86 3814G   
112114
fs_web 25 58642M  0.11 3814G   
378946
fs_mail26   442G  0.88 3814G  
6414073
fs_ott 27  0 0 3814G 
   0


rule sata-hdd-misc {
ruleset 4
type replicated
min_size 1
max_size 4
step take sata-hdd-misc
step choose firstn 2 type host
step chooseleaf firstn 2 type osd
step emit
}
===

CephFS folders pool affinity is done via setfattr. For example:

===
# file: mail
ceph.dir.layout="stripe_unit=4194304 stripe_count=1 object_size=4194304 
pool=fs_mail"

===

infernalis build package on debian jessie : dh_install: ceph missing files (usr/lib/libos_tp.so.*), aborting

2015-11-10 Thread Alexandre DERUMIER
Hi,

I'm trying to build infernalis packages on debian jessie,
and I have this error on package build


dh_install: ceph missing files (usr/lib/libos_tp.so.*), aborting


I think it's related to the lttng change from here

https://github.com/ceph/ceph/pull/6135


Maybe an option is missing in debian/rules to generate libos_tp.so?
